diff --git a/README.md b/README.md
index e570bcd7..f25d9b72 100644
--- a/README.md
+++ b/README.md
@@ -7,13 +7,12 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
 - Mixed F16 / F32 precision
 - Low memory usage (Flash Attention + Flash Forward)
 - Zero memory allocations at runtime
-- Runs on the CPU (Mac and Linux)
+- Runs on the CPU
 - [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
+- Supported platforms: Linux, Mac OS (Intel and Arm), Raspberry Pi, Android
 
 Incoming features:
 - [Realtime audio input transcription](https://github.com/ggerganov/whisper.cpp/issues/10#issuecomment-1264665959)
-- [Raspberry Pi support](https://github.com/ggerganov/whisper.cpp/issues/7)
-- [Android support](https://github.com/ggerganov/whisper.cpp/issues/8)
 
 ## Usage
 
@@ -220,10 +219,16 @@ $ ./stream -m models/ggml-small.en.bin -t 8
 
 https://user-images.githubusercontent.com/1991296/193465125-c163d304-64f6-4f5d-83e5-72239c9a203e.mp4
 
+## Implementation details
+
+- The core tensor operations are implemented in C (`ggml.h` / `ggml.c`)
+- The high-level C-style API is implemented in C++ (`whisper.h` / `whisper.cpp`)
+- Simple usage is demonstrated in `main.cpp`
+- Sample real-time audio transcription from the microphone is demonstrated in `stream.cpp`
+
 ## Limitations
 
-- Very basic greedy sampling scheme - always pick up the top token
-- Only 16-bit WAV at 16 kHz is supported
+- Very basic greedy sampling scheme - always pick up the top token. You can implement your own strategy
 - Inference only
 - No GPU support
 
diff --git a/stream.cpp b/stream.cpp
index e9d0364b..1f84d667 100644
--- a/stream.cpp
+++ b/stream.cpp
@@ -265,6 +265,11 @@ int main(int argc, char ** argv) {
 
     wparams.print_progress = false;
     wparams.print_special_tokens = params.print_special_tokens;
+    wparams.print_realtime = false;
+    wparams.print_timestamps = !params.no_timestamps;
+    wparams.translate = params.translate;
+    wparams.language = params.language.c_str();
+    wparams.n_threads = params.n_threads;
 
     if (whisper_full(ctx, wparams, pcmf32.data(), pcmf32.size()) != 0) {
         fprintf(stderr, "%s: failed to process audio\n", argv[0]);
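The README change describes the decoder as a very basic greedy sampling scheme that always picks the top token, and it notes that you can implement your own strategy. The following is a minimal sketch of that idea. It assumes the decoder exposes one logit per vocabulary token as a `std::vector<float>`, which is an assumption for illustration and not whisper.cpp's API. The functions `sample_greedy` and `sample_top_k` are hypothetical names. The second function shows one possible alternative strategy, top-k sampling with temperature. It is not the sampler whisper.cpp uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Hypothetical helper (not whisper.cpp's API).
// Greedy strategy: return the index of the highest-scoring token,
// or -1 if there are no logits. Ties resolve to the lowest index.
int sample_greedy(const std::vector<float> & logits) {
    if (logits.empty()) {
        return -1;
    }
    return static_cast<int>(std::max_element(logits.begin(), logits.end()) - logits.begin());
}

// Hypothetical alternative strategy: top-k sampling with temperature.
// Keeps the k highest-scoring tokens, turns their logits into weights with a
// softmax at the given temperature, and draws one of them at random.
// Returns -1 if there are no logits. Requires temperature > 0.
int sample_top_k(const std::vector<float> & logits, int k, float temperature, std::mt19937 & rng) {
    assert(temperature > 0.0f);
    if (logits.empty()) {
        return -1;
    }
    const int n = static_cast<int>(logits.size());
    k = std::max(1, std::min(k, n));

    // Token indices, partially sorted so the k best come first (descending logit).
    std::vector<int> idx(n);
    for (int i = 0; i < n; ++i) {
        idx[i] = i;
    }
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&logits](int a, int b) { return logits[a] > logits[b]; });

    // Unnormalized softmax weights; subtracting the max logit avoids overflow.
    // std::discrete_distribution normalizes the weights itself.
    const float max_logit = logits[idx[0]];
    std::vector<double> weights(k);
    for (int i = 0; i < k; ++i) {
        weights[i] = std::exp((logits[idx[i]] - max_logit) / temperature);
    }

    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return idx[dist(rng)];
}
```

With `k = 1`, the top-k sampler always returns the same token as the greedy rule. Larger `k` or a higher temperature trades determinism for more varied output.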