* Do calculation with 64 bit precision to avoid overflow
* Update whisper.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Improves WASM performance:
On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome
* Add support for SSE3 SIMD
* Add SSE3 to system information
* Add Imath support for fp16-fp32 conversions
* Add Imath to system information
* Wrap Imath calls to avoid static function warnings
* Drop Imath; Add lookup table for f16 -> f32 conversions
* Remove TODO comments
* Update SSE3 to new macro arguments
* Correct updated macro definitions
* Prefer static inline where possible
* ggml : static inlines + add public f16 <-> f32 conversions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Small code cleanups
- fix indentation
- remove extra semicolons
- remove extra break after returns in case statements
- remove unnecessary call to .data() on string
- use empty() instead of checking size()
- no need to check for nullptr before free
- remove unnecessary initialization of string to ""
* minor : switch case always break
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
* Resolves some of warnings when compiling with clang/clang++
Mostly nit stuff that clang catches when compiling with -Wall -Wextra
-pedantic.
- Fix comparison between sign/unsigned integers.
- Passes a constant reference (const&) instead of copying each time.
* minor : normalize coding style
* minor : fix warning
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
- Clear past prompt when there is very short audio left for processing.
My observation is that in these cases the decoding tends to repeat and
hallucinate stuff and I think this is induced by the existing prompt
- When we fail to sample timestamp token, retry by clearing the past
prompt. If it fails again, then we advance the window by 1 second
Also added a small wrapper function to more safely read model data without having to get the sizeof right. I tested this on tiny, base and large models, there was no change in behaviour.
Do not allow for text segments to go beyond end of audio.
This partially mitigates some issues when the last audio window is 1-2
seconds just before the end of the audio file and the decoding spirals
into a repetition of the last transcribed phrase.
* whisper : try to improve the token sampling strategy
- Add the "max_initial_timestaamp" token logic from OpenAI
- Disallow sampling timestamps that are in the past
* whisper : fix the max initial timestamp logic + fallback decoding
* feat: prompt previous tokens for streaming
I used a vector pointer instead of vector itself because it gave weird errors, and why not
* convert vector to use with C api
* feat: remove old refs, check for prompt size
* feat: use better way of getting the pointer
Used to overwrite the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.
The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.
I think this can find application for real-time transcription - i.e. the
"stream" example.