Commit Graph

203 Commits

Author SHA1 Message Date
Philippe Normand
44cb044e66
whisper : fix build with -Werror=undef (#1045) 2023-06-25 15:30:39 +03:00
Georgi Gerganov
5feb0dffba
ggml : sync latest ggml lib 2023-06-25 14:30:44 +03:00
Nicholas Albion
d7c936b44a
Feature/java bindings2 (#944)
* Java needs to call `whisper_full_default_params_by_ref()`, returning struct by val does not seem to work.
* added convenience methods to WhisperFullParams
* Remove unused WhisperJavaParams
2023-05-29 09:38:58 +10:00
Elkana Bardugo
56a87ba45d
whisper : fix hebrew language code (#935) 2023-05-20 18:17:54 +03:00
Georgi Gerganov
fd01209d09
coreml : support quantized model files 2023-05-14 18:09:44 +03:00
Georgi Gerganov
e693074aa6
ggml : sync latest ggml
- New Q4 and Q5 formats
- Various improvements
2023-05-14 18:04:23 +03:00
CRD716
b806420873
whisper : add detect-language mode (#853)
* add detectlanguage flag

* renaming and help

* no idea why that last one didn't commit

* run language detection if dl is set

* help message fix

* various fixes

* fix quitting

* fix language being english on print
2023-05-02 19:51:52 +03:00
Georgi Gerganov
d375d73b2e
bench : improve benchmarks 2023-05-01 14:44:39 +03:00
Georgi Gerganov
7765770f89
whisper : add memory sizes for Q8_0 (close #846) 2023-05-01 10:03:56 +03:00
Georgi Gerganov
c94c469592
whisper : fix quantize bug (#842)
* whisper : debug

* whisper : fix bug during quantization
2023-04-30 22:50:04 +03:00
Georgi Gerganov
794b162a46
whisper : add integer quantization support (#540)
* whisper : add integer quantization support

* examples : add common-ggml + prepare to add "quantize" tool

* whisper : quantization tool ready

* whisper : fix F32 support

* whisper : try to fix shared lib linkage

* wasm : update quantized models to Q5

* bench.wasm : remove "medium" button

* bench.wasm : fix custom model button

* ggml : add Q5_0 and Q5_1 WASM SIMD

* wasm : add quantized models to all WASM examples

* wasm : bump DB version number to 2

* talk-llama : update example to latest llama.cpp

* node : increase test timeout to 10s

* readme : add information for model quantization

* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
Georgi Gerganov
5fd1bdd7fc
whisper : add GPU support via cuBLAS (#834)
* make : add WHISPER_CUBLAS

* make : fix CUBLAS build

* whisper : disable Flash Attention + adjust memory buffers

* whisper : remove old commented code

* readme : add cuBLAS instructions

* cmake : add WHISPER_CUBLAS option

* gitignore : ignore build-cublas
2023-04-30 12:14:33 +03:00
Thijs Raymakers
6108d3cc58
whisper : use correct seek_end when offset is used (#833)
Whenever an `offset_ms` is provided, the value of `seek_end` is
calculated incorrectly. This causes Whisper to keep transcribing
after the end of the file.

The current behavior looks like
```
[00:34:40.000 --> 00:34:47.000]   This is an example audio file.
[00:34:47.000 --> 00:34:49.000]   The text has been redacted
[00:34:49.000 --> 00:34:51.000]   This is the end of the audio.
[00:34:51.000 --> 00:34:52.000]   ***
[00:34:52.000 --> 00:34:53.000]   ***
[00:34:53.000 --> 00:34:54.000]   ***
[00:34:55.000 --> 00:34:56.000]   ***
...
```

The expected behavior should be
```
[00:34:40.000 --> 00:34:47.000]   This is an example audio file.
[00:34:47.000 --> 00:34:49.000]   The text has been redacted
[00:34:49.000 --> 00:34:51.000]   This is the end of the audio.
- end of program -
```

This commit changes the calculation of the `seek_end` variable to
only add `seek_start` if a custom `duration_ms` is provided.
Otherwise, it defaults to the end of the file.

Signed-off-by: Thijs Raymakers <thijs@raymakers.nl>
2023-04-29 18:55:37 +03:00
Georgi Gerganov
3efb81dec6
build : add WHISPER_COREML_ALLOW_FALLBACK to make / CMake (#812) 2023-04-29 10:55:24 +03:00
Canis Lupus
94a7cd2a07
whisper : allow non-CoreML fallback when Core ML cannot be loaded (#812)
if the Core ML model cannot be loaded, continue without Core ML instead of
returning. This allows a single build to transcribe using Core ML models
where available, and regular models when not.
2023-04-29 10:49:02 +03:00
Georgi Gerganov
3e82ff4747
whisper : fix bug from previous commit 2023-04-29 10:42:14 +03:00
Georgi Gerganov
b5bd2f43c5
whisper : avoid designated initializers 2023-04-29 10:36:50 +03:00
AsukaMinato
94aa56f19e
minor : improve C++ and Python style (#768)
* use some STL functions

* use self.field than setattr, use pathlib.Path

* recover some format

* const some iter

* Keep the original

* 2 space
2023-04-29 10:06:25 +03:00
Georgi Gerganov
5108b30e6d
whisper : pad audio instead of spectrogram (#579)
Also, fallback only if more temperatures are available and if we are
at least 3 seconds before the end of the audio
2023-04-15 17:19:19 +03:00
Georgi Gerganov
f19e23fbd1
whisper : restore decoder temperature fallbacks
I disabled this because there were many complaints about slow decoding.
The current implementation does not allow batching the decoders when
using the "best of" or "beam size" parameters, so the decoding time is
proportional to the number of decoders, which is obviously not great.

However, now there are even more complaints about wrong decodings and
repetition.

So, making a compromise by re-enabling the fallbacks, but defaulting to
just 2 "best of" / "beam size" decoders. Also, the temperature step is
increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum
of 2.

Also, the stream example now has fallbacks enabled by default.

close #471 #477 #508 #612 #719 #731
2023-04-15 16:12:55 +03:00
Georgi Gerganov
3dead611bb
whisper : slightly faster Log Mel computation + n-1 FFT threads (#568) 2023-04-15 14:18:46 +03:00
Georgi Gerganov
5e47e223bd
whisper : add Core ML support (#566)
* coreml : use Core ML encoder inference

* coreml : simplify whisper_encode + log messages

* whisper : resolve rebase conflicts

* coreml : add scripts for CoreML model generation

* bench-all : recognize COREML flag
2023-04-15 13:21:27 +03:00
Maximiliano Levi
794ff3074a
whisper : do not launch log_mel threads when n_thread is 1 (#763) 2023-04-14 22:35:34 +03:00
AfryMask
7e2afa4384
whisper : fix word-splitting errors in the "tokenize" function (#760)
Co-authored-by: AfryMask <afrymask@gmail.com>
2023-04-14 20:35:03 +03:00
Bader-eddine Ouaich
2c856fb9e5
whisper : fix potential memory leaks (#740)
* fix potential memory leak if whisper_init_state failed

* fix potential memory leak if gpt2_init failed
2023-04-14 20:05:56 +03:00
Georgi Gerganov
514cd04452
whisper : fix bug in prompt processing (close #705)
Was dereferencing a dangling pointer
2023-04-14 19:17:07 +03:00
Georgi Gerganov
69b8503935
ggml : backport llama.cpp updates (close #709)
- About x2 overall performance improvement on Apple Silicon
- Results should now be the same for different number of threads (not
  tested)
2023-04-10 22:28:54 +03:00
pajowu
0a2d1210bc
whisper : add progress callback (#600) 2023-03-30 20:29:29 +03:00
Jhen-Jie Hong
eefed45e37
whisper : add initial_prompt param (#645) 2023-03-29 23:23:23 +03:00
Georgi Gerganov
42c6855103
whisper : bump "large" scratch buffers even more (close #671) 2023-03-28 10:50:49 +03:00
Georgi Gerganov
0be9cd3497
whisper : increase scratch buffers after recent change (#671)
Should fix the error:

ggml_new_tensor_impl: not enough space in the scratch memory
2023-03-28 10:36:16 +03:00
Georgi Gerganov
4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp (#664)
* talk-llama : talk with LLaMA AI

* talk-llama : disable EOS token

* talk-llama : add README instructions

* ggml : fix build in debug
2023-03-27 21:00:32 +03:00
Georgi Gerganov
8e361d90d7
whisper : disable fallbacks until the performance is improved (#588) 2023-03-22 22:34:39 +02:00
sandrohanea
d4fa0d92ad
fixed language auto-detection for state provided processing (#627)
Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>
2023-03-22 21:47:09 +02:00
Leo Moll
8fcd1a3b32
main : provide option for creating JSON output (#615)
* examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614)

* main : remove leftovers

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-22 21:37:36 +02:00
Georgi Gerganov
1beff6f66d
models : change HF hosting from dataset to model 2023-03-22 20:44:56 +02:00
Takeshi Inoue
09e9068007
whisper.android : support benchmark for Android example. (#542)
* whisper.android: Support benchmark for Android example.

* whisper.android: update screenshot in README.

* update: Make text selectable for copy & paste.

* Update whisper.h to restore API name

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* whisper.android: Restore original API names.

---------

Co-authored-by: tinoue <tinoue@xevo.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-07 21:36:30 +02:00
sandrohanea
59fdcd19c8
whisper : add whisper_state + default state on the whisper_context (#523)
* Added whisper state + default state on the whisper_context

* Fixed some examples and bindings

* Fixed whisper_n_len (which was used in some binding) and added whisper_n_len_from_state

* Fixed comments

* whisper : reuse kv_cache_free() and fix compiler warnings

* whisper : clean-up the API comments

---------

Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-05 21:42:19 +02:00
Georgi Gerganov
478289a4b3
whisper : set no_context == true by default (#537) 2023-03-05 20:53:43 +02:00
Georgi Gerganov
373043cabe
whisper : zero-initialize some more context variables
Just in case
2023-02-21 19:00:42 +02:00
Finn Voorhees
fb4d0d470f
whisper : fix uninitialized exp_n_audio_ctx 2023-02-21 18:58:08 +02:00
Georgi Gerganov
0d229163bb
whisper : add API for applying custom logits filters during decoding 2023-02-19 18:35:01 +02:00
Georgi Gerganov
a94897bcde
whisper : by default disable non-speech tokens suppression (#473)
This seems to be causing hallucinations in the end of the audio, e.g.:

"Thank you for listening"
"Amen"
..
2023-02-15 21:48:49 +02:00
shikokuchuo
0336161b7d
whisper : fix signedness compiler warning (#506) 2023-02-15 19:08:25 +02:00
shibukazu
cfc06bf8df
whisper : suppress non-speech-related token outputs (#473)
* add non-speech-token suppression

* add suppress non-speech_tokens param
2023-02-08 09:05:34 +02:00
sandrohanea
2bfe0ebc0f
whisper : fixed Beam Search Strategy and exposed whisper_pcm_to_mel_phase_vocoder (#474)
Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>
2023-02-08 09:01:47 +02:00
boolemancer
4dd7119deb
whisper : only trim if split_on_word is true (#476) 2023-02-08 08:43:23 +02:00
kamranjon
a1c1583cc7
whisper : add whisper_full_lang_id() for getting the context lang (#461) 2023-02-05 14:46:26 +02:00
Matija Pevec
d012b5c7e4
whisper : add "split_on_word" flag when using "max_len" option (#455)
* Update whisper.cpp

* fix: trim function

* feat: added flag to split on word

* fix: arguments for main
2023-02-05 14:44:23 +02:00
Georgi Gerganov
f3ee4a9673
whisper : reduce memory usage during inference (#431)
* ggml : add "scratch" buffer support

* ggml : support for scratch ring-buffer

* ggml : bug fix in ggml_repeat()

* ggml : error on scratch buffer overflow

* whisper : use scratch buffers during inference (base model only)

* whisper : update memory usage for all models

* whisper : fix encoder memory usage

* whisper : use whisper_context functions instead of macros

* whisper : fix FF + remove it from README

* ggml : reuse ggml_new_i32

* ggml : refactor the scratch buffer storage

* whisper : reorder scratch buffers in the decoder

* main : add option to disable temp fallback

* Update README.md
2023-02-04 09:45:52 +02:00
Georgi Gerganov
291980369c
whisper : suppress task tokens (#442) 2023-02-04 09:03:14 +02:00
Georgi Gerganov
b992f3709e
whisper : do not provide past prompt when n_max_text_ctx == 0 2023-01-25 20:01:00 +02:00
Georgi Gerganov
b5ddb16ec7
whisper : condition timestamps to be monotonically increasing (#425) 2023-01-23 20:48:26 +02:00
fitzsim
ae16c21e9c
whisper : PPC64 big-endian support (#398)
* ggml : set cache line size to 128 on POWER9

* whisper : add PPC64 big endian support
2023-01-23 20:48:10 +02:00
Georgi Gerganov
78f166174f
whisper : fix condition for providing past prompt (critical)
This bug has been present since v1.1.0.

Effectively, the past transcribed text wasn't being used for following
transcriptions, which likely significantly reduces the transcription
quality.

Likely related to #419
2023-01-22 10:47:01 +02:00
Georgi Gerganov
21c569ba4a
whisper : extend information in whisper_print_timings() 2023-01-19 18:50:33 +02:00
Georgi Gerganov
1a91c19af9
whisper : perform entropy check only when we have at least 32 tokens (#412) 2023-01-18 22:52:18 +02:00
Georgi Gerganov
a6cf6f4c4a
bench : minor fixes 2023-01-18 21:40:10 +02:00
Georgi Gerganov
1ccb8a46a5
bench : fix Windows linkage by moving ggml benches in whisper lib .. 2023-01-18 21:19:50 +02:00
Georgi Gerganov
8088a977af
whisper : fix possible uninitialized variables (#291) 2023-01-16 21:44:40 +02:00
Georgi Gerganov
00ea21668b
whisper : account speed_up flag for short audio (close #405) 2023-01-15 12:42:15 +02:00
Georgi Gerganov
8de452c18b
Improve decoding (#291)
* whisper : prepare infra for new decoding strategies

* whisper : apply logit filters and compute logprobs

* whisper : add whisper_get_logits()

* whisper : separate self and cross attention memory

Initial step needed for supporting parallel decoders

* whisper : move probs_id buffer to whisper_context

* whisper : refactor kv cache into separate struct

* whisper : move self-attention kv cache to whisper_decoder

* whisper : wip decoding parameters + strategies

* whisper : wip decoding parameters + strategies (part 2)

* whisper : wip decoding parameters + strategies (part 3)

* whisper : wip decoding parameters + strategies (part 4)

* whisper : fix prompt_past update to not include prompt_init

* whisper : temperature + best_of support

* whisper : support for compression_ratio_threshold

We actually use entropy, but it is similar

* command : fix example to use logits instead of obsolete probs

* whisper : handle empty sequence ranking

* whisper : add WHISPER_DEBUG + diagnostic prints + new main args

* whisper : minor fixes

* whisper : add beam-search support

* whisper : bug fix when there no previous context

* whisper : add comments

* stream : disable temperature fallback

For real-time processing, we always want a single decoder running at T=0

* whisper.swiftui : update example - fix paths + add empty folders
2023-01-15 11:29:57 +02:00
Georgi Gerganov
4ef3398e8f
ggml : remove obsolete zeroing + comment fixes (#390) 2023-01-08 20:21:03 +02:00
boolemancer
08dc705a69
whisper : fix sample_to_timestamp calculation with 64 bit precision to avoid overflow (#388)
* Do calculation with 64 bit precision to avoid overflow

* Update whisper.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-08 15:08:45 +02:00
Syahmi Azhar
1512545149
whisper : add loader class to allow loading from buffer and others (#353)
* whisper : add loader to allow loading from other than file

* whisper : rename whisper_init to whisper_init_from_file

* whisper : add whisper_init_from_buffer

* android : Delete local.properties

* android : load models directly from assets

* whisper : adding <stddef.h> needed for size_t + code style

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-08 13:03:33 +02:00
Georgi Gerganov
65fdcbbbbb
whisper : revert accidental MB change 2023-01-07 16:18:21 +02:00
Georgi Gerganov
d61d55cd4b
ggml : speed-up soft max via Accelerate + unroll 2023-01-07 16:16:42 +02:00
Abitofevrything
a62170c656
ggml : add SSE3 and fp16 conversion lookup table (#368)
* Improves WASM performance:
  On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome

* Add support for SSE3 SIMD

* Add SSE3 to system information

* Add Imath support for fp16-fp32 conversions

* Add Imath to system information

* Wrap Imath calls to avoid static function warnings

* Drop Imath; Add lookup table for f16 -> f32 conversions

* Remove TODO comments

* Update SSE3 to new macro arguments

* Correct updated macro definitions

* Prefer static inline where possible

* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-06 18:45:59 +02:00
Thomas Fitzsimmons
1944e7c33e
whisper : document POWER VSX support 2023-01-05 23:53:00 +02:00
Georgi Gerganov
ad2a4ffa03
whisper : do not use F16 tensors when in F32 mode (#369) 2023-01-05 22:56:25 +02:00
Andy Maloney
dd6d582977
whisper : use range-based for loops for readability 2023-01-05 21:20:44 +02:00
Georgi Gerganov
d51c5eb906
ggml : define MIN / MAX only if not defined (minor) 2023-01-05 21:16:52 +02:00
Georgi Gerganov
d97e6005e9
whisper : add whisper_n_audio_ctx and check for invalid audio_ctx
closes #344
2022-12-31 09:57:19 +02:00
Georgi Gerganov
68daf6e487
whisper : avoid some memory allocations 2022-12-30 13:43:48 +02:00
Georgi Gerganov
ac521a566e
ggml : simplify the SIMD code (#324)
* ggml : simplify the SIMD code

* ggml : generic reduce for all register sizes + comments
2022-12-24 10:22:28 +02:00
Andy Maloney
543bd5627e
whisper : use emplace_back in place of push_back (#319)
This avoids potential construction of temporaries.
2022-12-23 11:07:19 +02:00
Andy Maloney
62fee9a9cc
whisper : fix mem leak on failure to load model (#318) 2022-12-23 11:06:17 +02:00
Andy Maloney
fa463313ad
minor : small code cleanups (#302)
* Small code cleanups

- fix indentation
- remove extra semicolons
- remove extra break after returns in case statements
- remove unnecessary call to .data() on string
- use empty() instead of checking size()
- no need to check for nullptr before free
- remove unnecessary initialization of string to ""

* minor : switch case always break

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-22 17:06:19 +02:00
Georgi Gerganov
501a6b455c
minor : flag "ARM FMA" -> "ARM_FMA" 2022-12-22 16:47:54 +02:00
Kevin Brothaler
e1432dd91a
Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a.
Android armeabi-v7a NEON does not include FMA unless the build is configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
2022-12-22 16:47:54 +02:00
Andy Maloney
42c6730732
whisper : use nullptr (C++11) instead of NULL macro (#299) 2022-12-22 16:35:18 +02:00
Georgi Gerganov
99da1e5cc8
cmake : enable and fix -Wall -Wextra -Wpedantic C++ warnings 2022-12-19 20:45:08 +02:00
Matheus de Sousa
8e3f129b4d
minor : resolves some of warnings when compiling with clang/clang++ (#294)
* Resolves some of warnings when compiling with clang/clang++

Mostly nit stuff that clang catches when compiling with -Wall -Wextra
-pedantic.

- Fix comparison between sign/unsigned integers.
- Passes a constant reference (const&) instead of copying each time.

* minor : normalize coding style

* minor : fix warning

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-19 20:19:01 +02:00
Georgi Gerganov
fba10a4c68
whisper : language auto-detect (#59) 2022-12-17 18:49:44 +02:00
Georgi Gerganov
6a69e3ae27
command : adding guided mode 2022-12-16 19:38:18 +02:00
Georgi Gerganov
bf69b669a0
whisper : add whisper_tokenize()
Tokenizes a string into a list of vocabulary tokens
2022-12-16 19:38:18 +02:00
Georgi Gerganov
6a7c82501e
whisper : improve decoding strategy (#244)
- Clear past prompt when there is very short audio left for processing.
  My observation is that in these cases the decoding tends to repeat and
  hallucinate stuff and I think this is induced by the existing prompt
- When we fail to sample timestamp token, retry by clearing the past
  prompt. If it fails again, then we advance the window by 1 second
2022-12-16 18:34:35 +02:00
Georgi Gerganov
124c718c73
whisper : fix UB when reading buffer of length 0 bytes (#265) 2022-12-13 23:14:47 +02:00
Roland Rabien
e70d47baab
Remove C++20 requirement (#257)
* Remove C++20 requirement

* Roll back C features not supported in VS2017
2022-12-11 20:03:07 +02:00
bert hubert
d1da35de06
Fix a potential bug when reading model data into a small-size-optimized (SSO) string, which could lead to memory corruption: in an SSO string, you can't write data through &str[0] and expect it to work well.
Also added a small wrapper function to read model data more safely, without having to get the sizeof right. I tested this on the tiny, base and large models; there was no change in behaviour.
2022-12-10 16:20:48 +02:00
Georgi Gerganov
603f97ba11
whisper : minor improvement in decoding strategy (#244)
Do not allow for text segments to go beyond end of audio.
This partially mitigates some issues when the last audio window is 1-2
seconds just before the end of the audio file and the decoding spirals
into a repetition of the last transcribed phrase.
2022-12-10 13:38:26 +02:00
Georgi Gerganov
f8ec718b76
ggml : add F16C CPU flag check 2022-12-06 21:56:56 +02:00
Georgi Gerganov
78d13257be
Try to improve the token sampling strategy (#193)
* whisper : try to improve the token sampling strategy

- Add the "max_initial_timestamp" token logic from OpenAI
- Disallow sampling timestamps that are in the past

* whisper : fix the max initial timestamp logic + fallback decoding
2022-12-02 21:51:50 +02:00
Georgi Gerganov
4698dcdb52
whisper : add mechanism for aborting the whisper_full() computation 2022-11-27 20:42:45 +02:00
Georgi Gerganov
e266cb0723
whisper.objc : add real-time processing (#97)
Similar to the "stream" app
2022-11-26 18:32:46 +02:00
Georgi Gerganov
c207eed431
whisper.objc : fix build warnings 2022-11-26 16:27:04 +02:00
Georgi Gerganov
be16dfa038
whisper.wasm : do not block page while processing (close #86) 2022-11-25 23:07:42 +02:00
Georgi Gerganov
b8ce25dec1
refactoring : more readable code 2022-11-25 19:28:04 +02:00
Georgi Gerganov
128aaadb93
whisper : improve printfs 2022-11-24 17:54:16 +02:00
katsu560
83456076f0 add AVX support 2022-11-23 22:16:33 +02:00