Georgi Gerganov
8de452c18b
Improve decoding ( #291 )
* whisper : prepare infra for new decoding strategies
* whisper : apply logit filters and compute logprobs
* whisper : add whisper_get_logits()
* whisper : separate self and cross attention memory
Initial step needed for supporting parallel decoders
* whisper : move probs_id buffer to whisper_context
* whisper : refactor kv cache into separate struct
* whisper : move self-attention kv cache to whisper_decoder
* whisper : wip decoding parameters + strategies
* whisper : wip decoding parameters + strategies (part 2)
* whisper : wip decoding parameters + strategies (part 3)
* whisper : wip decoding parameters + strategies (part 4)
* whisper : fix prompt_past update to not include prompt_init
* whisper : temperature + best_of support
* whisper : support for compression_ratio_threshold
We actually use entropy, but it is similar
* command : fix example to use logits instead of obsolete probs
* whisper : handle empty sequence ranking
* whisper : add WHISPER_DEBUG + diagnostic prints + new main args
* whisper : minor fixes
* whisper : add beam-search support
* whisper : bug fix when there is no previous context
* whisper : add comments
* stream : disable temperature fallback
For real-time processing, we always want a single decoder running at T=0
* whisper.swiftui : update example - fix paths + add empty folders
2023-01-15 11:29:57 +02:00
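The decoding changes above mention computing logprobs, temperature support, and an entropy check used in place of a compression ratio. A minimal sketch of those three ideas follows. The function names are hypothetical and the code is not taken from whisper.cpp; in particular, measuring entropy over the empirical token distribution is an assumption about what "we actually use entropy" means. T=0 is assumed to be handled separately as greedy argmax, so the helper requires a positive temperature.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Hypothetical helper: log-softmax of logits scaled by a temperature T > 0.
// Higher T flattens the distribution; T = 0 (greedy) must be handled elsewhere.
std::vector<float> log_probs_with_temperature(const std::vector<float> & logits, float temperature) {
    assert(!logits.empty() && temperature > 0.0f);

    // Subtract the maximum logit for numerical stability.
    const float max_logit = *std::max_element(logits.begin(), logits.end());

    double sum = 0.0;
    for (const float l : logits) {
        sum += std::exp((l - max_logit) / temperature);
    }
    const double log_sum = std::log(sum);

    std::vector<float> result;
    result.reserve(logits.size());
    for (const float l : logits) {
        result.push_back(static_cast<float>((l - max_logit) / temperature - log_sum));
    }
    return result;
}

// Hypothetical helper: Shannon entropy (in nats) of the empirical token
// distribution of a decoded sequence. A highly repetitive sequence has low
// entropy, which is why it can stand in for a compression-ratio check.
double token_entropy(const std::vector<int> & tokens) {
    if (tokens.empty()) {
        return 0.0;
    }

    std::map<int, int> counts;
    for (const int t : tokens) {
        ++counts[t];
    }

    double entropy = 0.0;
    for (const auto & kv : counts) {
        const double p = static_cast<double>(kv.second) / tokens.size();
        entropy -= p * std::log(p);
    }
    return entropy;
}
```

A caller could retry decoding at a higher temperature when `token_entropy` falls below some chosen threshold; the threshold value and the retry policy are left open here.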
Georgi Gerganov
a6dbd9188b
stream : fix a bug that inserted a lot of empty audio at the start
The quality was terrible due to this
2023-01-14 19:20:47 +02:00
Syahmi Azhar
1512545149
whisper : add loader class to allow loading from buffer and others ( #353 )
* whisper : add loader to allow loading from sources other than a file
* whisper : rename whisper_init to whisper_init_from_file
* whisper : add whisper_init_from_buffer
* android : Delete local.properties
* android : load models directly from assets
* whisper : adding <stddef.h> needed for size_t + code style
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-08 13:03:33 +02:00
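The loader commit above separates where model bytes come from (a file, a memory buffer, Android assets) from the code that parses them. One common way to express that in a C-compatible API is a struct of callbacks plus an opaque context pointer. The sketch below illustrates the pattern with a buffer-backed source. All names here (`model_loader`, `buffer_source`, `make_buffer_loader`) are hypothetical and are not copied from whisper.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical loader interface: the parser only sees these callbacks,
// so it does not care whether the bytes come from a file or from memory.
struct model_loader {
    void * context;                                                          // opaque source state
    std::size_t (*read)(void * ctx, void * output, std::size_t read_size);  // returns bytes copied
    bool        (*eof)(void * ctx);                                          // true when exhausted
    void        (*close)(void * ctx);                                        // release the source
};

// Hypothetical in-memory source: a byte range plus a read cursor.
struct buffer_source {
    const unsigned char * data;
    std::size_t           size;
    std::size_t           offset;
};

static std::size_t buffer_read(void * ctx, void * output, std::size_t read_size) {
    auto * src = static_cast<buffer_source *>(ctx);
    // Clamp the request to the bytes that remain in the buffer.
    const std::size_t n = std::min(read_size, src->size - src->offset);
    std::memcpy(output, src->data + src->offset, n);
    src->offset += n;
    return n;
}

static bool buffer_eof(void * ctx) {
    const auto * src = static_cast<const buffer_source *>(ctx);
    return src->offset >= src->size;
}

static void buffer_close(void * /*ctx*/) {
    // The buffer is owned by the caller, so there is nothing to free.
}

// Wrap a caller-owned buffer in the generic loader interface.
model_loader make_buffer_loader(buffer_source * src) {
    assert(src != nullptr);
    return model_loader{src, buffer_read, buffer_eof, buffer_close};
}
```

A file-backed loader would fill the same four slots with wrappers around `fread`, `feof`, and `fclose`, which is what lets a single init routine serve every source.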
Andy Maloney
84c6b42e65
cmake : update to 3.19 ( #351 )
- update from 3.0 (from 2014) to 3.19 (from 2020)
- move some global settings onto the targets (through a cmake include)
2023-01-05 21:22:48 +02:00
Georgi Gerganov
a466c3404d
stream : fix data race on bool + avoid division-by-zero
2023-01-02 10:20:50 +02:00
Andy Maloney
dc90efd504
examples : small code cleanups ( #322 )
- remove unnecessary initialization of string to ""
- use empty() instead of checking size()
- use emplace_back instead of push_back
- use nullptr instead of NULL
- remove unnecessary call to .data() on string
- use character overload of find_first_of() instead of passing a string
2022-12-23 20:18:51 +02:00
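The cleanups listed in the commit above are general C++ idioms. The following sketch shows each of them in one small, self-contained example. The functions `split_fields` and `describe` are hypothetical illustrations, not code from the examples directory.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a line on commas.
std::vector<std::string> split_fields(const std::string & line) {
    std::vector<std::string> fields;

    // empty() states the intent directly, instead of comparing size() to 0.
    if (line.empty()) {
        return fields;
    }

    std::size_t start = 0;
    while (true) {
        // Character overload of find_first_of(), instead of passing ",".
        const std::size_t pos = line.find_first_of(',', start);
        if (pos == std::string::npos) {
            // emplace_back constructs the substring in place (string(str, pos)).
            fields.emplace_back(line, start);
            break;
        }
        fields.emplace_back(line, start, pos - start);
        start = pos + 1;
    }
    return fields;
}

// Hypothetical helper: turn an optional C string into a printable label.
std::string describe(const char * name) {
    std::string label;          // default-constructed, no need for = ""
    if (name == nullptr) {      // nullptr instead of NULL
        label = "(none)";
    } else {
        label = name;
    }
    return label;               // the string is returned as-is, no .data() call
}
```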
Georgi Gerganov
99da1e5cc8
cmake : enable and fix -Wall -Wextra -Wpedantic C++ warnings
2022-12-19 20:45:08 +02:00
Georgi Gerganov
a82d331034
stream : update README.md + comments
2022-12-16 18:04:19 +02:00
Georgi Gerganov
5a5c5ddcca
Update README.md
2022-12-15 20:38:08 +02:00
Georgi Gerganov
34e0b4b9ef
stream : fix build
2022-12-15 20:15:36 +02:00
Georgi Gerganov
b0f8013eb9
stream : add sliding window mode
2022-12-15 19:59:17 +02:00
Georgi Gerganov
3c390ffe38
stream.wasm : add web-based real-time transcription ( #112 )
2022-11-25 23:57:46 +02:00
Georgi Gerganov
be16dfa038
whisper.wasm : do not block page while processing ( close #86 )
2022-11-25 23:07:42 +02:00
Georgi Gerganov
b8ce25dec1
refactoring : more readable code
2022-11-25 19:28:04 +02:00
Georgi Gerganov
d7024cf9dc
main, stream : remove --verbose flag ( #178 )
2022-11-24 17:52:04 +02:00
Georgi Gerganov
385236d1d3
stream : "-kc" now enables context keeping from previous segment ( #90 )
By default, the context keeping is disabled
2022-11-22 18:21:15 +02:00
M. Eren Akbiyik
63ae03b8e0
Prompt previous tokens for streaming ( #163 )
* feat: prompt previous tokens for streaming
I used a vector pointer instead of vector itself because it gave weird errors, and why not
* convert vector to use with C api
* feat: remove old refs, check for prompt size
* feat: use better way of getting the pointer
2022-11-22 18:10:35 +02:00
Georgi Gerganov
f2df9bd768
stream : add "max_tokens" cli arg
Controls the max tokens per segment for the stream example
2022-11-20 21:22:41 +02:00
Georgi Gerganov
fb8d77f760
stream : add "audio_ctx" parameter
Used to override the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.
The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
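The "about 10s of audio, instead of 30s" figure in the audio_ctx commit above follows from simple proportionality. The sketch below assumes the full Encoder context is 1500 positions covering a 30 second window; that constant is not stated in the commit message and is labeled as an assumption. The helper names are hypothetical.

```cpp
#include <cassert>

// Assumption: the full Encoder context is 1500 positions for a 30 s window.
constexpr int   kFullAudioCtx  = 1500;
constexpr float kFullWindowSec = 30.0f;

// Seconds of audio covered by a reduced audio context.
float audio_seconds_for_ctx(int audio_ctx) {
    assert(audio_ctx > 0 && audio_ctx <= kFullAudioCtx);
    return kFullWindowSec * audio_ctx / kFullAudioCtx;
}

// How many times smaller the reduced context is than the full one.
float ctx_reduction_factor(int audio_ctx) {
    assert(audio_ctx > 0 && audio_ctx <= kFullAudioCtx);
    return static_cast<float>(kFullAudioCtx) / audio_ctx;
}
```

With `audio_ctx = 512` this gives 10.24 seconds of audio and a context about 2.9 times smaller, in line with the figures quoted in the commit. The actual speedup is an empirical observation from the commit; the ratio alone does not predict runtime.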
Georgi Gerganov
62b5ff875c
stream : add "max_tokens" parameter
Used to limit the number of tokens in a segment.
Useful for combating word repetition when using partial encoder context
2022-11-20 21:22:41 +02:00
Georgi Gerganov
d351771a4b
stream : add "single_segment" option
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
Georgi Gerganov
c058aaf22e
stream : partial encoder experiments
2022-11-20 21:22:41 +02:00
Georgi Gerganov
83c742f1a7
whisper : add option to speed up the audio tempo by x2
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech
it still seems to be very good.
I think this can find application for real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Georgi Gerganov
5a9e4260a6
stream : add "--capture" option to select capture device (ref #10 )
2022-10-30 08:27:04 +02:00
Georgi Gerganov
8347a7bb6a
stream : few updates to make it compatible for Vim usage ( #99 )
2022-10-27 22:10:50 +03:00
Georgi Gerganov
113a4f06d8
Update README.md
2022-10-25 20:53:48 +03:00
Georgi Gerganov
c6710efde2
refactoring : move main + stream in examples + other stuff
2022-10-25 20:53:48 +03:00