198 Commits

Author SHA1 Message Date
Georgi Gerganov
62b5ff875c stream : add "max_tokens" parameter
Used to limit the number of tokens in a segment.
Useful to battle with word repetition when using partial encoder context
2022-11-20 21:22:41 +02:00
Georgi Gerganov
d351771a4b stream : add "single_segment" option
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
Georgi Gerganov
c058aaf22e stream : partial encoder experiments 2022-11-20 21:22:41 +02:00
greeshmay
2ba66360c9
fix: free ggml_context (close #149) (#150)
* fix: free ggml_context

* ggml : free the model's contexts in whisper_free()

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-11-17 22:12:51 +02:00
Georgi Gerganov
83c742f1a7 whisper : add option to speed up the audio tempo by x2
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.

I think this can find application for real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Georgi Gerganov
c30bffc8a5
ref #22 : add "duration" option
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
Georgi Gerganov
d5afebd37c
whisper : token-level timestamp refactoring (#49, #120)
This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitles types. This
means that now you can specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters
2022-11-02 21:45:54 +02:00
Georgi Gerganov
02dfd5b8c3
whisper : fix extra memory usage after recent processor changes
Had increased the memory buffer to the size of the model and forgot to
bring it down.
2022-11-02 18:31:18 +02:00
Georgi Gerganov
57fb46f307 main : add option for word-leve timestamps (very experimental) 2022-10-30 17:06:57 +02:00
Georgi Gerganov
eba62e0fa1
close #113 : fix struct whisper_token_data 2022-10-30 08:23:52 +02:00
Georgi Gerganov
014a119052 minor : fix multiple definitions of to_timestamp() 2022-10-29 19:37:19 +03:00
Georgi Gerganov
dec40be58f parallel : print time of audio boundaries + fix timings 2022-10-29 19:37:19 +03:00
Georgi Gerganov
0b2dc3c82c parallel : working 2022-10-29 19:37:19 +03:00
Georgi Gerganov
85d6e1e1e7 main : fix sampling time + add max_context parameter 2022-10-29 19:37:19 +03:00
Georgi Gerganov
72e9cdd6bf parallel : adding tool for parallel transformer inference 2022-10-29 19:37:19 +03:00
Borislav Stanimirov
c565c569e7 Define WHISPER_BUILD so as to export symbols on Windows 2022-10-29 13:23:09 +03:00
Georgi Gerganov
34bb3ab0cf ggml : add system info functions 2022-10-25 20:53:48 +03:00
Georgi Gerganov
5f7e9fa2dc
ref #68, #79 : fix segment time output 2022-10-23 13:30:30 +03:00
Georgi Gerganov
7affd309d3 whisper : add new-segment callback
Can be used to process new segments as they are being generated.
Sample usage in main, for printing the resulting segments during the
inference.
2022-10-22 21:17:21 +03:00
Georgi Gerganov
31ff0c6a1f wip : experimental color coding of tokens based on probabilities 2022-10-22 21:17:21 +03:00
Georgi Gerganov
8d15a1c635
ci : fix and re-enable tests (2nd try) 2022-10-21 15:57:20 +03:00
Georgi Gerganov
692aa0784f
Revert "ci : fix and re-enable tests"
This reverts commit 80aefc9514262a026e013902fa3722e53181cd82.
2022-10-21 15:36:19 +03:00
Georgi Gerganov
80aefc9514
ci : fix and re-enable tests 2022-10-21 15:27:30 +03:00
Georgi Gerganov
7eeef0358a
ref #52 : improve greedy sampling strategy
Force timestamp token to be sampled if the probability sum over all
timestamp tokens is above the probability of any other token
2022-10-18 19:48:15 +03:00
Georgi Gerganov
e30cf83158
ref #57, #62, #63 : remove unions in C-api + remove designated initializers
We are not ready for designated initializers - many compilers do not
support this C++ feature yet, so removing it's non-trivial usages.
2022-10-18 18:17:24 +03:00
Georgi Gerganov
d6b84b2a23
ref #62 : fix build for some compilers
For some reason, new version of GCC panic when the struct type is not
specified explicitly
2022-10-18 10:57:03 +03:00
Georgi Gerganov
b4a3875b2c
Revert recent sampling change
It does not actually help and seems to produce worse results on some of
the samples
2022-10-18 08:26:16 +03:00
Georgi Gerganov
cf67bfffa0 Fix EOT token handling
If it is the end of the audio, pick all sampled tokens.
Otherwise, print error message.
2022-10-18 00:53:06 +03:00
Georgi Gerganov
d14823582d Try to improve the sampling strategy a bit
It sill fails sometimes when it does not sample a timestamp token for
the entire segment. We now print a message in such cases
2022-10-18 00:12:51 +03:00
Georgi Gerganov
20d8e7a309 Fix memory sizes 2022-10-18 00:12:51 +03:00
Georgi Gerganov
72d967bce4 Use Accelerate framework on Apple silicon
Huge performance improvement in the Encode (almost x2 on MacBook M1 Pro)

Also various extra optimizations:

- Multi-threaded NORM operator
- Faster GELU via F16 cast
2022-10-18 00:12:51 +03:00
Georgi Gerganov
0ad085f5e8
ref #48 : clear results at the start of whisper_full
This way, even if the input audio is empty, the previous results will be
removed.
2022-10-15 09:55:28 +03:00
0/0
b799226973 check if spectogram length is <100 before doing anything else
fixes #39
2022-10-12 07:32:42 +03:00
Borislav Stanimirov
0b45d25151 Building with MSVC 2022-10-11 21:40:46 +03:00
Georgi Gerganov
63b6786767
Minor 2022-10-10 22:06:27 +03:00
lnyan
4bbb8a587b Add MinGW support 2022-10-09 22:26:37 +08:00
Georgi Gerganov
2ca8cc77b2 ref #17 : print whisper logs to stderr
Only the transcribed/translted text is printed to stdout.
This way, one can redirect the result to a file.
2022-10-08 17:28:06 +03:00
Georgi Gerganov
8c7c018893 ref #17 : add options to output result to file
Support for:

- plain text
- VTT
- SRT
2022-10-08 17:22:22 +03:00
Georgi Gerganov
b43b36e006 Update tests 2022-10-08 11:43:42 +03:00
Georgi Gerganov
2f069335ab Adding sanitizer tests 2022-10-08 11:43:42 +03:00
Georgi Gerganov
332c9d77fe whisper : fix bug in token sampling logic
Could overflow buffer
2022-10-08 09:02:41 +03:00
Georgi Gerganov
481cd685d5
ref #10 : option to keep context in "stream" example
Seems the results become worse when we keep the context, so by default
this is not enabled
2022-10-07 22:30:44 +03:00
Georgi Gerganov
7787b878e1
ref #16, #22 : add "offset" argument
Allows to start processing the input audio at some offset from the
beginning. Useful for splitting a long job into multiple tasks.
2022-10-07 22:00:40 +03:00
Georgi Gerganov
167324584b wip : rpi4 support 2022-10-05 23:03:46 +03:00
Georgi Gerganov
ce1fe95902 wip : improve makefile 2022-10-05 23:03:46 +03:00
Georgi Gerganov
6814cc9b02 Improve result printing 2022-10-04 23:18:15 +03:00
Georgi Gerganov
eba33adadd Extend C-style API with full inference methods 2022-10-04 23:18:15 +03:00
Georgi Gerganov
6b77124e01 Initial C-style interface for whisper.cpp 2022-10-04 23:18:15 +03:00