whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-02-24 10:14:52 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	62b5ff875c	stream : add "max_tokens" parameter Used to limit the number of tokens in a segment. Useful to battle with word repetition when using partial encoder context	2022-11-20 21:22:41 +02:00
Georgi Gerganov	d351771a4b	stream : add "single_segment" option Force the entire audio chunk to be transcribed into a single segment	2022-11-20 21:22:41 +02:00
Georgi Gerganov	c058aaf22e	stream : partial encoder experiments	2022-11-20 21:22:41 +02:00
greeshmay	2ba66360c9	fix: free ggml_context (close #149 ) (#150 ) * fix: free ggml_context * ggml : free the model's contexts in whisper_free() Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2022-11-17 22:12:51 +02:00
Georgi Gerganov	83c742f1a7	whisper : add option to speed up the audio tempo by x2 Using a Phase Vocoder for speeding up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. The transcription accuracy is degraded, but for slow to normal speech - it seems to be still very good. I think this can find application for real-time transcription - i.e. the "stream" example.	2022-11-13 16:25:43 +02:00
Georgi Gerganov	c30bffc8a5	ref #22 : add "duration" option Can be used to partially process a recording	2022-11-07 20:14:52 +02:00
Georgi Gerganov	d5afebd37c	whisper : token-level timestamp refactoring (#49 , #120 ) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters	2022-11-02 21:45:54 +02:00
Georgi Gerganov	02dfd5b8c3	whisper : fix extra memory usage after recent processor changes Had increased the memory buffer to the size of the model and forgot to bring it down.	2022-11-02 18:31:18 +02:00
Georgi Gerganov	57fb46f307	main : add option for word-level timestamps (very experimental)	2022-10-30 17:06:57 +02:00
Georgi Gerganov	eba62e0fa1	close #113 : fix struct whisper_token_data	2022-10-30 08:23:52 +02:00
Georgi Gerganov	014a119052	minor : fix multiple definitions of to_timestamp()	2022-10-29 19:37:19 +03:00
Georgi Gerganov	dec40be58f	parallel : print time of audio boundaries + fix timings	2022-10-29 19:37:19 +03:00
Georgi Gerganov	0b2dc3c82c	parallel : working	2022-10-29 19:37:19 +03:00
Georgi Gerganov	85d6e1e1e7	main : fix sampling time + add max_context parameter	2022-10-29 19:37:19 +03:00
Georgi Gerganov	72e9cdd6bf	parallel : adding tool for parallel transformer inference	2022-10-29 19:37:19 +03:00
Borislav Stanimirov	c565c569e7	Define WHISPER_BUILD so as to export symbols on Windows	2022-10-29 13:23:09 +03:00
Georgi Gerganov	34bb3ab0cf	ggml : add system info functions	2022-10-25 20:53:48 +03:00
Georgi Gerganov	5f7e9fa2dc	ref #68 , #79 : fix segment time output	2022-10-23 13:30:30 +03:00
Georgi Gerganov	7affd309d3	whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference.	2022-10-22 21:17:21 +03:00
Georgi Gerganov	31ff0c6a1f	wip : experimental color coding of tokens based on probabilities	2022-10-22 21:17:21 +03:00
Georgi Gerganov	8d15a1c635	ci : fix and re-enable tests (2nd try)	2022-10-21 15:57:20 +03:00
Georgi Gerganov	692aa0784f	Revert "ci : fix and re-enable tests" This reverts commit 80aefc9514262a026e013902fa3722e53181cd82.	2022-10-21 15:36:19 +03:00
Georgi Gerganov	80aefc9514	ci : fix and re-enable tests	2022-10-21 15:27:30 +03:00
Georgi Gerganov	7eeef0358a	ref #52 : improve greedy sampling strategy Force timestamp token to be sampled if the probability sum over all timestamp tokens is above the probability of any other token	2022-10-18 19:48:15 +03:00
Georgi Gerganov	e30cf83158	ref #57 , #62 , #63 : remove unions in C-api + remove designated initializers We are not ready for designated initializers - many compilers do not support this C++ feature yet, so removing its non-trivial usages.	2022-10-18 18:17:24 +03:00
Georgi Gerganov	d6b84b2a23	ref #62 : fix build for some compilers For some reason, new versions of GCC panic when the struct type is not specified explicitly	2022-10-18 10:57:03 +03:00
Georgi Gerganov	b4a3875b2c	Revert recent sampling change It does not actually help and seems to produce worse results on some of the samples	2022-10-18 08:26:16 +03:00
Georgi Gerganov	cf67bfffa0	Fix EOT token handling If it is the end of the audio, pick all sampled tokens. Otherwise, print error message.	2022-10-18 00:53:06 +03:00
Georgi Gerganov	d14823582d	Try to improve the sampling strategy a bit It still fails sometimes when it does not sample a timestamp token for the entire segment. We now print a message in such cases	2022-10-18 00:12:51 +03:00
Georgi Gerganov	20d8e7a309	Fix memory sizes	2022-10-18 00:12:51 +03:00
Georgi Gerganov	72d967bce4	Use Accelerate framework on Apple silicon Huge performance improvement in the Encode (almost x2 on MacBook M1 Pro) Also various extra optimizations: - Multi-threaded NORM operator - Faster GELU via F16 cast	2022-10-18 00:12:51 +03:00
Georgi Gerganov	0ad085f5e8	ref #48 : clear results at the start of whisper_full This way, even if the input audio is empty, the previous results will be removed.	2022-10-15 09:55:28 +03:00
0/0	b799226973	check if spectrogram length is <100 before doing anything else fixes #39	2022-10-12 07:32:42 +03:00
Borislav Stanimirov	0b45d25151	Building with MSVC	2022-10-11 21:40:46 +03:00
Georgi Gerganov	63b6786767	Minor	2022-10-10 22:06:27 +03:00
lnyan	4bbb8a587b	Add MinGW support	2022-10-09 22:26:37 +08:00
Georgi Gerganov	2ca8cc77b2	ref #17 : print whisper logs to stderr Only the transcribed/translated text is printed to stdout. This way, one can redirect the result to a file.	2022-10-08 17:28:06 +03:00
Georgi Gerganov	8c7c018893	ref #17 : add options to output result to file Support for: - plain text - VTT - SRT	2022-10-08 17:22:22 +03:00
Georgi Gerganov	b43b36e006	Update tests	2022-10-08 11:43:42 +03:00
Georgi Gerganov	2f069335ab	Adding sanitizer tests	2022-10-08 11:43:42 +03:00
Georgi Gerganov	332c9d77fe	whisper : fix bug in token sampling logic Could overflow buffer	2022-10-08 09:02:41 +03:00
Georgi Gerganov	481cd685d5	ref #10 : option to keep context in "stream" example Seems the results become worse when we keep the context, so by default this is not enabled	2022-10-07 22:30:44 +03:00
Georgi Gerganov	7787b878e1	ref #16 , #22 : add "offset" argument Allows to start processing the input audio at some offset from the beginning. Useful for splitting a long job into multiple tasks.	2022-10-07 22:00:40 +03:00
Georgi Gerganov	167324584b	wip : rpi4 support	2022-10-05 23:03:46 +03:00
Georgi Gerganov	ce1fe95902	wip : improve makefile	2022-10-05 23:03:46 +03:00
Georgi Gerganov	6814cc9b02	Improve result printing	2022-10-04 23:18:15 +03:00
Georgi Gerganov	eba33adadd	Extend C-style API with full inference methods	2022-10-04 23:18:15 +03:00
Georgi Gerganov	6b77124e01	Initial C-style interface for whisper.cpp	2022-10-04 23:18:15 +03:00


198 Commits