whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-20 21:23:06 +00:00

Author	SHA1	Message	Date
bobqianic	08fa34882f	examples : move wav_writer from stream.cpp to common.h (#1317 ) * Allocate class on the stack instead of on the heap * Add class wav_writer * fix some minor issues * fix some minor issues * remove potential misleading API	2023-10-03 22:56:11 +03:00
litong	707507ff6d	Examples: Add save audio to file option in stream.cpp (#1310 ) * save the recorded audio to a file * Alignment -help * Save the correct audio * chage to a consistent coding style * Correct typo * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Correct variable misuse * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp --------- Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2023-09-22 23:43:21 +08:00
Przemysław Pawełczyk	b55b505690	build : do not use _GNU_SOURCE gratuitously (#1129 ) * Do not use _GNU_SOURCE gratuitously. What is needed to build whisper.cpp and examples is availability of stuff defined in The Open Group Base Specifications Issue 6 (https://pubs.opengroup.org/onlinepubs/009695399/) known also as Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions, plus some stuff from BSD that is not specified in POSIX.1. Well, that was true until NUMA support was added recently in ggml, so enable GNU libc extensions for Linux builds to cover that. There is no need to penalize musl libc which simply follows standards. Not having feature test macros in source code gives greater flexibility to those wanting to reuse it in 3rd party app, as they can build it with minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs. It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2. * examples : include SDL headers before other headers Avoid macOS build error when _DARWIN_C_SOURCE is not defined, brought by SDL2 relying on Darwin extension memset_pattern4/8/16 (from string.h). * make : enable BSD extensions for DragonFlyBSD to expose RLIMIT_MEMLOCK * make : use BSD-specific FTMs to enable alloca on BSDs * make : fix OpenBSD build by exposing newer POSIX definitions * cmake : follow recent FTM improvements from Makefile	2023-09-07 12:36:14 +03:00
Duncan McConnell	b948361956	examples : add tinydiarization support for streaming (#1137 )	2023-08-03 11:24:07 +03:00
Georgi Gerganov	8ba42095c5	Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027 )" This reverts commit `3f7a03ebe3`.	2023-07-02 21:53:52 +03:00
Przemysław Pawełczyk	3f7a03ebe3	ggml : do not use _GNU_SOURCE gratuitously (#1027 ) * Do not use _GNU_SOURCE gratuitously. What is needed to build whisper.cpp and examples is availability of stuff defined in The Open Group Base Specifications Issue 6 (https://pubs.opengroup.org/onlinepubs/009695399/) known also as Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions. There is no need to penalize musl libc which simply follows standards. Not having feature test macros in source code gives greater flexibility to those wanting to reuse it in 3rd party app, as they can build it with minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs. It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2. * examples : include SDL headers before other headers This is an attempt at fixing macOS build error coming from SDL2 relying on Darwin extension memset_pattern4/8/16 coming from Apple's string.h.	2023-06-25 16:34:30 +03:00
Taras Glek	02ec83c5d5	stream : flush upon finishing inference (#811 )	2023-04-23 17:00:30 +03:00
Georgi Gerganov	f19e23fbd1	whisper : restore decoder temperature fallbacks I disabled this because there were many complaints about slow decoding. The current implementation does not allow batching the decoders when using the "best of" or "beam size" parameters, so the decoding time is proportional to the number of decoders, which is obviously not great. However, now there are even more complaints about wrong decodings and repetition. So, making a compromise by re-enabling the fallbacks, but defaulting to just 2 "best of" / "beam size" decoders. Also, the temperature step is increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum of 2. Also, the stream example now has fallbacks enabled by default. close #471 #477 #508 #612 #719 #731	2023-04-15 16:12:55 +03:00
Ali Alameh	2c4ac2627d	stream : support language auto-detect (#501 ) #445 fix Language auto-detect "auto" flag does not work using the stream tool	2023-04-14 20:02:18 +03:00
Georgi Gerganov	478289a4b3	whisper : set no_context == true by default (#537 )	2023-03-05 20:53:43 +02:00
Georgi Gerganov	09d7d2b68e	examples : refactor in order to reuse code and reduce duplication (#482 ) * examples : refactor common code into a library * examples : refactor common SDL code into a library * make : update Makefile to use common libs * common : fix MSVC M_PI .. * addon.node : link common lib	2023-02-15 19:28:10 +02:00
Georgi Gerganov	9ba66c2fad	stream : fix handling of --step == --length (#416 )	2023-01-18 21:22:52 +02:00
Georgi Gerganov	c9aeb33676	stream : fix --keep_context argument to be used correctly (#354 )	2023-01-16 19:37:40 +02:00
Georgi Gerganov	8de452c18b	Improve decoding (#291 ) * whisper : prepare infra for new decoding strategies * whisper : apply logit filters and compute logprobs * whisper : add whisper_get_logits() * whisper : separate self and cross attention memory Initial step needed for supporting parallel decoders * whisper : move probs_id buffer to whisper_context * whisper : refactor kv cache into separate struct * whisper : move self-attention kv cache to whisper_decoder * whisper : wip decoding parameters + strategies * whisper : wip decoding parameters + strategies (part 2) * whisper : wip decoding parameters + strategies (part 3) * whisper : wip decoding parameters + strategies (part 4) * whisper : fix prompt_past update to not include prompt_init * whisper : temperature + best_of support * whisper : support for compression_ration_threshold We actually use entropy, but it is similar * command : fix example to use logits instead of obsolete probs * whisper : handle empty sequence ranking * whisper : add WHISPER_DEBUG + diagnostic prints + new main args * whisper : minor fixes * whisper : add beam-search support * whisper : bug fix when there no previous context * whisper : add comments * stream : disable temperature fallback For real-time processing, we always want a single decoder running at T=0 * whisper.swiftui : update example - fix paths + add empty folders	2023-01-15 11:29:57 +02:00
Georgi Gerganov	a6dbd9188b	stream : fix a bug that inserted a lot of empty audio at the start The quality was terrible due to this	2023-01-14 19:20:47 +02:00
Syahmi Azhar	1512545149	whisper : add loader class to allow loading from buffer and others (#353 ) * whisper : add loader to allow loading from other than file * whisper : rename whisper_init to whisper_init_from_file * whisper : add whisper_init_from_buffer * android : Delete local.properties * android : load models directly from assets * whisper : adding <stddef.h> needed for size_t + code style Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-01-08 13:03:33 +02:00
Georgi Gerganov	a466c3404d	stream : fix data race on bool + avoid division-by-zero	2023-01-02 10:20:50 +02:00
Andy Maloney	dc90efd504	examples : small code cleanups (#322 ) - remove unnecessary initialization of string to "" - use empty() instead of checking size() - use emplace_back instead of push_back - use nullptr instead of NULL - remove unnecessary call to .data() on string - use character overload of find_first_of() instead of passing a string	2022-12-23 20:18:51 +02:00
Georgi Gerganov	99da1e5cc8	cmake : enable and fix -Wall -Wextra -Wpedantic C++ warnings	2022-12-19 20:45:08 +02:00
Georgi Gerganov	a82d331034	stream : update README.md + comments	2022-12-16 18:04:19 +02:00
Georgi Gerganov	34e0b4b9ef	stream : fix build	2022-12-15 20:15:36 +02:00
Georgi Gerganov	b0f8013eb9	stream : add sliding window mode	2022-12-15 19:59:17 +02:00
Georgi Gerganov	be16dfa038	whisper.wasm : do not block page while processing (close #86 )	2022-11-25 23:07:42 +02:00
Georgi Gerganov	b8ce25dec1	refactoring : more readable code	2022-11-25 19:28:04 +02:00
Georgi Gerganov	d7024cf9dc	main, stream : remove --verbose flag (#178 )	2022-11-24 17:52:04 +02:00
Georgi Gerganov	385236d1d3	stream : "-kc" now enables context keeping from previous segment (#90 ) By default, the context keeping is disabled	2022-11-22 18:21:15 +02:00
M. Eren Akbiyik	63ae03b8e0	Prompt previous tokens for streaming (#163 ) * feat: prompt previous tokens for streaming I used a vector pointer instead of vector itself because it gave weird errors, and why not * convert vector to use with C api * feat: remove old refs, check for prompt size * feat: use better way of getting the pointer	2022-11-22 18:10:35 +02:00
Georgi Gerganov	f2df9bd768	stream : add "max_tokens" cli arg Controls the max tokens per segment for the stream example	2022-11-20 21:22:41 +02:00
Georgi Gerganov	fb8d77f760	stream : add "audio_ctx" parameter Used to overwrite the audio context size of the Encoder. For example, setting "audio_ctx = 512" will make it run about 3 times faster, processing about 10s of audio, instead of 30s. The transcription quality drops, but this can be used for real-time streaming purposes where performance is important.	2022-11-20 21:22:41 +02:00
Georgi Gerganov	62b5ff875c	stream : add "max_tokens" parameter Used to limit the number of tokens in a segment. Useful to battle with word repetition when using partial encoder context	2022-11-20 21:22:41 +02:00
Georgi Gerganov	d351771a4b	stream : add "single_segment" option Force the entire audio chunk to be transcribed into a single segment	2022-11-20 21:22:41 +02:00
Georgi Gerganov	c058aaf22e	stream : partial encoder experiments	2022-11-20 21:22:41 +02:00
Georgi Gerganov	83c742f1a7	whisper : add option to speed up the audio tempo by x2 Using a Phase Vocoder for speeding up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. The transcription accuracy is degraded, but for slow to normal speech - it seems to be still very good. I think this can find application for real-time transcription - i.e. the "stream" example.	2022-11-13 16:25:43 +02:00
Georgi Gerganov	5a9e4260a6	stream : add "--capture" option to select capture device (ref #10 )	2022-10-30 08:27:04 +02:00
Georgi Gerganov	8347a7bb6a	stream : few updates to make it compatible for Vim usage (#99 )	2022-10-27 22:10:50 +03:00
Georgi Gerganov	c6710efde2	refactoring : move main + stream in examples + other stuff	2022-10-25 20:53:48 +03:00

36 Commits