whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-17 14:28:07 +00:00

Author	SHA1	Message	Date
Hrishikesh Barman	925915ae37	whisper : move progress calculation out of whisper.cpp (#1081 ) Current `progress_step` was hardcoded into whisper.cpp, this resulted in bindings having to access progress only at that step even if progress callback was being called at every iteration. With this change we get greater granularity progress reporting from whisper.cpp and bindings/implementations can define their own progress step.	2023-07-25 18:53:34 +03:00
AustinMroz	97f4a7fee0	examples : add Vim plugin (#1131 ) * Initial proof of concept Vim plugin At present, this is likely only slightly better than feature parity with the existing whisper.nvim Known issues: Trailing whitespace Up to an existing length(5 seconds) of speech may be processed when listening is enabled CPU cycles are spent processing speech even when not listening. Fixing these issues is likely dependent upon future efforts to create a dedicated library instead of wrapping examples/stream * Support $WHISPER_CPP_HOME environment variable A minor misunderstanding of the whisper.nvim implementation resulted in a plugin that was functional, but not a drop in replacement as it should be now.	2023-07-25 18:34:23 +03:00
Georgi Gerganov	4774d2feb0	whisper : minor OpenVINO refactoring (#1037 ) Hopefully I didn't break something - haven't tested	2023-07-04 20:28:27 +03:00
Ryan Metcalfe	62b81276e0	whisper : add OpenVINO support (#1037 ) * openvino: use OpenVINO encoder inference * openvino: add python script for OpenVINO model generation * whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * whisper: Fix compilation error * whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures * cmake: Add openvino-encoder as separate object target * whisper : minor style fixes * minor : indentation fixes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-04 15:56:11 +03:00
Akash Mahajan	c8d0f5fe98	whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058 ) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-04 09:45:00 +03:00
Georgi Gerganov	fdf58a6668	talk-llama : fix new rope interface	2023-07-03 19:24:01 +03:00
Georgi Gerganov	8ba42095c5	Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027 )" This reverts commit `3f7a03ebe3`.	2023-07-02 21:53:52 +03:00
Georgi Gerganov	d6509bf78d	ggml : sync latest repo (mostly refactoring changes)	2023-07-02 21:46:09 +03:00
Przemysław Pawełczyk	85ed71aaec	talk-llama : fix build on macOS (#1062 ) * talk-llama : use posix_madvise() instead of madvise() derived from BSD sed -i 's,\<madvise\>,posix_&,g;s,\<MADV_,POSIX_&,g' examples/talk-llama/llama-util.h * make : enable Darwin extensions for macOS builds This is an attempt at fixing macOS build error coming from the fact that RLIMIT_MEMLOCK define is not available there without Darwin extensions.	2023-06-28 22:34:50 +03:00
Przemysław Pawełczyk	3f7a03ebe3	ggml : do not use _GNU_SOURCE gratuitously (#1027 ) * Do not use _GNU_SOURCE gratuitously. What is needed to build whisper.cpp and examples is availability of stuff defined in The Open Group Base Specifications Issue 6 (https://pubs.opengroup.org/onlinepubs/009695399/) known also as Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions. There is no need to penalize musl libc which simply follows standards. Not having feature test macros in source code gives greater flexibility to those wanting to reuse it in 3rd party app, as they can build it with minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs. It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2. * examples : include SDL headers before other headers This is an attempt at fixing macOS build error coming from SDL2 relying on Darwin extension memset_pattern4/8/16 coming from Apple's string.h.	2023-06-25 16:34:30 +03:00
Przemysław Pawełczyk	62642bb61c	talk-llama : fix build after ggml sync (#1049 ) sed -i 's,GGML_BACKEND_CUDA,GGML_BACKEND_GPU,g' examples/talk-llama/llama.cpp	2023-06-25 16:13:50 +03:00
Roddur Dasgupta	f11f33f1c0	models : cd statements are quoted to allow spaces in path (#1041 )	2023-06-25 15:27:28 +03:00
Colin	14baf2e7f3	main : add diarization support for all current output types (#1031 ) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-25 15:07:57 +03:00
Georgi Gerganov	5feb0dffba	ggml : sync latest ggml lib	2023-06-25 14:30:44 +03:00
faker	598f607e28	main : gracefully exit when invalid params are passed (#1002 ) * Refactor whisper_params_parse to return false on failure * Updated help flag behavior	2023-06-25 13:51:59 +03:00
Nicholas Albion	5b9e59bc07	`speak` scripts for Windows	2023-06-01 22:45:00 +10:00
geniusnut	ce6f747064	whisper.android : support decode wav file has 2 channels (#972 )	2023-05-31 10:13:14 +03:00
DGdev91	5e2b3407ef	examples : update elevenlabs scripts to use official python API (#837 ) * Update elevenlabs example to use ufficial python API * Update elevenlabs example to use official python API	2023-05-24 21:11:01 +03:00
Georgi Gerganov	77eab3fbfe	talk-llama : sync latest llama.cpp (close #922 , close #954 )	2023-05-23 14:04:39 +03:00
Georgi Gerganov	e410cfc3ce	ggml : sync latest ggml repo - new Q4 and Q8 quantization - updated CUDA	2023-05-20 18:56:30 +03:00
Georgi Gerganov	0cb820e0f9	talk-llama : fix build + sync latest llama.cpp	2023-05-14 18:46:42 +03:00
Georgi Gerganov	e693074aa6	ggml : sync latest ggml - New Q4 and Q5 formats - Various improvements	2023-05-14 18:04:23 +03:00
Rich Jones	d652cf12ec	main : fix help for --no-timestamps arg (#908 )	2023-05-14 17:54:57 +03:00
Jhen-Jie Hong	5300117471	whisper.objc : enable Core ML in example & fix segmentation fault (#910 ) * coreml : update endcoder header import path * coreml : force objc_arc in whisper-encoder.mm * whisper.objc : create coreml/ group link * whisper.objc : add coreml model link * whisper.objc : update readme * coreml : use -fobjc-arc for coreml/whisper-encoder.mm * ci: create dummy .mlmodelc for pass ios build * whisper.objc : update readme --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-05-14 09:47:02 +03:00
Luis Herrera	4e4d00c67a	talk-llama : only copy used KV cache in get / set state (#890 ) --------- Co-authored-by: ejones <evan.q.jones@gmail.com>	2023-05-08 20:59:21 +03:00
Luis Herrera	0bf680fea2	talk-llama : fix session prompt load (#854 )	2023-05-02 20:05:27 +03:00
CRD716	b806420873	whisper : add detect-language mode (#853 ) * add detectlanguage flag * renaming and help * no idea why that last one didn't commit * run language detection if dl is set * help message fix * various fixes * fix quitting * fix language being english on print	2023-05-02 19:51:52 +03:00
Luis Herrera	be5911a9f3	talk-llama : add --session support (#845 ) * feat: adding session support * readme: adding --session info in examples/talk-llama * llama: adding session fixes * readme: updating session doc * talk-llama: update the value of need_to_save_session to true in order to save the session in the subsequent interaction * talk-llama: adding missing function which updates session_tokens	2023-05-01 20:18:10 +03:00
Georgi Gerganov	7765770f89	whisper : add memory sizes for Q8_0 (close #846 )	2023-05-01 10:03:56 +03:00
Baffin Lee	872a85ae94	whisper.wasm : fix typo in readme (#832 )	2023-05-01 09:28:05 +03:00
Georgi Gerganov	c94c469592	whisper : fix quantize bug (#842 ) * whisper : debug * whisper : fix bug during quantization	2023-04-30 22:50:04 +03:00
Georgi Gerganov	4a7d49af95	examples : fix + refactor Levenshtein distance	2023-04-30 19:12:49 +03:00
Georgi Gerganov	794b162a46	whisper : add integer quantization support (#540 ) * whisper : add integer quantization support * examples : add common-ggml + prepare to add "quantize" tool * whisper : quantization tool ready * whisper : fix F32 support * whisper : try to fix shared lib linkage * wasm : update quantized models to Q5 * bench.wasm : remove "medium" button * bench.wasm : fix custom model button * ggml : add Q5_0 and Q5_1 WASM SIMD * wasm : add quantized models to all WASM examples * wasm : bump DB version number to 2 * talk-llama : update example to latest llama.cpp * node : increase test timeout to 10s * readme : add information for model quantization * wasm : add links to other examples	2023-04-30 18:51:57 +03:00
Georgi Gerganov	5fd1bdd7fc	whisper : add GPU support via cuBLAS (#834 ) * make : add WHISPER_CUBLAS * make : fix CUBLAS build * whisper : disable Flash Attention + adjust memory buffers * whisper : remove old commented code * readme : add cuBLAS instructions * cmake : add WHISPER_CUBLAS option * gitignore : ignore build-cublas	2023-04-30 12:14:33 +03:00
Zollner	5cc17418c7	whisper.android : add some tips (#816 )	2023-04-29 11:00:20 +03:00
Laytan Laats	70567eff23	main : escape quotes in csv output (#815 )	2023-04-23 19:01:59 +03:00
Taras Glek	02ec83c5d5	stream : flush upon finishing inference (#811 )	2023-04-23 17:00:30 +03:00
Philipp Zabel	2bd4b8d577	examples : add missing #include <cstdint> (#798 ) common.cpp uses uint8_t and uint64_t, which are defined in <cstdint>.	2023-04-23 16:52:52 +03:00
Tauseef Mohiuddin	eecf2c3d41	main : update escape_double_quotes() function (#776 ) Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string. Changes Made: - Renamed the function to escape_quotes_and_backslashes - Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes. - Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash. Resolves: #769	2023-04-23 16:47:30 +03:00
Georgi Gerganov	f19e23fbd1	whisper : restore decoder temperature fallbacks I disabled this because there were many complaints about slow decoding. The current implementation does not allow batching the decoders when using the "best of" or "beam size" parameters, so the decoding time is proportional to the number of decoders, which is obviously not great. However, now there are even more complaints about wrong decodings and repetition. So, making a compromise by re-enabling the fallbacks, but defaulting to just 2 "best of" / "beam size" decoders. Also, the temperature step is increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum of 2. Also, the stream example now has fallbacks enabled by default. close #471 #477 #508 #612 #719 #731	2023-04-15 16:12:55 +03:00
Bader-eddine Ouaich	2c856fb9e5	whisper : fix potential memory leaks (#740 ) * fix potential memory leak if whisper_init_state failed * fix potential memory leak if gpt2_init failed	2023-04-14 20:05:56 +03:00
Ali Alameh	2c4ac2627d	stream : support language auto-detect (#501 ) #445 fix Language auto-detect "auto" flag does not work using the stream tool	2023-04-14 20:02:18 +03:00
DGdev91	001083a769	talk, talk-llama : add basic example script for eleven-labs tts (#728 )	2023-04-14 19:53:58 +03:00
Maciek	78548dc03f	talk-llama : correct default speak.sh path (#720 ) There is `speak.sh` file in `./examples/talk-llama` as described in README. However `./examples/talk/speak.sh` is used in `talk-llama.cpp`, this commit corrects that.	2023-04-14 19:36:09 +03:00
LittleLoli	66110dafcc	main : add lrc output support (#718 ) * add lrc output support. * fix wrong comment	2023-04-14 19:35:33 +03:00
Georgi Gerganov	514cd04452	whisper : fix bug in prompt processing (close #705 ) Was dereferencing a dangling pointer	2023-04-14 19:17:07 +03:00
Georgi Gerganov	114df388fe	talk-llama : increase context to 2048	2023-04-10 23:09:15 +03:00
Georgi Gerganov	ea36831459	talk-llama : update to latest llama.cpp (improved performance)	2023-04-10 22:59:13 +03:00
InconsolableCellist	5e6e2187a3	talk-llama : fixing usage message for talk-llama (#687 ) "-ml" instead of "-mg" for specifying the llama file	2023-03-30 00:10:20 +03:00
Georgi Gerganov	a7f1f33715	main : add <cstring> header	2023-03-29 23:59:45 +03:00
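
Commit eecf2c3d41 (#776) describes a two-pass escaping routine: a first loop that counts the escaped length, then a second loop that writes a backslash before each double quote or backslash. A minimal sketch of that idea follows. It is not copied from examples/main; the `std::string` signature is an assumption made for illustration, and only the function name comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Illustrative stand-in for the helper named in the commit message.
// The std::string signature is an assumption, not the repository's code.
std::string escape_quotes_and_backslashes(const std::string & in) {
    // First pass: every double quote or backslash needs one extra character.
    std::size_t escaped_length = in.size();
    for (char c : in) {
        if (c == '"' || c == '\\') {
            ++escaped_length;
        }
    }

    // Second pass: emit a backslash before each character that needs escaping.
    std::string out;
    out.reserve(escaped_length);
    for (char c : in) {
        if (c == '"' || c == '\\') {
            out.push_back('\\');
        }
        out.push_back(c);
    }

    // Both passes must agree on the final size.
    assert(out.size() == escaped_length);
    return out;
}
```

Escaping backslashes as well as quotes matters because an input that already ends in a backslash would otherwise swallow the closing quote of the field it is written into.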

222 Commits
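
Commit f19e23fbd1 states that raising the temperature step from 0.2 to 0.4 lowers the maximum number of fallbacks from 5 to 2. The arithmetic can be checked with the sketch below. It assumes, which the log itself does not state, that decoding starts at temperature 0.0 and that the temperature is never raised above 1.0; the function name is hypothetical.

```cpp
#include <cassert>

// Counts the retries available after the initial decoding pass.
// Assumption (not stated in the log): temperatures run from 0.0 up to 1.0.
// The step is given in hundredths so the loop avoids floating-point rounding.
int max_temperature_fallbacks(int step_hundredths) {
    assert(step_hundredths > 0);
    int fallbacks = 0;
    for (int t = step_hundredths; t <= 100; t += step_hundredths) {
        ++fallbacks;  // one retry at temperature t / 100.0
    }
    return fallbacks;
}
```

Under these assumptions, `max_temperature_fallbacks(20)` corresponds to the old 0.2 step and `max_temperature_fallbacks(40)` to the new 0.4 step.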