whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-01-14 08:50:59 +00:00

Author	SHA1	Message	Date
bobqianic	7e54df414e	whisper : significantly improve the inference quality (#1148 ) * Fix MSVC compile error C3688 Instead of simply using 'add_compile_options(/utf-8)' to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing '/utf-8' to NVCC. * Significantly improve inference quality In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference. * Significantly improve inference quality At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue. * Addressed a few minor issues Fixed the issue of `fft_out` continuously expanding. Resolved the fallback caused by using 'break' instead of `fft_in[j] = 0`. * Significantly improve inference quality Thanks for your patience everyone. It's finally sorted out. Now, the right side of the FFT spectrum is being flipped over to the left, and the amplitudes at corresponding positions on the left and right are added together (the spectrum on the left needs to be shifted by one position), then the average is calculated. FFT_OUT[0] is no longer discarded, making full use of the limited space to pack in more information. * Add annotation and performance improvement * Calculate FFT only when fft_in are not all zero * Some minor performance improvement * Fixed a bug impacting inference quality * The first version after all the analysis is completed. * Fix some bugs and add debug mode * Fixed several bugs * Temporarily disable speed-up mode and add debug mode. * Add debug mode * Disable speed-up mode and add debug mode * Fix CI error (#1) * Fix error * Fix error * Fixed several bugs including [BLANK_AUDIO] problem * Remove Hard-coded hann window * Some Final Fix (#2) * Fix error * Fix error * Probably the last commit * Probably the last commit * whisper : minor coding style changes * whisper : remove debug from public API --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-27 19:51:33 +03:00
junkfood	20a80972f4	whisper.android : migrate from ndk-build to CMake (#1204 )	2023-08-27 19:35:16 +03:00
Yunès	7ef3f3837e	main : log probs to text file (#1205 ) * token/probability file generated with -ls * code comment cleaning * main : indentations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-27 19:09:06 +03:00
Jhen-Jie Hong	a4bb2df36a	quantize : fix load vocab crash when len is 128 (#1160 ) * quantize : fix load vocab crash when len is 128 * ci : add quantize job	2023-08-06 11:04:42 +03:00
Duncan McConnell	b948361956	examples : add tinydiarization support for streaming (#1137 )	2023-08-03 11:24:07 +03:00
Hrishikesh Barman	925915ae37	whisper : move progress calculation out of whisper.cpp (#1081 ) Current `progress_step` was hardcoded into whisper.cpp, this resulted in bindings having to access progress only at that step even if progress callback was being called at every iteration. With this change we get greater granularity progress reporting from whisper.cpp and bindings/implementations can define their own progress step.	2023-07-25 18:53:34 +03:00
AustinMroz	97f4a7fee0	examples : add Vim plugin (#1131 ) * Initial proof of concept Vim plugin At present, this is likely only slightly better than feature parity with the existing whisper.nvim Known issues: Trailing whitespace Up to an existing length(5 seconds) of speech may be processed when listening is enabled CPU cycles are spent processing speech even when not listening. Fixing these issues is likely dependent upon future efforts to create a dedicated library instead of wrapping examples/stream * Support $WHISPER_CPP_HOME environment variable A minor misunderstanding of the whisper.nvim implementation resulted in a plugin that was functional, but not a drop in replacement as it should be now.	2023-07-25 18:34:23 +03:00
Georgi Gerganov	4774d2feb0	whisper : minor OpenVINO refactoring (#1037 ) Hopefully I didn't break something - haven't tested	2023-07-04 20:28:27 +03:00
Ryan Metcalfe	62b81276e0	whisper : add OpenVINO support (#1037 ) * openvino: use OpenVINO encoder inference * openvino: add python script for OpenVINO model generation * whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * whisper: Fix compilation error * whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures * cmake: Add openvino-encoder as separate object target * whisper : minor style fixes * minor : indentation fixes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-04 15:56:11 +03:00
Akash Mahajan	c8d0f5fe98	whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058 ) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-04 09:45:00 +03:00
Georgi Gerganov	fdf58a6668	talk-llama : fix new rope interface	2023-07-03 19:24:01 +03:00
Georgi Gerganov	8ba42095c5	Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027 )" This reverts commit `3f7a03ebe3`.	2023-07-02 21:53:52 +03:00
Georgi Gerganov	d6509bf78d	ggml : sync latest repo (mostly refactoring changes)	2023-07-02 21:46:09 +03:00
Przemysław Pawełczyk	85ed71aaec	talk-llama : fix build on macOS (#1062 ) * talk-llama : use posix_madvise() instead of madvise() derived from BSD sed -i 's,\<madvise\>,posix_&,g;s,\<MADV_,POSIX_&,g' examples/talk-llama/llama-util.h * make : enable Darwin extensions for macOS builds This is an attempt at fixing macOS build error coming from the fact that RLIMIT_MEMLOCK define is not available there without Darwin extensions.	2023-06-28 22:34:50 +03:00
Przemysław Pawełczyk	3f7a03ebe3	ggml : do not use _GNU_SOURCE gratuitously (#1027 ) * Do not use _GNU_SOURCE gratuitously. What is needed to build whisper.cpp and examples is availability of stuff defined in The Open Group Base Specifications Issue 6 (https://pubs.opengroup.org/onlinepubs/009695399/) known also as Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions. There is no need to penalize musl libc which simply follows standards. Not having feature test macros in source code gives greater flexibility to those wanting to reuse it in 3rd party app, as they can build it with minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs. It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2. * examples : include SDL headers before other headers This is an attempt at fixing macOS build error coming from SDL2 relying on Darwin extension memset_pattern4/8/16 coming from Apple's string.h.	2023-06-25 16:34:30 +03:00
Przemysław Pawełczyk	62642bb61c	talk-llama : fix build after ggml sync (#1049 ) sed -i 's,GGML_BACKEND_CUDA,GGML_BACKEND_GPU,g' examples/talk-llama/llama.cpp	2023-06-25 16:13:50 +03:00
Roddur Dasgupta	f11f33f1c0	models : cd statements are quoted to allow spaces in path (#1041 )	2023-06-25 15:27:28 +03:00
Colin	14baf2e7f3	main : add diarization support for all current output types (#1031 ) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-25 15:07:57 +03:00
Georgi Gerganov	5feb0dffba	ggml : sync latest ggml lib	2023-06-25 14:30:44 +03:00
faker	598f607e28	main : gracefully exit when invalid params are passed (#1002 ) * Refactor whisper_params_parse to return false on failure * Updated help flag behavior	2023-06-25 13:51:59 +03:00
Nicholas Albion	5b9e59bc07	`speak` scripts for Windows	2023-06-01 22:45:00 +10:00
geniusnut	ce6f747064	whisper.android : support decoding WAV files that have 2 channels (#972 )	2023-05-31 10:13:14 +03:00
DGdev91	5e2b3407ef	examples : update elevenlabs scripts to use official python API (#837 ) * Update elevenlabs example to use ufficial python API * Update elevenlabs example to use official python API	2023-05-24 21:11:01 +03:00
Georgi Gerganov	77eab3fbfe	talk-llama : sync latest llama.cpp (close #922 , close #954 )	2023-05-23 14:04:39 +03:00
Georgi Gerganov	e410cfc3ce	ggml : sync latest ggml repo - new Q4 and Q8 quantization - updated CUDA	2023-05-20 18:56:30 +03:00
Georgi Gerganov	0cb820e0f9	talk-llama : fix build + sync latest llama.cpp	2023-05-14 18:46:42 +03:00
Georgi Gerganov	e693074aa6	ggml : sync latest ggml - New Q4 and Q5 formats - Various improvements	2023-05-14 18:04:23 +03:00
Rich Jones	d652cf12ec	main : fix help for --no-timestamps arg (#908 )	2023-05-14 17:54:57 +03:00
Jhen-Jie Hong	5300117471	whisper.objc : enable Core ML in example & fix segmentation fault (#910 ) * coreml : update encoder header import path * coreml : force objc_arc in whisper-encoder.mm * whisper.objc : create coreml/ group link * whisper.objc : add coreml model link * whisper.objc : update readme * coreml : use -fobjc-arc for coreml/whisper-encoder.mm * ci: create dummy .mlmodelc to pass ios build * whisper.objc : update readme --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-05-14 09:47:02 +03:00
Luis Herrera	4e4d00c67a	talk-llama : only copy used KV cache in get / set state (#890 ) --------- Co-authored-by: ejones <evan.q.jones@gmail.com>	2023-05-08 20:59:21 +03:00
Luis Herrera	0bf680fea2	talk-llama : fix session prompt load (#854 )	2023-05-02 20:05:27 +03:00
CRD716	b806420873	whisper : add detect-language mode (#853 ) * add detectlanguage flag * renaming and help * no idea why that last one didn't commit * run language detection if dl is set * help message fix * various fixes * fix quitting * fix language being english on print	2023-05-02 19:51:52 +03:00
Luis Herrera	be5911a9f3	talk-llama : add --session support (#845 ) * feat: adding session support * readme: adding --session info in examples/talk-llama * llama: adding session fixes * readme: updating session doc * talk-llama: update the value of need_to_save_session to true in order to save the session in the subsequent interaction * talk-llama: adding missing function which updates session_tokens	2023-05-01 20:18:10 +03:00
Georgi Gerganov	7765770f89	whisper : add memory sizes for Q8_0 (close #846 )	2023-05-01 10:03:56 +03:00
Baffin Lee	872a85ae94	whisper.wasm : fix typo in readme (#832 )	2023-05-01 09:28:05 +03:00
Georgi Gerganov	c94c469592	whisper : fix quantize bug (#842 ) * whisper : debug * whisper : fix bug during quantization	2023-04-30 22:50:04 +03:00
Georgi Gerganov	4a7d49af95	examples : fix + refactor Levenshtein distance	2023-04-30 19:12:49 +03:00
Georgi Gerganov	794b162a46	whisper : add integer quantization support (#540 ) * whisper : add integer quantization support * examples : add common-ggml + prepare to add "quantize" tool * whisper : quantization tool ready * whisper : fix F32 support * whisper : try to fix shared lib linkage * wasm : update quantized models to Q5 * bench.wasm : remove "medium" button * bench.wasm : fix custom model button * ggml : add Q5_0 and Q5_1 WASM SIMD * wasm : add quantized models to all WASM examples * wasm : bump DB version number to 2 * talk-llama : update example to latest llama.cpp * node : increase test timeout to 10s * readme : add information for model quantization * wasm : add links to other examples	2023-04-30 18:51:57 +03:00
Georgi Gerganov	5fd1bdd7fc	whisper : add GPU support via cuBLAS (#834 ) * make : add WHISPER_CUBLAS * make : fix CUBLAS build * whisper : disable Flash Attention + adjust memory buffers * whisper : remove old commented code * readme : add cuBLAS instructions * cmake : add WHISPER_CUBLAS option * gitignore : ignore build-cublas	2023-04-30 12:14:33 +03:00
Zollner	5cc17418c7	whisper.android : add some tips (#816 )	2023-04-29 11:00:20 +03:00
Laytan Laats	70567eff23	main : escape quotes in csv output (#815 )	2023-04-23 19:01:59 +03:00
Taras Glek	02ec83c5d5	stream : flush upon finishing inference (#811 )	2023-04-23 17:00:30 +03:00
Philipp Zabel	2bd4b8d577	examples : add missing #include <cstdint> (#798 ) common.cpp uses uint8_t and uint64_t, which are defined in <cstdint>.	2023-04-23 16:52:52 +03:00
Tauseef Mohiuddin	eecf2c3d41	main : update escape_double_quotes() function (#776 ) Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string. Changes Made: - Renamed the function to escape_quotes_and_backslashes - Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes. - Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash. Resolves: #769	2023-04-23 16:47:30 +03:00
Georgi Gerganov	f19e23fbd1	whisper : restore decoder temperature fallbacks I disabled this because there were many complaints about slow decoding. The current implementation does not allow batching the decoders when using the "best of" or "beam size" parameters, so the decoding time is proportional to the number of decoders, which is obviously not great. However, now there are even more complaints about wrong decodings and repetition. So, making a compromise by re-enabling the fallbacks, but defaulting to just 2 "best of" / "beam size" decoders. Also, the temperature step is increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum of 2. Also, the stream example now has fallbacks enabled by default. close #471 #477 #508 #612 #719 #731	2023-04-15 16:12:55 +03:00
Bader-eddine Ouaich	2c856fb9e5	whisper : fix potential memory leaks (#740 ) * fix potential memory leak if whisper_init_state failed * fix potential memory leak if gpt2_init failed	2023-04-14 20:05:56 +03:00
Ali Alameh	2c4ac2627d	stream : support language auto-detect (#501 ) Fixes #445: the language auto-detect "auto" flag does not work when using the stream tool	2023-04-14 20:02:18 +03:00
DGdev91	001083a769	talk, talk-llama : add basic example script for eleven-labs tts (#728 )	2023-04-14 19:53:58 +03:00
Maciek	78548dc03f	talk-llama : correct default speak.sh path (#720 ) There is a `speak.sh` file in `./examples/talk-llama`, as described in the README. However, `./examples/talk/speak.sh` is used in `talk-llama.cpp`; this commit corrects that.	2023-04-14 19:36:09 +03:00
LittleLoli	66110dafcc	main : add lrc output support (#718 ) * add lrc output support. * fix wrong comment	2023-04-14 19:35:33 +03:00
Georgi Gerganov	514cd04452	whisper : fix bug in prompt processing (close #705 ) Was dereferencing a dangling pointer	2023-04-14 19:17:07 +03:00
Georgi Gerganov	114df388fe	talk-llama : increase context to 2048	2023-04-10 23:09:15 +03:00
Georgi Gerganov	ea36831459	talk-llama : update to latest llama.cpp (improved performance)	2023-04-10 22:59:13 +03:00
InconsolableCellist	5e6e2187a3	talk-llama : fixing usage message for talk-llama (#687 ) "-ml" instead of "-mg" for specifying the llama file	2023-03-30 00:10:20 +03:00
Georgi Gerganov	a7f1f33715	main : add <cstring> header	2023-03-29 23:59:45 +03:00
Lucas Zanek	86ecfc6333	whisper.addon : fixed test to new async implementation (#686 ) * fixed blocking code on node addon * modify the example to run async * format * added logic to see the whisper output * added logic to see the whisper output * removed extra function for more clean example * fixed whisper test to new async implementation	2023-03-29 23:59:17 +03:00
Egor Egorov	0f759f125d	main : fix typo in JSON output (#648 ) * typo in JSON output * fix double quotes in JSON output	2023-03-29 23:26:39 +03:00
Jhen-Jie Hong	eefed45e37	whisper : add initial_prompt param (#645 )	2023-03-29 23:23:23 +03:00
Jonno	21c1e6afc5	whisper.swiftui : update README.md (#682 ) - Slight tweaks to README for improved comprehension.	2023-03-29 23:04:38 +03:00
Evan Jones	a47e812a54	talk-llama : add alpaca support (#668 )	2023-03-29 23:01:14 +03:00
Georgi Gerganov	e5c197d8aa	talk-llama : add discussion link	2023-03-28 10:11:34 +03:00
Georgi Gerganov	7cd1d3bc34	talk-llama : try to fix windows build ..	2023-03-27 22:40:59 +03:00
Georgi Gerganov	4a0deb8b1e	talk-llama : add new example + sync ggml from llama.cpp (#664 ) * talk-llama : talk with LLaMA AI * talk.llama : disable EOS token * talk-llama : add README instructions * ggml : fix build in debug	2023-03-27 21:00:32 +03:00
Lucas Zanek	21165580a1	Nodejs Addon blocking main thread. Implemented Napi::AsyncWorker (#642 ) * fixed blocking code on node addon * modify the example to run async * format * added logic to see the whisper output * added logic to see the whisper output * removed extra function for more clean example	2023-03-22 22:19:22 +02:00
Jhen-Jie Hong	1d749919e3	whisper.objc : add `-O3 -DNDEBUG` in release mode (#640 )	2023-03-22 22:16:04 +02:00
Leo Moll	8fcd1a3b32	main : provide option for creating JSON output (#615 ) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-22 21:37:36 +02:00
Georgi Gerganov	1beff6f66d	models : change HF hosting from dataset to model	2023-03-22 20:44:56 +02:00
Takeshi Inoue	09e9068007	whisper.android : support benchmark for Android example. (#542 ) * whisper.android: Support benchmark for Android example. * whisper.android: update screenshot in README. * update: Make text selectable for copy & paste. * Update whisper.h to restore API name Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * whisper.android: Restore original API names. --------- Co-authored-by: tinoue <tinoue@xevo.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-07 21:36:30 +02:00
venkr	b597c5a779	qual-bench.sh : add quality comparison tool, and update main.cpp to allow using a font file (#569 )	2023-03-06 19:18:11 +02:00
Takeshi Inoue	a3fb6c507f	whisper.android : enable fp16 intrinsics (FP16_VA), which are supported by ARMv8.2 or later. (#572 )	2023-03-06 19:15:57 +02:00
sandrohanea	59fdcd19c8	whisper : add whisper_state + default state on the whisper_context (#523 ) * Added whisper state + default state on the whisper_context * Fixed some examples and bindings * Fixed whisper_n_len (which was used in some binding) and added whisper_n_len_from_state * Fixed comments * whisper : reuse kv_cache_free() and fix compiler warnings * whisper : clean-up the API comments --------- Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-05 21:42:19 +02:00
Georgi Gerganov	478289a4b3	whisper : set no_context == true by default (#537 )	2023-03-05 20:53:43 +02:00
HY. Kelvin Lee	72af0f5697	main : add csv header (#552 )	2023-03-02 18:32:16 +02:00
Georgi Gerganov	f254e78737	yt-wsp.sh : print help on empty args	2023-02-18 09:42:31 +02:00
conradg	69e6e4644a	main : fix std in input (#503 ) if we don't add this as an explicit check, then we get an "error: unknown argument: -" later on	2023-02-15 19:31:16 +02:00
Georgi Gerganov	09d7d2b68e	examples : refactor in order to reuse code and reduce duplication (#482 ) * examples : refactor common code into a library * examples : refactor common SDL code into a library * make : update Makefile to use common libs * common : fix MSVC M_PI .. * addon.node : link common lib	2023-02-15 19:28:10 +02:00
genevera (she/her)	459753342d	yt-wsp.sh : add unique filename generation (#495 ) Co-authored-by: genevera <genevera@noreply.users.github.com>	2023-02-14 20:12:51 +02:00
Qianhe Chen	ab1916fc59	ci : add node addon test and optimize compilation configuration (#468 ) * addon: implement node addon call whisper through cpp * addon: modify the license to MIT * addon: remove iostream * addon: rename dir * addon: fix typo * addon: configure cmake to build when cmake-js is used * ci: add addon.node test ci * addon: remove build WHISPER_BUILD_TESTS * addon: update build command * addon: add test * addon: add test file * addon: adapt to compile on Windows * addon: fix typo * addon: reuse jfk.wav Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * addon: reuse jfk.wav --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-02-05 15:02:08 +02:00
Matija Pevec	d012b5c7e4	whisper : add "split_on_word" flag when using the "max_len" option (#455 ) * Update whisper.cpp * fix: trim function * feat: added flag to split on word * fix: arguments for main	2023-02-05 14:44:23 +02:00
Georgi Gerganov	f3ee4a9673	whisper : reduce memory usage during inference (#431 ) * ggml : add "scratch" buffer support * ggml : support for scratch ring-buffer * ggml : bug fix in ggml_repeat() * ggml : error on scratch buffer overflow * whisper : use scratch buffers during inference (base model only) * whisper : update memory usage for all models * whisper : fix encoder memory usage * whisper : use whisper_context functions instead of macros * whisper : fix FF + remove it from README * ggml : reuse ggml_new_i32 * ggml : refactor the scratch buffer storage * whisper : reorder scratch buffers in the decoder * main : add option to disable temp fallback * Update README.md	2023-02-04 09:45:52 +02:00
Qianhe Chen	c306a7fd89	addon.node : using whisper as a Node.js addon (#443 ) * addon: implement node addon call whisper through cpp * addon: modify the license to MIT * addon: remove iostream * addon: rename dir * addon: fix typo * addon: configure cmake to build when cmake-js is used	2023-02-04 09:10:25 +02:00
Taisei Mima	86ef64a855	wasm : fix typo in helper.js (#459 )	2023-02-04 08:49:15 +02:00
Alex Bacart	3b1960520a	main : CSV format export trimmed spaces fix (#444 ) * Update main.cpp Removed string trimming * Update main.cpp * Update main.cpp * Revert "Update main.cpp" This reverts commit `d8924fdcfe`. * Revert "Update main.cpp" This reverts commit `252e508d85`.	2023-02-04 08:48:35 +02:00
Eric Tendian	47737b2e82	livestream.sh : run main with model arg instead of default (#453 ) Actually utilizes the $model var when calling ./main.	2023-01-27 01:13:31 +02:00
Georgi Gerganov	60337f5306	wasm : check if navigator.storage.estimate() is available Safari does not support it	2023-01-25 20:00:59 +02:00
Ondrej Kokes	11f61cecd6	whisper.wasm : add labels for easier radio selection (#435 )	2023-01-23 20:49:00 +02:00
Georgi Gerganov	f583e2d2f5	main : we had accidentally disabled the temperature fallback .. (#291 )	2023-01-18 22:51:41 +02:00
Georgi Gerganov	206fc93396	whisper.wasm : add small and small.en models	2023-01-18 21:58:55 +02:00
Chia-Hsiang Cheng	472a473fd1	main : add an option to accept optional output filenames (#424 ) * Add an option to accept optional output filenames * Format the file Co-authored-by: Chia-Hsiang Cheng <gary.chiahsiang.cheng@gmail.com>	2023-01-18 21:26:31 +02:00
Georgi Gerganov	9ba66c2fad	stream : fix handling of --step == --length (#416 )	2023-01-18 21:22:52 +02:00
Georgi Gerganov	1ccb8a46a5	bench : fix Windows linkage by moving ggml benches in whisper lib ..	2023-01-18 21:19:50 +02:00
Georgi Gerganov	1290fc6457	bench : add memcpy and ggml_mul_mat benchmarks	2023-01-18 20:31:46 +02:00
Digipom	49b529ba74	whisper.android : add support for loading directly from asset in C (#415 )	2023-01-16 21:57:35 +02:00
Georgi Gerganov	c9aeb33676	stream : fix --keep_context argument to be used correctly (#354 )	2023-01-16 19:37:40 +02:00
Georgi Gerganov	c3991bbb24	Update README.md	2023-01-15 14:08:12 +02:00
Georgi Gerganov	fafd78945d	bench.wasm : print system info	2023-01-15 11:34:03 +02:00
Georgi Gerganov	8de452c18b	Improve decoding (#291 ) * whisper : prepare infra for new decoding strategies * whisper : apply logit filters and compute logprobs * whisper : add whisper_get_logits() * whisper : separate self and cross attention memory Initial step needed for supporting parallel decoders * whisper : move probs_id buffer to whisper_context * whisper : refactor kv cache into separate struct * whisper : move self-attention kv cache to whisper_decoder * whisper : wip decoding parameters + strategies * whisper : wip decoding parameters + strategies (part 2) * whisper : wip decoding parameters + strategies (part 3) * whisper : wip decoding parameters + strategies (part 4) * whisper : fix prompt_past update to not include prompt_init * whisper : temperature + best_of support * whisper : support for compression_ratio_threshold We actually use entropy, but it is similar * command : fix example to use logits instead of obsolete probs * whisper : handle empty sequence ranking * whisper : add WHISPER_DEBUG + diagnostic prints + new main args * whisper : minor fixes * whisper : add beam-search support * whisper : bug fix when there is no previous context * whisper : add comments * stream : disable temperature fallback For real-time processing, we always want a single decoder running at T=0 * whisper.swiftui : update example - fix paths + add empty folders	2023-01-15 11:29:57 +02:00
Georgi Gerganov	a6dbd9188b	stream : fix a bug that inserted a lot of empty audio at the start The quality was terrible due to this	2023-01-14 19:20:47 +02:00
Syahmi Azhar	1512545149	whisper : add loader class to allow loading from buffer and others (#353 ) * whisper : add loader to allow loading from other than file * whisper : rename whisper_init to whisper_init_from_file * whisper : add whisper_init_from_buffer * android : Delete local.properties * android : load models directly from assets * whisper : adding <stddef.h> needed for size_t + code style Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-01-08 13:03:33 +02:00
Georgi Gerganov	52a3e0c92a	ggml : improve vec_dot_f16 unrolling in flash_attn_f16	2023-01-08 11:41:18 +02:00
Georgi Gerganov	d1ea1220ff	command : clean-up / refactoring / formatting (#383 )	2023-01-07 21:43:24 +02:00
David	9c4a1522f6	command : always-prompt mode (#383 )	2023-01-07 21:41:11 +02:00
Georgi Gerganov	87dd4a3081	talk.wasm : bump memory usage + update whisper.js	2023-01-06 21:13:44 +02:00
Georgi Gerganov	6b351bb669	command : add "guided-mode" video demo in the README.md	2023-01-06 18:59:26 +02:00
Georgi Gerganov	b3c865083e	ci : add emscripten build	2023-01-05 22:10:20 +02:00
Georgi Gerganov	a0d4f8e65c	main : make whisper_print_segment_callback() more readable (close #371 )	2023-01-05 21:45:05 +02:00
Georgi Gerganov	196d738974	minor : close #370 + Makefile build info print change	2023-01-05 21:35:45 +02:00
Andy Maloney	84c6b42e65	cmake : update to 3.19 (#351 ) - update from 3.0 (from 2014) to 3.19 (from 2020) - move some global settings onto the targets (through a cmake include)	2023-01-05 21:22:48 +02:00
Georgi Gerganov	a466c3404d	stream : fix data race on bool + avoid division-by-zero	2023-01-02 10:20:50 +02:00
Andy Maloney	f00509d57c	command : refactor to split command list & general transcription modes (#331 ) This makes it easier to understand if you're looking for only one of the capabilities.	2022-12-31 14:08:57 +02:00
Niels Mayer	a593b932e4	main : add -ocsv, aka --output-csv to output a CSV file Adds -ocsv, aka --output-csv feature to examples/main, which outputs a CSV file containing lines formatted as follows <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, "<transcript-line-including-commas>".	2022-12-29 14:04:00 +02:00
Andy Maloney	331c0bbddc	examples : fix memory leak on failure to load gpt2 model (#323 )	2022-12-23 20:19:07 +02:00
Andy Maloney	dc90efd504	examples : small code cleanups (#322 ) - remove unnecessary initialization of string to "" - use empty() instead of checking size() - use emplace_back instead of push_back - use nullptr instead of NULL - remove unnecessary call to .data() on string - use character overload of find_first_of() instead of passing a string	2022-12-23 20:18:51 +02:00
Digipom	0f4227d9ee	examples : add whisper.swiftui demo app (#308 ) * Add SwiftUI demo project. * Add -DGGML_USE_ACCELERATE	2022-12-23 10:56:18 +02:00
Kevin Brothaler	91fc08c641	Build a vfpv4 library for armeabi-v7a and do runtime detection to select the right library	2022-12-22 16:47:54 +02:00
Kevin Brothaler	e1432dd91a	Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a. Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks. * Also removed ABI filter from Android project.	2022-12-22 16:47:54 +02:00
Kevin Brothaler	22193cbfe8	Bump NDK version	2022-12-22 16:47:54 +02:00
Georgi Gerganov	90564f85f9	Update README.md	2022-12-19 22:09:21 +02:00
Georgi Gerganov	99da1e5cc8	cmake : enable and fix -Wall -Wextra -Wpedantic C++ warnings	2022-12-19 20:45:08 +02:00
Matheus de Sousa	8e3f129b4d	minor : resolve some of the warnings when compiling with clang/clang++ (#294 ) * Resolve some of the warnings when compiling with clang/clang++ Mostly nit stuff that clang catches when compiling with -Wall -Wextra -pedantic. - Fix comparison between signed/unsigned integers. - Pass a constant reference (const&) instead of copying each time. * minor : normalize coding style * minor : fix warning Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2022-12-19 20:19:01 +02:00
Georgi Gerganov	fba10a4c68	whisper : language auto-detect (#59 )	2022-12-17 18:49:44 +02:00
Georgi Gerganov	32fbc8cd04	main : add option to print the progress (#276 )	2022-12-16 20:20:43 +02:00
Georgi Gerganov	b8065d90f5	main : add "--prompt" command line argument (#90 ) This allows providing an initial prompt to be used at the start of the processing.	2022-12-16 19:43:16 +02:00
Georgi Gerganov	4312995974	command : better indentation	2022-12-16 19:38:18 +02:00
Georgi Gerganov	5eeeb3412d	command : update README, show how to use guided mode	2022-12-16 19:38:18 +02:00
Georgi Gerganov	6a69e3ae27	command : adding guided mode	2022-12-16 19:38:18 +02:00
Georgi Gerganov	ea19ed33f1	Update README.md (#46 ) Add references to the new Android app	2022-12-16 19:28:51 +02:00
Digipom	675e787171	Add Android sample (#277 ) * Add Android sample * Use main project C files * Stop existing playback before starting new playback * Make text scrollable * Stop playback when starting to record * Remove extra var	2022-12-16 19:20:13 +02:00
Georgi Gerganov	a82d331034	stream : update README.md + comments	2022-12-16 18:04:19 +02:00
Georgi Gerganov	5a5c5ddcca	Update README.md	2022-12-15 20:38:08 +02:00
Georgi Gerganov	34e0b4b9ef	stream : fix build	2022-12-15 20:15:36 +02:00
Georgi Gerganov	b0f8013eb9	stream : add sliding window mode	2022-12-15 19:59:17 +02:00
Georgi Gerganov	a613f16aec	talk : improve prompting	2022-12-12 23:44:36 +02:00
Georgi Gerganov	f309f97df6	Node.js package (#260 ) * npm : preparing infra for node package * npm : package infra ready * npm : initial version ready * npm : change name to whisper.cpp whisper.js is taken	2022-12-12 20:17:27 +02:00
Georgi Gerganov	aa6adda26e	talk : make compatible with c++11 (part 2)	2022-12-11 20:34:04 +02:00
Georgi Gerganov	444349f4ec	talk : make compatible with c++11	2022-12-11 20:19:17 +02:00
Lexevolution	6ed786957e	Add newline per segment for text output (#254 )	2022-12-11 20:00:29 +02:00
Georgi Gerganov	fcf515de60	bench.wasm : same as "bench" but runs in the browser (#89 )	2022-12-11 11:09:10 +02:00
Georgi Gerganov	85c9ac18b5	Update README.md	2022-12-10 16:54:57 +02:00
Georgi Gerganov	b7c85d1ea6	talk : fix build for MSVC	2022-12-10 16:51:58 +02:00
Georgi Gerganov	3b1aacbe6d	talk : talk with AI in the terminal	2022-12-10 16:51:58 +02:00
Georgi Gerganov	56822621a8	twitch.sh : various fixes and polishing - check if streamlink is installed - fix audio chunking - change default threads to 4	2022-12-08 19:20:04 +02:00
keyehzy	9e5f3ddc16	Allow for Twitch.tv live transcription We rely on streamlink library to give us a stream, then we proceed similarly to the radio livestream example.	2022-12-08 19:20:04 +02:00
Georgi Gerganov	47afb93c3c	yt-wsp.sh : improve usage instructions	2022-12-07 22:12:08 +02:00
Georgi Gerganov	575c53dc41	yt-wsp.sh : fix usage instruction + comment	2022-12-07 21:12:55 +02:00
Georgi Gerganov	faa85f9840	livestream.sh : remove obsolete comment	2022-12-07 04:41:43 +02:00
Georgi Gerganov	9fe7306f4b	models : add the new "large" model release by OpenAI The old "large" model is now renamed "large-v1". If you have been using it, make sure to rename it and download the new "large" model for best results.	2022-12-06 18:48:57 +02:00
Georgi Gerganov	57e0e6b700	livestream : handle ffmpeg errors gracefully and stabilize transcript	2022-12-01 20:49:09 +02:00
Georgi Gerganov	4f7363077f	livestream : minor changes	2022-12-01 19:47:58 +02:00
semiformal-net	093c840dee	livestream : fix losing words across audio chunk (#195 ) * improve livestream script * Update examples/livestream.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Paul Edwards <paul.edwards@semiformal.net> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2022-12-01 19:18:22 +02:00
Georgi Gerganov	4698dcdb52	whisper : add mechanism for aborting the whisper_full() computation	2022-11-27 20:42:45 +02:00
Georgi Gerganov	164df0d447	whisper.objc : fix context + broken readme links	2022-11-27 10:52:27 +02:00
Georgi Gerganov	e266cb0723	whisper.objc : add real-time processing (#97 ) Similar to the "stream" app	2022-11-26 18:32:46 +02:00
Georgi Gerganov	c207eed431	whisper.objc : fix build warnings	2022-11-26 16:27:04 +02:00
Georgi Gerganov	a425365b82	yt-wsp.sh : script to easily transcribe VODs Thanks to @DaniruKun ref: https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818 Usage: cd whisper.cpp make ./examples/yt-wsp.sh <video-url>	2022-11-26 12:54:42 +02:00
Georgi Gerganov	68ecadbbc9	command.wasm : add voice assistant example for the Web (#171 ) Same as the command-line tool "command", but runs in the browser Also, added helper script "extra/deploy-wasm.sh" and fixed some timing constants for the WASM examples.	2022-11-26 11:40:06 +02:00
Georgi Gerganov	c536ff4005	minor : add comment for using "generate_karaoke.sh"	2022-11-26 10:22:42 +02:00
Georgi Gerganov	cb70b07db5	livestream.sh : simple tool to transcribe audio livestreams (#185 )	2022-11-26 10:05:37 +02:00
Georgi Gerganov	3c390ffe38	stream.wasm : add web-based real-time transcription (#112 )	2022-11-25 23:57:46 +02:00
Georgi Gerganov	be16dfa038	whisper.wasm : do not block page while processing (close #86 )	2022-11-25 23:07:42 +02:00
Georgi Gerganov	0f619b52ce	main : add stereo-channel-based diarization (#64 ) Not tested - I don't have stereo dialog audio	2022-11-25 22:08:58 +02:00
Georgi Gerganov	1246dd023e	command : add demonstration video	2022-11-25 20:23:58 +02:00
Georgi Gerganov	0be27bbd92	command : fix build + fix README + add bold printing	2022-11-25 19:53:50 +02:00
Georgi Gerganov	bc88eb13c6	examples : add "command" tool (#171 )	2022-11-25 19:36:57 +02:00
Georgi Gerganov	b8ce25dec1	refactoring : more readable code	2022-11-25 19:28:04 +02:00
Georgi Gerganov	e4805d9601	wasm : refactor wasm example + reuse fetch mechanism	2022-11-24 23:13:26 +02:00
Georgi Gerganov	ff36415a86	talk.wasm : update video link + some minor fixes	2022-11-24 20:15:24 +02:00
Georgi Gerganov	025ff465b6	Update README.md Use a less cringy video to demo talk.wasm lol	2022-11-24 20:09:45 +02:00
Georgi Gerganov	abce28ea99	talk.wasm : move to https://whisper.ggerganov.com/talk This way, we can share the same models across different WASM examples and not have to download them for each page	2022-11-24 18:24:06 +02:00
Georgi Gerganov	454b91de16	main : fix dangling pointer when using stdin for input (#65 )	2022-11-24 17:53:51 +02:00
Georgi Gerganov	d7024cf9dc	main, stream : remove --verbose flag (#178 )	2022-11-24 17:52:04 +02:00
Georgi Gerganov	37422ed733	talk.wasm : add audio pre-processing + bump memory	2022-11-24 00:34:00 +02:00
Georgi Gerganov	be3b720f96	talk.wasm : refactoring + update README.md	2022-11-24 00:08:57 +02:00
Georgi Gerganov	49706a658a	minor : updates few prints + fix buttons in whisper.wasm	2022-11-23 17:19:21 +02:00
Georgi Gerganov	e5dcdabbb8	unicode : fix character replacement (thanks to @tamo)	2022-11-23 08:24:29 +02:00
Georgi Gerganov	dad109c3f1	close #109 : add fetching of the model over HTTP (whisper.wasm)	2022-11-22 22:48:56 +02:00
Georgi Gerganov	326573de9a	talk.wasm : final touches	2022-11-22 22:22:17 +02:00
Georgi Gerganov	9aea96f774	talk.wasm : polishing + adding many AI personalities	2022-11-22 20:10:20 +02:00
Georgi Gerganov	385236d1d3	stream : "-kc" now enables context keeping from previous segment (#90 ) By default, the context keeping is disabled	2022-11-22 18:21:15 +02:00
M. Eren Akbiyik	63ae03b8e0	Prompt previous tokens for streaming (#163 ) * feat: prompt previous tokens for streaming I used a vector pointer instead of vector itself because it gave weird errors, and why not * convert vector to use with C api * feat: remove old refs, check for prompt size * feat: use better way of getting the pointer	2022-11-22 18:10:35 +02:00
Georgi Gerganov	78116f8eda	talk.wasm : update README.md	2022-11-21 22:42:29 +02:00
Georgi Gerganov	a4dfbeecf9	talk.wasm : GPT-2 meets Whisper in WebAssembly (#155 ) * talk : initial real-time transcription in the browser * talk : polishing the UI * talk : ready for beta testing * talk.wasm : rename example	2022-11-21 22:20:42 +02:00
Georgi Gerganov	f2df9bd768	stream : add "max_tokens" cli arg Controls the max tokens per segment for the stream example	2022-11-20 21:22:41 +02:00
Georgi Gerganov	fb8d77f760	stream : add "audio_ctx" parameter Used to overwrite the audio context size of the Encoder. For example, setting "audio_ctx = 512" will make it run about 3 times faster, processing about 10s of audio, instead of 30s. The transcription quality drops, but this can be used for real-time streaming purposes where performance is important.	2022-11-20 21:22:41 +02:00
Georgi Gerganov	62b5ff875c	stream : add "max_tokens" parameter Used to limit the number of tokens in a segment. Useful to battle with word repetition when using partial encoder context	2022-11-20 21:22:41 +02:00
Georgi Gerganov	d351771a4b	stream : add "single_segment" option Force the entire audio chunk to be transcribed into a single segment	2022-11-20 21:22:41 +02:00
Georgi Gerganov	c058aaf22e	stream : partial encoder experiments	2022-11-20 21:22:41 +02:00
Georgi Gerganov	83c742f1a7	whisper : add option to speed up the audio tempo by x2 Using a Phase Vocoder for speeding up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. The transcription accuracy is degraded, but for slow to normal speech - it seems to be still very good. I think this can find application for real-time transcription - i.e. the "stream" example.	2022-11-13 16:25:43 +02:00
Alan	7519eabf65	Adds support for stdin wav input	2022-11-09 20:37:23 +02:00
Georgi Gerganov	c30bffc8a5	ref #22 : add "duration" option Can be used to partially process a recording	2022-11-07 20:14:52 +02:00
Georgi Gerganov	c71363f14c	examples : add simple script for generating Karaoke video	2022-11-06 09:22:50 +02:00
Georgi Gerganov	d42cf6d0df	Update README.md	2022-11-04 22:26:08 +02:00
Georgi Gerganov	ef47d77492	main : fix generated bash script	2022-11-04 18:30:38 +02:00
Georgi Gerganov	d5afebd37c	whisper : token-level timestamp refactoring (#49 , #120 ) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters	2022-11-02 21:45:54 +02:00
Georgi Gerganov	6fb98370ba	main : add some comments for the word-level timestamp algorithm	2022-11-01 22:35:21 +02:00
Georgi Gerganov	0729da9a3b	main : fix some edge cases for word-level timestamps	2022-11-01 22:09:25 +02:00
Georgi Gerganov	5dc74e3aff	Update README.md	2022-10-31 22:06:05 +02:00
Georgi Gerganov	ac8ef34039	Update README.md	2022-10-31 20:19:41 +02:00
Georgi Gerganov	dc12994603	Update README.md	2022-10-30 17:11:37 +02:00
Georgi Gerganov	57fb46f307	main : add option for word-level timestamps (very experimental)	2022-10-30 17:06:57 +02:00

377 Commits