whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-13 04:28:07 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	9c61f5f585	release : v1.4.1	2023-04-30 22:57:42 +03:00
Georgi Gerganov	c94c469592	whisper : fix quantize bug (#842) * whisper : debug * whisper : fix bug during quantization	2023-04-30 22:50:04 +03:00
Georgi Gerganov	feac80dd3f	ggml : fix UB (int << 31)	2023-04-30 22:27:30 +03:00
Georgi Gerganov	fa8dbdc888	release : v1.4.0	2023-04-30 19:23:37 +03:00
Georgi Gerganov	4a7d49af95	examples : fix + refactor Levenshtein distance	2023-04-30 19:12:49 +03:00
Georgi Gerganov	794b162a46	whisper : add integer quantization support (#540) * whisper : add integer quantization support * examples : add common-ggml + prepare to add "quantize" tool * whisper : quantization tool ready * whisper : fix F32 support * whisper : try to fix shared lib linkage * wasm : update quantized models to Q5 * bench.wasm : remove "medium" button * bench.wasm : fix custom model button * ggml : add Q5_0 and Q5_1 WASM SIMD * wasm : add quantized models to all WASM examples * wasm : bump DB version number to 2 * talk-llama : update example to latest llama.cpp * node : increase test timeout to 10s * readme : add information for model quantization * wasm : add links to other examples	2023-04-30 18:51:57 +03:00
Georgi Gerganov	5fd1bdd7fc	whisper : add GPU support via cuBLAS (#834) * make : add WHISPER_CUBLAS * make : fix CUBLAS build * whisper : disable Flash Attention + adjust memory buffers * whisper : remove old commented code * readme : add cuBLAS instructions * cmake : add WHISPER_CUBLAS option * gitignore : ignore build-cublas	2023-04-30 12:14:33 +03:00
Georgi Gerganov	0ccd6746c9	ggml : fix WASM build	2023-04-29 21:37:23 +03:00
Georgi Gerganov	d9b550c0a1	ggml : fix 32-bit ARM NEON (#836) * ggml : add support for 32-bit ARM * ggml : fix * ggml : fix	2023-04-29 21:33:33 +03:00
Georgi Gerganov	e9b091c92a	ggml : use vzip instead of vuzp for consistency	2023-04-29 21:14:09 +03:00
Georgi Gerganov	1f30b99208	ggml : fix WASM build	2023-04-29 20:21:25 +03:00
Georgi Gerganov	05c3ea3bc8	ggml : sync with ggml repo (warning fixes + asserts)	2023-04-29 19:33:28 +03:00
Thijs Raymakers	6108d3cc58	whisper : use correct seek_end when offset is used (#833) Whenever an `offset_ms` is provided, the value of `seek_end` is calculated incorrectly. This causes Whisper to keep transcribing after the end of the file. The current behavior looks like ``` [00:34:40.000 --> 00:34:47.000] This is an example audio file. [00:34:47.000 --> 00:34:49.000] The text has been redacted [00:34:49.000 --> 00:34:51.000] This is the end of the audio. [00:34:51.000 --> 00:34:52.000] * [00:34:52.000 --> 00:34:53.000] * [00:34:53.000 --> 00:34:54.000] * [00:34:55.000 --> 00:34:56.000] * ... ``` The expected behavior should be ``` [00:34:40.000 --> 00:34:47.000] This is an example audio file. [00:34:47.000 --> 00:34:49.000] The text has been redacted [00:34:49.000 --> 00:34:51.000] This is the end of the audio. - end of program - ``` This commit changes the calculation of the `seek_end` variable to only add `seek_start` if a custom `duration_ms` is provided. Otherwise, it defaults to the end of the file. Signed-off-by: Thijs Raymakers <thijs@raymakers.nl>	2023-04-29 18:55:37 +03:00
Georgi Gerganov	bab97c83d0	tests : add "threads" to run-tests.sh	2023-04-29 12:32:28 +03:00
Georgi Gerganov	3eaeb030ff	extra : add sync-ggml.sh script	2023-04-29 12:32:28 +03:00
Georgi Gerganov	acec73ab6e	ggml : sync latest ggml + llama.cpp updates (quantization)	2023-04-29 12:32:28 +03:00
Zollner	5cc17418c7	whisper.android : add some tips (#816)	2023-04-29 11:00:20 +03:00
Georgi Gerganov	3efb81dec6	build : add WHISPER_COREML_ALLOW_FALLBACK to make / CMake (#812)	2023-04-29 10:55:24 +03:00
Canis Lupus	94a7cd2a07	whisper : allow non-CoreML fallback when Core ML cannot be loaded (#812) if the Core ML model cannot be loaded, continue without Core ML instead of returning. This allows a single build to transcribe using Core ML models where available, and regular models when not.	2023-04-29 10:49:02 +03:00
Georgi Gerganov	3e82ff4747	whisper : fix bug from previous commit	2023-04-29 10:42:14 +03:00
Georgi Gerganov	b5bd2f43c5	whisper : avoid designated initializers	2023-04-29 10:36:50 +03:00
AsukaMinato	94aa56f19e	minor : improve C++ and Python style (#768) * use some STL functions * use self.field than setattr, use pathlib.Path * recover some format * const some iter * Keep the original * 2 space	2023-04-29 10:06:25 +03:00
Georgi Gerganov	4d89ee2e59	readme : add logo	2023-04-28 22:41:29 +03:00
Laytan Laats	70567eff23	main : escape quotes in csv output (#815)	2023-04-23 19:01:59 +03:00
Taras Glek	02ec83c5d5	stream : flush upon finishing inference (#811)	2023-04-23 17:00:30 +03:00
Philipp Zabel	2bd4b8d577	examples : add missing #include <cstdint> (#798) common.cpp uses uint8_t and uint64_t, which are defined in <cstdint>.	2023-04-23 16:52:52 +03:00
Tauseef Mohiuddin	eecf2c3d41	main : update escape_double_quotes() function (#776) Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string. Changes Made: - Renamed the function to escape_quotes_and_backslashes - Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes. - Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash. Resolves: #769	2023-04-23 16:47:30 +03:00
Georgi Gerganov	c23588cc4b	release : v1.3.0	2023-04-15 17:30:44 +03:00
Georgi Gerganov	5108b30e6d	whisper : pad audio instead of spectrogram (#579) Also, fallback only if more temperatures are available and if we are at least 3 seconds before the end of the audio	2023-04-15 17:19:19 +03:00
Georgi Gerganov	f19e23fbd1	whisper : restore decoder temperature fallbacks I disabled this because there were many complaints about slow decoding. The current implementation does not allow batching the decoders when using the "best of" or "beam size" parameters, so the decoding time is proportional to the number of decoders, which is obviously not great. However, now there are even more complaints about wrong decodings and repetition. So, making a compromise by re-enabling the fallbacks, but defaulting to just 2 "best of" / "beam size" decoders. Also, the temperature step is increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum of 2. Also, the stream example now has fallbacks enabled by default. close #471 #477 #508 #612 #719 #731	2023-04-15 16:12:55 +03:00
Jhen-Jie Hong	ea1f8a50d4	ggml, ci : fix build on whisper.android (ARM_NEON) + add CI (#764) * ggml : fix undefined symbol by remove inline handle * ggml : make own ggml_aligned_malloc function * ci: add ios/android build	2023-04-15 14:21:58 +03:00
Georgi Gerganov	3dead611bb	whisper : slightly faster Log Mel computation + n-1 FFT threads (#568)	2023-04-15 14:18:46 +03:00
Georgi Gerganov	355da83690	readme : fix link	2023-04-15 13:30:36 +03:00
Georgi Gerganov	3e5c49e59a	readme : add usage instructions for Core ML	2023-04-15 13:30:07 +03:00
Georgi Gerganov	5e47e223bd	whisper : add Core ML support (#566) * coreml : use Core ML encoder inference * coreml : simplify whisper_encode + log messages * whisper : resolve rebase conflicts * coreml : add scripts for CoreML model generation * bench-all : recognize COREML flag	2023-04-15 13:21:27 +03:00
Maximiliano Levi	794ff3074a	whisper : do not launch log_mel threads when n_thread is 1 (#763)	2023-04-14 22:35:34 +03:00
AfryMask	7e2afa4384	whisper : fix the bug related to word splitting errors in the "tokenize" function. (#760) Co-authored-by: AfryMask <afrymask@gmail.com>	2023-04-14 20:35:03 +03:00
Aaron Taylor	1c5edc3cb3	readme : add SwiftWhisper to listed bindings (#755)	2023-04-14 20:24:00 +03:00
Georgi Gerganov	34b772727d	gitignore : add .test	2023-04-14 20:13:47 +03:00
Bader-eddine Ouaich	2c856fb9e5	whisper : fix potential memory leaks (#740) * fix potential memory leak if whisper_init_state failed * fix potential memory leak if gpt2_init failed	2023-04-14 20:05:56 +03:00
Anton Kostin	7727a40dc9	license : update year (#739)	2023-04-14 20:04:42 +03:00
GitAritron	b5639ed313	whisper : fix typos in whisper.h (#737) Fixed a couple of typos (in comments, so nothing major). Keep up the great work 😄	2023-04-14 20:03:16 +03:00
Ali Alameh	2c4ac2627d	stream : support language auto-detect (#501) #445 fix Language auto-detect "auto" flag does not work using the stream tool	2023-04-14 20:02:18 +03:00
Alex Evgrashin	674a8e579b	readme : add unity bindings (#733)	2023-04-14 19:59:44 +03:00
DGdev91	001083a769	talk, talk-llama : add basic example script for eleven-labs tts (#728)	2023-04-14 19:53:58 +03:00
Ivan Gorin	62b51c3070	models : change convert-pt-to-ggml to use .tiktoken tokenizer files (#725)	2023-04-14 19:50:39 +03:00
LittleLoli	61128870b8	cmake : add msvc compiler args /utf-8 fix error C3688 (#721) * force msvc compiler use utf-8 encode * only enable on msvc	2023-04-14 19:36:38 +03:00
Maciek	78548dc03f	talk-llama : correct default speak.sh path (#720) There is `speak.sh` file in `./examples/talk-llama` as described in README. However `./examples/talk/speak.sh` is used in `talk-llama.cpp`, this commit corrects that.	2023-04-14 19:36:09 +03:00
LittleLoli	66110dafcc	main : add lrc output support (#718) * add lrc output support. * fix wrong comment	2023-04-14 19:35:33 +03:00
Sam	b73a4638ac	readme : make the quick start instructions clearer. (#716) Users wanting to make use of this implementation of the whisper model with no prior knowledge of C/C++ may download the Whisper model but fail to use the "make" command as specified given that they forgot or didn't know they needed to clone the repository first. Hope this modification clears things up.	2023-04-14 19:33:06 +03:00
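
Note on commit 6108d3cc58 (seek_end fix): the message describes the corrected rule as adding `seek_start` to the end position only when a custom `duration_ms` is given, and otherwise defaulting to the end of the file. The sketch below illustrates that rule only. The names `seek_range` and `compute_seek_range`, the `n_frames_total` parameter, and the 10 ms frame unit are assumptions made for illustration and are not taken from the commit.

```cpp
#include <cassert>

// Hypothetical helper for illustration; not the project's actual code.
// Positions are expressed in frames, assuming one frame per 10 ms of audio.
struct seek_range {
    int start; // first frame to transcribe
    int end;   // one past the last frame to transcribe
};

seek_range compute_seek_range(int offset_ms, int duration_ms, int n_frames_total) {
    const int seek_start = offset_ms / 10;

    // A duration of 0 is treated as "no custom duration": stop at the end
    // of the file. Only a custom duration is measured from seek_start.
    const int seek_end = duration_ms == 0
        ? n_frames_total
        : seek_start + duration_ms / 10;

    return { seek_start, seek_end };
}
```

Under these assumptions, an offset of 5000 ms with no duration on a 3000-frame input gives the range 500 to 3000, so transcription stops at the end of the file instead of running past it. The sketch does not clamp a custom duration that extends beyond the file.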


775 Commits
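
Note on commit eecf2c3d41 (escaping in main): the message describes a two-pass approach, where a first loop grows `escaped_length` for every double quote or backslash and a second loop writes a backslash before each such character. The sketch below follows that description using `std::string`. The function name `escape_quotes_and_backslashes_sketch` and the choice of `std::string` are assumptions for illustration; the actual function in the repository may differ in signature and memory handling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch for illustration; not the project's actual code.
std::string escape_quotes_and_backslashes_sketch(const std::string & input) {
    // First pass: compute the escaped length, counting one extra
    // character for every double quote or backslash.
    std::size_t escaped_length = input.size();
    for (char c : input) {
        if (c == '"' || c == '\\') {
            ++escaped_length;
        }
    }

    // Second pass: copy the input, writing a backslash before each
    // double quote or backslash.
    std::string escaped;
    escaped.reserve(escaped_length);
    for (char c : input) {
        if (c == '"' || c == '\\') {
            escaped.push_back('\\');
        }
        escaped.push_back(c);
    }
    return escaped;
}
```

Computing the length first lets the output be allocated once, which matches the two-loop structure the commit message mentions.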