whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-05-11 04:52:50 +00:00

Author	SHA1	Message	Date
ulatekh	671b4bde6c	main : allow a response-file as the sole parameter (#2019 ) * The "main" example now allows a response-file as the sole parameter. A response-file is a text file with command-line parameters, one per line. Prefix the name of the response-file with "@" to identify it as such. It's used under MS Windows to work around command-line length limits. It may be useful under other platforms to simplify character-escaping. * minor : style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 18:31:16 +03:00
ulatekh	c8eeb93a6a	whisper : suppress tokens with a regex (#1997 ) * Allow a regular expression to describe tokens to suppress. Example: --suppress-tokens-re "[,\.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens. Technique inspired by https://github.com/openai/whisper/discussions/1041 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Blind change to fix Java test. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 18:27:28 +03:00
ulatekh	319fe5146e	cmake : create solution folders (#2004 ) * Create solution folders in the CMake build. * Fixed non-SDL2 build. * Fixed emscripten build.	2024-04-09 18:23:33 +03:00
Georgi Gerganov	81a3c41aa0	talk-llama : sync llama.cpp	2024-04-07 16:21:08 +03:00
ulatekh	fc366b807a	main : add command-style grammar (#1998 ) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-28 12:02:10 +02:00
Georgi Gerganov	9fb308d90f	make : add grammar parser to common objects	2024-03-28 11:59:48 +02:00
Georgi Gerganov	2948c740a2	sync : ggml (#2001 ) * sync : update scripts * sync : ggml * talk-llama : sync llama.cpp * make : WHISPER_CUBLAS -> WHISPER_CUDA * ci : try to fix sycl build * talk-llama : fix make build	2024-03-27 18:55:10 +02:00
Georgi Gerganov	1558ec5a16	whisper : improve handling of prompts (#1981 ) * whisper : improve handling of prompts * whisper : add whisper_token_count helper	2024-03-25 14:48:19 +02:00
Mohammadreza Hendiani	04e48094e4	readme : add Fedora dependencies (#1970 ) * README.md fix documentation and added Fedora Linux dependencies for stream build * fix documentation and added Fedora Linux dependencies for command build * fix documentation and added Fedora Linux dependencies for talk build * fix documentation and added Fedora Linux dependencies for talk-llama build * reverted back mistakenly removed macOS documentation	2024-03-20 18:42:11 +02:00
denersc	741abb162c	whisper : token-level timestamps with DTW (#1485 ) * whisper.cpp: impl dtw algo * WIP: producing and placing DTW timestamps on tokens * Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. * Fix mistake causing incorrect alignment of dtw timestamps * implement N_TOP_MOST and CUSTOM alignment heads setting * whisper: fix typo on alignment heads enum * Fix issues related to changes in whisper.cpp * Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function * decoder: save cross QKs only if requested * Calling median filter with ggml_map_custom1 * Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads * Copying cross QKs from decoder backend correctly * dtw: cleanup * Fix incorrect n_frames passed to dtw when near end of audio * Fix aheads_masks_init for backend != CPU * whisper : minor style * main : add dtw (wip) * whisper: fix invalid memory access in aheads_masks_init * main : add dtw (cont) * whisper : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 18:25:26 +02:00
Jo Liss	e7794a868f	examples : rename --audio-context to --audio-ctx per help text (#1953 )	2024-03-18 17:53:33 +02:00
Georgi Gerganov	de4d067f1e	talk-llama : sync llama.cpp	2024-03-15 14:21:59 +02:00
slaren	f60ccfd83b	update examples and tests	2024-03-15 14:01:14 +02:00
Georgi Gerganov	2f5a5a66dd	talk-llama : use llama_decode instead of llama_eval	2024-03-08 12:04:43 +02:00
Georgi Gerganov	8e409d1113	talk-llama : sync llama.cpp	2024-03-08 11:55:50 +02:00
Georgi Gerganov	05d1b61af4	talk-llama : sync llama.cpp	2024-03-08 11:52:47 +02:00
F1L1P	2e2626b167	examples : Auto lowercase language parameter in main.cpp (#1928 ) * Auto lowercase language parameter * Update examples/main/main.cpp Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com> --------- Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2024-03-06 22:25:10 +00:00
zhouwg	c0c0ae2dea	examples : fix typo in bench.cpp (#1933 )	2024-03-06 22:21:44 +00:00
zhouwg	f22d27a385	whisper.android.java : fix returns in JNI (#1929 )	2024-03-05 15:59:26 +02:00
Georgi Gerganov	25d313b38b	talk-llama : sync llama.cpp	2024-02-28 13:04:05 +02:00
Georgi Gerganov	1711bb3881	sync : llama.cpp (ggml/0)	2024-02-28 13:00:30 +02:00
Andrew S	0d8fd8483a	stream.wasm : fix invalid memory access when no segments (#1902 ) No segments may be returned when a smaller sample buffer (EG 2048 samples) is sent to the worker.	2024-02-26 10:12:35 +02:00
Georgi Gerganov	3170841ed9	talk-llama : sync llama.cpp	2024-02-25 20:00:10 +02:00
Georgi Gerganov	578e47e70c	sync : llama.cpp (ggml/0)	2024-02-25 19:58:46 +02:00
Tamotsu Takahashi	f18738f247	talk, talk-llama : pass text_to_speak as a file (#1865 ) * talk-llama: pass file instead of arg it is too hard to quote text in a portable way * talk-llama: pass heard_ok as a file * talk-llama: let eleven-labs.py accept options Options: -v voice, -s savefile, -p (--play) * talk-llama: check installed commands in "speak" Pass "-q" to eleven-labs.py to skip checking whether elevenlabs is installed * talk-llama: pass voice_id again in order to sync talk with talk-llama * talk: sync with talk-llama Passing text_to_speak as a file is safer and more portable cf. https://stackoverflow.com/a/59036879/45375 * talk and talk-llama: get all installed voices in speak.ps1 * talk and talk-llama: get voices from api * talk and talk-llama: add more options to eleven-labs.py and remove DEFAULT_VOICE because it is deprecated (https://www.reddit.com/r/ElevenLabs/comments/1830abt/what_happened_to_bella/) ``` usage: eleven-labs.py [-q] [-l] [-h] [-n NAME | -v NUMBER] [-f KEY=VAL] [-s FILE | -p] [TEXTFILE] options: -q, --quick skip checking the required library action: TEXTFILE read the text file (default: stdin) -l, --list show the list of voices and exit -h, --help show this help and exit voice selection: -n NAME, --name NAME get a voice object by name (default: Arnold) -v NUMBER, --voice NUMBER get a voice object by number (see --list) -f KEY=VAL, --filter KEY=VAL filter voices by labels (default: "use case=narration") this option can be used multiple times filtering will be disabled if the first -f has no "=" (e.g. -f "any") output: -s FILE, --save FILE save the TTS to a file (default: audio.mp3) -p, --play play the TTS with ffplay ``` * examples: add speak_with_file() as suggested in the review * talk and talk-llama: ignore to_speak.txt	2024-02-24 09:24:47 +02:00
Abhilash Majumder	a0ddd8392c	whisper : add SYCL support (#1863 ) * add changes from llama upstream * add sycl abstraction * add sycl build * update cmake * add sycl build config * fix bug * fix bug * refactor build * fix bug * update build * call build * use sycl header * add examples * add target * fix typecast in quant.c * readd fp16 and readme * fix quant typecast * add sample * add readme * remove cxx file check	2024-02-23 09:22:24 +02:00
Georgi Gerganov	a2506909b1	talk-llama : sync llama.cpp	2024-02-22 23:30:53 +02:00
Georgi Gerganov	5fdb27ff80	ggml : 32-bit arm compat (#1891 ) * ggml : 32-bit arm compat * ggml : add ggml_vqtbl1q_s8 impl * ggml : cont	2024-02-22 18:31:40 +02:00
Georgi Gerganov	ce411498f6	sync : llama.cpp (ggml/0) ggml-ci	2024-02-22 15:12:36 +02:00
Davidson Francis	c56344b509	main : fix file existence check in main.cpp (#1889 ) In commit dda4b0e of PR #1872, I've introduced a check for the existence of files before loading the model. However, I haven't considered the case where whisper.cpp might read from stdin as well, and in such cases, the checks should ignore the "-" argument as it does not represent a regular file. Additionally, this commit removes the usage of 'stat()' in favor of the recently introduced function 'is_file_exist()' in common.cpp from PR #1871. Apologies for the bug introduced in the previous PR and any inconvenience it may have caused.	2024-02-22 15:01:08 +02:00
Georgi Gerganov	59119f4f20	talk-llama : sync llama.cpp	2024-02-20 12:09:57 +02:00
Georgi Gerganov	83afebe872	common : add IQ1_S (ggml/0) ggml-ci	2024-02-19 15:53:25 +02:00
Davidson Francis	dda4b0ed06	main : check if input files exist before proceeding (#1872 ) Until the most recent commit (3d42463), the main.cpp sample file does not check whether the input files exist or not. Consequently, the model is loaded first before reporting whether there was a failure or not when processing a file. In environments with HDD, this can take about 50 seconds or more, depending on the loaded model. This commit addresses this issue by checking in advance whether the input files exist or not.	2024-02-19 10:51:26 +02:00
Felix	07d04280be	examples : clean up common code (#1871 ) move some utility functions into common.h	2024-02-19 10:50:15 +02:00
Georgi Gerganov	551529290d	talk-llama : sync llama.cpp	2024-02-12 10:39:58 +02:00
dscripka	a6fb6ab597	examples : added audio_ctx argument to main and server (#1857 ) * added audio_ctx argument to main and server examples * Better default value Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * better default value (again) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-12 09:19:07 +02:00
Georgi Gerganov	f273e66dc6	examples : initialize context params properly (#1852 )	2024-02-11 16:39:12 +02:00
Georgi Gerganov	02b4c52c12	talk-llama : sync llama.cpp	2024-02-10 10:10:59 +02:00
Valentin Gosu	80e8a2ea39	server : allow CORS request with authorization headers (#1850 ) Whisper plugin in Obsidian requires an API key which is then sent as an authorization header. However, the presence of an authorization header requires a CORS Preflight, so both the OPTIONS method and the Access-Control-Allow-Headers: authorization must be handled.	2024-02-09 17:42:41 +02:00
Neuman Vong	19f8048139	whisper.android : how to build with CLBlast (#1809 ) * FetchContent * OpenCL * Documentation and make optional * Specify GGML build options in build.gradle * Use gradle properties * @ggerganov Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * @gpokat --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-09 17:39:05 +02:00
Georgi Gerganov	434b8f3b96	talk-llama : stream response (#1121 )	2024-02-06 19:56:12 +02:00
Georgi Gerganov	7a74e929c8	sync : ggml (#0 )	2024-01-30 21:30:26 +02:00
JacobLinCool	ae5c4f7340	common : fix wav buffer detection (#1819 )	2024-01-30 19:35:08 +02:00
JacobLinCool	baa30bacdb	server : add fields to `verbose_json` response (#1802 ) * server: include additional fields in the verbose_json response as OpenAI does * server: show request examples on home page * server: todo note for compression_ratio and no_speech_prob * server: add simple demo form to the homepage	2024-01-30 14:15:55 +02:00
Georgi Gerganov	e72e4158de	talk-llama : sync llama.cpp	2024-01-28 19:44:10 +02:00
Georgi Gerganov	52cce82493	common : fix input buffer check (#1812 )	2024-01-27 17:33:09 +02:00
Georgi Gerganov	ef3c9ed9eb	talk-llama : sync llama.cpp	2024-01-27 17:24:53 +02:00
Michael Rienstra	4bbb60efce	docs : make model options / model install methods clearer (#1806 ) * Make models more "discoverable" * Clean up code block language identifiers * make 3 options clearer * undo Prettier formatter change * docs: `$` shell prompt, consistently * docs: minor changes	2024-01-26 17:39:54 +02:00
Neuman Vong	d6b9be21d7	whisper.android : return output from benchmarks (#1785 ) Benchmarks are failing because JNI expects a jstring and the benchmarks are missing a return statement (i.e., returning null). The functions actually build a jstring but don't return it, so this seems to have been an oversight. This patch returns the jstring and now the benchmarks run successfully. Fixes #1783.	2024-01-19 16:17:38 +02:00
Ryan Hitchman	c0329acde8	server : implement "verbose_json" format with token details (#1781 ) * examples/server: implement "verbose_json" format with token details. This is intended to mirror the format of openai's Python whisper.transcribe() return values. * server: don't write WAV to a temporary file if not converting * server: use std::lock_guard instead of manual lock/unlock	2024-01-18 22:58:42 +02:00
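
Commit 671b4bde6c describes a response-file as a text file holding one command-line parameter per line, passed as the sole argument with an "@" prefix. The Python sketch below illustrates that idea only; it is not the C++ code from the "main" example. The function name `expand_response_file` is hypothetical, and skipping blank lines is an assumption the commit message does not state.

```python
def expand_response_file(argv):
    """Expand a sole "@file" argument into the parameters listed in that file.

    Hypothetical helper: illustrates the behaviour described in the commit
    message, not the actual whisper.cpp implementation.
    """
    # Only the "sole parameter" form is handled, as the commit describes.
    if len(argv) == 1 and argv[0].startswith("@") and len(argv[0]) > 1:
        path = argv[0][1:]  # strip the "@" prefix to get the file name
        with open(path, encoding="utf-8") as fh:
            # One parameter per line; blank lines are skipped (assumption).
            return [line.rstrip("\r\n") for line in fh if line.strip()]
    # Anything else is passed through unchanged.
    return list(argv)
```

Because each line becomes exactly one parameter, values containing spaces or quotes need no shell escaping, which is the portability benefit the commit message mentions.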


356 Commits
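
Commit c8eeb93a6a adds a regular expression that selects tokens to suppress, with the example pattern "[,\.]|[ ]?[0-9]+" covering commas, periods, and numeric tokens. The Python sketch below shows one way such suppression can work: every vocabulary entry whose text matches the pattern has its logit set to negative infinity so it can never be sampled. The names `suppress_tokens`, `vocab`, and `logits` are hypothetical, the toy vocabulary is invented for illustration, and whole-token matching is an assumption rather than something the commit message confirms.

```python
import math
import re


def suppress_tokens(logits, vocab, pattern):
    """Return a copy of `logits` with regex-matched tokens masked out.

    Hypothetical helper, not the whisper.cpp API. `vocab` maps a token id
    to its text; `logits` is indexed by token id.
    """
    rx = re.compile(pattern)
    masked = list(logits)
    for token_id, text in vocab.items():
        # Assumption: the whole token text must match the pattern.
        if rx.fullmatch(text):
            masked[token_id] = -math.inf
    return masked


# Toy vocabulary (invented) using the pattern from the commit message.
vocab = {0: ",", 1: ".", 2: " 42", 3: " hello", 4: "7"}
logits = [0.1, 0.2, 0.3, 0.4, 0.5]
masked = suppress_tokens(logits, vocab, r"[,\.]|[ ]?[0-9]+")
```

In this toy run only the " hello" token keeps its original logit; the punctuation and numeric tokens are masked.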