whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-24 22:56:42 +00:00

Author	SHA1	Message	Date
ulatekh	c8eeb93a6a	whisper : suppress tokens with a regex (#1997 ) * Allow a regular expression to describe tokens to suppress. Example: --suppress-tokens-re "[,\.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens. Technique inspired by https://github.com/openai/whisper/discussions/1041 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Blind change to fix Java test. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 18:27:28 +03:00
ulatekh	319fe5146e	cmake : create solution folders (#2004 ) * Create solution folders in the CMake build. * Fixed non-SDL2 build. * Fixed emscripten build.	2024-04-09 18:23:33 +03:00
Georgi Gerganov	81a3c41aa0	talk-llama : sync llama.cpp	2024-04-07 16:21:08 +03:00
ulatekh	fc366b807a	main : add command-style grammar (#1998 ) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-28 12:02:10 +02:00
Georgi Gerganov	9fb308d90f	make : add grammar parser to common objects	2024-03-28 11:59:48 +02:00
Georgi Gerganov	2948c740a2	sync : ggml (#2001 ) * sync : update scripts * sync : ggml * talk-llama : sync llama.cpp * make : WHISPER_CUBLAS -> WHISPER_CUDA * ci : try to fix sycl build * talk-llama : fix make build	2024-03-27 18:55:10 +02:00
Georgi Gerganov	1558ec5a16	whisper : improve handling of prompts (#1981 ) * whisper : improve handling of prompts * whisper : add whisper_token_count helper	2024-03-25 14:48:19 +02:00
Mohammadreza Hendiani	04e48094e4	readme : add Fedora dependencies (#1970 ) * README.md fix documentation and added Fedora Linux dependencies for stream build * fix documentation and added Fedora Linux dependencies for command build * fix documentation and added Fedora Linux dependencies for talk build * fix documentation and added Fedora Linux dependencies for talk-llama build * reverted back mistakenly removed macOS documentation	2024-03-20 18:42:11 +02:00
denersc	741abb162c	whisper : token-level timestamps with DTW (#1485 ) * whisper.cpp: impl dtw algo * WIP: producing and placing DTW timestamps on tokens * Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. * Fix mistake causing incorrect alignment of dtw timestamps * implement N_TOP_MOST and CUSTOM alignment heads setting * whisper: fix typo on alignment heads enum * Fix issues related to changes in whisper.cpp * Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function * decoder: save cross QKs only if requested * Calling median filter with ggml_map_custom1 * Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads * Copying cross QKs from decoder backend correctly * dtw: cleanup * Fix incorrect n_frames passed to dtw when near end of audio * Fix aheads_masks_init for backend != CPU * whisper : minor style * main : add dtw (wip) * whisper: fix invalid memory access in aheads_masks_init * main : add dtw (cont) * whisper : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 18:25:26 +02:00
Jo Liss	e7794a868f	examples : rename --audio-context to --audio-ctx per help text (#1953 )	2024-03-18 17:53:33 +02:00
Georgi Gerganov	de4d067f1e	talk-llama : sync llama.cpp	2024-03-15 14:21:59 +02:00
slaren	f60ccfd83b	update examples and tests	2024-03-15 14:01:14 +02:00
Georgi Gerganov	2f5a5a66dd	talk-llama : use llama_decode instead of llama_eval	2024-03-08 12:04:43 +02:00
Georgi Gerganov	8e409d1113	talk-llama : sync llama.cpp	2024-03-08 11:55:50 +02:00
Georgi Gerganov	05d1b61af4	talk-llama : sync llama.cpp	2024-03-08 11:52:47 +02:00
F1L1P	2e2626b167	examples : Auto lowercase language parameter in main.cpp (#1928 ) * Auto lowercase language parameter * Update examples/main/main.cpp Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com> --------- Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2024-03-06 22:25:10 +00:00
zhouwg	c0c0ae2dea	examples : fix typo in bench.cpp (#1933 )	2024-03-06 22:21:44 +00:00
zhouwg	f22d27a385	whisper.android.java : fix returns in JNI (#1929 )	2024-03-05 15:59:26 +02:00
Georgi Gerganov	25d313b38b	talk-llama : sync llama.cpp	2024-02-28 13:04:05 +02:00
Georgi Gerganov	1711bb3881	sync : llama.cpp (ggml/0)	2024-02-28 13:00:30 +02:00
Andrew S	0d8fd8483a	stream.wasm : fix invalid memory access when no segments (#1902 ) No segments may be returned when a smaller sample buffer (EG 2048 samples) is sent to the worker.	2024-02-26 10:12:35 +02:00
Georgi Gerganov	3170841ed9	talk-llama : sync llama.cpp	2024-02-25 20:00:10 +02:00
Georgi Gerganov	578e47e70c	sync : llama.cpp (ggml/0)	2024-02-25 19:58:46 +02:00
Tamotsu Takahashi	f18738f247	talk, talk-llama : pass text_to_speak as a file (#1865 ) * talk-llama: pass file instead of arg it is too hard to quote text in a portable way * talk-llama: pass heard_ok as a file * talk-llama: let eleven-labs.py accept options Options: -v voice, -s savefile, -p (--play) * talk-llama: check installed commands in "speak" Pass "-q" to eleven-labs.py to skip checking whether elevenlabs is installed * talk-llama: pass voice_id again in order to sync talk with talk-llama * talk: sync with talk-llama Passing text_to_speak as a file is safer and more portable cf. https://stackoverflow.com/a/59036879/45375 * talk and talk-llama: get all installed voices in speak.ps1 * talk and talk-llama: get voices from api * talk and talk-llama: add more options to eleven-labs.py and remove DEFAULT_VOICE because it is deprecated (https://www.reddit.com/r/ElevenLabs/comments/1830abt/what_happened_to_bella/) ``` usage: eleven-labs.py [-q] [-l] [-h] [-n NAME | -v NUMBER] [-f KEY=VAL] [-s FILE | -p] [TEXTFILE] options: -q, --quick skip checking the required library action: TEXTFILE read the text file (default: stdin) -l, --list show the list of voices and exit -h, --help show this help and exit voice selection: -n NAME, --name NAME get a voice object by name (default: Arnold) -v NUMBER, --voice NUMBER get a voice object by number (see --list) -f KEY=VAL, --filter KEY=VAL filter voices by labels (default: "use case=narration") this option can be used multiple times filtering will be disabled if the first -f has no "=" (e.g. -f "any") output: -s FILE, --save FILE save the TTS to a file (default: audio.mp3) -p, --play play the TTS with ffplay ``` * examples: add speak_with_file() as suggested in the review * talk and talk-llama: ignore to_speak.txt	2024-02-24 09:24:47 +02:00
Abhilash Majumder	a0ddd8392c	whisper : add SYCL support (#1863 ) * add changes from llama upstream * add sycl abstraction * add sycl build * update cmake * add sycl build config * fix bug * fix bug * refactor build * fix bug * update build * call build * use sycl header * add examples * add target * fix typecast in quant.c * readd fp16 and readme * fix quant typecast * add sample * add readme * remove cxx file check	2024-02-23 09:22:24 +02:00
Georgi Gerganov	a2506909b1	talk-llama : sync llama.cpp	2024-02-22 23:30:53 +02:00
Georgi Gerganov	5fdb27ff80	ggml : 32-bit arm compat (#1891 ) * ggml : 32-bit arm compat * ggml : add ggml_vqtbl1q_s8 impl * ggml : cont	2024-02-22 18:31:40 +02:00
Georgi Gerganov	ce411498f6	sync : llama.cpp (ggml/0) ggml-ci	2024-02-22 15:12:36 +02:00
Davidson Francis	c56344b509	main : fix file existence check in main.cpp (#1889 ) In commit `dda4b0e` of PR #1872, I've introduced a check for the existence of files before loading the model. However, I haven't considered the case where whisper.cpp might read from stdin as well, and in such cases, the checks should ignore the "-" argument as it does not represent a regular file. Additionally, this commit removes the usage of 'stat()' in favor of the recently introduced function 'is_file_exist()' in common.cpp from PR #1871. Apologies for the bug introduced in the previous PR and any inconvenience it may have caused.	2024-02-22 15:01:08 +02:00
Georgi Gerganov	59119f4f20	talk-llama : sync llama.cpp	2024-02-20 12:09:57 +02:00
Georgi Gerganov	83afebe872	common : add IQ1_S (ggml/0) ggml-ci	2024-02-19 15:53:25 +02:00
Davidson Francis	dda4b0ed06	main : check if input files exist before proceeding (#1872 ) Until the most recent commit (`3d42463`), the main.cpp sample file does not check whether the input files exist or not. Consequently, the model is loaded first before reporting whether there was a failure or not when processing a file. In environments with HDD, this can take about 50 seconds or more, depending on the loaded model. This commit addresses this issue by checking in advance whether the input files exist or not.	2024-02-19 10:51:26 +02:00
Felix	07d04280be	examples : clean up common code (#1871 ) move some utility functions into common.h	2024-02-19 10:50:15 +02:00
Georgi Gerganov	551529290d	talk-llama : sync llama.cpp	2024-02-12 10:39:58 +02:00
dscripka	a6fb6ab597	examples : added audio_ctx argument to main and server (#1857 ) * added audio_ctx argument to main and server examples * Better default value Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * better default value (again) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-12 09:19:07 +02:00
Georgi Gerganov	f273e66dc6	examples : initialize context params properly (#1852 )	2024-02-11 16:39:12 +02:00
Georgi Gerganov	02b4c52c12	talk-llama : sync llama.cpp	2024-02-10 10:10:59 +02:00
Valentin Gosu	80e8a2ea39	server : allow CORS request with authorization headers (#1850 ) Whisper plugin in Obsidian requires an API key which is then sent as an authorization header. However, the presence of an authorization header requires a CORS Preflight, so both the OPTIONS method and the Access-Control-Allow-Headers: authorization must be handled.	2024-02-09 17:42:41 +02:00
Neuman Vong	19f8048139	whisper.android : how to build with CLBlast (#1809 ) * FetchContent * OpenCL * Documentation and make optional * Specify GGML build options in build.gradle * Use gradle properties * @ggerganov Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * @gpokat --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-09 17:39:05 +02:00
Georgi Gerganov	434b8f3b96	talk-llama : stream response (#1121 )	2024-02-06 19:56:12 +02:00
Georgi Gerganov	7a74e929c8	sync : ggml (#0 )	2024-01-30 21:30:26 +02:00
JacobLinCool	ae5c4f7340	common : fix wav buffer detection (#1819 )	2024-01-30 19:35:08 +02:00
JacobLinCool	baa30bacdb	server : add fields to `verbose_json` response (#1802 ) * server: include additional fields in the verbose_json response as OpenAI does * server: show request examples on home page * server: todo note for compression_ratio and no_speech_prob * server: add simple demo form to the homepage	2024-01-30 14:15:55 +02:00
Georgi Gerganov	e72e4158de	talk-llama : sync llama.cpp	2024-01-28 19:44:10 +02:00
Georgi Gerganov	52cce82493	common : fix input buffer check (#1812 )	2024-01-27 17:33:09 +02:00
Georgi Gerganov	ef3c9ed9eb	talk-llama : sync llama.cpp	2024-01-27 17:24:53 +02:00
Michael Rienstra	4bbb60efce	docs : make model options / model install methods clearer (#1806 ) * Make models more "discoverable" * Clean up code block language identifiers * make 3 options clearer * undo Prettier formatter change * docs: `$` shell prompt, consistently * docs: minor changes	2024-01-26 17:39:54 +02:00
Neuman Vong	d6b9be21d7	whisper.android : return output from benchmarks (#1785 ) Benchmarks are failing because JNI expects a jstring and the benchmarks are missing a return statement (i.e., returning null). The functions actually build a jstring but don't return it, so this seems to have been an oversight. This patch returns the jstring and now the benchmarks run successfully. Fixes #1783.	2024-01-19 16:17:38 +02:00
Ryan Hitchman	c0329acde8	server : implement "verbose_json" format with token details (#1781 ) * examples/server: implement "verbose_json" format with token details. This is intended to mirror the format of openai's Python whisper.transcribe() return values. * server: don't write WAV to a temporary file if not converting * server: use std::lock_guard instead of manual lock/unlock	2024-01-18 22:58:42 +02:00
Georgi Gerganov	1f50a7d29f	sync : llama.cpp	2024-01-17 21:23:33 +02:00
Benjamin Heiniger	f6614155e4	talk-llama : optional wake-up command and audio confirmation (#1765 ) * talk-llama: add optional wake-word detection from command * talk-llama: add optional audio confirmation before generating answer * talk-llama: fix small formatting issue in output * talk-llama.cpp: fix Windows build	2024-01-16 15:52:01 +02:00
Przemysław Pawełczyk	f5f159c320	server : fix building and simplify lib deps on Windows (#1772 ) * make : fix server example building on MSYS2 environments (Windows) It was not working since commit `eff3570f78` when server was introduced. * cmake : simplify server example lib deps on Windows server uses httplib::Server, not httplib::SSLServer, so there is no need to mention cryptographic libraries in target_link_libraries. Winsock (ws2_32) suffices here. Also use plain library names like we use in other places.	2024-01-15 15:48:13 +02:00
Georgi Gerganov	6ebba525f1	talk-llama : sync llama.cpp	2024-01-14 18:08:20 +02:00
Georgi Gerganov	2a5874441d	talk-llama : llama.cpp	2024-01-14 11:06:28 +02:00
Georgi Gerganov	d08445c9ad	sync : ggml	2024-01-14 10:55:18 +02:00
Georgi Gerganov	f001a3b7b6	talk-llama : sync llama.cpp	2024-01-14 00:13:17 +02:00
RhinoDevel	db078a9ba8	talk-llama : add optional CLI arg to set the bot name (#1764 )	2024-01-13 20:51:35 +02:00
james wolf	a13a7da5ad	examples : add python example for transcription (#1744 ) * rebase and add simple python interface * moved python files to examples/python	2024-01-13 19:37:18 +02:00
Georgi Gerganov	40ae0962f4	talk-llama : sync llama.cpp	2024-01-12 22:04:51 +02:00
George Hindle	fbcb52d3cd	server : add more parameters to server api (#1754 ) * feat(server): add more parameters to server api * fix(server): reset params to original parsed values for each request	2024-01-12 13:42:52 +02:00
George Hindle	f7908f9bb8	params : don't compute timestamps when not printing them (#1755 )	2024-01-12 13:24:38 +02:00
Georgi Gerganov	00b7a4be02	talk-llama : sync llama.cpp	2024-01-11 22:10:10 +02:00
Georgi Gerganov	32e71a1861	sync : ggml	2024-01-11 21:54:17 +02:00
Georgi Gerganov	9c857cf280	sync : llama.cpp	2024-01-11 21:50:01 +02:00
RhinoDevel	bcc1658cd0	talk-llama : add optional Piper TTS support (#1749 ) Add commented-out command to optionally use Piper (https://github.com/rhasspy/piper) as text-to-speech solution for the talk-llama example. Piper voices sound almost like real people which is a big improvement (e.g.) from something like espeak.	2024-01-10 16:15:28 +02:00
Emmanuel Schmidbauer	c46886f599	server : add request path option (#1741 )	2024-01-08 22:39:51 +00:00
Georgi Gerganov	29f78392c1	main : add cli option to disable system prints (#1740 )	2024-01-08 16:41:28 +02:00
Georgi Gerganov	022756a872	server : fix server temperature + add temperature_inc (#1729 ) * server : fix server temperature + add temperature_inc * server : change dashes to underscores in parameter names	2024-01-07 13:35:14 +02:00
Georgi Gerganov	3b8c2dff57	talk-llama : sync latest llama.cpp	2024-01-06 17:22:57 +02:00
Georgi Gerganov	ab0a8593c5	whisper.swiftui : add .gitignore	2024-01-04 15:00:27 +02:00
Tamotsu Takahashi	d87de61ae6	ci : build with CLBlast + ggml-opencl use GGML_API (#1576 ) * Build with CLBlast * Declare GGML_API After rebasing, examples/talk-llama failed: "D:\a\whisper.cpp\whisper.cpp\build\ALL_BUILD.vcxproj" (build target) (1) -> "D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj" (default target) (14) -> (Link target) -> llama.obj : error LNK2019: unresolved external symbol ggml_cl_free_data referenced in function "public: __cdecl llama_model::~llama_model(void)" (??1llama_model@@QEAA@XZ) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj] llama.obj : error LNK2019: unresolved external symbol ggml_cl_transform_tensor referenced in function "public: void __cdecl llama_model_loader::load_all_data(struct ggml_context ,void (__cdecl)(float,void ),void ,struct llama_mlock *)" (?load_all_data@llama_model_loader@@QEAAXPEAUggml_context@@P6AXMPEAX@Z1PEAUllama_mlock@@@Z) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj] D:\a\whisper.cpp\whisper.cpp\build\bin\Release\talk-llama.exe : fatal error LNK1120: 2 unresolved externals [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]	2023-12-29 12:23:27 +02:00
Georgi Gerganov	3a5302108d	sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677 ) * sync : ggml * sync : llama.cpp * talk-llama : fix obsolete param * ggml-alloc : fix ggml_tallocr_is_own * talk.wasm : update to new ggml * ggml : fix type punning in ggml_scale * ggml : cuda jetson + arm quants warnings	2023-12-22 17:53:39 +02:00
bobqianic	d2419030b0	examples : Revert CMakeLists.txt for talk-llama (#1669 )	2023-12-21 22:48:52 +02:00
Georgi Gerganov	940de9dbe9	wchess : update README.md	2023-12-14 22:00:47 +02:00
Georgi Gerganov	375585c07c	wchess : update readme	2023-12-14 17:51:14 +02:00
fraxy-v	fd99ece8e3	wchess : whisper assisted chess (#1595 ) * wchess: whisper assisted chess * wchess: fix allowed moves in check * wchess: touchstart, touchend events * wchess: css, disabled button * wchess : html touches * wchess : minor fixes and code style * wchess : bump encoder context to 1280 * wchess : index.html * wchess : fix CI warnings * wchess : add array header * wchess : build static library * wchess : display grammar * wchess : update UX * wchess : add comment * wchess : add README --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-12-14 15:58:26 +02:00
Kreijstal	ec03661b20	cmake : target windows 8 or above for prefetchVirtualMemory in llama-talk (#1617 ) Since we use prefetchVirtualMemory we specify we target win 8 or above, otherwise other compilers will refuse to use the prefetchVirtualMemory api, (I understand you are loading it dynamically but the header definition has this limitation)	2023-12-12 11:35:00 +00:00
Kreijstal	6335933a5b	cmake : Fix bug in httplib.h for mingw (#1615 ) Fix bug in httplib.h for mingw, please see https://github.com/yhirose/cpp-httplib/issues/1669	2023-12-10 17:47:52 +00:00
Georgi Gerganov	9521ba6801	whisper.objc : disable timestamps for real-time transcription	2023-12-08 13:43:37 +02:00
Oleg Sidorov	3163090d89	server : pass max-len argument to the server (#1574 ) This commit fixes the missing parameter binding for max-len between the input arguments and wparams.	2023-12-05 23:01:45 +02:00
Aleksander Andrzejewski	a0ec3fac54	Server : Add support for .vtt format to Whisper server (#1578 ) - The code comes from examples/main - The output mimetype is set to text/vtt Example usage: ```shell curl 127.0.0.1:8080/inference \ -H "Content-Type: multipart/form-data" \ -F file="@samples/jfk.wav" \ -F temperature="0.2" \ -F response-format="vtt" ```	2023-11-30 23:44:26 +00:00
Oleg Sidorov	6559b538e5	server : backport .srt output format (#1565 ) This commit adds a support of .srt format to Whisper server. The code is effectively backported from examples/main. The output mimetype is set to application/x-subrip as per https://en.wikipedia.org/wiki/SubRip. Example usage: curl 127.0.0.1:8080/inference \ -H "Content-Type: multipart/form-data" \ -F file="@<file-path>" \ -F temperature="0.2" \ -F response-format="srt"	2023-11-28 15:42:58 +02:00
Kasumi	6b094b6dfe	server : set default CORS headers to allow all (#1567 )	2023-11-28 11:55:20 +02:00
Hang	641f2f4282	readme : update help (#1560 )	2023-11-27 12:04:08 +02:00
Ismatulla Mansurov	23c21e92eb	server : automatically convert audio on the server (#1539 ) * server : automatically convert audio on the server * server : remove redundant comments * server : automatic conversion refactor * server : update server readme * server : remove unnecessary comments and tabs * server : put back remove calling * server : apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server : check ffmpeg before the server launch * server : fix indentation * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server : fix function typo calling * server : fix function typo calling * server : add warning in readme --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-27 11:28:34 +02:00
ecneladis	a5881d619c	server : add --print-realtime param (#1541 ) * server : add --print-realtime param * Fix duplicate realtime output	2023-11-24 09:35:02 +02:00
Okabintaro	8328d1900f	fix(server): typo in temperature parameter (#1545 ) Also fixed another typo in comments.	2023-11-23 20:59:36 +02:00
Felix	5c7be85fdc	Change temp file name for server application (#1535 ) Avoid issue of removing file if it exists in the current working directory	2023-11-22 09:23:36 +01:00
Felix	9ac88f2b57	Close file after writing in server application (#1533 ) Fix of mistake leaving file open while reading it again as wav	2023-11-21 20:36:10 +01:00
Georgi Gerganov	46f5b6cb08	server : add video to readme	2023-11-21 17:30:43 +02:00
Felix	eff3570f78	server : add a REST Whisper server example with OAI-like API (#1380 ) * Add first draft of server * Added json support and base funcs for server.cpp * Add more user input via api-request also some clean up * Add request params and load post function Also some general clean up * Remove unused function * Add readme * Add exception handlers * Update examples/server/server.cpp * make : add server target * Add magic curl syntax Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-20 21:40:24 +02:00
Georgi Gerganov	a01b2e0971	sdl : fix audio callback (#1523 )	2023-11-20 13:16:38 +02:00
Georgi Gerganov	bebf0da983	quantize : add support for K-quant types	2023-11-16 16:18:24 +02:00
Sam Pullara	7883d1cae4	talk-llama : improve quote and backtick handling (#1364 ) * ISSUE-1329: replace " with ' so it doesn't try to execute code in backticks. * Typo * Update to keep possessives in the output Closes the ' then puts a ' in quotes then reopens the ' to escape the ' characters.	2023-11-16 10:34:05 +02:00
Georgi Gerganov	ccc85b4ff8	talk-llama : enable GPU by default	2023-11-15 21:33:00 +02:00
Georgi Gerganov	bfbaa4dce5	whisper : make large version explicit + fix data size units (#1493 )	2023-11-15 19:42:25 +02:00
Georgi Gerganov	b6c5f49b78	whisper : add batched decoding (#1486 ) * whisper : add whisper_batch * whisper : move kv_self to whisper_state * whisper : full batched decoding support * whisper : fix memory leak in whisper_batch * whisper : fix mem leak again + remove obsolete function * whisper : clear kv cache when using whisper_decode API * whisper : speed-up sampling * whisper : fix decoders initializer * bench : add batch size 5 bench * whisper : add comment about the KV cache size * whisper : add check for max number of decoders * whisper : avoid starting sampling threads with bs=1 * whisper : enable beam-search by default * cuda : sync llama.cpp fixes	2023-11-15 16:12:52 +02:00
Evan Jones	3e5c7feeff	whisper : add grammar-based sampling (#1229 ) * whisper : add grammar-based sampling * build : fix after master merge * command : fix exception when recognizing the command * whisper : fine-tuning grammar functionality * command : grammar-related improvements - option to read grammar from file - add sample grammars for colors and chess moves - fine-tune the performance further * grammars : add assistant + update comments * command : enable beam-search, add "no_timestamps", add "context", add p * whisper : remove comment --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-13 10:51:34 +02:00
rlapray	c23598e4ca	talk-llama : add n_gpu_layers parameter (#1475 )	2023-11-13 10:04:16 +02:00
Tong Li	54a08bde29	examples : add whisper.android.java for compatibility with older Android versions using Java (#1382 ) * save the recorded audio to a file * Alignment -help * Save the correct audio * change to a consistent coding style * Correct typo * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Correct variable misuse * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * add .bin .cxx/ .gradle/ cmake-build-debug/ to gitignore add whisper.android.java * Added support for older versions of Android of Java * add examples for android java * add README.md for android java * add fullTranscribeWithTime * add toString() method and tests * change return type to void * update to v1.4.1 * add WhisperService * change to whisper_full_get_segment_t1 * add method transcribeDataWithTime * modified toString ``` return "[" + start + " --> " + end + "]:" + sentence; ``` * Optimize code logic * update text view on handle * set max lines * change Chinese to English * Update bindings/java/build.gradle * Update .gitignore * add android.java to github action * change android.java to android_java in build.yml * remove gradle * change jdk to temurin in android_java of CI * change jdk to temurin 11 in android_java of CI * add x to gradlew * set api-level for android_java of CI * Update examples/whisper.android.java/app/src/main/jni/whisper/CMakeLists.txt * add ndk version in build.gradle * remove local.properties * add testFullTranscribeWithTime --------- Co-authored-by: litongmacos <litongjava@qq.com> Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2023-11-12 18:31:58 +02:00

405 Commits