whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-26 23:51:05 +00:00

Author	SHA1	Message	Date
Tamotsu Takahashi	3df6c14fca	Build with OpenBLAS and SDL2 on windows	2022-11-23 22:09:54 +02:00
Georgi Gerganov	d64d6ca3fd	models : minor changes to the HF convert script (#157)	2022-11-23 22:07:20 +02:00
Georgi Gerganov	93482d0373	models : add "convert-h5-to-ggml.py" script (#157) Converts transformers models to ggml. Although the conversion is successful, it does not work for some reason. Not sure why	2022-11-23 17:19:22 +02:00
Georgi Gerganov	49706a658a	minor : updates few prints + fix buttons in whisper.wasm	2022-11-23 17:19:21 +02:00
Georgi Gerganov	363a2dadec	Update README.md	2022-11-23 09:53:55 +02:00
Georgi Gerganov	623a486056	Update README.md	2022-11-23 09:52:36 +02:00
Tamotsu Takahashi	2f596f5b33	Find libopenblas.dll.a on windows "lib" is needed for windows. With this change, you can build whisper.cpp with OpenBLAS's prebuilt DLL. 1. extract a zip from https://github.com/xianyi/OpenBLAS/releases 2. copy the headers in (openblas)/include to the root directory of whisper.cpp 3. invoke cmake with -DCMAKE_LIBRARY_PATH=(openblas)\lib -DWHISPER_SUPPORT_OPENBLAS=ON 4. copy (openblas)/bin/libopenblas.dll to the same directory of whisper.dll after msbuild https://github.com/ggerganov/whisper.cpp/issues/89#issuecomment-1324391258	2022-11-23 08:26:45 +02:00
Georgi Gerganov	e5dcdabbb8	unicode : fix character replacement (thanks to @tamo)	2022-11-23 08:24:29 +02:00
Georgi Gerganov	dad109c3f1	close #109 : add fetching of the model over HTTP (whisper.wasm)	2022-11-22 22:48:56 +02:00
Georgi Gerganov	326573de9a	talk.wasm : final touches	2022-11-22 22:22:17 +02:00
Georgi Gerganov	9aea96f774	talk.wasm : polishing + adding many AI personalities	2022-11-22 20:10:20 +02:00
Georgi Gerganov	385236d1d3	stream : "-kc" now enables context keeping from previous segment (#90) By default, the context keeping is disabled	2022-11-22 18:21:15 +02:00
M. Eren Akbiyik	63ae03b8e0	Prompt previous tokens for streaming (#163) * feat: prompt previous tokens for streaming I used a vector pointer instead of vector itself because it gave weird errors, and why not * convert vector to use with C api * feat: remove old refs, check for prompt size * feat: use better way of getting the pointer	2022-11-22 18:10:35 +02:00
Georgi Gerganov	78116f8eda	talk.wasm : update README.md	2022-11-21 22:42:29 +02:00
Georgi Gerganov	a4dfbeecf9	talk.wasm : GPT-2 meets Whisper in WebAssembly (#155) * talk : initial real-time transcription in the browser * talk : polishing the UI * talk : ready for beta testing * talk.wasm : rename example	2022-11-21 22:20:42 +02:00
Georgi Gerganov	2e311a2917	Update README.md	2022-11-21 18:52:20 +02:00
Georgi Gerganov	2065572a11	ggml : fix Windows build	2022-11-20 22:47:03 +02:00
Georgi Gerganov	5c2176e314	ci : add Windows build	2022-11-20 22:47:03 +02:00
Georgi Gerganov	f2df9bd768	stream : add "max_tokens" cli arg Controls the max tokens per segment for the stream example	2022-11-20 21:22:41 +02:00
Georgi Gerganov	fb8d77f760	stream : add "audio_ctx" parameter Used to overwrite the audio context size of the Encoder. For example, setting "audio_ctx = 512" will make it run about 3 times faster, processing about 10s of audio, instead of 30s. The transcription quality drops, but this can be used for real-time streaming purposes where performance is important.	2022-11-20 21:22:41 +02:00
Georgi Gerganov	62b5ff875c	stream : add "max_tokens" parameter Used to limit the number of tokens in a segment. Useful to battle with word repetition when using partial encoder context	2022-11-20 21:22:41 +02:00
Georgi Gerganov	d351771a4b	stream : add "single_segment" option Force the entire audio chunk to be transcribed into a single segment	2022-11-20 21:22:41 +02:00
Georgi Gerganov	c058aaf22e	stream : partial encoder experiments	2022-11-20 21:22:41 +02:00
greeshmay	2ba66360c9	fix: free ggml_context (close #149) (#150) * fix: free ggml_context * ggml : free the model's contexts in whisper_free() Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2022-11-17 22:12:51 +02:00
Georgi Gerganov	e70e5c8b53	models : simplify the conversion script "transformers" dependency is not actually needed	2022-11-16 19:22:32 +02:00
Dody Suria Wijaya	55a0e1a64e	Update download-ggml-model.sh follow curl redirect to new hosting site	2022-11-16 18:59:44 +02:00
Georgi Gerganov	864a78a8d0	models : change default hosting to Hugging Face My Linode is running out of monthly bandwidth due to the big interest in the project	2022-11-15 19:47:06 +02:00
Georgi Gerganov	83c742f1a7	whisper : add option to speed up the audio tempo by x2 Using a Phase Vocoder for speeding up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. The transcription accuracy is degraded, but for slow to normal speech - it seems to be still very good. I think this can find application for real-time transcription - i.e. the "stream" example.	2022-11-13 16:25:43 +02:00
Georgi Gerganov	41b48ab7f1	make : add libwhisper.so target (#144)	2022-11-13 09:09:48 +02:00
Chidi Williams	a728be9cdb	Add WHISPER_NO_AVX and WHISPER_NO_AVX2 to CMakeLists (#136) * Check for AVX and AVX2 on Darwin * Add AVX options to CMakeLists	2022-11-11 18:10:01 +02:00
Georgi Gerganov	46a68fb9b5	minor : remove one more redundant line	2022-11-11 18:02:58 +02:00
Georgi Gerganov	ccd56a9c5b	minor : fix double float32 conversion in python script	2022-11-11 17:58:51 +02:00
Georgi Gerganov	3500ce8727	ref #40 : start working on the documentation	2022-11-09 21:41:40 +02:00
Alan	7519eabf65	Adds support for stdin wav input	2022-11-09 20:37:23 +02:00
Georgi Gerganov	b21213c23e	js : update whipser.js to latest	2022-11-09 19:33:10 +02:00
Chidi Williams	9e700e1821	Check for AVX and AVX2 on Darwin	2022-11-09 18:49:55 +02:00
boolemancer	0bfe728b84	Fix the Windows pthread_create shim The current implementation doesn't actually set the out parameter, and it returns 0 on failure instead of on success.	2022-11-08 15:02:32 +02:00
Georgi Gerganov	4e5674a5d5	sync : submodule whisper.spm	2022-11-07 21:48:13 +02:00
Georgi Gerganov	4c66b6a828	cmake : add submodule whisper.spm	2022-11-07 20:50:24 +02:00
Georgi Gerganov	c30bffc8a5	ref #22 : add "duration" option Can be used to partially process a recording	2022-11-07 20:14:52 +02:00
Georgi Gerganov	8fdfb0ba92	Update README.md	2022-11-06 21:04:21 +02:00
Georgi Gerganov	c71363f14c	examples : add simple script for generating Karaoke video	2022-11-06 09:22:50 +02:00
Georgi Gerganov	a09e9123ca	Update README.md	2022-11-05 08:44:41 +02:00
Georgi Gerganov	d42cf6d0df	Update README.md	2022-11-04 22:26:08 +02:00
Georgi Gerganov	ef47d77492	main : fix generated bash script	2022-11-04 18:30:38 +02:00
Georgi Gerganov	75171c2b79	ggml : multi-thread the ggml_add operator	2022-11-03 20:53:44 +02:00
Georgi Gerganov	a2eeb941f6	cmake : fix passing GGML_PERF compile option	2022-11-03 20:19:06 +02:00
Georgi Gerganov	0e689f83d8	Update README.md	2022-11-02 22:03:27 +02:00
Georgi Gerganov	d5afebd37c	whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters	2022-11-02 21:45:54 +02:00
Georgi Gerganov	4b1c32e8ea	Update README.md	2022-11-02 18:33:29 +02:00
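
The "audio_ctx" commit in the log above (fb8d77f760) says that setting "audio_ctx = 512" processes about 10s of audio instead of 30s and runs about 3 times faster. The arithmetic behind those figures can be sketched as follows. This is only an illustration, not code from the repository: it assumes the full encoder context is 1500 positions covering a 30 s window, and the names FULL_AUDIO_CTX, audio_seconds_covered and context_ratio are hypothetical. The context ratio is a rough proxy for the speedup, not a measured value.

```python
# Illustrative arithmetic only; not taken from whisper.cpp.
# Assumption: the full Whisper encoder context is 1500 positions for 30 s of audio.
FULL_AUDIO_CTX = 1500
FULL_WINDOW_SECONDS = 30.0


def audio_seconds_covered(audio_ctx: int) -> float:
    """Seconds of audio covered by a reduced encoder context (hypothetical helper)."""
    if not 0 < audio_ctx <= FULL_AUDIO_CTX:
        raise ValueError(f"audio_ctx must be in 1..{FULL_AUDIO_CTX}")
    return FULL_WINDOW_SECONDS * audio_ctx / FULL_AUDIO_CTX


def context_ratio(audio_ctx: int) -> float:
    """Ratio of full to reduced context size; a rough proxy for the speedup."""
    if not 0 < audio_ctx <= FULL_AUDIO_CTX:
        raise ValueError(f"audio_ctx must be in 1..{FULL_AUDIO_CTX}")
    return FULL_AUDIO_CTX / audio_ctx


if __name__ == "__main__":
    ctx = 512
    print(f"audio_ctx={ctx}: {audio_seconds_covered(ctx):.2f} s of audio, "
          f"context ratio {context_ratio(ctx):.2f}x")
```

Under these assumptions, 512 positions correspond to roughly a third of the 30 s window, which is consistent with the "about 10s" and "about 3 times faster" figures in the commit message.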

228 Commits