Georgi Gerganov
128aaadb93
whisper : improve printfs
2022-11-24 17:54:16 +02:00
Georgi Gerganov
454b91de16
main : fix dangling pointer when using stdin for input ( #65 )
2022-11-24 17:53:51 +02:00
Georgi Gerganov
d7024cf9dc
main, stream : remove --verbose flag ( #178 )
2022-11-24 17:52:04 +02:00
Georgi Gerganov
37422ed733
talk.wasm : add audio pre-processing + bump memory
2022-11-24 00:34:00 +02:00
Georgi Gerganov
be3b720f96
talk.wasm : refactoring + update README.md
2022-11-24 00:08:57 +02:00
Georgi Gerganov
00f46dbc1d
models : add usage comments to the HF convert script ( #157 )
2022-11-23 23:22:40 +02:00
Georgi Gerganov
5698bddbc9
models : fix HF fine-tuned model conversion script ( #157 )
...
It works now
2022-11-23 23:14:11 +02:00
Georgi Gerganov
388e9f79ad
ggml : fix the fix
2022-11-23 22:40:06 +02:00
Georgi Gerganov
35cd29ce1f
ggml : fix cross-compile Linux -> Windows with mingw ( #168 )
2022-11-23 22:28:41 +02:00
Georgi Gerganov
a156a358ca
Revert "update README.md"
...
This reverts commit 6a84147113.
2022-11-23 22:16:50 +02:00
katsu560
6a84147113
update README.md
2022-11-23 22:16:33 +02:00
katsu560
804f36aa2c
ggml: change inline ggml_fp16_to_fp32, ggml_fp16_t ggml_fp32_to_fp16
2022-11-23 22:16:33 +02:00
katsu560
4b2f51b479
add gprof option
2022-11-23 22:16:33 +02:00
katsu560
800ae5b808
fix AVX,AVX2,FMA,F16C detection on Linux and add flags for OpenBLAS
2022-11-23 22:16:33 +02:00
katsu560
83456076f0
add AVX support
2022-11-23 22:16:33 +02:00
Tamotsu Takahashi
3df6c14fca
Build with OpenBLAS and SDL2 on windows
2022-11-23 22:09:54 +02:00
Georgi Gerganov
d64d6ca3fd
models : minor changes to the HF convert script ( #157 )
2022-11-23 22:07:20 +02:00
Georgi Gerganov
93482d0373
models : add "convert-h5-to-ggml.py" script ( #157 )
...
Converts transformers models to ggml.
Although the conversion is successful, it does not work for some reason.
Not sure why
2022-11-23 17:19:22 +02:00
Georgi Gerganov
49706a658a
minor : update a few prints + fix buttons in whisper.wasm
2022-11-23 17:19:21 +02:00
Georgi Gerganov
363a2dadec
Update README.md
2022-11-23 09:53:55 +02:00
Georgi Gerganov
623a486056
Update README.md
2022-11-23 09:52:36 +02:00
Tamotsu Takahashi
2f596f5b33
Find libopenblas.dll.a on windows
...
"lib" is needed for windows.
With this change, you can build whisper.cpp with OpenBLAS's prebuilt DLL.
1. extract a zip from https://github.com/xianyi/OpenBLAS/releases
2. copy the headers in (openblas)/include to the root directory of whisper.cpp
3. invoke cmake with -DCMAKE_LIBRARY_PATH=(openblas)\lib -DWHISPER_SUPPORT_OPENBLAS=ON
4. copy (openblas)/bin/libopenblas.dll to the same directory as whisper.dll after running msbuild
https://github.com/ggerganov/whisper.cpp/issues/89#issuecomment-1324391258
2022-11-23 08:26:45 +02:00
Georgi Gerganov
e5dcdabbb8
unicode : fix character replacement (thanks to @tamo)
2022-11-23 08:24:29 +02:00
Georgi Gerganov
dad109c3f1
close #109 : add fetching of the model over HTTP (whisper.wasm)
2022-11-22 22:48:56 +02:00
Georgi Gerganov
326573de9a
talk.wasm : final touches
2022-11-22 22:22:17 +02:00
Georgi Gerganov
9aea96f774
talk.wasm : polishing + adding many AI personalities
2022-11-22 20:10:20 +02:00
Georgi Gerganov
385236d1d3
stream : "-kc" now enables context keeping from previous segment ( #90 )
...
By default, the context keeping is disabled
2022-11-22 18:21:15 +02:00
M. Eren Akbiyik
63ae03b8e0
Prompt previous tokens for streaming ( #163 )
...
* feat: prompt previous tokens for streaming
I used a vector pointer instead of vector itself because it gave weird errors, and why not
* convert vector to use with C api
* feat: remove old refs, check for prompt size
* feat: use better way of getting the pointer
2022-11-22 18:10:35 +02:00
Georgi Gerganov
78116f8eda
talk.wasm : update README.md
2022-11-21 22:42:29 +02:00
Georgi Gerganov
a4dfbeecf9
talk.wasm : GPT-2 meets Whisper in WebAssembly ( #155 )
...
* talk : initial real-time transcription in the browser
* talk : polishing the UI
* talk : ready for beta testing
* talk.wasm : rename example
2022-11-21 22:20:42 +02:00
Georgi Gerganov
2e311a2917
Update README.md
2022-11-21 18:52:20 +02:00
Georgi Gerganov
2065572a11
ggml : fix Windows build
2022-11-20 22:47:03 +02:00
Georgi Gerganov
5c2176e314
ci : add Windows build
2022-11-20 22:47:03 +02:00
Georgi Gerganov
f2df9bd768
stream : add "max_tokens" cli arg
...
Controls the max tokens per segment for the stream example
2022-11-20 21:22:41 +02:00
Georgi Gerganov
fb8d77f760
stream : add "audio_ctx" parameter
...
Used to override the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.
The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
Georgi Gerganov
62b5ff875c
stream : add "max_tokens" parameter
...
Used to limit the number of tokens in a segment.
Useful for combating word repetition when using a partial encoder context
2022-11-20 21:22:41 +02:00
Georgi Gerganov
d351771a4b
stream : add "single_segment" option
...
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
Georgi Gerganov
c058aaf22e
stream : partial encoder experiments
2022-11-20 21:22:41 +02:00
greeshmay
2ba66360c9
fix: free ggml_context ( close #149 ) ( #150 )
...
* fix: free ggml_context
* ggml : free the model's contexts in whisper_free()
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-11-17 22:12:51 +02:00
Georgi Gerganov
e70e5c8b53
models : simplify the conversion script
...
"transformers" dependency is not actually needed
2022-11-16 19:22:32 +02:00
Dody Suria Wijaya
55a0e1a64e
Update download-ggml-model.sh
...
follow curl redirect to new hosting site
2022-11-16 18:59:44 +02:00
Georgi Gerganov
864a78a8d0
models : change default hosting to Hugging Face
...
My Linode is running out of monthly bandwidth due to the big interest in
the project
2022-11-15 19:47:06 +02:00
Georgi Gerganov
83c742f1a7
whisper : add option to speed up the audio tempo by x2
...
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech
it still seems to be very good.
I think this can find application for real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Georgi Gerganov
41b48ab7f1
make : add libwhisper.so target ( #144 )
2022-11-13 09:09:48 +02:00
Chidi Williams
a728be9cdb
Add WHISPER_NO_AVX and WHISPER_NO_AVX2 to CMakeLists ( #136 )
...
* Check for AVX and AVX2 on Darwin
* Add AVX options to CMakeLists
2022-11-11 18:10:01 +02:00
Georgi Gerganov
46a68fb9b5
minor : remove one more redundant line
2022-11-11 18:02:58 +02:00
Georgi Gerganov
ccd56a9c5b
minor : fix double float32 conversion in python script
2022-11-11 17:58:51 +02:00
Georgi Gerganov
3500ce8727
ref #40 : start working on the documentation
2022-11-09 21:41:40 +02:00
Alan
7519eabf65
Adds support for stdin wav input
2022-11-09 20:37:23 +02:00
Georgi Gerganov
b21213c23e
js : update whisper.js to latest
2022-11-09 19:33:10 +02:00