Commit Graph

228 Commits

Author SHA1 Message Date
Tamotsu Takahashi
3df6c14fca Build with OpenBLAS and SDL2 on windows 2022-11-23 22:09:54 +02:00
Georgi Gerganov
d64d6ca3fd
models : minor changes to the HF convert script (#157) 2022-11-23 22:07:20 +02:00
Georgi Gerganov
93482d0373
models : add "convert-h5-to-ggml.py" script (#157)
Converts transformers models to ggml.
Although the conversion is successful, it does not work for some reason.
Not sure why
2022-11-23 17:19:22 +02:00
Georgi Gerganov
49706a658a
minor : updates few prints + fix buttons in whisper.wasm 2022-11-23 17:19:21 +02:00
Georgi Gerganov
363a2dadec
Update README.md 2022-11-23 09:53:55 +02:00
Georgi Gerganov
623a486056
Update README.md 2022-11-23 09:52:36 +02:00
Tamotsu Takahashi
2f596f5b33 Find libopenblas.dll.a on windows
"lib" is needed for windows.

With this change, you can build whisper.cpp with OpenBLAS's prebuilt DLL.
1. extract a zip from https://github.com/xianyi/OpenBLAS/releases
2. copy the headers in (openblas)/include to the root directory of whisper.cpp
3. invoke cmake with -DCMAKE_LIBRARY_PATH=(openblas)\lib -DWHISPER_SUPPORT_OPENBLAS=ON
4. copy (openblas)/bin/libopenblas.dll to the same directory of whisper.dll after msbuild

https://github.com/ggerganov/whisper.cpp/issues/89#issuecomment-1324391258
2022-11-23 08:26:45 +02:00
Georgi Gerganov
e5dcdabbb8
unicode : fix character replacement (thanks to @tamo) 2022-11-23 08:24:29 +02:00
Georgi Gerganov
dad109c3f1
close #109 : add fetching of the model over HTTP (whisper.wasm) 2022-11-22 22:48:56 +02:00
Georgi Gerganov
326573de9a
talk.wasm : final touches 2022-11-22 22:22:17 +02:00
Georgi Gerganov
9aea96f774
talk.wasm : polishing + adding many AI personalities 2022-11-22 20:10:20 +02:00
Georgi Gerganov
385236d1d3
stream : "-kc" now enables context keeping from previous segment (#90)
By default, the context keeping is disabled
2022-11-22 18:21:15 +02:00
M. Eren Akbiyik
63ae03b8e0
Prompt previous tokens for streaming (#163)
* feat: prompt previous tokens for streaming

I used a vector pointer instead of vector itself because it gave weird errors, and why not

* convert vector to use with C api

* feat: remove old refs, check for prompt size

* feat: use better way of getting the pointer
2022-11-22 18:10:35 +02:00
Georgi Gerganov
78116f8eda
talk.wasm : update README.md 2022-11-21 22:42:29 +02:00
Georgi Gerganov
a4dfbeecf9
talk.wasm : GPT-2 meets Whisper in WebAssembly (#155)
* talk : initial real-time transcription in the browser

* talk : polishing the UI

* talk : ready for beta testing

* talk.wasm : rename example
2022-11-21 22:20:42 +02:00
Georgi Gerganov
2e311a2917
Update README.md 2022-11-21 18:52:20 +02:00
Georgi Gerganov
2065572a11 ggml : fix Windows build 2022-11-20 22:47:03 +02:00
Georgi Gerganov
5c2176e314 ci : add Windows build 2022-11-20 22:47:03 +02:00
Georgi Gerganov
f2df9bd768 stream : add "max_tokens" cli arg
Controls the max tokens per segment for the stream example
2022-11-20 21:22:41 +02:00
Georgi Gerganov
fb8d77f760 stream : add "audio_ctx" parameter
Used to overwrite the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.

The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
Georgi Gerganov
62b5ff875c stream : add "max_tokens" parameter
Used to limit the number of tokens in a segment.
Useful to battle with word repetition when using partial encoder context
2022-11-20 21:22:41 +02:00
Georgi Gerganov
d351771a4b stream : add "single_segment" option
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
Georgi Gerganov
c058aaf22e stream : partial encoder experiments 2022-11-20 21:22:41 +02:00
greeshmay
2ba66360c9
fix: free ggml_context (close #149) (#150)
* fix: free ggml_context

* ggml : free the model's contexts in whisper_free()

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-11-17 22:12:51 +02:00
Georgi Gerganov
e70e5c8b53
models : simplify the conversion script
"transformers" dependency is not actually needed
2022-11-16 19:22:32 +02:00
Dody Suria Wijaya
55a0e1a64e Update download-ggml-model.sh
follow curl redirect to new hosting site
2022-11-16 18:59:44 +02:00
Georgi Gerganov
864a78a8d0
models : change default hosting to Hugging Face
My Linode is running out of monthly bandwidth due to the big interest in
the project
2022-11-15 19:47:06 +02:00
Georgi Gerganov
83c742f1a7 whisper : add option to speed up the audio tempo by x2
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.

I think this can find application for real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Georgi Gerganov
41b48ab7f1
make : add libwhisper.so target (#144) 2022-11-13 09:09:48 +02:00
Chidi Williams
a728be9cdb
Add WHISPER_NO_AVX and WHISPER_NO_AVX2 to CMakeLists (#136)
* Check for AVX and AVX2 on Darwin

* Add AVX options to CMakeLists
2022-11-11 18:10:01 +02:00
Georgi Gerganov
46a68fb9b5
minor : remove one more redundant line 2022-11-11 18:02:58 +02:00
Georgi Gerganov
ccd56a9c5b
minor : fix double float32 conversion in python script 2022-11-11 17:58:51 +02:00
Georgi Gerganov
3500ce8727
ref #40 : start working on the documentation 2022-11-09 21:41:40 +02:00
Alan
7519eabf65 Adds support for stdin wav input 2022-11-09 20:37:23 +02:00
Georgi Gerganov
b21213c23e
js : update whipser.js to latest 2022-11-09 19:33:10 +02:00
Chidi Williams
9e700e1821 Check for AVX and AVX2 on Darwin 2022-11-09 18:49:55 +02:00
boolemancer
0bfe728b84 Fix the Windows pthread_create shim
The current implementation doesn't actually set the out parameter,
and it returns 0 on failure instead of on success.
2022-11-08 15:02:32 +02:00
Georgi Gerganov
4e5674a5d5
sync : submodule whisper.spm 2022-11-07 21:48:13 +02:00
Georgi Gerganov
4c66b6a828
cmake : add submodule whisper.spm 2022-11-07 20:50:24 +02:00
Georgi Gerganov
c30bffc8a5
ref #22 : add "duration" option
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
Georgi Gerganov
8fdfb0ba92
Update README.md 2022-11-06 21:04:21 +02:00
Georgi Gerganov
c71363f14c
examples : add simple script for generating Karaoke video 2022-11-06 09:22:50 +02:00
Georgi Gerganov
a09e9123ca
Update README.md 2022-11-05 08:44:41 +02:00
Georgi Gerganov
d42cf6d0df
Update README.md 2022-11-04 22:26:08 +02:00
Georgi Gerganov
ef47d77492
main : fix generated bash script 2022-11-04 18:30:38 +02:00
Georgi Gerganov
75171c2b79
ggml : multi-thread the ggml_add operator 2022-11-03 20:53:44 +02:00
Georgi Gerganov
a2eeb941f6
cmake : fix passing GGML_PERF compile option 2022-11-03 20:19:06 +02:00
Georgi Gerganov
0e689f83d8
Update README.md 2022-11-02 22:03:27 +02:00
Georgi Gerganov
d5afebd37c
whisper : token-level timestamp refactoring (#49, #120)
This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitles types. This
means that now you can specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters
2022-11-02 21:45:54 +02:00
Georgi Gerganov
4b1c32e8ea
Update README.md 2022-11-02 18:33:29 +02:00