2e311a2917
Update README.md
2022-11-21 18:52:20 +02:00
2065572a11
ggml : fix Windows build
2022-11-20 22:47:03 +02:00
5c2176e314
ci : add Windows build
2022-11-20 22:47:03 +02:00
f2df9bd768
stream : add "max_tokens" cli arg
...
Controls the max tokens per segment for the stream example
2022-11-20 21:22:41 +02:00
fb8d77f760
stream : add "audio_ctx" parameter
...
Used to override the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio instead of 30s.
The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
62b5ff875c
stream : add "max_tokens" parameter
...
Used to limit the number of tokens in a segment.
Useful for combating word repetition when using a partial encoder context
2022-11-20 21:22:41 +02:00
d351771a4b
stream : add "single_segment" option
...
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
c058aaf22e
stream : partial encoder experiments
2022-11-20 21:22:41 +02:00
2ba66360c9
fix: free ggml_context (close #149) (#150)
...
* fix: free ggml_context
* ggml : free the model's contexts in whisper_free()
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-11-17 22:12:51 +02:00
e70e5c8b53
models : simplify the conversion script
...
"transformers" dependency is not actually needed
2022-11-16 19:22:32 +02:00
55a0e1a64e
Update download-ggml-model.sh
...
follow curl redirect to new hosting site
2022-11-16 18:59:44 +02:00
864a78a8d0
models : change default hosting to Hugging Face
...
My Linode is running out of monthly bandwidth due to the big interest in
the project
2022-11-15 19:47:06 +02:00
83c742f1a7
whisper : add option to speed up the audio tempo by x2
...
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech
it still seems to be very good.
I think this can find application in real-time transcription, i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
41b48ab7f1
make : add libwhisper.so target (#144)
2022-11-13 09:09:48 +02:00
a728be9cdb
Add WHISPER_NO_AVX and WHISPER_NO_AVX2 to CMakeLists (#136)
...
* Check for AVX and AVX2 on Darwin
* Add AVX options to CMakeLists
2022-11-11 18:10:01 +02:00
46a68fb9b5
minor : remove one more redundant line
2022-11-11 18:02:58 +02:00
ccd56a9c5b
minor : fix double float32 conversion in python script
2022-11-11 17:58:51 +02:00
3500ce8727
ref #40 : start working on the documentation
2022-11-09 21:41:40 +02:00
7519eabf65
Adds support for stdin wav input
2022-11-09 20:37:23 +02:00
b21213c23e
js : update whisper.js to latest
2022-11-09 19:33:10 +02:00
9e700e1821
Check for AVX and AVX2 on Darwin
2022-11-09 18:49:55 +02:00
0bfe728b84
Fix the Windows pthread_create shim
...
The current implementation doesn't actually set the out parameter,
and it returns 0 on failure instead of on success.
2022-11-08 15:02:32 +02:00
4e5674a5d5
sync : submodule whisper.spm
2022-11-07 21:48:13 +02:00
4c66b6a828
cmake : add submodule whisper.spm
2022-11-07 20:50:24 +02:00
c30bffc8a5
ref #22 : add "duration" option
...
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
8fdfb0ba92
Update README.md
2022-11-06 21:04:21 +02:00
c71363f14c
examples : add simple script for generating Karaoke video
2022-11-06 09:22:50 +02:00
a09e9123ca
Update README.md
2022-11-05 08:44:41 +02:00
d42cf6d0df
Update README.md
2022-11-04 22:26:08 +02:00
ef47d77492
main : fix generated bash script
2022-11-04 18:30:38 +02:00
75171c2b79
ggml : multi-thread the ggml_add operator
2022-11-03 20:53:44 +02:00
a2eeb941f6
cmake : fix passing GGML_PERF compile option
2022-11-03 20:19:06 +02:00
0e689f83d8
Update README.md
2022-11-02 22:03:27 +02:00
d5afebd37c
whisper : token-level timestamp refactoring (#49, #120)
...
This turned out pretty well overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitle types. This
means that you can now specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters.
2022-11-02 21:45:54 +02:00
4b1c32e8ea
Update README.md
2022-11-02 18:33:29 +02:00
b5dde365e9
extra : compute SHA of all models files
2022-11-02 18:31:55 +02:00
02dfd5b8c3
whisper : fix extra memory usage after recent processor changes
...
Had increased the memory buffer to the size of the model and forgot to
bring it down.
2022-11-02 18:31:18 +02:00
c63ce24834
Allow building with Accelerate for x86_64 Macs (#123)
...
* Cross compile windows
* set env properly
* rm log
* fix review
* Add back space
* Don't force architecture
* Allow building x86_64 with accelerate
2022-11-02 18:00:19 +02:00
137321915f
ggml : fix the check for NEON support (#7)
...
Was using the wrong preprocessor macro
2022-11-02 17:52:24 +02:00
24cd12f647
Cross compilation (#121)
...
* Cross compile windows
* set env properly
* rm log
* fix review
* Add back space
2022-11-02 08:46:49 +02:00
e46bc56e71
Update README.md
2022-11-01 22:47:58 +02:00
6fb98370ba
main : add some comments for the word-level timestamp algorithm
2022-11-01 22:35:21 +02:00
0729da9a3b
main : fix some edge cases for word-level timestamps
2022-11-01 22:09:25 +02:00
5dc74e3aff
Update README.md
2022-10-31 22:06:05 +02:00
ac8ef34039
Update README.md
2022-10-31 20:19:41 +02:00
b26345cc7b
Added download-ggml-model.cmd script for Windows
2022-10-31 19:38:20 +02:00
8dac3c6e10
Fixed sched_yield
2022-10-30 21:38:18 +02:00
6417e59aad
Implemented sched_yield function for Windows
2022-10-30 21:38:18 +02:00
dc12994603
Update README.md
2022-10-30 17:11:37 +02:00
b0f2aa0ea6
Update README.md
2022-10-30 17:10:46 +02:00