49706a658a
minor : update a few prints + fix buttons in whisper.wasm
2022-11-23 17:19:21 +02:00
e5dcdabbb8
unicode : fix character replacement (thanks to @tamo)
2022-11-23 08:24:29 +02:00
dad109c3f1
close #109 : add fetching of the model over HTTP (whisper.wasm)
2022-11-22 22:48:56 +02:00
326573de9a
talk.wasm : final touches
2022-11-22 22:22:17 +02:00
9aea96f774
talk.wasm : polishing + adding many AI personalities
2022-11-22 20:10:20 +02:00
385236d1d3
stream : "-kc" now enables context keeping from previous segment ( #90 )
...
By default, context keeping is disabled
2022-11-22 18:21:15 +02:00
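An illustrative invocation of the stream example with the flag named above;
everything besides "-kc" (model path, etc.) is an assumption:

    ./stream -m models/ggml-base.en.bin -kc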
63ae03b8e0
Prompt previous tokens for streaming ( #163 )
...
* feat: prompt previous tokens for streaming
I used a pointer to the vector instead of the vector itself because passing the vector directly caused strange errors.
* convert the vector for use with the C API
* feat: remove old refs, check for prompt size
* feat: use a better way of getting the pointer
2022-11-22 18:10:35 +02:00
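A minimal sketch of how the change above could be used through the whisper.cpp
C API; the field names (prompt_tokens, prompt_n_tokens) are taken from this
change, while ctx / pcmf32 and the init path are illustrative assumptions:

    #include "whisper.h"
    #include <vector>

    // ctx    : whisper_context *, e.g. from whisper_init("models/ggml-base.en.bin")
    // pcmf32 : std::vector<float> with the 16 kHz mono audio of the current chunk

    std::vector<whisper_token> prompt_tokens;

    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.prompt_tokens   = prompt_tokens.empty() ? nullptr : prompt_tokens.data();
    wparams.prompt_n_tokens = (int) prompt_tokens.size();

    if (whisper_full(ctx, wparams, pcmf32.data(), (int) pcmf32.size()) == 0) {
        // collect the decoded tokens so they can prompt the next whisper_full() call
        prompt_tokens.clear();
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            for (int j = 0; j < whisper_full_n_tokens(ctx, i); ++j) {
                prompt_tokens.push_back(whisper_full_get_token_id(ctx, i, j));
            }
        }
    }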
78116f8eda
talk.wasm : update README.md
2022-11-21 22:42:29 +02:00
a4dfbeecf9
talk.wasm : GPT-2 meets Whisper in WebAssembly ( #155 )
...
* talk : initial real-time transcription in the browser
* talk : polishing the UI
* talk : ready for beta testing
* talk.wasm : rename example
2022-11-21 22:20:42 +02:00
f2df9bd768
stream : add "max_tokens" cli arg
...
Controls the maximum number of tokens per segment for the stream example
2022-11-20 21:22:41 +02:00
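An illustrative command line for the stream example; the flag spelling
(assumed here to be "-mt" / "--max-tokens") and the model path are assumptions:

    ./stream -m models/ggml-base.en.bin -mt 32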
fb8d77f760
stream : add "audio_ctx" parameter
...
Used to override the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio instead of 30s.
The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
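A minimal sketch of setting this through whisper_full_params, assuming the
field is named audio_ctx as the commit suggests:

    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    // partial encoder context: roughly 3x faster, ~10 s of audio per pass
    // instead of 30 s, at the cost of some transcription quality
    wparams.audio_ctx = 512;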
62b5ff875c
stream : add "max_tokens" parameter
...
Used to limit the number of tokens in a segment.
Useful for combating word repetition when using a partial encoder context
2022-11-20 21:22:41 +02:00
d351771a4b
stream : add "single_segment" option
...
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
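A sketch covering this and the "max_tokens" commit above, assuming both
options are plain fields of whisper_full_params:

    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    wparams.max_tokens     = 32;   // cap the number of tokens per segment - helps against
                                   // word repetition with a partial encoder context
    wparams.single_segment = true; // transcribe the whole audio chunk into a single segment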
c058aaf22e
stream : partial encoder experiments
2022-11-20 21:22:41 +02:00
83c742f1a7
whisper : add option to speed up the audio tempo by x2
...
Uses a Phase Vocoder to speed up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech
it still seems to be very good.
I think this can find application in real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
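A sketch of enabling the option above, assuming it is exposed as a boolean
speed_up field of whisper_full_params (an assumption based on this commit):

    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    // 2x tempo via the Phase Vocoder: roughly halves the Encoder work,
    // with some loss of transcription accuracy
    wparams.speed_up = true;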
7519eabf65
Adds support for stdin wav input
2022-11-09 20:37:23 +02:00
c30bffc8a5
ref #22 : add "duration" option
...
Can be used to process only part of a recording
2022-11-07 20:14:52 +02:00
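An illustrative invocation of the main example; the flag spelling ("-d", with
the duration in milliseconds) and the file paths are assumptions:

    ./main -m models/ggml-base.en.bin -f samples/jfk.wav -d 5000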
c71363f14c
examples : add simple script for generating Karaoke video
2022-11-06 09:22:50 +02:00
d42cf6d0df
Update README.md
2022-11-04 22:26:08 +02:00
ef47d77492
main : fix generated bash script
2022-11-04 18:30:38 +02:00
d5afebd37c
whisper : token-level timestamp refactoring ( #49 , #120 )
...
This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitle types. This
means that you can now specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters.
2022-11-02 21:45:54 +02:00
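For example (the "-ml" flag is named above; the model and sample paths are
illustrative assumptions):

    ./main -m models/ggml-base.en.bin -f samples/jfk.wav -ml 16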
6fb98370ba
main : add some comments for the word-level timestamp algorithm
2022-11-01 22:35:21 +02:00
0729da9a3b
main : fix some edge cases for word-level timestamps
2022-11-01 22:09:25 +02:00
5dc74e3aff
Update README.md
2022-10-31 22:06:05 +02:00
ac8ef34039
Update README.md
2022-10-31 20:19:41 +02:00
dc12994603
Update README.md
2022-10-30 17:11:37 +02:00
57fb46f307
main : add option for word-level timestamps (very experimental)
2022-10-30 17:06:57 +02:00
5a9e4260a6
stream : add "--capture" option to select capture device (ref #10 )
2022-10-30 08:27:04 +02:00
12fb303d9d
whisper.wasm : update system info print
2022-10-29 20:32:41 +03:00
2827cbbbe8
main : merge parallel example into main
2022-10-29 19:37:19 +03:00
0b2dc3c82c
parallel : working
2022-10-29 19:37:19 +03:00
85d6e1e1e7
main : fix sampling time + add max_context parameter
2022-10-29 19:37:19 +03:00
72e9cdd6bf
parallel : adding tool for parallel transformer inference
2022-10-29 19:37:19 +03:00
b89f8960ca
Update README.md
2022-10-28 21:40:52 +03:00
6f82320b05
Create README.md
2022-10-28 20:25:37 +03:00
2298310dd8
whisper.nvim : add helper script for the Neovim integration
2022-10-28 20:25:37 +03:00
8347a7bb6a
stream : a few updates to make it compatible with Vim usage ( #99 )
2022-10-27 22:10:50 +03:00
ebb01b9e33
Print system info at start of program
2022-10-27 17:22:19 +03:00
2400660f3f
Print system info in main
2022-10-26 22:54:09 +03:00
a6c786d5dc
Update README.md
2022-10-25 20:53:48 +03:00
91dcf5f35b
Update README.md
2022-10-25 20:53:48 +03:00
113a4f06d8
Update README.md
2022-10-25 20:53:48 +03:00
47e78b7288
Update README.md
2022-10-25 20:53:48 +03:00
34bb3ab0cf
ggml : add system info functions
2022-10-25 20:53:48 +03:00
c6710efde2
refactoring : move main + stream into examples + other stuff
2022-10-25 20:53:48 +03:00
d4f94ce427
Update README.md
2022-10-24 18:23:07 +03:00
a52ee08c1e
objc : polishing the sample application
2022-10-24 18:23:07 +03:00
b41f4a90eb
Create README.md
2022-10-24 18:23:07 +03:00
bb1ee266d2
ios : whisper.objc example
2022-10-24 18:23:07 +03:00
3e69a6071d
Update README.md
2022-10-23 08:04:33 +03:00