Commit Graph

283 Commits

Author SHA1 Message Date
575c53dc41 yt-wsp.sh : fix usage instruction + comment 2022-12-07 21:12:55 +02:00
faa85f9840 livestream.sh : remove obsolete comment 2022-12-07 04:41:43 +02:00
9fe7306f4b models : add the new "large" model released by OpenAI
The old "large" model is now renamed "large-v1".
If you have been using it, make sure to rename it and download the new
"large" model for best results.
2022-12-06 18:48:57 +02:00
57e0e6b700 livestream : handle ffmpeg errors gracefully and stabilize transcript 2022-12-01 20:49:09 +02:00
4f7363077f livestream : minor changes 2022-12-01 19:47:58 +02:00
093c840dee livestream : fix losing words across audio chunks (#195)
* improve livestream script

* Update examples/livestream.sh

Co-authored-by: Paul Edwards <paul.edwards@semiformal.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-01 19:18:22 +02:00
4698dcdb52 whisper : add mechanism for aborting the whisper_full() computation 2022-11-27 20:42:45 +02:00
164df0d447 whisper.objc : fix context + broken readme links 2022-11-27 10:52:27 +02:00
e266cb0723 whisper.objc : add real-time processing (#97)
Similar to the "stream" app
2022-11-26 18:32:46 +02:00
c207eed431 whisper.objc : fix build warnings 2022-11-26 16:27:04 +02:00
a425365b82 yt-wsp.sh : script to easily transcribe VODs
Thanks to @DaniruKun
ref: https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818

Usage:

  cd whisper.cpp
  make

  ./examples/yt-wsp.sh <video-url>
2022-11-26 12:54:42 +02:00
68ecadbbc9 command.wasm : add voice assistant example for the Web (#171)
Same as the command-line tool "command", but runs in the browser

Also, added helper script "extra/deploy-wasm.sh" and fixed some timing
constants for the WASM examples.
2022-11-26 11:40:06 +02:00
c536ff4005 minor : add comment for using "generate_karaoke.sh" 2022-11-26 10:22:42 +02:00
cb70b07db5 livestream.sh : simple tool to transcribe audio livestreams (#185) 2022-11-26 10:05:37 +02:00
3c390ffe38 stream.wasm : add web-based real-time transcription (#112) 2022-11-25 23:57:46 +02:00
be16dfa038 whisper.wasm : do not block page while processing (close #86) 2022-11-25 23:07:42 +02:00
0f619b52ce main : add stereo-channel-based diarization (#64)
Not tested - I don't have stereo dialog audio
2022-11-25 22:08:58 +02:00
1246dd023e command : add demonstration video 2022-11-25 20:23:58 +02:00
0be27bbd92 command : fix build + fix README + add bold printing 2022-11-25 19:53:50 +02:00
bc88eb13c6 examples : add "command" tool (#171) 2022-11-25 19:36:57 +02:00
b8ce25dec1 refactoring : more readable code 2022-11-25 19:28:04 +02:00
e4805d9601 wasm : refactor wasm example + reuse fetch mechanism 2022-11-24 23:13:26 +02:00
ff36415a86 talk.wasm : update video link + some minor fixes 2022-11-24 20:15:24 +02:00
025ff465b6 Update README.md
Use a less cringy video to demo talk.wasm lol
2022-11-24 20:09:45 +02:00
abce28ea99 talk.wasm : move to https://whisper.ggerganov.com/talk
This way, we can share the same models across different WASM examples
and not have to download them for each page
2022-11-24 18:24:06 +02:00
454b91de16 main : fix dangling pointer when using stdin for input (#65) 2022-11-24 17:53:51 +02:00
d7024cf9dc main, stream : remove --verbose flag (#178) 2022-11-24 17:52:04 +02:00
37422ed733 talk.wasm : add audio pre-processing + bump memory 2022-11-24 00:34:00 +02:00
be3b720f96 talk.wasm : refactoring + update README.md 2022-11-24 00:08:57 +02:00
49706a658a minor : update a few prints + fix buttons in whisper.wasm 2022-11-23 17:19:21 +02:00
e5dcdabbb8 unicode : fix character replacement (thanks to @tamo) 2022-11-23 08:24:29 +02:00
dad109c3f1 close #109 : add fetching of the model over HTTP (whisper.wasm) 2022-11-22 22:48:56 +02:00
326573de9a talk.wasm : final touches 2022-11-22 22:22:17 +02:00
9aea96f774 talk.wasm : polishing + adding many AI personalities 2022-11-22 20:10:20 +02:00
385236d1d3 stream : "-kc" now enables context keeping from previous segment (#90)
By default, the context keeping is disabled
2022-11-22 18:21:15 +02:00
63ae03b8e0 Prompt previous tokens for streaming (#163)
* feat: prompt previous tokens for streaming

I used a vector pointer instead of vector itself because it gave weird errors, and why not

* convert vector to use with C api

* feat: remove old refs, check for prompt size

* feat: use better way of getting the pointer
2022-11-22 18:10:35 +02:00
78116f8eda talk.wasm : update README.md 2022-11-21 22:42:29 +02:00
a4dfbeecf9 talk.wasm : GPT-2 meets Whisper in WebAssembly (#155)
* talk : initial real-time transcription in the browser

* talk : polishing the UI

* talk : ready for beta testing

* talk.wasm : rename example
2022-11-21 22:20:42 +02:00
f2df9bd768 stream : add "max_tokens" cli arg
Controls the max tokens per segment for the stream example
2022-11-20 21:22:41 +02:00
fb8d77f760 stream : add "audio_ctx" parameter
Used to override the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.

The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
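The figures in the "audio_ctx" note above can be checked with a little arithmetic. This is a minimal sketch, assuming the full Whisper encoder context is 1500 positions covering a 30 s window (a value from the Whisper model architecture, not stated in this log). The helper name is hypothetical and is not part of whisper.cpp.

```python
# Assumption: the full encoder context is 1500 positions for a 30 s audio window.
FULL_AUDIO_CTX = 1500
FULL_WINDOW_SEC = 30.0


def audio_seconds_covered(audio_ctx: int) -> float:
    """Seconds of audio covered by a reduced encoder context (hypothetical helper)."""
    if not 0 < audio_ctx <= FULL_AUDIO_CTX:
        raise ValueError("audio_ctx must be between 1 and 1500")
    return FULL_WINDOW_SEC * audio_ctx / FULL_AUDIO_CTX


covered = audio_seconds_covered(512)
reduction = FULL_AUDIO_CTX / 512  # how much smaller the encoder input becomes
print(f"audio_ctx=512 covers about {covered:.2f} s of audio")
print(f"encoder context is about {reduction:.2f}x smaller")
```

The context reduction of roughly 2.9x is in line with the "about 3 times faster" and "about 10s of audio" figures quoted in the commit message, although the actual speed-up depends on the model and hardware.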
62b5ff875c stream : add "max_tokens" parameter
Used to limit the number of tokens in a segment.
Useful for combating word repetition when using a partial encoder context
2022-11-20 21:22:41 +02:00
d351771a4b stream : add "single_segment" option
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
c058aaf22e stream : partial encoder experiments 2022-11-20 21:22:41 +02:00
83c742f1a7 whisper : add option to speed up the audio tempo by x2
Uses a Phase Vocoder to speed up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech
it still seems to be very good.

I think this can find application in real-time transcription, i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
7519eabf65 Adds support for stdin wav input 2022-11-09 20:37:23 +02:00
c30bffc8a5 ref #22 : add "duration" option
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
c71363f14c examples : add simple script for generating Karaoke video 2022-11-06 09:22:50 +02:00
d42cf6d0df Update README.md 2022-11-04 22:26:08 +02:00
ef47d77492 main : fix generated bash script 2022-11-04 18:30:38 +02:00
d5afebd37c whisper : token-level timestamp refactoring (#49, #120)
This turned out pretty well overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitle types. This
means that you can now specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters.
2022-11-02 21:45:54 +02:00
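The idea behind limiting line length with token-level timestamps can be illustrated as follows. This is a hedged sketch only, not the algorithm in whisper.cpp: it assumes each token carries its own text and start/end time, and it greedily starts a new subtitle line whenever adding the next token would exceed the limit. The `Token` layout and the function name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical token layout: (text, start_sec, end_sec).
Token = Tuple[str, float, float]


def split_by_max_len(tokens: List[Token], max_len: int) -> List[Tuple[float, float, str]]:
    """Greedily group tokens into (start, end, text) lines of at most max_len characters.

    Length is measured on the raw token text, including leading spaces.
    A single token longer than max_len still gets a line of its own.
    """
    lines = []
    text, start, end = "", 0.0, 0.0
    for tok_text, t0, t1 in tokens:
        # Close the current line when this token would push it past the limit.
        if text and len(text) + len(tok_text) > max_len:
            lines.append((start, end, text.strip()))
            text = ""
        if not text:
            start = t0  # a new line starts at its first token's start time
        text += tok_text
        end = t1
    if text.strip():
        lines.append((start, end, text.strip()))
    return lines


example = [(" And", 0.0, 0.3), (" so", 0.3, 0.5), (" my", 0.5, 0.7),
           (" fellow", 0.7, 1.1), (" Americans", 1.1, 1.8)]
for t0, t1, line in split_by_max_len(example, max_len=10):
    print(f"[{t0:.1f} -> {t1:.1f}] {line}")
```

Because each line inherits the start time of its first token and the end time of its last, the same grouping can feed any subtitle format that needs per-line timestamps.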