407 Commits

Author SHA1 Message Date
Georgi Gerganov
3c390ffe38
stream.wasm : add web-based real-time transcription (#112) 2022-11-25 23:57:46 +02:00
Georgi Gerganov
be16dfa038
whisper.wasm : do not block page while processing (close #86) 2022-11-25 23:07:42 +02:00
Georgi Gerganov
0f619b52ce
main : add stereo-channel-based diarization (#64)
Not tested - I don't have stereo dialog audio
2022-11-25 22:08:58 +02:00
Georgi Gerganov
1246dd023e
command : add demonstration video 2022-11-25 20:23:58 +02:00
Georgi Gerganov
0be27bbd92
command : fix build + fix README + add bold printing 2022-11-25 19:53:50 +02:00
Georgi Gerganov
bc88eb13c6
examples : add "command" tool (#171) 2022-11-25 19:36:57 +02:00
Georgi Gerganov
b8ce25dec1
refactoring : more readable code 2022-11-25 19:28:04 +02:00
vicalloy
fd113687aa correct model name display on running samples 2022-11-25 07:17:02 +02:00
Georgi Gerganov
e4805d9601
wasm : refactor wasm example + reuse fetch mechanism 2022-11-24 23:13:26 +02:00
Georgi Gerganov
ff36415a86
talk.wasm : update video link + some minor fixes 2022-11-24 20:15:24 +02:00
Georgi Gerganov
025ff465b6
Update README.md
Use a less cringy video to demo talk.wasm lol
2022-11-24 20:09:45 +02:00
Georgi Gerganov
2c0501b38a
Update README.md 2022-11-24 20:06:51 +02:00
Georgi Gerganov
abce28ea99
talk.wasm : move to https://whisper.ggerganov.com/talk
This way, we can share the same models across different WASM examples
and not have to download them for each page
2022-11-24 18:24:06 +02:00
Georgi Gerganov
a2ecd54455
models : add instructions for using HF fine-tuned models 2022-11-24 17:54:41 +02:00
Georgi Gerganov
128aaadb93
whisper : improve printfs 2022-11-24 17:54:16 +02:00
Georgi Gerganov
454b91de16
main : fix dangling pointer when using stdin for input (#65) 2022-11-24 17:53:51 +02:00
Georgi Gerganov
d7024cf9dc
main, stream : remove --verbose flag (#178) 2022-11-24 17:52:04 +02:00
Georgi Gerganov
37422ed733
talk.wasm : add audio pre-processing + bump memory 2022-11-24 00:34:00 +02:00
Georgi Gerganov
be3b720f96
talk.wasm : refactoring + update README.md 2022-11-24 00:08:57 +02:00
Georgi Gerganov
00f46dbc1d
models : add usage comments to the HF convert script (#157) 2022-11-23 23:22:40 +02:00
Georgi Gerganov
5698bddbc9
models : fix HF fine-tuned model conversion script (#157)
It works now
2022-11-23 23:14:11 +02:00
Georgi Gerganov
388e9f79ad
ggml : fix the fix 2022-11-23 22:40:06 +02:00
Georgi Gerganov
35cd29ce1f
ggml : fix cross-compile Linux -> Windows with mingw (#168) 2022-11-23 22:28:41 +02:00
Georgi Gerganov
a156a358ca
Revert "update README.md"
This reverts commit 6a84147113.
2022-11-23 22:16:50 +02:00
katsu560
6a84147113 update README.md 2022-11-23 22:16:33 +02:00
katsu560
804f36aa2c ggml: change inline ggml_fp16_to_fp32, ggml_fp16_t ggml_fp32_to_fp16 2022-11-23 22:16:33 +02:00
katsu560
4b2f51b479 add gprof option 2022-11-23 22:16:33 +02:00
katsu560
800ae5b808 fix AVX,AVX2,FMA,F16C detection on Linux and add flags for OpenBLAS 2022-11-23 22:16:33 +02:00
katsu560
83456076f0 add AVX support 2022-11-23 22:16:33 +02:00
Tamotsu Takahashi
3df6c14fca Build with OpenBLAS and SDL2 on Windows 2022-11-23 22:09:54 +02:00
Georgi Gerganov
d64d6ca3fd
models : minor changes to the HF convert script (#157) 2022-11-23 22:07:20 +02:00
Georgi Gerganov
93482d0373
models : add "convert-h5-to-ggml.py" script (#157)
Converts transformers models to ggml.
Although the conversion completes successfully, the converted model does not work yet.
The cause is unknown
2022-11-23 17:19:22 +02:00
Georgi Gerganov
49706a658a
minor : update a few prints + fix buttons in whisper.wasm 2022-11-23 17:19:21 +02:00
Georgi Gerganov
363a2dadec
Update README.md 2022-11-23 09:53:55 +02:00
Georgi Gerganov
623a486056
Update README.md 2022-11-23 09:52:36 +02:00
Tamotsu Takahashi
2f596f5b33 Find libopenblas.dll.a on Windows
The "lib" prefix is needed on Windows.

With this change, you can build whisper.cpp with OpenBLAS's prebuilt DLL.
1. extract a zip from https://github.com/xianyi/OpenBLAS/releases
2. copy the headers in (openblas)/include to the root directory of whisper.cpp
3. invoke cmake with -DCMAKE_LIBRARY_PATH=(openblas)\lib -DWHISPER_SUPPORT_OPENBLAS=ON
4. after msbuild, copy (openblas)/bin/libopenblas.dll to the same directory as whisper.dll

https://github.com/ggerganov/whisper.cpp/issues/89#issuecomment-1324391258
2022-11-23 08:26:45 +02:00
Georgi Gerganov
e5dcdabbb8
unicode : fix character replacement (thanks to @tamo) 2022-11-23 08:24:29 +02:00
Georgi Gerganov
dad109c3f1
close #109 : add fetching of the model over HTTP (whisper.wasm) 2022-11-22 22:48:56 +02:00
Georgi Gerganov
326573de9a
talk.wasm : final touches 2022-11-22 22:22:17 +02:00
Georgi Gerganov
9aea96f774
talk.wasm : polishing + adding many AI personalities 2022-11-22 20:10:20 +02:00
Georgi Gerganov
385236d1d3
stream : "-kc" now enables context keeping from previous segment (#90)
By default, the context keeping is disabled
2022-11-22 18:21:15 +02:00
M. Eren Akbiyik
63ae03b8e0
Prompt previous tokens for streaming (#163)
* feat: prompt previous tokens for streaming

I used a vector pointer instead of vector itself because it gave weird errors, and why not

* convert vector to use with C api

* feat: remove old refs, check for prompt size

* feat: use better way of getting the pointer
2022-11-22 18:10:35 +02:00
Georgi Gerganov
78116f8eda
talk.wasm : update README.md 2022-11-21 22:42:29 +02:00
Georgi Gerganov
a4dfbeecf9
talk.wasm : GPT-2 meets Whisper in WebAssembly (#155)
* talk : initial real-time transcription in the browser

* talk : polishing the UI

* talk : ready for beta testing

* talk.wasm : rename example
2022-11-21 22:20:42 +02:00
Georgi Gerganov
2e311a2917
Update README.md 2022-11-21 18:52:20 +02:00
Georgi Gerganov
2065572a11 ggml : fix Windows build 2022-11-20 22:47:03 +02:00
Georgi Gerganov
5c2176e314 ci : add Windows build 2022-11-20 22:47:03 +02:00
Georgi Gerganov
f2df9bd768 stream : add "max_tokens" cli arg
Controls the max tokens per segment for the stream example
2022-11-20 21:22:41 +02:00
Georgi Gerganov
fb8d77f760 stream : add "audio_ctx" parameter
Used to override the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.

The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
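
The "about 10s instead of 30s" figure in the message above follows from simple proportion. The sketch below assumes the full encoder context is 1500 positions covering a 30 s window; that value comes from the Whisper model dimensions and is not stated in the commit itself. The function names are illustrative only. The context ratio only roughly tracks the reported speedup, since run time does not scale purely linearly with context size.

```python
# Assumption: the full Whisper encoder context is 1500 positions per 30 s window.
FULL_AUDIO_CTX = 1500
FULL_WINDOW_SECONDS = 30.0


def seconds_covered(audio_ctx: int) -> float:
    """Approximate seconds of audio covered by a reduced encoder context."""
    if not 0 < audio_ctx <= FULL_AUDIO_CTX:
        raise ValueError("audio_ctx must be in the range 1..1500")
    return FULL_WINDOW_SECONDS * audio_ctx / FULL_AUDIO_CTX


def context_ratio(audio_ctx: int) -> float:
    """How many times smaller the reduced context is than the full one."""
    if not 0 < audio_ctx <= FULL_AUDIO_CTX:
        raise ValueError("audio_ctx must be in the range 1..1500")
    return FULL_AUDIO_CTX / audio_ctx


if __name__ == "__main__":
    print(seconds_covered(512))  # 10.24
    print(context_ratio(512))    # 2.9296875
```

With "audio_ctx = 512" the covered window is about 10.24 s and the context is about 2.93 times smaller, which is consistent with the "about 3 times faster" estimate in the commit message.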
Georgi Gerganov
62b5ff875c stream : add "max_tokens" parameter
Used to limit the number of tokens in a segment.
Useful for combating word repetition when using partial encoder context
2022-11-20 21:22:41 +02:00