mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-24 14:46:39 +00:00


whisper.nvim

Speech-to-text in Neovim

The transcription is performed on the CPU and no data leaves your computer. Works best on Apple Silicon devices.

https://user-images.githubusercontent.com/1991296/198382564-784e9663-2037-4d04-99b8-f39136929b7e.mp4

Usage

Press Ctrl-G in INSERT, VISUAL or NORMAL mode and start speaking
When you are done, press Ctrl-C to end the transcription and insert the transcribed text at the cursor

Installation

Note: the setup is currently a bit tedious and hacky, but I hope it will improve with time

Clone this repo and build the stream tool:

```
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make stream
```

Download the base.en Whisper model (140 MB):
```
./models/download-ggml-model.sh base.en
```
Place the whisper.nvim script somewhere in your PATH and give it execute permissions:
```
cp examples/whisper.nvim/whisper.nvim ~/bin/
chmod u+x ~/bin/whisper.nvim
```
Fine-tune the stream command that the script runs to match your preferences and your machine:
```
./stream -t 8 -m models/ggml-base.en.bin --step 350 --length 10000 -f /tmp/whisper.nvim 2> /dev/null
```
On slower machines, try increasing the --step parameter.
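As an illustration of that kind of tuning, here is a minimal sketch that makes the thread count and the step overridable without editing the command. The names `stream_cmd`, `WHISPER_THREADS` and `WHISPER_STEP_MS` are hypothetical (whisper.cpp does not read them), and the defaults of 4 threads and a 700 ms step are only assumed starting points for a slower machine, not values recommended by the project.

```shell
#!/bin/sh
# Hypothetical helper: print a stream command line with tunable parameters.
# WHISPER_THREADS and WHISPER_STEP_MS are names invented for this sketch.
stream_cmd() {
    threads="${WHISPER_THREADS:-4}"    # assumed default for a smaller CPU
    step_ms="${WHISPER_STEP_MS:-700}"  # assumed default; larger than the 350 above
    # Print the command instead of running it, so it can be inspected first.
    printf '%s\n' "./stream -t $threads -m models/ggml-base.en.bin --step $step_ms --length 10000 -f /tmp/whisper.nvim"
}
```

Calling `stream_cmd` prints the resulting command line. Once the values look right, they can be copied into the script in place of the original `-t` and `--step` arguments.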
Add the following shortcuts to your ~/.config/nvim/init.vim:
```
inoremap <C-G>  <C-O>:!whisper.nvim<CR><C-O>:let @a = system("cat /tmp/whisper.nvim \| tail -n 1 \| xargs -0 \| tr -d '\\n' \| sed -e 's/^[[:space:]]*//'")<CR><C-R>a
nnoremap <C-G>       :!whisper.nvim<CR>:let @a = system("cat /tmp/whisper.nvim \| tail -n 1 \| xargs -0 \| tr -d '\\n' \| sed -e 's/^[[:space:]]*//'")<CR>"ap
vnoremap <C-G> c<C-O>:!whisper.nvim<CR><C-O>:let @a = system("cat /tmp/whisper.nvim \| tail -n 1 \| xargs -0 \| tr -d '\\n' \| sed -e 's/^[[:space:]]*//'")<CR><C-R>a
```
Explanation: pressing Ctrl-G runs the whisper.nvim script, which in turn calls the stream binary to transcribe your speech from the microphone. The transcription results are continuously written to /tmp/whisper.nvim. After you stop the program with Ctrl-C, the Vim command grabs the last line of /tmp/whisper.nvim and inserts it at the cursor.
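The text clean-up that each mapping performs can also be read in isolation. The following is a minimal sketch, not part of the repository: `last_transcript` is a hypothetical helper name, and it assumes the transcription file holds one result per line with the most recent result last. It mirrors the `tail`, `tr` and `sed` stages of the mappings (the `xargs -0` stage is left out for simplicity): take the last line, drop the newline, and trim leading whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not part of whisper.cpp): print the most recent
# transcription line without its newline and without leading whitespace.
last_transcript() {
    # $1: path to the transcription file; defaults to the path used above
    file="${1:-/tmp/whisper.nvim}"
    tail -n 1 "$file" | tr -d '\n' | sed -e 's/^[[:space:]]*//'
}
```

A helper like this could also shorten the three mappings, since each of them repeats the same pipeline.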

There is probably a much smarter way to achieve all this, but this is what I could hack together in an hour. Suggestions for improving it are welcome.

You are now ready to use speech-to-text in Neovim!

TODO

There are a lot of ways to improve this idea and I don't have much experience with Vim plugin programming, so contributions are welcome!

Wrap this into a plugin

It would be great to make a standalone plugin out of this that can be installed with vim-plug or similar
Simplify the init.vim mappings (maybe factor out the common call into a separate function)
Add Copilot/GPT-3 integration

This is probably a long shot, but it would be very cool to be able to select some code, hit Ctrl-G and say something like:

"refactor this using stl containers"

or

"optimize by sorting the data first"

The plugin would then send an appropriate query, built from the selected text and the surrounding code context, to Copilot or GPT-3 and return the result.

Here is a proof-of-concept:

https://user-images.githubusercontent.com/1991296/199078847-0278fcde-5667-4748-ba0d-7d55381d6047.mp4

https://user-images.githubusercontent.com/1991296/200067939-f98d2ac2-7519-438a-85f9-79db0841ba4f.mp4

For an explanation of how this works, see: https://twitter.com/ggerganov/status/1587168771789258756

Discussion

If you find this idea interesting, you can join the discussion here: https://github.com/ggerganov/whisper.cpp/discussions/108