whisper.nvim

Speech-to-text in Neovim

The transcription is performed on the CPU and no data leaves your computer. Works best on Apple Silicon devices.

https://user-images.githubusercontent.com/1991296/198382564-784e9663-2037-4d04-99b8-f39136929b7e.mp4

Usage

  • Simply press Ctrl-G in INSERT, VISUAL or NORMAL mode and say something
  • When you are done, press Ctrl-C to end the transcription and insert the transcribed text under the cursor

Installation

Note: this is a bit tedious and hacky at the moment, but I hope it will improve with time.

  • Clone this repo and build the stream tool:

    git clone https://github.com/ggerganov/whisper.cpp
    cd whisper.cpp
    make stream
    
  • Download the base.en Whisper model (140 MB):

    ./models/download-ggml-model.sh base.en
    
  • Place the whisper.nvim script somewhere in your PATH and give it execute permissions:

    cp examples/whisper.nvim/whisper.nvim ~/bin/
    chmod u+x ~/bin/whisper.nvim
    
  • Fine-tune the stream command inside the script to your preferences and machine:

    ./stream -t 8 -m models/ggml-base.en.bin --step 350 --length 10000 -f /tmp/whisper.nvim 2> /dev/null
    

    On slower machines, try increasing the --step parameter (the audio step size in milliseconds).

  • Add the following shortcuts to your ~/.config/nvim/init.vim:

    inoremap <C-G>  <C-O>:!whisper.nvim<CR><C-O>:let @a = system("cat /tmp/whisper.nvim \| tail -n 1 \| xargs -0 \| tr -d '\\n' \| sed -e 's/^[[:space:]]*//'")<CR><C-R>a
    nnoremap <C-G>       :!whisper.nvim<CR>:let @a = system("cat /tmp/whisper.nvim \| tail -n 1 \| xargs -0 \| tr -d '\\n' \| sed -e 's/^[[:space:]]*//'")<CR>"ap
    vnoremap <C-G> c<C-O>:!whisper.nvim<CR><C-O>:let @a = system("cat /tmp/whisper.nvim \| tail -n 1 \| xargs -0 \| tr -d '\\n' \| sed -e 's/^[[:space:]]*//'")<CR><C-R>a
    

    Explanation: pressing Ctrl-G runs the whisper.nvim script, which in turn calls the stream binary to transcribe your speech from the microphone. The transcription results are continuously written to /tmp/whisper.nvim. After you stop the program with Ctrl-C, the Vim command reads the last line of /tmp/whisper.nvim and puts it under the cursor.

    There is probably a much more intelligent way to achieve all this, but this is what I could hack together in an hour. Any suggestions on how to improve this are welcome.
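
One small step in that direction, as a sketch rather than a tested replacement: the text-extraction pipeline from the mappings could live in a shell function, which keeps most of the quoting out of init.vim. The function name last_transcript_line is hypothetical and not part of whisper.cpp. It drops the xargs -0 stage, which should not change the result for ordinary transcript lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of whisper.cpp): print the last transcribed
# line without its trailing newline and without leading whitespace.
# $1: transcript file; defaults to the /tmp/whisper.nvim path used by the mappings.
last_transcript_line() {
    file="${1:-/tmp/whisper.nvim}"
    tail -n 1 "$file" |              # keep only the most recent line
        sed -e 's/^[[:space:]]*//' | # strip leading whitespace
        tr -d '\n'                   # drop the newline so the text inserts inline
}
```

Called with no argument, the function reads /tmp/whisper.nvim. Like the original pipeline, it leaves any trailing whitespace in place.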

You are now ready to use speech-to-text in Neovim!

TODO

There are a lot of ways to improve this idea and I don't have much experience with Vim plugin programming, so contributions are welcome!
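
As one possible direction, sketched under assumptions rather than taken from the repository: the stream parameters from the installation step could be read from environment variables, so they can be tuned without editing the script. The names stream_args, WHISPER_THREADS, WHISPER_MODEL, WHISPER_STEP_MS, WHISPER_LENGTH_MS and WHISPER_OUT are hypothetical. Only the flags -t, -m, --step, --length and -f come from the command shown in the installation steps.

```shell
#!/bin/sh
# Hypothetical sketch: build the argument list for ./stream from environment
# variables, falling back to the values used in the installation step.
# The WHISPER_* names are assumptions, not something whisper.cpp defines.
stream_args() {
    threads="${WHISPER_THREADS:-8}"          # value passed to -t
    model="${WHISPER_MODEL:-base.en}"        # model name, as downloaded during installation
    step_ms="${WHISPER_STEP_MS:-350}"        # --step; raise this on slower machines
    length_ms="${WHISPER_LENGTH_MS:-10000}"  # --length
    out="${WHISPER_OUT:-/tmp/whisper.nvim}"  # file the Neovim mappings read from
    printf '%s\n' "-t ${threads} -m models/ggml-${model}.bin --step ${step_ms} --length ${length_ms} -f ${out}"
}

# Intended use, from the whisper.cpp directory. The unquoted expansion relies
# on word splitting, so paths containing spaces would need extra care:
#   ./stream $(stream_args) 2> /dev/null
```

With this in place, a slower machine could be accommodated by setting, for example, WHISPER_STEP_MS=700 before the script runs.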

Discussion

If you find this idea interesting, you can join the discussion here: https://github.com/ggerganov/whisper.cpp/discussions/108