# talk-llama
Talk with an LLaMA AI in your terminal
Latest perf as of 2 Nov 2023 using Whisper Medium + LLaMA v2 13B Q8_0 on M2 Ultra:
https://github.com/ggerganov/whisper.cpp/assets/1991296/d97a3788-bf2a-4756-9a43-60c6b391649e
Previous demo running on CPUs
## Building
The `talk-llama` tool depends on the SDL2 library to capture audio from the microphone. You can build it like this:
```bash
# Install SDL2 on Linux
sudo apt-get install libsdl2-dev

# Install SDL2 on Mac OS
brew install sdl2

# Build the "talk-llama" executable
make talk-llama

# Run it
./talk-llama -mw ./models/ggml-small.en.bin -ml ../llama.cpp/models/llama-13b/ggml-model-q4_0.gguf -p "Georgi" -t 8
```
- The `-mw` argument specifies the Whisper model that you would like to use. `base` or `small` is recommended for a real-time experience (see the example after this list for one way to download a model).
- The `-ml` argument specifies the LLaMA model that you would like to use. Read the instructions in https://github.com/ggerganov/llama.cpp for information about how to obtain a `ggml`-compatible LLaMA model.
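For reference, whisper.cpp ships a download script for the Whisper models. A minimal sketch, assuming you run it from the whisper.cpp repository root:

```bash
# Download the small English-only Whisper model into ./models/
./models/download-ggml-model.sh small.en
```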
## Session
The `talk-llama` tool supports session management to enable more coherent and continuous conversations. By maintaining context from previous interactions, it can better understand and respond to user requests in a more natural way.

To enable session support, use the `--session FILE` command line option when running the program. The `talk-llama` model state will be saved to the specified file after each interaction. If the file does not exist, it will be created. If the file exists, the model state will be loaded from it, allowing you to resume a previous session.
This feature is especially helpful for maintaining context in long conversations or when interacting with the AI assistant across multiple sessions. It ensures that the assistant remembers the previous interactions and can provide more relevant and contextual responses.
Example usage:

```bash
./talk-llama --session ./my-session-file -mw ./models/ggml-small.en.bin -ml ../llama.cpp/models/llama-13b/ggml-model-q4_0.gguf -p "Georgi" -t 8
```
## TTS
For the best experience, this example needs a TTS tool to convert the generated text responses to voice. You can use any TTS engine you would like: simply edit the `speak` script to your needs. By default, it is configured to use macOS's `say` or Windows' SpeechSynthesizer, but you can use whatever you wish.
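As an illustration, a minimal custom `speak` script might look like the sketch below. It follows the calling convention this example uses: the script receives a voice id and the path to a file containing the text to speak (passing the text as a file is safer and more portable than quoting it on the command line). The `espeak` fallback is an assumption for Linux systems, not part of the shipped script:

```bash
#!/bin/bash
# Usage: speak <voice_id> <textfile>
# Minimal sketch of a custom TTS hook for talk-llama.

voice_id=$1   # passed by talk-llama; unused in this sketch
textfile=$2   # file containing the text to speak

if command -v say >/dev/null 2>&1; then
    # macOS: read the text file aloud with the built-in TTS
    say -f "$textfile"
elif command -v espeak >/dev/null 2>&1; then
    # assumed Linux fallback; swap in any TTS engine you prefer
    espeak -f "$textfile"
fi
```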
## Discussion
If you have any feedback, please let "us" know in the following discussion: https://github.com/ggerganov/whisper.cpp/discussions/672?converting=1