+++
disableToc = false
title = "🗣 Text to audio (TTS)"
weight = 11
url = "/features/text-to-audio/"
+++

## API Compatibility

The LocalAI TTS API is compatible with the OpenAI TTS API and the Elevenlabs API.
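
In practice this means OpenAI-style clients can be pointed at LocalAI directly. As a minimal sketch, assuming the OpenAI-compatible route is exposed at `/v1/audio/speech` and that a TTS model named `tts-1` is configured on your instance (both names depend on your setup):

```bash
# OpenAI-style text-to-speech request (the model name is an assumption - use one you have configured)
curl http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello world"
  }' \
  -o speech.wav
```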

## LocalAI API

The `/tts` endpoint can also be used to generate speech from text.

### Usage

Input: `input`, `model`

For example, to generate an audio file, you can send a POST request to the `/tts` endpoint with the text to synthesize as the request body:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "input": "Hello world",
  "model": "tts"
}'
```

Returns an `audio/wav` file.
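
Most backend examples below pipe the result to `aplay` for immediate playback; if you only want the file, you can write the response body to disk instead:

```bash
# Save the generated speech to a local file
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "input": "Hello world",
  "model": "tts"
}' -o hello.wav
```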

## Backends

### 🐸 Coqui

Required: Don't use LocalAI images ending with the `-core` tag. Python dependencies are required in order to use this backend.

Coqui works without any configuration. To test it, you can run the following curl command:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "backend": "coqui",
  "model": "tts_models/en/ljspeech/glow-tts",
  "input": "Hello, this is a test!"
}'
```

You can use the environment variable `COQUI_LANGUAGE` to set the language used by the `coqui` backend.
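
For example, a sketch assuming you start LocalAI with Docker (the image reference, port, and volume mount are illustrative; reuse your usual start command and only add the `-e` flag):

```bash
# Make the coqui backend default to French
docker run -e COQUI_LANGUAGE=fr \
  -p 8080:8080 -v $PWD/models:/build/models \
  localai/localai:latest
```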

You can also use config files to configure TTS models (see the section below on how to use config files).

### Bark

Bark allows you to generate audio from text prompts.

This is an extra backend: it is already available in the container, and there is nothing to do for the setup.

#### Model setup

There is nothing to be done for the model setup. You can already start to use bark. The models will be downloaded the first time you use the backend.

#### Usage

Use the `tts` endpoint by specifying the `bark` backend:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "backend": "bark",
  "input": "Hello!"
}' | aplay
```

To specify a voice from the [bark voice presets](https://github.com/suno-ai/bark#-voice-presets) ([overview](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c)), use the `model` parameter:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "backend": "bark",
  "input": "Hello!",
  "model": "v2/en_speaker_4"
}' | aplay
```

### Piper

The piper audio models have to be installed manually: download the voice files (the `.onnx` models) and place them in your models directory.

To use the `tts` endpoint, run the following command. You can specify a backend with the `backend` parameter. For example, to use the `piper` backend:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "model": "it-riccardo_fasol-x-low.onnx",
  "backend": "piper",
  "input": "Ciao, sono Ettore"
}' | aplay
```

Note:

- `aplay` is a Linux command. You can use other tools to play the audio file.
- The model name is the filename with the extension.
- The model name is case sensitive.
- LocalAI must be compiled with the `GO_TAGS=tts` flag.
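
A piper voice can likewise be declared once in a config file, so that requests only need the model name (a sketch following the config-file format described at the end of this page; the `.onnx` file must already be present in your models directory):

```yaml
# models/voice-it-riccardo.yaml - the name is arbitrary
name: voice-it-riccardo
backend: piper
parameters:
  model: it-riccardo_fasol-x-low.onnx
```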

### Transformers-musicgen

LocalAI also has experimental support for `transformers-musicgen` for the generation of short musical compositions. Currently, this is implemented via the same requests used for text to speech:

```bash
curl --request POST \
  --url http://localhost:8080/tts \
  --header 'Content-Type: application/json' \
  --data '{
    "backend": "transformers-musicgen",
    "model": "facebook/musicgen-medium",
    "input": "Cello Rave"
}' | aplay
```

Future versions of LocalAI will expose additional control over audio generation beyond the text prompt.

### Vall-E-X

VALL-E-X is an open source implementation of Microsoft's VALL-E X zero-shot TTS model.

#### Setup

The backend will automatically download the required files in order to run the model.

This is an extra backend: it is already available in the container, and there is nothing to do for the setup. If you are building LocalAI manually, you need to install Vall-E-X first.

#### Usage

Use the `tts` endpoint by specifying the `vall-e-x` backend:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "backend": "vall-e-x",
  "input": "Hello!"
}' | aplay
```

#### Voice cloning

In order to use voice cloning capabilities, you must create a YAML configuration file to set up a model:

```yaml
name: cloned-voice
backend: vall-e-x
parameters:
  model: "cloned-voice"
tts:
  vall-e:
    # The path to the audio file to be cloned
    # relative to the models directory
    # Max 15s
    audio_path: "audio-sample.wav"
```

Then you can specify the model name in the requests:

```bash
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "model": "cloned-voice",
  "input": "Hello!"
}' | aplay
```

### Parler-tts

[parler-tts](https://github.com/huggingface/parler-tts) is supported: it is possible to install and configure the model directly from the gallery.
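
As a sketch, the model can be requested through the gallery API; the gallery id below is an assumption, so check the gallery you have configured for the exact name:

```bash
# Install parler-tts from the configured model gallery (the id is an assumption)
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
  "id": "parler-tts-mini-v0.1"
}'
```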

## Using config files

You can also use a config-file to specify TTS models and their parameters.

In the following example we define a custom config to load the `xtts_v2` model and specify a voice and language.


```yaml
name: xtts_v2
backend: coqui
parameters:
  language: fr
  model: tts_models/multilingual/multi-dataset/xtts_v2

tts:
  voice: Ana Florence
```

With this config, you can now use the following curl command to generate a text-to-speech audio file:

```bash
curl -L http://localhost:8080/tts \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xtts_v2",
    "input": "Bonjour, je suis Ana Florence. Comment puis-je vous aider?"
  }' | aplay
```
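
The language can also be overridden per request: the `/tts` endpoint accepts an optional `language` field for multilingual models such as `xtts_v2`. A sketch, assuming your LocalAI version includes this parameter:

```bash
# Override the configured language for a single request
curl -L http://localhost:8080/tts \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xtts_v2",
    "language": "en",
    "input": "Hello, this is Ana Florence. How can I help you?"
  }' | aplay
```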