mirror of
https://github.com/mudler/LocalAI.git
synced 2025-01-01 10:26:39 +00:00
8b6e601405
Transformers-MusicGen --------- Signed-off-by: Dave <dave@gray101.com>
77 lines
2.3 KiB
Markdown
77 lines
2.3 KiB
Markdown
|
|
+++
|
|
disableToc = false
|
|
title = "🗣 Text to audio (TTS)"
|
|
weight = 2
|
|
+++
|
|
|
|
The `/tts` endpoint can be used to generate speech from text.
|
|
|
|
Input: `input`, `model`
|
|
|
|
For example, to generate an audio file, you can send a POST request to the `/tts` endpoint with the instruction as the request body:
|
|
|
|
```bash
|
|
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
|
|
"input": "Hello world",
|
|
"model": "tts"
|
|
}'
|
|
```
|
|
|
|
Returns an `audio/wav` file.
|
|
|
|
#### Setup
|
|
|
|
LocalAI supports [bark]({{%relref "model-compatibility/bark" %}}) , `piper` and `vall-e-x`:
|
|
|
|
{{% notice note %}}
|
|
|
|
The `piper` backend is used for `onnx` models and requires the modules to be downloaded first.
|
|
|
|
To install the `piper` audio models manually:
|
|
|
|
- Download Voices from https://github.com/rhasspy/piper/releases/tag/v0.0.2
|
|
- Extract the `.tar.tgz` files (.onnx,.json) inside `models`
|
|
- Run the following command to test the model is working
|
|
|
|
{{% /notice %}}
|
|
|
|
To use the tts endpoint, run the following command. You can specify a backend with the `backend` parameter. For example, to use the `piper` backend:
|
|
```bash
|
|
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
|
|
"model":"it-riccardo_fasol-x-low.onnx",
|
|
"backend": "piper",
|
|
"input": "Ciao, sono Ettore"
|
|
}' | aplay
|
|
```
|
|
|
|
Note:
|
|
|
|
- `aplay` is a Linux command. You can use other tools to play the audio file.
|
|
- The model name is the filename with the extension.
|
|
- The model name is case sensitive.
|
|
- LocalAI must be compiled with the `GO_TAGS=tts` flag.
|
|
|
|
LocalAI also has experimental support for `transformers-musicgen` for the generation of short musical compositions. Currently, this is implemented via the same requests used for text to speech:
|
|
|
|
```
|
|
curl --request POST \
|
|
--url http://localhost:8080/tts \
|
|
--header 'Content-Type: application/json' \
|
|
--data '{
|
|
"backend": "transformers-musicgen",
|
|
"model": "facebook/musicgen-medium",
|
|
"input": "Cello Rave"
|
|
}' | aplay```
|
|
|
|
Future versions of LocalAI will expose additional control over audio generation beyond the text prompt.
|
|
|
|
#### Configuration
|
|
|
|
Audio models can be configured via `YAML` files. This allows to configure specific setting for each backend. For instance, backends might be specifying a voice or supports voice cloning which must be specified in the configuration file.
|
|
|
|
```yaml
|
|
name: tts
|
|
backend: vall-e-x
|
|
parameters: ...
|
|
``` |