chore: update Image generation docs and examples (#4841)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-06-05 16:51:36 +00:00 · 2025-02-17 16:51:06 +01:00
commit f3ae94ca70
parent 09c9f67a02
4 changed files with 50 additions and 141 deletions
--- a/docs/content/docs/features/image-generation.md
+++ b/docs/content/docs/features/image-generation.md
@@ -38,98 +38,40 @@ curl http://localhost:8080/v1/images/generations -H "Content-Type: application/j

 ## Backends

-### stablediffusion-cpp
+### stablediffusion-ggml

-| mode=0                                                                                                                | mode=1 (winograd/sgemm)                                                                                                                |
-|------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
-| ![test](https://github.com/go-skynet/LocalAI/assets/2420543/7145bdee-4134-45bb-84d4-f11cb08a5638)                      | ![b643343452981](https://github.com/go-skynet/LocalAI/assets/2420543/abf14de1-4f50-4715-aaa4-411d703a942a)          |
-| ![b6441997879](https://github.com/go-skynet/LocalAI/assets/2420543/d50af51c-51b7-4f39-b6c2-bf04c403894c)              | ![winograd2](https://github.com/go-skynet/LocalAI/assets/2420543/1935a69a-ecce-4afc-a099-1ac28cb649b3)                |
-| ![winograd](https://github.com/go-skynet/LocalAI/assets/2420543/1979a8c4-a70d-4602-95ed-642f382f6c6a)                | ![winograd3](https://github.com/go-skynet/LocalAI/assets/2420543/e6d184d4-5002-408f-b564-163986e1bdfb)                |
+This backend is based on [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp). Every model supported by stable-diffusion.cpp is also supported by LocalAI.

-Note: image generator supports images up to 512x512. You can use other tools however to upscale the image, for instance: https://github.com/upscayl/upscayl.

 #### Setup

-Note: In order to use the `images/generation` endpoint with the `stablediffusion` C++ backend, you need to build LocalAI with `GO_TAGS=stablediffusion`. If you are using the container images, it is already enabled.
-
-{{< tabs >}}
-{{% tab name="Prepare the model in runtime" %}}
-
-While the API is running, you can install the model by using the `/models/apply` endpoint and point it to the `stablediffusion` model in the [models-gallery](https://github.com/go-skynet/model-gallery#image-generation-stable-diffusion):
+The gallery already includes several models that are ready to install and run with this backend. For example, you can run Flux by searching for `flux.1-dev-ggml` in the model gallery, or by starting LocalAI with `run`:

 ```bash
-curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
-  "url": "github:go-skynet/model-gallery/stablediffusion.yaml"
-}'
+local-ai run flux.1-dev-ggml
 ```

-{{% /tab %}}
-{{% tab name="Automatically prepare the model before start" %}}
-
-You can set the `PRELOAD_MODELS` environment variable:
-
-```bash
-PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]
-```
-
-or as arg:
-
-```bash
-local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
-```
-
-or in a YAML file:
-
-```bash
-local-ai --preload-models-config "/path/to/yaml"
-```
-
-YAML:
-
-```yaml
- url: github:go-skynet/model-gallery/stablediffusion.yaml
-```
-
-{{% /tab %}}
-{{% tab name="Install manually" %}}
+To use a custom model, you can follow these steps:

 1. Create a model file `stablediffusion.yaml` in the models folder:

 ```yaml
 name: stablediffusion
-backend: stablediffusion
+backend: stablediffusion-ggml
 parameters:
-  model: stablediffusion_assets
+  model: gguf_model.gguf
+step: 25
+cfg_scale: 4.5
+options:
+- "clip_l_path:clip_l.safetensors"
+- "clip_g_path:clip_g.safetensors"
+- "t5xxl_path:t5xxl-Q5_0.gguf"
+- "sampler:euler"
 ```

-2. Create a `stablediffusion_assets` directory inside your `models` directory
-3. Download the ncnn assets from https://github.com/EdVince/Stable-Diffusion-NCNN#out-of-box and place them in `stablediffusion_assets`.
+2. Download the required assets to the `models` directory
+3. Start LocalAI

-The models directory should look like the following:
-
-```bash
-models
-├── stablediffusion_assets
-│   ├── AutoencoderKL-256-256-fp16-opt.param
-│   ├── AutoencoderKL-512-512-fp16-opt.param
-│   ├── AutoencoderKL-base-fp16.param
-│   ├── AutoencoderKL-encoder-512-512-fp16.bin
-│   ├── AutoencoderKL-fp16.bin
-│   ├── FrozenCLIPEmbedder-fp16.bin
-│   ├── FrozenCLIPEmbedder-fp16.param
-│   ├── log_sigmas.bin
-│   ├── tmp-AutoencoderKL-encoder-256-256-fp16.param
-│   ├── UNetModel-256-256-MHA-fp16-opt.param
-│   ├── UNetModel-512-512-MHA-fp16-opt.param
-│   ├── UNetModel-base-MHA-fp16.param
-│   ├── UNetModel-MHA-fp16.bin
-│   └── vocab.txt
-└── stablediffusion.yaml
-```
-
-{{% /tab %}}
-
-{{< /tabs >}}

 ### Diffusers

@@ -213,6 +155,9 @@ The following parameters are available in the configuration file:
 | `cfg_scale` | Configuration scale | `8` |
 | `clip_skip` | Clip skip | None |
 | `pipeline_type` | Pipeline type | `AutoPipelineForText2Image` |
+| `lora_adapters` | A list of LoRA adapters (file names relative to the model directory) to apply | None |
+| `lora_scales` | A list of LoRA scales (floats) to apply | None |
+

 There are available several types of schedulers:

@@ -246,6 +191,36 @@ Pipelines types available:
 | `StableDiffusionDepth2ImgPipeline` | Stable diffusion depth to image pipeline |
 | `DiffusionPipeline` | Diffusion pipeline |
 | `StableDiffusionXLPipeline` | Stable diffusion XL pipeline |
+| `StableVideoDiffusionPipeline` | Stable video diffusion pipeline |
+| `AutoPipelineForText2Image` | Automatic detection pipeline for text to image |
+| `VideoDiffusionPipeline` | Video diffusion pipeline |
+| `StableDiffusion3Pipeline` | Stable diffusion 3 pipeline |
+| `FluxPipeline` | Flux pipeline |
+| `FluxTransformer2DModel` | Flux transformer 2D model |
+| `SanaPipeline` | Sana pipeline |
+
+##### Advanced: Additional parameters
+
+Additional arbitrary parameters can be specified in the `options` field as key/value pairs separated by `:`:
+
+```yaml
+name: animagine-xl
+# ...
+options:
+- "cfg_scale:6"
+```
+
+**Note**: There is no complete parameter list. Any parameter can be specified, and it is passed directly to the pipeline as an argument. Different pipelines/implementations support different parameters.
+
+The example above results in the following Python call when generating images:
+
+```python
+pipe(
+    prompt="A cute baby sea otter", # Options passed via API
+    size="256x256", # Options passed via API
+    cfg_scale=6 # Additional parameter passed via configuration file
+)
+```

 #### Usage

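Note on the `options` entries added in the hunks above: each entry is a single string in which the key and value are separated by `:`. The following is a minimal sketch of how such strings could be turned into keyword arguments. It is not LocalAI's actual implementation. The helper names `coerce` and `parse_options` are hypothetical, and the number coercion rule is an assumption made for illustration.

```python
def coerce(raw):
    """Try int, then float; otherwise keep the raw string (assumed rule)."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_options(options):
    """Turn ["key:value", ...] into a dict, splitting on the first colon only."""
    kwargs = {}
    for entry in options:
        key, sep, raw = entry.partition(":")
        if not sep:
            continue  # hypothetical choice: ignore entries without a separator
        kwargs[key.strip()] = coerce(raw.strip())
    return kwargs


# Entries taken from the YAML examples in the diff above.
print(parse_options(["cfg_scale:6", "sampler:euler", "t5xxl_path:t5xxl-Q5_0.gguf"]))
```

Splitting on the first colon only keeps values that themselves contain a colon intact.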
--- a/docs/content/docs/reference/compatibility-table.md
+++ b/docs/content/docs/reference/compatibility-table.md
@@ -17,27 +17,20 @@ LocalAI will attempt to automatically load models which are not explicitly confi
 | Backend and Bindings                                                             | Compatible models     | Completion/Chat endpoint | Capability | Embeddings support                | Token stream support | Acceleration |
 |----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
 | [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}})        | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes                      | GPT and Functions                        | yes | yes                  | CUDA, openCL, cuBLAS, Metal |
-| [llama.cpp's ggml model (backward compatibility with old format, before GGUF)](https://github.com/ggerganov/llama.cpp) ([binding](https://github.com/go-skynet/go-llama.cpp))  | LLama, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes                      | GPT and Functions                        | yes | yes                  | CUDA, openCL, cuBLAS, Metal |
 | [whisper](https://github.com/ggerganov/whisper.cpp)         | whisper               | no                       | Audio                 | no                                | no                   | N/A |
-| [stablediffusion](https://github.com/EdVince/Stable-Diffusion-NCNN) ([binding](https://github.com/mudler/go-stable-diffusion))        | stablediffusion               | no                       | Image                 | no                                | no                   | N/A |
 | [langchain-huggingface](https://github.com/tmc/langchaingo)                                                                    | Any text generators available on HuggingFace through API | yes                      | GPT                        | no                                | no                   | N/A |
 | [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper))                                                                     | Any piper onnx model | no                      | Text to voice                        | no                                | no                   | N/A |
 | [sentencetransformers](https://github.com/UKPLab/sentence-transformers) | BERT                   | no                       | Embeddings only                  | yes                               | no                   | N/A |
 | `bark`  | bark                   | no                       | Audio generation                  | no                               | no                   | yes |
 | `autogptq` | GPTQ                   | yes                       | GPT                  | yes                               | no                   | N/A |
-| `exllama`  | GPTQ                   | yes                       | GPT only                  | no                               | no                   | N/A |
 | `diffusers`  | SD,...                   | no                       | Image generation    | no                               | no                   | N/A |
-| `vall-e-x` | Vall-E    | no                       | Audio generation and Voice cloning    | no                               | no                   | CPU/CUDA |
 | `vllm` | Various GPTs and quantization formats | yes                      | GPT             | no | no                  | CPU/CUDA |
-| `mamba` | Mamba models architecture | yes                      | GPT             | no | no                  | CPU/CUDA |
 | `exllama2`  | GPTQ                   | yes                       | GPT only                  | no                               | no                   | N/A |
 | `transformers-musicgen`  |                    | no                       | Audio generation                | no                               | no                   | N/A |
 | stablediffusion               | no                       | Image                 | no                                | no                   | N/A |
 | `coqui` | Coqui    | no                       | Audio generation and Voice cloning    | no                               | no                   | CPU/CUDA |
-| `openvoice` | Open voice    | no                       | Audio generation and Voice cloning    | no                               | no                   | CPU/CUDA |
-| `parler-tts` | Open voice    | no                       | Audio generation and Voice cloning    | no                               | no                   | CPU/CUDA |
 | [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API    | no                       | Reranking   | no                               | no                   | CPU/CUDA |
-| `transformers` | Various GPTs and quantization formats | yes                      | GPT, embeddings            | yes | yes*                  | CPU/CUDA/XPU |
+| `transformers` | Various GPTs and quantization formats  | yes                      | GPT, embeddings, Audio generation            | yes | yes*                  | CPU/CUDA/XPU |
 | [bark-cpp](https://github.com/PABannier/bark.cpp)        | bark               | no                       | Audio-Only                 | no                                | no                   | yes |
 | [stablediffusion-cpp](https://github.com/leejet/stable-diffusion.cpp)         | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker               | no                       | Image                 | no                                | no                   | N/A |
 | [silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go) | Silero VAD    | no                       | Voice Activity Detection    | no                               | no                   | CPU |
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -12340,16 +12340,6 @@
    embeddings: true
    parameters:
      model: llama-3.2-1b-instruct-q4_k_m.gguf
-## Stable Diffusion
- url: github:mudler/LocalAI/gallery/stablediffusion.yaml@master
-  license: "BSD-3"
-  urls:
-    - https://github.com/EdVince/Stable-Diffusion-NCNN
-    - https://github.com/EdVince/Stable-Diffusion-NCNN/blob/main/LICENSE
-  description: |
-    Stable Diffusion in NCNN with c++, supported txt2img and img2img
-  name: stablediffusion-cpp
-  icon: https://avatars.githubusercontent.com/u/100950301
 - &piper
  url: github:mudler/LocalAI/gallery/piper.yaml@master ## Piper TTS
  name: voice-en-us-kathleen-low
--- a/gallery/stablediffusion.yaml
+++ b/gallery/stablediffusion.yaml
@@ -1,49 +0,0 @@
---
-name: "stablediffusion-cpp"
-
-config_file: |
-  name: stablediffusion-cpp
-  backend: stablediffusion
-  parameters:
-    model: stablediffusion_assets
-
-files:
-  - filename: "stablediffusion_assets/AutoencoderKL-256-256-fp16-opt.param"
-    sha256: "18ca4b66685e21406bcf64c484b3b680b4949900415536d599cc876579c85c82"
-    uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-256-256-fp16-opt.param"
-  - filename: "stablediffusion_assets/AutoencoderKL-512-512-fp16-opt.param"
-    sha256: "cf45f63aacf3dbbab0f59ed92a6f2c14d9a1801314631cd3abe91e3c85639a20"
-    uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-512-512-fp16-opt.param"
-  - filename: "stablediffusion_assets/AutoencoderKL-base-fp16.param"
-    sha256: "0254a056dce61b0c27dc9ec1b78b53bcf55315c540f55f051eb841aa992701ba"
-    uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-base-fp16.param"
-  - filename: "stablediffusion_assets/AutoencoderKL-encoder-512-512-fp16.bin"
-    sha256: "ddcb79a9951b9f91e05e087739ed69da2c1c4ae30ba4168cce350b49d617c9fa"
-    uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/AutoencoderKL-encoder-512-512-fp16.bin"
-  - filename: "stablediffusion_assets/AutoencoderKL-fp16.bin"
-    sha256: "f02e71f80e70252734724bbfaed5c4ddd3a8ed7e61bb2175ff5f53099f0e35dd"
-    uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/AutoencoderKL-fp16.bin"
-  - filename: "stablediffusion_assets/FrozenCLIPEmbedder-fp16.bin"
-    sha256: "1c9a12f4e1dd1b295a388045f7f28a2352a4d70c3dc96a542189a3dd7051fdd6"
-    uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/FrozenCLIPEmbedder-fp16.bin"
-  - filename: "stablediffusion_assets/FrozenCLIPEmbedder-fp16.param"
-    sha256: "471afbe678dd1fd3fe764ef9c6eccaccb0a7d7e601f27b462aa926b20eb368c9"
-    uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/FrozenCLIPEmbedder-fp16.param"
-  - filename: "stablediffusion_assets/log_sigmas.bin"
-    sha256: "a2089f8aa4c61f9c200feaec541ab3f5c94233b28deb6d5e8bcd974fa79b68ac"
-    uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/raw/main/x86/linux/assets/log_sigmas.bin"
-  - filename: "stablediffusion_assets/UNetModel-256-256-MHA-fp16-opt.param"
-    sha256: "a58c380229f09491776df837b7aa7adffc0a87821dc4708b34535da2e36e3da1"
-    uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-256-256-MHA-fp16-opt.param"
-  - filename: "stablediffusion_assets/UNetModel-512-512-MHA-fp16-opt.param"
-    sha256: "f12034067062827bd7f43d1d21888d1f03905401acf6c6eea22be23c259636fa"
-    uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-512-512-MHA-fp16-opt.param"
-  - filename: "stablediffusion_assets/UNetModel-base-MHA-fp16.param"
-    sha256: "696f6975de49f4325b53ce32aff81861a6d6c07cd9ce3f0aae2cc405350af38d"
-    uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-base-MHA-fp16.param"
-  - filename: "stablediffusion_assets/UNetModel-MHA-fp16.bin"
-    sha256: "d618918d011bfc1f644c0f2a33bf84931bd53b28a98492b0a8ed6f3a818852c3"
-    uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/UNetModel-MHA-fp16.bin"
-  - filename: "stablediffusion_assets/vocab.txt"
-    sha256: "e30e57b6f1e47616982ef898d8922be24e535b4fa3d0110477b3a6f02ebbae7d"
-    uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/vocab.txt"
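Note on the removed gallery file: each entry pairs a `filename` with a `sha256` and a `uri`. The sketch below shows one way a downloaded asset could be checked against such a checksum. It assumes a plain SHA-256 over the file contents and is not taken from LocalAI's installer. The function names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model assets are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """Return True when the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Demo with a throwaway file instead of a real model asset.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example asset bytes")
expected = hashlib.sha256(b"example asset bytes").hexdigest()
print(verify(tmp.name, expected))
os.remove(tmp.name)
```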