
+++
disableToc = false
title = "Run other Models"
weight = 23
icon = "rocket_launch"
+++

## Running other models

Do you already have a model file? Skip to [Run models manually]({{%relref "docs/getting-started/models" %}}).

To load models into LocalAI, you can either [use models manually]({{%relref "docs/getting-started/models" %}}) or configure LocalAI to pull models from external sources, like Hugging Face, and configure them.

To do that, you can point LocalAI to a URL of a YAML configuration file. However, LocalAI also ships with some popular model configurations embedded in the binary. Below you can find the list of model configurations that LocalAI has pre-built; see [Model customization]({{%relref "docs/getting-started/customize-model" %}}) for how to configure models from URLs.
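As a minimal sketch (the URL below is a placeholder for your own YAML file, not a real configuration), a configuration URL can be passed as a startup argument in the same way as the pre-built model names listed in the tables below:

```bash
# Minimal sketch: start LocalAI and point it at a model configuration YAML by URL.
# https://example.com/my-model.yaml is a placeholder; replace it with your own configuration file.
docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core \
    https://example.com/my-model.yaml
```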

There are different categories of models: [LLMs]({{%relref "docs/features/text-generation" %}}), [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}), [Embeddings]({{%relref "docs/features/embeddings" %}}), [Audio to Text]({{%relref "docs/features/audio-to-text" %}}), and [Text to Audio]({{%relref "docs/features/text-to-audio" %}}), depending on the backend being used and the model architecture.

{{% alert icon="💡" %}}

To customize the models, see [Model customization]({{%relref "docs/getting-started/customize-model" %}}). For more model configurations, visit the Examples Section; the configurations for the models below are available here.

{{% /alert %}}

{{< tabs tabTotal="3" >}}
{{% tab tabName="CPU-only" %}}

💡 Don't need GPU acceleration? Use the CPU images, which are lighter and do not have Nvidia dependencies.

| Model | Category | Docker command |
| --- | --- | --- |
| phi-2 | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core phi-2` |
| 🌋 bakllava | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core bakllava` |
| 🌋 llava-1.5 | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava-1.5` |
| 🌋 llava-1.6-mistral | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava-1.6-mistral` |
| 🌋 llava-1.6-vicuna | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava-1.6-vicuna` |
| mistral-openorca | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core mistral-openorca` |
| bert-cpp | [Embeddings]({{%relref "docs/features/embeddings" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core bert-cpp` |
| all-minilm-l6-v2 | [Embeddings]({{%relref "docs/features/embeddings" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg all-minilm-l6-v2` |
| whisper-base | [Audio to Text]({{%relref "docs/features/audio-to-text" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core whisper-base` |
| rhasspy-voice-en-us-amy | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core rhasspy-voice-en-us-amy` |
| 🐸 coqui | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg coqui` |
| 🐶 bark | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg bark` |
| 🔊 vall-e-x | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg vall-e-x` |
| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core mixtral-instruct` |
| tinyllama-chat original model | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core tinyllama-chat` |
| dolphin-2.5-mixtral-8x7b | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core dolphin-2.5-mixtral-8x7b` |
| 🐍 mamba | [LLM]({{%relref "docs/features/text-generation" %}}) | GPU-only |
| animagine-xl | [Text to Image]({{%relref "docs/features/image-generation" %}}) | GPU-only |
| transformers-tinyllama | [LLM]({{%relref "docs/features/text-generation" %}}) | GPU-only |
| codellama-7b (with transformers) | [LLM]({{%relref "docs/features/text-generation" %}}) | GPU-only |
| codellama-7b-gguf (with llama.cpp) | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core codellama-7b-gguf` |
| hermes-2-pro-mistral | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core hermes-2-pro-mistral` |
{{% /tab %}}

{{% tab tabName="GPU (CUDA 11)" %}}

To know which version of CUDA you have available, you can check with `nvidia-smi` or `nvcc --version`; see also [GPU acceleration]({{%relref "docs/features/gpu-acceleration" %}}).
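For example, either of the following commands prints the CUDA version available on the host:

```bash
# Show the CUDA version supported by the NVIDIA driver
nvidia-smi
# Show the version of the CUDA toolkit, if installed
nvcc --version
```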

| Model | Category | Docker command |
| --- | --- | --- |
| phi-2 | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core phi-2` |
| 🌋 bakllava | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core bakllava` |
| 🌋 llava-1.5 | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda11-core llava-1.5` |
| 🌋 llava-1.6-mistral | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda11-core llava-1.6-mistral` |
| 🌋 llava-1.6-vicuna | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda11-core llava-1.6-vicuna` |
| mistral-openorca | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core mistral-openorca` |
| bert-cpp | [Embeddings]({{%relref "docs/features/embeddings" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core bert-cpp` |
| all-minilm-l6-v2 | [Embeddings]({{%relref "docs/features/embeddings" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 all-minilm-l6-v2` |
| whisper-base | [Audio to Text]({{%relref "docs/features/audio-to-text" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core whisper-base` |
| rhasspy-voice-en-us-amy | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core rhasspy-voice-en-us-amy` |
| 🐸 coqui | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 coqui` |
| 🐶 bark | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 bark` |
| 🔊 vall-e-x | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 vall-e-x` |
| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core mixtral-instruct` |
| tinyllama-chat original model | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core tinyllama-chat` |
| dolphin-2.5-mixtral-8x7b | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core dolphin-2.5-mixtral-8x7b` |
| 🐍 mamba | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 mamba-chat` |
| animagine-xl | [Text to Image]({{%relref "docs/features/image-generation" %}}) | `docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:{{< version >}}-cublas-cuda11 animagine-xl` |
| transformers-tinyllama | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 transformers-tinyllama` |
| codellama-7b | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 codellama-7b` |
| codellama-7b-gguf | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core codellama-7b-gguf` |
| hermes-2-pro-mistral | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core hermes-2-pro-mistral` |
{{% /tab %}}

{{% tab tabName="GPU (CUDA 12)" %}}

To know which version of CUDA you have available, you can check with `nvidia-smi` or `nvcc --version`; see also [GPU acceleration]({{%relref "docs/features/gpu-acceleration" %}}).

| Model | Category | Docker command |
| --- | --- | --- |
| phi-2 | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core phi-2` |
| 🌋 bakllava | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core bakllava` |
| 🌋 llava-1.5 | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda12-core llava-1.5` |
| 🌋 llava-1.6-mistral | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda12-core llava-1.6-mistral` |
| 🌋 llava-1.6-vicuna | [Multimodal LLM]({{%relref "docs/features/gpt-vision" %}}) | `docker run -ti -p 8080:8080 localai/localai:{{< version >}}-cublas-cuda12-core llava-1.6-vicuna` |
| mistral-openorca | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core mistral-openorca` |
| bert-cpp | [Embeddings]({{%relref "docs/features/embeddings" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core bert-cpp` |
| all-minilm-l6-v2 | [Embeddings]({{%relref "docs/features/embeddings" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 all-minilm-l6-v2` |
| whisper-base | [Audio to Text]({{%relref "docs/features/audio-to-text" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core whisper-base` |
| rhasspy-voice-en-us-amy | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core rhasspy-voice-en-us-amy` |
| 🐸 coqui | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 coqui` |
| 🐶 bark | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 bark` |
| 🔊 vall-e-x | [Text to Audio]({{%relref "docs/features/text-to-audio" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 vall-e-x` |
| mixtral-instruct Mixtral-8x7B-Instruct-v0.1 | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core mixtral-instruct` |
| tinyllama-chat original model | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core tinyllama-chat` |
| dolphin-2.5-mixtral-8x7b | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core dolphin-2.5-mixtral-8x7b` |
| 🐍 mamba | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 mamba-chat` |
| animagine-xl | [Text to Image]({{%relref "docs/features/image-generation" %}}) | `docker run -ti -p 8080:8080 -e COMPEL=0 --gpus all localai/localai:{{< version >}}-cublas-cuda12 animagine-xl` |
| transformers-tinyllama | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 transformers-tinyllama` |
| codellama-7b | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 codellama-7b` |
| codellama-7b-gguf | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core codellama-7b-gguf` |
| hermes-2-pro-mistral | [LLM]({{%relref "docs/features/text-generation" %}}) | `docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core hermes-2-pro-mistral` |
{{% /tab %}}

{{< /tabs >}}

{{% alert icon="💡" %}}

**Tip** You can specify multiple models to start an instance with all of them loaded. For example, to have both llava and phi-2 configured:

```bash
docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava phi-2
```

{{% /alert %}}
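Once the container is up, a quick way to verify which models were installed is to query the OpenAI-compatible models endpoint (a sketch, assuming LocalAI is listening on the default port mapped above):

```bash
# List the models currently available on the running LocalAI instance
curl http://localhost:8080/v1/models
```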