+++
disableToc = false
title = "Getting started"
weight = 1
url = '/basics/getting_started/'
+++
LocalAI is available as a container image and as a binary. It can be used with Docker, Podman, Kubernetes and any other container engine. Container images are published to quay.io and Docker Hub.
See also our [How to]({{%relref "howtos" %}}) section for end-to-end guided examples curated by the community.
## How to get started
The easiest way to run LocalAI is with `docker compose` or with Docker (to build locally, see the [build section]({{%relref "build" %}})).

LocalAI needs at least a model file or a configuration YAML file to work (or both). You can further customize model defaults and specific settings with a configuration file (see [advanced]({{%relref "advanced" %}})).
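As a rough illustration of what such a configuration file can look like, here is a minimal sketch based on the options used elsewhere on this page. The file name and model name are placeholders, and the exact field set is documented in [advanced]({{%relref "advanced" %}}), so treat this as an assumption to adapt rather than a reference:

```bash
# Hypothetical minimal model configuration, written into the models directory
# created in the steps below -- adjust names and values to your model
cat > models/your-model.yaml <<'EOF'
name: your-model                # the model name you will use in API requests
context_size: 700               # same value as the --context-size flag used below
threads: 4                      # same value as the --threads flag used below
parameters:
  model: your-model.gguf        # the model file placed in the models directory
  temperature: 0.7
EOF
```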
{{% notice note %}} To run with GPU acceleration, see [GPU acceleration]({{%relref "features/gpu-acceleration" %}}). {{% /notice %}}
{{< tabs >}} {{% tab name="Docker" %}}
```bash
# Prepare the models into the `models` directory
mkdir models

# Copy your models to it
cp your-model.gguf models/

# Run the LocalAI container
docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4

# You should see:
#
# ┌───────────────────────────────────────────────────┐
# │                   Fiber v2.42.0                   │
# │               http://127.0.0.1:8080               │
# │       (bound on host 0.0.0.0 and port 8080)       │
# │                                                   │
# │ Handlers ............. 1  Processes ........... 1 │
# │ Prefork ....... Disabled  PID ................. 1 │
# └───────────────────────────────────────────────────┘

# Try the endpoint with curl
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "your-model.gguf",
     "prompt": "A long time ago in a galaxy far, far away",
     "temperature": 0.7
   }'
```
{{% notice note %}}
- If you are running on Apple Silicon (ARM), running under Docker is not recommended due to emulation. Follow the [build instructions]({{%relref "build" %}}) to use Metal acceleration for full GPU support.
- If you are on an Intel (x86_64) Mac, you can use Docker; there is no additional performance gain in building from source.
{{% /notice %}}
{{% /tab %}} {{% tab name="Docker compose" %}}
```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI

# (Optional) Checkout a specific LocalAI tag
# git checkout -b build <TAG>

# Copy your models to models/
cp your-model.gguf models/

# (Optional) Edit the .env file to set things like context size and threads
# (see the example sketch below this block)
# vim .env

# Start with docker compose
docker compose up -d --pull always
# or you can build the images with:
# docker compose up -d --build

# Now the API is accessible at localhost:8080
curl http://localhost:8080/v1/models
# {"object":"list","data":[{"id":"your-model.gguf","object":"model"}]}

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "your-model.gguf",
     "prompt": "A long time ago in a galaxy far, far away",
     "temperature": 0.7
   }'
```
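If you want a starting point for the `.env` edit mentioned above, here is a small sketch. The variable names are assumptions based on the stock `.env` shipped with the repository; check that file and the [environment variables docs]({{%relref "advanced#environment-variables" %}}) for the authoritative list:

```bash
# .env -- illustrative values only; verify variable names against the shipped .env
THREADS=4          # number of CPU threads used for inference
CONTEXT_SIZE=700   # default context size for loaded models
MODELS_PATH=/models
DEBUG=true         # more verbose logging while you experiment
```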
Note: if you are on Windows, make sure the project is on the Linux filesystem, otherwise loading models might be slow. For more info, see the Microsoft docs.
{{% /tab %}}
{{% tab name="Kubernetes" %}}
To install LocalAI in Kubernetes, you can use the following Helm chart:
```bash
# Add the helm repository
helm repo add go-skynet https://go-skynet.github.io/helm-charts/

# Update the repositories
helm repo update

# Get the values
helm show values go-skynet/local-ai > values.yaml

# (Optional) Edit the values if needed
# vim values.yaml ...

# Install the helm chart
helm install local-ai go-skynet/local-ai -f values.yaml
```
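Once the chart is installed, a quick way to reach the API from your workstation is a port-forward. The sketch below assumes the chart creates a Service named after the release (`local-ai` here); adjust the Service name and namespace to match your deployment:

```bash
# Forward the LocalAI Service to localhost (Service name assumed to match the release)
kubectl port-forward svc/local-ai 8080:8080

# In another terminal, query the API as usual
curl http://localhost:8080/v1/models
```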
{{% /tab %}} {{% tab name="From binary" %}}
LocalAI binary releases are available on [GitHub](https://github.com/go-skynet/LocalAI/releases).
{{% /tab %}}
{{% tab name="From source" %}}
See the [build section]({{%relref "build" %}}).
{{% /tab %}}
{{< /tabs >}}
## Running popular models (one-click!)
You can run `local-ai` directly with a model name, and it will download the model and start the API with the model loaded.

Don't need GPU acceleration? Use the CPU-only images, which are lighter and do not have Nvidia dependencies.

To know which version of CUDA you have available, check with `nvidia-smi` or `nvcc --version`.
{{< tabs >}} {{% tab name="CPU-only" %}}
Model | Category | Docker command |
---|---|---|
phi-2 | LLM | docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core phi-2 |
llava | Multimodal LLM | docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava |
mistral-openorca | LLM | docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core mistral-openorca |
bert-cpp | Embeddings | docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core bert-cpp |
all-minilm-l6-v2 | Embeddings | docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg all-minilm-l6-v2 |
whisper-base | Audio to Text | docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core whisper-base |
rhasspy-voice-en-us-amy | Text to Audio | docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core rhasspy-voice-en-us-amy |
coqui | Text to Audio | docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg coqui |
bark | Text to Audio | docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg bark |
vall-e-x | Text to Audio | docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg vall-e-x |
{{% /tab %}} {{% tab name="GPU (CUDA 11)" %}}
Model | Category | Docker command |
---|---|---|
phi-2 | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core phi-2 |
llava | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core llava |
mistral-openorca | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core mistral-openorca |
bert-cpp | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core bert-cpp |
all-minilm-l6-v2 | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 all-minilm-l6-v2 |
whisper-base | Audio to Text | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core whisper-base |
rhasspy-voice-en-us-amy | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core rhasspy-voice-en-us-amy |
coqui | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 coqui |
bark | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 bark |
vall-e-x | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 vall-e-x |
{{% /tab %}}
{{% tab name="GPU (CUDA 12)" %}}
Model | Category | Docker command |
---|---|---|
phi-2 | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core phi-2 |
llava | Multimodal LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core llava |
mistral-openorca | LLM | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core mistral-openorca |
bert-cpp | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core bert-cpp |
all-minilm-l6-v2 | Embeddings | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 all-minilm-l6-v2 |
whisper-base | Audio to Text | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core whisper-base |
rhasspy-voice-en-us-amy | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core rhasspy-voice-en-us-amy |
coqui | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 coqui |
bark | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 bark |
vall-e-x | Text to Audio | docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 vall-e-x |
{{% /tab %}}
{{< /tabs >}}
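Once one of the containers above is up, you can talk to it through the usual OpenAI-compatible endpoints. A minimal sketch, assuming the `phi-2` container from the tables is running on port 8080 and exposes the model under the same name:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "phi-2",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.7
   }'
```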
{{% notice note %}}
LocalAI can be started (either the container image or the binary) with a list of model config file URLs or with our shorthand format (e.g. `huggingface://`, `github://`). It works by passing the URLs as arguments or as an environment variable, for example:

```bash
local-ai github://owner/repo/file.yaml@branch

# Env
MODELS="github://owner/repo/file.yaml@branch,github://owner/repo/file.yaml@branch" local-ai

# Args
local-ai --models github://owner/repo/file.yaml@branch --models github://owner/repo/file.yaml@branch
```

For example, to start LocalAI with phi-2, you can also use a full config file from a Gist:

```bash
docker run -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core https://gist.githubusercontent.com/mudler/ad601a0488b497b69ec549150d9edd18/raw/a8a8869ef1bb7e3830bf5c0bae29a0cce991ff8d/phi-2.yaml
```

The file should be a valid LocalAI YAML configuration file; for the full syntax see [advanced]({{%relref "advanced" %}}). {{% /notice %}}
## Container images
LocalAI provides a set of images with support for CUDA, ffmpeg, and 'vanilla' (CPU-only). The list of images is available on quay.io:
{{< tabs >}} {{% tab name="Vanilla / CPU Images" %}}
- `master`
- `latest`
- `{{< version >}}`
- `{{< version >}}-ffmpeg`
- `{{< version >}}-ffmpeg-core`

Core images are smaller and ship without pre-downloaded Python dependencies. {{% /tab %}}
{{% tab name="GPU Images CUDA 11" %}}
Images with Nvidia acceleration support.

If you do not know which version of CUDA you have available, you can check with `nvidia-smi` or `nvcc --version`.

- `master-cublas-cuda11`
- `master-cublas-cuda11-core`
- `{{< version >}}-cublas-cuda11`
- `{{< version >}}-cublas-cuda11-core`
- `{{< version >}}-cublas-cuda11-ffmpeg`
- `{{< version >}}-cublas-cuda11-ffmpeg-core`

Core images are smaller and ship without pre-downloaded Python dependencies. {{% /tab %}}
{{% tab name="GPU Images CUDA 12" %}}
Images with Nvidia acceleration support.

If you do not know which version of CUDA you have available, you can check with `nvidia-smi` or `nvcc --version`.

- `master-cublas-cuda12`
- `master-cublas-cuda12-core`
- `{{< version >}}-cublas-cuda12`
- `{{< version >}}-cublas-cuda12-core`
- `{{< version >}}-cublas-cuda12-ffmpeg`
- `{{< version >}}-cublas-cuda12-ffmpeg-core`

Core images are smaller and ship without pre-downloaded Python dependencies.
{{% /tab %}}
{{< /tabs >}}
Example:
- Standard (GPT + `stablediffusion`): `quay.io/go-skynet/local-ai:latest`
- FFmpeg: `quay.io/go-skynet/local-ai:{{< version >}}-ffmpeg`
- CUDA 11 + FFmpeg: `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda11-ffmpeg`
- CUDA 12 + FFmpeg: `quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12-ffmpeg`
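For example, to fetch one of the tagged images ahead of time:

```bash
docker pull quay.io/go-skynet/local-ai:{{< version >}}-cublas-cuda12-ffmpeg
```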
{{% notice note %}}
Note: the binary inside the image is pre-compiled and might not suit all CPUs. To enable CPU optimizations for the execution environment, the default behavior is to rebuild when starting the container. To disable this auto-rebuild behavior, set the environment variable `REBUILD` to `false`.

See [docs on all environment variables]({{%relref "advanced#environment-variables" %}}) for more info. {{% /notice %}}
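For instance, to start the container without the auto-rebuild step, pass the variable with `-e` (a sketch that reuses the run command from the Docker tab above):

```bash
docker run -p 8080:8080 -e REBUILD=false -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4
```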
### Example: Use luna-ai-llama2 model with docker
```bash
mkdir models

# Download luna-ai-llama2 to models/
wget https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF/resolve/main/luna-ai-llama2-uncensored.Q4_0.gguf -O models/luna-ai-llama2

# Use a template from the examples
cp -rf prompt-templates/getting_started.tmpl models/luna-ai-llama2.tmpl

docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4

# Now the API is accessible at localhost:8080
curl http://localhost:8080/v1/models
# {"object":"list","data":[{"id":"luna-ai-llama2","object":"model"}]}

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "luna-ai-llama2",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9
   }'

# {"model":"luna-ai-llama2","choices":[{"message":{"role":"assistant","content":"I'm doing well, thanks. How about you?"}}]}
```
For other model configurations, see also the examples section.
## Examples
To see how to integrate LocalAI with other projects, for instance for question answering or for use with chatbot-ui, see the examples.