+++
disableToc = false
title = "Run models manually"
weight = 5
icon = "rocket_launch"
+++


Follow these steps to manually run models using LocalAI:

  1. Prepare Your Model and Configuration Files: Ensure you have a model file and, if needed, a configuration YAML file. A configuration file lets you customize model defaults and model-specific settings (a sketch follows this list). For advanced configurations, refer to the [Advanced Documentation]({{% relref "docs/advanced" %}}).

  2. GPU Acceleration: For instructions on GPU acceleration, visit the [GPU acceleration]({{% relref "docs/features/gpu-acceleration" %}}) page.

  3. Run LocalAI: Choose one of the following methods to run LocalAI:
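Regarding step 1, a minimal model configuration placed alongside the model file might look like the sketch below. Treat it as illustrative: the field names follow LocalAI's model YAML format, but the model name and values are assumptions to adapt to your own model (see the Advanced Documentation for the full reference).

```yaml
# models/luna-ai-llama2.yaml — illustrative sketch of a model configuration
name: luna-ai-llama2                          # name the model is exposed as by the API
parameters:
  model: luna-ai-llama2-uncensored.Q4_0.gguf  # model file inside the models directory
  temperature: 0.7                            # default sampling temperature
context_size: 2048                            # prompt context window
threads: 4                                    # CPU threads used for inference
template:
  chat: luna-ai-llama2                        # uses models/luna-ai-llama2.tmpl for chat prompts
```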

{{< tabs tabTotal="5" >}} {{% tab tabName="Docker" %}}

```bash
# Prepare the models into the `models` directory
mkdir models

# Copy your models to the directory
cp your-model.gguf models/

# Run the LocalAI container
docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4

# Expected output:
# ┌───────────────────────────────────────────────────┐
# │                   Fiber v2.42.0                   │
# │               http://127.0.0.1:8080               │
# │       (bound on host 0.0.0.0 and port 8080)       │
# │                                                   │
# │ Handlers ............. 1  Processes ........... 1 │
# │ Prefork ....... Disabled  PID ................. 1 │
# └───────────────────────────────────────────────────┘

# Test the endpoint with curl
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "your-model.gguf",
     "prompt": "A long time ago in a galaxy far, far away",
     "temperature": 0.7
   }'
```

{{% alert icon="💡" %}} Other Docker Images:

For other Docker images, please refer to the table in [the container images section]({{% relref "docs/getting-started/container-images" %}}). {{% /alert %}}

Example:

```bash
mkdir models

# Download luna-ai-llama2 to models/
wget https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF/resolve/main/luna-ai-llama2-uncensored.Q4_0.gguf -O models/luna-ai-llama2

# Use a template from the examples, if needed (see the template sketch after this block)
cp -rf prompt-templates/getting_started.tmpl models/luna-ai-llama2.tmpl

docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4

# Now the API is accessible at localhost:8080
curl http://localhost:8080/v1/models
# {"object":"list","data":[{"id":"luna-ai-llama2","object":"model"}]}

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "luna-ai-llama2",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9
   }'
# {"model":"luna-ai-llama2","choices":[{"message":{"role":"assistant","content":"I'm doing well, thanks. How about you?"}}]}
```
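The `.tmpl` file copied above is a Go `text/template` prompt template: LocalAI substitutes the model input for `{{.Input}}` before passing the prompt to the backend. A minimal instruction-style template might look like the sketch below (the actual `getting_started.tmpl` shipped in the repository may differ):

```
### Instruction:
{{.Input}}

### Response:
```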

{{% alert note %}}

  - If running on Apple Silicon (ARM), running LocalAI inside Docker is not recommended due to emulation. Follow the [build instructions]({{% relref "docs/getting-started/build" %}}) to use Metal acceleration for full GPU support (a build sketch follows this note).
  - If you are running on Apple x86_64, you can use Docker; building from source brings no additional benefit. {{% /alert %}}
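For reference, a source build with Metal support boils down to a `make` invocation along these lines (a sketch assuming the `BUILD_TYPE=metal` option described in the build documentation; check that page for prerequisites):

```bash
# Clone and build with Metal acceleration on Apple Silicon
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
make BUILD_TYPE=metal build
```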

{{% /tab %}} {{% tab tabName="Docker Compose" %}}

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI

# (Optional) Checkout a specific LocalAI tag
# git checkout -b build <TAG>

# Copy your models to the models directory
cp your-model.gguf models/

# (Optional) Edit the .env file to set parameters like context size and threads (a sketch follows this block)
# vim .env

# Start with Docker Compose
docker compose up -d --pull always
# Or build the images with:
# docker compose up -d --build

# Now the API is accessible at localhost:8080
curl http://localhost:8080/v1/models
# {"object":"list","data":[{"id":"your-model.gguf","object":"model"}]}

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "your-model.gguf",
     "prompt": "A long time ago in a galaxy far, far away",
     "temperature": 0.7
   }'
```
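The `.env` file mentioned above is where Docker Compose picks up runtime settings. A hedged sketch of the kind of values you might set (verify the variable names against the `.env` file shipped in your checkout):

```bash
# .env — example values; variable names may differ between versions
THREADS=4
CONTEXT_SIZE=700
MODELS_PATH=/models
DEBUG=true
```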

{{% alert icon="💡" %}} Other Docker Images:

For other Docker images, please refer to the table in [the container images section]({{% relref "docs/getting-started/container-images" %}}). {{% /alert %}}

Note: If you are on Windows, make sure the project is on the Linux filesystem (for example, inside WSL) to avoid slow model loading. For more information, see the Microsoft Docs.

{{% /tab %}} {{% tab tabName="Kubernetes" %}}

For Kubernetes deployment, see the [Kubernetes section]({{% relref "docs/getting-started/kubernetes" %}}).

{{% /tab %}} {{% tab tabName="From Binary" %}}

LocalAI binary releases are available on [GitHub](https://github.com/go-skynet/LocalAI/releases).
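Downloading and running a release binary looks roughly like this (the asset name below is an assumption following the `local-ai-<OS>-<arch>` pattern; pick the correct file for your platform from the releases page):

```bash
# Download a release binary for your platform (asset name is illustrative)
curl -L -o local-ai https://github.com/go-skynet/LocalAI/releases/latest/download/local-ai-Darwin-arm64
chmod +x local-ai

# Point it at your models directory
./local-ai --models-path ./models --context-size 700 --threads 4
```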

{{% alert icon="⚠️" %}} If installing on macOS, you might encounter a message saying:

"local-ai-git-Darwin-arm64" (or the name you gave the binary) can't be opened because Apple cannot check it for malicious software.

Hit OK, then go to Settings > Privacy & Security > Security and look for the message:

"local-ai-git-Darwin-arm64" was blocked from use because it is not from an identified developer.

Press "Allow Anyway." {{% /alert %}}
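Alternatively, you can clear the quarantine attribute from a terminal with the standard macOS `xattr` tool (adjust the file name to the binary you downloaded):

```bash
# Remove the quarantine flag Gatekeeper sets on downloaded binaries
xattr -d com.apple.quarantine ./local-ai-git-Darwin-arm64
```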

{{% /tab %}} {{% tab tabName="From Source" %}}

For instructions on building LocalAI from source, see the [Build Section]({{% relref "docs/getting-started/build" %}}).

{{% /tab %}} {{< /tabs >}}

For more model configurations, visit the Examples Section.