
+++
disableToc = false
title = "Build"
weight = 5
url = '/basics/build/'
+++

## Build

### Container image

Requirements:

- Docker or podman, or a container engine

In order to build the LocalAI container image locally you can use `docker`, for example:

```bash
# build the image
docker build -t localai .
docker run localai
```
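
If you use `podman` instead of Docker, the same workflow applies. A minimal sketch (the port mapping assumes the default API port 8080 used throughout this page):

```bash
# build and run the image with podman (equivalent to the docker commands above)
podman build -t localai .
podman run -p 8080:8080 localai
```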

### Locally

In order to build LocalAI locally, you need the following requirements:

- Golang >= 1.21
- Cmake/make
- GCC
- GRPC

To install the dependencies follow the instructions below:

{{< tabs >}}
{{% tab name="Apple" %}}

```bash
brew install abseil cmake go grpc protobuf wget
```

{{% /tab %}}
{{% tab name="Debian" %}}

```bash
apt install protobuf-compiler-grpc libgrpc-dev make cmake
```

{{% /tab %}}
{{% tab name="From source" %}}

Specify `BUILD_GRPC_FOR_BACKEND_LLAMA=true` to automatically build the gRPC dependencies:

```bash
make ... BUILD_GRPC_FOR_BACKEND_LLAMA=true build
```

{{% /tab %}}
{{< /tabs >}}
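
After installing the dependencies, you can optionally verify that the toolchain matches the requirements above (a quick sanity check, not part of the upstream instructions):

```bash
# confirm the build toolchain is available
go version         # must report 1.21 or newer
cmake --version
gcc --version
protoc --version   # protobuf compiler, used by the gRPC backends
```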

To build LocalAI with `make`:

```bash
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
make build
```

This should produce the binary `local-ai`.
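
The binary can then be started directly, for example (using the same flag shown in the Mac example further down; the API listens on port 8080 by default):

```bash
# start the API server, loading models from ./models
./local-ai --models-path=./models/
```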

{{% notice note %}}

**CPU flagset compatibility**

LocalAI uses different backends based on ggml and llama.cpp to run models. If your CPU doesn't support common instruction sets, you can disable them during build:

```bash
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF" make build
```

For the flags to take effect in the container image, you need to set `REBUILD=true`:

```bash
docker run --rm -ti -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -e REBUILD=true -e CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF" -v $PWD/models:/models quay.io/go-skynet/local-ai:latest
```

{{% /notice %}}

### Example: Build on Mac

Building on Mac (M1 or M2) works, but you may need to install some prerequisites using `brew`.

The steps below have been tested by one Mac user and found to work. Note that this doesn't use Docker to run the server:

```bash
# install build dependencies
brew install abseil cmake go grpc protobuf wget

# clone the repo
git clone https://github.com/go-skynet/LocalAI.git

cd LocalAI

# build the binary
make build

# Download gpt4all-j to models/
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j

# Use a template from the examples
cp -rf prompt-templates/ggml-gpt4all-j.tmpl models/

# Run LocalAI
./local-ai --models-path=./models/ --debug=true

# Now the API is accessible at localhost:8080
curl http://localhost:8080/v1/models

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-j",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9
   }'
```

### Build with Image generation support

Requirements: OpenCV, Gomp

Image generation is experimental and requires `GO_TAGS=stablediffusion` to be set during build:

```bash
make GO_TAGS=stablediffusion build
```
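
Once built (and with a suitable image model configured, which is not covered here), images can be requested through the OpenAI-compatible endpoint. A hypothetical request (prompt and size are examples):

```bash
# example request against the image generation endpoint
curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "A cute baby sea otter",
  "size": "256x256"
}'
```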

### Build with Text to audio support

Requirements: piper-phonemize

Text to audio support is experimental and requires `GO_TAGS=tts` to be set during build:

```bash
make GO_TAGS=tts build
```
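
After building with `GO_TAGS=tts`, speech can be generated through the `/tts` endpoint. A sketch of a request (the voice model name is an example piper voice and must be present in your models directory):

```bash
# example text-to-speech request; writes the generated audio to hello.wav
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "model": "en-us-kathleen-low.onnx",
  "input": "Hello world"
}' --output hello.wav
```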

### Acceleration

List of the variables available to customize the build:

| Variable | Default | Description |
|----------|---------|-------------|
| `BUILD_TYPE` | None | Build type. Available: `cublas`, `openblas`, `clblas`, `metal`, `hipblas` |
| `GO_TAGS` | `tts stablediffusion` | Go tags. Available: `stablediffusion`, `tts` |
| `CLBLAST_DIR` | | Specify a CLBlast directory |
| `CUDA_LIBPATH` | | Specify a CUDA library path |
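
The variables can be combined in a single invocation, for example:

```bash
# CUDA acceleration together with the text-to-audio backend
make BUILD_TYPE=cublas GO_TAGS=tts build
```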

#### OpenBLAS

Software acceleration.

Requirements: OpenBLAS

```bash
make BUILD_TYPE=openblas build
```

#### CuBLAS

Nvidia acceleration.

Requirement: Nvidia CUDA toolkit

Note: CuBLAS support is experimental and has not been tested on real hardware. Please report any issues you find!

```bash
make BUILD_TYPE=cublas build
```

More information is available in the upstream PR: https://github.com/ggerganov/llama.cpp/pull/1412
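
If the build cannot locate the CUDA libraries, you can point it at them explicitly with `CUDA_LIBPATH` from the table above (the path below is an example; adjust it to your installation):

```bash
# verify the toolkit is installed, then pass the library path to the build
nvcc --version
make BUILD_TYPE=cublas CUDA_LIBPATH=/usr/local/cuda/lib64 build
```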

#### Hipblas (AMD GPU with ROCm on Arch Linux)

Packages:

```bash
pacman -S base-devel git rocm-hip-sdk rocm-opencl-sdk opencv clblast grpc
```

Library links:

```bash
export CGO_CFLAGS="-I/usr/include/opencv4"
export CGO_CXXFLAGS="-I/usr/include/opencv4"
export CGO_LDFLAGS="-L/opt/rocm/hip/lib -lamdhip64 -L/opt/rocm/lib -lOpenCL -L/usr/lib -lclblast -lrocblas -lhipblas -lrocrand -lomp -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link"
```

Build:

```bash
make BUILD_TYPE=hipblas GPU_TARGETS=gfx1030
```
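
`gfx1030` targets RDNA2 cards; if your GPU uses a different architecture, you can look up its LLVM target with `rocminfo` and pass it via `GPU_TARGETS` (the target in the sketch below is an example):

```bash
# find the LLVM target of your GPU (e.g. gfx1030, gfx1100) and build for it
rocminfo | grep gfx
make BUILD_TYPE=hipblas GPU_TARGETS=gfx1100
```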

#### ClBLAS

AMD/Intel GPU acceleration.

Requirement: OpenCL, CLBlast

```bash
make BUILD_TYPE=clblas build
```

To specify a CLBlast directory, set `CLBLAST_DIR`.
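
For example, if CLBlast is installed in a non-standard location (the path below is an example):

```bash
make BUILD_TYPE=clblas CLBLAST_DIR=/opt/clblast build
```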

#### Metal (Apple Silicon)

```bash
make BUILD_TYPE=metal build

# Set `gpu_layers: 1` and `f16: true` in your YAML model config file
# Note: only models quantized with q4_0 are supported!
```
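
A minimal sketch of such a model config file, assuming a q4_0 model already placed in `models/` (the file and model names are examples):

```bash
# write a minimal model config enabling Metal offloading (names are examples)
cat > models/my-model.yaml <<EOF
name: my-model
parameters:
  model: ggml-model-q4_0.bin
f16: true
gpu_layers: 1
EOF
```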

### Build only a single backend

You can control which backends are built by setting the `GRPC_BACKENDS` environment variable. For instance, to build only the `llama-cpp` backend:

```bash
make GRPC_BACKENDS=backend-assets/grpc/llama-cpp build
```

By default, all the backends are built.
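
`GRPC_BACKENDS` accepts a space-separated list, so you can also select a handful of backends (the backend names below are examples):

```bash
# build only the llama-cpp and whisper backends
make GRPC_BACKENDS="backend-assets/grpc/llama-cpp backend-assets/grpc/whisper" build
```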

### Windows compatibility

Make sure to give enough resources to the running container. See https://github.com/go-skynet/LocalAI/issues/2