{{template "views/partials/head" .}}
LocalAI uses P2P technologies to distribute work between peers. You can share an instance via federation and/or split a model's weights across peers (available only with llama.cpp models). You can now share computational resources between your devices or with your friends!
{{.P2PToken}}
The network token can be used to share this instance, or to join a federation or a worker network. Below you will find examples of how to start a new instance or a worker with this token.
P2P mode must be enabled by starting LocalAI with --p2p. Restart the server with --p2p to automatically generate a new token that can be used to discover other nodes. If you already have a token, specify it with export TOKEN="..".
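As a minimal sketch, assuming the LocalAI CLI is installed as `local-ai` (verify flag names against your installed version):

```shell
# Start LocalAI with P2P mode enabled; a network token is
# generated automatically on startup.
local-ai run --p2p

# If you already have a token, export it first so this node
# joins the existing network instead of creating a new one:
export TOKEN=".."
local-ai run --p2p
```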
Check out the documentation for more information.
You can start LocalAI in federated mode to share your instance, or start the federated server to balance requests between nodes of the federation.
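For example, assuming the CLI commands described in the LocalAI documentation (exact flags may differ between versions):

```shell
# Share this instance as a node of the federation:
export TOKEN=".."
local-ai run --p2p --federated

# Or run the federated server, which balances incoming
# requests across the nodes of the federation:
local-ai federated
```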
You can start llama.cpp workers to split a model's weights across them and offload part of the computation. To start a new worker, use the CLI or Docker.
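A sketch of both options, assuming the worker subcommand and image names used in the LocalAI documentation (check them against your installed version):

```shell
# Using the CLI: start a llama.cpp RPC worker that joins
# the network identified by the token.
export TOKEN=".."
local-ai worker p2p-llama-cpp-rpc

# Or with Docker, passing the same token via the environment:
docker run -d --name localai-worker \
  -e TOKEN=".." \
  localai/localai worker p2p-llama-cpp-rpc
```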