{{template "views/partials/head" .}}
{{template "views/partials/navbar" .}}

Distributed inference with P2P

LocalAI uses P2P technologies to distribute work between peers. You can share an instance with Federation and/or split a model's weights across peers (available only with llama.cpp models). Share computational resources between your devices or with your friends!
{{ if and .IsP2PEnabled (eq .P2PToken "") }}

Warning: P2P mode is disabled or no token was specified

Enable P2P mode by starting LocalAI with --p2p. Restarting the server with --p2p automatically generates a token that nodes can use to discover each other. If you already have a token, specify it with export TOKEN="..". Check out the documentation for more information.
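For example, a minimal sketch of the two cases above (the --p2p flag and TOKEN variable come from this page; the `local-ai run` invocation is assumed from the LocalAI CLI, and the token value is a placeholder):

```shell
# Start LocalAI with P2P mode enabled; a new token is
# generated automatically and printed at startup.
local-ai run --p2p

# Or reuse an existing token so this instance joins the
# same P2P network as the node that generated it.
export TOKEN="<your-p2p-token>"   # placeholder value
local-ai run --p2p
```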

{{ else }}

Federated Nodes:

You can start LocalAI in federated mode to share your instance, or start a federated server to balance requests across the nodes of the federation.


Start a federated instance
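The two federation modes above can be sketched as follows (an assumption-laden sketch: the `--federated` flag and the `federated` subcommand are taken from the LocalAI documentation, not from this page):

```shell
# Share this instance with the federation
# (P2P mode must be enabled, see above).
local-ai run --p2p --federated

# Or run a federated server that balances incoming
# requests across the nodes of the federation.
local-ai federated
```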

Workers (llama.cpp):

You can start llama.cpp workers to split a model's weights between them and offload part of the computation. To start a new worker, use the CLI or Docker.


Start a new llama.cpp P2P worker
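Both ways of starting a worker can be sketched as follows (hedged: the `worker p2p-llama-cpp-rpc` subcommand is taken from the LocalAI documentation, and the Docker image tag and token value are illustrative placeholders):

```shell
# CLI: start a llama.cpp RPC worker that joins the P2P
# network identified by the shared token.
export TOKEN="<your-p2p-token>"   # placeholder value
local-ai worker p2p-llama-cpp-rpc

# Docker: the same worker running in a container,
# passing the token through the environment.
docker run -d --network host -e TOKEN="<your-p2p-token>" \
  localai/localai:latest worker p2p-llama-cpp-rpc
```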

{{ end }}
{{template "views/partials/footer" .}}