{{template "views/partials/head" .}}
{{template "views/partials/navbar" .}}

Distributed inference with P2P

LocalAI uses peer-to-peer (P2P) technologies to distribute work between peers. You can share an instance through Federation and/or split a model's weights across peers (currently available only with llama.cpp models), so you can pool computational resources across your own devices or with your friends!

Federated Nodes:

You can start LocalAI in federated mode to share your instance with the federation, or start a federated server that balances requests across the federation's nodes.


Start a federated instance
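
A minimal sketch of both roles, assuming the `local-ai` binary with the `--p2p` and `--federated` flags and the `federated` subcommand of the LocalAI CLI (check `local-ai --help` on your version); the token value is a placeholder:

```bash
# Start a LocalAI instance in federated mode; on startup it prints a
# P2P token that other nodes use to join the federation.
local-ai run --p2p --federated

# On another machine, start the federated server with the shared token;
# it balances incoming requests across the nodes of the federation.
TOKEN="<token printed by the first node>" local-ai federated
```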

Workers (llama.cpp):

You can start llama.cpp workers to split a model's weights across them and offload part of the computation. A new worker can be started with the CLI or with Docker, as shown below.


Start a new llama.cpp P2P worker
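
A sketch of both options, assuming the `worker p2p-llama-cpp-rpc` subcommand, the `TOKEN` environment variable, and the `localai/localai` Docker image (subcommand names, image tags, and networking needs can vary between releases; the token is a placeholder):

```bash
# CLI: join as a llama.cpp RPC worker, using the token printed by the
# main instance (started with --p2p).
TOKEN="<token from the main instance>" local-ai worker p2p-llama-cpp-rpc

# Docker: the same worker; host networking can help P2P discovery on
# some setups.
docker run -d --name local-ai-worker --net host \
  -e TOKEN="<token from the main instance>" \
  localai/localai:latest worker p2p-llama-cpp-rpc
```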

{{template "views/partials/footer" .}}