* feat(explorer): allow to specify a worker target
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(explorer): correctly load balance requests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(explorer): mark load balanced by default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: make sure to delete tunnels that might not exist anymore
If a worker goes off and on might change tunnel address, and we want to
load balance only on the active tunnels.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Wire up a simple explorer DB
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wip
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor: group services id so can be identified easily in the ledger table
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(discovery): discovery service now gather worker informations correctly
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(explorer): display network token
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(explorer): display form to add new networks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(explorer): stop from overwriting networks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(explorer): display only networks with active workers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(explorer): list only clusters in a network if it has online workers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* remove invalid and inactive networks
if networks have no workers delete them from the database, similarly,
if invalid.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: add workflow to deploy new explorer versions automatically
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* build-api: build with p2p tag
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Allow to specify a connection timeout
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* logging
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Better p2p defaults
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Set loglevel
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix dht enable
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Default to info for loglevel
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add navbar
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Slightly improve rendering
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Allow to copy the token easily
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
feat(p2p): allow to run multiple clusters in the same network
Allow to specify a network ID via CLI which allows to run multiple
clusters, logically separated within the same network (by using the same
shared token).
Note: This segregation is not "secure" by any means, anyone having the
network token can see the services available in all the network,
however, this provides a way to separate the inference endpoints.
This allows for instance to have a node which is both federated and
having attached a set of llama.cpp workers.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Fixes:
```
panic: invalid argument to IntN
goroutine 401 [running]:
math/rand/v2.(*Rand).IntN(...)
/home/mudler/_git/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.4.linux-amd64/src/math/rand/v2/rand.go:190
math/rand/v2.IntN(...)
/home/mudler/_git/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.4.linux-amd64/src/math/rand/v2/rand.go:307
github.com/mudler/LocalAI/core/cli.Proxy.func2()
/home/mudler/_git/LocalAI/core/cli/federated.go:104 +0x76e
created by github.com/mudler/LocalAI/core/cli.Proxy in goroutine 1
/home/mudler/_git/LocalAI/core/cli/federated.go:91 +0x3c5
```
When no nodes are found and something is trying to hit the federated
endpoint (and no tunnels are ready yet).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
This allows LocalAI to be less noisy avoiding to connect outside.
Needed if e.g. there is no plan into using p2p across separate networks.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Wip p2p enhancements
* get online state
* Pass-by token to show in the dashboard
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Style
* Minor fixups
* parametrize SearchID
* Refactoring
* Allow to expose/bind more services
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add federation
* Display federated mode in the WebUI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* make federated nodes visible from the WebUI
* Fix version display
* improve web page
* live page update
* visual enhancements
* enhancements
* visual enhancements
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama.cpp): Enable decentralized, distributed inference
As https://github.com/mudler/LocalAI/pull/2324 introduced distributed inferencing thanks to
@rgerganov implementation in https://github.com/ggerganov/llama.cpp/pull/6829 in upstream llama.cpp, now
it is possible to distribute the workload to remote llama.cpp gRPC server.
This changeset now uses mudler/edgevpn to establish a secure, distributed network between the nodes using a shared token.
The token is generated automatically when starting the server with the `--p2p` flag, and can be used by starting the workers
with `local-ai worker p2p-llama-cpp-rpc` by passing the token via environment variable (TOKEN) or with args (--token).
As per how mudler/edgevpn works, a network is established between the server and the workers with dht and mdns discovery protocols,
the llama.cpp rpc server is automatically started and exposed to the underlying p2p network so the API server can connect on.
When the HTTP server is started, it will discover the workers in the network and automatically create the port-forwards to the service locally.
Then llama.cpp is configured to use the services.
This feature is behind the "p2p" GO_FLAGS
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* go mod tidy
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: add p2p tag
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* better message
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>