As with CUDA builds, we don't need all the variants when we are
compiling against the accelerated variants. This saves space
and avoids exceeding the Go embedFS size limits.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(exllama): drop exllama backend
As part of polishing and cleaning up, it now makes sense to drop exllama, which
is completely unmaintained and only supported the llamav1
architecture (which has since been superseded by llamav2).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(gosec): fix CI
downgrade to latest known version of the gosec action
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* stash initial fixes, attempt to open branch inside container
Signed-off-by: Dave Lee <dave@gray101.com>
* add yq, from inside DC
Signed-off-by: Dave Lee <dave@gray101.com>
* stash progress, rebuild container
Signed-off-by: Dave Lee <dave@gray101.com>
* snap
Signed-off-by: Dave Lee <dave@gray101.com>
* split builder into builder-sd, will speed up devcontainer build times and potentially help caching in other situations.
Signed-off-by: Dave Lee <dave@gray101.com>
* fix yq
Signed-off-by: Dave Lee <dave@gray101.com>
* fix paths
Signed-off-by: Dave Lee <dave@gray101.com>
* fix paths - new folder to bypass the .dockerignore which _should_ exclude the other files
Signed-off-by: Dave Lee <dave@gray101.com>
* fix
Signed-off-by: Dave Lee <dave@gray101.com>
* fix ]
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Dave Lee <dave@gray101.com>
* fix(cuda): downgrade to 12.0 to increase compatibility range
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* improve messaging
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use a sed hack to jam a missing line in place for grpc's abseil version.
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Dave Lee <dave@gray101.com>
* feat(llama.cpp): Enable decentralized, distributed inference
https://github.com/mudler/LocalAI/pull/2324 introduced distributed inferencing, thanks to
@rgerganov's implementation in https://github.com/ggerganov/llama.cpp/pull/6829 in upstream llama.cpp, so
it is now possible to distribute the workload to remote llama.cpp gRPC servers.
This changeset uses mudler/edgevpn to establish a secure, distributed network between the nodes using a shared token.
The token is generated automatically when the server is started with the `--p2p` flag. Workers join the network when started
with `local-ai worker p2p-llama-cpp-rpc`, with the token passed via environment variable (TOKEN) or as an argument (--token).
Following how mudler/edgevpn works, a network is established between the server and the workers using the DHT and mDNS discovery protocols.
The llama.cpp RPC server is started automatically and exposed to the underlying p2p network so that the API server can connect to it.
When the HTTP server starts, it discovers the workers in the network and automatically creates local port-forwards to their services.
llama.cpp is then configured to use those services.
This feature is behind the "p2p" GO_FLAGS.
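
A minimal sketch of how a worker might resolve the shared token from either source described above. This is not the code in this changeset: `resolveToken` is a hypothetical helper, and giving `--token` precedence over `TOKEN` is an assumption made for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// resolveToken returns the shared network token.
// Hypothetical helper: it prefers the --token argument and falls back to
// the TOKEN environment variable. The precedence order is an assumption.
func resolveToken(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("worker", flag.ContinueOnError)
	token := fs.String("token", "", "shared p2p network token")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *token != "" {
		return *token, nil
	}
	if env := getenv("TOKEN"); env != "" {
		return env, nil
	}
	return "", errors.New("no token provided: set TOKEN or pass --token")
}

func main() {
	// Demo with a fixed argument list; a real worker would pass os.Args[1:].
	token, err := resolveToken([]string{"--token", "example-token"}, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("resolved a token of %d bytes\n", len(token))
}
```

Passing the environment lookup as a function keeps the helper testable without mutating the process environment.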
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* go mod tidy
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: add p2p tag
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* better message
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parler-tts): Add new backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parler-tts): try downgrade protobuf
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parler-tts): add parler conda env
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Revert "feat(parler-tts): try downgrade protobuf"
This reverts commit bd5941d5cfc00676b45a99f71debf3c34249cf3c.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* deps: add grpc
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: try to gen proto with same environment
* workaround
* Revert "fix: try to gen proto with same environment"
This reverts commit 998c745e2f.
* Workaround fixup
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Dave <dave@gray101.com>
* feat(build): adjust number of parallel make jobs
* fix: update make on macOS from brew to support --output-sync argument
* fix: cache grpc with version as part of key to improve validity of cache hits
* fix: use gmake for tests-apple to use the updated GNU make version
* fix: actually use the new make version for tests-apple
* feat: parallelize tests-extra
* feat: attempt to cache grpc build for docker images
* fix: don't quote GRPC version
* fix: don't cache go modules, we have limited cache space, better used elsewhere
* fix: release with the same version of go that we test with
* fix: don't fail on exporting cache layers
* fix: remove deprecated BUILD_GRPC docker arg from Makefile
* docs(aio): Add AIO images docs
* add image generation link to quickstart
* while reviewing I noticed this one link was missing, so quickly adding it.
Signed-off-by: Dave <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
* test with gguf instead of ggml. Updates testPrompt to match? Adds debugging line to Dockerfile that I've found helpful recently.
* fix testPrompt slightly
* Sad Experiment: Test GH runner without metal?
* break apart CGO_LDFLAGS
* switch runner
* upstream llama.cpp disables Metal on GitHub CI!
* missed a dir from clean-tests
* CGO_LDFLAGS
* tmate failure + NO_ACCELERATE
* whisper.cpp has a metal fix
* do the exact opposite of the name of this branch, but keep it around for unrelated fixes?
* add back newlines
* add tmate to linux for testing
* update fixtures
* timeout for tmate
* fix: clean up Makefile dependencies to allow for parallel builds
* refactor: remove old unused backend from Makefile
* fix: finish removing legacy backend, update piper
* fix: I broke llama... I fixed llama
* feat: give the tests and builds a few threads
* fix: ensure libraries are replaced before build, add dropreplace target
* Fix image build workflows