Commit Graph

116 Commits

Author SHA1 Message Date
fa5c98549a chore(refactor): track grpcProcess in the model structure (#3663)
* chore(refactor): track grpcProcess in the model structure

This avoids to have to handle in two parts the data relative to the same
model. It makes it easier to track and use mutex with.

This also fixes races conditions while accessing to the model.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): run protogen-go before starting aio tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): install protoc in aio tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-26 12:44:55 +02:00
a3d69872e3 feat(api): list loaded models in /system (#3661)
feat(api): list loaded models in /system

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-25 18:00:23 +02:00
a53392f919 chore(refactor): drop duplicated shutdown logics (#3589)
* chore(refactor): drop duplicated shutdown logics

- Handle locking in Shutdown and CheckModelIsLoaded in a more go-idiomatic way
- Drop duplicated code and re-organize shutdown code

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: drop leftover

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: improve logging and add missing locks

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-17 16:51:40 +02:00
d0f2bf3181 fix(shutdown): do not shutdown immediately busy backends (#3543)
* fix(shutdown): do not shutdown immediately busy backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(refactor): avoid duplicate functions

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: multiplicative backoff for shutdown (#3547)

* multiplicative backoff for shutdown

Rather than always retry every two seconds, back off the shutdown attempt rate? 

Signed-off-by: Dave <dave@gray101.com>

* Update loader.go

Signed-off-by: Dave <dave@gray101.com>

* add clamp of 2 minutes

Signed-off-by: Dave Lee <dave@gray101.com>

---------

Signed-off-by: Dave <dave@gray101.com>
Signed-off-by: Dave Lee <dave@gray101.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave <dave@gray101.com>
Signed-off-by: Dave Lee <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
2024-09-17 04:50:57 +00:00
791c3ace72 feat: add endpoint to list system informations (#3449)
* feat: add endpoint to list system informations

For now, it lists the available backends, but can be expanded later on
to include more system informations (such as GPU devices detected, RAM,
threads configured, and so on so forth).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* show also external backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add test

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-05 20:44:30 +02:00
607fd066f0 chore(model-loader): increase test coverage of model loader (#3433)
chore(model-loader): increase coverage of model loader

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-08-30 15:20:39 +02:00
7f06954425 fix(model-loading): keep track of open GRPC Clients (#3377)
Due to a previous refactor we moved the client constructor tight to the
model address, however that was just a string which we would use to
build the client each time.

With this change we make the loader to return a *Model which carries a
constructor for the client and stores the client on the first
connection.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-08-25 14:36:09 +02:00
ac5f6f210b feat: external backend launching log improvements and relative path support (#3348)
* specify workdir when launching external backend for safety / relative paths, bump version, logs

Signed-off-by: Dave Lee <dave@gray101.com>

* sneak in a devcontainer fix

Signed-off-by: Dave Lee <dave@gray101.com>

---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-08-24 00:27:14 +02:00
8814b31805 chore: drop gpt4all.cpp (#3106)
chore: drop gpt4all

gpt4all is already supported in llama.cpp - the backend was kept for
keeping compatibility with old gpt4all models (prior to gguf format).

It is good time now to clean up and remove it to slim the compilation
process.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-08-07 23:35:55 +02:00
a9757fb057 fix(cuda): downgrade to 12.0 to increase compatibility range (#2994)
* fix(cuda): downgrade to 12.0 to increase compatibility range

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* improve messaging

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-23 23:35:31 +02:00
35d55572ac fix: do not list txt files as potential models (#2910)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-18 14:44:44 +02:00
642f6cee75 feat(webui): show also models without a config in the welcome page (#2772)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-11 19:55:01 +02:00
59ef426fbf feat(model-list): be consistent, skip known files from listing (#2760)
fix(model-list): be consistent, skip known files from listing

This changeset does two things:

- Removes the dependency of listing models from the OpenAI schema.
- Tries to reduce confusion between ListModels() in model loader and in
  the service - now there is only one ListModels which is in services
and does not depend anymore on the OpenAI schema
- The OpenAI-schema functions were moved nearby the OpenAI specific
  endpoints that needs the schema
- Drops the ListModel Service structure as there was no real need for
  it.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-10 15:28:39 +02:00
e591ff2e74 fix(initializer): do select backends that exist (#2694)
we were not checking if the binary exists before picking these up from
the asset dir.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-01 22:50:36 +02:00
bd2f95c130 feat(backend): fallback with autodetect (#2693)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-01 18:11:04 +02:00
a181dd0ebc refactor: gallery inconsistencies (#2647)
* refactor(gallery): move under core/

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(unarchive): do not allow symlinks

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-24 17:32:12 +02:00
12513ebae0 rf: centralize base64 image handling (#2595)
contains simple fixes to warnings and errors, removes a broken / outdated test, runs go mod tidy, and as the actual change, centralizes base64 image handling

Signed-off-by: Dave Lee <dave@gray101.com>
2024-06-24 08:34:36 +02:00
5866fc8ded chore: fix go.mod module (#2635)
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-06-23 08:24:36 +00:00
89a11e15e7 fix(single-binary): bundle ld.so (#2602)
* debug

* fix copy command/silly muscle memory

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* remove tmate

* Debugging

* Start binary with ld.so if present in libdir

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* small refactor

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-18 22:43:43 +02:00
596cf76135 build(intel): bundle intel variants in single-binary (#2494)
* wip: try to build also intel variants

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add dependencies

* Select automatically intel backend

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-06 08:40:51 +02:00
d072835796 feat:OpaqueErrors to hide error information (#2486)
* adds a new configuration option to hide all error message information from http requests
---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-06-05 08:45:24 +02:00
17cf6c4a4d feat(amdgpu): try to build in single binary (#2485)
* feat(amdgpu): try to build in single binary

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Release space from worker

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-05 08:44:15 +02:00
1a3dedece0 dependencies(grpcio): bump to fix CI issues (#2362)
feat(grpcio): bump to fix CI issues

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-21 14:33:47 +02:00
c89271b2e4 feat(llama.cpp): add distributed llama.cpp inferencing (#2324)
* feat(llama.cpp): support distributed llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: let tweak how chat messages are merged together

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Makefile: register to ALL_GRPC_BACKENDS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring, allow disable auto-detection of backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* minor fixups

Signed-off-by: mudler <mudler@localai.io>

* feat: add cmd to start rpc-server from llama.cpp

Signed-off-by: mudler <mudler@localai.io>

* ci: add ccache

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>
2024-05-15 01:17:02 +02:00
a670318a9f feat: auto select llama-cpp cuda runtime (#2306)
* auto select cpu variant

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* remove cuda target for now

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix metal

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix path

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* cuda

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* auto select cuda

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* update test

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* select CUDA backend only if present

Signed-off-by: mudler <mudler@localai.io>

* ci: keep cuda bin in path

Signed-off-by: mudler <mudler@localai.io>

* Makefile: make dist now builds also cuda

Signed-off-by: mudler <mudler@localai.io>

* Keep pushing fallback in case auto-flagset/nvidia fails

There could be other reasons for which the default binary may fail. For example we might have detected an Nvidia GPU,
however the user might not have the drivers/cuda libraries installed in the system, and so it would fail to start.

We keep the fallback of llama.cpp at the end of the llama.cpp backends to try to fallback loading in case things go wrong

Signed-off-by: mudler <mudler@localai.io>

* Do not build cuda on MacOS

Signed-off-by: mudler <mudler@localai.io>

* cleanup

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: mudler <mudler@localai.io>
2024-05-14 19:40:18 +02:00
e2c3ffb09b feat: auto select llama-cpp cpu variant (#2305)
* auto select cpu variant

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* remove cuda target for now

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix metal

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix path

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

---------

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-05-13 11:37:52 +02:00
b69ff46c7e feat(startup): show CPU/GPU information with --debug (#2241)
Signed-off-by: mudler <mudler@localai.io>
2024-05-05 09:10:23 +02:00
530bec9c64 feat(llama.cpp): do not specify backends to autoload and add llama.cpp variants (#2232)
* feat(initializer): do not specify backends to autoload

We can simply try to autoload the backends extracted in the asset dir.
This will allow to build variants of the same backend (for e.g. with different instructions sets),
so to have a single binary for all the variants.

Signed-off-by: mudler <mudler@localai.io>

* refactor(prepare): refactor out llama.cpp prepare steps

Make it so are idempotent and that we can re-build

Signed-off-by: mudler <mudler@localai.io>

* [TEST] feat(build): build noavx version along

Signed-off-by: mudler <mudler@localai.io>

* build: make build parallel

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* build: do not override CMAKE_ARGS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* build: add fallback variant

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(huggingface-langchain): fail if no token is set

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(huggingface-langchain): rename

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: do not autoload local-store

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: give priority between the listed backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: mudler <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-04 17:56:12 +02:00
11c48a0004 fix: security scanner warning noise: error handlers part 2 (#2145)
check off a few more error handlers

Signed-off-by: Dave Lee <dave@gray101.com>
2024-04-29 15:11:42 +02:00
e8d44447ad feat(gallery): support model deletion (#2173)
* feat(gallery): op now supports deletion of models

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Wire things with WebUI(WIP)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* minor improvements

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-28 23:42:46 +02:00
c8dd8e5ef4 fix: reduce chmod permissions for created files and directories (#2137)
quiet more security scanner issues: pass one of chmod restriction to remove group and other permissions

Signed-off-by: Dave Lee <dave@gray101.com>
2024-04-26 00:47:06 +02:00
48d0aa2f6d models(gallery): add new models to the gallery (#2124)
* models: add reranker and parler-tts-mini

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: chatml im_end should not have a newline

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* models(noromaid): add

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* models(llama3): add 70b, add dolphin2.9

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* models(llama3): add unholy-8b

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* models(llama3): add therapyllama3, aura

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-25 01:28:02 +02:00
b2772509b4 models(llama3): add llama3 to embedded models (#2074)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-19 18:23:44 +02:00
27ec84827c refactor(template): isolate and add tests (#2069)
* refactor(template): isolate and add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
2024-04-19 02:40:18 +00:00
bbea62b907 feat(functions): support models with no grammar, add tests (#2068)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-18 22:43:12 +02:00
f9c75d4878 tests: add template tests (#2063)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-18 10:57:24 +02:00
af9e5a2d05 Revert #1963 (#2056)
* Revert "fix(fncall): fix regression introduced in #1963 (#2048)"

This reverts commit 6b06d4e0af.

* Revert "fix: action-tmate back to upstream, dead code removal (#2038)"

This reverts commit fdec8a9d00.

* Revert "feat(grpc): return consumed token count and update response accordingly (#2035)"

This reverts commit e843d7df0e.

* Revert "refactor: backend/service split, channel-based llm flow (#1963)"

This reverts commit eed5706994.

* feat(grpc): return consumed token count and update response accordingly

Fixes: #1920

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-17 23:33:49 +02:00
eed5706994 refactor: backend/service split, channel-based llm flow (#1963)
Refactor: channel based llm flow and services split

---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-04-13 09:45:34 +02:00
b85dad0286 feat: first pass at improving logging (#1956)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-04 09:24:22 +02:00
bd25d8049c fix(watchdog): use ShutdownModel instead of StopModel (#1882)
Fixes #1760
2024-03-23 16:19:57 +01:00
643d85d2cc feat(stores): Vector store backend (#1795)
Add simple vector store backend

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2024-03-22 21:14:04 +01:00
e533dcf506 feat(functions/aio): all-in-one images, function template enhancements (#1862)
* feat(startup): allow to specify models from local files

* feat(aio): add Dockerfile, make targets, aio profiles

* feat(template): add Function and LastMessage

* add hermes2-pro-mistral

* update hermes2 definition

* feat(template): add sprig

* feat(template): expose FunctionCall

* feat(aio): switch llm for text
2024-03-21 01:12:20 +01:00
88b65f63d0 fix(go-llama): use llama-cpp as default (#1849)
* fix(go-llama): use llama-cpp as default

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* fix(backends): drop obsoleted lines

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-03-17 23:08:22 +01:00
5d1018495f feat(intel): add diffusers/transformers support (#1746)
* feat(intel): add diffusers support

* try to consume upstream container image

* Debug

* Manually install deps

* Map transformers/hf cache dir to modelpath if not specified

* fix(compel): update initialization, pass by all gRPC options

* fix: add dependencies, implement transformers for xpu

* base it from the oneapi image

* Add pillow

* set threads if specified when launching the API

* Skip conda install if intel

* defaults to non-intel

* ci: add to pipelines

* prepare compel only if enabled

* Skip conda install if intel

* fix cleanup

* Disable compel by default

* Install torch 2.1.0 with Intel

* Skip conda on some setups

* Detect python

* Quiet output

* Do not override system python with conda

* Prefer python3

* Fixups

* exllama2: do not install without conda (overrides pytorch version)

* exllama/exllama2: do not install if not using cuda

* Add missing dataset dependency

* Small fixups, symlink to python, add requirements

* Add neural_speed to the deps

* correctly handle model offloading

* fix: device_map == xpu

* go back at calling python, fixed at dockerfile level

* Exllama2 restricted to only nvidia gpus

* Tokenizer to xpu
2024-03-07 14:37:45 +01:00
c72808f18b feat(tools): support Tool calls in the API (#1715)
* feat(tools): support Tools in the API

Co-authored-by: =?UTF-8?q?Stephan=20A=C3=9Fmus?= <stephan.assmus@sap.com>

* feat(tools): support function streaming

* Adhere to new return types when using tools instead of functions

* Keep backward compatibility with function calling

* Evaluate function names in chat templates

* Disable recovery with --debug

* Correctly stream out the entire result

* Detect when llm chooses to reply and to not perform any action in SSE

* Feedback from code review

---------

Co-authored-by: =?UTF-8?q?Stephan=20A=C3=9Fmus?= <stephan.assmus@sap.com>
2024-02-17 10:00:34 +01:00
ddd21f1644 feat: Use ubuntu as base for container images, drop deprecated ggml-transformers backends (#1689)
* cleanup backends

* switch image to ubuntu 22.04

* adapt commands for ubuntu

* transformers cleanup

* no contrib on ubuntu

* Change test model to gguf

* ci: disable bark tests (too cpu-intensive)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cleanup

* refinements

* use intel base image

* Makefile: Add docker targets

* Change test model

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-08 20:12:51 +01:00
98ad93d53e Drop ggml-based gpt2 and starcoder (supported by llama.cpp) (#1679)
* Drop ggml-based gpt2 and starcoder (supported by llama.cpp)

* Update compatibility table
2024-02-04 13:15:51 +01:00
df13ba655c Drop old falcon backend (deprecated) (#1675)
Drop old falcon backend
2024-02-03 13:01:13 +01:00
d5d82ba344 feat(grpc): backend SPI pluggable in embedding mode (#1621)
* run server

* grpc backend embedded support

* backend providable
2024-01-23 08:56:36 +01:00
e19d7226f8 feat: more embedded models, coqui fixes, add model usage and description (#1556)
* feat: add model descriptions and usage

* remove default model gallery

* models: add embeddings and tts

* docs: update table

* docs: updates

* images: cleanup pip cache after install

* images: always run apt-get clean

* ux: improve gRPC connection errors

* ux: improve some messages

* fix: fix coqui when no AudioPath is passed by

* embedded: add more models

* Add usage

* Reorder table
2024-01-08 00:37:02 +01:00