Commit Graph

107 Commits

Author SHA1 Message Date
Ettore Di Giacinto
548959b50f
feat: queue up requests if not running parallel requests ()
Return a GRPC which handles a lock in case it is not meant to be
parallel.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-16 22:20:16 +01:00
Ettore Di Giacinto
fdd95d1d86
feat: allow to run parallel requests ()
* feat: allow to run parallel requests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixup

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-16 08:20:05 +01:00
Ettore Di Giacinto
0eae727366
🔥 add LaVA support and GPT vision API, Multiple requests for llama.cpp, return JSON types ()
* wip

* wip

* Make it functional

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* wip

* Small fixups

* do not inject space on role encoding, encode img at beginning of messages

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add examples/config defaults

* Add include dir of current source dir

* cleanup

* fixes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

* Revert "fixups"

This reverts commit f1a4731cca.

* fixes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-11 13:14:59 +01:00
Ettore Di Giacinto
c62504ac92
cleanup: drop bloomz and ggllm as now supported by llama.cpp ()
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-10-26 07:43:31 +02:00
Ettore Di Giacinto
128694213f
feat: llama.cpp gRPC C++ backend ()
* wip: llama.cpp c++ gRPC server

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make it work, attach it to the build process

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* update deps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: add protobuf dep

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* try fix protobuf on cmake

* cmake: workarounds

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add packages

* cmake: use fixed version of grpc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cmake(grpc): install locally

* install grpc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* install required deps for grpc on debian bullseye

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* debug

* debug

* Fixups

* no need to install cmake manually

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: fixup macOS

* use brew whenever possible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* macOS fixups

* debug

* fix container build

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* workaround

* try mac

https://stackoverflow.com/questions/23905661/on-mac-g-clang-fails-to-search-usr-local-include-and-usr-local-lib-by-def

* Disable temp. arm64 docker image builds

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-10-16 21:46:29 +02:00
Dave
10b0e13882
feat: backend monitor shutdown endpoint, process based ()
This PR adds a new endpoint to the backend monitor section
`/backend/shutdown` which terminates the grpc process for the related
model.
2023-08-23 18:38:37 +02:00
Ettore Di Giacinto
ab5b75eb01
feat: add llama-stable backend ()
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-20 16:35:42 +02:00
Ettore Di Giacinto
afdc0ebfd7
feat: add --single-active-backend to allow only one backend active at the time ()
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-19 01:49:33 +02:00
Dave
8cb1061c11
Usage Features () 2023-08-18 21:23:14 +02:00
Ettore Di Giacinto
0ec695f9e4
feat: make initializer accept gRPC delay times ()
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-16 01:11:32 +02:00
Ettore Di Giacinto
8c781a6a44
feat: Add Diffusers ()
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-09 08:38:51 +02:00
Ettore Di Giacinto
39805b09e5 fix: pass by env in managed services
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-08 00:58:38 +02:00
Ettore Di Giacinto
a843e64fc2 feat: add initial AutoGPTQ backend implementation 2023-08-07 22:53:28 +02:00
Dave
7fb8b4191f
feat: "simple" chat/edit/completion template system prompt from config () 2023-08-03 00:19:55 +02:00
Dave
ce8e9dc690
feature: model list :: filter query string parameter () 2023-07-31 19:14:32 +02:00
Ettore Di Giacinto
569c1d1163
feat: add rope settings and negative prompt, drop grammar backend ()
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-25 19:05:27 +02:00
Aman Gupta Karmani
12fe0932c4
feat: cancel stream generation if client disappears () 2023-07-24 23:10:54 +02:00
Dave
c6bf67f446
feat(llama2): add template for chat messages ()
Co-authored-by: Aman Karmani <aman@tmm1.net>

Lays some of the groundwork for LLAMA2 compatibility as well as other future models with complex prompting schemes.

Started small refactoring in pkg/model/loader.go regarding template loading. Currently still a part of ModelLoader, but should be easy to add template loading for situations other than overall prompt templates and the new chat-specific per-message templates
Adds support for new chat-endpoint-specific, per-message templates as an alternative to the existing Role: XYZ sprintf method.
Includes a temporary prompt template as an example, since I have a few questions before we merge in the model-gallery side changes (see )
Minor debug logging changes.
2023-07-22 11:31:39 -04:00
Ettore Di Giacinto
c71c729bc2 debug 2023-07-21 10:53:26 +02:00
Ettore Di Giacinto
94916749c5 feat: add external grpc and model autoloading 2023-07-20 22:10:12 +02:00
Ettore Di Giacinto
47cc95fc9f feat: add all backends to autoload
Now since gRPCs are not crashing the main thread we can just greedly
attempt all the backends we have available.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-20 00:40:28 +02:00
Ettore Di Giacinto
3feb632eb4
refactor: rename "llama-master" and "llama" ()
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-20 00:36:16 +02:00
Ettore Di Giacinto
6352448b72
feat: add llama-master backend ()
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-17 23:58:15 +02:00
Ettore Di Giacinto
1d0ed95a54 feat: move other backends to grpc
This finally makes everything more consistent

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
f2f1d7fe72 feat: use gRPC for transformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
ae533cadef feat: move gpt4all to a grpc service
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
58f6aab637 feat: move llama to a grpc
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
b816009db0 feat: add falcon ggllm via grpc client
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
a7bb029d23
feat: add tts with go-piper ()
Signed-off-by: mudler <mudler@localai.io>
2023-06-22 17:53:10 +02:00
Ettore Di Giacinto
e37361985c
deps: update gpt4all bindings, fix search path on new versions () 2023-06-14 13:24:53 +02:00
Ettore Di Giacinto
d62aef2016
feat: add experimental support for falcon-7b ()
Signed-off-by: mudler <mudler@mocaccino.org>
2023-06-06 17:23:19 +02:00
Ettore Di Giacinto
78ad4813df
feat: Update gpt4all, support multiple implementations in runtime ()
Signed-off-by: mudler <mudler@mocaccino.org>
2023-06-01 23:38:52 +02:00
Pavel Zloi
3ba07a5928
feat: add LangChainGo Huggingface backend ()
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-06-01 12:00:06 +02:00
Ettore Di Giacinto
9decd0813c
feat: update go-gpt2 ()
Signed-off-by: mudler <mudler@mocaccino.org>
2023-05-23 21:47:47 +02:00
Ettore Di Giacinto
cc9aa9eb3f
feat: add /models/apply endpoint to prepare models () 2023-05-18 15:59:03 +02:00
Ettore Di Giacinto
9d051c5d4f
feat: add image generation with ncnn-stablediffusion () 2023-05-16 19:32:53 +02:00
Ettore Di Giacinto
2a9d7474ce
fix(rwkv): load tokenizer file from model path () 2023-05-14 17:49:10 +02:00
Ettore Di Giacinto
8250391e49
Add support for gptneox/replit () 2023-05-12 11:36:35 +02:00
Ettore Di Giacinto
4413defca5
feat: add starcoder () 2023-05-11 20:20:07 +02:00
Ettore Di Giacinto
85f0f8227d
refactor: drop code dups () 2023-05-11 16:34:16 +02:00
Ettore Di Giacinto
59e3c02002
make use of new bindings for gpt4all () 2023-05-11 14:31:19 +02:00
Matthew Campbell
032dee256f
Keep whisper models in memory () 2023-05-11 14:05:07 +02:00
Ettore Di Giacinto
11675932ac
feat: add dolly/redpajama/bloomz models support () 2023-05-11 01:12:58 +02:00
Ettore Di Giacinto
f8ee20991c
feat: add bert.cpp embeddings () 2023-05-10 15:20:21 +02:00
Ettore Di Giacinto
c839b334eb
feat: add embeddings for go-llama.cpp backend () 2023-05-05 11:20:06 +02:00
Ettore Di Giacinto
714bfcd45b
fix: missing returning error and free callback stream () 2023-05-04 19:49:43 +02:00
Ettore Di Giacinto
751b7eca62
feat: add rwkv support ()
Signed-off-by: mudler <mudler@mocaccino.org>
2023-05-03 11:45:22 +02:00
Ettore Di Giacinto
1ae7150810
feat: allow to specify default backend for model ()
Signed-off-by: mudler <mudler@c3os.io>
2023-05-03 00:31:28 +02:00
Ettore Di Giacinto
156e15a4fa
Bump llama.cpp, downgrade gpt4all-j () 2023-05-02 16:07:18 +02:00
Ettore Di Giacinto
92452d46da
feat: add new gpt4all-j binding () 2023-05-01 20:00:15 +02:00
Ettore Di Giacinto
c806eae0de
feat: config files and SSE ()
Signed-off-by: mudler <mudler@mocaccino.org>
Signed-off-by: Tyler Gillson <tyler.gillson@gmail.com>
Co-authored-by: Tyler Gillson <tyler.gillson@gmail.com>
2023-04-26 21:18:18 -07:00
Ettore Di Giacinto
f816dfae65
Add support for stablelm ()
Signed-off-by: mudler <mudler@mocaccino.org>
2023-04-21 00:06:55 +02:00
Ettore Di Giacinto
1c4fbaae20
Add support for cerebras ()
Signed-off-by: mudler <mudler@c3os.io>
2023-04-20 19:33:36 +02:00
Ettore Di Giacinto
d517a54e28
Major API enhancements () 2023-04-20 18:33:02 +02:00
Ettore Di Giacinto
7fec26f5d3
Enhancements ()
Signed-off-by: mudler <mudler@c3os.io>
2023-04-19 17:10:29 +02:00
mudler
5556aa46dd Small refinements and refactors 2023-04-12 00:02:39 +02:00
mudler
ae30bd346d Reorganize repository layout 2023-04-11 23:43:43 +02:00