Ettore Di Giacinto
8ccf5b2044
feat(speculative-sampling): allow to specify a draft model in the model config ( #1052 )
...
**Description**
This PR fixes #1013 .
It adds `draft_model` and `n_draft` to the model YAML config in order to
load models with speculative sampling. This should be compatible as well
with grammars.
example:
```yaml
backend: llama
context_size: 1024
name: my-model-name
parameters:
model: foo-bar
n_draft: 16
draft_model: model-name
```
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-09-14 17:44:16 +02:00
Ettore Di Giacinto
dc307a1cc0
feat: add vall-e-x ( #1007 )
...
**Description**
This PR fixes #985
**Notes for Reviewers**
**[Signed
commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [ ] Yes, I signed my commits.
<!--
Thank you for contributing to LocalAI!
Contributing Conventions:
1. Include descriptive PR titles with [<component-name>] prepended.
2. Build and test your changes before submitting a PR.
3. Sign your commits
By following the community's contribution conventions upfront, the
review process will
be accelerated and your PR merged more quickly.
-->
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-09-04 19:25:23 +02:00
Dave
005f289632
feat: Model Gallery Endpoint Refactor / Mutable Galleries Endpoints ( #991 )
...
refactor for model gallery endpoints - bundle up resources into a
struct, make galleries mutable with some crud endpoints. This is
groundwork required for making efficient use of the new scraper - while
that PR isn't _quite_ ready yet, the goal is to have more, individually
smaller gallery files. Therefore, rather than requiring a full localai
service restart, these new endpoints have been added to make life
easier.
- Adds endpoints to add, list and remove model galleries at runtime
- Adds these endpoints to the Insomnia config
- Minor fix: loading file urls follows symbolic links now
2023-09-02 09:00:44 +02:00
Ettore Di Giacinto
3bab307904
fix(llama): resolve lora adapters correctly from the model file ( #964 )
...
**Description**
we were otherwise expecting absolute paths. this make it relative to the
model file (as someone would expect)
**Notes for Reviewers**
**[Signed
commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [ ] Yes, I signed my commits.
<!--
Thank you for contributing to LocalAI!
Contributing Conventions:
1. Include descriptive PR titles with [<component-name>] prepended.
2. Build and test your changes before submitting a PR.
3. Sign your commits
By following the community's contribution conventions upfront, the
review process will
be accelerated and your PR merged more quickly.
-->
2023-08-27 10:11:32 +02:00
Ettore Di Giacinto
44bc7aa3d0
feat: Allow to load lora adapters for llama.cpp ( #955 )
...
**Description**
This PR fixes #
**Notes for Reviewers**
**[Signed
commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [ ] Yes, I signed my commits.
<!--
Thank you for contributing to LocalAI!
Contributing Conventions:
1. Include descriptive PR titles with [<component-name>] prepended.
2. Build and test your changes before submitting a PR.
3. Sign your commits
By following the community's contribution conventions upfront, the
review process will
be accelerated and your PR merged more quickly.
-->
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-25 21:58:46 +02:00
Ettore Di Giacinto
1120847f72
feat: bump llama.cpp, add gguf support ( #943 )
...
**Description**
This PR syncs up the `llama` backend to use `gguf`
(https://github.com/go-skynet/go-llama.cpp/pull/180 ). It also adds
`llama-stable` to the targets so we can still load ggml. It adapts the
current tests to use the `llama-backend` for ggml and uses a `gguf`
model to run tests on the new backend.
In order to consume the new version of go-llama.cpp, it also bump go to
1.21 (images, pipelines, etc)
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-24 01:18:58 +02:00
Dave
10b0e13882
feat: backend monitor shutdown endpoint, process based ( #938 )
...
This PR adds a new endpoint to the backend monitor section
`/backend/shutdown` which terminates the grpc process for the related
model.
2023-08-23 18:38:37 +02:00
Dave
901f0709c5
Feat: rwkv improvements: ( #937 )
2023-08-22 18:48:06 +02:00
Ettore Di Giacinto
ab5b75eb01
feat: add llama-stable backend ( #932 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-20 16:35:42 +02:00
Ettore Di Giacinto
cc060a283d
fix: drop racy code, refactor and group API schema ( #931 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-20 14:04:45 +02:00
Ettore Di Giacinto
afdc0ebfd7
feat: add --single-active-backend to allow only one backend active at the time ( #925 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-19 01:49:33 +02:00
Ettore Di Giacinto
1079b18ff7
feat(diffusers): be consistent with pipelines, support also depthimg2img ( #926 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-18 22:06:24 +02:00
Dave
8cb1061c11
Usage Features ( #863 )
2023-08-18 21:23:14 +02:00
Ettore Di Giacinto
2bacd0180d
feat(diffusers): add img2img and clip_skip, support more kernels schedulers ( #906 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-17 23:38:59 +02:00
Ettore Di Giacinto
37700f2d98
feat(diffusers): add DPMSolverMultistepScheduler++, DPMSolverMultistepSchedulerSDE++, guidance_scale ( #903 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-16 01:11:42 +02:00
Ettore Di Giacinto
0ec695f9e4
feat: make initializer accept gRPC delay times ( #900 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-16 01:11:32 +02:00
Ettore Di Giacinto
a96c3bc885
feat(diffusers): various enhancements ( #895 )
2023-08-14 23:12:00 +02:00
Ettore Di Giacinto
8c781a6a44
feat: Add Diffusers ( #874 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-09 08:38:51 +02:00
Ettore Di Giacinto
3c8fc37c56
feat: Add UseFastTokenizer
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-08 01:10:05 +02:00
Ettore Di Giacinto
39805b09e5
fix: pass by env in managed services
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-08 00:58:38 +02:00
Ettore Di Giacinto
63b01199fe
fix: match lowercase of the input, not of the model
2023-08-08 00:46:22 +02:00
Ettore Di Giacinto
a843e64fc2
feat: add initial AutoGPTQ backend implementation
2023-08-07 22:53:28 +02:00
Ettore Di Giacinto
acd829a7a0
fix: do not break on newlines on function returns ( #864 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-04 21:46:36 +02:00
Ettore Di Giacinto
4aa5dac768
feat: update integer, number and string rules - allow primitives as root types ( #862 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-03 23:32:30 +02:00
Ettore Di Giacinto
5ca21ee398
feat: add ngqa and RMSNormEps parameters ( #860 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-03 00:51:08 +02:00
Dave
7fb8b4191f
feat: "simple" chat/edit/completion template system prompt from config ( #856 )
2023-08-03 00:19:55 +02:00
Ettore Di Giacinto
d603a9cbb5
fix(gallery): preload from file should by in YAML format ( #846 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-31 21:13:16 +02:00
Dave
ce8e9dc690
feature: model list :: filter query string parameter ( #830 )
2023-07-31 19:14:32 +02:00
Dave
8e8d474ae8
refactor: Remove remaining uses of depreciated package io/ioutil
( #837 )
2023-07-30 11:23:43 +00:00
Dave
5ce0f216cf
Fix: Model Gallery Downloads ( #835 )
2023-07-30 09:47:22 +02:00
Ettore Di Giacinto
00ccb8d4f1
fix: set default rope freq base to 10000 during model load
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-29 10:40:56 +02:00
Dave
8b90ac2b1a
1000 -> 10,000 for ropeFreqBase?
...
the error message talks about a default of 10k, so setting this to 10k instead of 1k experimentally.
2023-07-29 02:37:24 -04:00
Ettore Di Giacinto
f085baa77d
fix: set default rope if not specified
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-29 01:07:16 +02:00
Ettore Di Giacinto
096d98c3d9
fix: add rope settings during model load, fix CUDA ( #821 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-27 21:56:05 +02:00
Ettore Di Giacinto
b96e30e66c
fix: use bytes in gRPC proto instead of strings ( #813 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-27 18:41:04 +02:00
Ettore Di Giacinto
569c1d1163
feat: add rope settings and negative prompt, drop grammar backend ( #797 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-25 19:05:27 +02:00
Aman Gupta Karmani
12fe0932c4
feat: cancel stream generation if client disappears ( #792 )
2023-07-24 23:10:54 +02:00
Dave
c6bf67f446
feat(llama2): add template for chat messages ( #782 )
...
Co-authored-by: Aman Karmani <aman@tmm1.net>
Lays some of the groundwork for LLAMA2 compatibility as well as other future models with complex prompting schemes.
Started small refactoring in pkg/model/loader.go regarding template loading. Currently still a part of ModelLoader, but should be easy to add template loading for situations other than overall prompt templates and the new chat-specific per-message templates
Adds support for new chat-endpoint-specific, per-message templates as an alternative to the existing Role: XYZ sprintf method.
Includes a temporary prompt template as an example, since I have a few questions before we merge in the model-gallery side changes (see )
Minor debug logging changes.
2023-07-22 11:31:39 -04:00
Ettore Di Giacinto
c71c729bc2
debug
2023-07-21 10:53:26 +02:00
Ettore Di Giacinto
94916749c5
feat: add external grpc and model autoloading
2023-07-20 22:10:12 +02:00
Ettore Di Giacinto
47cc95fc9f
feat: add all backends to autoload
...
Now since gRPCs are not crashing the main thread we can just greedly
attempt all the backends we have available.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-20 00:40:28 +02:00
Ettore Di Giacinto
3feb632eb4
refactor: rename "llama-master" and "llama" ( #776 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-20 00:36:16 +02:00
Ettore Di Giacinto
236497e331
feat: resolve JSONSchema refs (planners) ( #774 )
2023-07-19 22:56:13 +02:00
Ettore Di Giacinto
6352448b72
feat: add llama-master backend ( #752 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-17 23:58:15 +02:00
Ettore Di Giacinto
1d0ed95a54
feat: move other backends to grpc
...
This finally makes everything more consistent
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
5dcfdbe51d
feat: various refactorings
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
f2f1d7fe72
feat: use gRPC for transformers
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
ae533cadef
feat: move gpt4all to a grpc service
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
58f6aab637
feat: move llama to a grpc
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
b816009db0
feat: add falcon ggllm via grpc client
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
mudler
55befe396a
Add grammar_json to the request parameters to facilitate JSON generation
2023-07-06 19:08:04 +02:00
mudler
c0578031b5
Add tests
...
Signed-off-by: mudler <mudler@localai.io>
2023-07-04 18:58:19 +02:00
mudler
b722e7eb7e
feat: cleanups, small enhancements
...
Signed-off-by: mudler <mudler@localai.io>
2023-07-04 18:58:19 +02:00
mudler
f09ddd2983
feat: add grammar and functions call support
2023-07-04 18:58:19 +02:00
Luis López
a6839fd238
feat: [whisper] Partial support for verbose_json format in transcribe endpoint ( #721 )
2023-07-04 14:31:31 +02:00
Ettore Di Giacinto
bf5acf646e
fix: adapt whisper to bindings updates ( #702 )
...
Signed-off-by: mudler <mudler@localai.io>
2023-06-29 11:26:07 +02:00
Ettore Di Giacinto
78f3c3da48
refactor: consolidate usage of GetURI ( #674 )
...
Signed-off-by: mudler <mudler@localai.io>
2023-06-26 12:25:38 +02:00
mudler
d18f85df46
fix: add tags
...
Signed-off-by: mudler <mudler@localai.io>
2023-06-25 23:03:58 +02:00
Ettore Di Giacinto
6213da330a
fix: add omitempty where needed ( #671 )
2023-06-25 22:51:02 +02:00
Ettore Di Giacinto
60db5957d3
Gallery repository ( #663 )
...
Signed-off-by: mudler <mudler@localai.io>
2023-06-24 08:18:17 +02:00
Ettore Di Giacinto
a7bb029d23
feat: add tts with go-piper ( #649 )
...
Signed-off-by: mudler <mudler@localai.io>
2023-06-22 17:53:10 +02:00
Ettore Di Giacinto
e37361985c
deps: update gpt4all bindings, fix search path on new versions ( #592 )
2023-06-14 13:24:53 +02:00
Ettore Di Giacinto
84946e9275
feat: display download progress when installing models ( #543 )
2023-06-08 21:33:18 +02:00
Ettore Di Giacinto
d62aef2016
feat: add experimental support for falcon-7b ( #516 )
...
Signed-off-by: mudler <mudler@mocaccino.org>
2023-06-06 17:23:19 +02:00
Ettore Di Giacinto
b447a2a719
feat: support upscaled image generation with esrgan ( #509 )
2023-06-05 17:21:38 +02:00
Ettore Di Giacinto
78ad4813df
feat: Update gpt4all, support multiple implementations in runtime ( #472 )
...
Signed-off-by: mudler <mudler@mocaccino.org>
2023-06-01 23:38:52 +02:00
Pavel Zloi
3ba07a5928
feat: add LangChainGo Huggingface backend ( #446 )
...
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-06-01 12:00:06 +02:00
Ettore Di Giacinto
9decd0813c
feat: update go-gpt2 ( #359 )
...
Signed-off-by: mudler <mudler@mocaccino.org>
2023-05-23 21:47:47 +02:00
Ettore Di Giacinto
05a3d569b0
feat: allow to override model config ( #323 )
2023-05-20 17:03:53 +02:00
Ettore Di Giacinto
1fade53a61
feat: minor enhancements to /models/apply ( #297 )
2023-05-19 08:31:11 +02:00
Ettore Di Giacinto
cc9aa9eb3f
feat: add /models/apply endpoint to prepare models ( #286 )
2023-05-18 15:59:03 +02:00
Ettore Di Giacinto
9d051c5d4f
feat: add image generation with ncnn-stablediffusion ( #272 )
2023-05-16 19:32:53 +02:00
Ettore Di Giacinto
2a9d7474ce
fix(rwkv): load tokenizer file from model path ( #255 )
2023-05-14 17:49:10 +02:00
Ettore Di Giacinto
8250391e49
Add support for gptneox/replit ( #238 )
2023-05-12 11:36:35 +02:00
Ettore Di Giacinto
fd1df4e971
whisper: add tests and allow to set upload size ( #237 )
2023-05-12 10:04:20 +02:00
Ettore Di Giacinto
4413defca5
feat: add starcoder ( #236 )
2023-05-11 20:20:07 +02:00
Ettore Di Giacinto
85f0f8227d
refactor: drop code dups ( #234 )
2023-05-11 16:34:16 +02:00
Ettore Di Giacinto
59e3c02002
make use of new bindings for gpt4all ( #232 )
2023-05-11 14:31:19 +02:00
Matthew Campbell
032dee256f
Keep whisper models in memory ( #233 )
2023-05-11 14:05:07 +02:00
Ettore Di Giacinto
11675932ac
feat: add dolly/redpajama/bloomz models support ( #214 )
2023-05-11 01:12:58 +02:00
Ettore Di Giacinto
f8ee20991c
feat: add bert.cpp embeddings ( #222 )
2023-05-10 15:20:21 +02:00
Ettore Di Giacinto
9f426578cf
feat: add transcript endpoint ( #211 )
2023-05-09 11:43:50 +02:00
Ettore Di Giacinto
c839b334eb
feat: add embeddings for go-llama.cpp backend ( #190 )
2023-05-05 11:20:06 +02:00
Ettore Di Giacinto
714bfcd45b
fix: missing returning error and free callback stream ( #187 )
2023-05-04 19:49:43 +02:00
Ettore Di Giacinto
751b7eca62
feat: add rwkv support ( #158 )
...
Signed-off-by: mudler <mudler@mocaccino.org>
2023-05-03 11:45:22 +02:00
Ettore Di Giacinto
1ae7150810
feat: allow to specify default backend for model ( #156 )
...
Signed-off-by: mudler <mudler@c3os.io>
2023-05-03 00:31:28 +02:00
Ettore Di Giacinto
156e15a4fa
Bump llama.cpp, downgrade gpt4all-j ( #149 )
2023-05-02 16:07:18 +02:00
Ettore Di Giacinto
92452d46da
feat: add new gpt4all-j binding ( #142 )
2023-05-01 20:00:15 +02:00
Ettore Di Giacinto
c806eae0de
feat: config files and SSE ( #83 )
...
Signed-off-by: mudler <mudler@mocaccino.org>
Signed-off-by: Tyler Gillson <tyler.gillson@gmail.com>
Co-authored-by: Tyler Gillson <tyler.gillson@gmail.com>
2023-04-26 21:18:18 -07:00
Ettore Di Giacinto
f816dfae65
Add support for stablelm ( #48 )
...
Signed-off-by: mudler <mudler@mocaccino.org>
2023-04-21 00:06:55 +02:00
Ettore Di Giacinto
1c4fbaae20
Add support for cerebras ( #45 )
...
Signed-off-by: mudler <mudler@c3os.io>
2023-04-20 19:33:36 +02:00
Ettore Di Giacinto
d517a54e28
Major API enhancements ( #44 )
2023-04-20 18:33:02 +02:00
Ettore Di Giacinto
7fec26f5d3
Enhancements ( #34 )
...
Signed-off-by: mudler <mudler@c3os.io>
2023-04-19 17:10:29 +02:00
mudler
5556aa46dd
Small refinements and refactors
2023-04-12 00:02:39 +02:00
mudler
ae30bd346d
Reorganize repository layout
2023-04-11 23:43:43 +02:00