Commit Graph

138 Commits

Author SHA1 Message Date
2d7913b3be feat(assistant): Assistant and AssistantFiles api (#1803)
* Initial implementation of assistants api

* Move load/save configs to utils

* Save assistant and assistantfiles config to disk.

* Add tests for assistant api

* Fix models path spelling mistake.

* Remove personal go.mod information

---------

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-03-26 18:54:35 +01:00
bd25d8049c fix(watchdog): use ShutdownModel instead of StopModel (#1882)
Fixes #1760
2024-03-23 16:19:57 +01:00
643d85d2cc feat(stores): Vector store backend (#1795)
Add simple vector store backend

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2024-03-22 21:14:04 +01:00
e533dcf506 feat(functions/aio): all-in-one images, function template enhancements (#1862)
* feat(startup): allow to specify models from local files

* feat(aio): add Dockerfile, make targets, aio profiles

* feat(template): add Function and LastMessage

* add hermes2-pro-mistral

* update hermes2 definition

* feat(template): add sprig

* feat(template): expose FunctionCall

* feat(aio): switch llm for text
2024-03-21 01:12:20 +01:00
88b65f63d0 fix(go-llama): use llama-cpp as default (#1849)
* fix(go-llama): use llama-cpp as default

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* fix(backends): drop obsoleted lines

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-03-17 23:08:22 +01:00
20136ca8b7 feat(tts): add Elevenlabs and OpenAI TTS compatibility layer (#1834)
* feat(elevenlabs): map elevenlabs API support to TTS

This allows elevenlabs Clients to work automatically with LocalAI by
supporting the elevenlabs API.

The elevenlabs server endpoint is implemented so that it is wired to the
TTS endpoints.

Fixes: https://github.com/mudler/LocalAI/issues/1809

* feat(openai/tts): compat layer with openai tts

Fixes: #1276

* fix: adapt tts CLI
2024-03-14 23:08:34 +01:00
5d1018495f feat(intel): add diffusers/transformers support (#1746)
* feat(intel): add diffusers support

* try to consume upstream container image

* Debug

* Manually install deps

* Map transformers/hf cache dir to modelpath if not specified

* fix(compel): update initialization, pass by all gRPC options

* fix: add dependencies, implement transformers for xpu

* base it from the oneapi image

* Add pillow

* set threads if specified when launching the API

* Skip conda install if intel

* defaults to non-intel

* ci: add to pipelines

* prepare compel only if enabled

* Skip conda install if intel

* fix cleanup

* Disable compel by default

* Install torch 2.1.0 with Intel

* Skip conda on some setups

* Detect python

* Quiet output

* Do not override system python with conda

* Prefer python3

* Fixups

* exllama2: do not install without conda (overrides pytorch version)

* exllama/exllama2: do not install if not using cuda

* Add missing dataset dependency

* Small fixups, symlink to python, add requirements

* Add neural_speed to the deps

* correctly handle model offloading

* fix: device_map == xpu

* go back to calling python, fixed at dockerfile level

* Exllama2 restricted to only nvidia gpus

* Tokenizer to xpu
2024-03-07 14:37:45 +01:00
939411300a Bump vLLM version + more options when loading models in vLLM (#1782)
* Bump vLLM version to 0.3.2

* Add vLLM model loading options

* Remove transformers-exllama

* Fix install exllama
2024-03-01 22:48:53 +01:00
1c312685aa refactor: move remaining api packages to core (#1731)
* core 1

* api/openai/files fix

* core 2 - core/config

* move over core api.go and tests to the start of core/http

* move over localai specific endpoints to core/http, begin the service/endpoint split there

* refactor big chunk on the plane

* refactor chunk 2 on plane, next step: port and modify changes to request.go

* easy fixes for request.go, major changes not done yet

* lintfix

* json tag lintfix?

* gitignore and .keep files

* strange fix attempt: rename the config dir?
2024-03-01 16:19:53 +01:00
255748bcba MQTT Startup Refactoring Part 1: core/ packages part 1 (#1728)
This PR specifically introduces a `core` folder and moves the following packages over, without any other changes:

- `api/backend`
- `api/config`
- `api/options`
- `api/schema`

Once this is merged and we confirm there are no regressions, I can migrate over the remaining changes piece by piece to split up application startup, backend services, http, and mqtt as was the goal of the earlier PRs!
2024-02-21 01:21:19 +00:00
960d314e4f feat(tools): Parallel function calling (#1726)
feat(tools): support returning multiple tools choices

Fixes: https://github.com/mudler/LocalAI/issues/1275
2024-02-20 21:58:45 +01:00
01205fd4c0 Initial implementation of upload files api. (#1703)
* Initial implementation of upload files api.

* Move sanitize method to utils.

* Save uploaded data to uploads folder.

* Avoid loop if we do not have a purpose.

* Minor cleanup of api and fix bug where deleting a duplicate filename caused an error.

* Revert defer of saving config

* Moved creation of directory to startup.

* Make file names unique when storing on disk.

* Add test for files api.

* Update dependencies.
2024-02-18 10:12:02 +00:00
c72808f18b feat(tools): support Tool calls in the API (#1715)
* feat(tools): support Tools in the API

Co-authored-by: Stephan Aßmus <stephan.assmus@sap.com>

* feat(tools): support function streaming

* Adhere to new return types when using tools instead of functions

* Keep backward compatibility with function calling

* Evaluate function names in chat templates

* Disable recovery with --debug

* Correctly stream out the entire result

* Detect when llm chooses to reply and to not perform any action in SSE

* Feedback from code review

---------

Co-authored-by: Stephan Aßmus <stephan.assmus@sap.com>
2024-02-17 10:00:34 +01:00
ddd21f1644 feat: Use ubuntu as base for container images, drop deprecated ggml-transformers backends (#1689)
* cleanup backends

* switch image to ubuntu 22.04

* adapt commands for ubuntu

* transformers cleanup

* no contrib on ubuntu

* Change test model to gguf

* ci: disable bark tests (too cpu-intensive)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cleanup

* refinements

* use intel base image

* Makefile: Add docker targets

* Change test model

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-08 20:12:51 +01:00
98ad93d53e Drop ggml-based gpt2 and starcoder (supported by llama.cpp) (#1679)
* Drop ggml-based gpt2 and starcoder (supported by llama.cpp)

* Update compatibility table
2024-02-04 13:15:51 +01:00
df13ba655c Drop old falcon backend (deprecated) (#1675)
Drop old falcon backend
2024-02-03 13:01:13 +01:00
6ac5d814fb feat(startup): fetch model definition remotely (#1654) 2024-01-28 00:14:16 +01:00
cb7512734d transformers: correctly load automodels (#1643)
* backends(transformers): use AutoModel with LLM types

* examples: animagine-xl

* Add codellama examples
2024-01-26 00:13:21 +01:00
d5d82ba344 feat(grpc): backend SPI pluggable in embedding mode (#1621)
* run server

* grpc backend embedded support

* backend providable
2024-01-23 08:56:36 +01:00
e19d7226f8 feat: more embedded models, coqui fixes, add model usage and description (#1556)
* feat: add model descriptions and usage

* remove default model gallery

* models: add embeddings and tts

* docs: update table

* docs: updates

* images: cleanup pip cache after install

* images: always run apt-get clean

* ux: improve gRPC connection errors

* ux: improve some messages

* fix: fix coqui when no AudioPath is passed by

* embedded: add more models

* Add usage

* Reorder table
2024-01-08 00:37:02 +01:00
09e5d9007b feat: embedded model configurations, add popular model examples, refactoring (#1532)
* move downloader out

* separate startup functions for preloading configuration files

* docs: add popular model examples

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* shorteners

* Add llava

* Add mistral-openorca

* Better link to build section

* docs: update

* fixup

* Drop code dups

* Minor fixups

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* ci: try to cache gRPC build during tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: do not build all images for tests, just necessary

* ci: cache gRPC also in release pipeline

* fixes

* Update model_preload_test.go

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-01-05 23:16:33 +01:00
db926896bd Revert "[Refactor]: Core/API Split" (#1550)
Revert "[Refactor]: Core/API Split (#1506)"

This reverts commit ab7b4d5ee9.
2024-01-05 18:04:46 +01:00
ab7b4d5ee9 [Refactor]: Core/API Split (#1506)
Refactors api folder to core, creates firm split between backend code and api frontend.
2024-01-05 15:34:56 +01:00
66fa4f1767 feat: share models by url (#1522)
* feat: allow to pass by models via args

* expose it also as an env/arg

* docs: enhancements to build/requirements

* do not display status always

* print download status

* not all messages are debug
2024-01-01 10:31:03 +01:00
a95bb0521d fix(download): correctly check for not found error (#1514) 2023-12-30 15:36:46 +01:00
cae7b197ec feat: add tiny dream stable diffusion support (#1283)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2023-12-24 19:27:24 +00:00
f7621b2c6c feat: partial download (#1486)
* add .partial download

* fix Stat check

* review partial download
2023-12-24 19:39:33 +01:00
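
The `.partial` download idea from f7621b2c6c can be sketched roughly as follows. This is a minimal illustration, assuming the file is written under a `.partial` suffix and renamed only once the copy completes; the commit message does not spell out the exact mechanism, and `save_with_partial` is a hypothetical helper name.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO


def save_with_partial(source: BinaryIO, dest_path: str) -> str:
    """Write to '<dest>.partial' first, then rename to the final name."""
    partial_path = dest_path + ".partial"
    with open(partial_path, "wb") as out:
        # If this copy is interrupted, only the .partial file is left behind.
        shutil.copyfileobj(source, out)
    # Assumption: a completed download is promoted by a rename.
    os.replace(partial_path, dest_path)
    return dest_path


# Stand-in for a network stream, so the sketch runs offline.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "model.bin")
save_with_partial(io.BytesIO(b"weights"), target)
print(os.listdir(workdir))
```

With this layout, a reader of the models directory never sees a half-written file under its final name.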
1fc3a375df feat: inline templates and accept URLs in models (#1452)
* feat: Allow inline templates

* feat: Allow to specify url in model config files

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* feat: support 'huggingface://' format

* style: reuse-code from gallery

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-12-18 18:58:44 +01:00
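
One way to picture the `huggingface://` format from 1fc3a375df is as a shorthand that expands to a plain HTTPS download URL. The sketch below is illustrative only: the `owner/repo/file` layout, the `resolve/<branch>/` URL shape, and the `resolve_model_url` name are assumptions, not taken from the commit message.

```python
HF_PREFIX = "huggingface://"


def resolve_model_url(uri: str, branch: str = "main") -> str:
    """Expand a huggingface:// shorthand; pass other URLs through unchanged."""
    if not uri.startswith(HF_PREFIX):
        return uri
    # Assumed layout: huggingface://<owner>/<repo>/<file path>
    parts = uri[len(HF_PREFIX):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected huggingface://owner/repo/file, got {uri!r}")
    owner, repo, filename = parts
    return f"https://huggingface.co/{owner}/{repo}/resolve/{branch}/{filename}"


print(resolve_model_url("huggingface://acme/demo-model/model.gguf"))
print(resolve_model_url("https://example.com/model.gguf"))
```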
3d83128f16 feat(alias): alias llama to llama-cpp, update docs (#1448)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-12-16 18:22:45 +01:00
7641f92cde feat(diffusers): update, add autopipeline, controlnet (#1432)
* feat(diffusers): update, add autopipeline, controlnet

* tests with AutoPipeline

* simplify logic
2023-12-13 19:20:22 +01:00
8b6e601405 Feat: new backend: transformers-musicgen (#1387)
Transformers-MusicGen
---------

Signed-off-by: Dave <dave@gray101.com>
2023-12-08 10:01:02 +01:00
824612f1b4 feat: initial watchdog implementation (#1341)
* feat: initial watchdog implementation

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* fixups

* Add more output

* wip: idletime checker

* wire idle watchdog checks

* enlarge watchdog time window

* small fixes

* Use stopmodel

* Always delete process

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-26 18:36:23 +01:00
3c9544b023 refactor: rename llama-stable to llama-ggml (#1287)
* refactor: rename llama-stable to llama-ggml

* Makefile: get sources in sources/

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixup path

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixup sources

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups sd

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* update SD

* fixup

* fixup: create piper libdir also when not built

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix make target on linux test

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-18 08:18:43 +01:00
548959b50f feat: queue up requests if not running parallel requests (#1296)
Return a GRPC which handles a lock in case it is not meant to be
parallel.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-16 22:20:16 +01:00
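
The locking behaviour described in 548959b50f can be illustrated with a small sketch: a client wrapper that serializes calls behind a lock unless it was created for parallel use. `BackendClient`, `predict`, and `fake_backend` are hypothetical names for illustration; the real change lives in the project's Go gRPC code.

```python
import threading
import time
from typing import Callable, Optional


class BackendClient:
    """Hypothetical wrapper: queues calls behind a lock unless parallel=True."""

    def __init__(self, call: Callable[[str], str], parallel: bool = False):
        self._call = call
        # No lock at all when the backend can handle parallel requests.
        self._lock: Optional[threading.Lock] = None if parallel else threading.Lock()

    def predict(self, prompt: str) -> str:
        if self._lock is None:
            return self._call(prompt)
        with self._lock:  # concurrent callers wait here, one at a time
            return self._call(prompt)


# Track how many calls are inside the fake backend at the same time.
state = {"active": 0, "max_active": 0}
state_lock = threading.Lock()


def fake_backend(prompt: str) -> str:
    with state_lock:
        state["active"] += 1
        state["max_active"] = max(state["max_active"], state["active"])
    time.sleep(0.01)
    with state_lock:
        state["active"] -= 1
    return prompt.upper()


client = BackendClient(fake_backend, parallel=False)
threads = [threading.Thread(target=client.predict, args=(f"req {i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state["max_active"])
```

Because every call goes through the same lock, the fake backend never sees more than one request at once when `parallel=False`.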
fdd95d1d86 feat: allow to run parallel requests (#1290)
* feat: allow to run parallel requests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixup

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-16 08:20:05 +01:00
ad0e30bca5 refactor: move backends into the backends directory (#1279)
* refactor: move backends into the backends directory

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor: move main close to implementation for every backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-13 22:40:16 +01:00
803a0ac02a feat(llama.cpp): support lora with scale and yarn (#1277)
* feat(llama.cpp): support lora with scale

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama.cpp): support yarn

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-11 18:40:48 +01:00
0eae727366 🔥 add LLaVA support and GPT vision API, Multiple requests for llama.cpp, return JSON types (#1254)
* wip

* wip

* Make it functional

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* wip

* Small fixups

* do not inject space on role encoding, encode img at beginning of messages

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add examples/config defaults

* Add include dir of current source dir

* cleanup

* fixes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

* Revert "fixups"

This reverts commit f1a4731cca.

* fixes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-11 13:14:59 +01:00
c62504ac92 cleanup: drop bloomz and ggllm as now supported by llama.cpp (#1217)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-10-26 07:43:31 +02:00
b839eb80a1 Fix backend/cpp/llama CMakeLists.txt on OSX (#1212)
* Fix backend/cpp/llama CMakeLists.txt on OSX - detect OSX and use homebrew libraries

* sneak a logging fix in too for gallery debugging

* additional logging
2023-10-25 20:53:26 +02:00
128694213f feat: llama.cpp gRPC C++ backend (#1170)
* wip: llama.cpp c++ gRPC server

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make it work, attach it to the build process

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* update deps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: add protobuf dep

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* try fix protobuf on cmake

* cmake: workarounds

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add packages

* cmake: use fixed version of grpc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cmake(grpc): install locally

* install grpc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* install required deps for grpc on debian bullseye

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* debug

* debug

* Fixups

* no need to install cmake manually

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: fixup macOS

* use brew whenever possible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* macOS fixups

* debug

* fix container build

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* workaround

* try mac

https://stackoverflow.com/questions/23905661/on-mac-g-clang-fails-to-search-usr-local-include-and-usr-local-lib-by-def

* Disable temp. arm64 docker image builds

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-10-16 21:46:29 +02:00
44a7045732 Feats: bruno example, gallery improvements for new scraper (#1161)
This PR bundles together two unrelated features:

1. Model Gallery improvements - specifically, the ability to follow
".ref" gallery links (which I made up for this specific application) to
an actual gallery yaml file (in order to have stable URLs) and the
ability to load self-contained configurations, rather than always using
a base.yaml + overrides. This is groundwork for my python-based
huggingface scraper.

2. A while ago I introduced some Insomnia request templates for people
to use. Unfortunately, Insomnia has decided to tank their product... So
I've personally switched to using
[bruno](https://github.com/usebruno/bruno/). Corresponding equivalent
files that I use for my testing have been added. Just open the folder
from bruno and everything will work. No import process required.

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-10-11 18:18:12 +02:00
a28ab18987 feat(vllm): Allow to set quantization (#1094)
This is particularly useful to set AWQ

**Description**

Follow-up to #1015

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-09-22 15:52:38 +02:00
8ccf5b2044 feat(speculative-sampling): allow to specify a draft model in the model config (#1052)
**Description**

This PR fixes #1013.

It adds `draft_model` and `n_draft` to the model YAML config in order to
load models with speculative sampling. This should be compatible as well
with grammars.

example:

```yaml
backend: llama
context_size: 1024
name: my-model-name
parameters:
  model: foo-bar
n_draft: 16
draft_model: model-name
```

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-09-14 17:44:16 +02:00
dc307a1cc0 feat: add vall-e-x (#1007)
**Description**

This PR fixes #985 

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-09-04 19:25:23 +02:00
005f289632 feat: Model Gallery Endpoint Refactor / Mutable Galleries Endpoints (#991)
refactor for model gallery endpoints - bundle up resources into a
struct, make galleries mutable with some crud endpoints. This is
groundwork required for making efficient use of the new scraper - while
that PR isn't _quite_ ready yet, the goal is to have more, individually
smaller gallery files. Therefore, rather than requiring a full localai
service restart, these new endpoints have been added to make life
easier.

- Adds endpoints to add, list and remove model galleries at runtime
- Adds these endpoints to the Insomnia config
- Minor fix: loading file urls follows symbolic links now
2023-09-02 09:00:44 +02:00
3bab307904 fix(llama): resolve lora adapters correctly from the model file (#964)
**Description**

We were previously expecting absolute paths. This makes the adapter path relative to the
model file (as one would expect).

2023-08-27 10:11:32 +02:00
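
The path handling described in 3bab307904 can be sketched as below. This is an illustration under assumptions: `resolve_adapter_path` is a hypothetical helper, and "relative to the model file" is read here as "relative to the directory that contains the model file". POSIX-style paths are used so the result is the same on any platform.

```python
import posixpath


def resolve_adapter_path(model_file: str, adapter: str) -> str:
    """Resolve a LoRA adapter path against the model file's directory."""
    if posixpath.isabs(adapter):
        # Absolute paths keep working as before.
        return adapter
    model_dir = posixpath.dirname(model_file)
    return posixpath.normpath(posixpath.join(model_dir, adapter))


print(resolve_adapter_path("/models/llama/base.gguf", "adapters/lora.bin"))
print(resolve_adapter_path("/models/llama/base.gguf", "/opt/lora.bin"))
```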
44bc7aa3d0 feat: Allow to load lora adapters for llama.cpp (#955)
**Description**

This PR fixes #

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-25 21:58:46 +02:00
1120847f72 feat: bump llama.cpp, add gguf support (#943)
**Description**

This PR syncs up the `llama` backend to use `gguf`
(https://github.com/go-skynet/go-llama.cpp/pull/180). It also adds
`llama-stable` to the targets so we can still load ggml. It adapts the
current tests to use the `llama-backend` for ggml and uses a `gguf`
model to run tests on the new backend.

In order to consume the new version of go-llama.cpp, it also bumps Go to
1.21 (images, pipelines, etc.).

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-24 01:18:58 +02:00
10b0e13882 feat: backend monitor shutdown endpoint, process based (#938)
This PR adds a new endpoint to the backend monitor section
`/backend/shutdown` which terminates the grpc process for the related
model.
2023-08-23 18:38:37 +02:00
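
A client call to the `/backend/shutdown` endpoint from 10b0e13882 might look roughly like the sketch below, which builds the request without sending it. The host and port, the use of POST, and the JSON body with a `model` field are assumptions for illustration; the commit message only names the endpoint path.

```python
import json
import urllib.request


def build_shutdown_request(base_url: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a request asking to stop a model's backend."""
    # Assumption: the endpoint takes a JSON body that names the model.
    payload = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/backend/shutdown",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_shutdown_request("http://localhost:8080", "my-model-name")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server.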