Commit Graph

123 Commits

Author SHA1 Message Date
20136ca8b7 feat(tts): add Elevenlabs and OpenAI TTS compatibility layer (#1834)
* feat(elevenlabs): map elevenlabs API support to TTS

This allows elevenlabs Clients to work automatically with LocalAI by
supporting the elevenlabs API.

The elevenlabs server endpoint is implemented such as it is wired to the
TTS endpoints.

Fixes: https://github.com/mudler/LocalAI/issues/1809

* feat(openai/tts): compat layer with openai tts

Fixes: #1276

* fix: adapt tts CLI
2024-03-14 23:08:34 +01:00
45d520f913 fix: OSX Build Files for llama.cpp (#1836)
bot ate my changes, seperate branch
2024-03-14 23:07:47 +01:00
3882130911 feat: Add Bitsandbytes quantization for transformer backend enhancement #1775 and fix: Transformer backend error on CUDA #1774 (#1823)
* fixes #1775 and #1774

Add BitsAndBytes Quantization and fixes embedding on CUDA devices

* Manage 4bit and 8 bit quantization

Manage different BitsAndBytes options with the quantization: parameter in yaml

* fix compilation errors on non CUDA environment
2024-03-14 23:06:30 +01:00
b423af001d fix: the correct BUILD_TYPE for OpenCL is clblas (with no t) (#1828) 2024-03-14 08:39:21 +01:00
5d1018495f feat(intel): add diffusers/transformers support (#1746)
* feat(intel): add diffusers support

* try to consume upstream container image

* Debug

* Manually install deps

* Map transformers/hf cache dir to modelpath if not specified

* fix(compel): update initialization, pass by all gRPC options

* fix: add dependencies, implement transformers for xpu

* base it from the oneapi image

* Add pillow

* set threads if specified when launching the API

* Skip conda install if intel

* defaults to non-intel

* ci: add to pipelines

* prepare compel only if enabled

* Skip conda install if intel

* fix cleanup

* Disable compel by default

* Install torch 2.1.0 with Intel

* Skip conda on some setups

* Detect python

* Quiet output

* Do not override system python with conda

* Prefer python3

* Fixups

* exllama2: do not install without conda (overrides pytorch version)

* exllama/exllama2: do not install if not using cuda

* Add missing dataset dependency

* Small fixups, symlink to python, add requirements

* Add neural_speed to the deps

* correctly handle model offloading

* fix: device_map == xpu

* go back at calling python, fixed at dockerfile level

* Exllama2 restricted to only nvidia gpus

* Tokenizer to xpu
2024-03-07 14:37:45 +01:00
5c69dd155f feat(autogpt/transformers): consume trust_remote_code (#1799)
trusting remote code by default is a danger to our users
2024-03-05 19:47:15 +01:00
504f2e8bf4 Update Backend Dependancies (#1797)
* Update transformers.yml

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>

* Update transformers-rocm.yml

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>

* Update transformers-nvidia.yml

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>

---------

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>
2024-03-05 10:10:00 +00:00
939411300a Bump vLLM version + more options when loading models in vLLM (#1782)
* Bump vLLM version to 0.3.2

* Add vLLM model loading options

* Remove transformers-exllama

* Fix install exllama
2024-03-01 22:48:53 +01:00
31a4c9c9d3 Fix Command Injection Vulnerability (#1778)
* Added fix for command injection

* changed function name from sh to runCommand
2024-02-29 18:32:29 +00:00
bc5f5aa538 deps(llama.cpp): update (#1759)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-26 13:18:44 +01:00
0135e1e3b9 fix: vllm - use AsyncLLMEngine to allow true streaming mode (#1749)
* fix: use vllm AsyncLLMEngine to bring true stream

Current vLLM implementation uses the LLMEngine, which was designed for offline batch inference, which results in the streaming mode outputing all blobs at once at the end of the inference.

This PR reworks the gRPC server to use asyncio and gRPC.aio, in combination with vLLM's AsyncLLMEngine to bring true stream mode.

This PR also passes more parameters to vLLM during inference (presence_penalty, frequency_penalty, stop, ignore_eos, seed, ...).

* Remove unused import
2024-02-24 11:48:45 +01:00
8292781045 deps(llama.cpp): update, support Gemma models (#1734)
deps(llama.cpp): update

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-21 17:23:38 +01:00
54ec6348fa deps(llama.cpp): update (#1714)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-21 11:35:44 +01:00
255748bcba MQTT Startup Refactoring Part 1: core/ packages part 1 (#1728)
This PR specifically introduces a `core` folder and moves the following packages over, without any other changes:

- `api/backend`
- `api/config`
- `api/options`
- `api/schema`

Once this is merged and we confirm there's no regressions, I can migrate over the remaining changes piece by piece to split up application startup, backend services, http, and mqtt as was the goal of the earlier PRs!
2024-02-21 01:21:19 +00:00
594eb468df Add TTS dependency for cuda based builds fixes #1727 (#1730)
Signed-off-by: Chakib Benziane <contact@blob42.xyz>
2024-02-20 21:59:43 +01:00
fb0a4c5d9a Build docker container for ROCm (#1595)
* Dockerfile changes to build for ROCm

* Adjust linker flags for ROCm

* Update conda env for diffusers and transformers to use ROCm pytorch

* Update transformers conda env for ROCm

* ci: build hipblas images

* fixup rebase

* use self-hosted

Signed-off-by: mudler <mudler@localai.io>

* specify LD_LIBRARY_PATH only when BUILD_TYPE=hipblas

---------

Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: mudler <mudler@localai.io>
2024-02-16 15:08:50 +01:00
5e155fb081 fix(python): pin exllama2 (#1711)
fix(python): pin python deps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-14 21:44:12 +01:00
c56b6ddb1c fix(llama.cpp): disable infinite context shifting (#1704)
Infinite context loop might as well trigger an infinite loop of context
shifting if the model hallucinates and does not stop answering.
This has the unpleasant effect that the predicion never terminates,
which is the case especially on small models which tends to hallucinate.

Workarounds https://github.com/mudler/LocalAI/issues/1333 by removing
context-shifting.

See also upstream issue: https://github.com/ggerganov/llama.cpp/issues/3969
2024-02-13 21:17:21 +01:00
6e0eb96c61 fix: drop unused code (#1697)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-11 11:28:59 +01:00
fd68bf7084 fix(vall-e-x): Fix voice cloning (#1696) 2024-02-11 11:20:00 +01:00
53dbe36f32 feat(tts): respect YAMLs config file, add sycl docs/examples (#1692)
* feat(refactor): refactor config and input reading

* feat(tts): read config file for TTS

* examples(kubernetes): Add simple deployment example

* examples(kubernetes): Add simple deployment for intel arc

* docs(sycl): add sycl example

* feat(tts): do not always pick a first model

* fixups to run vall-e-x on container

* Correctly resolve backend
2024-02-10 21:37:03 +01:00
ddd21f1644 feat: Use ubuntu as base for container images, drop deprecated ggml-transformers backends (#1689)
* cleanup backends

* switch image to ubuntu 22.04

* adapt commands for ubuntu

* transformers cleanup

* no contrib on ubuntu

* Change test model to gguf

* ci: disable bark tests (too cpu-intensive)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cleanup

* refinements

* use intel base image

* Makefile: Add docker targets

* Change test model

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-08 20:12:51 +01:00
98ad93d53e Drop ggml-based gpt2 and starcoder (supported by llama.cpp) (#1679)
* Drop ggml-based gpt2 and starcoder (supported by llama.cpp)

* Update compatibility table
2024-02-04 13:15:51 +01:00
df13ba655c Drop old falcon backend (deprecated) (#1675)
Drop old falcon backend
2024-02-03 13:01:13 +01:00
1c57f8d077 feat(sycl): Add support for Intel GPUs with sycl (#1647) (#1660)
* feat(sycl): Add sycl support (#1647)

* onekit: install without prompts

* set cmake args only in grpc-server

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cleanup

* fixup sycl source env

* Cleanup docs

* ci: runs on self-hosted

* fix typo

* bump llama.cpp

* llama.cpp: update server

* adapt to upstream changes

* adapt to upstream changes

* docs: add sycl

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-01 19:21:52 +01:00
cb7512734d transformers: correctly load automodels (#1643)
* backends(transformers): use AutoModel with LLM types

* examples: animagine-xl

* Add codellama examples
2024-01-26 00:13:21 +01:00
5e335eaead feat(transformers): support also text generation (#1630)
* feat(transformers): support also text generation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* embedded: set seed -1

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-01-23 23:07:31 +01:00
697c769b64 fix(llama.cpp): enable cont batching when parallel is set (#1622)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-01-21 14:59:48 +01:00
eaf85a30f9 fix(llama.cpp): Enable parallel requests (#1616)
integrate changes from llama.cpp

Signed-off-by: Sebastian <tauven@gmail.com>
2024-01-21 09:56:14 +01:00
06cd9ef98d feat(extra-backends): Improvements, adding mamba example (#1618)
* feat(extra-backends): Improvements

vllm: add max_tokens, wire up stream event
mamba: fixups, adding examples for mamba-chat

* examples(mamba-chat): add

* docs: update
2024-01-20 17:56:08 +01:00
9e653d6abe feat: 🐍 add mamba support (#1589)
feat(mamba): Initial import

This is a first iteration of the mamba backend, loosely based on
mamba-chat(https://github.com/havenhq/mamba-chat).
2024-01-19 23:42:50 +01:00
441e2965ff move BUILD_GRPC_FOR_BACKEND_LLAMA logic to makefile: errors in this section now immediately fail the build (#1576)
* move BUILD_GRPC_FOR_BACKEND_LLAMA option to makefile

* review: oversight, fixup cmake_args

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Dionysius <1341084+dionysius@users.noreply.github.com>

---------

Signed-off-by: Dionysius <1341084+dionysius@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-01-13 10:08:26 +01:00
e19d7226f8 feat: more embedded models, coqui fixes, add model usage and description (#1556)
* feat: add model descriptions and usage

* remove default model gallery

* models: add embeddings and tts

* docs: update table

* docs: updates

* images: cleanup pip cache after install

* images: always run apt-get clean

* ux: improve gRPC connection errors

* ux: improve some messages

* fix: fix coqui when no AudioPath is passed by

* embedded: add more models

* Add usage

* Reorder table
2024-01-08 00:37:02 +01:00
62a02cd1fe deps(conda): use transformers environment with autogptq (#1555) 2024-01-06 15:30:53 +01:00
949da7792d deps(conda): use transformers-env with vllm,exllama(2) (#1554)
* deps(conda): use transformers with vllm

* join vllm, exllama, exllama2, split petals
2024-01-06 13:32:28 +01:00
db926896bd Revert "[Refactor]: Core/API Split" (#1550)
Revert "[Refactor]: Core/API Split (#1506)"

This reverts commit ab7b4d5ee9.
2024-01-05 18:04:46 +01:00
ab7b4d5ee9 [Refactor]: Core/API Split (#1506)
Refactors api folder to core, creates firm split between backend code and api frontend.
2024-01-05 15:34:56 +01:00
583bd28a5c fix(diffusers): add omegaconf dependency (#1540)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-01-04 00:06:41 +01:00
a1aa6cb7c2 fix(entrypoint): cd to backend dir before start (#1530)
Certain backends as vall-e-x are not meant to be used as a library, so
we want to start the process in the same folder where the backend and
all the assets are fixes #1394
2024-01-01 22:02:48 +01:00
fd48cb6506 deps(llama.cpp): update and sync grpc server (#1527)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-01-01 14:39:31 +01:00
e2311a145c Fix: Set proper Homebrew install location for x86 Macs (#1510)
* set proper Homebrew install location for x86 Macs

* fix: remove prior conditional that my logic replaces
2023-12-30 12:37:26 +01:00
cae7b197ec feat: add tiny dream stable diffusion support (#1283)
Signed-off-by: Gianluca Boiano <morf3089@gmail.com>
2023-12-24 19:27:24 +00:00
95eb72bfd3 feat: add 🐸 coqui (#1489)
* feat: add coqui

* docs: update news
2023-12-24 19:38:54 +01:00
7e2d101a46 fix: guidance_scale not work in sd (#1488)
Signed-off-by: hibobmaster <32976627+hibobmaster@users.noreply.github.com>
2023-12-24 19:24:52 +01:00
6597881854 fix: exllama2 backend (#1484)
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2023-12-24 08:32:12 +00:00
939187a129 env(conda): use transformers for vall-e-x (#1481) 2023-12-23 14:31:34 -05:00
b4b21a446b feat(conda): share envs with transformer-based backends (#1465)
* feat(conda): share env between diffusers and bark

* Detect if env already exists

* share diffusers and petals

* tests: add petals

* Use smaller model for tests with petals

* test only model load on petals

* tests(petals): run only load model tests

* Revert "test only model load on petals"

This reverts commit 111cfa97f1.

* move transformers and sentencetransformers to common env

* Share also transformers-musicgen
2023-12-21 08:35:15 +01:00
dd982acf2c feat(img2vid,txt2vid): Initial support for img2vid,txt2vid (#1442)
* feat(img2vid): Initial support for img2vid

* doc(SD): fix SDXL Example

* Minor fixups for img2vid

* docs(img2img): fix example curl call

* feat(txt2vid): initial support

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* diffusers: be retro-compatible with CUDA settings

* docs(img2vid, txt2vid): examples

* Add notice on docs

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-12-15 18:06:20 -05:00
fb6a5bc620 update(llama.cpp): update server, correctly propagate LLAMA_VERSION (#1440)
* fix(Makefile): correctly propagate LLAMA_VERSION

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* update grpc-server.cpp

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-12-15 08:26:48 +01:00
7641f92cde feat(diffusers): update, add autopipeline, controlnet (#1432)
* feat(diffusers): update, add autopipeline, controlenet

* tests with AutoPipeline

* simplify logic
2023-12-13 19:20:22 +01:00