Commit Graph

90 Commits

Dave
eed5706994
refactor: backend/service split, channel-based llm flow (#1963)
Refactor: channel-based llm flow and services split

---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-04-13 09:45:34 +02:00
cryptk
1981154f49
fix: don't commit generated files to git (#1993)
* fix: initial work towards not committing generated files to the repository

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: improve build docs

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: remove unused folder from .dockerignore and .gitignore

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: attempt to fix extra backend tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: attempt to fix other tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: more test fixes

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: fix apple tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: more extras tests fixes

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add GOBIN to PATH in docker build

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: extra tests and Dockerfile corrections

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: remove build dependency checks

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add golang protobuf compilers to tests-linux action

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: ensure protogen is run for extra backend installs

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: use newer protobuf

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: more missing protoc binaries

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: missing dependencies during docker build

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: don't install grpc compilers in the final stage if they aren't needed

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: python-grpc-tools in 22.04 repos is too old

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add a couple of extra build dependencies to Makefile

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: unbreak container rebuild functionality

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-13 09:37:32 +02:00
cryptk
a8ebf6f575
fix: respect concurrency from parent build parameters when building GRPC (#2023)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-13 09:14:32 +02:00
Ludovic Leroux
12c0d9443e
feat: use tokenizer.apply_chat_template() in vLLM (#1990)
Use tokenizer.apply_chat_template() in vLLM

Signed-off-by: Ludovic LEROUX <ludovic@inpher.io>
2024-04-11 19:20:22 +02:00
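A minimal sketch of what delegating prompt formatting to `tokenizer.apply_chat_template()` looks like; the model name is only an illustrative assumption, not one taken from the PR:

```python
from transformers import AutoTokenizer

# Illustrative model; any chat model that ships a chat template works.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [{"role": "user", "content": "What is LocalAI?"}]

# Render the conversation with the model's own chat template instead of a
# hand-rolled prompt format, appending the generation prompt for the reply.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```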
Ludovic Leroux
b4548ad72d
feat: add flash-attn in nvidia and rocm envs (#1995)
Signed-off-by: Ludovic LEROUX <ludovic@inpher.io>
2024-04-11 09:44:39 +02:00
Koen Farell
36da11a0ee
deps: Update version of vLLM to add support for the Cohere Command_R model in vLLM inference (#1975)
* Update vLLM version to add support for Command_R

Signed-off-by: Koen Farell <hellios.dt@gmail.com>

* fix: Fixed vllm version in requirements

Signed-off-by: Koen Farell <hellios.dt@gmail.com>

* chore: Update transformers-rocm.yml

Signed-off-by: Koen Farell <hellios.dt@gmail.com>

* chore: Update transformers.yml version of vllm

Signed-off-by: Koen Farell <hellios.dt@gmail.com>

---------

Signed-off-by: Koen Farell <hellios.dt@gmail.com>
2024-04-10 11:25:26 +00:00
Sebastian.W
d23e73b118
fix(autogptq): do not use_triton with qwen-vl (#1985)
* Enhance autogptq backend to support VL models

* update dependencies for autogptq

* remove redundant auto-gptq dependency

* Convert base64 to image_url for Qwen-VL model

* implemented model inference for qwen-vl

* remove user prompt from generated answer

* fixed write image error

* fixed use_triton issue when loading Qwen-VL model

---------

Co-authored-by: Binghua Wu <bingwu@estee.com>
2024-04-10 10:36:10 +00:00
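The base64-to-image_url conversion mentioned above roughly amounts to decoding the inline image and handing the model a location it can load. A hedged sketch, with illustrative names that are not the backend's actual identifiers:

```python
import base64
import tempfile

def base64_to_image_url(b64_data: str) -> str:
    # Decode the inline base64 payload from the request...
    raw = base64.b64decode(b64_data)
    # ...write it somewhere the model can read it...
    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
        f.write(raw)
    # ...and return a URL-style reference for the VL prompt.
    return f"file://{f.name}"
```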
fakezeta
a38618db02
fix regression #1971 (#1972)
fixes regression #1971 introduced by intel_extension_for_transformers==1.4
2024-04-08 22:33:51 +02:00
fakezeta
8210ffcb6c
feat: Token Stream support for Transformer, fix: missing package for OpenVINO (#1908)
* Streaming working

* Small fix for regression on CUDA and XPU

* use pip version of optimum[openvino]

* Update backend/python/transformers/transformers_server.py

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Token streaming support

fix optimum[openvino] package in install.sh

* Token Streaming support

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-03-27 17:50:35 +01:00
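Token streaming with the transformers library generally follows the `TextIteratorStreamer` pattern below; a sketch of the technique, not the backend's exact code, with a placeholder model:

```python
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Streaming tokens lets clients", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# generate() blocks until completion, so it runs in a worker thread while
# the main thread consumes tokens as they are produced.
thread = Thread(target=model.generate,
                kwargs=dict(inputs, streamer=streamer, max_new_tokens=32))
thread.start()
for token_text in streamer:
    print(token_text, end="", flush=True)
thread.join()
```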
fakezeta
e7cbe32601
feat: OpenVINO runtime for transformer backend and streaming support for OpenVINO and CUDA (#1892)
* fixes #1775 and #1774

Add BitsAndBytes Quantization and fixes embedding on CUDA devices

* Manage 4bit and 8 bit quantization

Manage different BitsAndBytes options with the quantization: parameter in yaml

* fix compilation errors on non CUDA environment

* OpenVINO draft

First draft of OpenVINO integration in transformer backend

* first working implementation

* Streaming working

* Small fix for regression on CUDA and XPU

* use pip version of optimum[openvino]

* Update backend/python/transformers/transformers_server.py

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-03-26 23:31:43 +00:00
Ettore Di Giacinto
607586e0b7
fix: downgrade torch (#1902)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-03-26 22:56:02 +01:00
Sebastian.W
b7ffe66219
Enhance autogptq backend to support VL models (#1860)
* Enhance autogptq backend to support VL models

* update dependencies for autogptq

* remove redundant auto-gptq dependency

* Convert base64 to image_url for Qwen-VL model

* implemented model inference for qwen-vl

* remove user prompt from generated answer

* fixed write image error

---------

Co-authored-by: Binghua Wu <bingwu@estee.com>
2024-03-26 18:48:14 +01:00
Richard Palethorpe
643d85d2cc
feat(stores): Vector store backend (#1795)
Add simple vector store backend

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2024-03-22 21:14:04 +01:00
Dave
ed5734ae25
test/fix: OSX Test Repair (#1843)
* test with gguf instead of ggml. Updates testPrompt to match? Adds debugging line to Dockerfile that I've found helpful recently.

* fix testPrompt slightly

* Sad Experiment: Test GH runner without metal?

* break apart CGO_LDFLAGS

* switch runner

* upstream llama.cpp disables Metal on Github CI!

* missed a dir from clean-tests

* CGO_LDFLAGS

* tmate failure + NO_ACCELERATE

* whisper.cpp has a metal fix

* do the exact opposite of the name of this branch, but keep it around for unrelated fixes?

* add back newlines

* add tmate to linux for testing

* update fixtures

* timeout for tmate
2024-03-18 19:19:43 +01:00
Ettore Di Giacinto
fa9e330fc6
fix(llama.cpp): fix eos without cache (#1852) 2024-03-18 18:59:24 +01:00
cryptk
020ce29cd8
fix(make): allow to parallelize jobs (#1845)
* fix: clean up Makefile dependencies to allow for parallel builds

* refactor: remove old unused backend from Makefile

* fix: finish removing legacy backend, update piper

* fix: I broke llama... I fixed llama

* feat: give the tests and builds a few threads

* fix: ensure libraries are replaced before build, add dropreplace target

* Fix image build workflows
2024-03-17 15:39:20 +01:00
Dave
db199f61da
fix: osx build default.metallib (#1837)
* port osx fix from refactor pr to slim pr
* manually bump llama.cpp version to unstick CI?
2024-03-15 08:18:58 +00:00
Ettore Di Giacinto
20136ca8b7
feat(tts): add Elevenlabs and OpenAI TTS compatibility layer (#1834)
* feat(elevenlabs): map elevenlabs API support to TTS

This allows elevenlabs clients to work automatically with LocalAI by
supporting the elevenlabs API.

The elevenlabs server endpoint is implemented so that it is wired to the
TTS endpoints.

Fixes: https://github.com/mudler/LocalAI/issues/1809

* feat(openai/tts): compat layer with openai tts

Fixes: #1276

* fix: adapt tts CLI
2024-03-14 23:08:34 +01:00
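A hedged usage sketch of the compatibility layer: an ElevenLabs-style request pointed at a local instance. The route mirrors the ElevenLabs `/v1/text-to-speech/{voice_id}` API that the commit maps onto TTS; host, port, voice and model names are assumptions for illustration:

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/text-to-speech/my-voice",  # assumed local route
    json={"text": "Hello from LocalAI", "model_id": "tts-model"},  # assumed names
)
with open("out.wav", "wb") as f:
    f.write(resp.content)
```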
Dave
45d520f913
fix: OSX Build Files for llama.cpp (#1836)
bot ate my changes, separate branch
2024-03-14 23:07:47 +01:00
fakezeta
3882130911
feat: Add BitsAndBytes quantization for transformer backend (enhancement #1775) and fix Transformer backend error on CUDA (#1774) (#1823)
* fixes #1775 and #1774

Add BitsAndBytes Quantization and fixes embedding on CUDA devices

* Manage 4bit and 8 bit quantization

Manage different BitsAndBytes options with the quantization: parameter in yaml

* fix compilation errors on non CUDA environment
2024-03-14 23:06:30 +01:00
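One plausible shape for mapping a `quantization:` YAML value onto BitsAndBytes options; a sketch under assumed option names, not the backend's exact mapping:

```python
from typing import Optional

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def bnb_config(quantization: str) -> Optional[BitsAndBytesConfig]:
    # Option names are assumptions for illustration only.
    if quantization == "bnb_4bit":
        return BitsAndBytesConfig(load_in_4bit=True,
                                  bnb_4bit_compute_dtype=torch.float16)
    if quantization == "bnb_8bit":
        return BitsAndBytesConfig(load_in_8bit=True)
    return None

# Requires a CUDA device and the bitsandbytes package at runtime.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # placeholder model
    quantization_config=bnb_config("bnb_4bit"),
    device_map="auto",
)
```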
cryptk
b423af001d
fix: the correct BUILD_TYPE for OpenCL is clblas (with no t) (#1828) 2024-03-14 08:39:21 +01:00
Ettore Di Giacinto
5d1018495f
feat(intel): add diffusers/transformers support (#1746)
* feat(intel): add diffusers support

* try to consume upstream container image

* Debug

* Manually install deps

* Map transformers/hf cache dir to modelpath if not specified

* fix(compel): update initialization, pass by all gRPC options

* fix: add dependencies, implement transformers for xpu

* base it from the oneapi image

* Add pillow

* set threads if specified when launching the API

* Skip conda install if intel

* defaults to non-intel

* ci: add to pipelines

* prepare compel only if enabled

* Skip conda install if intel

* fix cleanup

* Disable compel by default

* Install torch 2.1.0 with Intel

* Skip conda on some setups

* Detect python

* Quiet output

* Do not override system python with conda

* Prefer python3

* Fixups

* exllama2: do not install without conda (overrides pytorch version)

* exllama/exllama2: do not install if not using cuda

* Add missing dataset dependency

* Small fixups, symlink to python, add requirements

* Add neural_speed to the deps

* correctly handle model offloading

* fix: device_map == xpu

* go back at calling python, fixed at dockerfile level

* Exllama2 restricted to only nvidia gpus

* Tokenizer to xpu
2024-03-07 14:37:45 +01:00
Dave
5c69dd155f
feat(autogpt/transformers): consume trust_remote_code (#1799)
trusting remote code by default is a danger to our users
2024-03-05 19:47:15 +01:00
TwinFin
504f2e8bf4
Update Backend Dependencies (#1797)
* Update transformers.yml

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>

* Update transformers-rocm.yml

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>

* Update transformers-nvidia.yml

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>

---------

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>
2024-03-05 10:10:00 +00:00
Ludovic Leroux
939411300a
Bump vLLM version + more options when loading models in vLLM (#1782)
* Bump vLLM version to 0.3.2

* Add vLLM model loading options

* Remove transformers-exllama

* Fix install exllama
2024-03-01 22:48:53 +01:00
Oussama
31a4c9c9d3
Fix Command Injection Vulnerability (#1778)
* Added fix for command injection

* changed function name from sh to runCommand
2024-02-29 18:32:29 +00:00
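The general pattern behind this kind of fix, illustrated in Python rather than the project's own code: pass arguments as a list so user input is never interpreted by a shell:

```python
import subprocess

def run_command(binary: str, user_arg: str) -> str:
    # Vulnerable form: subprocess.run(f"{binary} {user_arg}", shell=True)
    # would let a user_arg like "; rm -rf /" run arbitrary commands.
    result = subprocess.run([binary, user_arg],
                            capture_output=True, text=True, check=True)
    return result.stdout
```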
Ettore Di Giacinto
bc5f5aa538
deps(llama.cpp): update (#1759)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-26 13:18:44 +01:00
Ludovic Leroux
0135e1e3b9
fix: vllm - use AsyncLLMEngine to allow true streaming mode (#1749)
* fix: use vllm AsyncLLMEngine to bring true streaming

The current vLLM implementation uses the LLMEngine, which was designed for offline batch inference; as a result, streaming mode outputs all blobs at once at the end of the inference.

This PR reworks the gRPC server to use asyncio and gRPC.aio, in combination with vLLM's AsyncLLMEngine, to bring true streaming mode.

This PR also passes more parameters to vLLM during inference (presence_penalty, frequency_penalty, stop, ignore_eos, seed, ...).

* Remove unused import
2024-02-24 11:48:45 +01:00
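A sketch of true streaming with `AsyncLLMEngine`, following the approach this PR describes; the model name is a placeholder and the calls are assumed to match the vLLM API of that era (~0.3.x):

```python
import asyncio
import uuid

from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

async def main():
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m")  # placeholder model
    )
    params = SamplingParams(max_tokens=64, presence_penalty=0.2)
    # Each yielded RequestOutput carries the text generated so far, so
    # tokens can be forwarded to the client as they arrive instead of
    # all at once at the end.
    async for output in engine.generate("Hello", params, str(uuid.uuid4())):
        print(output.outputs[0].text)

asyncio.run(main())
```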
Ettore Di Giacinto
8292781045
deps(llama.cpp): update, support Gemma models (#1734)
deps(llama.cpp): update

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-21 17:23:38 +01:00
Ettore Di Giacinto
54ec6348fa
deps(llama.cpp): update (#1714)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-21 11:35:44 +01:00
Dave
255748bcba
MQTT Startup Refactoring Part 1: core/ packages part 1 (#1728)
This PR specifically introduces a `core` folder and moves the following packages over, without any other changes:

- `api/backend`
- `api/config`
- `api/options`
- `api/schema`

Once this is merged and we confirm there are no regressions, I can migrate the remaining changes over piece by piece to split up application startup, backend services, http, and mqtt, as was the goal of the earlier PRs!
2024-02-21 01:21:19 +00:00
Chakib Benziane
594eb468df
Add TTS dependency for cuda-based builds, fixes #1727 (#1730)
Signed-off-by: Chakib Benziane <contact@blob42.xyz>
2024-02-20 21:59:43 +01:00
fenfir
fb0a4c5d9a
Build docker container for ROCm (#1595)
* Dockerfile changes to build for ROCm

* Adjust linker flags for ROCm

* Update conda env for diffusers and transformers to use ROCm pytorch

* Update transformers conda env for ROCm

* ci: build hipblas images

* fixup rebase

* use self-hosted

Signed-off-by: mudler <mudler@localai.io>

* specify LD_LIBRARY_PATH only when BUILD_TYPE=hipblas

---------

Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: mudler <mudler@localai.io>
2024-02-16 15:08:50 +01:00
Ettore Di Giacinto
5e155fb081
fix(python): pin exllama2 (#1711)
fix(python): pin python deps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-14 21:44:12 +01:00
Ettore Di Giacinto
c56b6ddb1c
fix(llama.cpp): disable infinite context shifting (#1704)
Infinite context shifting can trigger an endless loop of context shifts
if the model hallucinates and does not stop answering.
This has the unpleasant effect that the prediction never terminates,
especially on small models, which tend to hallucinate.

Works around https://github.com/mudler/LocalAI/issues/1333 by removing
context-shifting.

See also upstream issue: https://github.com/ggerganov/llama.cpp/issues/3969
2024-02-13 21:17:21 +01:00
Ettore Di Giacinto
6e0eb96c61
fix: drop unused code (#1697)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-11 11:28:59 +01:00
Ettore Di Giacinto
fd68bf7084
fix(vall-e-x): Fix voice cloning (#1696) 2024-02-11 11:20:00 +01:00
Ettore Di Giacinto
53dbe36f32
feat(tts): respect YAMLs config file, add sycl docs/examples (#1692)
* feat(refactor): refactor config and input reading

* feat(tts): read config file for TTS

* examples(kubernetes): Add simple deployment example

* examples(kubernetes): Add simple deployment for intel arc

* docs(sycl): add sycl example

* feat(tts): do not always pick a first model

* fixups to run vall-e-x on container

* Correctly resolve backend
2024-02-10 21:37:03 +01:00
Ettore Di Giacinto
ddd21f1644
feat: Use ubuntu as base for container images, drop deprecated ggml-transformers backends (#1689)
* cleanup backends

* switch image to ubuntu 22.04

* adapt commands for ubuntu

* transformers cleanup

* no contrib on ubuntu

* Change test model to gguf

* ci: disable bark tests (too cpu-intensive)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cleanup

* refinements

* use intel base image

* Makefile: Add docker targets

* Change test model

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-08 20:12:51 +01:00
Ettore Di Giacinto
98ad93d53e
Drop ggml-based gpt2 and starcoder (supported by llama.cpp) (#1679)
* Drop ggml-based gpt2 and starcoder (supported by llama.cpp)

* Update compatibility table
2024-02-04 13:15:51 +01:00
Ettore Di Giacinto
df13ba655c
Drop old falcon backend (deprecated) (#1675)
Drop old falcon backend
2024-02-03 13:01:13 +01:00
Ettore Di Giacinto
1c57f8d077
feat(sycl): Add support for Intel GPUs with sycl (#1647) (#1660)
* feat(sycl): Add sycl support (#1647)

* onekit: install without prompts

* set cmake args only in grpc-server

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cleanup

* fixup sycl source env

* Cleanup docs

* ci: runs on self-hosted

* fix typo

* bump llama.cpp

* llama.cpp: update server

* adapt to upstream changes

* adapt to upstream changes

* docs: add sycl

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-01 19:21:52 +01:00
Ettore Di Giacinto
cb7512734d
transformers: correctly load automodels (#1643)
* backends(transformers): use AutoModel with LLM types

* examples: animagine-xl

* Add codellama examples
2024-01-26 00:13:21 +01:00
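The idea behind loading with `AutoModel` per LLM type, sketched with illustrative type labels rather than the backend's actual configuration values:

```python
from transformers import AutoModel, AutoModelForCausalLM

def load_model(model_id: str, model_type: str):
    # Pick the Auto* loader from the configured type instead of
    # hard-coding a single class; "CausalLM" is an assumed label.
    if model_type == "CausalLM":
        return AutoModelForCausalLM.from_pretrained(model_id)
    # Generic fallback, e.g. for embedding-style models.
    return AutoModel.from_pretrained(model_id)
```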
Ettore Di Giacinto
5e335eaead
feat(transformers): support also text generation (#1630)
* feat(transformers): support also text generation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* embedded: set seed -1

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-01-23 23:07:31 +01:00
Ettore Di Giacinto
697c769b64
fix(llama.cpp): enable cont batching when parallel is set (#1622)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-01-21 14:59:48 +01:00
Sebastian
eaf85a30f9
fix(llama.cpp): Enable parallel requests (#1616)
integrate changes from llama.cpp

Signed-off-by: Sebastian <tauven@gmail.com>
2024-01-21 09:56:14 +01:00
Ettore Di Giacinto
06cd9ef98d
feat(extra-backends): Improvements, adding mamba example (#1618)
* feat(extra-backends): Improvements

vllm: add max_tokens, wire up stream event
mamba: fixups, adding examples for mamba-chat

* examples(mamba-chat): add

* docs: update
2024-01-20 17:56:08 +01:00
Ettore Di Giacinto
9e653d6abe
feat: 🐍 add mamba support (#1589)
feat(mamba): Initial import

This is a first iteration of the mamba backend, loosely based on
mamba-chat (https://github.com/havenhq/mamba-chat).
2024-01-19 23:42:50 +01:00
Dionysius
441e2965ff
move BUILD_GRPC_FOR_BACKEND_LLAMA logic to makefile: errors in this section now immediately fail the build (#1576)
* move BUILD_GRPC_FOR_BACKEND_LLAMA option to makefile

* review: oversight, fixup cmake_args

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Dionysius <1341084+dionysius@users.noreply.github.com>

---------

Signed-off-by: Dionysius <1341084+dionysius@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-01-13 10:08:26 +01:00
Ettore Di Giacinto
e19d7226f8
feat: more embedded models, coqui fixes, add model usage and description (#1556)
* feat: add model descriptions and usage

* remove default model gallery

* models: add embeddings and tts

* docs: update table

* docs: updates

* images: cleanup pip cache after install

* images: always run apt-get clean

* ux: improve gRPC connection errors

* ux: improve some messages

* fix: fix coqui when no AudioPath is passed by

* embedded: add more models

* Add usage

* Reorder table
2024-01-08 00:37:02 +01:00