* bugfix: CUDA acceleration not working
CUDA acceleration stopped working after #2286.
Refactored the code to be more polished.
* Update requirements.txt
Missing imports
Signed-off-by: fakezeta <fakezeta@gmail.com>
* Update requirements.txt
Signed-off-by: fakezeta <fakezeta@gmail.com>
---------
Signed-off-by: fakezeta <fakezeta@gmail.com>
update transformers
* Handle Temperature = 0 as greedy search
* Handle custom words as stop words
* Implement KV cache
* Phi 3 no longer requires trust_remote_code: true
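
The temperature and stop-word handling listed above can be illustrated with a minimal sketch. It is not the backend's actual code: the helper names `build_generation_kwargs` and `truncate_at_stop_words` are hypothetical, and the only assumption about the underlying library is that generation accepts `do_sample`, `temperature` and `max_new_tokens` keyword arguments, as Hugging Face `generate` does.

```python
def build_generation_kwargs(temperature, max_new_tokens=128):
    """Translate a request temperature into sampling settings.

    A temperature of 0 (or below) is treated as greedy search: sampling is
    switched off and no temperature value is passed on.
    """
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature <= 0:
        kwargs["do_sample"] = False
    else:
        kwargs["do_sample"] = True
        kwargs["temperature"] = float(temperature)
    return kwargs


def truncate_at_stop_words(text, stop_words):
    """Cut generated text at the earliest occurrence of any stop word."""
    cut = len(text)
    for word in stop_words:
        if not word:
            continue  # an empty stop word would match at index 0
        idx = text.find(word)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

Passing temperature 0 straight through while sampling is enabled would typically be rejected or be numerically meaningless, which is presumably why it is mapped to greedy decoding instead.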
* feat: create bash library to handle install/run/test of python backends
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* chore: minor cleanup
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: remove incorrect LIMIT_TARGETS from parler-tts
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: update runUnittests to handle running tests from a custom test file
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* chore: document runUnittests
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
---------
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate diffusers backend from conda to uv
- replace conda with UV for diffusers install (prototype for all
extras backends)
- add ability to build docker with one/some/all extras backends
instead of all or nothing
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate autogptq, bark and coqui from conda to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: convert exllama over to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate exllama2 to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate mamba to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate parler to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate petals to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: fix tests
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate rerankers to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate sentencetransformers to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: install uv for tests-linux
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: make sure file exists before installing on intel images
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate transformers backend to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate transformers-musicgen to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate vall-e-x to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: migrate vllm to uv
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: add uv install to the rest of test-extra.yml
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: adjust file perms on all install/run/test scripts
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: add missing accelerate dependencies
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: add some more missing dependencies to python backends
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: parler tests venv py dir fix
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: correct filename for transformers-musicgen tests
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: adjust the pwd for valle tests
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: cleanup and optimization work for uv migration
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: add setuptools to requirements-install for mamba
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: more size optimization work
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: make installs and tests more consistent, cleanup some deps
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: cleanup
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: mamba backend is cublas only
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: uncomment lines in makefile
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
---------
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
Winograd convolutions were always disabled, which caused an error when the inference device was CPU.
This commit implements logic to disable Winograd convolutions only if CPU or NPU are declared.
fix: more places where we are installing grpc that need a version specified
fix: attempt to fix metal tests
fix: metal/brew is forcing an update, they don't have 1.58 available anymore
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* Bump oneapi-basekit, optimum and openvino
* Changed PERFORMANCE_HINT to CUMULATIVE_THROUGHPUT
Minor latency change for first token but about 10-15% speedup on token generation.
* fix regression #1971
fixes regression #1971 introduced by intel_extension_for_transformers==1.4
* UseTokenizerTemplate and StopPrompt
Implementation of use_tokenizer_template and stopwords options
* feat(parler-tts): Add new backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parler-tts): try downgrade protobuf
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parler-tts): add parler conda env
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Revert "feat(parler-tts): try downgrade protobuf"
This reverts commit bd5941d5cfc00676b45a99f71debf3c34249cf3c.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* deps: add grpc
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: try to gen proto with same environment
* workaround
* Revert "fix: try to gen proto with same environment"
This reverts commit 998c745e2f475ec3ec43ac017bcebf3a7ce15b8b.
* Workaround fixup
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Dave <dave@gray101.com>
* fix: initial work towards not committing generated files to the repository
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat: improve build docs
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: remove unused folder from .dockerignore and .gitignore
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: attempt to fix extra backend tests
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: attempt to fix other tests
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: more test fixes
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: fix apple tests
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: more extras tests fixes
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: add GOBIN to PATH in docker build
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: extra tests and Dockerfile corrections
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: remove build dependency checks
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: add golang protobuf compilers to tests-linux action
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: ensure protogen is run for extra backend installs
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: use newer protobuf
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: more missing protoc binaries
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: missing dependencies during docker build
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: don't install grpc compilers in the final stage if they aren't needed
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: python-grpc-tools in 22.04 repos is too old
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: add a couple of extra build dependencies to Makefile
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* fix: unbreak container rebuild functionality
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
---------
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* Enhance autogptq backend to support VL models
* update dependencies for autogptq
* remove redundant auto-gptq dependency
* Convert base64 to image_url for Qwen-VL model
* implemented model inference for qwen-vl
* remove user prompt from generated answer
* fixed write image error
* fixed use_triton issue when loading Qwen-VL model
---------
Co-authored-by: Binghua Wu <bingwu@estee.com>
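
The base64 conversion mentioned in the Qwen-VL entries above could look roughly like the sketch below. It assumes the incoming image arrives as a base64 string (optionally wrapped in a data URI) and that the model wants a file location instead. The helper name `base64_to_image_file` and the default suffix are hypothetical and not taken from the backend.

```python
import base64
import os
import tempfile


def base64_to_image_file(b64_data, suffix=".jpg"):
    """Decode a base64 image payload and write it to a temporary file.

    Returns the file path, which can then be handed to whatever expects
    an image location.
    """
    # Tolerate data-URI payloads such as "data:image/png;base64,....".
    if b64_data.startswith("data:") and "," in b64_data:
        b64_data = b64_data.split(",", 1)[1]
    raw = base64.b64decode(b64_data)
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(raw)
    return path
```

The caller is responsible for deleting the temporary file once inference has finished.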
* Streaming working
* Small fix for regression on CUDA and XPU
* use pip version of optimum[openvino]
* Update backend/python/transformers/transformers_server.py
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Token streaming support
fix optimum[openvino] package in install.sh
* Token Streaming support
---------
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* fixes #1775 and #1774
Add BitsAndBytes quantization and fix embedding on CUDA devices
* Manage 4-bit and 8-bit quantization
Manage different BitsAndBytes options with the quantization: parameter in yaml
* fix compilation errors on non-CUDA environments
* OpenVINO draft
First draft of OpenVINO integration in the transformers backend
* first working implementation
* Streaming working
* Small fix for regression on CUDA and XPU
* use pip version of optimum[openvino]
* Update backend/python/transformers/transformers_server.py
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
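
As an illustration of the `quantization:` YAML parameter mentioned in the 4-bit/8-bit entry above, here is a minimal sketch of mapping such a value to BitsAndBytes-style options. The accepted strings `bnb_4bit` and `bnb_8bit` and the helper name are assumptions made for this example. `load_in_4bit` and `load_in_8bit` are the keyword names used by the Transformers `BitsAndBytesConfig`.

```python
def bnb_kwargs_from_quantization(quantization):
    """Map a YAML `quantization:` value to BitsAndBytes-style keyword arguments.

    The accepted strings ("bnb_4bit", "bnb_8bit") are hypothetical names
    chosen for this sketch.
    """
    if quantization in (None, ""):
        return {}  # no quantization requested
    if quantization == "bnb_4bit":
        return {"load_in_4bit": True}
    if quantization == "bnb_8bit":
        return {"load_in_8bit": True}
    raise ValueError(f"unsupported quantization: {quantization!r}")
```

The resulting dictionary could be unpacked into a `BitsAndBytesConfig(...)` call on a CUDA machine. Keeping the mapping free of imports means it can still be loaded in a non-CUDA environment.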