LocalAI/backend
fakezeta e7cbe32601
feat: OpenVINO runtime for transformers backend and streaming support for OpenVINO and CUDA (#1892)
* fixes #1775 and #1774

Add BitsAndBytes quantization and fix embeddings on CUDA devices

* Manage 4-bit and 8-bit quantization

Manage the different BitsAndBytes options through the "quantization:" parameter in the YAML config

* fix compilation errors in non-CUDA environments

* OpenVINO draft

First draft of OpenVINO integration in the transformers backend

* first working implementation

* Streaming working

* Small fix for a regression on CUDA and XPU

* use the pip version of optimum[openvino]

* Update backend/python/transformers/transformers_server.py

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-03-26 23:31:43 +00:00
cpp test/fix: OSX Test Repair (#1843) 2024-03-18 19:19:43 +01:00
go feat(stores): Vector store backend (#1795) 2024-03-22 21:14:04 +01:00
python feat: OpenVINO runtime for transformers backend and streaming support for OpenVINO and CUDA (#1892) 2024-03-26 23:31:43 +00:00
backend_grpc.pb.go transformers: correctly load automodels (#1643) 2024-01-26 00:13:21 +01:00
backend.proto feat(stores): Vector store backend (#1795) 2024-03-22 21:14:04 +01:00