LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2025-05-31 22:40:45 +00:00

feat: Openvino runtime for transformer backend and streaming support for Openvino and CUDA (#1892)

* fixes #1775 and #1774

Add BitsAndBytes Quantization and fixes embedding on CUDA devices

* Manage 4-bit and 8-bit quantization

Manage different BitsAndBytes options with the quantization: parameter in YAML

* fix compilation errors on non-CUDA environments

* OpenVINO draft

First draft of OpenVINO integration in transformer backend

* first working implementation

* Streaming working

* Small fix for regression on CUDA and XPU

* use pip version of optimum[openvino]

* Update backend/python/transformers/transformers_server.py

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

2024-03-26 23:31:43 +00:00

cpp

test/fix: OSX Test Repair (#1843)

2024-03-18 19:19:43 +01:00

feat(stores): Vector store backend (#1795)

2024-03-22 21:14:04 +01:00

python

feat: Openvino runtime for transformer backend and streaming support for Openvino and CUDA (#1892)

2024-03-26 23:31:43 +00:00

backend_grpc.pb.go

transformers: correctly load automodels (#1643)

2024-01-26 00:13:21 +01:00

backend.proto

feat(stores): Vector store backend (#1795)

2024-03-22 21:14:04 +01:00