bf6426aef2
feat: Realtime API support reboot ( #5392 )
...
Explorer deployment / build-linux (push) Has been cancelled
GPU tests / ubuntu-latest (1.21.x) (push) Has been cancelled
generate and publish intel docker caches / generate_caches (intel/oneapi-basekit:2025.1.0-0-devel-ubuntu22.04, linux/amd64, ubuntu-latest) (push) Has been cancelled
build container images / hipblas-jobs (-aio-gpu-hipblas, rocm/dev-ubuntu-22.04:6.1, hipblas, true, ubuntu:22.04, extras, latest-gpu-hipblas-extras, latest-aio-gpu-hipblas, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -hipblas-extras) (push) Has been cancelled
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, true, ubuntu:22.04, core, latest-gpu-hipblas, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas) (push) Has been cancelled
build container images / self-hosted-jobs (-aio-gpu-intel-f16, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, extras, latest-gpu-intel-f16-extras, latest-aio-gpu-intel-f16, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-… (push) Has been cancelled
build container images / self-hosted-jobs (-aio-gpu-intel-f32, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, extras, latest-gpu-intel-f32-extras, latest-aio-gpu-intel-f32, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-… (push) Has been cancelled
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-11, ubuntu:22.04, cublas, 11, 7, true, extras, latest-gpu-nvidia-cuda-11-extras, latest-aio-gpu-nvidia-cuda-11, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda11-extras) (push) Has been cancelled
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-12, ubuntu:22.04, cublas, 12, 0, true, extras, latest-gpu-nvidia-cuda-12-extras, latest-aio-gpu-nvidia-cuda-12, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda12-extras) (push) Has been cancelled
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, core, latest-gpu-intel-f16, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16) (push) Has been cancelled
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, core, latest-gpu-intel-f32, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32) (push) Has been cancelled
build container images / core-image-build (-aio-cpu, ubuntu:22.04, , true, core, latest-cpu, latest-aio-cpu, --jobs=4 --output-sync=target, linux/amd64,linux/arm64, arc-runner-set, false, auto, ) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, true, core, latest-gpu-nvidia-cuda-12, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, true, core, latest-gpu-nvidia-cuda-12, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, vulkan, true, core, latest-gpu-vulkan, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -vulkan) (push) Has been cancelled
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, true, core, latest-nvidia-l4t-arm64, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, false, -nvidia-l4t-arm64) (push) Has been cancelled
Security Scan / tests (push) Has been cancelled
Tests extras backends / tests-transformers (push) Has been cancelled
Tests extras backends / tests-rerankers (push) Has been cancelled
Tests extras backends / tests-diffusers (push) Has been cancelled
Tests extras backends / tests-coqui (push) Has been cancelled
tests / tests-linux (1.21.x) (push) Has been cancelled
tests / tests-aio-container (push) Has been cancelled
tests / tests-apple (1.21.x) (push) Has been cancelled
* feat(realtime): Initial Realtime API implementation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* chore: go mod tidy
Signed-off-by: Richard Palethorpe <io@richiejp.com >
* feat: Implement transcription only mode for realtime API
Reduce the scope of the real time API for the initial realease and make
transcription only mode functional.
Signed-off-by: Richard Palethorpe <io@richiejp.com >
* chore(build): Build backends on a separate layer to speed up core only changes
Signed-off-by: Richard Palethorpe <io@richiejp.com >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
Signed-off-by: Richard Palethorpe <io@richiejp.com >
Co-authored-by: Ettore Di Giacinto <mudler@localai.io >
2025-05-25 22:25:05 +02:00
3b0cf52f6a
feat(llama.cpp): add reranking ( #5396 )
...
Explorer deployment / build-linux (push) Has been cancelled
GPU tests / ubuntu-latest (1.21.x) (push) Has been cancelled
generate and publish intel docker caches / generate_caches (intel/oneapi-basekit:2025.1.0-0-devel-ubuntu22.04, linux/amd64, ubuntu-latest) (push) Has been cancelled
build container images / hipblas-jobs (-aio-gpu-hipblas, rocm/dev-ubuntu-22.04:6.1, hipblas, true, ubuntu:22.04, extras, latest-gpu-hipblas-extras, latest-aio-gpu-hipblas, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -hipblas-extras) (push) Has been cancelled
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, true, ubuntu:22.04, core, latest-gpu-hipblas, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas) (push) Has been cancelled
build container images / self-hosted-jobs (-aio-gpu-intel-f16, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, extras, latest-gpu-intel-f16-extras, latest-aio-gpu-intel-f16, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-… (push) Has been cancelled
build container images / self-hosted-jobs (-aio-gpu-intel-f32, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, extras, latest-gpu-intel-f32-extras, latest-aio-gpu-intel-f32, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-… (push) Has been cancelled
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-11, ubuntu:22.04, cublas, 11, 7, true, extras, latest-gpu-nvidia-cuda-11-extras, latest-aio-gpu-nvidia-cuda-11, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda11-extras) (push) Has been cancelled
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-12, ubuntu:22.04, cublas, 12, 0, true, extras, latest-gpu-nvidia-cuda-12-extras, latest-aio-gpu-nvidia-cuda-12, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda12-extras) (push) Has been cancelled
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, core, latest-gpu-intel-f16, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16) (push) Has been cancelled
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, core, latest-gpu-intel-f32, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32) (push) Has been cancelled
build container images / core-image-build (-aio-cpu, ubuntu:22.04, , true, core, latest-cpu, latest-aio-cpu, --jobs=4 --output-sync=target, linux/amd64,linux/arm64, arc-runner-set, false, auto, ) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, true, core, latest-gpu-nvidia-cuda-12, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, true, core, latest-gpu-nvidia-cuda-12, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, vulkan, true, core, latest-gpu-vulkan, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -vulkan) (push) Has been cancelled
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, true, core, latest-nvidia-l4t-arm64, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, false, -nvidia-l4t-arm64) (push) Has been cancelled
Security Scan / tests (push) Has been cancelled
Tests extras backends / tests-transformers (push) Has been cancelled
Tests extras backends / tests-rerankers (push) Has been cancelled
Tests extras backends / tests-diffusers (push) Has been cancelled
Tests extras backends / tests-coqui (push) Has been cancelled
tests / tests-linux (1.21.x) (push) Has been cancelled
tests / tests-aio-container (push) Has been cancelled
tests / tests-apple (1.21.x) (push) Has been cancelled
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-05-22 21:49:30 +02:00
b2f9fc870b
chore(defaults): enlarge defaults, drop gpu layers which is infered ( #5308 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-05-03 18:44:51 +02:00
2c9279a542
feat(video-gen): add endpoint for video generation ( #5247 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-04-26 18:05:01 +02:00
61cc76c455
chore(autogptq): drop archived backend ( #5214 )
...
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, core, latest-gpu-intel-f32-core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-ffmpeg-core) (push) Has been cancelled
build container images / self-hosted-jobs (ubuntu:22.04, , , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, ) (push) Has been cancelled
build container images / self-hosted-jobs (ubuntu:22.04, , true, extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -ffmpeg) (push) Has been cancelled
build container images / self-hosted-jobs (ubuntu:22.04, cublas, 11, 7, , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda11) (push) Has been cancelled
build container images / self-hosted-jobs (ubuntu:22.04, cublas, 12, 0, , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda12) (push) Has been cancelled
build container images / core-image-build (-aio-cpu, ubuntu:22.04, , true, core, latest-cpu, latest-aio-cpu, --jobs=4 --output-sync=target, linux/amd64,linux/arm64, arc-runner-set, false, auto, -ffmpeg-core) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, , core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11-core) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, true, core, latest-gpu-nvidia-cuda-12-core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11-ffmpeg-core) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, , core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12-core) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, true, core, latest-gpu-nvidia-cuda-12-core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12-ffmpeg-core) (push) Has been cancelled
build container images / core-image-build (ubuntu:22.04, vulkan, true, core, latest-gpu-vulkan-core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -vulkan-ffmpeg-core) (push) Has been cancelled
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, true, core, latest-nvidia-l4t-arm64-core, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, false, -nvidia-l4t-arm64-core) (push) Has been cancelled
Security Scan / tests (push) Has been cancelled
Tests extras backends / tests-transformers (push) Has been cancelled
Tests extras backends / tests-rerankers (push) Has been cancelled
Tests extras backends / tests-diffusers (push) Has been cancelled
Tests extras backends / tests-coqui (push) Has been cancelled
tests / tests-linux (1.21.x) (push) Has been cancelled
tests / tests-aio-container (push) Has been cancelled
tests / tests-apple (1.21.x) (push) Has been cancelled
Update swagger / swagger (push) Has been cancelled
Check if checksums are up-to-date / checksum_check (push) Has been cancelled
Bump dependencies / bump (mudler/LocalAI) (push) Has been cancelled
Bump dependencies / bump (main, PABannier/bark.cpp, BARKCPP_VERSION) (push) Has been cancelled
Bump dependencies / bump (master, ggerganov/whisper.cpp, WHISPER_CPP_VERSION) (push) Has been cancelled
Bump dependencies / bump (master, ggml-org/llama.cpp, CPPLLAMA_VERSION) (push) Has been cancelled
Bump dependencies / bump (master, leejet/stable-diffusion.cpp, STABLEDIFFUSION_GGML_VERSION) (push) Has been cancelled
Bump dependencies / bump (master, mudler/go-piper, PIPER_VERSION) (push) Has been cancelled
Bump dependencies / bump (master, mudler/go-stable-diffusion, STABLEDIFFUSION_VERSION) (push) Has been cancelled
generate and publish GRPC docker caches / generate_caches (ubuntu:22.04, linux/amd64,linux/arm64, arc-runner-set) (push) Has been cancelled
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-04-19 15:52:29 +02:00
f67e5dec68
fix: bark-cpp: assign FLAG_TTS to bark-cpp backend ( #5186 )
...
Signed-off-by: Gianluca Boiano <morf3089@gmail.com >
2025-04-16 08:21:30 +02:00
9c74d74f7b
feat(gguf): guess default context size from file ( #5089 )
...
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas-core) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, false, ubuntu:22.04, extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-intel-f16, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, extras, latest-gpu-intel-f16, latest-aio-gpu-intel-f16, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -sycl-f16-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-intel-f32, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, extras, latest-gpu-intel-f32, latest-aio-gpu-intel-f32, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -sycl-f32-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-11, ubuntu:22.04, cublas, 11, 7, true, extras, latest-gpu-nvidia-cuda-11, latest-aio-gpu-nvidia-cuda-11, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -cublas-cuda11-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-12, ubuntu:22.04, cublas, 12, 0, true, extras, latest-gpu-nvidia-cuda-12, latest-aio-gpu-nvidia-cuda-12, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -cublas-cuda12-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, , , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, ) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, , true, extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, cublas, 11, 7, , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda11) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, cublas, 12, 0, , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda12) (push) Waiting to run
build container images / core-image-build (-aio-cpu, ubuntu:22.04, , true, core, latest-cpu, latest-aio-cpu, --jobs=4 --output-sync=target, linux/amd64,linux/arm64, arc-runner-set, false, auto, -ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, , core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, true, core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11-ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, , core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, true, core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12-ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, vulkan, true, core, latest-vulkan-ffmpeg-core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -vulkan-ffmpeg-core) (push) Waiting to run
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, true, core, latest-nvidia-l4t-arm64-core, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, false, -nvidia-l4t-arm64-core) (push) Waiting to run
Security Scan / tests (push) Waiting to run
Tests extras backends / tests-transformers (push) Waiting to run
Tests extras backends / tests-rerankers (push) Waiting to run
Tests extras backends / tests-diffusers (push) Waiting to run
Tests extras backends / tests-coqui (push) Waiting to run
tests / tests-linux (1.21.x) (push) Waiting to run
tests / tests-aio-container (push) Waiting to run
tests / tests-apple (1.21.x) (push) Waiting to run
feat(gguf): guess default config file from files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-03-29 14:42:14 +01:00
a7b4001b75
feat: allow to specify a reply prefix ( #4931 )
...
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas-core) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, false, ubuntu:22.04, extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-intel-f16, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, extras, latest-gpu-intel-f16, latest-aio-gpu-intel-f16, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -sycl-f16-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-intel-f32, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, extras, latest-gpu-intel-f32, latest-aio-gpu-intel-f32, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -sycl-f32-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-11, ubuntu:22.04, cublas, 11, 7, true, extras, latest-gpu-nvidia-cuda-11, latest-aio-gpu-nvidia-cuda-11, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -cublas-cuda11-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-12, ubuntu:22.04, cublas, 12, 0, true, extras, latest-gpu-nvidia-cuda-12, latest-aio-gpu-nvidia-cuda-12, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -cublas-cuda12-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, , , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, ) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, , true, extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, cublas, 11, 7, , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda11) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, cublas, 12, 0, , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda12) (push) Waiting to run
build container images / core-image-build (-aio-cpu, ubuntu:22.04, , true, core, latest-cpu, latest-aio-cpu, --jobs=4 --output-sync=target, linux/amd64,linux/arm64, arc-runner-set, false, auto, -ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, , core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, true, core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11-ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, , core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, true, core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12-ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, vulkan, true, core, latest-vulkan-ffmpeg-core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -vulkan-ffmpeg-core) (push) Waiting to run
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, true, core, latest-nvidia-l4t-arm64-core, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, false, -nvidia-l4t-arm64-core) (push) Waiting to run
Security Scan / tests (push) Waiting to run
Tests extras backends / tests-transformers (push) Waiting to run
Tests extras backends / tests-rerankers (push) Waiting to run
Tests extras backends / tests-diffusers (push) Waiting to run
Tests extras backends / tests-coqui (push) Waiting to run
tests / tests-linux (1.21.x) (push) Waiting to run
tests / tests-aio-container (push) Waiting to run
tests / tests-apple (1.21.x) (push) Waiting to run
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-03-02 16:07:32 +01:00
bbbb28e3ca
fix(models): unify usecases identifications ( #4914 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-02-27 15:51:12 +01:00
6a6e1a0ea9
feat(vllm): Additional vLLM config options (Disable logging, dtype, and Per-Prompt media limits) ( #4855 )
...
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas-core) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, false, ubuntu:22.04, extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-intel-f16, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, extras, latest-gpu-intel-f16, latest-aio-gpu-intel-f16, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -sycl-f16-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-intel-f32, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, extras, latest-gpu-intel-f32, latest-aio-gpu-intel-f32, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -sycl-f32-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-11, ubuntu:22.04, cublas, 11, 7, true, extras, latest-gpu-nvidia-cuda-11, latest-aio-gpu-nvidia-cuda-11, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -cublas-cuda11-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-12, ubuntu:22.04, cublas, 12, 0, true, extras, latest-gpu-nvidia-cuda-12, latest-aio-gpu-nvidia-cuda-12, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -cublas-cuda12-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, , , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, ) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, , true, extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, cublas, 11, 7, , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda11) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, cublas, 12, 0, , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda12) (push) Waiting to run
build container images / core-image-build (-aio-cpu, ubuntu:22.04, , true, core, latest-cpu, latest-aio-cpu, --jobs=4 --output-sync=target, linux/amd64,linux/arm64, arc-runner-set, false, auto, -ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, , core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, true, core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11-ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, , core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, true, core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12-ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, vulkan, true, core, latest-vulkan-ffmpeg-core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -vulkan-ffmpeg-core) (push) Waiting to run
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, true, core, latest-nvidia-l4t-arm64-core, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, false, -nvidia-l4t-arm64-core) (push) Waiting to run
Security Scan / tests (push) Waiting to run
Tests extras backends / tests-transformers (push) Waiting to run
Tests extras backends / tests-rerankers (push) Waiting to run
Tests extras backends / tests-diffusers (push) Waiting to run
Tests extras backends / tests-coqui (push) Waiting to run
tests / tests-linux (1.21.x) (push) Waiting to run
tests / tests-aio-container (push) Waiting to run
tests / tests-apple (1.21.x) (push) Waiting to run
* Adding the following vLLM config options: disable_log_status, dtype, limit_mm_per_prompt
Signed-off-by: TheDropZone <brandonbeiler@gmail.com >
* using " marks in the config.yaml file
Signed-off-by: TheDropZone <brandonbeiler@gmail.com >
* adding in missing colon
Signed-off-by: TheDropZone <brandonbeiler@gmail.com >
---------
Signed-off-by: TheDropZone <brandonbeiler@gmail.com >
2025-02-18 19:27:58 +01:00
5b19af99ff
feat(ui): detect model usage and display link ( #4864 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-02-18 19:27:07 +01:00
3cddf24747
feat: Centralized Request Processing middleware ( #3847 )
...
* squash past, centralize request middleware PR
Signed-off-by: Dave Lee <dave@gray101.com >
* migrate bruno request files to examples repo
Signed-off-by: Dave Lee <dave@gray101.com >
* fix
Signed-off-by: Dave Lee <dave@gray101.com >
* Update tests/e2e-aio/e2e_test.go
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
---------
Signed-off-by: Dave Lee <dave@gray101.com >
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
2025-02-10 12:06:16 +01:00
cc1f6f913f
fix(llama.cpp): disable mirostat as default ( #2911 )
...
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas-core) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, false, ubuntu:22.04, extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-22.04:6.1, hipblas, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -hipblas-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-intel-f16, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, extras, latest-gpu-intel-f16, latest-aio-gpu-intel-f16, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -sycl-f16-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-intel-f32, quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, extras, latest-gpu-intel-f32, latest-aio-gpu-intel-f32, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -sycl-f32-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-11, ubuntu:22.04, cublas, 11, 7, true, extras, latest-gpu-nvidia-cuda-11, latest-aio-gpu-nvidia-cuda-11, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -cublas-cuda11-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (-aio-gpu-nvidia-cuda-12, ubuntu:22.04, cublas, 12, 0, true, extras, latest-gpu-nvidia-cuda-12, latest-aio-gpu-nvidia-cuda-12, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -cublas-cuda12-ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f16, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f16-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, false, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-core) (push) Waiting to run
build container images / self-hosted-jobs (quay.io/go-skynet/intel-oneapi-base:latest, sycl_f32, true, ubuntu:22.04, core, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -sycl-f32-ffmpeg-core) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, , , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, ) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, , true, extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, auto, -ffmpeg) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, cublas, 11, 7, , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda11) (push) Waiting to run
build container images / self-hosted-jobs (ubuntu:22.04, cublas, 12, 0, , extras, --jobs=3 --output-sync=target, linux/amd64, arc-runner-set, false, -cublas-cuda12) (push) Waiting to run
build container images / core-image-build (-aio-cpu, ubuntu:22.04, , true, core, latest-cpu, latest-aio-cpu, --jobs=4 --output-sync=target, linux/amd64,linux/arm64, arc-runner-set, false, auto, -ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, , core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 11, 7, true, core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda11-ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, , core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 12, 0, true, core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -cublas-cuda12-ffmpeg-core) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, vulkan, true, core, latest-vulkan-ffmpeg-core, --jobs=4 --output-sync=target, linux/amd64, arc-runner-set, false, false, -vulkan-ffmpeg-core) (push) Waiting to run
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, true, core, latest-nvidia-l4t-arm64-core, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, false, -nvidia-l4t-arm64-core) (push) Waiting to run
Security Scan / tests (push) Waiting to run
Tests extras backends / tests-transformers (push) Waiting to run
Tests extras backends / tests-rerankers (push) Waiting to run
Tests extras backends / tests-diffusers (push) Waiting to run
Tests extras backends / tests-coqui (push) Waiting to run
tests / tests-linux (1.21.x) (push) Waiting to run
tests / tests-aio-container (push) Waiting to run
tests / tests-apple (1.21.x) (push) Waiting to run
Even if increasing the quality of the output, it has shown to have
performance drawbacks to be so noticeable that the confuses users about
speed of LocalAI ( see also
https://github.com/mudler/LocalAI/issues/2780 ).
This changeset disables Mirostat by default (which can
be still enabled manually).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
Co-authored-by: Dave <dave@gray101.com >
2025-02-06 19:39:59 +01:00
e15d29aba2
chore(stablediffusion-ncn): drop in favor of ggml implementation ( #4652 )
...
* chore(stablediffusion-ncn): drop in favor of ggml implementation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* chore(ci): drop stablediffusion build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* chore(tests): add
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* chore(tests): try to fixup current tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Try to fix tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Tests improvements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* chore(tests): use quality to specify step
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* chore(tests): switch to sd-1.5
also increase prep time for downloading models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-01-22 19:34:16 +01:00
032a33de49
chore: remove deprecated tinydream backend ( #4631 )
...
Signed-off-by: Gianluca Boiano <morf3089@gmail.com >
2025-01-18 18:35:30 +01:00
7d0ac1ea3f
chore(vall-e-x): Drop backend ( #4619 )
...
There are many new architectures that are SOTA and replaces vall-e-x
nowadays.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2025-01-17 09:35:10 +01:00
cea5a0ea42
feat(template): read jinja templates from gguf files ( #4332 )
...
* Read jinja templates as fallback
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Move templating out of model loader
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Test TemplateMessages
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Set role and content from transformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Tests: be more flexible
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* More jinja
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Small refactoring and adaptations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-12-08 13:50:33 +01:00
d4c1746c7d
feat(llama.cpp): expose cache_type_k and cache_type_v for quant of kv cache ( #4329 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-12-06 10:23:59 +01:00
44a5dac312
feat(backend): add stablediffusion-ggml ( #4289 )
...
* feat(backend): add stablediffusion-ggml
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* chore(ci): track stablediffusion-ggml
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Use default scheduler and sampler if not specified
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Move cfg scale out of diffusers block
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Make it working
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* fix: set free_params_immediately to false to call the model in sequence
https://github.com/leejet/stable-diffusion.cpp/issues/366
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-12-03 22:41:22 +01:00
947224b952
feat(diffusers): allow multiple lora adapters ( #4081 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-11-05 15:14:33 +01:00
ae1ec4e096
feat(vllm): expose 'load_format' ( #3943 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-10-23 15:34:57 +02:00
ccc7cb0287
feat(templates): use a single template for multimodals messages ( #3892 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-10-22 09:34:05 +02:00
648ffdf449
feat(multimodal): allow to template placeholders ( #3728 )
...
feat(multimodal): allow to template image placeholders
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-10-04 18:32:29 +02:00
307a835199
groundwork: ListModels Filtering Upgrade ( #2773 )
...
* seperate the filtering from the middleware changes
---------
Signed-off-by: Dave Lee <dave@gray101.com >
2024-10-01 18:55:46 +00:00
cf747bcdec
feat: extract output with regexes from LLMs ( #3491 )
...
* feat: extract output with regexes from LLMs
This changset adds `extract_regex` to the LLM config. It is a list of
regexes that can match output and will be used to re extract text from
the LLM output. This is particularly useful for LLMs which outputs final
results into tags.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Add tests, enhance output in case of configuration error
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-09-13 13:27:36 +02:00
a36b721ca6
fix: be consistent in downloading files, check for scanner errors ( #3108 )
...
* fix(downloader): be consistent in downloading files
This PR puts some order in the downloader such as functions are re-used
across several places.
This fixes an issue with having uri's inside the model YAML file, it
would resolve to MD5 rather then using the filename
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* fix(scanner): do raise error only if unsafeFiles are found
Fixes: https://github.com/mudler/LocalAI/issues/3114
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-08-02 20:06:25 +02:00
35561edb6e
feat(llama.cpp): support embeddings endpoints ( #2871 )
...
* feat(llama.cpp): add embeddings
Also enable embeddings by default for llama.cpp models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* fix(Makefile): prepare llama.cpp sources only once
Otherwise we keep cloning llama.cpp for each of the variants
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* do not set embeddings to false
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* docs: add embeddings to the YAML config reference
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-07-15 22:54:16 +02:00
a181dd0ebc
refactor: gallery inconsistencies ( #2647 )
...
* refactor(gallery): move under core/
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* fix(unarchive): do not allow symlinks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-06-24 17:32:12 +02:00
5866fc8ded
chore: fix go.mod module ( #2635 )
...
Signed-off-by: Sertac Ozercan <sozercan@gmail.com >
2024-06-23 08:24:36 +00:00
aae7ad9d73
feat(llama.cpp): guess model defaults from file ( #2522 )
...
* wip: guess informations from gguf file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* update go mod
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Identify llama3
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Do not try to guess the name, as reading gguf files can be expensive
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Allow to disable guessing
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-06-08 22:13:02 +02:00
b99182c8d4
TTS API improvements ( #2308 )
...
* update doc on COQUI_LANGUAGE env variable
Signed-off-by: blob42 <contact@blob42.xyz >
* return errors from tts gRPC backend
Signed-off-by: blob42 <contact@blob42.xyz >
* handle speaker_id and language in coqui TTS backend
Signed-off-by: blob42 <contact@blob42.xyz >
* TTS endpoint: add optional language paramter
Signed-off-by: blob42 <contact@blob42.xyz >
* tts fix: empty language string breaks non-multilingual models
Signed-off-by: blob42 <contact@blob42.xyz >
* allow tts param definition in config file
- consolidate TTS options under `tts` config entry
Signed-off-by: blob42 <contact@blob42.xyz >
* tts: update doc
Signed-off-by: blob42 <contact@blob42.xyz >
---------
Signed-off-by: blob42 <contact@blob42.xyz >
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
2024-06-01 18:26:27 +00:00
4d98dd9ce7
feat(image): support response_type
in the OpenAI API request ( #2347 )
...
* Change response_format type to string to match OpenAI Spec
Signed-off-by: prajwal <prajwalnayak7@gmail.com >
* updated response_type type to interface
Signed-off-by: prajwal <prajwalnayak7@gmail.com >
* feat: correctly parse generic struct
Signed-off-by: mudler <mudler@localai.io >
* add tests
Signed-off-by: mudler <mudler@localai.io >
---------
Signed-off-by: prajwal <prajwalnayak7@gmail.com >
Signed-off-by: mudler <mudler@localai.io >
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com >
Co-authored-by: mudler <mudler@localai.io >
2024-05-29 14:40:54 +02:00
0b637465d9
refactor: Minor improvements to BackendConfigLoader ( #2353 )
...
some minor renames and refactorings within BackendConfigLoader - make things more consistent, remove underused code, rename things for clarity
Signed-off-by: Dave Lee <dave@gray101.com >
2024-05-23 22:48:12 +02:00
6cbe6a4f99
models(gallery): add phi-3-medium-4k-instruct ( #2367 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-05-22 08:32:30 +02:00
1a3dedece0
dependencies(grpcio): bump to fix CI issues ( #2362 )
...
feat(grpcio): bump to fix CI issues
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-05-21 14:33:47 +02:00
c89271b2e4
feat(llama.cpp): add distributed llama.cpp inferencing ( #2324 )
...
* feat(llama.cpp): support distributed llama.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* feat: let tweak how chat messages are merged together
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* refactor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Makefile: register to ALL_GRPC_BACKENDS
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* refactoring, allow disable auto-detection of backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* minor fixups
Signed-off-by: mudler <mudler@localai.io >
* feat: add cmd to start rpc-server from llama.cpp
Signed-off-by: mudler <mudler@localai.io >
* ci: add ccache
Signed-off-by: mudler <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
Signed-off-by: mudler <mudler@localai.io >
2024-05-15 01:17:02 +02:00
e49ea0123b
feat(llama.cpp): add flash_attention
and no_kv_offloading
( #2310 )
...
feat(llama.cpp): add flash_attn and no_kv_offload
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-05-13 19:07:51 +02:00
c4f958e11b
refactor(application): introduce application global state ( #2072 )
...
* start breaking up the giant channel refactor now that it's better understood - easier to merge bites
Signed-off-by: Dave Lee <dave@gray101.com >
* add concurrency and base64 back in, along with new base64 tests.
Signed-off-by: Dave Lee <dave@gray101.com >
* Automatic rename of whisper.go's Result to TranscriptResult
Signed-off-by: Dave Lee <dave@gray101.com >
* remove pkg/concurrency - significant changes coming in split 2
Signed-off-by: Dave Lee <dave@gray101.com >
* fix comments
Signed-off-by: Dave Lee <dave@gray101.com >
* add list_model service as another low-risk service to get it out of the way
Signed-off-by: Dave Lee <dave@gray101.com >
* split backend config loader into seperate file from the actual config struct. No changes yet, just reduce cognative load with smaller files of logical blocks
Signed-off-by: Dave Lee <dave@gray101.com >
* rename state.go ==> application.go
Signed-off-by: Dave Lee <dave@gray101.com >
* fix lost import?
Signed-off-by: Dave Lee <dave@gray101.com >
---------
Signed-off-by: Dave Lee <dave@gray101.com >
2024-04-29 17:42:37 +00:00
e8d44447ad
feat(gallery): support model deletion ( #2173 )
...
* feat(gallery): op now supports deletion of models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Wire things with WebUI(WIP)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* minor improvements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-04-28 23:42:46 +02:00
fb2a05ff43
feat(gallery): display job status also during navigation ( #2151 )
...
* feat(gallery): keep showing progress also when refreshing
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* fix(intel-gpu): better defaults
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* feat: make it thread-safe
Signed-off-by: mudler <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
Signed-off-by: mudler <mudler@localai.io >
2024-04-27 09:08:33 +02:00
0d8bf91699
feat: Galleries UI ( #2104 )
...
* WIP: add models to webui
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* Register routes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* fix: don't cache models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* fix: fixup multiple installs (strings.Clone)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-04-23 09:22:58 +02:00
180cd4ccda
fix(llama.cpp-ggml): fixup max_tokens
for old backend ( #2094 )
...
fix(llama.cpp-ggml): set 0 as default for `max_tokens`
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-04-21 16:34:00 +02:00
afa1bca1e3
fix(llama.cpp): set -1 as default for max tokens ( #2087 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-04-20 20:20:10 +02:00
03adc1f60d
Add tensor_parallel_size setting to vllm setting items ( #2085 )
...
Signed-off-by: Taikono-Himazin <kazu@po.harenet.ne.jp >
2024-04-20 14:37:02 +00:00
bbea62b907
feat(functions): support models with no grammar, add tests ( #2068 )
...
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-04-18 22:43:12 +02:00
af9e5a2d05
Revert #1963 ( #2056 )
...
* Revert "fix(fncall): fix regression introduced in #1963 (#2048 )"
This reverts commit 6b06d4e0af
.
* Revert "fix: action-tmate back to upstream, dead code removal (#2038 )"
This reverts commit fdec8a9d00
.
* Revert "feat(grpc): return consumed token count and update response accordingly (#2035 )"
This reverts commit e843d7df0e
.
* Revert "refactor: backend/service split, channel-based llm flow (#1963 )"
This reverts commit eed5706994
.
* feat(grpc): return consumed token count and update response accordingly
Fixes : #1920
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-04-17 23:33:49 +02:00
eed5706994
refactor: backend/service split, channel-based llm flow ( #1963 )
...
Refactor: channel based llm flow and services split
---------
Signed-off-by: Dave Lee <dave@gray101.com >
2024-04-13 09:45:34 +02:00
12c0d9443e
feat: use tokenizer.apply_chat_template() in vLLM ( #1990 )
...
Use tokenizer.apply_chat_template() in vLLM
Signed-off-by: Ludovic LEROUX <ludovic@inpher.io >
2024-04-11 19:20:22 +02:00
8342553214
fix(llama.cpp): set better defaults for llama.cpp ( #1961 )
...
fix(defaults): set better defaults for llama.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-04-06 22:56:45 +02:00
ff77d3bc22
fix(seed): generate random seed per-request if -1 is set ( #1952 )
...
* fix(seed): generate random seed per-request if -1 is set
Also update ci with new workflows and allow the aio tests to run with an
api key
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
* docs(openvino): Add OpenVINO example
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io >
2024-04-03 22:25:47 +02:00