Vulkan k-quant mmq and ggml-backend offload functionality (llama/6155)

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-14 12:58:10 +00:00

* Fix incoherent Vulkan output when KV offload is disabled

* Add k-quant mul mat mat shaders

* Rework working buffer allocation, noticeably reducing VRAM use

Clean up CPU assist code, replacing it with the ggml-backend offload function

* Default to all dedicated GPUs

* Add fallback for integrated GPUs if no dedicated GPUs are found

* Add debug info showing which device is allocating memory

* Fix Intel dequant issue

Fix validation issue

* Fix Vulkan GGML_OP_GET_ROWS implementation

* Clean up merge artifacts

* Remove Vulkan warning
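
The two device-selection items above (default to all dedicated GPUs, fall back to integrated GPUs if none are found) can be sketched roughly as below. This is an illustration only, not the code from ggml-vulkan.cpp: `DeviceKind` and `pick_default_devices` are hypothetical names, the device kinds stand in for what the driver reports (Vulkan exposes this as `VkPhysicalDeviceType`), and the commit message does not say how many integrated GPUs the fallback selects, so the sketch assumes it takes only the first.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the device type reported by the driver.
enum class DeviceKind { Discrete, Integrated, Other };

// Returns the indices of the devices to use by default:
// every dedicated (discrete) GPU, or, if there are none,
// the first integrated GPU (assumption: a single fallback device).
std::vector<std::size_t> pick_default_devices(const std::vector<DeviceKind> & kinds) {
    std::vector<std::size_t> picked;

    // First pass: collect all dedicated GPUs.
    for (std::size_t i = 0; i < kinds.size(); ++i) {
        if (kinds[i] == DeviceKind::Discrete) {
            picked.push_back(i);
        }
    }
    if (!picked.empty()) {
        return picked;
    }

    // Fallback: no dedicated GPU found, take the first integrated one.
    for (std::size_t i = 0; i < kinds.size(); ++i) {
        if (kinds[i] == DeviceKind::Integrated) {
            picked.push_back(i);
            break;
        }
    }
    return picked;
}
```

With one integrated and two dedicated GPUs, the sketch selects only the two dedicated ones. With integrated GPUs only, it selects the first of them. With neither kind present, it returns an empty list.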

Author: 0cc4m
Date: 2024-03-29 17:29:21 +01:00
Committed by: Georgi Gerganov
Parent: b83a9fc9d3
Commit: fa966b9b40

3 changed files with 327 additions and 352 deletions

ggml-vulkan.cpp: 633 changes (file diff suppressed because it is too large)
