mirror of
https://github.com/ggerganov/whisper.cpp.git
synced 2025-06-14 12:58:10 +00:00
Vulkan k-quant mmq and ggml-backend offload functionality (llama/6155)
* Fix Vulkan no kv offload incoherence * Add k-quant mul mat mat shaders * Rework working buffer allocation, reduces vram use noticeably Clean up cpu assist code, replaced with ggml-backend offload function * Default to all dedicated GPUs * Add fallback for integrated GPUs if no dedicated GPUs are found * Add debug info which device is allocating memory * Fix Intel dequant issue Fix validation issue * Fix Vulkan GGML_OP_GET_ROWS implementation * Clean up merge artifacts * Remove Vulkan warning
This commit is contained in:
633
ggml-vulkan.cpp
633
ggml-vulkan.cpp
File diff suppressed because it is too large
Load Diff
Reference in New Issue
Block a user