whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-05-07 02:58:21 +00:00


Jeff Bolz b243416918 vulkan: Implement split_k for coopmat2 flash attention. (llama/12627)

When using group query attention, we have one workgroup per KV batch and this
can be very few workgroups (e.g. just 8 in some models). Enable split_k to
spread the work across SMs. This helps a lot when the KV cache is large.

2025-04-24 20:39:16 +03:00
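The commit message above gives the motivation but not the mechanism. A common way to implement split_k for attention works like this (an assumption here, not read from the shader source). The KV sequence is cut into chunks, and each workgroup computes a partial softmax over its chunk: the chunk's max score, its sum of exponentials, and an unnormalized output. A separate reduction then rescales the partials to a common maximum and merges them. The Python sketch below shows that math for a single query row on the CPU. All function names are hypothetical and are not part of ggml's API.

```python
import math

def scores(q, K, scale):
    # Scaled dot product of the query with every key row.
    return [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]

def attention_reference(q, K, V):
    """Plain softmax attention for one query row, one pass over all KV rows."""
    scale = 1.0 / math.sqrt(len(q))
    s = scores(q, K, scale)
    m = max(s)
    w = [math.exp(x - m) for x in s]
    total = sum(w)
    dv = len(V[0])
    return [sum(w[j] * V[j][c] for j in range(len(V))) / total for c in range(dv)]

def partial_attention(q, K_chunk, V_chunk, scale):
    """Work done by one split (one workgroup in the GPU analogy).

    Returns (max score, sum of exp, unnormalized output) for this KV chunk.
    """
    s = scores(q, K_chunk, scale)
    m = max(s)
    w = [math.exp(x - m) for x in s]
    l = sum(w)
    dv = len(V_chunk[0])
    o = [sum(w[j] * V_chunk[j][c] for j in range(len(V_chunk))) for c in range(dv)]
    return m, l, o

def attention_split_k(q, K, V, split_k):
    """Hypothetical split_k attention: independent chunks, then a combine step."""
    n = len(K)
    scale = 1.0 / math.sqrt(len(q))
    chunk = -(-n // split_k)  # ceil(n / split_k); no chunk is ever empty
    partials = [partial_attention(q, K[i:i + chunk], V[i:i + chunk], scale)
                for i in range(0, n, chunk)]
    # Reduction: rescale every split to the global max before summing.
    m_all = max(m for m, _, _ in partials)
    l_all = sum(l * math.exp(m - m_all) for m, l, _ in partials)
    dv = len(V[0])
    out = [0.0] * dv
    for m, _, o in partials:
        f = math.exp(m - m_all)
        for c in range(dv):
            out[c] += f * o[c]
    return [x / l_all for x in out]

# Usage: results match the single-pass reference for any split count.
q = [0.5, -1.0, 0.25, 2.0]
K = [[math.sin(4 * i + c) for c in range(4)] for i in range(10)]
V = [[math.cos(3 * i + c) for c in range(3)] for i in range(10)]
print(attention_split_k(q, K, V, 4))
```

In the GPU analogy, each `partial_attention` call would run as its own workgroup. That is why splitting helps when only a few workgroups would otherwise exist and the KV cache is long. The cost is an extra combine pass and storage for the per-split partial results.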

cmake

fix: ggml: fix vulkan-shaders-gen build (llama/10448)

2025-02-03 22:00:57 +02:00

vulkan-shaders

vulkan: Implement split_k for coopmat2 flash attention. (llama/12627)

2025-04-24 20:39:16 +03:00

CMakeLists.txt

cmake: remove caching from vulkan coopmat checks (llama/12719)

2025-04-24 20:39:16 +03:00

ggml-vulkan.cpp

vulkan: Implement split_k for coopmat2 flash attention. (llama/12627)

2025-04-24 20:39:16 +03:00