Jeff Bolz b243416918 vulkan: Implement split_k for coopmat2 flash attention. (llama/12627)
When using group query attention, we have one workgroup per KV batch and this
can be very few workgroups (e.g. just 8 in some models). Enable split_k to
spread the work across SMs. This helps a lot when the KV cache is large.
2025-04-24 20:39:16 +03:00
..