whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-05-20 17:33:03 +00:00

History

Jeff Bolz 2105b110d3 vulkan: Implement grouped query attention in the coopmat2 FA shader (llama/12559)

When adjacent batches of Q share the same batches of K/V, batch them into
the same workgroup. For example, when:

dst(128,32,1,1) = FA(q(128,1,32,1), k(128,16640,8,1), v(128,16640,8,1))

previously we would run 32 workgroups computing 1 result each; now we will
run 8 workgroups computing 4 results each.

This doesn't directly translate to better performance (at least when you have
>=32 SMs), but in a subsequent change I'll enable split_k which will scale much
better with 4x fewer workgroups.
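
The workgroup arithmetic in the example above can be checked with a short sketch. This is not the shader's dispatch logic. It assumes the usual ggml dimension order (head size, rows, heads, batch), assumes one workgroup per head per batch as in the single-row example, and uses a hypothetical helper name, `workgroup_counts`.

```python
def workgroup_counts(q_shape, k_shape):
    """Return (results_per_workgroup, workgroups_before, workgroups_after).

    Shapes are assumed to be (head_size, rows, heads, batch), matching the
    q(128,1,32,1) / k(128,16640,8,1) example. Hypothetical helper for
    illustration only; it ignores any tiling over query rows.
    """
    _, _, n_q_heads, n_batch = q_shape
    _, _, n_kv_heads, _ = k_shape

    # Grouped query attention: several Q heads share one K/V head.
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("Q head count must be a multiple of K/V head count")
    gqa_ratio = n_q_heads // n_kv_heads

    # Before: one workgroup per Q head. After: one workgroup per K/V head,
    # each producing gqa_ratio results.
    before = n_q_heads * n_batch
    after = n_kv_heads * n_batch
    return gqa_ratio, before, after


if __name__ == "__main__":
    ratio, before, after = workgroup_counts((128, 1, 32, 1), (128, 16640, 8, 1))
    print(f"{before} workgroups x 1 result -> {after} workgroups x {ratio} results")
```

With 32 Q heads and 8 K/V heads the ratio is 4, which gives the 32 to 8 reduction described in the commit message.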

2025-04-24 20:39:16 +03:00

cmake

ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0)

2025-03-27 11:06:03 +02:00

include

metal : improve FA + improve MoE (llama/12612)

2025-03-28 21:47:42 +02:00

src

vulkan: Implement grouped query attention in the coopmat2 FA shader (llama/12559)

2025-04-24 20:39:16 +03:00

.gitignore

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

CMakeLists.txt

ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0)

2025-03-27 11:06:03 +02:00