whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-05-31 06:20:58 +00:00

Jeff Bolz 102af79f63 vulkan: Submit once enough matmul work has been recorded (llama/12406)

I've been seeing significantly worse performance for tg with flash attention
enabled vs disabled, and it seems to be related to the submit heuristic.
Change the heuristic to check how many bytes worth of weight matrix are
used and flush every 100MB, and ramp up after the first few submits.
This seems to resolve the issue, and also increases perf for non-FA a bit.
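
The heuristic described in this commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code from the Vulkan backend: the names `SubmitHeuristic`, `record_matmul`, and `plan_submits` are hypothetical. Three details are assumptions, since the message does not state them: "100MB" is read as 100 * 1000 * 1000 bytes, "ramp up" is modelled as doubling the threshold, and "the first few submits" is taken to be three.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper illustrating a byte-based submit heuristic.
struct SubmitHeuristic {
    uint64_t threshold_bytes;   // flush once this many weight bytes are recorded
    uint64_t pending_bytes = 0; // weight bytes recorded since the last submit
    int      submit_count  = 0; // submits issued so far
    int      ramp_submits;      // number of early submits that raise the threshold (assumed)

    explicit SubmitHeuristic(uint64_t initial_threshold = 100ull * 1000 * 1000,
                             int ramp = 3)
        : threshold_bytes(initial_threshold), ramp_submits(ramp) {
        assert(initial_threshold > 0);
    }

    // Record one matmul whose weight matrix occupies weight_bytes.
    // Returns true when the recorded work should be submitted now.
    bool record_matmul(uint64_t weight_bytes) {
        pending_bytes += weight_bytes;
        if (pending_bytes < threshold_bytes) {
            return false;
        }
        pending_bytes = 0;
        if (submit_count < ramp_submits) {
            threshold_bytes *= 2; // assumed ramp-up: double after each early submit
        }
        ++submit_count;
        return true;
    }
};

// Returns the indices of the matmuls after which a submit would be issued.
// A real backend would also submit whatever remains at the end of the graph.
std::vector<size_t> plan_submits(const std::vector<uint64_t>& weight_sizes) {
    SubmitHeuristic h;
    std::vector<size_t> submit_points;
    for (size_t i = 0; i < weight_sizes.size(); ++i) {
        if (h.record_matmul(weight_sizes[i])) {
            submit_points.push_back(i);
        }
    }
    return submit_points;
}
```

Under these assumptions, early submits happen after a small amount of work so the GPU starts quickly, and later submits batch more work per flush.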

2025-03-27 11:06:03 +02:00

cmake

cmake: Comment out GGML_BIN_DIR for now (ggml/1139)

2025-03-27 11:06:03 +02:00

include

llama: Add support for RWKV v7 architecture (llama/12412)

2025-03-27 11:06:03 +02:00

src

vulkan: Submit once enough matmul work has been recorded (llama/12406)

2025-03-27 11:06:03 +02:00

.gitignore

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

CMakeLists.txt

SYCL: using graphs is configurable by environment variable and compile option (llama/12371)

2025-03-27 11:06:03 +02:00