whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-05-27 20:44:12 +00:00

History

agray3 042e95d92f Vectorize load instructions in dmmv f16 CUDA kernel (llama/9816)

* Vectorize load instructions in dmmv f16 CUDA kernel

Replaces scalar with vector load instructions, which substantially
improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall
speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on
H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup.

* addressed comment

* Update ggml/src/ggml-cuda/dmmv.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

2024-11-01 10:19:05 +02:00

cmake

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

include

rpc : add backend registry / device interfaces (llama/9812)

2024-11-01 10:19:05 +02:00

src

Vectorize load instructions in dmmv f16 CUDA kernel (llama/9816)

2024-11-01 10:19:05 +02:00

.gitignore

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

CMakeLists.txt

cmake : do not hide GGML options + rename option (llama/9465)

2024-09-24 19:45:08 +03:00

ggml_vk_generate_shaders.py

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00