Johannes Gäßler
2d436bfbfb
CUDA: FA support for Deepseek (Ampere or newer) (llama/13306)
* CUDA: FA support for Deepseek (Ampere or newer)
* do loop unrolling via C++ template
2025-05-13 13:59:21 +03:00
Johannes Gäßler
2d70cd36d7
CUDA: optimize FA for GQA + large batches (llama/12014)
2025-02-27 08:55:36 +02:00
Johannes Gäßler
f8a831779e
CUDA: use mma PTX instructions for FlashAttention (llama/11583)
* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-03 22:00:57 +02:00
Johannes Gäßler
8c4f30497a
CUDA: MMQ code deduplication + iquant support (llama/8495)
* CUDA: MMQ code deduplication + iquant support
* 1 less parallel job for CI build
2024-08-08 22:48:46 +03:00
Johannes Gäßler
5dc636a65a
CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278)
2024-07-08 14:53:55 +03:00
Georgi Gerganov
e30c679928
whisper : reorganize source code + improve CMake (#2256)
* scripts : update sync [no ci]
* files : reorganize [no ci]
* sync : llama.cpp
* cmake : link math library
* cmake : build normal ggml library
* files : move headers to include
* objc : fix path to ggml-metal.h
* ci : fix WHISPER_CUDA -> GGML_CUDA
* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00