6 Commits

Author           SHA1        Message                                                        Date
Johannes Gäßler  96b8419b27  CUDA: fix FA out-of-bounds reads (llama/7479)                  2024-06-16 18:19:48 +03:00
Johannes Gäßler  3c63f4cf35  CUDA: fix FA out-of-bounds writes (llama/7465)                 2024-06-16 18:19:48 +03:00
Georgi Gerganov  5848dfd9c8  cuda : fix compile warning (llama/7454)                        2024-06-16 18:19:48 +03:00
Johannes Gäßler  29ab5d0326  CUDA: remove incorrect precision check (llama/7454)            2024-06-16 18:19:48 +03:00
Johannes Gäßler  45b5b95e29  CUDA: deduplicate FlashAttention code (llama/7352)             2024-06-16 18:19:48 +03:00
Johannes Gäßler  ec52f900e4  CUDA: faster large batch FA without tensor cores (llama/7314)  2024-06-16 18:19:48 +03:00