Commit Graph

3 Commits

Author SHA1 Message Date
Johannes Gäßler
29ab5d0326 CUDA: remove incorrect precision check (llama/7454) 2024-06-16 18:19:48 +03:00
Johannes Gäßler
45b5b95e29 CUDA: deduplicate FlashAttention code (llama/7352) 2024-06-16 18:19:48 +03:00
Johannes Gäßler
ec52f900e4 CUDA: faster large batch FA without tensor cores (llama/7314) 2024-06-16 18:19:48 +03:00