Commit Graph

2 Commits

Author SHA1 Message Date
Johannes Gäßler
a99e213a82 CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860) 2024-06-16 18:19:48 +03:00
Johannes Gäßler
7483d2b61c CUDA: use tensor cores for MMQ (llama/7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early
2024-06-16 18:19:48 +03:00