Commit Graph

4 Commits

Author SHA1 Message Date
Johannes Gäßler
7483d2b61c CUDA: use tensor cores for MMQ (llama/7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early
2024-06-16 18:19:48 +03:00
Johannes Gäßler
760497e1ab CUDA: revise q8_1 data layout for mul_mat_q (llama/7824) 2024-06-16 18:19:48 +03:00
Johannes Gäßler
e08c62149b CUDA: refactor mmq, dmmv, mmvq (llama/7716)
* CUDA: refactor mmq, dmmv, mmvq

* fix out-of-bounds write

* struct for qk, qr, qi

* fix cmake build

* mmq_type_traits
2024-06-16 18:19:48 +03:00
Georgi Gerganov
2948c740a2
sync : ggml (#2001)
* sync : update scripts

* sync : ggml

* talk-llama : sync llama.cpp

* make : WHISPER_CUBLAS -> WHISPER_CUDA

* ci : try to fix sycl build

* talk-llama : fix make build
2024-03-27 18:55:10 +02:00