Johannes Gäßler | e08c62149b | 2024-06-16 18:19:48 +03:00
CUDA: refactor mmq, dmmv, mmvq (llama/7716)
* CUDA: refactor mmq, dmmv, mmvq
* fix out-of-bounds write
* struct for qk, qr, qi
* fix cmake build
* mmq_type_traits

Johannes Gäßler | 5582039d0a | 2024-06-16 18:19:48 +03:00
CUDA: quantized KV support for FA vec (llama/7527)
* CUDA: quantized KV support for FA vec
* try CI fix
* fix commented-out kernel variants
* add q8_0 q4_0 tests
* fix nwarps > batch size
* split fattn compile via extern templates
* fix flake8
* fix metal tests
* fix cmake
* make generate_cu_files.py executable
* add autogenerated .cu files
* fix AMD
* error if type_v != FP16 and not flash_attn
* remove obsolete code

Georgi Gerganov | 45ddda8e0c | 2024-06-16 18:19:48 +03:00
ggml : drop support for QK_K=64 (llama/7473)
* ggml : drop support for QK_K=64
ggml-ci
* opencl : restore QK_K=256 define

Georgi Gerganov | 2948c740a2 | 2024-03-27 18:55:10 +02:00
sync : ggml (#2001)
* sync : update scripts
* sync : ggml
* talk-llama : sync llama.cpp
* make : WHISPER_CUBLAS -> WHISPER_CUDA
* ci : try to fix sycl build
* talk-llama : fix make build