whisper.cpp/fattn-vec-f16-instance-hs128-q4_1-q8_0.cu at e5e38d4920a6843944b62bac4e239ba7ee314da3 - whisper.cpp

ExternalVendorCode/whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-19 12:47:52 +00:00

Johannes Gäßler 5582039d0a CUDA: quantized KV support for FA vec (llama/7527)

* CUDA: quantized KV support for FA vec

* try CI fix

* fix commented-out kernel variants

* add q8_0 q4_0 tests

* fix nwarps > batch size

* split fattn compile via extern templates

* fix flake8

* fix metal tests

* fix cmake

* make generate_cu_files.py executable

* add autogenerated .cu files

* fix AMD

* error if type_v != FP16 and not flash_attn

* remove obsolete code

2024-06-16 18:19:48 +03:00

6 lines

179 B

// This file has been autogenerated by generate-variants.py, do not edit manually.
#include "../fattn-vec-f16.cuh"
DECL_FATTN_VEC_F16_CASE(128, GGML_TYPE_Q4_1, GGML_TYPE_Q8_0);
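
The commit message mentions splitting the fattn compile via extern templates, and this file holds a single DECL_FATTN_VEC_F16_CASE line. The host-only C++ sketch below illustrates how such a split can work in general. It assumes the macro expands to an explicit template instantiation; the real macro lives in fattn-vec-f16.cuh, which is not shown here. Every name prefixed with sketch_ or SKETCH_ is hypothetical and not part of ggml.

```cpp
#include <cassert>

// Hypothetical stand-in for the ggml type enum (illustrative names only).
enum sketch_type { SKETCH_TYPE_F16, SKETCH_TYPE_Q4_1, SKETCH_TYPE_Q8_0 };

// ---- Part 1: what a shared header might contain (assumption) ----

// Stand-in for a templated kernel launcher. A real one would launch a CUDA
// kernel; this one returns a number that encodes its template arguments.
template <int head_size, sketch_type type_K, sketch_type type_V>
int sketch_fattn_vec_case() {
    return head_size * 100 + static_cast<int>(type_K) * 10 + static_cast<int>(type_V);
}

// Hypothetical macro: expands to an explicit instantiation of one variant.
#define SKETCH_DECL_CASE(D, type_K, type_V) \
    template int sketch_fattn_vec_case<D, type_K, type_V>()

// Prefixing the macro with `extern` turns it into an explicit instantiation
// declaration: files that include the header do not instantiate this variant
// themselves and rely on another translation unit to provide it.
extern SKETCH_DECL_CASE(128, SKETCH_TYPE_Q4_1, SKETCH_TYPE_Q8_0);

// ---- Part 2: what a dispatching source file might contain (assumption) ----

// Picks a compiled variant at run time; returns -1 for unsupported combinations.
int sketch_dispatch(int head_size, sketch_type type_K, sketch_type type_V) {
    assert(head_size > 0);
    if (head_size == 128 && type_K == SKETCH_TYPE_Q4_1 && type_V == SKETCH_TYPE_Q8_0) {
        return sketch_fattn_vec_case<128, SKETCH_TYPE_Q4_1, SKETCH_TYPE_Q8_0>();
    }
    return -1;
}

// ---- Part 3: what one generated instance file might contain (assumption) ----

// Without `extern`, the same macro is an explicit instantiation definition.
// In a multi-file build this line would sit alone in its own source file, so
// each variant compiles separately and the variants can build in parallel.
SKETCH_DECL_CASE(128, SKETCH_TYPE_Q4_1, SKETCH_TYPE_Q8_0);
```

All three parts sit in one file here only to keep the sketch self-contained. In a split build, Part 3 would be the entire content of a per-variant file, much like the three-line file above.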