agray3
042e95d92f
Vectorize load instructions in dmmv f16 CUDA kernel (llama/9816)
* Vectorize load instructions in dmmv f16 CUDA kernel
Replaces scalar load instructions with vector load instructions, which
substantially improves performance on NVIDIA GPUs with HBM: for example,
a 1.27X overall speedup for Meta-Llama-3-8B-Instruct-F16 batch-size-1 (BS1)
inference evaluation on an H100 SXM 80GB HBM3. On GPUs with GDDR memory,
the speedup is slight (1.01X).
* addressed comment
* Update ggml/src/ggml-cuda/dmmv.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-11-01 10:19:05 +02:00
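The commit diff is not shown on this page. The following is therefore a minimal, hypothetical sketch, not the ggml-cuda `dmmv.cu` code, of what replacing scalar loads with vector loads means for an f16 matrix-vector product. The kernel and helper names (`dmmv_f16_scalar`, `dmmv_f16_vec2`, `warp_sum`, `run_dmmv`) are invented for illustration. It assumes a row-major f16 matrix, an f32 vector, one 32-thread warp per row, and an even column count so that `half2` and `float2` loads are naturally aligned.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical, simplified f16 matrix-vector product (not the ggml-cuda kernel):
// dst[row] = sum over col of x[row][col] * y[col], x row-major f16, y f32.
// Launch assumption: one 32-thread warp per row (gridDim.x = nrows, blockDim.x = 32).

__device__ float warp_sum(float v) {
    // Butterfly reduction: after 5 steps every lane holds the warp-wide sum.
    for (int mask = 16; mask > 0; mask >>= 1) {
        v += __shfl_xor_sync(0xffffffff, v, mask, 32);
    }
    return v;
}

// Scalar loads: each iteration issues a 16-bit load for x and a 32-bit load for y.
__global__ void dmmv_f16_scalar(const half * x, const float * y, float * dst, int ncols) {
    const half * xr = x + (size_t) blockIdx.x * ncols;
    float tmp = 0.0f;
    for (int col = threadIdx.x; col < ncols; col += blockDim.x) {
        tmp += __half2float(xr[col]) * y[col];
    }
    tmp = warp_sum(tmp);
    if (threadIdx.x == 0) {
        dst[blockIdx.x] = tmp;
    }
}

// Vector loads: each iteration issues one 32-bit load (half2, two f16 values)
// and one 64-bit load (float2, two f32 values). Requires ncols to be even so every
// row starts on a 4-byte boundary; cudaMalloc pointers are at least 256-byte aligned.
__global__ void dmmv_f16_vec2(const half * x, const float * y, float * dst, int ncols) {
    const half2  * xr = reinterpret_cast<const half2 *>(x + (size_t) blockIdx.x * ncols);
    const float2 * y2 = reinterpret_cast<const float2 *>(y);
    float tmp = 0.0f;
    for (int i = threadIdx.x; i < ncols / 2; i += blockDim.x) {
        const float2 xv = __half22float2(xr[i]);
        const float2 yv = y2[i];
        tmp += xv.x * yv.x + xv.y * yv.y;
    }
    tmp = warp_sum(tmp);
    if (threadIdx.x == 0) {
        dst[blockIdx.x] = tmp;
    }
}

// Host helper: copies inputs to the device, runs one kernel, copies the result back.
// CUDA error checking is omitted for brevity.
std::vector<float> run_dmmv(bool vectorized, const std::vector<half> & x,
                            const std::vector<float> & y, int nrows, int ncols) {
    assert(!vectorized || ncols % 2 == 0);
    half * d_x = nullptr; float * d_y = nullptr; float * d_dst = nullptr;
    cudaMalloc(&d_x, x.size() * sizeof(half));
    cudaMalloc(&d_y, y.size() * sizeof(float));
    cudaMalloc(&d_dst, nrows * sizeof(float));
    cudaMemcpy(d_x, x.data(), x.size() * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), y.size() * sizeof(float), cudaMemcpyHostToDevice);
    if (vectorized) {
        dmmv_f16_vec2<<<nrows, 32>>>(d_x, d_y, d_dst, ncols);
    } else {
        dmmv_f16_scalar<<<nrows, 32>>>(d_x, d_y, d_dst, ncols);
    }
    std::vector<float> dst(nrows);
    cudaMemcpy(dst.data(), d_dst, nrows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x); cudaFree(d_y); cudaFree(d_dst);
    return dst;
}
```

Both kernels compute the same result. The difference is only in how many memory instructions each thread issues per element: the vectorized version halves them by moving two values per load. Since a BS1 matrix-vector product is typically memory-bandwidth bound, fewer and wider loads can help, but how much depends on the GPU; the commit message reports the largest gain on HBM GPUs and only a marginal one on GDDR GPUs.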