lhez
|
6fc0ae2f5a
|
opencl: add multi and vision rope, gelu_quick and im2col (llama/12600)
* opencl: add `im2col`
* opencl: add `gelu_quick`
* opencl: add mrope
* opencl: add vision rope
|
2025-03-28 21:47:42 +02:00 |
|
lhez
|
a34cb73dc2
|
opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops (llama/12217)
* opencl: support noncontiguous `norm`
* opencl: support noncontiguous `rms_norm`
* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`
|
2025-03-08 15:13:01 +02:00 |
|
lhez
|
b43b9d928c
|
opencl: fix for small models (llama/11950)
* opencl: fix small shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs
---------
Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com>
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
|
2025-02-27 08:55:36 +02:00 |
|
lhez
|
4b48fe449a
|
opencl: Fix rope and softmax (llama/11833)
* opencl: fix `ROPE`
* opencl: fix `SOFT_MAX`
* Add fp16 variant
* opencl: enforce subgroup size for `soft_max`
|
2025-02-27 08:55:36 +02:00 |
|
lhez
|
1deb41f0e7
|
ggml : add opencl backend (skip) (llama/10693)
---------
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
Co-authored-by: Alexander Angus <quic_aangus@quicinc.com>
Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com>
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
|
2025-01-14 10:38:01 +02:00 |
|