5 Commits

Author SHA1 Message Date
lhez
6fc0ae2f5a opencl: add multi and vision rope, gelu_quick and im2col (llama/12600)
* opencl: add `im2col`

* opencl: add `gelu_quick`

* opencl: add mrope

* opencl: add vision rope
2025-03-28 21:47:42 +02:00
lhez
a34cb73dc2 opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops (llama/12217)
* opencl: support noncontiguous `norm`

* opencl: support noncontiguous `rms_norm`

* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`
2025-03-08 15:13:01 +02:00
lhez
b43b9d928c opencl: fix for small models (llama/11950)
* opencl: fix small shape gemv, remove unused extensions

* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size

* opencl: fix for token length < 4

* opencl: use wave size of 64 for all Adreno GPUs

---------

Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com>
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
2025-02-27 08:55:36 +02:00
lhez
4b48fe449a opencl: Fix rope and softmax (llama/11833)
* opencl: fix `ROPE`

* opencl: fix `SOFT_MAX`

* Add fp16 variant

* opencl: enforce subgroup size for `soft_max`
2025-02-27 08:55:36 +02:00
lhez
1deb41f0e7 ggml : add opencl backend (skip) (llama/10693)
---------

Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
Co-authored-by: Alexander Angus <quic_aangus@quicinc.com>
Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com>
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
2025-01-14 10:38:01 +02:00