fxzjshm
c310272fa0
HIP: force max threads per block to be 1024 (llama/11621)
...
Some old/vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM.
Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-27 08:55:36 +02:00
Johannes Gäßler
f8a831779e
CUDA: use mma PTX instructions for FlashAttention (llama/11583)
...
* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-03 22:00:57 +02:00
uvos
43c744ce8b
HIP: require at least HIP 5.5
2025-02-03 22:00:57 +02:00
uvos
a160fa0f3a
HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420)
2025-02-03 22:00:57 +02:00
uvos
0282ad8fd1
hip : Add hipGraph and VMM support to ROCM (llama/11362)
...
* Add hipGraph support
* Enable VMM on rocm
2025-02-03 22:00:57 +02:00
Radoslav Gerganov
a4b00bcaaf
ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (llama/11211)
...
Build fails when using HIP and GGML_BACKEND_DL:
```
/usr/bin/ld: ../ggml/src/libggml.so: undefined reference to `ggml_backend_cuda_reg'
collect2: error: ld returned 1 exit status
```
This patch fixes the link error by not defining GGML_USE_CUDA when building with GGML_BACKEND_DL.
2025-01-14 10:38:01 +02:00
Diego Devesa
77e3e4a090
ggml : add support for dynamic loading of backends (llama/10469)
...
* ggml : add support for dynamic loading of backends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-08 20:14:35 +02:00
Johannes Gäßler
dcb2922d1d
CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318)
2024-11-20 21:00:08 +02:00
Diego Devesa
746bf2596f
ggml : build backends as libraries (llama/10256)
...
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-20 21:00:08 +02:00