* [cl][adreno] Add Adreno GPU support
Add new OpenCL backend to support Adreno GPUs
---------
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
Co-authored-by: Alexander Angus <quic_aangus@quicinc.com>
Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com>
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
* [cl][ci] Add workflow for CL
* [cl][adreno] Fix memory leak for non SMALL_ALLOC path
* opencl: integrate backend dyn.load interface and fix compiler and format warnings
* opencl: remove small-alloc support and fix build errors for non-opencl platforms
* opencl: fixed merge conflict (MUSA added twice in cmake)
* opencl-ci: use RUNNER_TEMP instead of github.workspace
* opencl: fix embed tool invocation with python3
* opencl: CI workflow fixes
* opencl: Clean up small-alloc in CMake files
* opencl: cleanup ggml-opencl2 header file
* opencl: use ulong for offsets and strides in ADD kernel
* opencl: use cl_ulong for all offsets
* opencl: use cl_ulong for sizes and strides
* opencl: use `GGML_LOG_xxx` instead of `fprintf(stderr, ...)`
* opencl: rename backend `opencl2` -> `opencl`
* opencl: rename kernel files `ggml-opencl2` -> `ggml-opencl`
* opencl: make OpenCL required, remove redundant lib and inc directories
* `ggml-base`, `..` and `.` are added by `ggml_add_backend_library`
* opencl: rename backend - funcs, structs, etc `opencl2` -> `opencl`
* opencl: remove copyright marker since main license already covers
* opencl: replace some more OPENCL2 leftovers
* opencl: remove limits on `tensor_extra`
* opencl: use pools for `tensor_extra`
* opencl: fix compiler warnings with GCC and Clang
Still getting the warning about clCreateCmdQueue being obsolete.
Will fix that separately.
* opencl: fail gracefully if opencl devices are not available
Also for unsupported GPUs.
* opencl: fix MSVC builds (string length error)
* opencl: check for various requirements, allow deprecated API
* opencl: update log message for unsupported GPUs
---------
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
Co-authored-by: Alexander Angus <quic_aangus@quicinc.com>
Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com>
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
* Add option to set the SYCL architecture for all targets
* Convert GGML_SYCL_HIP_TARGET to the more generic GGML_SYCL_ARCH option
* Document that setting GGML_SYCL_ARCH can improve the performance
* metal : opt-in compile flag for BF16
ggml-ci
* ci : use BF16
ggml-ci
* swift : switch back to v12
* metal : has_float -> use_float
ggml-ci
* metal : fix BF16 check in MSL
ggml-ci
add intel amx isa detection
add vnni kernel for gemv cases
add vnni and amx kernel support for block_q8_0
code cleanup
fix packing B issue
enable openmp
fine tune amx kernel
switch to aten parallel pattern
add error message for nested parallelism
code cleanup
add f16 support in ggml-amx
add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS
update CMakeList
update README
fix some compilation warning
fix compiler warning when amx is not enabled
minor change
ggml-ci
move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp
ggml-ci
update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16
ggml-ci
add amx as an ggml-backend
update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h
minor change
update CMakeLists.txt
minor change
apply weight prepacking in set_tensor method in ggml-backend
fix compile error
ggml-ci
minor change
ggml-ci
update CMakeLists.txt
ggml-ci
add march dependency
minor change
ggml-ci
change ggml_backend_buffer_is_host to return false for amx backend
ggml-ci
fix supports_op
use device reg for AMX backend
ggml-ci
minor change
ggml-ci
minor change
fix rebase
set .buffer_from_host_ptr to be false for AMX backend
* Improve Vulkan shader builds system
- Add dependency to vulkan-shaders-gen to rebuild shaders when changing the shader compilation utility.
- Add option to generate debug info for Vulkan shaders to provide shader source to Vulkan shader profiling tools
* remove not required self dependency
* Update doc for MUSA
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add GGML_MUSA in Makefile
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add GGML_MUSA in CMake
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* CUDA => MUSA
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* MUSA adds support for __vsubss4
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Fix CI build failure
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>