* Rename oneMKL Interface to oneMath
* Use oneMath for Intel vendor
* Rename occurrences to mkl
* clang-format
* Silence verbose warnings
* Set oneMath HIP_TARGETS
* Fix silence warnings
* Remove step to build oneMath from build instructions
* Use fixed oneMath version
* Remove INTEL_CPU
* Fold CMake oneDNN conditions
* Use Intel oneMKL for Intel devices
* Improve CMake message
* Link against MKL::MKL_SYCL::BLAS only
* Move oneMath documentation to Nvidia and AMD sections
* Upgrade init_tensor API to return a ggml_status
To prepare for an 'abort-free' ggml
(ggml should not abort on OOM but return an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.
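A minimal sketch of the abort-free pattern described above, assuming a toy bump allocator. `init_status`, `fake_buffer`, `fake_tensor`, `init_tensor` and `init_all` are hypothetical stand-ins; the real hooks return `ggml_status` and operate on ggml tensors and backend buffers. The point is only that an out-of-memory condition is returned to the caller instead of aborting the process.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical status codes (the real enum is ggml_status).
enum class init_status { success, alloc_failed };

// Hypothetical buffer: a fixed capacity handed out front to back.
struct fake_buffer {
    std::size_t capacity;
    std::size_t used = 0;
};

// Hypothetical tensor: only its size and its offset in the buffer.
struct fake_tensor {
    std::size_t nbytes;
    std::size_t offset = 0;
};

// Reports OOM to the caller instead of aborting.
init_status init_tensor(fake_buffer & buf, fake_tensor & t) {
    assert(buf.used <= buf.capacity);  // invariant of the bump allocator
    if (t.nbytes > buf.capacity - buf.used) {
        return init_status::alloc_failed;
    }
    t.offset = buf.used;
    buf.used += t.nbytes;
    return init_status::success;
}

// Propagates the first failure up the call chain.
init_status init_all(fake_buffer & buf, std::vector<fake_tensor> & ts) {
    for (auto & t : ts) {
        const init_status st = init_tensor(buf, t);
        if (st != init_status::success) {
            return st;
        }
    }
    return init_status::success;
}
```

The caller, not the allocator, decides whether an OOM is fatal, which is what makes an abort-free library possible.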
* misc fixes
---------
Co-authored-by: slaren <slarengh@gmail.com>
* opt performance by reorder for Intel GPU
* detect hw type and save opt feature, and print opt feature
* correct name
* support optimizing the graph once when computing it, record the opt status in tensor->extra, make CI pass
* add env variable GGML_SYCL_DISABLE_OPT for debug
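The once-per-graph optimization with a debug switch can be sketched as follows. This assumes a per-tensor extra struct holding a `reordered` flag and treats any non-zero integer in the environment variable as "disable". `tensor_extra_sketch`, `opt_disabled_from_env` and `maybe_reorder` are hypothetical names, and the backend's actual handling of `GGML_SYCL_DISABLE_OPT` may differ.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical per-tensor extra; the real backend keeps its own struct in tensor->extra.
struct tensor_extra_sketch {
    bool reordered = false;
};

// Assumption: a non-zero integer value disables the optimization.
bool opt_disabled_from_env(const char * name) {
    const char * v = std::getenv(name);
    return v != nullptr && std::atoi(v) != 0;
}

// Applies the (hypothetical) reorder at most once per tensor.
// Returns true only on the call that actually performed the reorder.
bool maybe_reorder(tensor_extra_sketch & extra, bool opt_disabled) {
    if (opt_disabled || extra.reordered) {
        return false;
    }
    // ... rewrite the weight layout into the reordered form here ...
    extra.reordered = true;
    return true;
}
```

Usage would look like `maybe_reorder(extra, opt_disabled_from_env("GGML_SYCL_DISABLE_OPT"))`; because the status lives in the tensor's extra data, later graph computations skip the work.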
* use syclex::architecture to replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* move getrows functions to separate files
* fix global variables
---------
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
Implemented F16 src1 (mask) support in ggml_sycl_op_soft_max(), for which a pragma deprecation warning had been added in #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always treats src1 as fp32 (many OP functions depend on that).
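To illustrate decoupling the mask type from fp32, here is a hedged CPU reference of a scaled row softmax with an additive mask whose element type is a template parameter. In the SYCL backend the F16 case would presumably use `sycl::half`; this standalone sketch (`soft_max_row` is a hypothetical name, not the backend's kernel) accepts any mask type convertible to `float`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row softmax of (x * scale + mask). MaskT is not assumed to be float,
// so an F16 mask type can be plugged in without converting the whole tensor.
template <typename MaskT>
std::vector<float> soft_max_row(const std::vector<float> & x,
                                const std::vector<MaskT> * mask,
                                float scale) {
    assert(mask == nullptr || mask->size() == x.size());
    std::vector<float> y(x.size());
    float max_val = -INFINITY;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float m = mask ? static_cast<float>((*mask)[i]) : 0.0f;
        y[i] = x[i] * scale + m;
        max_val = std::max(max_val, y[i]);
    }
    // Subtract the max before exponentiating for numerical stability.
    float sum = 0.0f;
    for (float & v : y) {
        v = std::exp(v - max_val);
        sum += v;
    }
    for (float & v : y) {
        v /= sum;
    }
    return y;
}
```

Without a mask, the template argument has to be given explicitly, e.g. `soft_max_row<float>(x, nullptr, 1.0f)`.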
* SYCL: SOFTMAX F16 mask support and other fixes
* test-backend-ops: Add F16 mask test cases
* Implement host pool for matrix_info
Create a new memory pool on the host to store the memory locations of the
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Remove complex-number support from gemm_batch since it is not used in llama.cpp.
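A minimal sketch of a host-side pool in the spirit of this change: released buffers are kept and reused by later requests that fit, so repeated `gemm_batch` launches need not allocate fresh host memory for their `matrix_info` every time. `host_pool_sketch` is hypothetical and uses a simple first-fit search; the backend's actual pool may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class host_pool_sketch {
public:
    void * alloc(std::size_t size) {
        // First fit: reuse the first released block that is large enough.
        for (std::size_t i = 0; i < free_.size(); ++i) {
            if (free_[i].size >= size) {
                used_.push_back(std::move(free_[i]));
                free_.erase(free_.begin() + static_cast<std::ptrdiff_t>(i));
                return used_.back().data.get();
            }
        }
        // Nothing reusable: allocate a new block.
        ++n_allocs_;
        used_.push_back({std::make_unique<unsigned char[]>(size), size});
        return used_.back().data.get();
    }

    // Returns a block to the pool instead of freeing it.
    void release(void * p) {
        for (std::size_t i = 0; i < used_.size(); ++i) {
            if (used_[i].data.get() == p) {
                free_.push_back(std::move(used_[i]));
                used_.erase(used_.begin() + static_cast<std::ptrdiff_t>(i));
                return;
            }
        }
        assert(false && "pointer not owned by this pool");
    }

    // Number of real heap allocations performed so far.
    std::size_t n_allocs() const { return n_allocs_; }

private:
    struct block {
        std::unique_ptr<unsigned char[]> data;
        std::size_t size;
    };
    std::vector<block> free_;
    std::vector<block> used_;
    std::size_t n_allocs_ = 0;
};
```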
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
* SYCL: refactor ggml_sycl_compute_forward
* SYCL: add back GGML_UNUSED(dst) to ggml_sycl_cpy
* SYCL: add function name to noop debug
* SYCL: Some device info print refactoring and add details of XMX availability
* Migrate to tensor->buffer for checking backend buffer type: step 1
* SYCL: common.cpp try to migrate away from tensor->backend
* SYCL: fix assertions and add proper comments
* SYCL: remove extra space
* SYCL: Add back static to ggml_backend_buffer_is_sycl_split function
* SYCL: Add pragma directive to suppress warning spam
* SYCL: Integrate debug logs with GGML_LOG and other fixes
* Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes"
This reverts commit 2607b7de0f0d2f4f1f690226f86fa861aa39cb97.
Let's keep the current SYCL specific logging mechanism for now
* SYCL: Use GGML_SYCL_DEBUG after reverting
* SYCL: reg_get_proc_address func, update to the current func signature
* SYCL: Refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d
* Try to reduce some unused and typecast warnings
* Reduce compiler warnings step 2
* add a newline at the end of the file
* Initialize nreduce as size_t
* [SYCL] Remove pragma directives from mmq.cpp
* SYCL: mmq add condition to prevent blocks_per_tile_x_row variable from becoming 0
* SYCL softmax: Initialize nreduce as size_t
* ggml-sycl.cpp: fix some trailing whitespaces
* SYCL: remove the unused variables instead of commenting it out
* SYCL pool2d kernel: set NAN for invalid pooling op
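The NaN fallback for invalid pooling ops can be sketched on the CPU as follows. `pool_op_sketch` and `pool_window` are hypothetical (ggml defines its own pooling-op enum), and `std::numeric_limits<float>::quiet_NaN()` stands in for `sycl::nan`. Writing NaN for an unhandled op makes the error visible downstream instead of leaving the output uninitialized.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical pooling ops; `unknown` models an op the kernel does not handle.
enum class pool_op_sketch { avg, max, unknown };

// Pools one window of input values into a single output value.
float pool_window(const std::vector<float> & window, pool_op_sketch op) {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    if (window.empty()) {
        return nan;
    }
    switch (op) {
        case pool_op_sketch::avg: {
            float sum = 0.0f;
            for (float v : window) {
                sum += v;
            }
            return sum / static_cast<float>(window.size());
        }
        case pool_op_sketch::max: {
            float m = window[0];
            for (float v : window) {
                m = v > m ? v : m;
            }
            return m;
        }
        default:
            return nan;  // invalid op: poison the output rather than skip it
    }
}
```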
* SYCL gemm.hpp: remove pragma directives
* SYCL gemm.hpp: use const cast to properly support dnnl::memory
* SYCL: wkv6 remove a comment
* SYCL: clean comments step 2
* SYCL: clean comments and variables step 3
* SYCL: Use GGML_UNUSED for unused variables
* SYCL: remove extra empty lines and a comment
* Remove TODO
* cleanup spaces
* add a stdout print for unsupported ops
* use sycl printf over fprintf
* remove prints for CI
* SYCL ggml-sycl: pool2D use sycl::nan and remove if-else block
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
  - remove from "file" tensor type
  - allow only with dynamic repack
- extract cpu extra bufts and convert to C++
  - hbm
  - "aarch64"
- more generic use of extra buffer
  - generalise extra_supports_op
  - new API for "cpu-accel":
    - amx
    - aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* add/correct the check on tensor size for Q4 repacking.
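One way to express a tensor-size check before Q4 repacking, assuming the repacked layout needs whole Q4_0 blocks (32 values each, as with ggml's `QK4_0`) along every row and a row count divisible by the number of interleaved rows. `can_repack_q4_0` and its parameters are hypothetical; the real conditions in `ggml-cpu-aarch64.cpp` may differ.

```cpp
#include <cassert>
#include <cstdint>

// Values per Q4_0 block (mirrors ggml's QK4_0 = 32).
constexpr std::int64_t QK4_0_SKETCH = 32;

// ne0: values per row, ne1: number of rows,
// nrows_interleaved: rows packed together by the target layout (e.g. 4 or 8).
bool can_repack_q4_0(std::int64_t ne0, std::int64_t ne1, std::int64_t nrows_interleaved) {
    if (ne0 <= 0 || ne1 <= 0 || nrows_interleaved <= 0) {
        return false;
    }
    return ne0 % QK4_0_SKETCH == 0 && ne1 % nrows_interleaved == 0;
}
```

Tensors that fail the check would simply keep their original layout rather than be repacked.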
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend
Move to compile-time backend selection to avoid latency at run time.
Add it to all mkl gemm calls, for the NVIDIA backend only.
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
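The difference between runtime and compile-time backend selection can be sketched with plain C++ templates. All names below, including the `SKETCH_BUILD_FOR_NVIDIA` flag, are hypothetical. As far as we understand, oneMKL Interfaces offers the compile-time variant by passing a `backend_selector` wrapping the queue instead of the bare queue; this sketch only illustrates the general idea.

```cpp
#include <cassert>

// Hypothetical backends standing in for vendor BLAS libraries.
struct cublas_like  { static int id() { return 1; } };
struct generic_like { static int id() { return 0; } };

enum class vendor { nvidia, other };

// Runtime selection: the vendor is inspected on every call.
int which_backend_runtime(vendor v) {
    switch (v) {
        case vendor::nvidia: return cublas_like::id();
        default:             return generic_like::id();
    }
}

// Compile-time selection: the backend is a template parameter,
// so the call is bound at compile time with no per-call check.
template <typename Backend>
int which_backend_static() {
    return Backend::id();
}

// The build configuration, not a runtime query, picks the backend.
#if defined(SKETCH_BUILD_FOR_NVIDIA)
using selected_backend = cublas_like;
#else
using selected_backend = generic_like;
#endif
```

The trade-off is flexibility: a binary built this way targets one backend, which matches restricting the change to the NVIDIA build.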
* Formatting
* Address PR comments to increase readability
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>