961 Commits

Author SHA1 Message Date
b68222f92c CUDA: add conv_2d_transpose (llama/14287)
* CUDA: add conv_2d_transpose

* remove direct include of cuda_fp16

* Review: add brackets for readability, remove ggml_set_param and add asserts
2025-06-21 07:34:17 +03:00
a455dcb04c sycl: add usage of enqueue_functions extension (llama/14244)
* Add header and namespace to use enqueue_functions extension

* Convert submit and parallel_for to use new extension in convert.cpp

* Convert submit and parallel_for to use extension in ggml-sycl.cpp

* Convert submit and parallel_for to use extension in gla.cpp

* Convert submit and parallel_for in mmq.cpp

* Convert submit and parallel_for in mmvq.cpp

* Convert submit and parallel_for in remaining files

* Convert all simple parallel_for to nd_launch from enqueue_functions
extension

* Wrapping extension in general function

Create a general function that enable the enqueue_functions extension if
it is enable in the compiler, otherwise call the general SYCL function
to launch kernels.

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-06-21 07:34:17 +03:00
af7168174c Implement GGML_CPU_ALL_VARIANTS for PowerPC (llama/14286)
* Add PowerPC feature detection and scoring

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC

* ggml-cpu: Delay some initializations until function is called

When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-06-21 07:34:17 +03:00
33d1f0a3e0 cuda : synchronize graph capture and cublas handle destruction (llama/14288)
Workarounds an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread
2025-06-21 07:34:17 +03:00
018b2d340e ggml : fix repack work size for mul_mat_id (llama/14292)
ggml-ci
2025-06-21 07:34:17 +03:00
694f435d22 ggml: Update KleidiAI to v1.9.0 (llama/14277) 2025-06-21 07:34:17 +03:00
5efd43c956 CUDA: add conv_2d_dw (llama/14265)
* CUDA: add conv_2d_dw

* better naming

* simplify using template

* Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const
2025-06-21 07:34:17 +03:00
71adde9203 ggml-cpu : remove unnecesary arm feature detection (llama/14281)
Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.
2025-06-21 07:34:17 +03:00
cef59c1e26 build : suppress gcc15 compile warnings (llama/14261)
* Change _contains_any() substrs to std::string_view and fix the find comparison logic.
2025-06-21 07:34:17 +03:00
a02a2d4240 sycl: Cleanup codepaths in Get Rows in sycl backend (llama/14215)
Addresses unused reorder path
2025-06-21 07:34:17 +03:00
be4ea0826b llamafile : support s390x SIMD instruction set (llama/14273) 2025-06-21 07:34:17 +03:00
1aca7b5c8a Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (llama/14249) 2025-06-21 07:34:17 +03:00
b251d739ad metal : add mean kernel (llama/14267)
* metal : add mean kernel

ggml-ci

* cont : dedup implementation

ggml-ci
2025-06-21 07:34:17 +03:00
203451bcba ggml-cpu: reduce asm calls for hsum (llama/14037)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 07:34:17 +03:00
34940abe53 ggml-cpu: fix uncaught underscore terminators (llama/14023)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 07:34:17 +03:00
4fc9c34126 ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (llama/14258) 2025-06-21 07:34:17 +03:00
471df139fa Add ggml_roll (ggml/1274)
* ggml : add ggml_roll

* use set/get_op_params & std::min
2025-06-21 07:34:17 +03:00
0e068779c7 cmake: remove shader-gen step-targets from ggml-vulkan (llama/14226)
* Remove step-targets from vulkan-shaders-gen

* Unset DESTDIR when building vulkan-shaders-gen
2025-06-18 12:40:34 +03:00
ac8a303c9a ggml-cpu : remove the weak alias trick (llama/14221) 2025-06-18 12:40:34 +03:00
2a84593960 musa: fix build warning (unused variable) (llama/14231)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-18 12:40:34 +03:00
44871c8a3e llama : add thread safety test (llama/14035)
* llama : add thread safety test

* llamafile : remove global state

* llama : better LLAMA_SPLIT_MODE_NONE logic

when main_gpu < 0 GPU devices are not used

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-18 12:40:34 +03:00
ad6cd94a3a cmake: clean up external project logic for vulkan-shaders-gen (llama/14179)
* Remove install step for vulkan-shaders-gen

* Add install step to normalize msvc with make

* Regenerate modified shaders at build-time
2025-06-18 12:40:34 +03:00
dbad9d8fba HIP: disable rocwmma on gfx12 by default until rocm 7.0 (llama/14202) 2025-06-18 12:40:34 +03:00
518835ee56 ggml: Add Android support for GGML_CPU_ALL_VARIANTS (llama/14206) 2025-06-18 12:40:34 +03:00
a3d1c55c66 vulkan: mutex around vkQueueSubmit (llama/14127)
This fixes the remaining crash in test-thread-safety on my system.
2025-06-18 12:40:34 +03:00
0c25129d30 ggml-cpu : rework weak alias on apple targets (llama/14146)
* ggml-cpu : rework weak alias on apple targets

* fix powerpc detection

* fix ppc detection

* fix powerpc detection on darwin
2025-06-18 12:40:34 +03:00
a433680a2f CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (llama/14196) 2025-06-18 12:40:34 +03:00
aeaed9806f HIP: Replace usage of depricated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (llama/14183) 2025-06-18 12:40:34 +03:00
4ea599afdf sycl: Adding additional cpy dbg print output (llama/14034) 2025-06-18 12:40:34 +03:00
783cf0309f SYCL: Bump oneMath commit (llama/14152)
Update oneMath commit to merged PR https://github.com/uxlfoundation/oneMath/pull/669
which adds SYCL-Graph support for recording CUDA BLAS commands.

With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph
enabled. Prior to this change, an error would be thrown.

```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2

UR CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```
2025-06-18 12:40:34 +03:00
0097eaf839 sycl: Remove not needed copy f16->f32 for dnnl mul mat (llama/14125) 2025-06-18 12:40:34 +03:00
a96a880f7b cmake : handle whitepsaces in path during metal build (llama/14126)
* cmake : handle whitepsaces in path during metal build

ggml-ci

* cont : proper fix

ggml-ci

---------

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2025-06-18 12:40:34 +03:00
26c16ad6bd Implement GGML_CPU_ALL_VARIANTS for ARM (llama/14080)
* ggml-cpu: Factor out feature detection build from x86

* ggml-cpu: Add ARM feature detection and scoring

This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on GGML_USE_<FEAT> which need to be set
in cmake, instead of GGML_<FEAT> that users would set for x86.

This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM

Like x86, however to pass around arch flags within cmake, we use
GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>.

Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.

* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now

The other platforms will need their own specific variants.

This also fixes the bug that the the variant-building branch was always
being executed as the else-branch of GGML_NATIVE=OFF. The branch is
moved to an elseif-branch which restores the previous behavior.
2025-06-18 12:40:34 +03:00
40d0d47cf1 vulkan: Better thread-safety for command pools/buffers (llama/14116)
This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.
2025-06-18 12:40:34 +03:00
40c6525517 vulkan: Track descriptor pools/sets per-context (llama/14109)
Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all the descriptor pool and set tracking to
the context - none of it is specific to pipelines anymore. It has a single vector
of pools and vector of sets, and a single counter to track requests and a single
counter to track use.
2025-06-18 12:40:34 +03:00
74c68067dc opencl: add mul_mv_id_q4_0_f32_8x_flat (llama/14003) 2025-06-18 12:40:34 +03:00
794bf23994 Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (llama/14099) 2025-06-18 12:40:34 +03:00
26dcc196c7 rpc : nicer error messages for RPC server crash (llama/14076) 2025-06-18 12:40:34 +03:00
ffe5400d1b ggml : disable warnings for tests when using MSVC (ggml/1273)
* ggml : disable warnings for tests when using MSVC

This commit disables warnings for tests on windows when using MSVC.

The motivation for this is that this brings the build output more
inline with what Linux/MacOS systems produce.

There is still one warning generated for the tests which is:
```console
  Building Custom Rule C:/ggml/tests/CMakeLists.txt
cl : command line  warning D9025: overriding '/DNDEBUG' with '/UNDEBUG'
[C:\ggml\build\tests\test-arange.vcxproj]
  test-arange.cpp
  test-arange.vcxproj -> C:\ggml\build\bin\Release\test-arange.exe
```

* ggml : fix typo in tests disable list
2025-06-18 12:40:34 +03:00
1b01c0cc4e ggml : remove unused ggml_context_container (ggml/1272)
This commit removes the unused `ggml_context_container` structure from
the ggml library. It looks like the usage of this struct was removed in
Commit 4757fe18d56ec11bf9c07feaca6e9d5b5357e7f4 ("ggml : alloc
ggml_contexts on the heap (whisper/2525)").

The motivation for this changes is to improve code clarity/readability.
2025-06-18 12:40:34 +03:00
db30f46761 examples : include examples in msvc disable warn (ggml/1270)
This commit adds the examples in the "list" of targets to ignore MSVC
warnings.

The motivation for this is that currently the examples generate a number
of warnings that are ignore/disabled for the core ggml project. This
makes for a cleaner output when building.
2025-06-18 12:40:34 +03:00
93d543905e ggml : fix weak alias win32 (#0)
Some checks failed
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (Win32, ON, x86, 0.3.29, Release, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (x64, ON, x64_64, 0.3.29, Release, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled
CI / emscripten (Release) (push) Has been cancelled
CI / ios-xcode-build (Release) (push) Has been cancelled
CI / android (push) Has been cancelled
CI / android_java (push) Has been cancelled
CI / bindings-java (push) Has been cancelled
CI / quantize (push) Has been cancelled
CI / release (push) Has been cancelled
CI / coreml-base-en (push) Has been cancelled
CI / vad (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main-intel.Dockerfile platform:linux/amd64 tag:main-intel]) (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main-musa.Dockerfile platform:linux/amd64 tag:main-musa]) (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled
Examples WASM / deploy-wasm-github-pages (push) Has been cancelled
ggml-ci
2025-06-10 12:40:33 +03:00
175e7e4f1a files : remove old sources (part 2) 2025-06-10 12:40:33 +03:00
38347a7dda files : remove old sources 2025-06-10 12:40:33 +03:00
7a675807a2 metal : use less stack memory in FA kernel (llama/14088)
* metal : use less stack memory in FA kernel

ggml-ci

* cont : fix BF16 variant
2025-06-10 12:40:33 +03:00
8cbc889f85 ggml-cpu : split arch-specific implementations (llama/13892)
* move ggml-cpu-aarch64 to repack

* split quantize_row_q8_0/1

* split helper functions

* split ggml_vec_dot_q4_0_q8_0

* split ggml_vec_dot_q4_1_q8_1

* split ggml_vec_dot_q5_0_q8_0

* split ggml_vec_dot_q5_1_q8_1

* split ggml_vec_dot_q8_0_q8_0

* split ggml_vec_dot_tq1_0_q8_K

* split ggml_vec_dot_tq2_0_q8_K

* split ggml_vec_dot_q2_K_q8_K

* split ggml_vec_dot_q3_K_q8_K

* split ggml_vec_dot_q4_K_q8_K

* split ggml_vec_dot_q5_K_q8_K

* split ggml_vec_dot_q6_K_q8_K

* split ggml_vec_dot_iq2_xxs_q8_K

* split ggml_vec_dot_iq2_xs_q8_K

* split ggml_vec_dot_iq2_s_q8_K

* split ggml_vec_dot_iq3_xxs_q8_K

* split ggml_vec_dot_iq3_s_q8_K

* split ggml_vec_dot_iq1_s_q8_K

* split ggml_vec_dot_iq1_m_q8_K

* split ggml_vec_dot_iq4_nl_q8_0

* split ggml_vec_dot_iq4_xs_q8_K

* fix typos

* fix missing prototypes

* rename ggml-cpu-quants.c

* rename ggml-cpu-traits

* rename arm folder

* move cpu-feats-x86.cpp

* rename ggml-cpu-hbm

* update arm detection macro in quants.c

* move iq quant tables

* split ggml_quantize_mat_q8_0/K

* split ggml_gemv_*

* split ggml_gemm_*

* rename namespace aarch64 to repack

* use weak aliases to replace test macros

* rename GGML_CPU_AARCH64 to GGML_CPU_REPACK

* rename more aarch64 to repack

* clean up rebase leftover

* fix compilation errors

* remove trailing spaces

* try to fix clang compilation errors

* try to fix clang compilation errors again

* try to fix clang compilation errors, 3rd attempt

* try to fix clang compilation errors, 4th attempt

* try to fix clang compilation errors, 5th attempt

* try to fix clang compilation errors, 6th attempt

* try to fix clang compilation errors, 7th attempt

* try to fix clang compilation errors, 8th attempt

* try to fix clang compilation errors, 9th attempt

* more cleanup

* fix compilation errors

* fix apple targets

* fix a typo in arm version of ggml_vec_dot_q4_K_q8_K

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-10 12:40:33 +03:00
e16a84cd95 cuda : fix device sync on buffer clear (llama/14033) 2025-06-10 12:40:33 +03:00
26282282fa CANN: Simplify the environment variable setting(#13104)
* Simplify the environment variable setting to specify the memory pool type.

* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.

* update

* fix CI

* update

* delete whitespace

* fix according to review

* update CANN.md

* update CANN.md
2025-06-10 12:40:33 +03:00
4737a8c780 sycl: Add reorder to Q6_K mmvq implementation (llama/13885)
* Add Reorder to Q6_K mmvq implementation

* Address PR comments: clean up comments

* Remove unused parameter after refactoring q4_k

* Adding inline to function and removing unnecessary reference to int

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-06-10 12:40:33 +03:00
8a70f4d18b cuda : fix buffer type check with integrated GPUs (llama/14069) 2025-06-10 12:40:33 +03:00