Weizhao Ouyang
e5edcc6259
ggml-cpu: Fix duplicate MATMUL_INT8 (llama/11817)
...
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
2025-02-27 08:55:36 +02:00
Johannes Gäßler
556f773d53
CUDA: fix CUDART_VERSION checks (llama/11821)
2025-02-27 08:55:36 +02:00
Sheldon Robinson
91d02de332
Fix #11802 : Compile bug - RegQueryValueExA changed to RegQueryValueEx (llama/11803)
...
* Fix #11802 : Compile bug - RegQueryValueExA changed to RegQueryValueEx
* Fix #11802 : PR #11803 - keep RegQueryValueExA, remove TEXT macro, description needs to be ANSI string
2025-02-27 08:55:36 +02:00
Johannes Gäßler
1b67d72f87
CUDA: use arch list for compatibility check (llama/11775)
...
* CUDA: use arch list for feature availability check
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-27 08:55:36 +02:00
Maxim Evtush
14d7c0368d
fix: typos in documentation files (llama/11791)
...
* Update ggml.c
* Update arg.cpp
* Update speculative.h
2025-02-27 08:55:36 +02:00
Danny Milosavljevic
db6e19188a
vulkan: Make Vulkan optional at runtime (ggml/11493). (llama/11494)
...
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-02-27 08:55:36 +02:00
Wagner Bruna
b4b063a5c9
vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (llama/11592)
2025-02-27 08:55:36 +02:00
Jeff Bolz
930b739e7a
vulkan: account for lookup tables when checking shared memory size (llama/11502)
2025-02-27 08:55:36 +02:00
Karol Kontny
5981352bb5
ggml: Fix data race in ggml threadpool (llama/11736)
...
After the barrier in the last iteration is executed, the loop termination
condition is still evaluated. By then the main thread may already have destroyed
the cgraph object and its nodes, so another thread would access memory that is already gone.
Trouble can also occur when n_nodes == 0 or abort is called, though it is unclear whether the
former situation is reachable.
The last synchronization should therefore be done after the loop, to ensure the cgraph/cplan
is not accessed after the main thread exits the function.
2025-02-27 08:55:36 +02:00
Johannes Gäßler
7561da244e
CUDA: fix min. version for movmatrix (llama/11751)
2025-02-27 08:55:36 +02:00
Jeff Bolz
be83f342fb
vulkan: print shared memory size (llama/11719)
2025-02-27 08:55:36 +02:00
Akarshan Biswas
fd369871f7
SYCL: remove XMX info from print devices (llama/11712)
2025-02-27 08:55:36 +02:00
Jinyang He
bbd8364f5e
ggml : optimize and build warning fix for LoongArch (llama/11709)
...
* ggml : optimize convert f32<->f16 for loongarch_asx
* ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16
* ggml : fix warnings when running CPU CI locally on LoongArch
2025-02-27 08:55:36 +02:00
Akarshan Biswas
e4102440ef
SYCL: Adjust support condition for norm operators (llama/11674)
...
SYCL does not support non-contiguous tensors for norm operations
2025-02-27 08:55:36 +02:00
junchao-zhao
f8242ec483
ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)
2025-02-27 08:55:36 +02:00
Jeff Bolz
ef51b4cba4
vulkan: optimize coopmat2 iq2/iq3 callbacks (llama/11521)
...
* vulkan: optimize coopmat2 iq2/iq3 callbacks
* build: trigger CI on GLSL compute shader changes
2025-02-27 08:55:36 +02:00
Rémy O
6f08b24146
vulkan: initial support for IQ4_XS quantization (llama/11501)
2025-02-27 08:55:36 +02:00
Jeff Bolz
7c165d7fa8
vulkan: use smaller combined allocations to avoid fragmentation (llama/11551)
2025-02-27 08:55:36 +02:00
Charles Duffy
2f0cf44915
metal : avoid breaking build when metal API predates TARGET_OS_VISION (llama/11690)
...
Avoids breakage in nix flake build introduced by b0569130c5e9c671152c913d82803b7c2f014ff9
2025-02-27 08:55:36 +02:00
Georgi Gerganov
b9c972fd0d
metal : adjust support conditions for norm operators (llama/11671)
...
cont #11659
ggml-ci
2025-02-27 08:55:36 +02:00
Johannes Gäßler
01c9aafbfd
CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)
2025-02-27 08:55:36 +02:00
Johannes Gäßler
bae6bbf487
CUDA: non-contiguous (RMS) norm support (llama/11659)
...
* CUDA: non-contiguous (RMS) norm support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-27 08:55:36 +02:00
fxzjshm
c310272fa0
HIP: force max threads per block to be 1024 (llama/11621)
...
Some old/vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM.
Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-27 08:55:36 +02:00
Jhen-Jie Hong
bd0b55dbe0
metal : use residency set for other platforms (llama/11648)
2025-02-27 08:55:36 +02:00
Patrick Peng
ba4645db2c
rpc: fix known RCE in rpc-server (ggml/1103)
...
Add bounds checking in `rpc_server::copy_tensor` to prevent out-of-bounds writes:
check that `(uint8_t *)dst->data + ggml_nbytes(src)` remains within the destination buffer’s allocated region.
2025-02-27 08:55:36 +02:00
midnight
46d07b9c85
cmake : fix compile assumptions for power9/etc (#2777)
...
* Add small comment re: VSX to readme
Co-authored-by: midnight <midnight@example.com>
2025-02-05 14:41:10 +02:00
Christian Kastner
16245b35e4
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
...
This makes git an optional dependency, which is useful when ggml is built
not from git but from a tarball or a distribution source package.
This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be
using it, though, so there doesn't seem to be much value in factoring it
out, or even requiring it.
2025-02-04 13:03:03 +02:00
Georgi Gerganov
b8ab126343
cmake : sync cmake scripts
2025-02-03 22:00:57 +02:00
Johannes Gäßler
dbeb7916b8
CUDA: fix Volta FlashAttention logic (llama/11615)
2025-02-03 22:00:57 +02:00
Johannes Gäßler
fad2806352
HIP: fix flash_attn_stream_k_fixup warning (llama/11604)
2025-02-03 22:00:57 +02:00
uvos
9906792ec3
CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
2025-02-03 22:00:57 +02:00
uvos
c49ee07ff4
HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (llama/11601)
...
This fixes a bug where RDNA1 GPUs other than gfx1010 were not handled correctly
2025-02-03 22:00:57 +02:00
Johannes Gäßler
f8a831779e
CUDA: use mma PTX instructions for FlashAttention (llama/11583)
...
* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-03 22:00:57 +02:00
Olivier Chafik
85451e3612
ci : use sccache on windows instead of ccache (llama/11545)
...
* Use sccache on ci for windows
* Detect sccache in cmake
2025-02-03 22:00:57 +02:00
uvos
43c744ce8b
HIP: require at least HIP 5.5
2025-02-03 22:00:57 +02:00
uvos
fc2e44490d
HIP: Prepare reduction operators for wave 64
2025-02-03 22:00:57 +02:00
uvos
f41fdad200
CUDA/HIP: add warp_size to cuda_device_info
2025-02-03 22:00:57 +02:00
Rémy Oudompheng
80fa576254
vulkan: implement initial support for IQ2 and IQ3 quantizations (llama/11360)
...
* vulkan: initial support for IQ3_S
* vulkan: initial support for IQ3_XXS
* vulkan: initial support for IQ2_XXS
* vulkan: initial support for IQ2_XS
* vulkan: optimize Q3_K by removing branches
* vulkan: implement dequantize variants for coopmat2
* vulkan: initial support for IQ2_S
* vulkan: vertically realign code
* port failing dequant callbacks from mul_mm
* Fix array length mismatches
* vulkan: avoid using workgroup size before it is referenced
* tests: increase timeout for Vulkan llvmpipe backend
---------
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-02-03 22:00:57 +02:00
Jeff Bolz
75e7d0585e
vulkan: Catch pipeline creation failure and print an error message (llama/11436)
...
* vulkan: Catch pipeline creation failure and print an error message
Also, fix some warnings from my on-demand compile change.
* vulkan: fix pipeline creation logging
2025-02-03 22:00:57 +02:00
uvos
682a6f5f87
HIP: Suppress transformation warning in softmax.cu
...
Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the bounds of the loop are not constexpr, so LLVM cannot unroll the loops here.
2025-02-03 22:00:57 +02:00
Nikita Sarychev
115716d109
HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (llama/11080)
...
This disables the workaround on rocblas fixed versions (>=4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all tensile objects.
2025-02-03 22:00:57 +02:00
someone13574
b2cfef655b
cmake : don't fail on GGML_CPU=OFF
(llama/11457)
2025-02-03 22:00:57 +02:00
Akarshan Biswas
22e3df0afa
SYCL : SOFTMAX F16 mask support and other fixes (llama/11261)
...
Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added in #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always treated src1 as fp32 (many op functions depend on that).
* SYCL: SOFTMAX F16 mask support and other fixes
* test-backend-ops: Add F16 mask test cases
2025-02-03 22:00:57 +02:00
Haus1
028511d349
AMD: parse the architecture as supplied by gcnArchName (llama/11244)
...
The value provided by minor doesn't include the stepping for AMD; parse the value returned by gcnArchName instead to retrieve an accurate ID.
2025-02-03 22:00:57 +02:00
Ihar Hrachyshka
70c4038842
metal: Handle null returned from MTLCreateSystemDefaultDevice() (llama/11441)
...
This fixes a segmentation fault when running tests with no Metal
devices available (for example, when not linked against the Core Graphics
framework).
2025-02-03 22:00:57 +02:00
Georgi Gerganov
8639c003a9
metal : use residency sets (llama/11427)
...
* metal : use residency sets
ggml-ci
* metal : restore commandBufferWithUnretainedReferences calls [no ci]
* metal : release descriptors
ggml-ci
* metal : check env GGML_METAL_NO_RESIDENCY
ggml-ci
* metal : fix build + clean-up
ggml-ci
2025-02-03 22:00:57 +02:00
bandoti
d5d831da65
cmake: add ggml find package (llama/11369)
...
* Add initial ggml cmake package
* Add build numbers to ggml find-package
* Expand variables with GGML_ prefix
* Guard against adding to cache variable twice
* Add git to msys2 workflow
* Handle ggml-cpu-* variants
* Link ggml/ggml-base libraries to their targets
* Replace main-cmake-pkg with simple-cmake-pkg
* Interface features require c_std_90
* Fix typo
* Removed unnecessary bracket from status message
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-03 22:00:57 +02:00
Jeff Bolz
7230a6e1c8
vulkan: compile shaders on-demand (llama/11406)
...
Reduce first-run startup time and memory consumption.
Should fix #11339 .
2025-02-03 22:00:57 +02:00
uvos
a160fa0f3a
HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420)
2025-02-03 22:00:57 +02:00
uvos
0282ad8fd1
hip : Add hipGraph and VMM support to ROCM (llama/11362)
...
* Add hipGraph support
* Enable VMM on rocm
2025-02-03 22:00:57 +02:00