Jeff Bolz
7c165d7fa8
vulkan: use smaller combined allocations to avoid fragmentation (llama/11551)
2025-02-27 08:55:36 +02:00
Charles Duffy
2f0cf44915
metal : avoid breaking build when metal API predates TARGET_OS_VISION (llama/11690)
...
Avoids breakage in nix flake build introduced by b0569130c5e9c671152c913d82803b7c2f014ff9
2025-02-27 08:55:36 +02:00
Georgi Gerganov
b9c972fd0d
metal : adjust support conditions for norm operators (llama/11671)
...
cont #11659
ggml-ci
2025-02-27 08:55:36 +02:00
Johannes Gäßler
01c9aafbfd
CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)
2025-02-27 08:55:36 +02:00
Johannes Gäßler
bae6bbf487
CUDA: non-contiguous (RMS) norm support (llama/11659)
...
* CUDA: non-contiguous (RMS) norm support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-27 08:55:36 +02:00
fxzjshm
c310272fa0
HIP: force max threads per block to be 1024 (llama/11621)
...
Some old/vendor forked version of llvm still use 256. Explicitly set it to 1024 to align with upstream llvm.
Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-27 08:55:36 +02:00
Jhen-Jie Hong
bd0b55dbe0
metal : use residency set for other platforms (llama/11648)
2025-02-27 08:55:36 +02:00
Patrick Peng
ba4645db2c
rpc: fix known RCE in rpc-server (ggml/1103)
...
Add bounds checking in `rpc_server::copy_tensor` to prevent out-of-bounds writes
+ Check if `(uint8_t *)dst->data + ggml_nbytes(src)` remains within the destination buffer’s allocated region.
2025-02-27 08:55:36 +02:00
masahji
dfc6ca62f3
stream : add beam size parameter( #2836 )
...
CI / ubuntu-22-clang (linux/amd64, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/amd64, Release) (push) Has been cancelled
CI / ubuntu-22-clang (linux/arm64, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/arm64, Release) (push) Has been cancelled
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled
CI / emscripten (Release) (push) Has been cancelled
CI / ios-xcode-build (Release) (push) Has been cancelled
CI / android (push) Has been cancelled
CI / quantize (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled
* feat: Add beam size parameter to stream.cpp for beam search configuration
* feat: Add beam size parameter to whisper full params in stream example
* fix: Remove duplicate beam search size assignment in server.cpp
2025-02-25 11:39:33 +02:00
Thomas Fitzsimmons
47e14c0529
whisper : restore big endian support ( #2816 )
...
* whisper : fix BYTESWAP whitespace
* whisper : make byteswap useable with C++17
* cmake : define WHISPER_BIG_ENDIAN for big-endian targets
* ci : fix (again) arm64 build fails
* docker : attempt fixing arm64 build on ci
* qemu v7.0.0-28
[imported from
https://github.com/ggml-org/llama.cpp
/commit/818a340ea8be55b3706e1772527cb8738e90a8c7
(#11895 )]
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-25 11:38:13 +02:00
Judd
d682e15090
Fixes for Windows ( #2790 )
...
CI / ubuntu-22-clang (linux/amd64, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/amd64, Release) (push) Has been cancelled
CI / ubuntu-22-clang (linux/arm64, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/arm64, Release) (push) Has been cancelled
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled
CI / emscripten (Release) (push) Has been cancelled
CI / ios-xcode-build (Release) (push) Has been cancelled
CI / android (push) Has been cancelled
CI / quantize (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled
Fixes for Windows:
* MSVC default to utf-8 without BOM.
* Console output code page changed to utf-8.
---------
Co-authored-by: Judd <foldl@boxvest.com>
2025-02-06 15:37:21 +08:00
midnight
46d07b9c85
cmake : fix compile assumptions for power9/etc ( #2777 )
...
CI / ubuntu-22-clang (linux/amd64, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/amd64, Release) (push) Waiting to run
CI / ubuntu-22-clang (linux/arm64, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/arm64, Release) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Waiting to run
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Waiting to run
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Waiting to run
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Waiting to run
CI / emscripten (Release) (push) Waiting to run
CI / ios-xcode-build (Release) (push) Waiting to run
CI / android (push) Waiting to run
CI / quantize (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Waiting to run
* Add small comment re: VSX to readme
Co-authored-by: midnight <midnight@example.com>
2025-02-05 14:41:10 +02:00
Georgi Gerganov
33ea03f131
authors : update
CI / ubuntu-22-clang (linux/amd64, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/amd64, Release) (push) Waiting to run
CI / ubuntu-22-clang (linux/arm64, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/arm64, Release) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Waiting to run
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Waiting to run
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Waiting to run
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Waiting to run
CI / emscripten (Release) (push) Waiting to run
CI / ios-xcode-build (Release) (push) Waiting to run
CI / android (push) Waiting to run
CI / quantize (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Waiting to run
2025-02-04 13:03:40 +02:00
Georgi Gerganov
dbcc669e1a
sync : ggml
2025-02-04 13:03:09 +02:00
Christian Kastner
16245b35e4
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
...
This makes git as a dependency optional, and is useful in the case where
ggml is built not from git, but from a tarball, or a distribution source
package.
This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be
using it, though, so there doesn't seem much value factor it out, or
even require it.
2025-02-04 13:03:03 +02:00
Georgi Gerganov
898c0cb9d1
readme : add maintenance roadmap
CI / ubuntu-22-clang (linux/amd64, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/amd64, Release) (push) Waiting to run
CI / ubuntu-22-clang (linux/arm64, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/arm64, Release) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Waiting to run
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Waiting to run
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Waiting to run
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Waiting to run
CI / emscripten (Release) (push) Waiting to run
CI / ios-xcode-build (Release) (push) Waiting to run
CI / android (push) Waiting to run
CI / quantize (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Waiting to run
2025-02-04 10:50:10 +02:00
Georgi Gerganov
eb9e5032c4
ci : add stalebot
2025-02-04 09:30:20 +02:00
billyct
cadfc50eab
node : add max_len params in node addon ( #2760 )
CI / ubuntu-22-clang (linux/arm64, Release) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Waiting to run
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Waiting to run
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Waiting to run
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Waiting to run
CI / emscripten (Release) (push) Waiting to run
CI / ios-xcode-build (Release) (push) Waiting to run
CI / android (push) Waiting to run
CI / quantize (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Waiting to run
Bindings Tests (Ruby) / ubuntu-22 (push) Has been cancelled
Examples Tests / addon_node-ubuntu-22 (16.x) (push) Has been cancelled
Examples Tests / addon_node-ubuntu-22 (18.x) (push) Has been cancelled
2025-02-03 22:49:06 +02:00
Georgi Gerganov
3f91832352
talk-llama : sync llama.cpp
2025-02-03 22:42:26 +02:00
mgrachten
cff8868b5f
coreml : always convert to "neuralnetwork" ( #2770 )
2025-02-03 22:36:32 +02:00
Georgi Gerganov
90e3c5fc40
ci : more git
2025-02-03 22:00:57 +02:00
Georgi Gerganov
e0f4cef867
ci : install git
2025-02-03 22:00:57 +02:00
Georgi Gerganov
234460987e
ci : use ubuntu-22.04 instead of ubuntu-latest
2025-02-03 22:00:57 +02:00
Georgi Gerganov
b8ab126343
cmake : sync cmake scripts
2025-02-03 22:00:57 +02:00
Georgi Gerganov
edc5d9267c
sync : ggml
2025-02-03 22:00:57 +02:00
Georgi Gerganov
344b98a44f
scripts : fix sync paths
2025-02-03 22:00:57 +02:00
Johannes Gäßler
dbeb7916b8
CUDA: fix Volta FlashAttention logic (llama/11615)
2025-02-03 22:00:57 +02:00
Johannes Gäßler
fad2806352
HIP: fix flash_attn_stream_k_fixup warning (llama/11604)
2025-02-03 22:00:57 +02:00
uvos
9906792ec3
CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
...
CUDA/HIP: add support for selectable warp size to mmv
2025-02-03 22:00:57 +02:00
uvos
c49ee07ff4
HIP: add GGML_CUDA_CC_IS_* for amd familys as increasing cc archtectures for amd gpus are not supersets of eatch other (llama/11601)
...
This fixes a bug where RDNA1 gpus other than gfx1010 where not handled correctly
2025-02-03 22:00:57 +02:00
Johannes Gäßler
f8a831779e
CUDA: use mma PTX instructions for FlashAttention (llama/11583)
...
* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-03 22:00:57 +02:00
Olivier Chafik
85451e3612
ci
: use sccache on windows instead of ccache (llama/11545)
...
* Use sccache on ci for windows
* Detect sccache in cmake
2025-02-03 22:00:57 +02:00
uvos
43c744ce8b
HIP: require at least HIP 5.5
2025-02-03 22:00:57 +02:00
uvos
fc2e44490d
HIP: Prepare reduction operators for wave 64
2025-02-03 22:00:57 +02:00
uvos
f41fdad200
CUDA/HIP: add warp_size to cuda_device_info
2025-02-03 22:00:57 +02:00
Rémy Oudompheng
80fa576254
vulkan: implement initial support for IQ2 and IQ3 quantizations (llama/11360)
...
* vulkan: initial support for IQ3_S
* vulkan: initial support for IQ3_XXS
* vulkan: initial support for IQ2_XXS
* vulkan: initial support for IQ2_XS
* vulkan: optimize Q3_K by removing branches
* vulkan: implement dequantize variants for coopmat2
* vulkan: initial support for IQ2_S
* vulkan: vertically realign code
* port failing dequant callbacks from mul_mm
* Fix array length mismatches
* vulkan: avoid using workgroup size before it is referenced
* tests: increase timeout for Vulkan llvmpipe backend
---------
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-02-03 22:00:57 +02:00
Jeff Bolz
75e7d0585e
vulkan: Catch pipeline creation failure and print an error message (llama/11436)
...
* vulkan: Catch pipeline creation failure and print an error message
Also, fix some warnings from my on-demand compile change.
* vulkan: fix pipeline creation logging
2025-02-03 22:00:57 +02:00
uvos
682a6f5f87
HIP: Supress transformation warning in softmax.cu
...
loops with bounds not known at compile time can not be unrolled.
when ncols_template == 0, the bounds of the loop are not constexpr, thus llvm cant unroll the loops here.
2025-02-03 22:00:57 +02:00
Nikita Sarychev
115716d109
HIP: Only call rocblas_initialize on rocblas versions with the multiple instantation bug (llama/11080)
...
This disables the workaround on rocblas fixed versions (>=4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all tensile objects.
2025-02-03 22:00:57 +02:00
someone13574
b2cfef655b
cmake : don't fail on GGML_CPU=OFF
(llama/11457)
2025-02-03 22:00:57 +02:00
Akarshan Biswas
22e3df0afa
SYCL : SOFTMAX F16 mask support and other fixes (llama/11261)
...
Implemented ggml_sycl_op_soft_max() F16 src1(mask) support for which a pragma deprecation warning was added during #5021 .
To do this, had to decouple it from ggml_sycl_op_flatten which always considered src1 to be of fp32 type(many OP functions are dependent on it).
* SYCL: SOFTMAX F16 mask support and other fixes
* test-backend-ops: Add F16 mask test cases
2025-02-03 22:00:57 +02:00
Haus1
028511d349
AMD: parse the architecture as supplied by gcnArchName (llama/11244)
...
The value provided by minor doesn't include stepping for AMD, parse the value returned by gcnArchName instead to retrieve an accurate ID.
2025-02-03 22:00:57 +02:00
Ihar Hrachyshka
70c4038842
metal: Handle null returned from MTLCreateSystemDefaultDevice() (llama/11441)
...
This fixes segmentation fault error when running tests when no metal
devices are available (for example, when not linked with Core Graphics
framework or otherwise).
2025-02-03 22:00:57 +02:00
Georgi Gerganov
8639c003a9
metal : use residency sets (llama/11427)
...
* metal : use residency sets
ggml-ci
* metal : restore commandBufferWithUnretainedReferences calls [no ci]
* metal : release descriptors
ggml-ci
* metal : check env GGML_METAL_NO_RESIDENCY
ggml-ci
* metal : fix build + clean-up
ggml-ci
2025-02-03 22:00:57 +02:00
bandoti
d5d831da65
cmake: add ggml find package (llama/11369)
...
* Add initial ggml cmake package
* Add build numbers to ggml find-package
* Expand variables with GGML_ prefix
* Guard against adding to cache variable twice
* Add git to msys2 workflow
* Handle ggml-cpu-* variants
* Link ggml/ggml-base libraries to their targets
* Replace main-cmake-pkg with simple-cmake-pkg
* Interface features require c_std_90
* Fix typo
* Removed unnecessary bracket from status message
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-03 22:00:57 +02:00
Jeff Bolz
7230a6e1c8
vulkan: compile shaders on-demand (llama/11406)
...
Reduce first-run startup time and memory consumption.
Should fix #11339 .
2025-02-03 22:00:57 +02:00
uvos
a160fa0f3a
Hip: disable VMM on hip as it seams that it dosent work in some configurations (llama/11420)
2025-02-03 22:00:57 +02:00
uvos
0282ad8fd1
hip : Add hipGraph and VMM support to ROCM (llama/11362)
...
* Add hipGraph support
* Enable VMM on rocm
2025-02-03 22:00:57 +02:00
Johannes Gäßler
9e467815d4
CUDA: fix FP16 cuBLAS GEMM (llama/11396)
2025-02-03 22:00:57 +02:00
uvos
727891d9bf
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (llama/11356)
2025-02-03 22:00:57 +02:00