William Tambellini
c98681e6d5
ggml : upgrade init_tensor API to return a ggml_status (llama/11854)
...
* Upgrade init_tensor API to return a ggml_status
To prepare for an 'abort-free' ggml
(ggml not to abort on OOMs but return a OOM status),
as agreeed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.
* misc fixes
---------
Co-authored-by: slaren <slarengh@gmail.com>
2025-03-08 15:13:01 +02:00
Rémy O
3bab804981
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (llama/11595)
...
* vulkan: implement specialized MMV kernels for IQ2 quantizations
* vulkan: add MMV kernels for IQ3 quants
* vulkan: Increase MMV batch size and unroll IQ LUT setup
* vulkan: fix init_iq_shmem for WG sizes larger than tables
* vulkan: common batch size for all I-quants
2025-03-08 15:13:01 +02:00
Johannes Gäßler
c927830a70
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098)
2025-03-08 15:13:01 +02:00
Prashant Vithule
992b51b3d5
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (llama/12064)
...
* Added SVE Support for Q2_K Quantized Models
* Use 4-space indentation in the switch cases
* removed comments lines
* Remove the loop Retain the curly bracess for better understanding of code
* Remove the comment like added for q3_k_q8_k kernel
---------
Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
2025-03-08 15:13:01 +02:00
hipudding
2c882cbe4c
CANN: Fix build error with GCC 13 (llama/11990)
...
Remove unused header file that causes compilation failure on ARM
platform with GCC 13.
2025-03-08 15:13:01 +02:00
Eve
1fbb119b1e
vulkan: matmul dequantization improvements (llama/12015)
...
* faster dequant for old quants
* dont use unpack for iq4_nl
* vec2 unpack for q8
2025-03-08 15:13:01 +02:00
Daniele
40dea850fd
vulkan: improve im2col (llama/11826)
...
* vulkan: improve im2col performance
2025-03-08 15:13:01 +02:00
Vladimir Vuksanovic
8255a830a8
cmake: Fix ggml backend dependencies and installation (llama/11818)
...
* Fix dependencies between ggml and backends
ggml backends link only to ggml-base and ggml links to all backends.
* Fix installation of ggml backends
Set up GNUInstallDirs before setting the installation directory of ggml backends
2025-03-08 15:13:01 +02:00
Jeff Bolz
a0f76b2da7
vulkan: fix assertion when qy_needs_dequant (llama/12068)
...
Looks like a copy/paste bug from qx_needs_dequant.
2025-03-08 15:13:01 +02:00
Molly Sophia
394768c48b
ggml-cpu: Fix build with sve (llama/12059)
...
* ggml-cpu: Fix build with sve
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* ggml-cpu: Remove unused variable in sve q3_k vec dot
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-03-08 15:13:01 +02:00
cmdr2
846e01b2c0
cuda: unary ops as float + de-duplicate (ggml/1130)
2025-03-08 15:13:01 +02:00
cmdr2
6ac8e6b2ce
cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)
...
* cuda: restrict SILU_BACK to fp32, since fp16 exceeds the desired test threshold
* vulkan: specify fp32-only support for certain ops (that are now tested for fp16 as well)
* f32 sigmoid in vulkan supports op
* Revert "f32 sigmoid in vulkan supports op"
This reverts commit c6f04b3c19bf4504c2776149c6d8cd84e0b48acb.
2025-03-08 15:13:01 +02:00
cmdr2
60d2ddebdf
cuda/cpu: Increase support for fp16 unary operations (ggml/1125)
...
* Support fp16 unary operations in the CUDA backend
* cpu: increase fp16 support for unary operators in the CPU backend
* cuda: increase fp16 support for unary operators in the CUDA backend
* Add test cases for fp16 unary operators
* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing
* metal: fix PR comments for unary op support after fp16 unary tests
2025-03-08 15:13:01 +02:00
petterreinholdtsen
2e180184a8
Told cmake to install ggml-cpp.h as a public header file. (ggml/1126)
...
It is used by Whisper talk-llama example.
Co-authored-by: Petter Reinholdtsen <pere@debian.org>
2025-03-08 15:13:01 +02:00
Ivy233
ef40950c4a
common : more general m_audio_len update logic ( #2855 )
...
Co-authored-by: Ivy233 <wangjinrun@uniontech.com>
2025-03-07 10:10:03 +02:00
Ryan Johnson
c774eec709
go : improve model download ( #2756 )
...
* Updated models download URL
* Updated list of models available
All of the high efficiency quantized models are rejected when trying to download. They exist on the server. Let's allow them.
* added path prefix for whisper-cli in message to user. The message is misleading if this script is called from another script in a different folder. So the message has to be fixed.
* undid download URL change I made earlier. Fixed filepath.Join(urlPath, model) bug.
* Undid download URL change I made earlier.
Seems that the old URL works but only when provided a model to download. Still doesn't explain why there's a different download URL that also works. Please elucidate in docs.
* Fixed URLForModel Function's bug
filepath.Join is designed for filesystem paths, and it uses backslashes (\) on Windows. URLs, however, require forward slashes (/), so the use of filepath.Join is inappropriate for constructing URLs.
The fmt.Sprintf function ensures that forward slashes are used.
* Fixed URL trailing / double slash bug
Ensure no double slash by trimming trailing '/' from srcUrl if present
* Fixed bad download URL, missing ggml prefix
Not sure if that was a bug I introduced but it was trying to download without the prefix.
* Added question before downloading all models. Added download size estimate
HEAD Requests:
Efficiently fetches file sizes without downloading the content.
Interactive Workflow:
Allows the user to make informed decisions about downloading all models.
Safe Defaults:
Aborts if the user does not explicitly confirm.
* Fixed Unbuffered channel warning.
warning in context.go : misuse of unbuffered os.Signal channel as argument to signal.
The warning indicates that the unbuffered channel used in signal.Notify in context.go may be misused. In Go, unbuffered channels can cause potential deadlocks if signals are sent faster than they are received.
* Fixed download size calculation, download URL prefix bug, added link to models URL for user.
The URL formatter was prepending the model name to the formatted model name in the URL
* Added logs and exes to gitignore
* Delete bindings/go/examples/go-model-download/go-model-download.exe
* Delete whisper_build.log
2025-03-07 10:03:51 +02:00
Dmitry Atamanov
5b481a27a6
common : fix audio loading by miniaudio ( #2862 )
CI / ubuntu-22-clang (linux/amd64, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/amd64, Release) (push) Has been cancelled
CI / ubuntu-22-clang (linux/arm64, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/arm64, Release) (push) Has been cancelled
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled
CI / emscripten (Release) (push) Has been cancelled
CI / ios-xcode-build (Release) (push) Has been cancelled
CI / android (push) Has been cancelled
CI / quantize (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled
2025-03-04 19:05:21 +02:00
Lin Xiaodong
fc7b1ee521
fix: missing include common-whisper ( #2858 )
CI / ubuntu-22-clang (linux/arm64, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/arm64, Release) (push) Has been cancelled
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled
CI / emscripten (Release) (push) Has been cancelled
CI / ios-xcode-build (Release) (push) Has been cancelled
CI / android (push) Has been cancelled
CI / quantize (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled
Examples Tests / addon_node-ubuntu-22 (16.x) (push) Has been cancelled
Examples Tests / addon_node-ubuntu-22 (18.x) (push) Has been cancelled
2025-03-02 20:55:11 +02:00
KITAITI Makoto
c42f67e2d2
ruby : follow audio library change ( #2851 )
...
CI / ubuntu-22-clang (linux/amd64, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/amd64, Release) (push) Has been cancelled
CI / ubuntu-22-clang (linux/arm64, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/arm64, Release) (push) Has been cancelled
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Has been cancelled
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Has been cancelled
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Has been cancelled
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled
CI / emscripten (Release) (push) Has been cancelled
CI / ios-xcode-build (Release) (push) Has been cancelled
CI / android (push) Has been cancelled
CI / quantize (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled
* Enable CPU
* Follow audio lib change
2025-02-28 08:09:02 +02:00
Diego Devesa
339a1cba5d
whisper : support GGML_BACKEND_DL ( #2843 )
...
CI / ubuntu-22-clang (linux/amd64, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/amd64, Release) (push) Waiting to run
CI / ubuntu-22-clang (linux/arm64, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/arm64, Release) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Waiting to run
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Waiting to run
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Waiting to run
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Waiting to run
CI / emscripten (Release) (push) Waiting to run
CI / ios-xcode-build (Release) (push) Waiting to run
CI / android (push) Waiting to run
CI / quantize (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Waiting to run
* whisper : support GGML_BACKEND_DL
* fix DTW crash
* whisper.objc : fix build - add ggml-cpp.h
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-27 13:35:07 +01:00
Georgi Gerganov
c64f3e8ada
common : separate whisper sources ( #2846 )
...
CI / ubuntu-22-clang (linux/arm64, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/arm64, Release) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Waiting to run
CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Waiting to run
CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Waiting to run
CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Waiting to run
CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Waiting to run
CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Waiting to run
CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Waiting to run
CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Waiting to run
CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Waiting to run
CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Waiting to run
CI / emscripten (Release) (push) Waiting to run
CI / ios-xcode-build (Release) (push) Waiting to run
CI / android (push) Waiting to run
CI / quantize (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Waiting to run
Examples Tests / addon_node-ubuntu-22 (16.x) (push) Has been cancelled
Examples Tests / addon_node-ubuntu-22 (18.x) (push) Has been cancelled
* common : separate whisper sources
* examples : add chrono
* examples : add more headers
2025-02-27 12:50:32 +02:00
Georgi Gerganov
9f83f67221
common : fix build min/max ( #2845 )
...
* common : try to fix build
* cont : try another fix
2025-02-27 10:39:13 +02:00
Dmitry Atamanov
7d3da68f79
examples : use miniaudio for direct decoding flac, mp3, ogg and wav ( #2759 )
2025-02-27 09:06:54 +02:00
petterreinholdtsen
b5d21359c1
stream : stop on ^C when no audio is received ( #2822 )
...
Add check for ctrl-c in potentially endless loop while calling audio.get()
to receive sound.
Co-authored-by: Petter Reinholdtsen <pere@debian.org>
2025-02-27 08:59:51 +02:00
Georgi Gerganov
17addf7104
sync : ggml
2025-02-27 08:55:36 +02:00
cmdr2
cdaee8b4bd
Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121)
...
* Support float16-to-float16 add/sub/mul/div operations in the CUDA backend
* Add fp16 support for add/sub/mul/div on the CPU backend
* Add test cases for fp16 add/sub/mul/div
2025-02-27 08:55:36 +02:00
Gian-Carlo Pascutto
4b60ff4f92
metal : copy kernels for quant to F32/F16 conversions (llama/12017)
...
metal: use dequantize_q templates
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-27 08:55:36 +02:00
lhez
b43b9d928c
opencl: fix for small models (llama/11950)
...
* opencl: fix small shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs
---------
Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com>
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
2025-02-27 08:55:36 +02:00
Neo Zhang Jianyu
e3cb412a59
Optimize mul_mat for Q4_0 on Intel GPU (llama/12035)
...
* opt performance by reorder for Intel GPU
* detect hw type and save opt feature, and print opt feature
* correct name
* support optimize graph once when compute graph, record the opt status in tensor->extra, make CI passed
* add env variable GGML_SYCL_DISABLE_OPT for debug
* use syclex::architecture replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* mv getrows functions to separeted files
* fix global variables
---------
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2025-02-27 08:55:36 +02:00
Akarshan Biswas
ac301a7d9b
SYCL: Fix GGML_SYCL_DEBUG macro (llama/11995)
2025-02-27 08:55:36 +02:00
Aaron Teo
82e04e7670
ggml-cpu: Support s390x SIMD Instruction Set (llama/12019)
...
* ggml: add s390x ARCH_FLAGS for compilation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add SIMD for s390x using vector intrinsics
SIMD is activated for:
* ggml_vec_dot_f32
* ggml_vec_dot_f16
* ggml_vec_mad_f32
* ggml_vec_mad_f16
* ggml_vec_mad_f32_unroll
* ggml_vec_scale_f32
* ggml_vec_scale_f16
SIMD is NOT activated for:
* ggml_vec_dot_f16_unroll (pending bugfix)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix missing escape character in GGML_F32x4_REDUCE
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix s390x GGML_F32x4_REDUCE
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: full SIMD activation for F32,F16 s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add option to disable s390x VXE/VXE2
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: change vecintrin.h include to ggml-cpu-impl
* add __VXE__ and __VXE2__ macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* cmake: add s390x target detection for VX/VXE/VXE2
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move s390x vector intrinsics to ggml-cpu-impl.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x Q8_0 SIMD
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: correct documentation for Q8_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x reduce code complexity Q8_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x bugfix typo Q8_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activated for Q4_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x inline vec_reve
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for Q4_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add VXE backend feature
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: remove test.py
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for quantize_row_q8_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for quantize_row_q8_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for iq4_xs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: bugfix iq4_xs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for iq4_nl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add float, double, and long vector data type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: clean up iq4_xs SIMD
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix improper use of restrict keyword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: update warning message for ggml_vec_tbl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: switch to restrict for iq4_nl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: slight dot product speed improvement for q4_1_q8_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for q6_K
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add missing `_t` to ggml_int8x16x4_t
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix missing `_t` for ggml_vec_xl_s8x4
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix more missing `_t`
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add unroll and prefetch to Q8_0
increase of 3.86% for prompt processing and 32.22% for token generation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: patch Q8_0 to use proper vector sizes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: optimise Q8_0 dot prod compute kernel further
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add unroll and prefetch to Q4_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor Q6_K variable naming for readability
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix Q6_K typos
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for Q5_K
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix wrong char*x16_t naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: Q5_K y0 wrong signness
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix Q5_K invalid uchar type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix Q5_K invalid uchar type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for Q4_K
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix Q4_K invalid vector intrinsics
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: simplify ggml_padd_s16 compute kernel
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: correct ggml-cpu vxe wording
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: change ggml_aligned_malloc alignment to 256
256 is the cache line size for s390x platforms
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: resolve pr merge via cherry-pick 225bbbf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)
* ggml: resolve pr merge via cherry-pick 4571953
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: cmake remove fork when determining s390x machine type
thank you @ericcurtin
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Jinyang He <hejinyang@loongson.cn>
Co-authored-by: junchao-zhao <68935141+junchao-loongson@users.noreply.github.com>
2025-02-27 08:55:36 +02:00
Johannes Gäßler
38ac47cd4d
CUDA: app option to compile without FlashAttention (llama/12025)
2025-02-27 08:55:36 +02:00
Johannes Gäßler
2d70cd36d7
CUDA: optimize FA for GQA + large batches (llama/12014)
2025-02-27 08:55:36 +02:00
Gian-Carlo Pascutto
98dab49b9a
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)
2025-02-27 08:55:36 +02:00
PureJourney
b1385e9aa9
CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984)
...
* CUDA: correct the lowest Maxwell supported by CUDA 12
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-02-27 08:55:36 +02:00
Bodhi
48f5e893f5
MUSA: support ARM64 and enable dp4a .etc (llama/11843)
...
* MUSA: support ARM64 and enable __dp4a .etc
* fix cross entropy loss op for musa
* update
* add cc info log for musa
* add comment for the MUSA .cc calculation block
---------
Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>
2025-02-27 08:55:36 +02:00
Charles Xu
dc21871fcb
ggml-cpu: Add CPU backend support for KleidiAI library (llama/11390)
...
* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environmental variable GGML_KLEIDIAI_SME
* Add support for multithread LHS conversion
* Switch kernel selection order to dotprod and i8mm
* updates for review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to source file
* Update cmake for SME build and add alignment for SME
* Remove append GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
2025-02-27 08:55:36 +02:00
Prashant Vithule
64a430bc81
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (llama/11917)
...
* Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file
* Improved Formating of code in ggml-cpu-quants.c file
* style : minor fixes
* style : less whitespaces
* style : ptr spaceing
---------
Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-27 08:55:36 +02:00
Johannes Gäßler
51a3580c79
CUDA: use async data loading for FlashAttention (llama/11894)
...
* CUDA: use async data loading for FlashAttention
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-27 08:55:36 +02:00
Rémy O
37a21dd43d
vulkan: implement several ops relevant for ggml_opt (llama/11769)
...
* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
2025-02-27 08:55:36 +02:00
Jeff Bolz
8a22a8b17f
vulkan: support multi/vision rope, and noncontiguous rope (llama/11902)
2025-02-27 08:55:36 +02:00
Hale Chan
fcbcad0c90
metal : fix the crash caused by the lack of residency set support on Intel Macs. (llama/11904)
2025-02-27 08:55:36 +02:00
Adrian Kretz
4444db7360
metal : optimize dequant q6_K kernel (llama/11892)
2025-02-27 08:55:36 +02:00
Georgi Gerganov
a7fc1038ca
repo : update links to new url (llama/11886)
...
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-27 08:55:36 +02:00
Rémy O
1689aaf854
vulkan: initial support for IQ1_S and IQ1_M quantizations (llama/11528)
...
* vulkan: initial support for IQ1_S and IQ1_M quantizations
* vulkan: define MMV kernels for IQ1 quantizations
* devops: increase timeout of Vulkan tests again
* vulkan: simplify ifdef for init_iq_shmem
2025-02-27 08:55:36 +02:00
lhez
4b48fe449a
opencl: Fix rope and softmax (llama/11833)
...
* opencl: fix `ROPE`
* opencl: fix `SOFT_MAX`
* Add fp16 variant
* opencl: enforce subgroup size for `soft_max`
2025-02-27 08:55:36 +02:00
Diego Devesa
47cc043e69
cuda : add ampere to the list of default architectures (llama/11870)
2025-02-27 08:55:36 +02:00
Jinyang He
e3d9ffb98b
ggml: optimize some vec dot functions for LoongArch ASX (llama/11842)
...
* Optimize ggml_vec_dot_q3_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q4_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q6_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q5_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q2_K_q8_K for LoongArch ASX
* Optimize mul_sum_i8_pairs_float for LoongArch ASX
* Optimize ggml_vec_dot_iq4_xs_q8_K for LoongArch ASX
2025-02-27 08:55:36 +02:00
Eve
e22d69839d
vulkan: linux builds + small subgroup size fixes (llama/11767)
...
* mm subgroup size
* upload vulkan x86 builds
2025-02-27 08:55:36 +02:00
Jeffrey Morgan
defe731263
llamafile: use member variable instead of constant for iq4nlt (llama/11780)
2025-02-27 08:55:36 +02:00