f9b2dfdd8c
examples : fix deprecated FFmpeg functions ( #3073 )
...
* Fix deprecated FFmpeg functions and free packet
* avcodec_free_context
2025-04-28 06:16:50 +02:00
50fda73f4c
ruby : add encoder begin callback related methods ( #3076 )
...
* Lazy run TestBase.whisper
* Fix indentation
* Remove disused GGML_HIP_UMA from Ruby
* Add encoder_begin_callback
* Comment out existing abort mechanism
* Add test for encoder_begin_callback
* Add signatures for encoder_begin_callback related methods
* Update gem date
2025-04-26 04:33:11 +09:00
1c20f46887
ci : enable bindings java job ( #3070 )
...
* ci : re-enable bindings-java (java) job
This commit re-enables the job previously named `java` which was
disabled in the build.yml file.
The motivation for this is that we recently fixed a few issues in the
Java bindings, and it should now be possible to build them on Windows.
Refs: https://github.com/ggerganov/whisper.cpp/pull/2949
Resolves: https://github.com/ggerganov/whisper.cpp/issues/2781
2025-04-25 14:56:06 +02:00
adaea088bc
ruby : add cmake option ( #0 )
2025-04-24 20:39:16 +03:00
6c0d843f9d
cuda : fix unused variable compile warning ( #0 )
...
ggml-ci
2025-04-24 20:39:16 +03:00
efb800557f
sync : ggml
...
ggml-ci
2025-04-24 20:39:16 +03:00
337becefb9
opencl : remove obsolete files (skip) (ggml/1200)
2025-04-24 20:39:16 +03:00
11ae30c19e
sync : ggml
2025-04-24 20:39:16 +03:00
88c3cecd43
opencl: split ggml-opencl.cl into multiple files and cleanup (llama/12886)
...
---------
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
2025-04-24 20:39:16 +03:00
fe4acb33e3
ggml : fix trailing whitespaces (llama/0)
2025-04-24 20:39:16 +03:00
fd5a3e1bc6
CUDA: use switch statements in constexpr functions (llama/13095)
2025-04-24 20:39:16 +03:00
01e1600edd
metal : fix floating-point range of attention scores in FA kernels (llama/13090)
...
ggml-ci
2025-04-24 20:39:16 +03:00
cf3eb291ab
vulkan: matmul gcn tuning (llama/13016)
...
* tune matmul for gcn
* this one is more power efficient
* Update ggml/src/ggml-vulkan/ggml-vulkan.cpp
Co-authored-by: 0cc4m <picard12@live.de>
* disable this tune for the proprietary driver
---------
Co-authored-by: 0cc4m <picard12@live.de>
2025-04-24 20:39:16 +03:00
3d54b68ea7
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (llama/13014)
...
* CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
* fix logic for RoPE support, CUDA graphs
2025-04-24 20:39:16 +03:00
11218294db
ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (llama/12871)
...
* ggml : add SSE 4.2 variant for CPUs without AVX
* ggml : add x64 base ABI variant
2025-04-24 20:39:16 +03:00
33c89ade7d
SYCL: Add non-contiguous support in ROPE (llama/12993)
...
ggml-ci
2025-04-24 20:39:16 +03:00
27a56e7243
vulkan: support noncontiguous rms_norm (llama/13031)
2025-04-24 20:39:16 +03:00
f4ca3e2f9c
metal: add neg operator (llama/13029)
2025-04-24 20:39:16 +03:00
0287a5c51b
SYCL: Refactor and enable FP16 in binary broadcast OPs (llama/12975)
...
* SYCL: refactor move to a separate file
* Fix binbcast
* Remove duplicates
* fix include formatting
* fix typo
2025-04-24 20:39:16 +03:00
24d29c55df
rpc : add RPC_CMD_HELLO (llama/12955)
...
Add RPC_CMD_HELLO for getting the version of the protocol implemented by
the server, following the semantic versioning rules at https://semver.org.
Hopefully this brings a better user experience when we make breaking
changes at the protocol level, and avoids issues like #12465.
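Semantic versioning makes the compatibility rule mechanical: a breaking protocol change bumps the major version, so a client only has to compare majors against what the server reports in its HELLO reply. A minimal sketch of such a check, where the struct and function names are illustrative rather than the actual ggml-rpc API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shape of the version carried in a HELLO reply
 * (illustrative names, not the real ggml-rpc wire format). */
typedef struct {
    uint8_t major, minor, patch;
} rpc_version;

/* Under semver, breaking changes bump the major version, so client
 * and server interoperate only when their majors match. */
static bool rpc_version_compatible(rpc_version client, rpc_version server) {
    return client.major == server.major;
}
```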
2025-04-24 20:39:16 +03:00
36019c35a3
graph : make FA compatible with MLA + add initial Metal kernels (llama/12953)
...
* graph : make mla compatible with FA
* metal : add exp FA kernels for DeepSeek models
ggml-ci
* llama : minor naming updates
ggml-ci
* ggml : disable FA for DS head sizes
* tests : add FA tests for MLA shapes
ggml-ci
2025-04-24 20:39:16 +03:00
4e936e2afa
ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (llama/12970)
2025-04-24 20:39:16 +03:00
314ce5981e
CANN: Add support for async operator submission (llama/12864)
...
Submit operators using asynchronous threads to improve performance.
Use the environment variable GGML_CANN_ASYNC_MODE to control whether
asynchronous submission is enabled. It is disabled by default.
Testing shows a 10%–20% performance improvement in scenarios with
small parameter sizes, especially in quantized models.
2025-04-24 20:39:16 +03:00
cb7642b0f5
opencl: fix incorrect local_size index in profiling log (llama/12868)
2025-04-24 20:39:16 +03:00
7db8f278f0
vulkan: enable coopmat2 FA gqa and split_k optimizations more often (llama/12931)
...
The grouped query attention optimization doesn't require a power of two ratio;
the only thing relying on it was the modulo operation written as bitwise &.
split_k need not depend on gqa_ratio - enable it any time there's only one
workgroup in the X dimension. The shader gets the split index from the x coord,
and multiple workgroups in the X dimension (pre-split) indicates a larger
FA operation that wouldn't need splitting.
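The assumption being removed is easy to see in isolation: writing `x % n` as `x & (n - 1)` is only valid when `n` is a power of two, so once a plain modulo is used the gqa ratio no longer has to be one. A minimal check:

```c
#include <assert.h>
#include <stdint.h>

/* x & (n - 1) computes x % n only when n is a power of two;
 * for any other n the bitwise form gives a different answer. */
static uint32_t mod_bitwise(uint32_t x, uint32_t n) {
    return x & (n - 1);
}
```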
2025-04-24 20:39:16 +03:00
be42a19eab
CANN: Add 310P operator support check (llama/12962)
2025-04-24 20:39:16 +03:00
b8755670ca
metal : add FA-vec kernels for head size 96 (llama/12952)
...
ggml-ci
2025-04-24 20:39:16 +03:00
483eecae62
CANN: Add x86 build ci (llama/12950)
...
* CANN: Add x86 build ci
* CANN: fix code format
2025-04-24 20:39:16 +03:00
43e3d25d93
CUDA/HIP: Share the same unified memory allocation logic. (llama/12934)
...
Replace compile-time `GGML_HIP_UMA` with environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. This unifies the usage on NVIDIA and AMD GPUs, and allows a single binary to be shared between integrated and dedicated GPUs.
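The runtime toggle this introduces can be sketched as below. This is a simplified illustration of an environment-variable check replacing a compile-time flag; the sketch tests only for the variable's presence, and how the real backend interprets the value is not shown here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Sketch of a runtime toggle replacing the old compile-time
 * GGML_HIP_UMA flag: unified memory is requested whenever the
 * environment variable is set at all (presence check only). */
static bool unified_memory_requested(void) {
    return getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != NULL;
}
```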
2025-04-24 20:39:16 +03:00
e1dbf9a42e
SYCL: Add ROPE vision kernel (llama/12887)
...
* SYCL: Add ROPE vision kernel
* Add comment about rope mode
2025-04-24 20:39:16 +03:00
ee0013865d
ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (llama/12829)
...
* Add AVX512 implementation of GEMM - q4kx8
* Update changes to remove unnecessary whitespaces
2025-04-24 20:39:16 +03:00
32a407166b
CANN: Opt ROPE optimization (llama/12865)
...
* [CANN]Opt ROPE optimization
* [CANN]Codestyle adjustment
* [CANN]Fix the ROPE precision issue
* [CANN]codestyle fix
* [CANN]add rope unsupport case
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
2025-04-24 20:39:16 +03:00
622f981853
CANN: Optimize CANN buffer pool memory management (llama/12875)
...
Multiple optional memory pools are provided for CANN, including VMM,
priority queue-based, and traditional memory pools.
1. When the memory pool is available and GGML_CANN_DISABLE_VMM_POOL
is not defined, the VMM pool is selected by default.
2. Otherwise, if GGML_CANN_ENABLE_BUF_PRIO_POOL is defined,
the priority queue-based memory pool is used.
3. If neither condition is met, the default memory pool is used.
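The three rules above amount to a simple priority order, sketched here with the build options modeled as plain booleans (an illustration of the selection logic described in the commit, not the actual CANN code):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { POOL_VMM, POOL_PRIO, POOL_DEFAULT } pool_kind;

/* Priority order from the commit message: VMM pool first (rule 1),
 * then the priority queue-based pool (rule 2), else the default
 * pool (rule 3). */
static pool_kind select_pool(bool vmm_available, bool disable_vmm_pool,
                             bool enable_buf_prio_pool) {
    if (vmm_available && !disable_vmm_pool) {
        return POOL_VMM;
    }
    if (enable_buf_prio_pool) {
        return POOL_PRIO;
    }
    return POOL_DEFAULT;
}
```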
2025-04-24 20:39:16 +03:00
d049d67065
SYCL: Fix im2col (llama/12910)
...
* SYCL: Fix im2col
* restore local workgroup size adjustments for large inputs
* restore format
2025-04-24 20:39:16 +03:00
877308838e
rpc : use ggml_context_ptr (llama/12938)
2025-04-24 20:39:16 +03:00
d87dfcf7c0
ggml : Depthwise 2D convolution (ggml/1152)
...
* ggml-cpu : kernels for faster depthwise 2D convolution
* fix compile: remove static after moving to ops.cpp
* add dilation for depthwise_conv_2d
* review: rename to ggml_conv_2d_dw_direct, remove redundant struct keywords, pass by ref, whitespace
* review: rename depthwise_conv_2d -> conv_2d_dw everywhere
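For reference, the depthwise semantics (each channel convolved only with its own kernel, with optional dilation, no cross-channel accumulation) can be written as a small scalar sketch. This captures the meaning of the operation, not the optimized ggml kernel, and uses valid padding and stride 1 for brevity:

```c
#include <assert.h>

/* Scalar reference for depthwise 2D convolution with dilation:
 * channel ch of the output depends only on channel ch of the input
 * and kernel. Layouts are row-major [c][h][w] and [c][kh][kw]. */
static void conv_2d_dw_ref(const float *src, int c, int h, int w,
                           const float *ker, int kh, int kw, int dil,
                           float *dst /* c x oh x ow */) {
    int oh = h - dil * (kh - 1);
    int ow = w - dil * (kw - 1);
    for (int ch = 0; ch < c; ch++)
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++) {
                float acc = 0.0f;
                for (int ky = 0; ky < kh; ky++)
                    for (int kx = 0; kx < kw; kx++)
                        acc += src[(ch * h + y + ky * dil) * w + x + kx * dil] *
                               ker[(ch * kh + ky) * kw + kx];
                dst[(ch * oh + y) * ow + x] = acc;
            }
}
```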
2025-04-24 20:39:16 +03:00
915c14ef10
ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register (llama/12773)
...
* ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register
* simplifies the codebase by removing redundant functions
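What these intrinsics buy is fusing multiply and accumulate: each 32-bit lane of VPDPBUSD adds the dot product of four unsigned 8-bit values and four signed 8-bit values directly into the running sum, so no separate add instruction is needed. A scalar sketch of one lane's non-saturating semantics (a reference model, not the vectorized kernel):

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit lane of VPDPBUSD (_mm512_dpbusd_epi32 and the AVX-VNNI
 * _mm256_dpbusd_avx_epi32): src += dot(u8 quad of a, s8 quad of b). */
static int32_t dpbusd_lane(int32_t src, const uint8_t a[4], const int8_t b[4]) {
    for (int i = 0; i < 4; i++)
        src += (int32_t)a[i] * (int32_t)b[i];
    return src;
}
```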
2025-04-24 20:39:16 +03:00
5d33d3c929
ggml: disable CUDA graphs for unsupported DUP and CONT node types (llama/12891)
...
Fixes #12798
2025-04-24 20:39:16 +03:00
751e42b21e
vulkan: use aligned loads for flash attention mask (llama/12853)
...
Rewrite the stride logic for the mask tensor in the FA shader to force the
stride to be aligned, to allow using more efficient loads.
2025-04-24 20:39:16 +03:00
e8ee32d12d
sycl: Support sycl_ext_oneapi_limited_graph (llama/12873)
...
The current usage of the SYCL-Graph extension checks for
the `sycl_ext_oneapi_graph` device aspect. However, it is also
possible to support `sycl_ext_oneapi_limited_graph` devices that
don't support update.
2025-04-24 20:39:16 +03:00
e9ce285135
SYCL: Add fp16 type support to unary op kernels (llama/12788)
...
* SYCL: Add fp16 support to some elementwise OP kernels
* remove comment
ggml-ci
* Use static_cast directly
* remove not needed cast from tanh
* Use static cast and remove unneeded castings
* Adjust device_support_op for unary OPs
* Use cast_data and typed_data struct to deduplicate casting code
2025-04-24 20:39:16 +03:00
b942f451b6
ggml: fix compilation error s390x (llama/12848)
...
* ggml: fixes #12846 compilation error
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com>
* ggml: add documentation for code change
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com>
* ggml: refactor to type-cast and update documentation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com>
* ggml: update documentation to provide full issue link
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com>
---------
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com>
2025-04-24 20:39:16 +03:00
e6410faf99
cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal error when running test-backend-ops with only the CPU backend (ggml/1190)
2025-04-24 20:39:16 +03:00
182df69384
CANN: Support more ops (llama/12841)
...
* [CANN]Support Opt LOG && MEAN && PAD_REFLECT_1D
* [CANN]Support COUNT_EQUAL && STEP && SGN
* [CANN]codestyle adjustment
* [CANN]codestyle adjustment
---------
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
2025-04-24 20:39:16 +03:00
3bf9691dfd
Fixes #12823 (llama/12830)
...
* Including limits file on AIX
* Fixes #12823
2025-04-24 20:39:16 +03:00
ba444e9c23
ggml-cpu-impl.h: do not redefine bool on POWER9 (llama/12856)
...
error: unknown type name '_Bool'
2025-04-24 20:39:16 +03:00
c6caf8eef2
ggml-impl.h: fix build on POWER9 (llama/12855)
...
error: ISO C++17 does not allow 'register' storage class specifier
2025-04-24 20:39:16 +03:00
6cae79a1d7
CANN: Support Opt CONV_TRANSPOSE_1D and ELU (llama/12786)
...
* [CANN] Support ELU and CONV_TRANSPOSE_1D
* [CANN]Modification review comments
* [CANN]Modification review comments
* [CANN]name adjustment
* [CANN]remove lambda used in template
* [CANN]Use std::func instead of template
* [CANN]Modify the code according to the review comments
---------
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
2025-04-24 20:39:16 +03:00
b9bfe0c693
vulkan: In coopmat2 mmq, load q4_k/q5_k scales through shared memory (llama/12833)
...
q4_k and q5_k had a lot of redundant global loads where the same 16B of
scale information is repeatedly loaded and decoded during each loop iteration.
This change restructures the loops to more explicitly iterate over whole
blocks in the outer loop (with unrolled inner loop) and to copy/decode the
scale data into shared memory once at the start of each outer loop. The copy
is pipelined so the scale load from global memory is relatively cheap.
This improves q4_k/q5_k model prompt processing performance by around 5-7%.
I briefly tried applying this to q6_k and q4_0, and it didn't help for q6_k
and hurt for q4_0.
The big "else" path in mul_mm_cm2.comp that had all the clamped/unclamped
variants isn't used as often as it originally was (e.g. due to the padded_N
change), so I trimmed it down to offset some of the new complexity of the
semi-manual loop unrolling.
2025-04-24 20:39:16 +03:00
1d50c6ac22
vulkan: Use fp16 for the flash attention P*V multiplication (llama/12783)
...
This is consistent with the ggml-cuda behavior and the mul_mat fallback.
2025-04-24 20:39:16 +03:00