whisper.cpp/ggml/src
Jeff Bolz bbb292ed38 vulkan: further optimize mul_mat_vec using larger loads (llama/10387)
* vulkan: Use pipeline_robustness to disable robustness in mul_mat_vec.

Add some early returns for nonexistent rows in mul_mat_vec shaders. These
can only be hit when dispatching a 2D grid of workgroups. Fix the logic
for the 2D grid of workgroups to round up.
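A minimal C sketch of why the 2D split has to round up. It assumes one workgroup per row and a hypothetical per-dimension limit `MAX_GROUPS_X`; the real limit comes from the device. `div_round_up`, `grid_2d` and `simulate_dispatch` are illustrative names, not ggml functions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-dimension workgroup limit (assumption; on a real device
 * it comes from the physical-device limits). */
#define MAX_GROUPS_X 65535u

/* Ceiling division: number of groups of size d needed to cover n items. */
static uint32_t div_round_up(uint32_t n, uint32_t d) {
    return (n + d - 1) / d;
}

/* Split `rows` workgroups (one row each) into a 2D grid. Rounding groups_y up
 * guarantees every row is covered; the last grid row may then contain
 * workgroups that have no row to process. */
static void grid_2d(uint32_t rows, uint32_t *groups_x, uint32_t *groups_y) {
    assert(rows > 0);
    if (rows <= MAX_GROUPS_X) {
        *groups_x = rows;
        *groups_y = 1;
    } else {
        *groups_x = MAX_GROUPS_X;
        *groups_y = div_round_up(rows, MAX_GROUPS_X);
    }
}

/* Host-side model of the dispatch: a workgroup whose flattened index lands
 * past the last row returns early. Returns the number of rows processed. */
static uint32_t simulate_dispatch(uint32_t rows) {
    uint32_t gx, gy, processed = 0;
    grid_2d(rows, &gx, &gy);
    for (uint32_t y = 0; y < gy; y++) {
        for (uint32_t x = 0; x < gx; x++) {
            uint32_t row = y * gx + x;
            if (row >= rows) {
                continue; /* early return for a nonexistent row */
            }
            processed++;
        }
    }
    return processed;
}
```

With 200000 rows, truncating division would give 3 grid rows and cover only 196605 rows. Rounding up gives 4 grid rows, and the early return discards the 62140 surplus workgroups.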

Enable the pipeline robustness extension if it's available, and use it to
disable robustness for these pipelines. The instructions that do the bounds
checking contend for the same ALU resources as the bit-twiddling dequant
instructions.

* vulkan: Add GLSL structure aliases for quant types to allow larger loads

In Vulkan it's not possible to cast pointer types, so instead you have to
declare an aliased binding for the memory with a different type. This
commit adds aliases for the quant formats using 16b ints, and in a few
places where the struct size is a multiple of 4 also using 32b ints.
Currently only q4_k's aliases are used, but others will be used in
subsequent commits.
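In C the same reinterpretation is usually written with `memcpy` instead of a second binding. The sketch assumes a Q4_K-style layout (two fp16 values stored as raw bits, 12 packed scale bytes, 128 quant bytes, 144 bytes total) and a little-endian host, as on GPUs. The struct and helper names are hypothetical stand-ins for the GLSL alias bindings:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of a Q4_K-style block (assumed layout). */
typedef struct {
    uint16_t d;          /* fp16 super-block scale, raw bits */
    uint16_t dmin;       /* fp16 super-block min, raw bits */
    uint8_t  scales[12]; /* packed 6-bit sub-block scales and mins */
    uint8_t  qs[128];    /* 256 4-bit quants, two per byte */
} block_q4_k_sketch;

_Static_assert(sizeof(block_q4_k_sketch) == 144, "unexpected padding");

/* "Aliased" views of the same bytes as 16-bit and 32-bit words.
 * The 32-bit view is only possible because 144 is a multiple of 4. */
typedef struct { uint16_t w[sizeof(block_q4_k_sketch) / 2]; } block_q4_k_u16;
typedef struct { uint32_t w[sizeof(block_q4_k_sketch) / 4]; } block_q4_k_u32;

_Static_assert(sizeof(block_q4_k_u16) == sizeof(block_q4_k_sketch), "alias size mismatch");
_Static_assert(sizeof(block_q4_k_u32) == sizeof(block_q4_k_sketch), "alias size mismatch");

static block_q4_k_u16 view_u16(const block_q4_k_sketch *b) {
    block_q4_k_u16 a;
    memcpy(&a, b, sizeof a); /* portable C stand-in for an aliased binding */
    return a;
}

static block_q4_k_u32 view_u32(const block_q4_k_sketch *b) {
    block_q4_k_u32 a;
    memcpy(&a, b, sizeof a);
    return a;
}

/* One 16-bit load fetches quant bytes 2*j and 2*j+1. */
static uint16_t qs_u16(const block_q4_k_u16 *a, int j) {
    return a->w[offsetof(block_q4_k_sketch, qs) / 2 + j];
}

/* One 32-bit load fetches quant bytes 4*j .. 4*j+3. */
static uint32_t qs_u32(const block_q4_k_u32 *a, int j) {
    return a->w[offsetof(block_q4_k_sketch, qs) / 4 + j];
}
```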

* vulkan: use larger loads in q5_k and q6_k shaders.

Similar to the optimization I did in q4_k recently, this vectorizes some loads
and reduces the number of bit twiddling instructions.
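The general trick can be sketched in C with plain 4-bit nibbles. q5_k and q6_k additionally merge in high bits from separate arrays, which this sketch omits. Loading four bytes as one `uint32_t` lets a single mask, or a single shift plus mask, handle all four bytes at once. It assumes a little-endian host, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte at a time: a separate load, mask and shift for every byte. */
static void unpack_nibbles_bytewise(const uint8_t *src, uint8_t lo[4], uint8_t hi[4]) {
    for (int i = 0; i < 4; i++) {
        lo[i] = src[i] & 0x0F;
        hi[i] = src[i] >> 4;
    }
}

/* One 32-bit load, then one mask for the low nibbles and one shift+mask for
 * the high nibbles of all four bytes (SWAR). Little-endian host assumed, so
 * byte i of the word is src[i]. */
static void unpack_nibbles_u32(const uint8_t *src, uint8_t lo[4], uint8_t hi[4]) {
    uint32_t w;
    memcpy(&w, src, 4);
    uint32_t l = w & 0x0F0F0F0Fu;
    uint32_t h = (w >> 4) & 0x0F0F0F0Fu;
    memcpy(lo, &l, 4);
    memcpy(hi, &h, 4);
}
```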

* vulkan: use larger K step per iteration in mul_mat_vec.

Add vec4 dequantization functions, and use them to do K=8 per iteration in
mul_mat_vec. This uses 16b loads for the quant values and 128b loads for B,
which helps reduce the load on the memory system.
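A C sketch of the K=8 step, assuming a hypothetical q4_0-style encoding (value = d * (q - 8), four nibbles per 16-bit word) rather than any specific ggml format. Each iteration does two 16-bit quant loads and two 128-bit `vec4` loads from B. `vec4`, `dequantize4` and `dot_row_k8` are illustrative names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z, w; } vec4;

/* Decode four 4-bit quants from one 16-bit word (hypothetical encoding). */
static vec4 dequantize4(uint16_t packed, float d) {
    vec4 v;
    v.x = d * (float)((int)(packed & 0xF) - 8);
    v.y = d * (float)((int)((packed >> 4) & 0xF) - 8);
    v.z = d * (float)((int)((packed >> 8) & 0xF) - 8);
    v.w = d * (float)((int)((packed >> 12) & 0xF) - 8);
    return v;
}

static float dot4(vec4 a, vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Dot product of one quantized row with B, 8 elements per iteration. */
static float dot_row_k8(const uint16_t *q, float d, const float *b, int k) {
    assert(k % 8 == 0);
    float sum = 0.0f;
    for (int i = 0; i < k; i += 8) {
        vec4 b0, b1;
        memcpy(&b0, b + i, sizeof b0);     /* 128-bit load of B */
        memcpy(&b1, b + i + 4, sizeof b1); /* 128-bit load of B */
        sum += dot4(dequantize4(q[i / 4], d), b0);     /* 16-bit quant load */
        sum += dot4(dequantize4(q[i / 4 + 1], d), b1); /* 16-bit quant load */
    }
    return sum;
}
```

For example, with `q = {0x1234, 0x89AB, 0xFFFF, 0x0000}`, `d = 0.5f` and `b[j] = j + 1` for 16 elements, the result is -98.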

The K_PER_ITER==2 logic is still there, but only for F16/F32, mainly
because those types support unaligned sizes.
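A small C sketch of why the narrow path remains: quantized rows always come in whole blocks (32 or 256 values in ggml), so K is a multiple of 8, while a float row can have any length and needs a tail guard. The type enum and function names are hypothetical:

```c
#include <assert.h>

/* Hypothetical subset of tensor types. */
typedef enum { TYPE_F32, TYPE_F16, TYPE_Q4_0, TYPE_Q4_K } sketch_type;

/* Block-quantized rows are always a multiple of 8 long; float rows are not. */
static int k_per_iter(sketch_type t) {
    return (t == TYPE_F32 || t == TYPE_F16) ? 2 : 8;
}

/* F32 dot product with K_PER_ITER == 2. K may be odd, so the final
 * iteration may have only one valid element and needs a guard. */
static float dot_f32_k2(const float *a, const float *b, int k) {
    float sum = 0.0f;
    for (int i = 0; i < k; i += 2) {
        sum += a[i] * b[i];
        if (i + 1 < k) { /* tail guard for unaligned K */
            sum += a[i + 1] * b[i + 1];
        }
    }
    return sum;
}
```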

Tweak the num_iters/unrolling logic to be simpler and catch a couple of missed
unrolling opportunities.
2024-12-08 20:14:35 +02:00
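For the num_iters/unrolling tweak, here is a C sketch of a generic trip-count-plus-unroll pattern. It is not the shader's actual logic. `BLOCK_SIZE` invocations share one row, each covers `K_PER_ITER` consecutive elements per iteration, and the loop is unrolled by 2 with one leftover iteration. All constants and names here are assumptions:

```c
#include <assert.h>

#define BLOCK_SIZE 32 /* hypothetical invocations per row */
#define K_PER_ITER 8  /* hypothetical elements per invocation per iteration */

/* Work done by one invocation in one iteration. */
static float iter_body(const float *a, int base) {
    float s = 0.0f;
    for (int i = 0; i < K_PER_ITER; i++) s += a[base + i];
    return s;
}

/* Partial sum for invocation `tid`: iteration `it` covers elements
 * [(it * BLOCK_SIZE + tid) * K_PER_ITER, +K_PER_ITER). */
static float partial_sum(const float *a, int k, int tid) {
    assert(k % (BLOCK_SIZE * K_PER_ITER) == 0);
    const int num_iters = k / (BLOCK_SIZE * K_PER_ITER);
    float sum = 0.0f;
    int it = 0;
    /* Unrolled by 2: two independent iterations per trip. */
    for (; it + 1 < num_iters; it += 2) {
        sum += iter_body(a, (it * BLOCK_SIZE + tid) * K_PER_ITER);
        sum += iter_body(a, ((it + 1) * BLOCK_SIZE + tid) * K_PER_ITER);
    }
    /* Leftover iteration when num_iters is odd. */
    if (it < num_iters) {
        sum += iter_body(a, (it * BLOCK_SIZE + tid) * K_PER_ITER);
    }
    return sum;
}

/* Reduce over all invocations, as a workgroup reduction would. */
static float row_sum(const float *a, int k) {
    float total = 0.0f;
    for (int tid = 0; tid < BLOCK_SIZE; tid++) total += partial_sum(a, k, tid);
    return total;
}
```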
ggml-amx ggml : adapt AMX to tensor->grad removal (llama/0) 2024-11-20 21:00:08 +02:00
ggml-blas cuda : fix CUDA_FLAGS not being applied (llama/10403) 2024-11-20 21:00:08 +02:00
ggml-cann ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-cpu add cmake rvv support (llama/10411) 2024-12-08 20:14:35 +02:00
ggml-cuda CUDA: remove unnecessary warp reduce in FA (ggml/1032) 2024-12-08 20:14:35 +02:00
ggml-hip CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318) 2024-11-20 21:00:08 +02:00
ggml-kompute ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-metal feat: add GGML_UNARY_OP_ARGMAX Metal kernel (ggml/1019) 2024-12-08 20:14:35 +02:00
ggml-musa ggml : sync resolve (skip) (#0) 2024-11-20 21:00:08 +02:00
ggml-rpc ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-sycl sycl : Add option to set the SYCL architecture for all targets (llama/10266) 2024-11-20 21:00:08 +02:00
ggml-vulkan vulkan: further optimize mul_mat_vec using larger loads (llama/10387) 2024-12-08 20:14:35 +02:00
CMakeLists.txt Add required ggml-base and backend libs to cmake pkg (llama/10407) 2024-11-20 21:00:08 +02:00
ggml-aarch64.c ggml : optimize Q4_0 into Q4_0_X_Y repack (llama/10324) 2024-11-20 21:00:08 +02:00
ggml-aarch64.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-alloc.c ggml: new optimization interface (ggml/988) 2024-11-20 21:00:08 +02:00
ggml-backend-impl.h llama : refactor model loader with backend registry (llama/10026) 2024-11-15 15:21:04 +02:00
ggml-backend-reg.cpp ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-backend.cpp ggml-opt: fix data corruption (ggml/1022) 2024-12-08 20:14:35 +02:00
ggml-common.h ggml-quants : ternary packing for TriLMs and BitNet b1.58 (llama/8151) 2024-09-24 19:45:08 +03:00
ggml-impl.h Do not include arm_neon.h when compiling CUDA code (ggml/1028) 2024-12-08 20:14:35 +02:00
ggml-opt.cpp ggml-opt: fix data corruption (ggml/1022) 2024-12-08 20:14:35 +02:00
ggml-quants.c ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-quants.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-threading.cpp ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-threading.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml.c ggml-opt: fix data corruption (ggml/1022) 2024-12-08 20:14:35 +02:00