whisper.cpp/ggml/src
Jeff Bolz 21b01a21b6 vulkan: Optimize contiguous copies (llama/10254)
* tests: Fix memory bandwidth calculation for perf tests

Add a flops calculation for flash attention.

Add one GGML_OP_CPY perf test.
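
For context, a minimal C++ sketch of how such perf metrics are commonly derived. The helper names and the flash-attention FLOPs convention are assumptions for illustration, not the actual test code:

```cpp
#include <cstdint>
#include <cstdio>

// A copy touches every byte twice: one read of src, one write of dst,
// so the effective traffic is 2x the tensor size.
double copy_bandwidth_gb_s(int64_t nbytes, double seconds) {
    return 2.0 * (double) nbytes / seconds / 1e9;
}

// One common convention for flash-attention FLOPs: two matmuls (Q*K^T and
// softmax(QK^T)*V), each costing 2 * n_q * n_kv * head_dim FLOPs per head.
double flash_attn_gflops(int64_t n_q, int64_t n_kv, int64_t head_dim,
                         int64_t n_head, double seconds) {
    const double flops = 4.0 * (double) n_q * n_kv * head_dim * n_head;
    return flops / seconds / 1e9;
}

int main() {
    printf("copy: %.1f GB/s\n",   copy_bandwidth_gb_s(1LL << 30, 0.01));
    printf("fa:   %.1f GFLOPS\n", flash_attn_gflops(512, 512, 128, 32, 0.005));
}
```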

* vulkan: Optimize contiguous copies

Add a variant of the copy shader for when the tensors are contiguous. Avoid
the complex addressing calculations and process four elements per invocation
to amortize the per-invocation overhead.

Apply similar changes to the scale shader, since scale is always contiguous.
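
A rough CPU-side sketch of the idea, assuming float data; the real implementation is a Vulkan GLSL compute shader, and these names are illustrative:

```cpp
#include <cstdint>

// General copy: full stride arithmetic per element, handling arbitrary
// (permuted/sliced) tensor layouts. Strides here are in elements.
void copy_strided(const float * src, float * dst, const int64_t ne[4],
                  const int64_t sb[4], const int64_t db[4]) {
    for (int64_t i3 = 0; i3 < ne[3]; ++i3)
    for (int64_t i2 = 0; i2 < ne[2]; ++i2)
    for (int64_t i1 = 0; i1 < ne[1]; ++i1)
    for (int64_t i0 = 0; i0 < ne[0]; ++i0) {
        dst[i0*db[0] + i1*db[1] + i2*db[2] + i3*db[3]] =
        src[i0*sb[0] + i1*sb[1] + i2*sb[2] + i3*sb[3]];
    }
}

// Contiguous fast path: one flat index, four elements per "invocation",
// mirroring how a shader thread can process a small chunk to amortize
// per-invocation overhead.
void copy_contiguous(const float * src, float * dst, int64_t n) {
    for (int64_t base = 0; base < n; base += 4) {
        for (int64_t k = 0; k < 4 && base + k < n; ++k) {
            dst[base + k] = src[base + k];
        }
    }
}

// Scale is always contiguous, so the same flat 4-per-invocation loop applies.
void scale_contiguous(const float * src, float * dst, int64_t n, float s) {
    for (int64_t base = 0; base < n; base += 4) {
        for (int64_t k = 0; k < 4 && base + k < n; ++k) {
            dst[base + k] = s * src[base + k];
        }
    }
}
```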

Add a "progress bar" for shader compiles.
2024-11-15 15:21:04 +02:00
ggml-amx ggml : add AMX backend (llama/8998) 2024-11-01 10:19:05 +02:00
ggml-cann cann: fix crash when llama-bench is running on multiple cann devices (llama/9627) 2024-10-03 12:22:17 +03:00
ggml-cuda ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) 2024-11-15 15:21:04 +02:00
ggml-sycl Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration (llama/10133) 2024-11-15 15:21:04 +02:00
kompute-shaders whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
vulkan-shaders vulkan: Optimize contiguous copies (llama/10254) 2024-11-15 15:21:04 +02:00
CMakeLists.txt ggml : optimize llamafile cpu matrix multiplication for ppc64le (llama/10156) 2024-11-15 15:21:04 +02:00
ggml-aarch64.c ggml : move CPU backend to a separate file (llama/10144) 2024-11-15 15:21:04 +02:00
ggml-aarch64.h ggml : add ggml-aarch64 (ggml/0) 2024-08-08 22:48:46 +03:00
ggml-alloc.c ggml : move more prints to the ggml log system (llama/9839) 2024-11-01 10:19:05 +02:00
ggml-amx.cpp llama : refactor model loader with backend registry (llama/10026) 2024-11-15 15:21:04 +02:00
ggml-backend-impl.h llama : refactor model loader with backend registry (llama/10026) 2024-11-15 15:21:04 +02:00
ggml-backend.cpp ggml : move CPU backend to a separate file (llama/10144) 2024-11-15 15:21:04 +02:00
ggml-blas.cpp llama : refactor model loader with backend registry (llama/10026) 2024-11-15 15:21:04 +02:00
ggml-cann.cpp CANN: adjust backend registry refactor. (llama/10158) 2024-11-15 15:21:04 +02:00
ggml-common.h ggml-quants : ternary packing for TriLMs and BitNet b1.58 (llama/8151) 2024-09-24 19:45:08 +03:00
ggml-cpu-impl.h ggml : add ggml-cpu-impl.h (skip) (#0) 2024-09-24 19:45:08 +03:00
ggml-cpu.c fix q4_0_8_8 format for corrupted tokens issue (llama/10198) 2024-11-15 15:21:04 +02:00
ggml-cuda.cu metal : optimize FA kernels (llama/10171) 2024-11-15 15:21:04 +02:00
ggml-impl.h ggml : move CPU backend to a separate file (llama/10144) 2024-11-15 15:21:04 +02:00
ggml-kompute.cpp kompute: add mul_mat_q4_k shader (llama/10097) 2024-11-15 15:21:04 +02:00
ggml-metal.m metal : fix build and some more comments (llama/10229) 2024-11-15 15:21:04 +02:00
ggml-metal.metal metal : more precise Q*K in FA vec kernel (llama/10247) 2024-11-15 15:21:04 +02:00
ggml-quants.c Q6_K AVX improvements (llama/10118) 2024-11-15 15:21:04 +02:00
ggml-quants.h ggml : add run-time detection of neon, i8mm and sve (llama/9331) 2024-10-03 12:22:17 +03:00
ggml-rpc.cpp ggml : move CPU backend to a separate file (llama/10144) 2024-11-15 15:21:04 +02:00
ggml-sycl.cpp Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration (llama/10133) 2024-11-15 15:21:04 +02:00
ggml-vulkan.cpp vulkan: Optimize contiguous copies (llama/10254) 2024-11-15 15:21:04 +02:00
ggml.c metal : optimize FA kernels (llama/10171) 2024-11-15 15:21:04 +02:00
sgemm.cpp whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
sgemm.h whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00