whisper.cpp/ggml/src
Jeff Bolz 21b01a21b6 vulkan: Optimize contiguous copies (llama/10254)
* tests: Fix memory bandwidth calculation for perf tests

Add a flops calculation for flash attention.

Add one GGML_OP_CPY perf test.
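
For context, a minimal C++ sketch of how such perf metrics are commonly derived. The helper names and the flash-attention FLOPs convention are assumptions for illustration, not the actual test code:

```cpp
#include <cstdint>
#include <cstdio>

// A copy touches every byte twice: one read of src, one write of dst,
// so the effective traffic is 2x the tensor size.
double copy_bandwidth_gb_s(int64_t nbytes, double seconds) {
    return 2.0 * (double) nbytes / seconds / 1e9;
}

// One common convention for flash-attention FLOPs: two matmuls (Q*K^T and
// softmax(QK^T)*V), each costing 2 * n_q * n_kv * head_dim FLOPs per head.
double flash_attn_gflops(int64_t n_q, int64_t n_kv, int64_t head_dim,
                         int64_t n_head, double seconds) {
    const double flops = 4.0 * (double) n_q * n_kv * head_dim * n_head;
    return flops / seconds / 1e9;
}

int main() {
    printf("copy: %.1f GB/s\n",   copy_bandwidth_gb_s(1LL << 30, 0.01));
    printf("fa:   %.1f GFLOPS\n", flash_attn_gflops(512, 512, 128, 32, 0.005));
}
```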

* vulkan: Optimize contiguous copies

Add a variant of the copy shader for when the tensors are contiguous. Avoid
the complex addressing calculations and process four elements per invocation
to amortize the per-invocation overhead.

Apply similar changes to the scale shader, since scale is always contiguous.
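
A rough CPU-side sketch of the idea, assuming float data; the real implementation is a Vulkan GLSL compute shader, and these names are illustrative:

```cpp
#include <cstdint>

// General copy: full stride arithmetic per element, handling arbitrary
// (permuted/sliced) tensor layouts. Strides here are in elements.
void copy_strided(const float * src, float * dst, const int64_t ne[4],
                  const int64_t sb[4], const int64_t db[4]) {
    for (int64_t i3 = 0; i3 < ne[3]; ++i3)
    for (int64_t i2 = 0; i2 < ne[2]; ++i2)
    for (int64_t i1 = 0; i1 < ne[1]; ++i1)
    for (int64_t i0 = 0; i0 < ne[0]; ++i0) {
        dst[i0*db[0] + i1*db[1] + i2*db[2] + i3*db[3]] =
        src[i0*sb[0] + i1*sb[1] + i2*sb[2] + i3*sb[3]];
    }
}

// Contiguous fast path: one flat index, four elements per "invocation",
// mirroring how a shader thread can process a small chunk to amortize
// per-invocation overhead.
void copy_contiguous(const float * src, float * dst, int64_t n) {
    for (int64_t base = 0; base < n; base += 4) {
        for (int64_t k = 0; k < 4 && base + k < n; ++k) {
            dst[base + k] = src[base + k];
        }
    }
}

// Scale is always contiguous, so the same flat 4-per-invocation loop applies.
void scale_contiguous(const float * src, float * dst, int64_t n, float s) {
    for (int64_t base = 0; base < n; base += 4) {
        for (int64_t k = 0; k < 4 && base + k < n; ++k) {
            dst[base + k] = s * src[base + k];
        }
    }
}
```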

Add a "progress bar" for shader compiles.
2024-11-15 15:21:04 +02:00
ggml-amx ggml : add AMX backend (llama/8998) 2024-11-01 10:19:05 +02:00
ggml-cann cann: fix crash when llama-bench is running on multiple cann devices (llama/9627) 2024-10-03 12:22:17 +03:00
ggml-cuda ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) 2024-11-15 15:21:04 +02:00
ggml-sycl Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration (llama/10133) 2024-11-15 15:21:04 +02:00
kompute-shaders whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
vulkan-shaders vulkan: Optimize contiguous copies (llama/10254) 2024-11-15 15:21:04 +02:00
CMakeLists.txt ggml : optimize llamafile cpu matrix multiplication for ppc64le (llama/10156) 2024-11-15 15:21:04 +02:00
ggml-aarch64.c ggml : move CPU backend to a separate file (llama/10144) 2024-11-15 15:21:04 +02:00
ggml-aarch64.h ggml : add ggml-aarch64 (ggml/0) 2024-08-08 22:48:46 +03:00
ggml-alloc.c ggml : move more prints to the ggml log system (llama/9839) 2024-11-01 10:19:05 +02:00
ggml-amx.cpp llama : refactor model loader with backend registry (llama/10026) 2024-11-15 15:21:04 +02:00
ggml-backend-impl.h llama : refactor model loader with backend registry (llama/10026) 2024-11-15 15:21:04 +02:00
ggml-backend.cpp ggml : move CPU backend to a separate file (llama/10144) 2024-11-15 15:21:04 +02:00
ggml-blas.cpp llama : refactor model loader with backend registry (llama/10026) 2024-11-15 15:21:04 +02:00
ggml-cann.cpp CANN: adjust backend registry refactor. (llama/10158) 2024-11-15 15:21:04 +02:00
ggml-common.h ggml-quants : ternary packing for TriLMs and BitNet b1.58 (llama/8151) 2024-09-24 19:45:08 +03:00
ggml-cpu-impl.h ggml : add ggml-cpu-impl.h (skip) (#0) 2024-09-24 19:45:08 +03:00
ggml-cpu.c fix q4_0_8_8 format for corrupted tokens issue (llama/10198) 2024-11-15 15:21:04 +02:00
ggml-cuda.cu metal : optimize FA kernels (llama/10171) 2024-11-15 15:21:04 +02:00
ggml-impl.h ggml : move CPU backend to a separate file (llama/10144) 2024-11-15 15:21:04 +02:00
ggml-kompute.cpp kompute: add mul_mat_q4_k shader (llama/10097) 2024-11-15 15:21:04 +02:00
ggml-metal.m metal : fix build and some more comments (llama/10229) 2024-11-15 15:21:04 +02:00
ggml-metal.metal metal : more precise Q*K in FA vec kernel (llama/10247) 2024-11-15 15:21:04 +02:00
ggml-quants.c Q6_K AVX improvements (llama/10118) 2024-11-15 15:21:04 +02:00
ggml-quants.h ggml : add run-time detection of neon, i8mm and sve (llama/9331) 2024-10-03 12:22:17 +03:00
ggml-rpc.cpp ggml : move CPU backend to a separate file (llama/10144) 2024-11-15 15:21:04 +02:00
ggml-sycl.cpp Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration (llama/10133) 2024-11-15 15:21:04 +02:00
ggml-vulkan.cpp vulkan: Optimize contiguous copies (llama/10254) 2024-11-15 15:21:04 +02:00
ggml.c metal : optimize FA kernels (llama/10171) 2024-11-15 15:21:04 +02:00
sgemm.cpp whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
sgemm.h whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00