whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-02 15:30:42 +00:00


shalinib-ibm 42938398f9 ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148)

This patch upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the BF16 data type.

This change results in 9x to 40x gains in total speed S t/s (i.e., all tokens / total time) across the batch sizes tested with the llama-batched-bench benchmark.
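The total-speed metric quoted above is just all tokens divided by total time, and a gain is the ratio of two such speeds. A minimal sketch of that arithmetic follows; the helper names `total_speed_tps` and `speedup` are hypothetical and are not part of llama-batched-bench or ggml.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from llama-batched-bench):
// total speed S in tokens per second = all tokens / total time.
double total_speed_tps(std::int64_t total_tokens, double total_seconds) {
    assert(total_seconds > 0.0);  // guard against division by zero
    return static_cast<double>(total_tokens) / total_seconds;
}

// Hypothetical helper: gain of one run over another, expressed as a ratio
// of their total speeds (a result of 9.0 would correspond to "9x").
double speedup(double tps_after, double tps_before) {
    assert(tps_before > 0.0);
    return tps_after / tps_before;
}
```

Under this reading, a "9x" gain means the patched build reports nine times the S t/s of the baseline build for the same batch configuration.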

The patch was tested with the Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine.

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>

2025-05-07 15:39:32 +03:00

cmake

ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0)

2025-03-27 11:06:03 +02:00

include

CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (llama/13137)

2025-05-01 13:29:02 +03:00

src

ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148)

2025-05-07 15:39:32 +03:00

.gitignore

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

CMakeLists.txt

whisper: remove MSVC warnings pragmas (#3090)

2025-05-05 13:09:35 +02:00