whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-05-21 09:47:52 +00:00

amritahs-ibm fc6d343e76 llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)

This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le ISA using MMA
builtins. This patch handles matrix multiplication
between quantised datatypes, block_q4_0 and
block_q8_0.
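
For orientation, the operation being accelerated can be sketched in plain scalar C. This is not the MMA kernel from the patch, only a reference-style illustration of a dot product between Q4_0 and Q8_0 blocks. The type and function names (`sketch_q4_0`, `sketch_q8_0`, `sketch_dot_q4_0_q8_0`) are hypothetical, and the scale is held as a `float` here for simplicity, whereas ggml stores it as fp16. The layout assumed is 32 elements per block, with Q4_0 packing two 4-bit values per byte (low nibble is element j, high nibble is element j + 16) and an offset of 8.

```c
#include <stdint.h>

#define QK 32  /* elements per block (assumed to match ggml's QK4_0 / QK8_0) */

/* Simplified, hypothetical block layouts; ggml keeps the scale d as fp16. */
typedef struct { float d; uint8_t qs[QK / 2]; } sketch_q4_0; /* 4-bit weights */
typedef struct { float d; int8_t  qs[QK];     } sketch_q8_0; /* 8-bit activations */

/* Scalar dot product over nblocks pairs of blocks. */
float sketch_dot_q4_0_q8_0(const sketch_q4_0 *x, const sketch_q8_0 *y, int nblocks) {
    float sum = 0.0f;
    for (int b = 0; b < nblocks; ++b) {
        int32_t acc = 0;
        for (int j = 0; j < QK / 2; ++j) {
            /* Unpack two 4-bit values and recentre them from [0,15] to [-8,7]. */
            int lo = (x[b].qs[j] & 0x0F) - 8;
            int hi = (x[b].qs[j] >> 4) - 8;
            acc += lo * y[b].qs[j] + hi * y[b].qs[j + QK / 2];
        }
        /* Integer accumulation per block, scaled once by both block scales. */
        sum += x[b].d * y[b].d * (float)acc;
    }
    return sum;
}
```

A matrix multiplication repeats this per output element; a vectorised kernel such as the one described would compute many of these integer accumulations at once rather than one at a time.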

This change results in a 5% - 50% improvement
in total speed (i.e. all tokens / total time) across
various batch sizes.

The patch is tested with Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>

2025-03-27 11:06:03 +02:00

cmake

cmake: Comment out GGML_BIN_DIR for now (ggml/1139)

2025-03-27 11:06:03 +02:00

include

llama: Add support for RWKV v7 architecture (llama/12412)

2025-03-27 11:06:03 +02:00

src

llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)

2025-03-27 11:06:03 +02:00

.gitignore

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

CMakeLists.txt

SYCL: using graphs is configurable by environment variable and compile option (llama/12371)

2025-03-27 11:06:03 +02:00