whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-05-31 14:30:42 +00:00

amritahs-ibm b7b38f7d68 ggml : optimize llamafile cpu matrix multiplication for ppc64le (llama/10156)

This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le, using MMA
builtins for the FP32 data type.

This change results in a consistent 90%
improvement in input processing time, and 20%
to 80% improvement in output processing time,
across various batch sizes.

The patch was tested with the Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>

2024-11-15 15:21:04 +02:00

cmake

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

include

metal : optimize FA kernels (llama/10171)

2024-11-15 15:21:04 +02:00

src

ggml : optimize llamafile cpu matrix multiplication for ppc64le (llama/10156)

2024-11-15 15:21:04 +02:00

.gitignore

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

CMakeLists.txt

metal : opt-in compile flag for BF16 (llama/10218)

2024-11-15 15:21:04 +02:00

ggml_vk_generate_shaders.py

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00