whisper.cpp

Mirror of https://github.com/ggerganov/whisper.cpp.git, synced 2025-02-16 07:10:22 +00:00

Johannes Gäßler f8a831779e CUDA: use mma PTX instructions for FlashAttention (llama/11583)

* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <slarengh@gmail.com>

2025-02-03 22:00:57 +02:00

ggml-alloc.h

ggml : fix typo in example usage ggml_gallocr_new (ggml/984)

2024-10-05 15:23:51 +03:00

ggml-backend.h

rpc : early register backend devices (llama/11262)

2025-02-03 22:00:57 +02:00

ggml-blas.h

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml-cann.h

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml-cpp.h

GGUF: C++ refactor, backend support, misc fixes (llama/11030)

2025-01-14 10:38:01 +02:00

ggml-cpu.h

ggml : refactor online repacking (llama/10446)

2024-12-18 12:52:16 +02:00

ggml-cuda.h

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml-kompute.h

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml-metal.h

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml-opencl.h

Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (llama/10693)

2024-12-18 12:52:16 +02:00

ggml-opt.h

ggml: new optimization interface (ggml/988)

2024-11-20 21:00:08 +02:00

ggml-rpc.h

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml-sycl.h

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml-vulkan.h

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml.h

CUDA: use mma PTX instructions for FlashAttention (llama/11583)

2025-02-03 22:00:57 +02:00

gguf.h

GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030)

2025-01-14 10:38:01 +02:00