diff --git a/README.md b/README.md
index de620fd0..e4f9007d 100644
--- a/README.md
+++ b/README.md
@@ -16,12 +16,10 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
 - VSX intrinsics support for POWER architectures
 - Mixed F16 / F32 precision
 - [4-bit and 5-bit integer quantization support](https://github.com/ggerganov/whisper.cpp#quantization)
-- Low memory usage (Flash Attention)
 - Zero memory allocations at runtime
 - Support for CPU-only inference
-- [Partial GPU support for NVIDIA via cuBLAS](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
+- [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
 - [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
-- [BLAS CPU support via OpenBLAS](https://github.com/ggerganov/whisper.cpp#blas-cpu-support-via-openblas)
 - [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
 - [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
 
@@ -400,12 +398,12 @@ This can result in significant speedup in encoder performance. Here are the inst
 
 The first time run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific
 blob will get cached for the next run.
- 
+
 For more information about the Core ML implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
 
-## NVIDIA GPU support via cuBLAS
+## NVIDIA GPU support
 
-With NVIDIA cards the Encoder processing can to a large extent be offloaded to the GPU through cuBLAS.
+With NVIDIA cards the processing of the models is done efficiently on the GPU via cuBLAS and custom CUDA kernels.
 First, make sure you have installed `cuda`: https://developer.nvidia.com/cuda-downloads
 
 Now build `whisper.cpp` with cuBLAS support:
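
The second hunk is cut off right before the build commands that follow "Now build `whisper.cpp` with cuBLAS support:" in the README. A minimal sketch of that step, assuming the `WHISPER_CUBLAS` Makefile option used by whisper.cpp at the time of this change:

```bash
# Sketch: rebuild with the cuBLAS backend enabled
# (assumes the WHISPER_CUBLAS Makefile option and an installed CUDA toolkit)
make clean
WHISPER_CUBLAS=1 make -j
```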