mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-24 14:46:39 +00:00

Abhilash Majumder a0ddd8392c

* add changes from llama upstream

* add sycl abstraction

* add sycl build

* update cmake

* add sycl build config

* fix bug

* fix bug

* refactor build

* fix bug

* update build

* call build

* use sycl header

* add examples

* add target

* fix typecast in quant.c

* readd fp16 and readme

* fix quant typecast

* add sample

* add readme

* remove cxx file check

2024-02-23 09:22:24 +02:00

6.7 KiB

Raw Blame History

whisper.cpp for SYCL

Background

SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators—such as CPUs, GPUs, and FPGAs. It is a single-source embedded domain-specific language based on pure C++17.

oneAPI is a specification that is open and standards-based, supporting multiple architecture types including but not limited to GPU, CPU, and FPGA. The spec has both direct programming and API-based programming paradigms.

Intel uses the SYCL as direct programming language to support CPU, GPUs and FPGAs.

To avoid re-inventing the wheel, this code refers other code paths in llama.cpp (like OpenBLAS, cuBLAS, CLBlast). We use a open-source tool SYCLomatic (Commercial release Intel® DPC++ Compatibility Tool) migrate to SYCL.

The whisper.cpp for SYCL is used to support Intel GPUs.

For Intel CPU, recommend to use whisper.cpp for X86 (Intel MKL build).

OS

OS	Status	Verified
Linux	Support	Ubuntu 22.04
Windows	Ongoing

Intel GPU

Intel GPU	Status	Verified Model
Intel Data Center Max Series	Support	Max 1550
Intel Data Center Flex Series	Support	Flex 170
Intel Arc Series	Support	Arc 770
Intel built-in Arc GPU	Support	built-in Arc GPU in Meteor Lake
Intel iGPU	Support	iGPU in i5-1250P, i7-1165G7

Linux

Setup Environment

Install Intel GPU driver.

a. Please install Intel GPU driver by official guide: Install GPU Drivers.

Note: for iGPU, please install the client GPU driver.

b. Add user to group: video, render.

sudo usermod -aG render username
sudo usermod -aG video username

Note: re-login to enable it.

c. Check

sudo apt install clinfo
sudo clinfo -l

Output (example):

Platform #0: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Arc(TM) A770 Graphics


Platform #0: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Iris(R) Xe Graphics [0x9a49]

Install Intel® oneAPI Base toolkit.

a. Please follow the procedure in Get the Intel® oneAPI Base Toolkit .

Recommend to install to default folder: /opt/intel/oneapi.

Following guide use the default folder as example. If you use other folder, please modify the following guide info with your folder.

b. Check

source /opt/intel/oneapi/setvars.sh

sycl-ls

There should be one or more level-zero devices. Like [ext_oneapi_level_zero:gpu:0].

Output (example):

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO  [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]

Build locally:

mkdir -p build
cd build
source /opt/intel/oneapi/setvars.sh

#for FP16
#cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DWHISPER_SYCL_F16=ON 

#for FP32
cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

#build example/main only
#cmake --build . --config Release --target main

#build all binary
cmake --build . --config Release -v

./examples/sycl/build.sh

Note:

By default, it will build for all binary files. It will take more time. To reduce the time, we recommend to build for example/main only.

Run

Put model file to folder models
Enable oneAPI running environment

source /opt/intel/oneapi/setvars.sh

List device ID

Run without parameter:

./build/bin/ls-sycl-device

or

./build/bin/main

Check the ID in startup log, like:

found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A770 Graphics,	compute capability 1.3,
    max compute_units 512,	max work group size 1024,	max sub group size 32,	global mem size 16225243136
  Device 1: Intel(R) FPGA Emulation Device,	compute capability 1.2,
    max compute_units 24,	max work group size 67108864,	max sub group size 64,	global mem size 67065057280
  Device 2: 13th Gen Intel(R) Core(TM) i7-13700K,	compute capability 3.0,
    max compute_units 24,	max work group size 8192,	max sub group size 64,	global mem size 67065057280
  Device 3: Intel(R) Arc(TM) A770 Graphics,	compute capability 3.0,
    max compute_units 512,	max work group size 1024,	max sub group size 32,	global mem size 16225243136

Attribute	Note
compute capability 1.3	Level-zero running time, recommended
compute capability 3.0	OpenCL running time, slower than level-zero in most cases

Set device ID and execute whisper.cpp

Set device ID = 0 by GGML_SYCL_DEVICE=0

GGML_SYCL_DEVICE=0 ./build/bin/main -m models/ggml-base.en.bin -f samples/jfk.wav

or run by script:

./examples/sycl/run_whisper.sh

Check the device ID in output

Like:

Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device

Environment Variable

Build

Name	Value	Function
WHISPER_SYCL	ON (mandatory)	Enable build with SYCL code path. For FP32/FP16, WHISPER_SYCL=ON is mandatory.
WHISPER_SYCL_F16	ON (optional)	Enable FP16 build with SYCL code path.For FP32, do not set it.
CMAKE_C_COMPILER	icx	Use icx compiler for SYCL code path
CMAKE_CXX_COMPILER	icpx	use icpx for SYCL code path

Running

Name	Value	Function
GGML_SYCL_DEVICE	0 (default) or 1	Set the device id used. Check the device ids by default running output
GGML_SYCL_DEBUG	0 (default) or 1	Enable log function by macro: GGML_SYCL_DEBUG

Known Issue

Error: error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory.

Miss to enable oneAPI running environment.

Install oneAPI base toolkit and enable it by: source /opt/intel/oneapi/setvars.sh.
Hang during startup

llama.cpp use mmap as default way to read model file and copy to GPU. In some system, memcpy will be abnormal and block.

Solution: add --no-mmap.

Todo

Support to build in Windows.
Support multiple cards.

6.7 KiB Raw Blame History

whisper.cpp for SYCL

Background

OS

Intel GPU

Linux

Setup Environment

Run

Environment Variable

Build

Running

Known Issue

Todo

6.7 KiB

Raw Blame History