Mirror of https://github.com/ggerganov/whisper.cpp.git (synced 2025-04-24 04:56:03 +00:00)
Compare commits
No commits in common. "master" and "v1.7.5" have entirely different histories.
54 changed lines in .github/workflows/bindings-ruby.yml (vendored):

@@ -1,11 +1,55 @@
 name: Bindings Tests (Ruby)

 on:
   push:
-    branches:
-      - master
+    paths:
+      - bindings/ruby/**
+      - src/**/*.c
+      - src/**/*.cpp
+      - src/**/*.h
+      - src/**/*.m
+      - src/**/*.metal
+      - include/**/*.c
+      - include/**/*.cpp
+      - include/**/*.h
+      - include/**/*.m
+      - include/**/*.metal
+      - ggml/**/*.c
+      - ggml/**/*.cpp
+      - ggml/**/*.h
+      - ggml/**/*.m
+      - ggml/**/*.metal
+      - scripts/get-flags.mk
+      - examples/common.h
+      - examples/common.cpp
+      - examples/common-whisper.h
+      - examples/common-whisper.cpp
+      - examples/stb_vorbis.c
+      - examples/miniaudio.h
   pull_request:
-    types: [opened, synchronize, reopened]
+    paths:
+      - bindings/ruby/**
+      - src/**/*.c
+      - src/**/*.cpp
+      - src/**/*.h
+      - src/**/*.m
+      - src/**/*.metal
+      - include/**/*.c
+      - include/**/*.cpp
+      - include/**/*.h
+      - include/**/*.m
+      - include/**/*.metal
+      - ggml/**/*.c
+      - ggml/**/*.cpp
+      - ggml/**/*.h
+      - ggml/**/*.m
+      - ggml/**/*.metal
+      - scripts/get-flags.mk
+      - examples/common.h
+      - examples/common.cpp
+      - examples/common-whisper.h
+      - examples/common-whisper.cpp
+      - examples/stb_vorbis.c
+      - examples/miniaudio.h

 jobs:
   ubuntu-22:
@@ -16,6 +60,6 @@ jobs:
     steps:
       - uses: ruby/setup-ruby@v1
         with:
-          ruby-version: '3.2'
+          ruby-version: '3.1'
      - uses: actions/checkout@v4
      - run: rake test
34 changed lines in .github/workflows/build.yml (vendored):

@@ -200,23 +200,23 @@ jobs:
       cmake --build build --config Release -j $(sysctl -n hw.logicalcpu)


-  # freeBSD-latest:
-  #   runs-on: macos-13
-  #
-  #   steps:
-  #     - name: Clone
-  #       uses: actions/checkout@v4
-  #
-  #     - name: Build
-  #       uses: cross-platform-actions/action@v0.27.0
-  #       with:
-  #         operating_system: freebsd
-  #         version: '14.2'
-  #         run: |
-  #           sudo pkg update
-  #           sudo pkg install -y gmake sdl2 cmake git
-  #           cmake -B build
-  #           cmake --build build --config Release
+  freeBSD-latest:
+    runs-on: macos-13
+
+    steps:
+      - name: Clone
+        uses: actions/checkout@v4
+
+      - name: Build
+        uses: cross-platform-actions/action@v0.27.0
+        with:
+          operating_system: freebsd
+          version: '14.2'
+          run: |
+            sudo pkg update
+            sudo pkg install -y gmake sdl2 cmake git
+            cmake -B build
+            cmake --build build --config Release

   ubuntu-22-gcc:
     if: ${{ github.event_name == 'push' || github.event_name == 'pull_request' ||
129 changed lines in README.md:

@@ -2,12 +2,15 @@

 [whisper.cpp banner image]

-[](https://github.com/ggml-org/whisper.cpp/actions)
+[](https://github.com/ggerganov/whisper.cpp/actions)
 [](https://opensource.org/licenses/MIT)
 [](https://conan.io/center/whisper-cpp)
 [](https://www.npmjs.com/package/whisper.cpp/)

-Stable: [v1.7.5](https://github.com/ggml-org/whisper.cpp/releases/tag/v1.7.5) / [Roadmap](https://github.com/orgs/ggml-org/projects/4/)
+> [!NOTE]
+> New maintenance roadmap: https://github.com/ggerganov/whisper.cpp/discussions/2788
+
+Stable: [v1.7.5](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.7.5) / [Roadmap | F.A.Q.](https://github.com/ggerganov/whisper.cpp/discussions/126)

 High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper) automatic speech recognition (ASR) model:

@@ -23,7 +26,7 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
 - [Efficient GPU support for NVIDIA](#nvidia-gpu-support)
 - [OpenVINO Support](#openvino-support)
 - [Ascend NPU Support](#ascend-npu-support)
-- [C-style API](https://github.com/ggml-org/whisper.cpp/blob/master/include/whisper.h)
+- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/include/whisper.h)

 Supported platforms:

@@ -31,14 +34,14 @@ Supported platforms:
 - [x] [iOS](examples/whisper.objc)
 - [x] [Android](examples/whisper.android)
 - [x] [Java](bindings/java/README.md)
-- [x] Linux / [FreeBSD](https://github.com/ggml-org/whisper.cpp/issues/56#issuecomment-1350920264)
+- [x] Linux / [FreeBSD](https://github.com/ggerganov/whisper.cpp/issues/56#issuecomment-1350920264)
 - [x] [WebAssembly](examples/whisper.wasm)
-- [x] Windows ([MSVC](https://github.com/ggml-org/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggml-org/whisper.cpp/issues/168))
+- [x] Windows ([MSVC](https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggerganov/whisper.cpp/issues/168))
-- [x] [Raspberry Pi](https://github.com/ggml-org/whisper.cpp/discussions/166)
+- [x] [Raspberry Pi](https://github.com/ggerganov/whisper.cpp/discussions/166)
-- [x] [Docker](https://github.com/ggml-org/whisper.cpp/pkgs/container/whisper.cpp)
+- [x] [Docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)

 The entire high-level implementation of the model is contained in [whisper.h](include/whisper.h) and [whisper.cpp](src/whisper.cpp).
-The rest of the code is part of the [`ggml`](https://github.com/ggml-org/ggml) machine learning library.
+The rest of the code is part of the [`ggml`](https://github.com/ggerganov/ggml) machine learning library.

 Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications.
 As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: [whisper.objc](examples/whisper.objc)

@@ -51,14 +54,14 @@ https://user-images.githubusercontent.com/1991296/204038393-2f846eae-c255-4099-a

 On Apple Silicon, the inference runs fully on the GPU via Metal:

-https://github.com/ggml-org/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225
+https://github.com/ggerganov/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225

 ## Quick start

 First clone the repository:

 ```bash
-git clone https://github.com/ggml-org/whisper.cpp.git
+git clone https://github.com/ggerganov/whisper.cpp.git
 ```

 Navigate into the directory:

@@ -149,7 +152,6 @@ standard cmake setup with:
 cmake -B build -DGGML_BLAS=1
 cmake --build build --config Release
 ./build/bin/whisper-cli [ .. etc .. ]
-```

 ## Quantization

@@ -223,7 +225,7 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
 The first run on a device is slow, since the ANE service compiles the Core ML model to some device-specific format.
 Next runs are faster.

-For more information about the Core ML implementation please refer to PR [#566](https://github.com/ggml-org/whisper.cpp/pull/566).
+For more information about the Core ML implementation please refer to PR [#566](https://github.com/ggerganov/whisper.cpp/pull/566).

 ## OpenVINO support

@@ -308,7 +310,7 @@ This can result in significant speedup in encoder performance. Here are the inst
 The first time run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob will get
 cached for the next run.

-For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggml-org/whisper.cpp/pull/1037).
+For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).

 ## NVIDIA GPU support

@@ -322,12 +324,6 @@ cmake -B build -DGGML_CUDA=1
 cmake --build build -j --config Release
 ```

-or for newer NVIDIA GPU's (RTX 5000 series):
-```
-cmake -B build -DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES="86"
-cmake --build build -j --config Release
-```
-
 ## Vulkan GPU support
 Cross-vendor solution which allows you to accelerate workload on your GPU.
 First, make sure your graphics card driver provides support for Vulkan API.

@@ -381,37 +377,6 @@ Run the inference examples as usual, for example:
 - If you have trouble with Ascend NPU device, please create a issue with **[CANN]** prefix/tag.
 - If you run successfully with your Ascend NPU device, please help update the table `Verified devices`.

-## FFmpeg support (Linux only)
-
-If you want to support more audio formats (such as Opus and AAC), you can turn on the `WHISPER_FFMPEG` build flag to enable FFmpeg integration.
-
-First, you need to install required libraries:
-
-```bash
-# Debian/Ubuntu
-sudo apt install libavcodec-dev libavformat-dev libavutil-dev
-
-# RHEL/Fedora
-sudo dnf install libavcodec-free-devel libavformat-free-devel libavutil-free-devel
-```
-
-Then you can build the project as follows:
-
-```bash
-cmake -B build -D WHISPER_FFMPEG=yes
-cmake --build build
-```
-
-Run the following example to confirm it's working:
-
-```bash
-# Convert an audio file to Opus format
-ffmpeg -i samples/jfk.wav jfk.opus
-
-# Transcribe the audio file
-./build/bin/whisper-cli --model models/ggml-base.en.bin --file jfk.opus
-```
-
 ## Docker

 ### Prerequisites

@@ -423,8 +388,8 @@ ffmpeg -i samples/jfk.wav jfk.opus

 We have two Docker images available for this project:

-1. `ghcr.io/ggml-org/whisper.cpp:main`: This image includes the main executable file as well as `curl` and `ffmpeg`. (platforms: `linux/amd64`, `linux/arm64`)
+1. `ghcr.io/ggerganov/whisper.cpp:main`: This image includes the main executable file as well as `curl` and `ffmpeg`. (platforms: `linux/amd64`, `linux/arm64`)
-2. `ghcr.io/ggml-org/whisper.cpp:main-cuda`: Same as `main` but compiled with CUDA support. (platforms: `linux/amd64`)
+2. `ghcr.io/ggerganov/whisper.cpp:main-cuda`: Same as `main` but compiled with CUDA support. (platforms: `linux/amd64`)

 ### Usage

@@ -462,7 +427,7 @@ For detailed instructions on how to use Conan, please refer to the [Conan docume

 This is a naive example of performing real-time inference on audio from your microphone.
 The [stream](examples/stream) tool samples the audio every half a second and runs the transcription continuously.
-More info is available in [issue #10](https://github.com/ggml-org/whisper.cpp/issues/10).
+More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
 You will need to have [sdl2](https://wiki.libsdl.org/SDL2/Installation) installed for it to work properly.

 ```bash

@@ -551,7 +516,7 @@ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 pr

 ## Speaker segmentation via tinydiarize (experimental)

-More information about this approach is available here: https://github.com/ggml-org/whisper.cpp/pull/1058
+More information about this approach is available here: https://github.com/ggerganov/whisper.cpp/pull/1058

 Sample usage:

@@ -615,7 +580,7 @@ https://user-images.githubusercontent.com/1991296/199337538-b7b0c7a3-2753-4a88-a

 ## Video comparison of different models

-Use the [scripts/bench-wts.sh](https://github.com/ggml-org/whisper.cpp/blob/master/scripts/bench-wts.sh) script to generate a video in the following format:
+Use the [scripts/bench-wts.sh](https://github.com/ggerganov/whisper.cpp/blob/master/scripts/bench-wts.sh) script to generate a video in the following format:

 ```bash
 ./scripts/bench-wts.sh samples/jfk.wav

@@ -632,7 +597,7 @@ In order to have an objective comparison of the performance of the inference acr
 use the [whisper-bench](examples/bench) tool. The tool simply runs the Encoder part of the model and prints how much time it
 took to execute it. The results are summarized in the following Github issue:

-[Benchmark results](https://github.com/ggml-org/whisper.cpp/issues/89)
+[Benchmark results](https://github.com/ggerganov/whisper.cpp/issues/89)

 Additionally a script to run whisper.cpp with different models and audio files is provided [bench.py](scripts/bench.py).

@@ -659,24 +624,25 @@ You can download the converted models using the [models/download-ggml-model.sh](
 or manually from here:

 - https://huggingface.co/ggerganov/whisper.cpp
+- https://ggml.ggerganov.com

 For more details, see the conversion script [models/convert-pt-to-ggml.py](models/convert-pt-to-ggml.py) or [models/README.md](models/README.md).

-## [Bindings](https://github.com/ggml-org/whisper.cpp/discussions/categories/bindings)
+## [Bindings](https://github.com/ggerganov/whisper.cpp/discussions/categories/bindings)

-- [x] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggml-org/whisper.cpp/discussions/310)
+- [x] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggerganov/whisper.cpp/discussions/310)
-- [x] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggml-org/whisper.cpp/discussions/309)
+- [x] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggerganov/whisper.cpp/discussions/309)
   - React Native (iOS / Android): [whisper.rn](https://github.com/mybigday/whisper.rn)
-- [x] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggml-org/whisper.cpp/discussions/312)
+- [x] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggerganov/whisper.cpp/discussions/312)
 - [x] Java:
   - [GiviMAD/whisper-jni](https://github.com/GiviMAD/whisper-jni)
-- [x] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggml-org/whisper.cpp/discussions/507)
+- [x] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggerganov/whisper.cpp/discussions/507)
-- [x] Objective-C / Swift: [ggml-org/whisper.spm](https://github.com/ggml-org/whisper.spm) | [#313](https://github.com/ggml-org/whisper.cpp/discussions/313)
+- [x] Objective-C / Swift: [ggerganov/whisper.spm](https://github.com/ggerganov/whisper.spm) | [#313](https://github.com/ggerganov/whisper.cpp/discussions/313)
   - [exPHAT/SwiftWhisper](https://github.com/exPHAT/SwiftWhisper)
-- [x] .NET: | [#422](https://github.com/ggml-org/whisper.cpp/discussions/422)
+- [x] .NET: | [#422](https://github.com/ggerganov/whisper.cpp/discussions/422)
   - [sandrohanea/whisper.net](https://github.com/sandrohanea/whisper.net)
   - [NickDarvey/whisper](https://github.com/NickDarvey/whisper)
-- [x] Python: | [#9](https://github.com/ggml-org/whisper.cpp/issues/9)
+- [x] Python: | [#9](https://github.com/ggerganov/whisper.cpp/issues/9)
   - [stlukey/whispercpp.py](https://github.com/stlukey/whispercpp.py) (Cython)
   - [AIWintermuteAI/whispercpp](https://github.com/AIWintermuteAI/whispercpp) (Updated fork of aarnphm/whispercpp)
   - [aarnphm/whispercpp](https://github.com/aarnphm/whispercpp) (Pybind11)

@@ -684,33 +650,6 @@ For more details, see the conversion script [models/convert-pt-to-ggml.py](model
 - [x] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
 - [x] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)

-## XCFramework
-
-The XCFramework is a precompiled version of the library for iOS, visionOS, tvOS,
-and macOS. It can be used in Swift projects without the need to compile the
-library from source. For example:
-
-```swift
-// swift-tools-version: 5.10
-// The swift-tools-version declares the minimum version of Swift required to build this package.
-
-import PackageDescription
-
-let package = Package(
-    name: "Whisper",
-    targets: [
-        .executableTarget(
-            name: "Whisper",
-            dependencies: [
-                "WhisperFramework"
-            ]),
-        .binaryTarget(
-            name: "WhisperFramework",
-            url: "https://github.com/ggml-org/whisper.cpp/releases/download/v1.7.5/whisper-v1.7.5-xcframework.zip",
-            checksum: "c7faeb328620d6012e130f3d705c51a6ea6c995605f2df50f6e1ad68c59c6c4a"
-        )
-    ]
-)
-```
-
 ## Examples

 There are various examples of using the library for different projects in the [examples](examples) folder.

@@ -729,13 +668,13 @@ Some of the examples are even ported to run in the browser using WebAssembly. Ch
 | [whisper.android](examples/whisper.android) | | Android mobile application using whisper.cpp |
 | [whisper.nvim](examples/whisper.nvim) | | Speech-to-text plugin for Neovim |
 | [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |
-| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggml-org/whisper.cpp/issues/185) |
+| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggerganov/whisper.cpp/issues/185) |
 | [yt-wsp.sh](examples/yt-wsp.sh) | | Download + transcribe and/or translate any VOD [(original)](https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818) |
 | [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |

-## [Discussions](https://github.com/ggml-org/whisper.cpp/discussions)
+## [Discussions](https://github.com/ggerganov/whisper.cpp/discussions)

 If you have any kind of feedback about this project feel free to use the Discussions section and open a new topic.
-You can use the [Show and tell](https://github.com/ggml-org/whisper.cpp/discussions/categories/show-and-tell) category
+You can use the [Show and tell](https://github.com/ggerganov/whisper.cpp/discussions/categories/show-and-tell) category
 to share your own projects that use `whisper.cpp`. If you have a question, make sure to check the
-[Frequently asked questions (#126)](https://github.com/ggml-org/whisper.cpp/discussions/126) discussion.
+[Frequently asked questions (#126)](https://github.com/ggerganov/whisper.cpp/discussions/126) discussion.
Changes in the Go and Java bindings (README files and the Go package comment):

@@ -51,7 +51,7 @@ func main() {
 In order to build, you need to have the Go compiler installed. You can get it from [here](https://golang.org/dl/). Run the tests with:

 ```bash
-git clone https://github.com/ggml-org/whisper.cpp.git
+git clone https://github.com/ggerganov/whisper.cpp.git
 cd whisper.cpp/bindings/go
 make test
 ```

@@ -98,7 +98,7 @@ The API Documentation:

 Getting help:

-* Follow the discussion for the go bindings [here](https://github.com/ggml-org/whisper.cpp/discussions/312)
+* Follow the discussion for the go bindings [here](https://github.com/ggerganov/whisper.cpp/discussions/312)

 ## License

@@ -1,5 +1,5 @@
 /*
-github.com/ggml-org/whisper.cpp/bindings/go
+github.com/ggerganov/whisper.cpp/bindings/go
 provides a speech-to-text service bindings for the Go programming language.
 */
 package whisper

@@ -52,7 +52,7 @@ public class Example {
 In order to build, you need to have the JDK 8 or higher installed. Run the tests with:

 ```bash
-git clone https://github.com/ggml-org/whisper.cpp.git
+git clone https://github.com/ggerganov/whisper.cpp.git
 cd whisper.cpp/bindings/java

 ./gradlew build
3 changed lines in bindings/ruby/.gitignore (vendored):

@@ -1,6 +1,3 @@
 LICENSE
 pkg/
 lib/whisper.*
-ext/sources/*
-!ext/sources/CMakeGraphVizOptions.cmake
-ext/mkmf.log
@@ -16,18 +16,6 @@ If bundler is not being used to manage dependencies, install the gem by executin

     $ gem install whispercpp

-You can pass build options for whisper.cpp, for instance:
-
-    $ bundle config build.whispercpp --enable-ggml-cuda
-
-or,
-
-    $ gem install whispercpp -- --enable-ggml-cuda
-
-See whisper.cpp's [README](https://github.com/ggml-org/whisper.cpp/blob/master/README.md) for available options. You need convert options present the README to Ruby-style options.
-For boolean options like `GGML_CUDA`, the README says `-DGGML_CUDA=1`. You need strip `-D`, prepend `--enable-` for `1` or `ON` (`--disable-` for `0` or `OFF`) and make it kebab-case: `--enable-ggml-cuda`.
-For options which require arguments like `CMAKE_CUDA_ARCHITECTURES`, the README says `-DCMAKE_CUDA_ARCHITECTURES="86"`. You need strip `-D`, prepend `--`, make it kebab-case, append `=` and append argument: `--cmake-cuda-architectures="86"`.
-
 Usage
 -----

@@ -240,7 +228,7 @@ The second argument `samples` may be an array, an object with `length` and `each

 Development
 -----------

-    % git clone https://github.com/ggml-org/whisper.cpp.git
+    % git clone https://github.com/ggerganov/whisper.cpp.git
     % cd whisper.cpp/bindings/ruby
     % rake test

@@ -253,5 +241,5 @@ License

 The same to [whisper.cpp][].

-[whisper.cpp]: https://github.com/ggml-org/whisper.cpp
+[whisper.cpp]: https://github.com/ggerganov/whisper.cpp
-[models]: https://github.com/ggml-org/whisper.cpp/tree/master/models
+[models]: https://github.com/ggerganov/whisper.cpp/tree/master/models
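The CMake-flag to gem-option conversion rules described in the Ruby README above are mechanical, so they can be captured in a short helper. A sketch under those stated rules — the `to_gem_option` name is hypothetical and not part of the gem:

```ruby
# Hypothetical helper illustrating the CMake-flag -> gem-option conversion
# rule from the Ruby bindings README (strip -D, kebab-case the name,
# map boolean values to --enable-/--disable-, keep arguments after `=`).
def to_gem_option(cmake_flag)
  name, value = cmake_flag.sub(/\A-D/, "").split("=", 2)
  kebab = name.downcase.tr("_", "-")
  case value
  when "1", "ON"  then "--enable-#{kebab}"    # boolean on
  when "0", "OFF" then "--disable-#{kebab}"   # boolean off
  else                 "--#{kebab}=#{value}"  # option with an argument
  end
end

to_gem_option('-DGGML_CUDA=1')                   # => "--enable-ggml-cuda"
to_gem_option('-DCMAKE_CUDA_ARCHITECTURES="86"') # => '--cmake-cuda-architectures="86"'
```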
Changes in the Ruby bindings Rakefile:

@@ -3,15 +3,11 @@ require "bundler/gem_tasks"
 require "rake/testtask"
 require_relative "extsources"

-SOURCES_DIR = "ext/sources"
-
 SOURCES = FileList[]

 EXTSOURCES.each do |src|
   basename = src.pathmap("%f")
-  dest = basename == "LICENSE" ? basename
-                               : src.pathmap("%{\\.\\./\\.\\.,#{SOURCES_DIR}}p")
-                                    .pathmap("%{\\.\\./javascript,#{SOURCES_DIR}/bindings/javascript}p")
+  dest = basename == "LICENSE" ? basename : src.pathmap("%{../..,ext}p")
   dir = dest.pathmap("%d")
   file src
   directory dir
@@ -22,6 +18,7 @@ EXTSOURCES.each do |src|
 end

 CLEAN.include SOURCES
+CLEAN.include FileList["ext/**/*.o", "ext/**/*.metal", "ext/**/*.tmp", "ext/whisper.{so,bundle,dll}"]

 SRC = FileList["ext/*.{c,cpp,h}"]

@@ -39,20 +36,6 @@ file "ext/Makefile" => SRC + ["ext/extconf.rb"] + SOURCES do |t|
     ruby "extconf.rb"
   end
 end
-if File.exist? "ext/Makefile"
-  task :make_clean do
-    cd "ext" do
-      sh "make", "clean"
-    end
-  end
-  task clean: :make_clean
-  task :make_distclean do
-    cd "ext" do
-      sh "make", "distclean"
-    end
-  end
-  task clobber: :make_distclean
-end

 file SO_FILE => "ext/Makefile" do |t|
   chdir "ext" do
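Both versions of the Rakefile above lean on Rake's `String#pathmap` to map source paths into the gem's `ext/` tree. A small sketch of the specifiers involved (the sample path is illustrative; `%{pattern,replacement}p` treats the pattern as a regex, which is why the Rakefile escapes the dots):

```ruby
require "rake" # Rake extends String with #pathmap

src = "../../src/whisper.cpp"

src.pathmap("%f")                    # basename              => "whisper.cpp"
src.pathmap("%d")                    # directory part        => "../../src"
src.pathmap("%{\\.\\./\\.\\.,ext}p") # rewrite "../.." prefix => "ext/src/whisper.cpp"
```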
11 added lines in bindings/ruby/ext/cpu.mk (new file):

@@ -0,0 +1,11 @@
+ggml/src/ggml-cpu/ggml-cpu-cpp.o: \
+	ggml/src/ggml-cpu/ggml-cpu.cpp \
+	ggml/src/ggml-cpu/unary-ops.cpp \
+	ggml/src/ggml-cpu/binary-ops.cpp \
+	ggml/include/ggml-backend.h \
+	ggml/include/ggml.h \
+	ggml/include/ggml-alloc.h \
+	ggml/src/ggml-backend-impl.h \
+	ggml/include/ggml-cpu.h \
+	ggml/src/ggml-impl.h
+	$(CXX) $(CXXFLAGS) -c $< -o $@
@@ -1,61 +0,0 @@
-require "tsort"
-
-class Dependencies
-  def initialize(cmake, options)
-    @cmake = cmake
-    @options = options
-
-    generate_dot
-    @libs = parse_dot
-  end
-
-  def to_s
-    @libs.join(" ")
-  end
-
-  private
-
-  def dot_path
-    File.join(__dir__, "build", "whisper.cpp.dot")
-  end
-
-  def generate_dot
-    system @cmake, "-S", "sources", "-B", "build", "--graphviz", dot_path, "-D", "BUILD_SHARED_LIBS=OFF", @options.to_s, exception: true
-  end
-
-  def parse_dot
-    static_lib_shape = nil
-    nodes = {}
-    depends = Hash.new {|h, k| h[k] = []}
-
-    class << depends
-      include TSort
-      alias tsort_each_node each_key
-      def tsort_each_child(node, &block)
-        fetch(node, []).each(&block)
-      end
-    end
-
-    File.open(dot_path).each_line do |line|
-      case line
-      when /\[\s*label\s*=\s*"Static Library"\s*,\s*shape\s*=\s*(?<shape>\w+)\s*\]/
-        static_lib_shape = $~[:shape]
-      when /\A\s*"(?<node>\w+)"\s*\[\s*label\s*=\s*"(?<label>\S+)"\s*,\s*shape\s*=\s*(?<shape>\w+)\s*\]\s*;\s*\z/
-        node = $~[:node]
-        label = $~[:label]
-        shape = $~[:shape]
-        nodes[node] = [label, shape]
-      when /\A\s*"(?<depender>\w+)"\s*->\s*"(?<dependee>\w+)"/
-        depender = $~[:depender]
-        dependee = $~[:dependee]
-        depends[depender] ||= []
-        depends[depender] << dependee
-      end
-    end
-    depends.tsort.filter_map {|node|
-      label, shape = nodes[node]
-      shape == static_lib_shape ? label : nil
-    }.collect {|lib| "lib#{lib}.a"}
-      .reverse
-  end
-end
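The deleted `Dependencies#parse_dot` above topologically sorts the static libraries found in CMake's Graphviz output with Ruby's `TSort`, then reverses the result so dependers are linked before their dependees. A self-contained sketch of that exact pattern, with illustrative library names standing in for the parsed `.dot` edges:

```ruby
require "tsort"

# depender => [dependees], as parse_dot builds it from the ".dot" edges
depends = Hash.new { |h, k| h[k] = [] }
depends["whisper"] << "ggml"
depends["ggml"] << "ggml-base"

# Same singleton-class trick as in the deleted file: make this one Hash TSort-able.
class << depends
  include TSort
  alias tsort_each_node each_key
  def tsort_each_child(node, &block)
    fetch(node, []).each(&block)  # fetch avoids triggering the default proc
  end
end

order = depends.tsort  # dependees first: ["ggml-base", "ggml", "whisper"]
# Reversing gives the dependers-first order a static link line wants:
link_line = order.collect { |lib| "lib#{lib}.a" }.reverse.join(" ")
```

`link_line` here comes out as `"libwhisper.a libggml.a libggml-base.a"`, mirroring what `to_s` fed into `$LOCAL_LIBS`.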
@@ -1,22 +1,210 @@
-require "mkmf"
-require_relative "options"
-require_relative "dependencies"
-
-cmake = find_executable("cmake") || abort
-options = Options.new
-have_library("gomp") rescue nil
-libs = Dependencies.new(cmake, options)
-
-$INCFLAGS << " -Isources/include -Isources/ggml/include -Isources/examples"
-$LOCAL_LIBS << " #{libs}"
-$cleanfiles << " build #{libs}"
-
-create_makefile "whisper" do |conf|
-  conf << <<~EOF
-    $(TARGET_SO): #{libs}
-    #{libs}: cmake-targets
-    cmake-targets:
-    #{"\t"}#{cmake} -S sources -B build -D BUILD_SHARED_LIBS=OFF -D CMAKE_ARCHIVE_OUTPUT_DIRECTORY=#{__dir__} -D CMAKE_POSITION_INDEPENDENT_CODE=ON #{options}
-    #{"\t"}#{cmake} --build build --config Release --target common whisper
-  EOF
-end
+require 'mkmf'
+
+# need to use c++ compiler flags
+$CXXFLAGS << ' -std=c++17'
+
+$LDFLAGS << ' -lstdc++'
+
+# Set to true when building binary gems
+if enable_config('static-stdlib', false)
+  $LDFLAGS << ' -static-libgcc -static-libstdc++'
+end
+
+if enable_config('march-tune-native', false)
+  $CFLAGS << ' -march=native -mtune=native'
+  $CXXFLAGS << ' -march=native -mtune=native'
+end
+
+if ENV['WHISPER_METAL']
+  $GGML_METAL ||= true
+  $DEPRECATE_WARNING ||= true
+end
+
+$UNAME_S = `uname -s`.chomp
+$UNAME_P = `uname -p`.chomp
+$UNAME_M = `uname -m`.chomp
+
+if $UNAME_S == 'Darwin'
+  unless ENV['GGML_NO_METAL']
+    $GGML_METAL ||= true
+  end
+  $GGML_NO_OPENMP ||= true
+end
+
+if $GGML_METAL
+  $GGML_METAL_EMBED_LIBRARY = true
+end
+
+$MK_CPPFLAGS = '-Iggml/include -Iggml/src -Iggml/src/ggml-cpu -Iinclude -Isrc -Iexamples -DGGML_USE_CPU'
+$MK_CFLAGS = '-std=c11 -fPIC'
+$MK_CXXFLAGS = '-std=c++17 -fPIC'
+$MK_NVCCFLAGS = '-std=c++17'
+$MK_LDFLAGS = ''
+
+$OBJ_GGML = []
+$OBJ_WHISPER = []
+$OBJ_COMMON = []
+$OBJ_SDL = []
+
+$MK_CPPFLAGS << ' -D_XOPEN_SOURCE=600'
+
+if $UNAME_S == 'Linux'
+  $MK_CPPFLAGS << ' -D_GNU_SOURCE'
+end
+
+if $UNAME_S == 'Darwin'
+  $MK_CPPFLAGS << ' -D_DARWIN_C_SOURCE'
+end
+
+if ENV['WHISPER_DEBUG']
+  $MK_CFLAGS << ' -O0 -g'
+  $MK_CXXFLAGS << ' -O0 -g'
+  $MK_LDFLAGS << ' -g'
+  $MK_NVCCFLAGS << ' -O0 -g'
+else
+  $MK_CPPFLAGS << ' -DNDEBUG'
+  $MK_CFLAGS << ' -O3'
+  $MK_CXXFLAGS << ' -O3'
+  $MK_NVCCFLAGS << ' -O3'
+end
+
+$WARN_FLAGS =
+  ' -Wall' <<
+  ' -Wextra' <<
+  ' -Wpedantic' <<
+  ' -Wcast-qual' <<
+  ' -Wno-unused-function'
+
+$MK_CFLAGS <<
+  $WARN_FLAGS <<
+  ' -Wshadow' <<
+  ' -Wstrict-prototypes' <<
+  ' -Wpointer-arith' <<
+  ' -Wmissing-prototypes' <<
+  ' -Werror=implicit-int' <<
+  ' -Werror=implicit-function-declaration'
+
+$MK_CXXFLAGS <<
+  $WARN_FLAGS <<
+  ' -Wmissing-declarations' <<
+  ' -Wmissing-noreturn'
+
+unless `#{cc_command} #{$LDFLAGS} -Wl,-v 2>&1`.chomp.include? 'dyld-1015.7'
+  $MK_CPPFLAGS << ' -DHAVE_BUGGY_APPLE_LINKER'
+end
+
+if %w[Linux Darwin FreeBSD NetBSD OpenBSD Haiku].include? $UNAME_S
+  $MK_CFLAGS << ' -pthread'
+  $MK_CXXFLAGS << ' -pthread'
+end
+
+unless $_WIN32
+  $DSO_EXT = '.so'
+else
+  $DSO_EXT = '.dll'
+end
+
+unless ENV['RISCV']
+  if %w[x86_64 i686 amd64].include? $UNAME_M
+    $HOST_CXXFLAGS ||= ''
+
+    $MK_CFLAGS << ' -march=native -mtune=native'
+    $HOST_CXXFLAGS << ' -march=native -mtune=native'
+  end
+else
+  $MK_CFLAGS << ' -march=rv64gcv -mabi=lp64d'
+  $MK_CXXFLAGS << ' -march=rv64gcv -mabi=lp64d'
+end
+
+unless ENV['GGML_NO_ACCELERATE']
+  if $UNAME_S == 'Darwin'
+    $MK_CPPFLAGS << ' -DGGML_USE_ACCELERATE -DGGML_USE_BLAS -DGGML_BLAS_USE_ACCELERATE'
+    $MK_CPPFLAGS << ' -DACCELERATE_NEW_LAPACK'
+    $MK_CPPFLAGS << ' -DACCELERATE_LAPACK_ILP64'
+    $MK_LDFLAGS << ' -framework Accelerate'
+    $OBJ_GGML << 'ggml/src/ggml-blas/ggml-blas.o'
+  end
+end
+
+if ENV['GGML_OPENBLAS']
+  $MK_CPPFLAGS << " -DGGML_USE_BLAS #{`pkg-config --cflags-only-I openblas`.chomp}"
+  $MK_CFLAGS << " #{`pkg-config --cflags-only-other openblas)`.chomp}"
+  $MK_LDFLAGS << " #{`pkg-config --libs openblas`}"
+  $OBJ_GGML << 'ggml/src/ggml-blas/ggml-blas.o'
+end
+
+if ENV['GGML_OPENBLAS64']
+  $MK_CPPFLAGS << " -DGGML_USE_BLAS #{`pkg-config --cflags-only-I openblas64`.chomp}"
+  $MK_CFLAGS << " #{`pkg-config --cflags-only-other openblas64)`.chomp}"
+  $MK_LDFLAGS << " #{`pkg-config --libs openblas64`}"
+  $OBJ_GGML << 'ggml/src/ggml-blas/ggml-blas.o'
+end
+
+if $GGML_METAL
+  $MK_CPPFLAGS << ' -DGGML_USE_METAL'
+  $MK_LDFLAGS << ' -framework Foundation -framework Metal -framework MetalKit'
+  $OBJ_GGML << 'ggml/src/ggml-metal/ggml-metal.o'
+
+  if ENV['GGML_METAL_NDEBUG']
+    $MK_CPPFLAGS << ' -DGGML_METAL_NDEBUG'
+  end
+
+  if $GGML_METAL_EMBED_LIBRARY
+    $MK_CPPFLAGS << ' -DGGML_METAL_EMBED_LIBRARY'
+    $OBJ_GGML << 'ggml/src/ggml-metal/ggml-metal-embed.o'
+  end
+end
+
+$OBJ_GGML <<
+  'ggml/src/ggml.o' <<
+  'ggml/src/ggml-alloc.o' <<
+  'ggml/src/ggml-backend.o' <<
+  'ggml/src/ggml-backend-reg.o' <<
+  'ggml/src/ggml-opt.o' <<
+  'ggml/src/ggml-quants.o' <<
+  'ggml/src/ggml-threading.o' <<
+  'ggml/src/ggml-cpu/ggml-cpu.o' <<
+  'ggml/src/ggml-cpu/ggml-cpu-cpp.o' <<
+  'ggml/src/ggml-cpu/ggml-cpu-aarch64.o' <<
+  'ggml/src/ggml-cpu/ggml-cpu-hbm.o' <<
+  'ggml/src/ggml-cpu/ggml-cpu-quants.o' <<
+  'ggml/src/ggml-cpu/ggml-cpu-traits.o' <<
+  'ggml/src/ggml-cpu/unary-ops.o' <<
+  'ggml/src/ggml-cpu/binary-ops.o'
+
+$OBJ_WHISPER <<
+  'src/whisper.o' <<
+  'examples/common.o' <<
+  'examples/common-whisper.o'
+
+$objs = $OBJ_GGML + $OBJ_WHISPER + $OBJ_COMMON + $OBJ_SDL
+$objs <<
+  "ruby_whisper.o" <<
+  "ruby_whisper_context.o" <<
+  "ruby_whisper_transcribe.o" <<
+  "ruby_whisper_params.o" <<
+  "ruby_whisper_error.o" <<
+  "ruby_whisper_segment.o" <<
+  "ruby_whisper_model.o"
+
+$CPPFLAGS = "#{$MK_CPPFLAGS} #{$CPPFLAGS}"
+$CFLAGS = "#{$CPPFLAGS} #{$MK_CFLAGS} #{$GF_CFLAGS} #{$CFLAGS}"
+$BASE_CXXFLAGS = "#{$MK_CXXFLAGS} #{$CXXFLAGS}"
+$CXXFLAGS = "#{$BASE_CXXFLAGS} #{$HOST_CXXFLAGS} #{$GF_CXXFLAGS} #{$CPPFLAGS}"
+$NVCCFLAGS = "#{$MK_NVCCFLAGS} #{$NVCCFLAGS}"
+$LDFLAGS = "#{$MK_LDFLAGS} #{$LDFLAGS}"
+
+create_makefile('whisper')
+
+File.open 'Makefile', 'a' do |file|
+  file.puts 'include scripts/get-flags.mk'
+  file.puts 'include cpu.mk'
+
+  if $GGML_METAL
+    file.puts 'include metal.mk'
+
+    if $GGML_METAL_EMBED_LIBRARY
+      file.puts 'include metal-embed.mk'
+    end
+  end
+end
bindings/ruby/ext/metal-embed.mk (new file, 17 lines)
@@ -0,0 +1,17 @@
+ggml/src/ggml-metal/ggml-metal-embed.o: \
+	ggml/src/ggml-metal/ggml-metal.metal \
+	ggml/src/ggml-metal/ggml-metal-impl.h \
+	ggml/src/ggml-common.h
+	@echo "Embedding Metal library"
+	@sed -e '/__embed_ggml-common.h__/r ggml/src/ggml-common.h' -e '/__embed_ggml-common.h__/d' < ggml/src/ggml-metal/ggml-metal.metal > ggml/src/ggml-metal/ggml-metal-embed.metal.tmp
+	@sed -e '/#include "ggml-metal-impl.h"/r ggml/src/ggml-metal/ggml-metal-impl.h' -e '/#include "ggml-metal-impl.h"/d' < ggml/src/ggml-metal/ggml-metal-embed.metal.tmp > ggml/src/ggml-metal/ggml-metal-embed.metal
+	$(eval TEMP_ASSEMBLY=$(shell mktemp -d))
+	@echo ".section __DATA, __ggml_metallib" > $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+	@echo ".globl _ggml_metallib_start" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+	@echo "_ggml_metallib_start:" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+	@echo ".incbin \"ggml/src/ggml-metal/ggml-metal-embed.metal\"" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+	@echo ".globl _ggml_metallib_end" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+	@echo "_ggml_metallib_end:" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
+	$(CC) $(CFLAGS) -c $(TEMP_ASSEMBLY)/ggml-metal-embed.s -o $@
+	@rm -f ${TEMP_ASSEMBLY}/ggml-metal-embed.s
+	@rmdir ${TEMP_ASSEMBLY}
bindings/ruby/ext/metal.mk (new file, 6 lines)
@@ -0,0 +1,6 @@
+ggml/src/ggml-metal/ggml-metal.o: \
+	ggml/src/ggml-metal/ggml-metal.m \
+	ggml/src/ggml-metal/ggml-metal-impl.h \
+	ggml/include/ggml-metal.h \
+	ggml/include/ggml.h
+	$(CC) $(CFLAGS) -c $< -o $@
@@ -1,219 +0,0 @@
-class Options
-  def initialize
-    @options = {}
-    @pending_options = []
-    @ignored_options = []
-
-    configure
-  end
-
-  def help
-    @options
-      .collect_concat {|name, (type, value)|
-        option = option_name(name)
-        if type == :bool
-          ["--enable-#{option}", "--disable-#{option}"]
-        else
-          "--#{option}=#{type.upcase}"
-        end
-      }
-      .join($/)
-  end
-
-  def to_s
-    @options
-      .reject {|name, (type, value)| value.nil?}
-      .collect {|name, (type, value)| "-D #{name}=#{value == true ? "ON" : value == false ? "OFF" : value.shellescape}"}
-      .join(" ")
-  end
-
-  def cmake_options
-    return @cmake_options if @cmake_options
-
-    output = nil
-    Dir.chdir __dir__ do
-      output = `cmake -S sources -B build -L`
-    end
-    started = false
-    @cmake_options = output.lines.filter_map {|line|
-      if line.chomp == "-- Cache values"
-        started = true
-        next
-      end
-      next unless started
-      option, value = line.chomp.split("=", 2)
-      name, type = option.split(":", 2)
-      [name, type, value]
-    }
-  end
-
-  def missing_options
-    cmake_options.collect {|name, type, value| name} -
-      @options.keys - @pending_options - @ignored_options
-  end
-
-  def extra_options
-    @options.keys + @pending_options - @ignored_options -
-      cmake_options.collect {|name, type, value| name}
-  end
-
-  private
-
-  def configure
-    filepath "ACCELERATE_FRAMEWORK"
-    ignored "BUILD_SHARED_LIBS"
-    ignored "BUILD_TESTING"
-    ignored "CMAKE_BUILD_TYPE"
-    ignored "CMAKE_INSTALL_PREFIX"
-    string "CMAKE_OSX_ARCHITECTURES"
-    ignored "CMAKE_OSX_DEPLOYMENT_TARGET"
-    string "CMAKE_OSX_SYSROOT"
-    filepath "FOUNDATION_LIBRARY"
-    bool "GGML_ACCELERATE"
-    bool "GGML_ALL_WARNINGS_3RD_PARTY"
-    bool "GGML_AMX_BF16"
-    bool "GGML_AMX_INT8"
-    bool "GGML_AMX_TILE"
-    bool "GGML_AVX"
-    bool "GGML_AVX2"
-    bool "GGML_AVX512"
-    bool "GGML_AVX512_BF16"
-    bool "GGML_AVX512_VBMI"
-    bool "GGML_AVX512_VNNI"
-    bool "GGML_AVX_VNNI"
-    ignored "GGML_BACKEND_DL"
-    ignored "GGML_BIN_INSTALL_DIR"
-    bool "GGML_BLAS"
-    string "GGML_BLAS_VENDOR"
-    bool "GGML_BMI2"
-    ignored "GGML_BUILD_EXAMPLES"
-    ignored "GGML_BUILD_TESTS"
-    filepath "GGML_CCACHE_FOUND"
-    bool "GGML_CPU"
-    bool "GGML_CPU_AARCH64"
-    ignored "GGML_CPU_ALL_VARIANTS"
-    string "GGML_CPU_ARM_ARCH"
-    bool "GGML_CPU_HBM"
-    bool "GGML_CPU_KLEIDIAI"
-    string "GGML_CPU_POWERPC_CPUTYPE"
-    bool "GGML_CUDA"
-    string "GGML_CUDA_COMPRESSION_MODE"
-    bool "GGML_CUDA_F16"
-    bool "GGML_CUDA_FA"
-    bool "GGML_CUDA_FA_ALL_QUANTS"
-    bool "GGML_CUDA_FORCE_CUBLAS"
-    bool "GGML_CUDA_FORCE_MMQ"
-    ignored "GGML_CUDA_GRAPHS"
-    bool "GGML_CUDA_NO_PEER_COPY"
-    bool "GGML_CUDA_NO_VMM"
-    string "GGML_CUDA_PEER_MAX_BATCH_SIZE"
-    bool "GGML_F16C"
-    bool "GGML_FMA"
-    bool "GGML_GPROF"
-    bool "GGML_HIP"
-    bool "GGML_HIP_GRAPHS"
-    bool "GGML_HIP_NO_VMM"
-    bool "GGML_HIP_ROCWMMA_FATTN"
-    bool "GGML_HIP_UMA"
-    ignored "GGML_INCLUDE_INSTALL_DIR"
-    bool "GGML_KOMPUTE"
-    bool "GGML_LASX"
-    ignored "GGML_LIB_INSTALL_DIR"
-    ignored "GGML_LLAMAFILE"
-    bool "GGML_LSX"
-    bool "GGML_LTO"
-    bool "GGML_METAL"
-    bool "GGML_METAL_EMBED_LIBRARY"
-    string "GGML_METAL_MACOSX_VERSION_MIN"
-    bool "GGML_METAL_NDEBUG"
-    bool "GGML_METAL_SHADER_DEBUG"
-    string "GGML_METAL_STD"
-    bool "GGML_METAL_USE_BF16"
-    bool "GGML_MUSA"
-    bool "GGML_NATIVE"
-    bool "GGML_OPENCL"
-    bool "GGML_OPENCL_EMBED_KERNELS"
-    bool "GGML_OPENCL_PROFILING"
-    string "GGML_OPENCL_TARGET_VERSION"
-    bool "GGML_OPENCL_USE_ADRENO_KERNELS"
-    bool "GGML_OPENMP"
-    bool "GGML_RPC"
-    bool "GGML_RVV"
-    bool "GGML_RV_ZFH"
-    pending "GGML_SCCACHE_FOUND"
-    string "GGML_SCHED_MAX_COPIES"
-    ignored "GGML_STATIC"
-    bool "GGML_SYCL"
-    string "GGML_SYCL_DEVICE_ARCH"
-    bool "GGML_SYCL_F16"
-    bool "GGML_SYCL_GRAPH"
-    string "GGML_SYCL_TARGET"
-    bool "GGML_VULKAN"
-    bool "GGML_VULKAN_CHECK_RESULTS"
-    bool "GGML_VULKAN_DEBUG"
-    bool "GGML_VULKAN_MEMORY_DEBUG"
-    bool "GGML_VULKAN_PERF"
-    ignored "GGML_VULKAN_RUN_TESTS"
-    filepath "GGML_VULKAN_SHADERS_GEN_TOOLCHAIN"
-    bool "GGML_VULKAN_SHADER_DEBUG_INFO"
-    pending "GGML_VULKAN_VALIDATE"
-    bool "GGML_VXE"
-    filepath "GIT_EXE"
-    filepath "MATH_LIBRARY"
-    filepath "METALKIT_FRAMEWORK"
-    filepath "METAL_FRAMEWORK"
-    bool "WHISPER_ALL_WARNINGS"
-    bool "WHISPER_ALL_WARNINGS_3RD_PARTY"
-    ignored "WHISPER_BIN_INSTALL_DIR"
-    ignored "WHISPER_BUILD_EXAMPLES"
-    ignored "WHISPER_BUILD_SERVER"
-    ignored "WHISPER_BUILD_TESTS"
-    bool "WHISPER_CCACHE"
-    bool "WHISPER_COREML"
-    bool "WHISPER_COREML_ALLOW_FALLBACK"
-    ignored "WHISPER_CURL"
-    bool "WHISPER_FATAL_WARNINGS"
-    ignored "WHISPER_FFMPEG"
-    ignored "WHISPER_INCLUDE_INSTALL_DIR"
-    ignored "WHISPER_LIB_INSTALL_DIR"
-    bool "WHISPER_OPENVINO"
-    bool "WHISPER_SANITIZE_ADDRESS"
-    bool "WHISPER_SANITIZE_THREAD"
-    bool "WHISPER_SANITIZE_UNDEFINED"
-    ignored "WHISPER_SDL2"
-    pending "WHISPER_USE_SYSTEM_GGML"
-  end
-
-  def option_name(name)
-    name.downcase.gsub("_", "-")
-  end
-
-  def bool(name)
-    option = option_name(name)
-    value = enable_config(option)
-    @options[name] = [:bool, value]
-  end
-
-  def string(name, type=:string)
-    option = "--#{option_name(name)}"
-    value = arg_config(option)
-    raise "String expected for #{option}" if value == true || value&.empty?
-    @options[name] = [type, value]
-  end
-
-  def path(name)
-    string(name, :path)
-  end
-
-  def filepath(name)
-    string(name, :filepath)
-  end
-
-  def pending(name)
-    @pending_options << name
-  end
-
-  def ignored(name)
-    @ignored_options << name
-  end
-end
@@ -918,7 +918,7 @@ ruby_whisper_params_initialize(int argc, VALUE *argv, VALUE self)
     return self;
   }

-  rb_get_kwargs(kw_hash, param_names, 0, RUBY_WHISPER_PARAMS_PARAM_NAMES_COUNT, values);
+  rb_get_kwargs(kw_hash, &param_names, 0, RUBY_WHISPER_PARAMS_PARAM_NAMES_COUNT, &values);
   Data_Get_Struct(self, ruby_whisper_params, rwp);

   for (i = 0; i < RUBY_WHISPER_PARAMS_PARAM_NAMES_COUNT; i++) {
@@ -1,8 +0,0 @@
-set(GRAPHVIZ_EXECUTABLES FALSE)
-set(GRAPHVIZ_STATIC_LIBS TRUE)
-set(GRAPHVIZ_SHARED_LIBS FALSE)
-set(GRAPHVIZ_MODULE_LIBS FALSE)
-set(GRAPHVIZ_INTERFACE_LIBS FALSE)
-set(GRAPHVIZ_OBJECT_LIBS FALSE)
-set(GRAPHVIZ_UNKNOWN_LIBS FALSE)
-set(GRAPHVIZ_GENERATE_DEPENDERS FALSE)
@@ -1,34 +1,6 @@
-ignored_dirs = %w[
-  .devops
-  examples/wchess/wchess.wasm
-  examples/whisper.android
-  examples/whisper.android.java
-  examples/whisper.objc
-  examples/whisper.swiftui
-  grammars
-  models
-  samples
-  scripts
-]
-ignored_files = %w[
-  AUTHORS
-  Makefile
-  README.md
-  README_sycl.md
-  .gitignore
-  .gitmodules
-  whisper.nvim
-  twitch.sh
-  yt-wsp.sh
-]
-
-EXTSOURCES =
-  `git ls-files -z ../..`.split("\x0")
-    .select {|file|
-      basename = File.basename(file)

-      ignored_dirs.all? {|dir| !file.start_with?("../../#{dir}")} &&
-        !ignored_files.include?(basename) &&
-        (file.start_with?("../..") || file.start_with?("../javascript")) &&
-        (!file.start_with?("../../.github/") || basename == "bindings-ruby.yml")
-    }
+require "yaml"
+
+sources = `git ls-files -z ../..`.split("\x0")
+paths = YAML.load_file("../../.github/workflows/bindings-ruby.yml")[true]["push"]["paths"]
+paths.delete "bindings/ruby/**"
+EXTSOURCES = (Dir.glob(paths, base: "../..").collect {|path| "../../#{path}"} << "../../LICENSE") & sources
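The v1.7.5 extsources.rb above indexes the parsed workflow with `[true]` rather than `["on"]`. That is not a typo: Psych, Ruby's YAML engine, follows YAML 1.1, where a plain `on` scalar is a boolean, so the workflow's top-level `on:` key loads as `true`. A small sketch of the quirk using an inline snippet shaped like the workflow file:

```ruby
require "yaml"

# A fragment shaped like .github/workflows/bindings-ruby.yml
workflow = YAML.load(<<~YAML)
  on:
    push:
      paths:
        - bindings/ruby/**
YAML

# The "on" key was parsed as the boolean true, not the string "on".
paths = workflow[true]["push"]["paths"]
```

This is the same reason GitHub Actions files sometimes quote the key as `"on":`.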
@@ -34,7 +34,7 @@ module Whisper
         when /darwin/
           Pathname(Dir.home)/"Library/Caches"
         else
-          ENV.key?("XDG_CACHE_HOME") ? Pathname(ENV["XDG_CACHE_HOME"]) : Pathname(Dir.home)/".cache"
+          ENV.key?("XDG_CACHE_HOME") ? ENV["XDG_CACHE_HOME"] : Pathname(Dir.home)/".cache"
         end
       base/"whisper.cpp"
     end
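The hunk above changes the XDG branch between returning a `Pathname` and returning the raw `ENV` string; the master side wraps it so the later `base/"whisper.cpp"` join always goes through `Pathname#/`. A small sketch of that operator, with an illustrative home directory (a plain `String` has no `#/`, which is why the wrapping matters):

```ruby
require "pathname"

base  = Pathname("/home/user")/".cache"  # Pathname#/ joins path segments
cache = base/"whisper.cpp"               # "/home/user/.cache/whisper.cpp"

# By contrast, String#/ does not exist, so an unwrapped ENV["XDG_CACHE_HOME"]
# flowing into base/"whisper.cpp" would raise NoMethodError.
```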
@@ -23,20 +23,9 @@ module Whisper
   def self.log_set: (log_callback, Object? user_data) -> log_callback

   class Context
-    def self.new: (path | ::URI::HTTP) -> instance
+    def self.new: (string | _ToPath | ::URI::HTTP) -> instance

-    # transcribe a single file
-    # can emit to a block results
-    #
-    #   params = Whisper::Params.new
-    #   params.duration = 60_000
-    #   whisper.transcribe "path/to/audio.wav", params do |text|
-    #     puts text
-    #   end
-    #
     def transcribe: (string, Params) -> self
                   | (string, Params) { (String) -> void } -> self

     def model_n_vocab: () -> Integer
     def model_n_audio_ctx: () -> Integer
     def model_n_audio_state: () -> Integer
@@ -45,72 +34,19 @@ module Whisper
     def model_n_mels: () -> Integer
     def model_ftype: () -> Integer
     def model_type: () -> String

-    # Yields each Whisper::Segment:
-    #
-    #   whisper.transcribe("path/to/audio.wav", params)
-    #   whisper.each_segment do |segment|
-    #     puts segment.text
-    #   end
-    #
-    # Returns an Enumerator if no block given:
-    #
-    #   whisper.transcribe("path/to/audio.wav", params)
-    #   enum = whisper.each_segment
-    #   enum.to_a # => [#<Whisper::Segment>, ...]
-    #
     def each_segment: { (Segment) -> void } -> void
                     | () -> Enumerator[Segment]

     def model: () -> Model
     def full_get_segment: (Integer nth) -> Segment
     def full_n_segments: () -> Integer

-    # Language ID, which can be converted to string by Whisper.lang_str and Whisper.lang_str_full.
-    #
     def full_lang_id: () -> Integer

-    # Start time of a segment indexed by +segment_index+ in centiseconds (10 times milliseconds).
-    #
-    #   full_get_segment_t0(3) # => 1668 (16680 ms)
-    #
     def full_get_segment_t0: (Integer) -> Integer

-    # End time of a segment indexed by +segment_index+ in centiseconds (10 times milliseconds).
-    #
-    #   full_get_segment_t1(3) # => 1668 (16680 ms)
-    #
     def full_get_segment_t1: (Integer) -> Integer

-    # Whether the next segment indexed by +segment_index+ is predicated as a speaker turn.
-    #
-    #   full_get_segment_speacker_turn_next(3) # => true
-    #
     def full_get_segment_speaker_turn_next: (Integer) -> (true | false)

-    # Text of a segment indexed by +segment_index+.
-    #
-    #   full_get_segment_text(3) # => "ask not what your country can do for you, ..."
-    #
     def full_get_segment_text: (Integer) -> String

     def full_get_segment_no_speech_prob: (Integer) -> Float

-    # Run the entire model: PCM -> log mel spectrogram -> encoder -> decoder -> text
-    # Not thread safe for same context
-    # Uses the specified decoding strategy to obtain the text.
-    #
-    # The second argument +samples+ must be an array of samples, respond to :length, or be a MemoryView of an array of float. It must be 32 bit float PCM audio data.
-    #
     def full: (Params, Array[Float] samples, ?Integer n_samples) -> self
             | (Params, _Samples, ?Integer n_samples) -> self

-    # Split the input audio in chunks and process each chunk separately using whisper_full_with_state()
-    # Result is stored in the default state of the context
-    # Not thread safe if executed in parallel on the same context.
-    # It seems this approach can offer some speedup in some cases.
-    # However, the transcription accuracy can be worse at the beginning and end of each chunk.
-    #
     def full_parallel: (Params, Array[Float], ?Integer n_samples) -> self
                      | (Params, _Samples, ?Integer n_samples) -> self
                      | (Params, _Samples, ?Integer? n_samples, Integer n_processors) -> self
@@ -149,202 +85,68 @@ module Whisper
       ?abort_callback: abort_callback,
       ?abort_callback_user_data: Object
     ) -> instance

-    # params.language = "auto" | "en", etc...
-    #
     def language=: (String) -> String # TODO: Enumerate lang names

     def language: () -> String
     def translate=: (boolish) -> boolish
     def translate: () -> (true | false)
     def no_context=: (boolish) -> boolish

-    # If true, does not use past transcription (if any) as initial prompt for the decoder.
-    #
     def no_context: () -> (true | false)

     def single_segment=: (boolish) -> boolish

-    # If true, forces single segment output (useful for streaming).
-    #
     def single_segment: () -> (true | false)

     def print_special=: (boolish) -> boolish

-    # If true, prints special tokens (e.g. <SOT>, <EOT>, <BEG>, etc.).
-    #
     def print_special: () -> (true | false)

     def print_progress=: (boolish) -> boolish

-    # If true, prints progress information.
-    #
     def print_progress: () -> (true | false)

     def print_realtime=: (boolish) -> boolish

-    # If true, prints results from within whisper.cpp. (avoid it, use callback instead)
-    #
     def print_realtime: () -> (true | false)

-    # If true, prints timestamps for each text segment when printing realtime.
-    #
     def print_timestamps=: (boolish) -> boolish

     def print_timestamps: () -> (true | false)

     def suppress_blank=: (boolish) -> boolish

-    # If true, suppresses blank outputs.
-    #
     def suppress_blank: () -> (true | false)

     def suppress_nst=: (boolish) -> boolish

-    # If true, suppresses non-speech-tokens.
-    #
     def suppress_nst: () -> (true | false)

     def token_timestamps=: (boolish) -> boolish

-    # If true, enables token-level timestamps.
-    #
     def token_timestamps: () -> (true | false)

     def split_on_word=: (boolish) -> boolish

-    # If true, split on word rather than on token (when used with max_len).
-    #
     def split_on_word: () -> (true | false)

     def initial_prompt=: (_ToS) -> _ToS

-    # Tokens to provide to the whisper decoder as initial prompt
-    # these are prepended to any existing text context from a previous call
-    # use whisper_tokenize() to convert text to tokens.
-    # Maximum of whisper_n_text_ctx()/2 tokens are used (typically 224).
-    #
     def initial_prompt: () -> (String | nil)

     def diarize=: (boolish) -> boolish

-    # If true, enables diarization.
-    #
     def diarize: () -> (true | false)

     def offset=: (Integer) -> Integer

-    # Start offset in ms.
-    #
     def offset: () -> Integer

     def duration=: (Integer) -> Integer

-    # Audio duration to process in ms.
-    #
     def duration: () -> Integer

     def max_text_tokens=: (Integer) -> Integer

-    # Max tokens to use from past text as prompt for the decoder.
-    #
     def max_text_tokens: () -> Integer

     def temperature=: (Float) -> Float
     def temperature: () -> Float
     def max_initial_ts=: (Float) -> Float

-    # See https://github.com/openai/whisper/blob/f82bc59f5ea234d4b97fb2860842ed38519f7e65/whisper/decoding.py#L97
-    #
     def max_initial_ts: () -> Float

     def length_penalty=: (Float) -> Float
     def length_penalty: () -> Float
     def temperature_inc=: (Float) -> Float
     def temperature_inc: () -> Float
     def entropy_thold=: (Float) -> Float

-    # Similar to OpenAI's "compression_ratio_threshold"
-    #
     def entropy_thold: () -> Float

     def logprob_thold=: (Float) -> Float
     def logprob_thold: () -> Float
def logprob_thold: () -> Float
|
||||||
def no_speech_thold=: (Float) -> Float
|
def no_speech_thold=: (Float) -> Float
|
||||||
def no_speech_thold: () -> Float
|
def no_speech_thold: () -> Float
|
||||||
|
|
||||||
# Sets new segment callback, called for every newly generated text segment.
|
|
||||||
#
|
|
||||||
# params.new_segment_callback = ->(context, _, n_new, user_data) {
|
|
||||||
# # ...
|
|
||||||
# }
|
|
||||||
#
|
|
||||||
def new_segment_callback=: (new_segment_callback) -> new_segment_callback
|
def new_segment_callback=: (new_segment_callback) -> new_segment_callback
|
||||||
def new_segment_callback: () -> (new_segment_callback | nil)
|
def new_segment_callback: () -> (new_segment_callback | nil)
|
||||||
|
|
||||||
# Sets user data passed to the last argument of new segment callback.
|
|
||||||
#
|
|
||||||
def new_segment_callback_user_data=: (Object) -> Object
|
def new_segment_callback_user_data=: (Object) -> Object
|
||||||
|
|
||||||
def new_segment_callback_user_data: () -> Object
|
def new_segment_callback_user_data: () -> Object
|
||||||
|
|
||||||
# Sets progress callback, called on each progress update.
|
|
||||||
#
|
|
||||||
# params.new_segment_callback = ->(context, _, progress, user_data) {
|
|
||||||
# # ...
|
|
||||||
# }
|
|
||||||
#
|
|
||||||
# +progress+ is an Integer between 0 and 100.
|
|
||||||
#
|
|
||||||
def progress_callback=: (progress_callback) -> progress_callback
|
def progress_callback=: (progress_callback) -> progress_callback
|
||||||
|
|
||||||
def progress_callback: () -> (progress_callback | nil)
|
def progress_callback: () -> (progress_callback | nil)
|
||||||
|
|
||||||
# Sets user data passed to the last argument of progress callback.
|
|
||||||
#
|
|
||||||
def progress_callback_user_data=: (Object) -> Object
|
def progress_callback_user_data=: (Object) -> Object
|
||||||
|
|
||||||
def progress_callback_user_data: () -> Object
|
def progress_callback_user_data: () -> Object
|
||||||
|
|
||||||
# Sets abort callback, called to check if the process should be aborted.
|
|
||||||
#
|
|
||||||
# params.abort_callback = ->(user_data) {
|
|
||||||
# # ...
|
|
||||||
# }
|
|
||||||
#
|
|
||||||
#
|
|
||||||
def abort_callback=: (abort_callback) -> abort_callback
|
def abort_callback=: (abort_callback) -> abort_callback
|
||||||
|
|
||||||
def abort_callback: () -> (abort_callback | nil)
|
def abort_callback: () -> (abort_callback | nil)
|
||||||
|
|
||||||
# Sets user data passed to the last argument of abort callback.
|
|
||||||
#
|
|
||||||
def abort_callback_user_data=: (Object) -> Object
|
def abort_callback_user_data=: (Object) -> Object
|
||||||
|
|
||||||
def abort_callback_user_data: () -> Object
|
def abort_callback_user_data: () -> Object
|
||||||
|
|
||||||
# Hook called on new segment. Yields each Whisper::Segment.
|
|
||||||
#
|
|
||||||
# whisper.on_new_segment do |segment|
|
|
||||||
# # ...
|
|
||||||
# end
|
|
||||||
#
|
|
||||||
def on_new_segment: { (Segment) -> void } -> void
|
def on_new_segment: { (Segment) -> void } -> void
|
||||||
|
|
||||||
# Hook called on progress update. Yields each progress Integer between 0 and 100.
|
|
||||||
#
|
|
||||||
def on_progress: { (Integer progress) -> void } -> void
|
def on_progress: { (Integer progress) -> void } -> void
|
||||||
|
|
||||||
# Call block to determine whether abort or not. Return +true+ when you want to abort.
|
|
||||||
#
|
|
||||||
# params.abort_on do
|
|
||||||
# if some_condition
|
|
||||||
# true # abort
|
|
||||||
# else
|
|
||||||
# false # continue
|
|
||||||
# end
|
|
||||||
# end
|
|
||||||
#
|
|
||||||
def abort_on: { (Object user_data) -> boolish } -> void
|
def abort_on: { (Object user_data) -> boolish } -> void
|
||||||
end
|
end
|
||||||
|
|
||||||
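The signatures above describe a builder-style params object: `boolish` setters accept any truthy or falsy value, while the matching getters return a strict `true | false`. A minimal stand-in sketch of that coercion contract (the `ParamsSketch` class below is hypothetical and only illustrates the pattern; the gem's real `Whisper::Params` is implemented in C):

```ruby
# Hypothetical stand-in mirroring the boolish -> (true | false) contract
# declared by the RBS signatures for Whisper::Params.
class ParamsSketch
  BOOL_FIELDS = %i[print_timestamps suppress_blank suppress_nst
                   token_timestamps split_on_word diarize]

  BOOL_FIELDS.each do |name|
    # Setter accepts any "boolish" value; getter coerces to true/false.
    define_method("#{name}=") { |v| instance_variable_set("@#{name}", !!v) }
    define_method(name)       { !!instance_variable_get("@#{name}") }
  end
end

params = ParamsSketch.new
params.suppress_blank = 1   # any truthy value is accepted
params.suppress_blank       # => true (getter returns a strict boolean)
params.diarize = nil
params.diarize              # => false
```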
@@ -365,24 +167,16 @@ module Whisper
     def type: () -> String

     class URI
-      def self.new: (string | ::URI::HTTP) -> instance
+      def self.new: (string | ::URI::HTTP) -> self
       def to_path: -> String
       def clear_cache: -> void
     end
   end

   class Segment
-    # Start time in milliseconds.
-    #
     def start_time: () -> Integer
-
-    # End time in milliseconds.
-    #
     def end_time: () -> Integer
-
-    # Whether the next segment is predicted as a speaker turn.
     def speaker_next_turn?: () -> (true | false)
-
     def text: () -> String
     def no_speech_prob: () -> Float
   end
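`start_time` and `end_time` are millisecond offsets, so rendering a segment as a subtitle timestamp is a small division chain. A sketch (the `format_timestamp` helper is ours, not part of the bindings):

```ruby
# Hypothetical helper: format a Segment-style millisecond offset as an
# SRT timestamp (HH:MM:SS,mmm). Not part of the whispercpp gem itself.
def format_timestamp(ms)
  h, rem  = ms.divmod(3_600_000)   # hours
  m, rem  = rem.divmod(60_000)     # minutes
  s, frac = rem.divmod(1_000)      # seconds and leftover milliseconds
  format("%02d:%02d:%02d,%03d", h, m, s, frac)
end

format_timestamp(3_723_456)  # => "01:02:03,456"
```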
@@ -21,15 +21,4 @@ class TestBase < Test::Unit::TestCase
   def whisper
     self.class.whisper
   end
-
-  module BuildOptions
-    load "ext/options.rb", self
-    Options.include self
-
-    def enable_config(name)
-    end
-
-    def arg_config(name)
-    end
-  end
 end
@@ -21,26 +21,11 @@ class TestPackage < TestBase
     match_data = `rake -Tbuild`.match(/(whispercpp-(.+)\.gem)/)
     filename = match_data[1]
     version = match_data[2]
+    basename = "whisper.#{RbConfig::CONFIG["DLEXT"]}"
     Dir.mktmpdir do |dir|
       system "gem", "install", "--install-dir", dir.shellescape, "--no-document", "pkg/#{filename.shellescape}", exception: true
-      assert_installed dir, version
+      assert_path_exist File.join(dir, "gems/whispercpp-#{version}/lib", basename)
     end
   end
-
-  private
-
-  def assert_installed(dir, version)
-    assert_path_exist File.join(dir, "gems/whispercpp-#{version}/lib", "whisper.#{RbConfig::CONFIG["DLEXT"]}")
-    assert_path_exist File.join(dir, "gems/whispercpp-#{version}/LICENSE")
-    assert_path_not_exist File.join(dir, "gems/whispercpp-#{version}/ext/build")
-  end
-
-  def test_build_options
-    options = BuildOptions::Options.new
-    assert_empty options.missing_options
-    unless ENV["CI"]
-      assert_empty options.extra_options
-    end
-  end
 end
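Both sides of the hunk above name the compiled extension via `RbConfig::CONFIG["DLEXT"]`, the platform's dynamic-library extension for native Ruby extensions ("so" on Linux, "bundle" on macOS). A quick self-contained check of that convention:

```ruby
require "rbconfig"

# The installed native extension is named after DLEXT, so the test can
# assert on e.g. "whisper.so" without hard-coding the platform suffix.
basename = "whisper.#{RbConfig::CONFIG["DLEXT"]}"
basename.start_with?("whisper.")  # => true
```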
@@ -3,8 +3,8 @@ require_relative "extsources"
 Gem::Specification.new do |s|
   s.name    = "whispercpp"
   s.authors = ["Georgi Gerganov", "Todd A. Fisher"]
-  s.version = '1.3.2'
-  s.date    = '2025-04-17'
+  s.version = '1.3.1'
+  s.date    = '2024-12-19'
   s.description = %q{High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model via Ruby}
   s.email = 'todd.fisher@gmail.com'
   s.extra_rdoc_files = ['LICENSE', 'README.md']
@@ -15,8 +15,7 @@ Gem::Specification.new do |s|
     if s.extra_rdoc_files.include?(basename)
       basename
     else
-      file.sub("../..", "ext/sources")
-          .sub("../javascript", "ext/sources/bindings/javascript")
+      file.sub("../..", "ext")
     end
   }
@@ -27,7 +26,7 @@ Gem::Specification.new do |s|
   s.required_ruby_version = '>= 3.1.0'

   #### Documentation and testing.
-  s.homepage = 'https://github.com/ggml-org/whisper.cpp'
+  s.homepage = 'https://github.com/ggerganov/whisper.cpp'
   s.rdoc_options = ['--main', 'README.md']

@@ -41,11 +41,6 @@ COMMON_CMAKE_ARGS=(
     -DGGML_OPENMP=${GGML_OPENMP}
 )

-XCODE_VERSION=$(xcodebuild -version 2>/dev/null | head -n1 | awk '{ print $2 }')
-MAJOR_VERSION=$(echo $XCODE_VERSION | cut -d. -f1)
-MINOR_VERSION=$(echo $XCODE_VERSION | cut -d. -f2)
-echo "Detected Xcode version: $XCODE_VERSION"
-
 check_required_tool() {
     local tool=$1
     local install_message=$2
@@ -340,28 +335,21 @@ combine_static_libraries() {

     # Platform-specific post-processing for device builds
     if [[ "$is_simulator" == "false" ]]; then
-        if command -v xcrun vtool &>/dev/null; then
+        if command -v vtool &>/dev/null; then
             case "$platform" in
                 "ios")
                     echo "Marking binary as a framework binary for iOS..."
-                    xcrun vtool -set-build-version ios ${IOS_MIN_OS_VERSION} ${IOS_MIN_OS_VERSION} -replace \
+                    vtool -set-build-version ios ${IOS_MIN_OS_VERSION} ${IOS_MIN_OS_VERSION} -replace \
                         -output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
                     ;;
                 "visionos")
                     echo "Marking binary as a framework binary for visionOS..."
-                    if [[ "$MAJOR_VERSION" -gt 16 ]] || [[ "$MAJOR_VERSION" -eq 16 && "$MINOR_VERSION" -gt 2 ]]; then
-                        echo "Xcode version greater than 16.2, using visionOS."
-                        VISION_OS_BUILD_VERSION="visionos"
-                    else
-                        echo "Xcode version less than or equal to 16.2, using xros."
-                        VISION_OS_BUILD_VERSION="xros"
-                    fi
-                    xcrun vtool -set-build-version ${VISION_OS_BUILD_VERSION} ${VISIONOS_MIN_OS_VERSION} ${VISIONOS_MIN_OS_VERSION} -replace \
+                    vtool -set-build-version xros ${VISIONOS_MIN_OS_VERSION} ${VISIONOS_MIN_OS_VERSION} -replace \
                         -output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
                     ;;
                 "tvos")
                     echo "Marking binary as a framework binary for tvOS..."
-                    xcrun vtool -set-build-version tvos ${TVOS_MIN_OS_VERSION} ${TVOS_MIN_OS_VERSION} -replace \
+                    vtool -set-build-version tvos ${TVOS_MIN_OS_VERSION} ${TVOS_MIN_OS_VERSION} -replace \
                         -output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
                     ;;
             esac
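The shell gate above picks the `vtool` platform name (`visionos` for Xcode newer than 16.2, `xros` otherwise) by splitting the version string on dots. The same comparison can be sketched more robustly with Ruby's `Gem::Version`; the 16.2 cutoff mirrors the script, the function name is ours:

```ruby
require "rubygems"

# Mirror of the script's Xcode gate: Xcode > 16.2 uses the "visionos"
# platform name for vtool, older toolchains use "xros".
def visionos_platform_name(xcode_version)
  Gem::Version.new(xcode_version) > Gem::Version.new("16.2") ? "visionos" : "xros"
end

visionos_platform_name("16.3")  # => "visionos"
visionos_platform_name("16.2")  # => "xros"
```

`Gem::Version` handles multi-digit components correctly (e.g. "16.10" > "16.2"), which the `cut -d. -f2` integer comparison in the script also gets right only because it compares fields numerically.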
@@ -19,12 +19,6 @@ const whisperParamsMock = {
   no_timestamps: false,
   audio_ctx: 0,
   max_len: 0,
-  prompt: "",
-  print_progress: false,
-  progress_callback: (progress) => {
-    console.log(`Progress: ${progress}`);
-  },
-  max_context: -1
 };

 describe("Run whisper.node", () => {
@@ -368,12 +368,6 @@ Napi::Value whisper(const Napi::CallbackInfo& info) {
   bool comma_in_time = whisper_params.Get("comma_in_time").As<Napi::Boolean>();
   int32_t max_len = whisper_params.Get("max_len").As<Napi::Number>();

-  // Add support for max_context
-  int32_t max_context = -1;
-  if (whisper_params.Has("max_context") && whisper_params.Get("max_context").IsNumber()) {
-      max_context = whisper_params.Get("max_context").As<Napi::Number>();
-  }
-
   // support prompt
   std::string prompt = "";
   if (whisper_params.Has("prompt") && whisper_params.Get("prompt").IsString()) {
@@ -413,7 +407,6 @@ Napi::Value whisper(const Napi::CallbackInfo& info) {
   params.pcmf32 = pcmf32_vec;
   params.comma_in_time = comma_in_time;
   params.max_len = max_len;
-  params.max_context = max_context;
   params.print_progress = print_progress;
   params.prompt = prompt;

@@ -4,7 +4,7 @@ A very basic tool for benchmarking the inference performance on your device. The
 the transformer on some random audio data and records the execution time. This way we can have an objective comparison
 of the performance of the model for various setups.

-Benchmark results are tracked in the following Github issue: https://github.com/ggml-org/whisper.cpp/issues/89
+Benchmark results are tracked in the following Github issue: https://github.com/ggerganov/whisper.cpp/issues/89

 ```bash
 # run the bench tool on the small.en model using 4 threads
@@ -40,7 +40,7 @@ system_info: n_threads = 4 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 1 | WA

 If you wish, you can submit these results here:

-https://github.com/ggml-org/whisper.cpp/issues/89
+https://github.com/ggerganov/whisper.cpp/issues/89

 Please include the following information:

@@ -3,7 +3,7 @@
 // Speak short text commands to the microphone.
 // This program will detect your voice command and convert them to text.
 //
-// ref: https://github.com/ggml-org/whisper.cpp/issues/171
+// ref: https://github.com/ggerganov/whisper.cpp/issues/171
 //

 #include "common-sdl.h"
@@ -249,20 +249,6 @@ static int decode_audio(struct audio_buffer *audio_buf, s16 **data, int *size)
     /* prepare resampler */
     swr = swr_alloc();

-#if LIBAVCODEC_VERSION_MAJOR > 60
-    AVChannelLayout in_ch_layout = codec->ch_layout;
-    AVChannelLayout out_ch_layout = AV_CHANNEL_LAYOUT_MONO;
-
-    /* Set the source audio layout as-is */
-    av_opt_set_chlayout(swr, "in_chlayout", &in_ch_layout, 0);
-    av_opt_set_int(swr, "in_sample_rate", codec->sample_rate, 0);
-    av_opt_set_sample_fmt(swr, "in_sample_fmt", codec->sample_fmt, 0);
-
-    /* Convert it into 16khz Mono */
-    av_opt_set_chlayout(swr, "out_chlayout", &out_ch_layout, 0);
-    av_opt_set_int(swr, "out_sample_rate", WAVE_SAMPLE_RATE, 0);
-    av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
-#else
     av_opt_set_int(swr, "in_channel_count", codec->channels, 0);
     av_opt_set_int(swr, "out_channel_count", 1, 0);
     av_opt_set_int(swr, "in_channel_layout", codec->channel_layout, 0);
@@ -271,7 +257,6 @@ static int decode_audio(struct audio_buffer *audio_buf, s16 **data, int *size)
     av_opt_set_int(swr, "out_sample_rate", WAVE_SAMPLE_RATE, 0);
     av_opt_set_sample_fmt(swr, "in_sample_fmt", codec->sample_fmt, 0);
     av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
-#endif

     swr_init(swr);
     if (!swr_is_initialized(swr)) {
@@ -2,7 +2,7 @@
 #
 # Transcribe audio livestream by feeding ffmpeg output to whisper.cpp at regular intervals
 # Idea by @semiformal-net
-# ref: https://github.com/ggml-org/whisper.cpp/issues/185
+# ref: https://github.com/ggerganov/whisper.cpp/issues/185
 #

 set -eo pipefail
@@ -1,115 +1,39 @@
 import http.server
 import socketserver
 import os
-import sys
 from pathlib import Path
-import urllib.parse

 SCRIPT_DIR = Path(__file__).parent.absolute()
 DIRECTORY = os.path.join(SCRIPT_DIR, "../build-em/bin")
 DIRECTORY = os.path.abspath(DIRECTORY)

-# The context root we want for all applications
-CONTEXT_ROOT = "/whisper.cpp"
-
 class CustomHTTPRequestHandler(http.server.SimpleHTTPRequestHandler):
     def __init__(self, *args, **kwargs):
         super().__init__(*args, directory=DIRECTORY, **kwargs)

     def do_GET(self):
-        # Redirect root to the context root
-        if self.path == '/':
-            self.send_response(302)
-            self.send_header('Location', CONTEXT_ROOT + '/')
-            self.end_headers()
-            return
-
-        # Handle requests under the context root
-        if self.path.startswith(CONTEXT_ROOT):
-            # Remove the context root prefix to get the actual path
-            actual_path = self.path[len(CONTEXT_ROOT):]
-
-            if not actual_path:
-                self.send_response(302)
-                self.send_header('Location', CONTEXT_ROOT + '/')
-                self.end_headers()
-                return
-
-            if '.worker.js' in actual_path:
-                worker_file = os.path.basename(actual_path)
-                worker_path = os.path.join(DIRECTORY, worker_file)
-
-                if os.path.exists(worker_path):
-                    print(f"Found worker file: {worker_path}")
-                    self.path = '/' + worker_file
-                else:
-                    print(f"Worker file not found: {worker_path}")
-
-            elif actual_path == '/':
-                self.path = '/whisper.wasm/index.html'
-            elif actual_path.startswith('/bench.wasm/') or actual_path.startswith('/command.wasm/') or actual_path.startswith('/stream.wasm/'):
-                # Keep the path as is, just remove the context root
-                self.path = actual_path
-            # For all other paths under the context root
-            else:
-                # Check if this is a request to a file in whisper.wasm
-                potential_file = os.path.join(DIRECTORY, 'whisper.wasm', actual_path.lstrip('/'))
-                if os.path.exists(potential_file) and not os.path.isdir(potential_file):
-                    self.path = '/whisper.wasm' + actual_path
-                else:
-                    # Try to resolve the file from the base directory
-                    potential_file = os.path.join(DIRECTORY, actual_path.lstrip('/'))
-                    if os.path.exists(potential_file):
-                        self.path = actual_path
-
-        # For direct requests to worker files (without context root as these
-        # are in the build-em/bin directory
-        elif '.worker.js' in self.path:
+        # If requesting a worker file from any subdirectory
+        if '.worker.js' in self.path:
             worker_file = os.path.basename(self.path)
             worker_path = os.path.join(DIRECTORY, worker_file)

             if os.path.exists(worker_path):
                 self.path = '/' + worker_file

-        # Handle coi-serviceworker.js separately
-        if 'coi-serviceworker.js' in self.path:
-            worker_file = "coi-serviceworker.js"
-            worker_path = os.path.join(SCRIPT_DIR, worker_file)
-            if os.path.exists(worker_path):
-                self.send_response(200)
-                self.send_header('Content-type', 'application/javascript')
-                self.end_headers()
-                with open(worker_path, 'rb') as file:
-                    self.wfile.write(file.read())
-                return
-            else:
-                print(f"Warning: Could not find {worker_path}")
-
         return super().do_GET()

     def end_headers(self):
         # Add required headers for SharedArrayBuffer
         self.send_header("Cross-Origin-Opener-Policy", "same-origin")
         self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
-        self.send_header("Access-Control-Allow-Origin", "*")
+        self.send_header("Access-Control-Allow-Origin", "*");
         super().end_headers()

 PORT = 8000

-# Enable address reuse
-class CustomServer(socketserver.TCPServer):
-    allow_reuse_address = True
-
-try:
-    with CustomServer(("", PORT), CustomHTTPRequestHandler) as httpd:
+with socketserver.TCPServer(("", PORT), CustomHTTPRequestHandler) as httpd:
     print(f"Serving directory '{DIRECTORY}' at http://localhost:{PORT}")
-    print(f"Application context root: http://localhost:{PORT}{CONTEXT_ROOT}/")
     try:
         httpd.serve_forever()
     except KeyboardInterrupt:
         print("\nServer stopped.")
-        # Force complete exit
-        sys.exit(0)
-except OSError as e:
-    print(f"Error: {e}")
-    sys.exit(1)
@@ -79,7 +79,6 @@ struct whisper_params {
     bool use_gpu       = true;
     bool flash_attn    = false;
     bool suppress_nst  = false;
-    bool no_context    = false;

     std::string language = "en";
     std::string prompt   = "";
@@ -141,7 +140,6 @@ void whisper_print_usage(int /*argc*/, char ** argv, const whisper_params & para
     fprintf(stderr, "  --convert,        [%-7s] Convert audio to WAV, requires ffmpeg on the server\n", sparams.ffmpeg_converter ? "true" : "false");
     fprintf(stderr, "  -sns,  --suppress-nst      [%-7s] suppress non-speech tokens\n", params.suppress_nst ? "true" : "false");
     fprintf(stderr, "  -nth N, --no-speech-thold N [%-7.2f] no speech threshold\n", params.no_speech_thold);
-    fprintf(stderr, "  -nc,   --no-context        [%-7s] do not use previous audio context\n", params.no_context ? "true" : "false");
     fprintf(stderr, "\n");
 }

@@ -188,7 +186,6 @@ bool whisper_params_parse(int argc, char ** argv, whisper_params & params, serve
     else if (arg == "-fa"  || arg == "--flash-attn")      { params.flash_attn      = true; }
     else if (arg == "-sns" || arg == "--suppress-nst")    { params.suppress_nst    = true; }
     else if (arg == "-nth" || arg == "--no-speech-thold") { params.no_speech_thold = std::stof(argv[++i]); }
-    else if (arg == "-nc"  || arg == "--no-context")      { params.no_context      = true; }

     // server params
     else if (                  arg == "--port")           { sparams.port = std::stoi(argv[++i]); }
@@ -509,10 +506,6 @@ void get_req_parameters(const Request & req, whisper_params & params)
     {
         params.suppress_nst = parse_str_to_bool(req.get_file_value("suppress_nst").content);
     }
-    if (req.has_file("no_context"))
-    {
-        params.no_context = parse_str_to_bool(req.get_file_value("no_context").content);
-    }
 }

 } // namespace
@@ -825,7 +818,6 @@ int main(int argc, char ** argv) {

     wparams.no_timestamps    = params.no_timestamps;
     wparams.token_timestamps = !params.no_timestamps && params.response_format == vjson_format;
-    wparams.no_context       = params.no_context;

     wparams.suppress_nst     = params.suppress_nst;

@@ -2,7 +2,7 @@
 #
 # Transcribe twitch.tv livestream by feeding audio input to whisper.cpp at regular intervals
 # Thanks to @keyehzy
-# ref: https://github.com/ggml-org/whisper.cpp/issues/209
+# ref: https://github.com/ggerganov/whisper.cpp/issues/209
 #
 # The script currently depends on the third-party tool "streamlink"
 # On Mac OS, you can install it via "brew install streamlink"
@@ -14,8 +14,6 @@ set(SOURCE_FILES
     ${WHISPER_LIB_DIR}/ggml/src/ggml-cpu/ggml-cpu.cpp
     ${WHISPER_LIB_DIR}/ggml/src/ggml-cpu/unary-ops.cpp
     ${WHISPER_LIB_DIR}/ggml/src/ggml-cpu/binary-ops.cpp
-    ${WHISPER_LIB_DIR}/ggml/src/ggml-cpu/vec.cpp
-    ${WHISPER_LIB_DIR}/ggml/src/ggml-cpu/ops.cpp
     ${WHISPER_LIB_DIR}/ggml/src/ggml-alloc.c
     ${WHISPER_LIB_DIR}/ggml/src/ggml-backend.cpp
     ${WHISPER_LIB_DIR}/ggml/src/ggml-backend-reg.cpp
@@ -34,8 +34,6 @@ if (NOT GGML_HOME)
         ${WHISPER_LIB_DIR}/ggml/src/ggml-cpu/ggml-cpu-traits.cpp
         ${WHISPER_LIB_DIR}/ggml/src/ggml-cpu/unary-ops.cpp
         ${WHISPER_LIB_DIR}/ggml/src/ggml-cpu/binary-ops.cpp
-        ${WHISPER_LIB_DIR}/ggml/src/ggml-cpu/vec.cpp
-        ${WHISPER_LIB_DIR}/ggml/src/ggml-cpu/ops.cpp
     )
 endif()

@@ -5,7 +5,7 @@
 # This simple script is called by Neovim to capture audio from the microphone and transcribe it with Whisper.
 # In order for this to work, you need to clone the whisper.cpp repo and build the 'stream' tool
 #
-#   git clone https://github.com/ggml-org/whisper.cpp
+#   git clone https://github.com/ggerganov/whisper.cpp
 #   cd whisper.cpp
 #   make stream
 #
@@ -31,7 +31,7 @@
 model="base.en"

 # export the path to the whisper.cpp repo in the WHISPER_CPP_HOME env variable
-# https://github.com/ggml-org/whisper.cpp
+# https://github.com/ggerganov/whisper.cpp
 cd "${WHISPER_CPP_HOME}"

 if [ ! -f ./stream ] ; then
|
@@ -36,7 +36,7 @@ set_target_properties(${TARGET} PROPERTIES LINK_FLAGS " \
 -s MAXIMUM_MEMORY=2000MB \
 -s ALLOW_MEMORY_GROWTH=1 \
 -s FORCE_FILESYSTEM=1 \
--s EXPORTED_RUNTIME_METHODS=\"['print', 'printErr', 'ccall', 'cwrap', 'HEAPU8']\" \
+-s EXPORTED_RUNTIME_METHODS=\"['print', 'printErr', 'ccall', 'cwrap']\" \
 ${EXTRA_FLAGS} \
 ")
 
@@ -30,7 +30,7 @@ Link: https://ggerganov.github.io/whisper.cpp/
 
 ```bash (v3.1.2)
 # build using Emscripten
-git clone https://github.com/ggml-org/whisper.cpp
+git clone https://github.com/ggerganov/whisper.cpp
 cd whisper.cpp
 mkdir build-em && cd build-em
 emcmake cmake ..
@@ -65,14 +65,13 @@ EMSCRIPTEN_BINDINGS(whisper) {
 }
 
 struct whisper_full_params params = whisper_full_default_params(whisper_sampling_strategy::WHISPER_SAMPLING_GREEDY);
-bool is_multilingual = whisper_is_multilingual(g_contexts[index]);
 
 params.print_realtime = true;
 params.print_progress = false;
 params.print_timestamps = true;
 params.print_special = false;
 params.translate = translate;
-params.language = is_multilingual ? strdup(lang.c_str()) : "en";
+params.language = whisper_is_multilingual(g_contexts[index]) ? lang.c_str() : "en";
 params.n_threads = std::min(nthreads, std::min(16, mpow2(std::thread::hardware_concurrency())));
 params.offset_ms = 0;
 
@@ -103,13 +102,10 @@ EMSCRIPTEN_BINDINGS(whisper) {
 
 // run the worker
 {
-    g_worker = std::thread([index, params, pcmf32 = std::move(pcmf32), is_multilingual]() {
+    g_worker = std::thread([index, params, pcmf32 = std::move(pcmf32)]() {
         whisper_reset_timings(g_contexts[index]);
         whisper_full(g_contexts[index], params, pcmf32.data(), pcmf32.size());
         whisper_print_timings(g_contexts[index]);
-        if (is_multilingual) {
-            free((void*)params.language);
-        }
     });
 }
 
@@ -25,12 +25,12 @@
 # SOFTWARE.
 
 # Small shell script to more easily automatically download and transcribe live stream VODs.
-# This uses YT-DLP, ffmpeg and the CPP version of Whisper: https://github.com/ggml-org/whisper.cpp
+# This uses YT-DLP, ffmpeg and the CPP version of Whisper: https://github.com/ggerganov/whisper.cpp
 # Use `./examples/yt-wsp.sh help` to print help info.
 #
 # Sample usage:
 #
-# git clone https://github.com/ggml-org/whisper.cpp
+# git clone https://github.com/ggerganov/whisper.cpp
 # cd whisper.cpp
 # make
 # ./examples/yt-wsp.sh https://www.youtube.com/watch?v=1234567890
@@ -44,7 +44,7 @@ SCRIPT_DIR="${SCRIPT_PATH%/*}"
 
 ################################################################################
 # Documentation on downloading models can be found in the whisper.cpp repo:
-# https://github.com/ggml-org/whisper.cpp/#usage
+# https://github.com/ggerganov/whisper.cpp/#usage
 #
 # note: unless a multilingual model is specified, WHISPER_LANG will be ignored
 # and the video will be transcribed as if the audio were in the English language
@@ -103,10 +103,10 @@ check_requirements() {
 fi;
 
 if ! command -v "${WHISPER_EXECUTABLE}" &>/dev/null; then
-    echo "The C++ implementation of Whisper is required: https://github.com/ggml-org/whisper.cpp"
+    echo "The C++ implementation of Whisper is required: https://github.com/ggerganov/whisper.cpp"
     echo "Sample usage:";
     echo "";
-    echo "  git clone https://github.com/ggml-org/whisper.cpp";
+    echo "  git clone https://github.com/ggerganov/whisper.cpp";
     echo "  cd whisper.cpp";
     echo "  make";
     echo "  ./examples/yt-wsp.sh https://www.youtube.com/watch?v=1234567890";
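The `check_requirements` hunk above uses a common shell pattern: probe for each required tool with `command -v` before doing any work, and print a hint when one is missing. A small self-contained sketch of that pattern (the `require` helper is illustrative, not part of the script):

```shell
#!/bin/sh
# Probe for a tool on PATH; print a hint to stderr and fail if it is missing.
require() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing required tool: $1" >&2
        return 1
    fi
}

# 'sh' is always available, so this prints the success message.
require sh && echo "sh found"
```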
@@ -28,11 +28,6 @@ function(ggml_add_cpu_backend_variant_impl tag_name)
 ggml-cpu/binary-ops.cpp
 ggml-cpu/unary-ops.h
 ggml-cpu/unary-ops.cpp
-ggml-cpu/simd-mappings.h
-ggml-cpu/vec.h
-ggml-cpu/vec.cpp
-ggml-cpu/ops.h
-ggml-cpu/ops.cpp
 )
 
 target_compile_features(${GGML_CPU_NAME} PRIVATE c_std_11 cxx_std_17)
File diff suppressed because it is too large
@@ -1,128 +0,0 @@
-#pragma once
-
-#include "ggml.h"
-
-//
-// cache line
-//
-
-#if defined(__cpp_lib_hardware_interference_size)
-#define CACHE_LINE_SIZE std::hardware_destructive_interference_size
-#else
-#if defined(__POWER9_VECTOR__)
-#define CACHE_LINE_SIZE 128
-#elif defined(__VXE__) || defined(__VXE2__)
-#define CACHE_LINE_SIZE 256
-#else
-#define CACHE_LINE_SIZE 64
-#endif
-#endif
-
-static const size_t CACHE_LINE_SIZE_F32 = CACHE_LINE_SIZE/sizeof(float);
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-void ggml_compute_forward_dup(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_add(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_add1(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_acc(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_sum(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_sum_rows(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_mean(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_argmax(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_count_equal(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_repeat(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_repeat_back(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_concat(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_silu_back(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_norm(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_rms_norm(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_rms_norm_back(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_group_norm(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_l2_norm(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_out_prod(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_scale(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_set(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_cpy(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_cont(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_reshape(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_view(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_permute(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_transpose(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_get_rows(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_get_rows_back(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_diag(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_diag_mask_inf(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_diag_mask_zero(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_soft_max(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_soft_max_ext_back(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_rope(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_rope_back(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_clamp(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_conv_transpose_1d(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_im2col(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_im2col_back_f32(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_conv_transpose_2d(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_pool_1d(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_pool_2d(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_pool_2d_back(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_upscale(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_pad(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_pad_reflect_1d(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_arange(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_timestep_embedding(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_argsort(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_leaky_relu(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_flash_attn_ext(
-        const struct ggml_compute_params * params,
-        const struct ggml_tensor * q,
-        const struct ggml_tensor * k,
-        const struct ggml_tensor * v,
-        const struct ggml_tensor * mask,
-        struct ggml_tensor * dst);
-void ggml_compute_forward_flash_attn_back(
-        const struct ggml_compute_params * params,
-        const bool masked,
-        struct ggml_tensor * dst);
-void ggml_compute_forward_ssm_conv(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_ssm_scan(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_win_part(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_win_unpart(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_unary(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_get_rel_pos(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_add_rel_pos(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_rwkv_wkv6(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_rwkv_wkv7(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_gla(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_map_unary(
-        const struct ggml_compute_params * params,
-        struct ggml_tensor * dst,
-        const ggml_unary_op_f32_t fun);
-void ggml_compute_forward_map_binary(
-        const struct ggml_compute_params * params,
-        struct ggml_tensor * dst,
-        const ggml_binary_op_f32_t fun);
-void ggml_compute_forward_map_custom1_f32(
-        const struct ggml_compute_params * params,
-        struct ggml_tensor * dst,
-        const ggml_custom1_op_f32_t fun);
-void ggml_compute_forward_map_custom2_f32(
-        const struct ggml_compute_params * params,
-        struct ggml_tensor * dst,
-        const ggml_custom2_op_f32_t fun);
-void ggml_compute_forward_map_custom3_f32(
-        const struct ggml_compute_params * params,
-        struct ggml_tensor * dst,
-        const ggml_custom3_op_f32_t fun);
-void ggml_compute_forward_map_custom1(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_map_custom2(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_map_custom3(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_cross_entropy_loss(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_cross_entropy_loss_back(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-void ggml_compute_forward_opt_step_adamw(const struct ggml_compute_params * params, struct ggml_tensor * dst);
-
-#ifdef __cplusplus
-}
-#endif
@@ -1,884 +0,0 @@
-#pragma once
-
-#include "ggml-cpu-impl.h"
-
-//
-// simd mappings
-//
-
-// we define a common set of C macros which map to specific intrinsics based on the current architecture
-// we then implement the fundamental computation operations below using only these macros
-// adding support for new architectures requires to define the corresponding SIMD macros
-//
-// GGML_F32_STEP / GGML_F16_STEP
-//   number of elements to process in a single step
-//
-// GGML_F32_EPR / GGML_F16_EPR
-//   number of elements to fit in a single register
-//
-
-#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FMA)
-
-#define GGML_SIMD
-
-// F32 NEON
-
-#define GGML_F32_STEP 16
-#define GGML_F32_EPR 4
-
-#define GGML_F32x4 float32x4_t
-#define GGML_F32x4_ZERO vdupq_n_f32(0.0f)
-#define GGML_F32x4_SET1(x) vdupq_n_f32(x)
-#define GGML_F32x4_LOAD vld1q_f32
-#define GGML_F32x4_STORE vst1q_f32
-#define GGML_F32x4_FMA(a, b, c) vfmaq_f32(a, b, c)
-#define GGML_F32x4_ADD vaddq_f32
-#define GGML_F32x4_MUL vmulq_f32
-#define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
-#define GGML_F32x4_REDUCE(res, x) \
-{ \
-    int offset = GGML_F32_ARR >> 1; \
-    for (int i = 0; i < offset; ++i) { \
-        (x)[i] = vaddq_f32((x)[i], (x)[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        (x)[i] = vaddq_f32((x)[i], (x)[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        (x)[i] = vaddq_f32((x)[i], (x)[offset+i]); \
-    } \
-    (res) = (ggml_float) GGML_F32x4_REDUCE_ONE((x)[0]); \
-}
-
-#define GGML_F32_VEC GGML_F32x4
-#define GGML_F32_VEC_ZERO GGML_F32x4_ZERO
-#define GGML_F32_VEC_SET1 GGML_F32x4_SET1
-#define GGML_F32_VEC_LOAD GGML_F32x4_LOAD
-#define GGML_F32_VEC_STORE GGML_F32x4_STORE
-#define GGML_F32_VEC_FMA GGML_F32x4_FMA
-#define GGML_F32_VEC_ADD GGML_F32x4_ADD
-#define GGML_F32_VEC_MUL GGML_F32x4_MUL
-#define GGML_F32_VEC_REDUCE GGML_F32x4_REDUCE
-
-// F16 NEON
-
-#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
-#define GGML_F16_STEP 32
-#define GGML_F16_EPR 8
-
-#define GGML_F16x8 float16x8_t
-#define GGML_F16x8_ZERO vdupq_n_f16(0.0f)
-#define GGML_F16x8_SET1(x) vdupq_n_f16(x)
-#define GGML_F16x8_LOAD(x) vld1q_f16((const ggml_fp16_internal_t *)(x))
-#define GGML_F16x8_STORE vst1q_f16
-#define GGML_F16x8_FMA(a, b, c) vfmaq_f16(a, b, c)
-#define GGML_F16x8_ADD vaddq_f16
-#define GGML_F16x8_MUL vmulq_f16
-#define GGML_F16x8_REDUCE(res, x) \
-do { \
-    int offset = GGML_F16_ARR >> 1; \
-    for (int i = 0; i < offset; ++i) { \
-        (x)[i] = vaddq_f16((x)[i], (x)[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        (x)[i] = vaddq_f16((x)[i], (x)[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        (x)[i] = vaddq_f16((x)[i], (x)[offset+i]); \
-    } \
-    const float32x4_t t0 = vcvt_f32_f16(vget_low_f16 ((x)[0])); \
-    const float32x4_t t1 = vcvt_f32_f16(vget_high_f16((x)[0])); \
-    (res) = (ggml_float) vaddvq_f32(vaddq_f32(t0, t1)); \
-} while (0)
-
-#define GGML_F16_VEC GGML_F16x8
-#define GGML_F16_VEC_ZERO GGML_F16x8_ZERO
-#define GGML_F16_VEC_SET1 GGML_F16x8_SET1
-#define GGML_F16_VEC_LOAD(p, i) GGML_F16x8_LOAD(p)
-#define GGML_F16_VEC_STORE(p, r, i) GGML_F16x8_STORE((ggml_fp16_internal_t *)(p), (r)[i])
-#define GGML_F16_VEC_FMA GGML_F16x8_FMA
-#define GGML_F16_VEC_ADD GGML_F16x8_ADD
-#define GGML_F16_VEC_MUL GGML_F16x8_MUL
-#define GGML_F16_VEC_REDUCE GGML_F16x8_REDUCE
-#else
-// if FP16 vector arithmetic is not supported, we use FP32 instead
-// and take advantage of the vcvt_ functions to convert to/from FP16
-
-#define GGML_F16_STEP 16
-#define GGML_F16_EPR 4
-
-#define GGML_F32Cx4 float32x4_t
-#define GGML_F32Cx4_ZERO vdupq_n_f32(0.0f)
-#define GGML_F32Cx4_SET1(x) vdupq_n_f32(x)
-#define GGML_F32Cx4_LOAD(x) vcvt_f32_f16(vld1_f16((const ggml_fp16_internal_t *)(x)))
-#define GGML_F32Cx4_STORE(x, y) vst1_f16(x, vcvt_f16_f32(y))
-#define GGML_F32Cx4_FMA(a, b, c) vfmaq_f32(a, b, c)
-#define GGML_F32Cx4_ADD vaddq_f32
-#define GGML_F32Cx4_MUL vmulq_f32
-#define GGML_F32Cx4_REDUCE GGML_F32x4_REDUCE
-
-#define GGML_F16_VEC GGML_F32Cx4
-#define GGML_F16_VEC_ZERO GGML_F32Cx4_ZERO
-#define GGML_F16_VEC_SET1 GGML_F32Cx4_SET1
-#define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx4_LOAD(p)
-#define GGML_F16_VEC_STORE(p, r, i) GGML_F32Cx4_STORE((ggml_fp16_internal_t *)(p), r[i])
-#define GGML_F16_VEC_FMA GGML_F32Cx4_FMA
-#define GGML_F16_VEC_ADD GGML_F32Cx4_ADD
-#define GGML_F16_VEC_MUL GGML_F32Cx4_MUL
-#define GGML_F16_VEC_REDUCE GGML_F32Cx4_REDUCE
-#endif
-
-#elif defined(__AVX512F__)
-
-#define GGML_SIMD
-
-// F32 AVX512
-
-#define GGML_F32_STEP 64
-#define GGML_F32_EPR 16
-
-#define GGML_F32x16 __m512
-#define GGML_F32x16_ZERO _mm512_setzero_ps()
-#define GGML_F32x16_SET1(x) _mm512_set1_ps(x)
-#define GGML_F32x16_LOAD _mm512_loadu_ps
-#define GGML_F32x16_STORE _mm512_storeu_ps
-// _mm512_fmadd_ps is defined in AVX512F so no guard is required
-#define GGML_F32x16_FMA(a, b, c) _mm512_fmadd_ps(b, c, a)
-#define GGML_F32x16_ADD _mm512_add_ps
-#define GGML_F32x16_MUL _mm512_mul_ps
-#define GGML_F32x16_REDUCE(res, x) \
-do { \
-    int offset = GGML_F32_ARR >> 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = _mm512_add_ps(x[i], x[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = _mm512_add_ps(x[i], x[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = _mm512_add_ps(x[i], x[offset+i]); \
-    } \
-    res = (ggml_float) _mm512_reduce_add_ps(x[0]); \
-} while (0)
-
-// TODO: is this optimal ?
-
-#define GGML_F32_VEC GGML_F32x16
-#define GGML_F32_VEC_ZERO GGML_F32x16_ZERO
-#define GGML_F32_VEC_SET1 GGML_F32x16_SET1
-#define GGML_F32_VEC_LOAD GGML_F32x16_LOAD
-#define GGML_F32_VEC_STORE GGML_F32x16_STORE
-#define GGML_F32_VEC_FMA GGML_F32x16_FMA
-#define GGML_F32_VEC_ADD GGML_F32x16_ADD
-#define GGML_F32_VEC_MUL GGML_F32x16_MUL
-#define GGML_F32_VEC_REDUCE GGML_F32x16_REDUCE
-
-// F16 AVX512
-
-// F16 AVX
-
-#define GGML_F16_STEP 64
-#define GGML_F16_EPR 16
-
-// AVX512 has FP16 extension (AVX512_FP16) but I don't have it on my machine so I use FP32 instead
-
-#define GGML_F32Cx16 __m512
-#define GGML_F32Cx16_ZERO _mm512_setzero_ps()
-#define GGML_F32Cx16_SET1(x) _mm512_set1_ps(x)
-
-// unlike _mm256_cvt intrinsics that require F16C, _mm512_cvt is defined in AVX512F
-// so F16C guard isn't required
-#define GGML_F32Cx16_LOAD(x) _mm512_cvtph_ps(_mm256_loadu_si256((const __m256i *)(x)))
-#define GGML_F32Cx16_STORE(x, y) _mm256_storeu_si256((__m256i *)(x), _mm512_cvtps_ph(y, 0))
-
-#define GGML_F32Cx16_FMA(a, b, c) _mm512_fmadd_ps(b, c, a)
-#define GGML_F32Cx16_ADD _mm512_add_ps
-#define GGML_F32Cx16_MUL _mm512_mul_ps
-#define GGML_F32Cx16_REDUCE(res, x) \
-do { \
-    int offset = GGML_F32_ARR >> 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = _mm512_add_ps(x[i], x[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = _mm512_add_ps(x[i], x[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = _mm512_add_ps(x[i], x[offset+i]); \
-    } \
-    res = (ggml_float) _mm512_reduce_add_ps(x[0]); \
-} while (0)
-
-#define GGML_F16_VEC GGML_F32Cx16
-#define GGML_F16_VEC_ZERO GGML_F32Cx16_ZERO
-#define GGML_F16_VEC_SET1 GGML_F32Cx16_SET1
-#define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx16_LOAD(p)
-#define GGML_F16_VEC_STORE(p, r, i) GGML_F32Cx16_STORE(p, r[i])
-#define GGML_F16_VEC_FMA GGML_F32Cx16_FMA
-#define GGML_F16_VEC_ADD GGML_F32Cx16_ADD
-#define GGML_F16_VEC_MUL GGML_F32Cx16_MUL
-
-#define GGML_F16_VEC_REDUCE GGML_F32Cx16_REDUCE
-#elif defined(__AVX__)
-
-#define GGML_SIMD
-
-// F32 AVX
-
-#define GGML_F32_STEP 32
-#define GGML_F32_EPR 8
-
-#define GGML_F32x8 __m256
-#define GGML_F32x8_ZERO _mm256_setzero_ps()
-#define GGML_F32x8_SET1(x) _mm256_set1_ps(x)
-#define GGML_F32x8_LOAD _mm256_loadu_ps
-#define GGML_F32x8_STORE _mm256_storeu_ps
-#if defined(__FMA__)
-    #define GGML_F32x8_FMA(a, b, c) _mm256_fmadd_ps(b, c, a)
-#else
-    #define GGML_F32x8_FMA(a, b, c) _mm256_add_ps(_mm256_mul_ps(b, c), a)
-#endif
-#define GGML_F32x8_ADD _mm256_add_ps
-#define GGML_F32x8_MUL _mm256_mul_ps
-#define GGML_F32x8_REDUCE(res, x) \
-do { \
-    int offset = GGML_F32_ARR >> 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = _mm256_add_ps(x[i], x[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = _mm256_add_ps(x[i], x[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = _mm256_add_ps(x[i], x[offset+i]); \
-    } \
-    const __m128 t0 = _mm_add_ps(_mm256_castps256_ps128(x[0]), \
-                                 _mm256_extractf128_ps(x[0], 1)); \
-    const __m128 t1 = _mm_hadd_ps(t0, t0); \
-    res = (ggml_float) _mm_cvtss_f32(_mm_hadd_ps(t1, t1)); \
-} while (0)
-// TODO: is this optimal ?
-
-#define GGML_F32_VEC GGML_F32x8
-#define GGML_F32_VEC_ZERO GGML_F32x8_ZERO
-#define GGML_F32_VEC_SET1 GGML_F32x8_SET1
-#define GGML_F32_VEC_LOAD GGML_F32x8_LOAD
-#define GGML_F32_VEC_STORE GGML_F32x8_STORE
-#define GGML_F32_VEC_FMA GGML_F32x8_FMA
-#define GGML_F32_VEC_ADD GGML_F32x8_ADD
-#define GGML_F32_VEC_MUL GGML_F32x8_MUL
-#define GGML_F32_VEC_REDUCE GGML_F32x8_REDUCE
-
-// F16 AVX
-
-#define GGML_F16_STEP 32
-#define GGML_F16_EPR 8
-
-// F16 arithmetic is not supported by AVX, so we use F32 instead
-
-#define GGML_F32Cx8 __m256
-#define GGML_F32Cx8_ZERO _mm256_setzero_ps()
-#define GGML_F32Cx8_SET1(x) _mm256_set1_ps(x)
-
-#if defined(__F16C__)
-// the _mm256_cvt intrinsics require F16C
-#define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((const __m128i *)(x)))
-#define GGML_F32Cx8_STORE(x, y) _mm_storeu_si128((__m128i *)(x), _mm256_cvtps_ph(y, 0))
-#else
-static inline __m256 __avx_f32cx8_load(const ggml_fp16_t * x) {
-    float tmp[8];
-
-    for (int i = 0; i < 8; i++) {
-        tmp[i] = GGML_FP16_TO_FP32(x[i]);
-    }
-
-    return _mm256_loadu_ps(tmp);
-}
-static inline void __avx_f32cx8_store(ggml_fp16_t *x, __m256 y) {
-    float arr[8];
-
-    _mm256_storeu_ps(arr, y);
-
-    for (int i = 0; i < 8; i++)
-        x[i] = GGML_FP32_TO_FP16(arr[i]);
-}
-#define GGML_F32Cx8_LOAD(x) __avx_f32cx8_load(x)
-#define GGML_F32Cx8_STORE(x, y) __avx_f32cx8_store(x, y)
-#endif
-
-#define GGML_F32Cx8_FMA GGML_F32x8_FMA
-#define GGML_F32Cx8_ADD _mm256_add_ps
-#define GGML_F32Cx8_MUL _mm256_mul_ps
-#define GGML_F32Cx8_REDUCE GGML_F32x8_REDUCE
-
-#define GGML_F16_VEC GGML_F32Cx8
-#define GGML_F16_VEC_ZERO GGML_F32Cx8_ZERO
-#define GGML_F16_VEC_SET1 GGML_F32Cx8_SET1
-#define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
-#define GGML_F16_VEC_STORE(p, r, i) GGML_F32Cx8_STORE(p, r[i])
-#define GGML_F16_VEC_FMA GGML_F32Cx8_FMA
-#define GGML_F16_VEC_ADD GGML_F32Cx8_ADD
-#define GGML_F16_VEC_MUL GGML_F32Cx8_MUL
-#define GGML_F16_VEC_REDUCE GGML_F32Cx8_REDUCE
-
-#elif defined(__POWER9_VECTOR__)
-
-#define GGML_SIMD
-
-// F32 POWER9
-
-#define GGML_F32_STEP 32
-#define GGML_F32_EPR 4
-
-#define GGML_F32x4 vector float
-#define GGML_F32x4_ZERO 0.0f
-#define GGML_F32x4_SET1 vec_splats
-#define GGML_F32x4_LOAD(p) vec_xl(0, p)
-#define GGML_F32x4_STORE(p, r) vec_xst(r, 0, p)
-#define GGML_F32x4_FMA(a, b, c) vec_madd(b, c, a)
-#define GGML_F32x4_ADD vec_add
-#define GGML_F32x4_MUL vec_mul
-#define GGML_F32x4_REDUCE(res, x) \
-{ \
-    int offset = GGML_F32_ARR >> 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = vec_add(x[i], x[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = vec_add(x[i], x[offset+i]); \
-    } \
-    offset >>= 1; \
-    for (int i = 0; i < offset; ++i) { \
-        x[i] = vec_add(x[i], x[offset+i]); \
-    } \
-    res = vec_extract(x[0], 0) + \
-          vec_extract(x[0], 1) + \
-          vec_extract(x[0], 2) + \
-          vec_extract(x[0], 3); \
-}
-
-#define GGML_F32_VEC GGML_F32x4
-#define GGML_F32_VEC_ZERO GGML_F32x4_ZERO
-#define GGML_F32_VEC_SET1 GGML_F32x4_SET1
-#define GGML_F32_VEC_LOAD GGML_F32x4_LOAD
-#define GGML_F32_VEC_STORE GGML_F32x4_STORE
-#define GGML_F32_VEC_FMA GGML_F32x4_FMA
-#define GGML_F32_VEC_ADD GGML_F32x4_ADD
-#define GGML_F32_VEC_MUL GGML_F32x4_MUL
-#define GGML_F32_VEC_REDUCE GGML_F32x4_REDUCE
-
-// F16 POWER9
-#define GGML_F16_STEP GGML_F32_STEP
-#define GGML_F16_EPR GGML_F32_EPR
-#define GGML_F16_VEC GGML_F32x4
-#define GGML_F16_VEC_ZERO GGML_F32x4_ZERO
-#define GGML_F16_VEC_SET1 GGML_F32x4_SET1
-#define GGML_F16_VEC_FMA GGML_F32x4_FMA
-#define GGML_F16_VEC_ADD GGML_F32x4_ADD
-#define GGML_F16_VEC_MUL GGML_F32x4_MUL
-#define GGML_F16_VEC_REDUCE GGML_F32x4_REDUCE
-// Use vec_xl, not vec_ld, in case the load address is not aligned.
-#define GGML_F16_VEC_LOAD(p, i) (i & 0x1) ? \
-    vec_extract_fp32_from_shorth(vec_xl(0, p - GGML_F16_EPR)) : \
-    vec_extract_fp32_from_shortl(vec_xl(0, p))
-#define GGML_ENDIAN_BYTE(i) ((unsigned char *)&(uint16_t){1})[i]
-#define GGML_F16_VEC_STORE(p, r, i) \
-    if (i & 0x1) \
-        vec_xst(vec_pack_to_short_fp32(r[i - GGML_ENDIAN_BYTE(1)], \
-                                       r[i - GGML_ENDIAN_BYTE(0)]), \
-                0, p - GGML_F16_EPR)
-
#elif defined(__wasm_simd128__)
|
|
||||||
|
|
||||||
#define GGML_SIMD
|
|
||||||
|
|
||||||
// F32 WASM
|
|
||||||
|
|
||||||
#define GGML_F32_STEP 16
|
|
||||||
#define GGML_F32_EPR 4
|
|
||||||
|
|
||||||
#define GGML_F32x4 v128_t
|
|
||||||
#define GGML_F32x4_ZERO wasm_f32x4_splat(0.0f)
|
|
||||||
#define GGML_F32x4_SET1(x) wasm_f32x4_splat(x)
|
|
||||||
#define GGML_F32x4_LOAD wasm_v128_load
|
|
||||||
#define GGML_F32x4_STORE wasm_v128_store
|
|
||||||
#define GGML_F32x4_FMA(a, b, c) wasm_f32x4_add(wasm_f32x4_mul(b, c), a)
|
|
||||||
#define GGML_F32x4_ADD wasm_f32x4_add
|
|
||||||
#define GGML_F32x4_MUL wasm_f32x4_mul
|
|
||||||
#define GGML_F32x4_REDUCE(res, x) \
|
|
||||||
{ \
|
|
||||||
int offset = GGML_F32_ARR >> 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = wasm_f32x4_add(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
offset >>= 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = wasm_f32x4_add(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
offset >>= 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = wasm_f32x4_add(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
res = wasm_f32x4_extract_lane(x[0], 0) + \
|
|
||||||
wasm_f32x4_extract_lane(x[0], 1) + \
|
|
||||||
wasm_f32x4_extract_lane(x[0], 2) + \
|
|
||||||
wasm_f32x4_extract_lane(x[0], 3); \
|
|
||||||
}
|
|
||||||
|
|
||||||
#define GGML_F32_VEC GGML_F32x4
|
|
||||||
#define GGML_F32_VEC_ZERO GGML_F32x4_ZERO
|
|
||||||
#define GGML_F32_VEC_SET1 GGML_F32x4_SET1
|
|
||||||
#define GGML_F32_VEC_LOAD GGML_F32x4_LOAD
|
|
||||||
#define GGML_F32_VEC_STORE GGML_F32x4_STORE
|
|
||||||
#define GGML_F32_VEC_FMA GGML_F32x4_FMA
|
|
||||||
#define GGML_F32_VEC_ADD GGML_F32x4_ADD
|
|
||||||
#define GGML_F32_VEC_MUL GGML_F32x4_MUL
|
|
||||||
#define GGML_F32_VEC_REDUCE GGML_F32x4_REDUCE
|
|
||||||
|
|
||||||
// F16 WASM
|
|
||||||
|
|
||||||
#define GGML_F16_STEP 16
|
|
||||||
#define GGML_F16_EPR 4
|
|
||||||
|
|
||||||
inline static v128_t __wasm_f16x4_load(const ggml_fp16_t * p) {
|
|
||||||
float tmp[4];
|
|
||||||
|
|
||||||
tmp[0] = GGML_FP16_TO_FP32(p[0]);
|
|
||||||
tmp[1] = GGML_FP16_TO_FP32(p[1]);
|
|
||||||
tmp[2] = GGML_FP16_TO_FP32(p[2]);
|
|
||||||
tmp[3] = GGML_FP16_TO_FP32(p[3]);
|
|
||||||
|
|
||||||
return wasm_v128_load(tmp);
|
|
||||||
}
|
|
||||||
|
|
||||||
inline static void __wasm_f16x4_store(ggml_fp16_t * p, v128_t x) {
|
|
||||||
float tmp[4];
|
|
||||||
|
|
||||||
wasm_v128_store(tmp, x);
|
|
||||||
|
|
||||||
p[0] = GGML_FP32_TO_FP16(tmp[0]);
|
|
||||||
p[1] = GGML_FP32_TO_FP16(tmp[1]);
|
|
||||||
p[2] = GGML_FP32_TO_FP16(tmp[2]);
|
|
||||||
p[3] = GGML_FP32_TO_FP16(tmp[3]);
|
|
||||||
}
|
|
||||||
|
|
||||||
#define GGML_F16x4 v128_t
|
|
||||||
#define GGML_F16x4_ZERO wasm_f32x4_splat(0.0f)
|
|
||||||
#define GGML_F16x4_SET1(x) wasm_f32x4_splat(x)
|
|
||||||
#define GGML_F16x4_LOAD(x) __wasm_f16x4_load(x)
|
|
||||||
#define GGML_F16x4_STORE(x, y) __wasm_f16x4_store(x, y)
|
|
||||||
#define GGML_F16x4_FMA GGML_F32x4_FMA
|
|
||||||
#define GGML_F16x4_ADD wasm_f32x4_add
|
|
||||||
#define GGML_F16x4_MUL wasm_f32x4_mul
|
|
||||||
#define GGML_F16x4_REDUCE(res, x) \
|
|
||||||
{ \
|
|
||||||
int offset = GGML_F16_ARR >> 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = wasm_f32x4_add(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
offset >>= 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = wasm_f32x4_add(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
offset >>= 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = wasm_f32x4_add(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
res = (ggml_float) (wasm_f32x4_extract_lane(x[0], 0) + \
|
|
||||||
wasm_f32x4_extract_lane(x[0], 1) + \
|
|
||||||
wasm_f32x4_extract_lane(x[0], 2) + \
|
|
||||||
wasm_f32x4_extract_lane(x[0], 3)); \
|
|
||||||
}
|
|
||||||
|
|
||||||
#define GGML_F16_VEC GGML_F16x4
|
|
||||||
#define GGML_F16_VEC_ZERO GGML_F16x4_ZERO
|
|
||||||
#define GGML_F16_VEC_SET1 GGML_F16x4_SET1
|
|
||||||
#define GGML_F16_VEC_LOAD(p, i) GGML_F16x4_LOAD(p)
|
|
||||||
#define GGML_F16_VEC_STORE(p, r, i) GGML_F16x4_STORE(p, r[i])
|
|
||||||
#define GGML_F16_VEC_FMA GGML_F16x4_FMA
|
|
||||||
#define GGML_F16_VEC_ADD GGML_F16x4_ADD
|
|
||||||
#define GGML_F16_VEC_MUL GGML_F16x4_MUL
|
|
||||||
#define GGML_F16_VEC_REDUCE GGML_F16x4_REDUCE
|
|
||||||
|
|
||||||
#elif defined(__SSE3__)
|
|
||||||
|
|
||||||
#define GGML_SIMD
|
|
||||||
|
|
||||||
// F32 SSE
|
|
||||||
|
|
||||||
#define GGML_F32_STEP 32
|
|
||||||
#define GGML_F32_EPR 4
|
|
||||||
|
|
||||||
#define GGML_F32x4 __m128
|
|
||||||
#define GGML_F32x4_ZERO _mm_setzero_ps()
|
|
||||||
#define GGML_F32x4_SET1(x) _mm_set1_ps(x)
|
|
||||||
#define GGML_F32x4_LOAD _mm_loadu_ps
|
|
||||||
#define GGML_F32x4_STORE _mm_storeu_ps
|
|
||||||
#if defined(__FMA__)
|
|
||||||
// TODO: Does this work?
|
|
||||||
#define GGML_F32x4_FMA(a, b, c) _mm_fmadd_ps(b, c, a)
|
|
||||||
#else
|
|
||||||
#define GGML_F32x4_FMA(a, b, c) _mm_add_ps(_mm_mul_ps(b, c), a)
|
|
||||||
#endif
|
|
||||||
#define GGML_F32x4_ADD _mm_add_ps
|
|
||||||
#define GGML_F32x4_MUL _mm_mul_ps
|
|
||||||
#define GGML_F32x4_REDUCE(res, x) \
|
|
||||||
{ \
|
|
||||||
int offset = GGML_F32_ARR >> 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = _mm_add_ps(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
offset >>= 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = _mm_add_ps(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
offset >>= 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = _mm_add_ps(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
const __m128 t0 = _mm_hadd_ps(x[0], x[0]); \
|
|
||||||
res = (ggml_float) _mm_cvtss_f32(_mm_hadd_ps(t0, t0)); \
|
|
||||||
}
|
|
||||||
// TODO: is this optimal ?
|
|
||||||
|
|
||||||
#define GGML_F32_VEC GGML_F32x4
|
|
||||||
#define GGML_F32_VEC_ZERO GGML_F32x4_ZERO
|
|
||||||
#define GGML_F32_VEC_SET1 GGML_F32x4_SET1
|
|
||||||
#define GGML_F32_VEC_LOAD GGML_F32x4_LOAD
|
|
||||||
#define GGML_F32_VEC_STORE GGML_F32x4_STORE
|
|
||||||
#define GGML_F32_VEC_FMA GGML_F32x4_FMA
|
|
||||||
#define GGML_F32_VEC_ADD GGML_F32x4_ADD
|
|
||||||
#define GGML_F32_VEC_MUL GGML_F32x4_MUL
|
|
||||||
#define GGML_F32_VEC_REDUCE GGML_F32x4_REDUCE
|
|
||||||
|
|
||||||
// F16 SSE
|
|
||||||
|
|
||||||
#define GGML_F16_STEP 32
|
|
||||||
#define GGML_F16_EPR 4
|
|
||||||
|
|
||||||
static inline __m128 __sse_f16x4_load(const ggml_fp16_t * x) {
|
|
||||||
float tmp[4];
|
|
||||||
|
|
||||||
tmp[0] = GGML_FP16_TO_FP32(x[0]);
|
|
||||||
tmp[1] = GGML_FP16_TO_FP32(x[1]);
|
|
||||||
tmp[2] = GGML_FP16_TO_FP32(x[2]);
|
|
||||||
tmp[3] = GGML_FP16_TO_FP32(x[3]);
|
|
||||||
|
|
||||||
return _mm_loadu_ps(tmp);
|
|
||||||
}
|
|
||||||
|
|
||||||
static inline void __sse_f16x4_store(ggml_fp16_t * x, __m128 y) {
|
|
||||||
float arr[4];
|
|
||||||
|
|
||||||
_mm_storeu_ps(arr, y);
|
|
||||||
|
|
||||||
x[0] = GGML_FP32_TO_FP16(arr[0]);
|
|
||||||
x[1] = GGML_FP32_TO_FP16(arr[1]);
|
|
||||||
x[2] = GGML_FP32_TO_FP16(arr[2]);
|
|
||||||
x[3] = GGML_FP32_TO_FP16(arr[3]);
|
|
||||||
}
|
|
||||||
|
|
||||||
#define GGML_F32Cx4 __m128
|
|
||||||
#define GGML_F32Cx4_ZERO _mm_setzero_ps()
|
|
||||||
#define GGML_F32Cx4_SET1(x) _mm_set1_ps(x)
|
|
||||||
#define GGML_F32Cx4_LOAD(x) __sse_f16x4_load(x)
|
|
||||||
#define GGML_F32Cx4_STORE(x, y) __sse_f16x4_store(x, y)
|
|
||||||
#define GGML_F32Cx4_FMA GGML_F32x4_FMA
|
|
||||||
#define GGML_F32Cx4_ADD _mm_add_ps
|
|
||||||
#define GGML_F32Cx4_MUL _mm_mul_ps
|
|
||||||
#define GGML_F32Cx4_REDUCE GGML_F32x4_REDUCE
|
|
||||||
|
|
||||||
#define GGML_F16_VEC GGML_F32Cx4
|
|
||||||
#define GGML_F16_VEC_ZERO GGML_F32Cx4_ZERO
|
|
||||||
#define GGML_F16_VEC_SET1 GGML_F32Cx4_SET1
|
|
||||||
#define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx4_LOAD(p)
|
|
||||||
#define GGML_F16_VEC_STORE(p, r, i) GGML_F32Cx4_STORE(p, r[i])
|
|
||||||
#define GGML_F16_VEC_FMA GGML_F32Cx4_FMA
|
|
||||||
#define GGML_F16_VEC_ADD GGML_F32Cx4_ADD
|
|
||||||
#define GGML_F16_VEC_MUL GGML_F32Cx4_MUL
|
|
||||||
#define GGML_F16_VEC_REDUCE GGML_F32Cx4_REDUCE
|
|
||||||
|
|
||||||
#elif defined(__loongarch_asx)
|
|
||||||
|
|
||||||
#define GGML_SIMD
|
|
||||||
|
|
||||||
// F32 LASX
|
|
||||||
#define GGML_F32_STEP 32
|
|
||||||
#define GGML_F32_EPR 8
|
|
||||||
|
|
||||||
#define GGML_F32x8 __m256
|
|
||||||
#define GGML_F32x8_ZERO (__m256)__lasx_xvldi(0)
|
|
||||||
#define GGML_F32x8_SET1(x) (__m256)__lasx_xvreplfr2vr_s((x))
|
|
||||||
#define GGML_F32x8_LOAD(x) (__m256)__lasx_xvld((x), 0)
|
|
||||||
#define GGML_F32x8_STORE(x,y) __lasx_xvst((y), (x), 0)
|
|
||||||
#define GGML_F32x8_FMA(a, b, c) __lasx_xvfmadd_s(b, c, a)
|
|
||||||
#define GGML_F32x8_ADD __lasx_xvfadd_s
|
|
||||||
#define GGML_F32x8_MUL __lasx_xvfmul_s
|
|
||||||
#define GGML_F32x8_REDUCE(res, x) \
|
|
||||||
do { \
|
|
||||||
int offset = GGML_F32_ARR >> 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = __lasx_xvfadd_s(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
offset >>= 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = __lasx_xvfadd_s(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
offset >>= 1; \
|
|
||||||
for (int i = 0; i < offset; ++i) { \
|
|
||||||
x[i] = __lasx_xvfadd_s(x[i], x[offset+i]); \
|
|
||||||
} \
|
|
||||||
float *tmp_p = (float *)&x[0]; \
|
|
||||||
res = tmp_p[0] + tmp_p[1] + tmp_p[2] + tmp_p[3] + tmp_p[4] + tmp_p[5] + tmp_p[6] + tmp_p[7]; \
|
|
||||||
} while (0)
|
|
||||||
// TODO: is this optimal ?
|
|
||||||
|
|
||||||
#define GGML_F32_VEC GGML_F32x8
|
|
||||||
#define GGML_F32_VEC_ZERO GGML_F32x8_ZERO
|
|
||||||
#define GGML_F32_VEC_SET1 GGML_F32x8_SET1
|
|
||||||
#define GGML_F32_VEC_LOAD GGML_F32x8_LOAD
|
|
||||||
#define GGML_F32_VEC_STORE GGML_F32x8_STORE
|
|
||||||
#define GGML_F32_VEC_FMA GGML_F32x8_FMA
|
|
||||||
#define GGML_F32_VEC_ADD GGML_F32x8_ADD
|
|
||||||
#define GGML_F32_VEC_MUL GGML_F32x8_MUL
|
|
||||||
#define GGML_F32_VEC_REDUCE GGML_F32x8_REDUCE
|
|
||||||
|
|
||||||
// F16 LASX
|
|
||||||
|
|
||||||
#define GGML_F16_STEP 32
|
|
||||||
#define GGML_F16_EPR 8
|
|
||||||
|
|
||||||
// F16 arithmetic is not supported by LASX, so we use F32 instead
|
|
||||||
|
|
||||||
#define GGML_F32Cx8 __m256
|
|
||||||
#define GGML_F32Cx8_ZERO (__m256)__lasx_xvldi(0)
|
|
||||||
#define GGML_F32Cx8_SET1(x) (__m256)__lasx_xvreplgr2vr_w((x))
|
|
||||||
|
|
||||||
static inline __m256 __lasx_f32cx8_load(const ggml_fp16_t * x) {
|
|
||||||
__m256i a;
|
|
||||||
memcpy(&a, x, sizeof(ggml_fp16_t) * 8);
|
|
||||||
a = __lasx_xvpermi_d(a, 0 | (1 << 4));
|
|
||||||
return __lasx_xvfcvtl_s_h(a);
|
|
||||||
}
|
|
||||||
|
|
||||||
static inline void __lasx_f32cx8_store(ggml_fp16_t * x, __m256 y) {
|
|
||||||
__m256i a = __lasx_xvfcvt_h_s(y, y);
|
|
||||||
a = __lasx_xvpermi_d(a, 0 | (2 << 2));
|
|
||||||
memcpy(x, &a, sizeof(ggml_fp16_t) * 8);
|
|
||||||
}
|
|
||||||
#define GGML_F32Cx8_LOAD(x) __lasx_f32cx8_load(x)
|
|
||||||
#define GGML_F32Cx8_STORE(x, y) __lasx_f32cx8_store(x, y)
|
|
||||||
|
|
||||||
#define GGML_F32Cx8_FMA GGML_F32x8_FMA
|
|
||||||
#define GGML_F32Cx8_ADD __lasx_xvfadd_s
|
|
||||||
#define GGML_F32Cx8_MUL __lasx_xvfmul_s
|
|
||||||
#define GGML_F32Cx8_REDUCE GGML_F32x8_REDUCE
|
|
||||||
|
|
||||||
#define GGML_F16_VEC GGML_F32Cx8
|
|
||||||
#define GGML_F16_VEC_ZERO GGML_F32Cx8_ZERO
|
|
||||||
#define GGML_F16_VEC_SET1 GGML_F32Cx8_SET1
|
|
||||||
#define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
|
|
||||||
#define GGML_F16_VEC_STORE(p, r, i) GGML_F32Cx8_STORE(p, r[i])
|
|
||||||
#define GGML_F16_VEC_FMA GGML_F32Cx8_FMA
|
|
||||||
#define GGML_F16_VEC_ADD GGML_F32Cx8_ADD
|
|
||||||
#define GGML_F16_VEC_MUL GGML_F32Cx8_MUL
|
|
||||||
#define GGML_F16_VEC_REDUCE GGML_F32Cx8_REDUCE
|
|
||||||
|
|
||||||
#elif defined(__loongarch_sx)
|
|
||||||
|
|
||||||
#define GGML_SIMD
|
|
||||||
|
|
||||||
// F32 LSX
|
|
||||||
|
|
||||||
#define GGML_F32_STEP 32
|
|
||||||
#define GGML_F32_EPR 4
|
|
||||||
|
|
||||||
#define GGML_F32x4 __m128
|
|
||||||
#define GGML_F32x4_ZERO __lsx_vldi(0)
|
|
||||||
#define GGML_F32x4_SET1(x) __lsx_vinsgr2vr_w(__lsx_vldi(0),(x), 0)
|
|
||||||
#define GGML_F32x4_LOAD(x) __lsx_vld((x), 0)
|
|
||||||
#define GGML_F32x4_STORE(x, y) __lsx_vst((y), (x), 0)
#define GGML_F32x4_FMA(a, b, c) __lsx_vfmadd_s(b, c, a)
#define GGML_F32x4_ADD          __lsx_vfadd_s
#define GGML_F32x4_MUL          __lsx_vfmul_s
#define GGML_F32x4_REDUCE(res, x)                                         \
{                                                                         \
    int offset = GGML_F32_ARR >> 1;                                       \
    for (int i = 0; i < offset; ++i) {                                    \
        x[i] = __lsx_vfadd_s(x[i], x[offset + i]);                        \
    }                                                                     \
    offset >>= 1;                                                         \
    for (int i = 0; i < offset; ++i) {                                    \
        x[i] = __lsx_vfadd_s(x[i], x[offset + i]);                        \
    }                                                                     \
    offset >>= 1;                                                         \
    for (int i = 0; i < offset; ++i) {                                    \
        x[i] = __lsx_vfadd_s(x[i], x[offset + i]);                        \
    }                                                                     \
    __m128i tmp     = __lsx_vsrli_d((__m128i) x[0], 32);                  \
    tmp             = (__m128i) __lsx_vfadd_s((__m128) tmp, x[0]);        \
    tmp             = __lsx_vpickev_w(__lsx_vldi(0), tmp);                \
    const __m128 t0 = __lsx_vshuf4i_w(tmp, 0x88);                         \
    tmp             = __lsx_vsrli_d((__m128i) t0, 32);                    \
    tmp             = (__m128i) __lsx_vfadd_s((__m128) tmp, t0);          \
    tmp             = __lsx_vpickev_w(__lsx_vldi(0), tmp);                \
    res = (ggml_float) __lsx_vpickve2gr_w(__lsx_vshuf4i_w(tmp, 0x88), 0); \
}

#define GGML_F32_VEC        GGML_F32x4
#define GGML_F32_VEC_ZERO   GGML_F32x4_ZERO
#define GGML_F32_VEC_SET1   GGML_F32x4_SET1
#define GGML_F32_VEC_LOAD   GGML_F32x4_LOAD
#define GGML_F32_VEC_STORE  GGML_F32x4_STORE
#define GGML_F32_VEC_FMA    GGML_F32x4_FMA
#define GGML_F32_VEC_ADD    GGML_F32x4_ADD
#define GGML_F32_VEC_MUL    GGML_F32x4_MUL
#define GGML_F32_VEC_REDUCE GGML_F32x4_REDUCE

// F16 LSX

#define GGML_F16_STEP 32
#define GGML_F16_EPR  4

static inline __m128 __lsx_f16x4_load(const ggml_fp16_t * x) {
    float tmp[4];

    tmp[0] = GGML_FP16_TO_FP32(x[0]);
    tmp[1] = GGML_FP16_TO_FP32(x[1]);
    tmp[2] = GGML_FP16_TO_FP32(x[2]);
    tmp[3] = GGML_FP16_TO_FP32(x[3]);

    return __lsx_vld(tmp, 0);
}

static inline void __lsx_f16x4_store(ggml_fp16_t * x, __m128 y) {
    float arr[4];

    __lsx_vst(y, arr, 0);

    x[0] = GGML_FP32_TO_FP16(arr[0]);
    x[1] = GGML_FP32_TO_FP16(arr[1]);
    x[2] = GGML_FP32_TO_FP16(arr[2]);
    x[3] = GGML_FP32_TO_FP16(arr[3]);
}

#define GGML_F32Cx4             __m128
#define GGML_F32Cx4_ZERO        __lsx_vldi(0)
#define GGML_F32Cx4_SET1(x)     __lsx_vinsgr2vr_w(__lsx_vldi(0),(x), 0)
#define GGML_F32Cx4_LOAD(x)     __lsx_f16x4_load(x)
#define GGML_F32Cx4_STORE(x, y) __lsx_f16x4_store(x, y)
#define GGML_F32Cx4_FMA         GGML_F32x4_FMA
#define GGML_F32Cx4_ADD         __lsx_vfadd_s
#define GGML_F32Cx4_MUL         __lsx_vfmul_s
#define GGML_F32Cx4_REDUCE      GGML_F32x4_REDUCE

#define GGML_F16_VEC                GGML_F32Cx4
#define GGML_F16_VEC_ZERO           GGML_F32Cx4_ZERO
#define GGML_F16_VEC_SET1           GGML_F32Cx4_SET1
#define GGML_F16_VEC_LOAD(p, i)     GGML_F32Cx4_LOAD(p)
#define GGML_F16_VEC_STORE(p, r, i) GGML_F32Cx4_STORE(p, r[i])
#define GGML_F16_VEC_FMA            GGML_F32Cx4_FMA
#define GGML_F16_VEC_ADD            GGML_F32Cx4_ADD
#define GGML_F16_VEC_MUL            GGML_F32Cx4_MUL
#define GGML_F16_VEC_REDUCE         GGML_F32Cx4_REDUCE

#elif defined(__VXE__) || defined(__VXE2__)

#define GGML_SIMD

// F32 s390x

#define GGML_F32_STEP 32
#define GGML_F32_EPR  4

#define GGML_F32x4              __vector float
#define GGML_F32x4_ZERO         vec_splats(0.0f)
#define GGML_F32x4_SET1         vec_splats
#define GGML_F32x4_LOAD(p)      vec_xl(0, p)
#define GGML_F32x4_STORE(p, r)  vec_xst(r, 0, p)
#define GGML_F32x4_FMA(a, b, c) vec_madd(b, c, a)
#define GGML_F32x4_ADD          vec_add
#define GGML_F32x4_MUL          vec_mul
#define GGML_F32x4_REDUCE(res, x)                   \
{                                                   \
    int offset = GGML_F32_ARR >> 1;                 \
    for (int i = 0; i < offset; ++i) {              \
        x[i] = vec_add(x[i], x[offset + i]);        \
    }                                               \
    offset >>= 1;                                   \
    for (int i = 0; i < offset; ++i) {              \
        x[i] = vec_add(x[i], x[offset + i]);        \
    }                                               \
    offset >>= 1;                                   \
    for (int i = 0; i < offset; ++i) {              \
        x[i] = vec_add(x[i], x[offset + i]);        \
    }                                               \
    res = vec_extract(x[0], 0) +                    \
          vec_extract(x[0], 1) +                    \
          vec_extract(x[0], 2) +                    \
          vec_extract(x[0], 3);                     \
}

#define GGML_F32_VEC        GGML_F32x4
#define GGML_F32_VEC_ZERO   GGML_F32x4_ZERO
#define GGML_F32_VEC_SET1   GGML_F32x4_SET1
#define GGML_F32_VEC_LOAD   GGML_F32x4_LOAD
#define GGML_F32_VEC_STORE  GGML_F32x4_STORE
#define GGML_F32_VEC_FMA    GGML_F32x4_FMA
#define GGML_F32_VEC_ADD    GGML_F32x4_ADD
#define GGML_F32_VEC_MUL    GGML_F32x4_MUL
#define GGML_F32_VEC_REDUCE GGML_F32x4_REDUCE

// F16 s390x
#define GGML_F16_STEP GGML_F32_STEP
#define GGML_F16_EPR  GGML_F32_EPR

static inline __vector float __lzs_f16cx4_load(const ggml_fp16_t * x) {
    float tmp[4];

    for (int i = 0; i < 4; i++) {
        tmp[i] = GGML_FP16_TO_FP32(x[i]);
    }

    return vec_xl(0, tmp);
}

static inline void __lzs_f16cx4_store(ggml_fp16_t * x, __vector float y) {
    float arr[4];

    vec_xst(y, 0, arr);

    for (int i = 0; i < 4; i++) {
        x[i] = GGML_FP32_TO_FP16(arr[i]);
    }
}

#define GGML_F16_VEC                GGML_F32x4
#define GGML_F16_VEC_ZERO           GGML_F32x4_ZERO
#define GGML_F16_VEC_SET1           GGML_F32x4_SET1
#define GGML_F16_VEC_LOAD(p, i)     __lzs_f16cx4_load(p)
#define GGML_F16_VEC_STORE(p, r, i) __lzs_f16cx4_store(p, r[i])
#define GGML_F16_VEC_FMA            GGML_F32x4_FMA
#define GGML_F16_VEC_ADD            GGML_F32x4_ADD
#define GGML_F16_VEC_MUL            GGML_F32x4_MUL
#define GGML_F16_VEC_REDUCE         GGML_F32x4_REDUCE

#endif

// GGML_F32_ARR / GGML_F16_ARR
//   number of registers to use per step
#ifdef GGML_SIMD
#define GGML_F32_ARR (GGML_F32_STEP/GGML_F32_EPR)
#define GGML_F16_ARR (GGML_F16_STEP/GGML_F16_EPR)
#endif
#include "vec.h"

#include <cassert>

#if defined(_MSC_VER)
// disable "possible loss of data" to avoid hundreds of casts
// we should just be careful :)
#pragma warning(disable: 4244 4267)
#endif

// precomputed gelu table for f16 (128 KB)
ggml_fp16_t ggml_table_gelu_f16[1 << 16];

// precomputed quick gelu table for f16 (128 KB)
ggml_fp16_t ggml_table_gelu_quick_f16[1 << 16];

void ggml_vec_dot_f32(int n, float * GGML_RESTRICT s, size_t bs, const float * GGML_RESTRICT x, size_t bx, const float * GGML_RESTRICT y, size_t by, int nrc) {
    assert(nrc == 1);
    GGML_UNUSED(nrc);
    GGML_UNUSED(bx);
    GGML_UNUSED(by);
    GGML_UNUSED(bs);

#if defined(GGML_SIMD)
    float sumf = 0.0f;
    const int np = (n & ~(GGML_F32_STEP - 1));

    GGML_F32_VEC sum[GGML_F32_ARR] = { GGML_F32_VEC_ZERO };

    GGML_F32_VEC ax[GGML_F32_ARR];
    GGML_F32_VEC ay[GGML_F32_ARR];

    for (int i = 0; i < np; i += GGML_F32_STEP) {
        for (int j = 0; j < GGML_F32_ARR; j++) {
            ax[j] = GGML_F32_VEC_LOAD(x + i + j*GGML_F32_EPR);
            ay[j] = GGML_F32_VEC_LOAD(y + i + j*GGML_F32_EPR);

            sum[j] = GGML_F32_VEC_FMA(sum[j], ax[j], ay[j]);
        }
    }

    // reduce sum0..sum3 to sum0
    GGML_F32_VEC_REDUCE(sumf, sum);

    // leftovers
    for (int i = np; i < n; ++i) {
        sumf += x[i]*y[i];
    }
#else
    // scalar
    ggml_float sumf = 0.0;
    for (int i = 0; i < n; ++i) {
        sumf += (ggml_float)(x[i]*y[i]);
    }
#endif

    *s = sumf;
}

void ggml_vec_dot_bf16(int n, float * GGML_RESTRICT s, size_t bs, ggml_bf16_t * GGML_RESTRICT x, size_t bx, ggml_bf16_t * GGML_RESTRICT y, size_t by, int nrc) {
    assert(nrc == 1);
    GGML_UNUSED(nrc);
    GGML_UNUSED(bx);
    GGML_UNUSED(by);
    GGML_UNUSED(bs);
    int i = 0;
    ggml_float sumf = 0;

#if defined(__AVX512BF16__)
    __m512 c1 = _mm512_setzero_ps();
    __m512 c2 = _mm512_setzero_ps();
    for (; i + 64 <= n; i += 64) {
        c1 = _mm512_dpbf16_ps(c1, m512bh(_mm512_loadu_si512((x + i))),
                             m512bh(_mm512_loadu_si512((y + i))));
        c2 = _mm512_dpbf16_ps(c2, m512bh(_mm512_loadu_si512((x + i + 32))),
                             m512bh(_mm512_loadu_si512((y + i + 32))));
    }
    sumf += (ggml_float)_mm512_reduce_add_ps(c1);
    sumf += (ggml_float)_mm512_reduce_add_ps(c2);

#elif defined(__AVX512F__)
#define LOAD(p) _mm512_castsi512_ps(_mm512_slli_epi32(_mm512_cvtepu16_epi32(_mm256_loadu_si256((const __m256i *)(p))), 16))
    __m512 c1 = _mm512_setzero_ps();
    __m512 c2 = _mm512_setzero_ps();
    for (; i + 32 <= n; i += 32) {
        c1 = _mm512_add_ps(_mm512_mul_ps(LOAD(x + i), LOAD(y + i)), c1);
        c2 = _mm512_add_ps(_mm512_mul_ps(LOAD(x + i + 16), LOAD(y + i + 16)), c2);
    }
    sumf += (ggml_float)_mm512_reduce_add_ps(c1);
    sumf += (ggml_float)_mm512_reduce_add_ps(c2);

#undef LOAD
#elif defined(__AVX2__) || defined(__AVX__)
#if defined(__AVX2__)
#define LOAD(p) _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(_mm_loadu_si128((const __m128i *)(p))), 16))
#else
#define LOAD(p) _mm256_castsi256_ps(_mm256_insertf128_si256(_mm256_castsi128_si256(_mm_slli_epi32(_mm_cvtepu16_epi32(_mm_loadu_si128((const __m128i *)(p))), 16)), (_mm_slli_epi32(_mm_cvtepu16_epi32(_mm_bsrli_si128(_mm_loadu_si128((const __m128i *)(p)), 8)), 16)), 1))
#endif
    __m256 c1 = _mm256_setzero_ps();
    __m256 c2 = _mm256_setzero_ps();
    __m256 c3 = _mm256_setzero_ps();
    __m256 c4 = _mm256_setzero_ps();
    for (; i + 32 <= n; i += 32) {
        c1 = _mm256_add_ps(_mm256_mul_ps(LOAD(x + i), LOAD(y + i)), c1);
        c2 = _mm256_add_ps(_mm256_mul_ps(LOAD(x + i + 8), LOAD(y + i + 8)), c2);
        c3 = _mm256_add_ps(_mm256_mul_ps(LOAD(x + i + 16), LOAD(y + i + 16)), c3);
        c4 = _mm256_add_ps(_mm256_mul_ps(LOAD(x + i + 24), LOAD(y + i + 24)), c4);
    }
    __m128 g;
    c1 = _mm256_add_ps(_mm256_add_ps(c1, c3),
                       _mm256_add_ps(c2, c4));
    g = _mm_add_ps(_mm256_extractf128_ps(c1, 1),
                   _mm256_castps256_ps128(c1));
    g = _mm_add_ps(g, _mm_movehl_ps(g, g));
    g = _mm_add_ss(g, _mm_movehdup_ps(g));
    sumf += (ggml_float)_mm_cvtss_f32(g);

#undef LOAD
#endif

    for (; i < n; ++i) {
        sumf += (ggml_float)(GGML_BF16_TO_FP32(x[i]) *
                             GGML_BF16_TO_FP32(y[i]));
    }
    *s = sumf;
}

void ggml_vec_dot_f16(int n, float * GGML_RESTRICT s, size_t bs, ggml_fp16_t * GGML_RESTRICT x, size_t bx, ggml_fp16_t * GGML_RESTRICT y, size_t by, int nrc) {
    assert(nrc == 1);
    GGML_UNUSED(nrc);
    GGML_UNUSED(bx);
    GGML_UNUSED(by);
    GGML_UNUSED(bs);

    ggml_float sumf = 0.0;

#if defined(GGML_SIMD)
    const int np = (n & ~(GGML_F16_STEP - 1));

    GGML_F16_VEC sum[GGML_F16_ARR] = { GGML_F16_VEC_ZERO };

    GGML_F16_VEC ax[GGML_F16_ARR];
    GGML_F16_VEC ay[GGML_F16_ARR];

    for (int i = 0; i < np; i += GGML_F16_STEP) {
        for (int j = 0; j < GGML_F16_ARR; j++) {
            ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
            ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);

            sum[j] = GGML_F16_VEC_FMA(sum[j], ax[j], ay[j]);
        }
    }

    // reduce sum0..sum3 to sum0
    GGML_F16_VEC_REDUCE(sumf, sum);

    // leftovers
    for (int i = np; i < n; ++i) {
        sumf += (ggml_float)(GGML_FP16_TO_FP32(x[i])*GGML_FP16_TO_FP32(y[i]));
    }
#else
    for (int i = 0; i < n; ++i) {
        sumf += (ggml_float)(GGML_FP16_TO_FP32(x[i])*GGML_FP16_TO_FP32(y[i]));
    }
#endif

    *s = sumf;
}

void ggml_vec_silu_f32(const int n, float * y, const float * x) {
    int i = 0;
#if defined(__AVX512F__) && defined(__AVX512DQ__)
    for (; i + 15 < n; i += 16) {
        _mm512_storeu_ps(y + i, ggml_v_silu(_mm512_loadu_ps(x + i)));
    }
#elif defined(__AVX2__) && defined(__FMA__)
    for (; i + 7 < n; i += 8) {
        _mm256_storeu_ps(y + i, ggml_v_silu(_mm256_loadu_ps(x + i)));
    }
#elif defined(__SSE2__)
    for (; i + 3 < n; i += 4) {
        _mm_storeu_ps(y + i, ggml_v_silu(_mm_loadu_ps(x + i)));
    }
#elif defined(__ARM_NEON) && defined(__aarch64__)
    for (; i + 3 < n; i += 4) {
        vst1q_f32(y + i, ggml_v_silu(vld1q_f32(x + i)));
    }
#endif
    for (; i < n; ++i) {
        y[i] = ggml_silu_f32(x[i]);
    }
}

ggml_float ggml_vec_soft_max_f32(const int n, float * y, const float * x, float max) {
    int i = 0;
    ggml_float sum = 0;
#if defined(__AVX512F__) && defined(__AVX512DQ__)
    for (; i + 15 < n; i += 16) {
        __m512 val = ggml_v_expf(_mm512_sub_ps(_mm512_loadu_ps(x + i),
                                               _mm512_set1_ps(max)));
        _mm512_storeu_ps(y + i, val);
        sum += (ggml_float)_mm512_reduce_add_ps(val);
    }
#elif defined(__AVX2__) && defined(__FMA__)
    for (; i + 7 < n; i += 8) {
        __m256 val = ggml_v_expf(_mm256_sub_ps(_mm256_loadu_ps(x + i),
                                               _mm256_set1_ps(max)));
        _mm256_storeu_ps(y + i, val);
        __m128 val2 = _mm_add_ps(_mm256_extractf128_ps(val, 1),
                                 _mm256_castps256_ps128(val));
        val2 = _mm_add_ps(val2, _mm_movehl_ps(val2, val2));
        val2 = _mm_add_ss(val2, _mm_movehdup_ps(val2));
        sum += (ggml_float)_mm_cvtss_f32(val2);
    }
#elif defined(__SSE2__)
    for (; i + 3 < n; i += 4) {
        __m128 val = ggml_v_expf(_mm_sub_ps(_mm_loadu_ps(x + i),
                                            _mm_set1_ps(max)));
        _mm_storeu_ps(y + i, val);
#if defined(__AVX__) || defined(__AVX2__) || defined(__AVX512F__)
        val = _mm_add_ps(val, _mm_movehl_ps(val, val));
        val = _mm_add_ss(val, _mm_movehdup_ps(val));
#else
        __m128 tmp = _mm_shuffle_ps(val, val, _MM_SHUFFLE(2, 3, 0, 1));
        val = _mm_add_ps(val, tmp);
        tmp = _mm_movehl_ps(tmp, val);
        val = _mm_add_ss(val, tmp);
#endif
        sum += (ggml_float)_mm_cvtss_f32(val);
    }
#elif defined(__ARM_NEON) && defined(__aarch64__)
    for (; i + 3 < n; i += 4) {
        float32x4_t val = ggml_v_expf(vsubq_f32(vld1q_f32(x + i),
                                                vdupq_n_f32(max)));
        vst1q_f32(y + i, val);
        sum += (ggml_float)vaddvq_f32(val);
    }
#endif
    for (; i < n; ++i) {
        float val = expf(x[i] - max);
        sum += (ggml_float)val;
        y[i] = val;
    }
    return sum;
}

ggml_float ggml_vec_log_soft_max_f32(const int n, float * y, const float * x, float max) {
    // log(soft_max) = log(soft_max_i / soft_max_sum) = log(soft_max_i) - log(soft_max_sum) = (logit_i - max) - log(soft_max_sum)

    int i = 0;
    ggml_float sum = 0;
    for (; i < n; ++i) {
        float val = x[i] - max;
        y[i] = val;
        sum += (ggml_float)expf(val);
    }
    return sum = (ggml_float)logf(sum);
}
// Vectorized functions for fundamental operations

#pragma once

#include "ggml-impl.h"
#include "simd-mappings.h"
#include "ggml.h"

#if defined(GGML_USE_ACCELERATE)
#include <Accelerate/Accelerate.h>
#endif

// floating point type used to accumulate sums
typedef double ggml_float;

#define GGML_GELU_FP16
#define GGML_GELU_QUICK_FP16

#define GGML_SOFT_MAX_UNROLL 4
#define GGML_VEC_DOT_UNROLL  2
#define GGML_VEC_MAD_UNROLL  32

#ifdef __cplusplus
extern "C" {
#endif

//
// global data
//

// precomputed gelu table for f16 (128 KB)
extern ggml_fp16_t ggml_table_gelu_f16[1 << 16];

// precomputed quick gelu table for f16 (128 KB)
extern ggml_fp16_t ggml_table_gelu_quick_f16[1 << 16];

//
// fundamental operations
//

void ggml_vec_dot_f32(int n, float * GGML_RESTRICT s, size_t bs, const float * GGML_RESTRICT x, size_t bx, const float * GGML_RESTRICT y, size_t by, int nrc);
void ggml_vec_dot_bf16(int n, float * GGML_RESTRICT s, size_t bs, ggml_bf16_t * GGML_RESTRICT x, size_t bx, ggml_bf16_t * GGML_RESTRICT y, size_t by, int nrc);
void ggml_vec_dot_f16(int n, float * GGML_RESTRICT s, size_t bs, ggml_fp16_t * GGML_RESTRICT x, size_t bx, ggml_fp16_t * GGML_RESTRICT y, size_t by, int nrc);

void ggml_vec_silu_f32(const int n, float * y, const float * x);
ggml_float ggml_vec_soft_max_f32(const int n, float * y, const float * x, float max);
ggml_float ggml_vec_log_soft_max_f32(const int n, float * y, const float * x, float max);

inline static void ggml_vec_set_i8(const int n, int8_t * x, const int8_t v) { for (int i = 0; i < n; ++i) x[i] = v; }
inline static void ggml_vec_set_i16(const int n, int16_t * x, const int16_t v) { for (int i = 0; i < n; ++i) x[i] = v; }

inline static void ggml_vec_set_i32(const int n, int32_t * x, const int32_t v) { for (int i = 0; i < n; ++i) x[i] = v; }
inline static void ggml_vec_cpy_i32(const int n, int32_t * y, const int32_t * x) { for (int i = 0; i < n; ++i) y[i] = x[i]; }

inline static void ggml_vec_set_f16(const int n, ggml_fp16_t * x, const ggml_fp16_t v) { for (int i = 0; i < n; ++i) x[i] = v; }
inline static void ggml_vec_set_bf16(const int n, ggml_bf16_t * x, const ggml_bf16_t v) { for (int i = 0; i < n; ++i) x[i] = v; }
inline static void ggml_vec_add_f32 (const int n, float * z, const float * x, const float * y) { for (int i = 0; i < n; ++i) z[i] = x[i] + y[i]; }
inline static void ggml_vec_add_f16 (const int n, ggml_fp16_t * z, const ggml_fp16_t * x, const ggml_fp16_t * y) {
    for (int i = 0; i < n; ++i) {
        z[i] = GGML_FP32_TO_FP16(GGML_FP16_TO_FP32(x[i]) + GGML_FP16_TO_FP32(y[i]));
    }
}
inline static void ggml_vec_add1_f32(const int n, float * z, const float * x, const float v) { for (int i = 0; i < n; ++i) z[i] = x[i] + v; }
inline static void ggml_vec_acc_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] += x[i]; }
inline static void ggml_vec_acc1_f32(const int n, float * y, const float v) { for (int i = 0; i < n; ++i) y[i] += v; }
inline static void ggml_vec_sub_f32 (const int n, float * z, const float * x, const float * y) { for (int i = 0; i < n; ++i) z[i] = x[i] - y[i]; }
inline static void ggml_vec_sub_f16 (const int n, ggml_fp16_t * z, const ggml_fp16_t * x, const ggml_fp16_t * y) {
    for (int i = 0; i < n; ++i) {
        z[i] = GGML_FP32_TO_FP16(GGML_FP16_TO_FP32(x[i]) - GGML_FP16_TO_FP32(y[i]));
    }
}
inline static void ggml_vec_set_f32 (const int n, float * x, const float v) { for (int i = 0; i < n; ++i) x[i] = v; }
inline static void ggml_vec_cpy_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = x[i]; }
inline static void ggml_vec_neg_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = -x[i]; }
inline static void ggml_vec_neg_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(-GGML_FP16_TO_FP32(x[i]));
    }
}

inline static void ggml_vec_mul_f32 (const int n, float * z, const float * x, const float * y) { for (int i = 0; i < n; ++i) z[i] = x[i]*y[i]; }
inline static void ggml_vec_mul_f16 (const int n, ggml_fp16_t * z, const ggml_fp16_t * x, const ggml_fp16_t * y) {
    for (int i = 0; i < n; ++i) {
        z[i] = GGML_FP32_TO_FP16(GGML_FP16_TO_FP32(x[i]) * GGML_FP16_TO_FP32(y[i]));
    }
}
inline static void ggml_vec_div_f32 (const int n, float * z, const float * x, const float * y) { for (int i = 0; i < n; ++i) z[i] = x[i]/y[i]; }
inline static void ggml_vec_div_f16 (const int n, ggml_fp16_t * z, const ggml_fp16_t * x, const ggml_fp16_t * y) {
    for (int i = 0; i < n; ++i) {
        z[i] = GGML_FP32_TO_FP16(GGML_FP16_TO_FP32(x[i]) / GGML_FP16_TO_FP32(y[i]));
    }
}

// compute GGML_VEC_DOT_UNROLL dot products at once
// xs - x row stride in bytes
inline static void ggml_vec_dot_f16_unroll(const int n, const int xs, float * GGML_RESTRICT s, void * GGML_RESTRICT xv, ggml_fp16_t * GGML_RESTRICT y) {
    ggml_float sumf[GGML_VEC_DOT_UNROLL] = { 0.0 };

    ggml_fp16_t * GGML_RESTRICT x[GGML_VEC_DOT_UNROLL];

    for (int i = 0; i < GGML_VEC_DOT_UNROLL; ++i) {
        x[i] = (ggml_fp16_t *) ((char *) xv + i*xs);
    }

#if defined(GGML_SIMD)
    const int np = (n & ~(GGML_F16_STEP - 1));

    GGML_F16_VEC sum[GGML_VEC_DOT_UNROLL][GGML_F16_ARR] = { { GGML_F16_VEC_ZERO } };

    GGML_F16_VEC ax[GGML_F16_ARR];
    GGML_F16_VEC ay[GGML_F16_ARR];

    for (int i = 0; i < np; i += GGML_F16_STEP) {
        for (int j = 0; j < GGML_F16_ARR; j++) {
            ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);

            for (int k = 0; k < GGML_VEC_DOT_UNROLL; ++k) {
                ax[j] = GGML_F16_VEC_LOAD(x[k] + i + j*GGML_F16_EPR, j);

                sum[k][j] = GGML_F16_VEC_FMA(sum[k][j], ax[j], ay[j]);
            }
        }
    }

    // reduce sum0..sum3 to sum0
    for (int k = 0; k < GGML_VEC_DOT_UNROLL; ++k) {
        GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
    }

    // leftovers
    for (int i = np; i < n; ++i) {
        for (int j = 0; j < GGML_VEC_DOT_UNROLL; ++j) {
            sumf[j] += (ggml_float)(GGML_FP16_TO_FP32(x[j][i])*GGML_FP16_TO_FP32(y[i]));
        }
    }
#else
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < GGML_VEC_DOT_UNROLL; ++j) {
            sumf[j] += (ggml_float)(GGML_FP16_TO_FP32(x[j][i])*GGML_FP16_TO_FP32(y[i]));
        }
    }
#endif

    for (int i = 0; i < GGML_VEC_DOT_UNROLL; ++i) {
        s[i] = (float)sumf[i];
    }
}

inline static void ggml_vec_mad_f32(const int n, float * GGML_RESTRICT y, const float * GGML_RESTRICT x, const float v) {
#if defined(GGML_SIMD)
    const int np = (n & ~(GGML_F32_STEP - 1));

    GGML_F32_VEC vx = GGML_F32_VEC_SET1(v);

    GGML_F32_VEC ax[GGML_F32_ARR];
    GGML_F32_VEC ay[GGML_F32_ARR];

    for (int i = 0; i < np; i += GGML_F32_STEP) {
        for (int j = 0; j < GGML_F32_ARR; j++) {
            ax[j] = GGML_F32_VEC_LOAD(x + i + j*GGML_F32_EPR);
            ay[j] = GGML_F32_VEC_LOAD(y + i + j*GGML_F32_EPR);
            ay[j] = GGML_F32_VEC_FMA(ay[j], ax[j], vx);

            GGML_F32_VEC_STORE(y + i + j*GGML_F32_EPR, ay[j]);
        }
    }

    // leftovers
    for (int i = np; i < n; ++i) {
        y[i] += x[i]*v;
    }
#else
    // scalar
    for (int i = 0; i < n; ++i) {
        y[i] += x[i]*v;
    }
#endif
}

inline static void ggml_vec_mad_f16(const int n, ggml_fp16_t * GGML_RESTRICT y, const ggml_fp16_t * GGML_RESTRICT x, const float v) {
#if defined(GGML_SIMD)
    const int np = (n & ~(GGML_F16_STEP - 1));

    GGML_F16_VEC vx = GGML_F16_VEC_SET1(v);

    GGML_F16_VEC ax[GGML_F16_ARR];
    GGML_F16_VEC ay[GGML_F16_ARR];

    for (int i = 0; i < np; i += GGML_F16_STEP) {
        for (int j = 0; j < GGML_F16_ARR; j++) {
            ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j)
            ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
            ay[j] = GGML_F16_VEC_FMA(ay[j], ax[j], vx);

            GGML_F16_VEC_STORE(y + i + j*GGML_F16_EPR, ay, j);
        }
    }

    // leftovers
    for (int i = np; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(GGML_FP16_TO_FP32(y[i]) + GGML_FP16_TO_FP32(x[i])*v);
    }
#else
    // scalar
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(GGML_FP16_TO_FP32(y[i]) + GGML_FP16_TO_FP32(x[i])*v);
    }
#endif
}

// xs and vs are byte strides of x and v
inline static void ggml_vec_mad_f32_unroll(const int n, const int xs, const int vs, float * GGML_RESTRICT y, const float * GGML_RESTRICT xv, const float * GGML_RESTRICT vv) {

    const float * GGML_RESTRICT x[GGML_VEC_MAD_UNROLL];
    const float * GGML_RESTRICT v[GGML_VEC_MAD_UNROLL];

    for (int i = 0; i < GGML_VEC_MAD_UNROLL; ++i) {
        x[i] = (const float *) ((const char *) xv + i*xs);
        v[i] = (const float *) ((const char *) vv + i*vs);
    }

#if defined(GGML_SIMD)
    const int np = (n & ~(GGML_F32_STEP - 1));

    GGML_F32_VEC vx[GGML_VEC_MAD_UNROLL];

    for (int k = 0; k < GGML_VEC_MAD_UNROLL; ++k) {
        vx[k] = GGML_F32_VEC_SET1(v[k][0]);
    }

    GGML_F32_VEC ax[GGML_VEC_MAD_UNROLL][GGML_F32_ARR];
    GGML_F32_VEC ay[GGML_F32_ARR];

    for (int i = 0; i < np; i += GGML_F32_STEP) {
        for (int j = 0; j < GGML_F32_ARR; j++) {
            ay[j] = GGML_F32_VEC_LOAD(y + i + j*GGML_F32_EPR);

            for (int k = 0; k < GGML_VEC_MAD_UNROLL; ++k) {
                ax[k][j] = GGML_F32_VEC_LOAD(x[k] + i + j*GGML_F32_EPR);
                ay[j] = GGML_F32_VEC_FMA(ay[j], ax[k][j], vx[k]);
            }

            GGML_F32_VEC_STORE(y + i + j*GGML_F32_EPR, ay[j]);
        }
    }

    // leftovers
    for (int k = 0; k < GGML_VEC_MAD_UNROLL; ++k) {
        for (int i = np; i < n; ++i) {
            y[i] += x[k][i]*v[k][0];
        }
    }
#else
    // scalar
    for (int k = 0; k < GGML_VEC_MAD_UNROLL; ++k) {
        for (int i = 0; i < n; ++i) {
            y[i] += x[k][i]*v[k][0];
        }
    }
#endif
}

//inline static void ggml_vec_scale_f32(const int n, float * y, const float v) { for (int i = 0; i < n; ++i) y[i] *= v; }
inline static void ggml_vec_scale_f32(const int n, float * y, const float v) {
#if defined(GGML_USE_ACCELERATE)
    vDSP_vsmul(y, 1, &v, y, 1, n);
#elif defined(GGML_SIMD)
    const int np = (n & ~(GGML_F32_STEP - 1));

    GGML_F32_VEC vx = GGML_F32_VEC_SET1(v);

    GGML_F32_VEC ay[GGML_F32_ARR];

    for (int i = 0; i < np; i += GGML_F32_STEP) {
        for (int j = 0; j < GGML_F32_ARR; j++) {
            ay[j] = GGML_F32_VEC_LOAD(y + i + j*GGML_F32_EPR);
            ay[j] = GGML_F32_VEC_MUL(ay[j], vx);

            GGML_F32_VEC_STORE(y + i + j*GGML_F32_EPR, ay[j]);
        }
    }

    // leftovers
    for (int i = np; i < n; ++i) {
        y[i] *= v;
    }
#else
    // scalar
    for (int i = 0; i < n; ++i) {
        y[i] *= v;
    }
#endif
}

inline static void ggml_vec_scale_f16(const int n, ggml_fp16_t * y, const float v) {
#if defined(GGML_SIMD)
    const int np = (n & ~(GGML_F16_STEP - 1));

    GGML_F16_VEC vx = GGML_F16_VEC_SET1(v);

    GGML_F16_VEC ay[GGML_F16_ARR];

    for (int i = 0; i < np; i += GGML_F16_STEP) {
        for (int j = 0; j < GGML_F16_ARR; j++) {
            ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
            ay[j] = GGML_F16_VEC_MUL(ay[j], vx);

            GGML_F16_VEC_STORE(y + i + j*GGML_F16_EPR, ay, j);
        }
    }

    // leftovers
    for (int i = np; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(GGML_FP16_TO_FP32(y[i])*v);
    }
#else
    // scalar
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(GGML_FP16_TO_FP32(y[i])*v);
    }
#endif
}

inline static void ggml_vec_norm_f32 (const int n, float * s, const float * x) { ggml_vec_dot_f32(n, s, 0, x, 0, x, 0, 1); *s = sqrtf(*s); }
inline static void ggml_vec_sqr_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = x[i]*x[i]; }
inline static void ggml_vec_sqr_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        float v = GGML_FP16_TO_FP32(x[i]);
        y[i] = GGML_FP32_TO_FP16(v*v);
    }
}
inline static void ggml_vec_sqrt_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = sqrtf(x[i]); }
inline static void ggml_vec_sqrt_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(sqrtf(GGML_FP16_TO_FP32(x[i])));
    }
}
inline static void ggml_vec_log_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = logf(x[i]); }
inline static void ggml_vec_log_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(logf(GGML_FP16_TO_FP32(x[i])));
    }
}
inline static void ggml_vec_sin_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = sinf(x[i]); }
inline static void ggml_vec_sin_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(sinf(GGML_FP16_TO_FP32(x[i])));
    }
}
inline static void ggml_vec_cos_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = cosf(x[i]); }
inline static void ggml_vec_cos_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(cosf(GGML_FP16_TO_FP32(x[i])));
    }
}
inline static void ggml_vec_abs_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = fabsf(x[i]); }
inline static void ggml_vec_abs_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(fabsf(GGML_FP16_TO_FP32(x[i])));
    }
}
inline static void ggml_vec_sgn_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = (x[i] > 0.f) ? 1.f : ((x[i] < 0.f) ? -1.f : 0.f); }
inline static void ggml_vec_sgn_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        float v = GGML_FP16_TO_FP32(x[i]);
        y[i] = GGML_FP32_TO_FP16((v > 0.f) ? 1.f : ((v < 0.f) ? -1.f : 0.f));
    }
}
inline static void ggml_vec_step_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = (x[i] > 0.f) ? 1.f : 0.f; }
inline static void ggml_vec_step_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16((GGML_FP16_TO_FP32(x[i]) > 0.f) ? 1.f : 0.f);
    }
}
inline static void ggml_vec_tanh_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = tanhf(x[i]); }
inline static void ggml_vec_tanh_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(tanhf(GGML_FP16_TO_FP32(x[i])));
    }
}
inline static void ggml_vec_elu_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = (x[i] > 0.f) ? x[i] : expm1f(x[i]); }
inline static void ggml_vec_elu_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(expm1f(GGML_FP16_TO_FP32(x[i])));
    }
}
inline static void ggml_vec_relu_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = (x[i] > 0.f) ? x[i] : 0.f; }
inline static void ggml_vec_relu_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        float v = GGML_FP16_TO_FP32(x[i]);
        y[i] = GGML_FP32_TO_FP16((v > 0.f) ? v : 0.f);
    }
}
inline static void ggml_vec_leaky_relu_f32 (const int n, float * y, const float * x, const float ns) { for (int i = 0; i < n; ++i) y[i] = ((x[i] > 0.f) ? x[i] : 0.f) + ns * ((x[i] < 0.0f) ? x[i] : 0.f); }
inline static void ggml_vec_leaky_relu_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x, const float ns) {
    for (int i = 0; i < n; ++i) {
        float v = GGML_FP16_TO_FP32(x[i]);
        y[i] = GGML_FP32_TO_FP16(((v > 0.f) ? v : 0.f) + ns * ((v < 0.0f) ? v : 0.f));
    }
}
inline static void ggml_vec_sigmoid_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = 1.f / (1.f + expf(-x[i])); }
inline static void ggml_vec_sigmoid_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(1.f / (1.f + expf(-GGML_FP16_TO_FP32(x[i]))));
    }
}
// TODO: optimize performance
inline static void ggml_vec_hardswish_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = x[i] * fminf(1.0f, fmaxf(0.0f, (x[i] + 3.0f) / 6.0f)); }
inline static void ggml_vec_hardswish_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        float v = GGML_FP16_TO_FP32(x[i]);
        y[i] = GGML_FP32_TO_FP16(v * fminf(1.0f, fmaxf(0.0f, (v + 3.0f) / 6.0f)));
    }
}
inline static void ggml_vec_hardsigmoid_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = fminf(1.0f, fmaxf(0.0f, (x[i] + 3.0f) / 6.0f)); }
inline static void ggml_vec_hardsigmoid_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(fminf(1.0f, fmaxf(0.0f, (GGML_FP16_TO_FP32(x[i]) + 3.0f) / 6.0f)));
    }
}
inline static void ggml_vec_exp_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = expf(x[i]); }
inline static void ggml_vec_exp_f16 (const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = GGML_FP32_TO_FP16(expf(GGML_FP16_TO_FP32(x[i])));
    }
}

static const float GELU_COEF_A     = 0.044715f;
static const float GELU_QUICK_COEF = -1.702f;
static const float SQRT_2_OVER_PI  = 0.79788456080286535587989211986876f;

inline static float ggml_gelu_f32(float x) {
    return 0.5f*x*(1.0f + tanhf(SQRT_2_OVER_PI*x*(1.0f + GELU_COEF_A*x*x)));
}

inline static void ggml_vec_gelu_f16(const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    const uint16_t * i16 = (const uint16_t *) x;
    for (int i = 0; i < n; ++i) {
        y[i] = ggml_table_gelu_f16[i16[i]];
    }
}

#ifdef GGML_GELU_FP16
inline static void ggml_vec_gelu_f32(const int n, float * y, const float * x) {
    uint16_t t;
    for (int i = 0; i < n; ++i) {
        if (x[i] <= -10.0f) {
            y[i] = 0.0f;
        } else if (x[i] >= 10.0f) {
            y[i] = x[i];
        } else {
            ggml_fp16_t fp16 = GGML_FP32_TO_FP16(x[i]);
            memcpy(&t, &fp16, sizeof(uint16_t));
            y[i] = GGML_FP16_TO_FP32(ggml_table_gelu_f16[t]);
        }
    }
}
#else
inline static void ggml_vec_gelu_f32(const int n, float * y, const float * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = ggml_gelu_f32(x[i]);
    }
}
#endif

inline static float ggml_gelu_quick_f32(float x) {
    return x*(1.0f/(1.0f+expf(GELU_QUICK_COEF*x)));
}

//inline static void ggml_vec_gelu_quick_f16(const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
//    const uint16_t * i16 = (const uint16_t *) x;
//    for (int i = 0; i < n; ++i) {
//        y[i] = ggml_table_gelu_quick_f16[i16[i]];
//    }
//}

#ifdef GGML_GELU_QUICK_FP16
inline static void ggml_vec_gelu_quick_f32(const int n, float * y, const float * x) {
    uint16_t t;
    for (int i = 0; i < n; ++i) {
        ggml_fp16_t fp16 = GGML_FP32_TO_FP16(x[i]);
        memcpy(&t, &fp16, sizeof(uint16_t));
        y[i] = GGML_FP16_TO_FP32(ggml_table_gelu_quick_f16[t]);
    }
}
#else
inline static void ggml_vec_gelu_quick_f32(const int n, float * y, const float * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = ggml_gelu_quick_f32(x[i]);
    }
}
#endif

inline static void ggml_vec_gelu_quick_f16(const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        float v = GGML_FP16_TO_FP32(x[i]);
        y[i] = GGML_FP32_TO_FP16(v*(1.0f/(1.0f+expf(GELU_QUICK_COEF*v))));
    }
}

// Sigmoid Linear Unit (SiLU) function
inline static float ggml_silu_f32(float x) {
    return x/(1.0f + expf(-x));
}
inline static ggml_fp16_t ggml_silu_f16(ggml_fp16_t x) {
    float v = GGML_FP16_TO_FP32(x);
    return GGML_FP32_TO_FP16(v/(1.0f + expf(-v)));
}

#if __FINITE_MATH_ONLY__
#error "some routines in ggml.c require non-finite math arithmetics -- pass -fno-finite-math-only to the compiler to fix"
#error "ref: https://github.com/ggml-org/llama.cpp/pull/7154#issuecomment-2143844461"
#endif

#if defined(__ARM_NEON) && defined(__aarch64__)

// adapted from arm limited optimized routine
// the maximum error is 1.45358 plus 0.5 ulps
// numbers above 88.38 will flush to infinity
// numbers beneath -103.97 will flush to zero
inline static float32x4_t ggml_v_expf(float32x4_t x) {
    const float32x4_t r = vdupq_n_f32(0x1.8p23f);
    const float32x4_t z = vfmaq_f32(r, x, vdupq_n_f32(0x1.715476p+0f));
    const float32x4_t n = vsubq_f32(z, r);
    const float32x4_t b = vfmsq_f32(vfmsq_f32(x, n, vdupq_n_f32(0x1.62e4p-1f)), n,
                                    vdupq_n_f32(0x1.7f7d1cp-20f));
    const uint32x4_t e = vshlq_n_u32(vreinterpretq_u32_f32(z), 23);
    const float32x4_t k = vreinterpretq_f32_u32(vaddq_u32(e, vreinterpretq_u32_f32(vdupq_n_f32(1))));
    const uint32x4_t c = vcagtq_f32(n, vdupq_n_f32(126));
    const float32x4_t u = vmulq_f32(b, b);
    const float32x4_t j = vfmaq_f32(
        vmulq_f32(vdupq_n_f32(0x1.ffffecp-1f), b),
        vfmaq_f32(vfmaq_f32(vdupq_n_f32(0x1.fffdb6p-2f), vdupq_n_f32(0x1.555e66p-3f), b),
                  vfmaq_f32(vdupq_n_f32(0x1.573e2ep-5f), vdupq_n_f32(0x1.0e4020p-7f), b), u), u);
    if (!vpaddd_u64(vreinterpretq_u64_u32(c)))
        return vfmaq_f32(k, j, k);
    const uint32x4_t d = vandq_u32(vclezq_f32(n), vdupq_n_u32(0x82000000));
    const float32x4_t s1 = vreinterpretq_f32_u32(vaddq_u32(d, vdupq_n_u32(0x7f000000)));
    const float32x4_t s2 = vreinterpretq_f32_u32(vsubq_u32(e, d));
    return vbslq_f32(vcagtq_f32(n, vdupq_n_f32(192)), vmulq_f32(s1, s1),
                     vbslq_f32(c, vmulq_f32(vfmaq_f32(s2, s2, j), s1), vfmaq_f32(k, k, j)));
}

// computes silu x/(1+exp(-x)) in single precision vector
inline static float32x4_t ggml_v_silu(float32x4_t x) {
    const float32x4_t one = vdupq_n_f32(1.0f);
    const float32x4_t zero = vdupq_n_f32(0.0f);
    const float32x4_t neg_x = vsubq_f32(zero, x);
    const float32x4_t exp_neg_x = ggml_v_expf(neg_x);
    const float32x4_t one_plus_exp_neg_x = vaddq_f32(one, exp_neg_x);
    return vdivq_f32(x, one_plus_exp_neg_x);
}

#elif defined(__AVX512F__) && defined(__AVX512DQ__)

// adapted from arm limited optimized routine
// the maximum error is 1.45358 plus 0.5 ulps
// numbers above 88.38 will flush to infinity
// numbers beneath -103.97 will flush to zero
inline static __m512 ggml_v_expf(__m512 x) {
    const __m512 r = _mm512_set1_ps(0x1.8p23f);
    const __m512 z = _mm512_fmadd_ps(x, _mm512_set1_ps(0x1.715476p+0f), r);
    const __m512 n = _mm512_sub_ps(z, r);
    const __m512 b =
        _mm512_fnmadd_ps(n, _mm512_set1_ps(0x1.7f7d1cp-20f),
                         _mm512_fnmadd_ps(n, _mm512_set1_ps(0x1.62e4p-1f), x));
    const __mmask16 d =
        _mm512_cmp_ps_mask(_mm512_abs_ps(n), _mm512_set1_ps(192), _CMP_GT_OQ);
    const __m512 u = _mm512_mul_ps(b, b);
    const __m512 j = _mm512_fmadd_ps(
        _mm512_fmadd_ps(_mm512_fmadd_ps(_mm512_set1_ps(0x1.0e4020p-7f), b,
                                        _mm512_set1_ps(0x1.573e2ep-5f)),
                        u,
                        _mm512_fmadd_ps(_mm512_set1_ps(0x1.555e66p-3f), b,
                                        _mm512_set1_ps(0x1.fffdb6p-2f))),
        u,
        _mm512_fmadd_ps(_mm512_set1_ps(0x1.ffffecp-1f), b, _mm512_set1_ps(1.0F)));
    const __m512 res = _mm512_scalef_ps(j, n);
    if (_mm512_kortestz(d, d))
        return res;
    const __m512 zero = _mm512_setzero_ps();
    const __m512 alt = _mm512_mask_blend_ps(
        _mm512_cmp_ps_mask(n, zero, _CMP_LE_OQ), _mm512_set1_ps(INFINITY), zero);
    return _mm512_mask_blend_ps(d, res, alt);
}

// computes silu x/(1+exp(-x)) in single precision vector
inline static __m512 ggml_v_silu(__m512 x) {
    const __m512 one = _mm512_set1_ps(1);
    const __m512 zero = _mm512_setzero_ps();
    const __m512 neg_x = _mm512_sub_ps(zero, x);
    const __m512 exp_neg_x = ggml_v_expf(neg_x);
    const __m512 one_plus_exp_neg_x = _mm512_add_ps(one, exp_neg_x);
    return _mm512_div_ps(x, one_plus_exp_neg_x);
}

#elif defined(__AVX2__) && defined(__FMA__)

// adapted from arm limited optimized routine
// the maximum error is 1.45358 plus 0.5 ulps
// numbers above 88.38 will flush to infinity
// numbers beneath -103.97 will flush to zero
inline static __m256 ggml_v_expf(__m256 x) {
    const __m256 r = _mm256_set1_ps(0x1.8p23f);
    const __m256 z = _mm256_fmadd_ps(x, _mm256_set1_ps(0x1.715476p+0f), r);
    const __m256 n = _mm256_sub_ps(z, r);
    const __m256 b = _mm256_fnmadd_ps(n, _mm256_set1_ps(0x1.7f7d1cp-20f),
                                      _mm256_fnmadd_ps(n, _mm256_set1_ps(0x1.62e4p-1f), x));
    const __m256i e = _mm256_slli_epi32(_mm256_castps_si256(z), 23);
    const __m256 k = _mm256_castsi256_ps(
        _mm256_add_epi32(e, _mm256_castps_si256(_mm256_set1_ps(1))));
    const __m256i c = _mm256_castps_si256(
        _mm256_cmp_ps(_mm256_andnot_ps(_mm256_set1_ps(-0.f), n),
                      _mm256_set1_ps(126), _CMP_GT_OQ));
    const __m256 u = _mm256_mul_ps(b, b);
    const __m256 j = _mm256_fmadd_ps(_mm256_fmadd_ps(_mm256_fmadd_ps(_mm256_set1_ps(0x1.0e4020p-7f), b,
                                                                     _mm256_set1_ps(0x1.573e2ep-5f)), u,
                                                     _mm256_fmadd_ps(_mm256_set1_ps(0x1.555e66p-3f), b,
                                                                     _mm256_set1_ps(0x1.fffdb6p-2f))),
                                     u, _mm256_mul_ps(_mm256_set1_ps(0x1.ffffecp-1f), b));
    if (!_mm256_movemask_ps(_mm256_castsi256_ps(c)))
        return _mm256_fmadd_ps(j, k, k);
    const __m256i g = _mm256_and_si256(
        _mm256_castps_si256(_mm256_cmp_ps(n, _mm256_setzero_ps(), _CMP_LE_OQ)),
        _mm256_set1_epi32(0x82000000u));
    const __m256 s1 =
        _mm256_castsi256_ps(_mm256_add_epi32(g, _mm256_set1_epi32(0x7f000000u)));
    const __m256 s2 = _mm256_castsi256_ps(_mm256_sub_epi32(e, g));
    const __m256i d = _mm256_castps_si256(
        _mm256_cmp_ps(_mm256_andnot_ps(_mm256_set1_ps(-0.f), n),
                      _mm256_set1_ps(192), _CMP_GT_OQ));
    return _mm256_or_ps(
        _mm256_and_ps(_mm256_castsi256_ps(d), _mm256_mul_ps(s1, s1)),
        _mm256_andnot_ps(
            _mm256_castsi256_ps(d),
            _mm256_or_ps(
                _mm256_and_ps(_mm256_castsi256_ps(c),
                              _mm256_mul_ps(_mm256_fmadd_ps(s2, j, s2), s1)),
                _mm256_andnot_ps(_mm256_castsi256_ps(c), _mm256_fmadd_ps(k, j, k)))));
}

// computes silu x/(1+exp(-x)) in single precision vector
inline static __m256 ggml_v_silu(__m256 x) {
    const __m256 one = _mm256_set1_ps(1);
    const __m256 zero = _mm256_setzero_ps();
    const __m256 neg_x = _mm256_sub_ps(zero, x);
    const __m256 exp_neg_x = ggml_v_expf(neg_x);
    const __m256 one_plus_exp_neg_x = _mm256_add_ps(one, exp_neg_x);
    return _mm256_div_ps(x, one_plus_exp_neg_x);
}

#elif defined(__SSE2__) // __AVX2__ / __ARM_NEON

#if defined(__FMA__)
#define MADD128(x, y, z) _mm_fmadd_ps(x, y, z)
#define NMADD128(x, y, z) _mm_fnmadd_ps(x, y, z)
#else
#define MADD128(x, y, z) _mm_add_ps(_mm_mul_ps(x, y), z)
#define NMADD128(x, y, z) _mm_sub_ps(z, _mm_mul_ps(x, y))
#endif

// adapted from arm limited optimized routine
// the maximum error is 1.45358 plus 0.5 ulps
// numbers above 88.38 will flush to infinity
// numbers beneath -103.97 will flush to zero
inline static __m128 ggml_v_expf(__m128 x) {
    const __m128 r = _mm_set1_ps(0x1.8p23f);
    const __m128 z = MADD128(x, _mm_set1_ps(0x1.715476p+0f), r);
    const __m128 n = _mm_sub_ps(z, r);
    const __m128 b =
        NMADD128(n, _mm_set1_ps(0x1.7f7d1cp-20f), NMADD128(n, _mm_set1_ps(0x1.62e4p-1f), x));
    const __m128i e = _mm_slli_epi32(_mm_castps_si128(z), 23);
    const __m128 k = _mm_castsi128_ps(_mm_add_epi32(e, _mm_castps_si128(_mm_set1_ps(1))));
    const __m128i c =
        _mm_castps_si128(_mm_cmpgt_ps(_mm_andnot_ps(_mm_set1_ps(-0.f), n), _mm_set1_ps(126)));
    const __m128 u = _mm_mul_ps(b, b);
    const __m128 j =
        MADD128(MADD128(MADD128(_mm_set1_ps(0x1.0e4020p-7f), b, _mm_set1_ps(0x1.573e2ep-5f)), u,
                        MADD128(_mm_set1_ps(0x1.555e66p-3f), b, _mm_set1_ps(0x1.fffdb6p-2f))),
                u, _mm_mul_ps(_mm_set1_ps(0x1.ffffecp-1f), b));
    if (!_mm_movemask_epi8(c))
        return MADD128(j, k, k);
    const __m128i g = _mm_and_si128(_mm_castps_si128(_mm_cmple_ps(n, _mm_setzero_ps())),
                                    _mm_set1_epi32(0x82000000u));
    const __m128 s1 = _mm_castsi128_ps(_mm_add_epi32(g, _mm_set1_epi32(0x7f000000u)));
    const __m128 s2 = _mm_castsi128_ps(_mm_sub_epi32(e, g));
    const __m128i d =
        _mm_castps_si128(_mm_cmpgt_ps(_mm_andnot_ps(_mm_set1_ps(-0.f), n), _mm_set1_ps(192)));
    return _mm_or_ps(
        _mm_and_ps(_mm_castsi128_ps(d), _mm_mul_ps(s1, s1)),
        _mm_andnot_ps(_mm_castsi128_ps(d),
                      _mm_or_ps(_mm_and_ps(_mm_castsi128_ps(c), _mm_mul_ps(MADD128(s2, j, s2), s1)),
                                _mm_andnot_ps(_mm_castsi128_ps(c), MADD128(k, j, k)))));
}

// computes silu x/(1+exp(-x)) in single precision vector
inline static __m128 ggml_v_silu(__m128 x) {
    const __m128 one = _mm_set1_ps(1);
    const __m128 zero = _mm_setzero_ps();
    const __m128 neg_x = _mm_sub_ps(zero, x);
    const __m128 exp_neg_x = ggml_v_expf(neg_x);
    const __m128 one_plus_exp_neg_x = _mm_add_ps(one, exp_neg_x);
    return _mm_div_ps(x, one_plus_exp_neg_x);
}

#endif // __ARM_NEON / __AVX2__ / __SSE2__

inline static void ggml_vec_silu_f16(const int n, ggml_fp16_t * y, const ggml_fp16_t * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = ggml_silu_f16(x[i]);
}
|
|
||||||
}
|
|
||||||
|
|
||||||
inline static float ggml_silu_backward_f32(float x, float dy) {
|
|
||||||
const float s = 1.0f/(1.0f + expf(-x));
|
|
||||||
return dy*s*(1.0f + x*(1.0f - s));
|
|
||||||
}
|
|
||||||
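The backward helper above implements the analytic SiLU derivative, `dy * s * (1 + x * (1 - s))` with `s = 1/(1 + exp(-x))`. A quick numerical sketch (standalone Python mirror of `ggml_silu_backward_f32`, not the ggml code itself) checks it against a finite-difference approximation:

```python
import math

def silu(x):
    # silu(x) = x / (1 + exp(-x))
    return x / (1.0 + math.exp(-x))

def silu_backward(x, dy):
    # mirrors ggml_silu_backward_f32: dy * s * (1 + x * (1 - s))
    s = 1.0 / (1.0 + math.exp(-x))
    return dy * s * (1.0 + x * (1.0 - s))

# central finite difference agrees with the analytic form
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    h = 1e-6
    numeric = (silu(x + h) - silu(x - h)) / (2 * h)
    assert abs(numeric - silu_backward(x, 1.0)) < 1e-4
```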

inline static ggml_fp16_t ggml_silu_backward_f16(ggml_fp16_t x, ggml_fp16_t dy) {
    const float v = GGML_FP16_TO_FP32(x);
    const float s = 1.0f/(1.0f + expf(-v));
    return GGML_FP32_TO_FP16(GGML_FP16_TO_FP32(dy)*s*(1.0f + v*(1.0f - s)));
}

inline static void ggml_vec_silu_backward_f32(const int n, float * dx, const float * x, const float * dy) {
    for (int i = 0; i < n; ++i) {
        dx[i] = ggml_silu_backward_f32(x[i], dy[i]);
    }
}

inline static void ggml_vec_silu_backward_f16(const int n, ggml_fp16_t * dx, const ggml_fp16_t * x, const ggml_fp16_t * dy) {
    for (int i = 0; i < n; ++i) {
        dx[i] = ggml_silu_backward_f16(x[i], dy[i]);
    }
}

inline static void ggml_vec_sum_f32(const int n, float * s, const float * x) {
#ifndef GGML_USE_ACCELERATE
    ggml_float sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += (ggml_float)x[i];
    }
    *s = (float)sum;
#else
    vDSP_sve(x, 1, s, n);
#endif
}
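`ggml_vec_sum_f32` accumulates in `ggml_float` (double precision) and only truncates to `float` at the end, which avoids the drift of rounding to single precision after every addition. A small sketch of why that matters (Python, with a hypothetical `to_f32` rounding helper to simulate a C `float` accumulator):

```python
import struct

def to_f32(v):
    # round a Python double to the nearest float32, like storing into a C float
    return struct.unpack('f', struct.pack('f', v))[0]

x = to_f32(0.1)  # 0.1 is inexact in binary floating point
n = 100_000

acc64 = 0.0      # double accumulator, as with ggml_float
for _ in range(n):
    acc64 += x

acc32 = 0.0      # simulated float accumulator, rounded on every step
for _ in range(n):
    acc32 = to_f32(acc32 + x)

# the double accumulator stays much closer to the exact sum
assert abs(acc64 - n * x) < abs(acc32 - n * x)
```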

inline static void ggml_vec_sum_f32_ggf(const int n, ggml_float * s, const float * x) {
    ggml_float sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += (ggml_float)x[i];
    }
    *s = sum;
}

inline static void ggml_vec_sum_f16_ggf(const int n, float * s, const ggml_fp16_t * x) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += GGML_FP16_TO_FP32(x[i]);
    }
    *s = sum;
}

inline static void ggml_vec_sum_bf16_ggf(const int n, float * s, const ggml_bf16_t * x) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += GGML_BF16_TO_FP32(x[i]);
    }
    *s = sum;
}

inline static void ggml_vec_max_f32(const int n, float * s, const float * x) {
#ifndef GGML_USE_ACCELERATE
    float max = -INFINITY;
    for (int i = 0; i < n; ++i) {
        max = MAX(max, x[i]);
    }
    *s = max;
#else
    vDSP_maxv(x, 1, s, n);
#endif
}

inline static void ggml_vec_norm_inv_f32(const int n, float * s, const float * x) {
    ggml_vec_norm_f32(n, s, x);
    *s = 1.f/(*s);
}

inline static void ggml_vec_argmax_f32(const int n, int * s, const float * x) {
    float max = -INFINITY;
    int idx = 0;
    for (int i = 0; i < n; ++i) {
        max = MAX(max, x[i]);
        if (max == x[i]) { idx = i; }
    }
    *s = idx;
}
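`ggml_vec_argmax_f32` updates `idx` whenever the running maximum equals the current element, so on ties the *last* maximum wins. A standalone sketch of that semantics (Python, illustrative only, not the ggml code):

```python
def argmax_last(xs):
    # mirrors ggml_vec_argmax_f32: max = MAX(max, x[i]); if (max == x[i]) idx = i;
    best = float("-inf")
    idx = 0
    for i, v in enumerate(xs):
        best = max(best, v)
        if best == v:
            idx = i
    return idx

assert argmax_last([1.0, 3.0, 2.0, 3.0]) == 3  # the last of the tied maxima
```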
#ifdef __cplusplus
}
#endif
@@ -25,6 +25,7 @@ You can now use it like this:

 `ggml` models are available from the following locations:

 - https://huggingface.co/ggerganov/whisper.cpp/tree/main
+- https://ggml.ggerganov.com

 ### 3. Convert with [convert-pt-to-ggml.py](convert-pt-to-ggml.py)

@@ -77,7 +78,7 @@ OpenAI format. To read the HF models you can use the [convert-h5-to-ggml.py](con

 ```bash
 git clone https://github.com/openai/whisper
-git clone https://github.com/ggml-org/whisper.cpp
+git clone https://github.com/ggerganov/whisper.cpp

 # clone HF fine-tuned model (this is just an example)
 git clone https://huggingface.co/openai/whisper-medium
@@ -95,7 +96,7 @@ Currently, the chunk-based transcription strategy is not implemented, so there c

 ```bash
 # clone OpenAI whisper and whisper.cpp
 git clone https://github.com/openai/whisper
-git clone https://github.com/ggml-org/whisper.cpp
+git clone https://github.com/ggerganov/whisper.cpp

 # get the models
 cd whisper.cpp/models
@@ -3,7 +3,7 @@
 # Usage:
 #
 #   git clone https://github.com/openai/whisper
-#   git clone https://github.com/ggml-org/whisper.cpp
+#   git clone https://github.com/ggerganov/whisper.cpp
 #   git clone https://huggingface.co/openai/whisper-medium
 #
 #   python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium/ ./whisper .
@@ -12,7 +12,7 @@
 #
 # For more info:
 #
-#   https://github.com/ggml-org/whisper.cpp/issues/157
+#   https://github.com/ggerganov/whisper.cpp/issues/157
 #

 import io
@@ -254,10 +254,10 @@ def convert_encoder(hparams, model, quantize=False):

     model = ct.convert(
         traced_model,
-        convert_to="mlprogram",
+        convert_to="neuralnetwork",
         inputs=[ct.TensorType(name="logmel_data", shape=input_shape)],
         outputs=[ct.TensorType(name="output")],
-        compute_units=ct.ComputeUnit.ALL,
+        compute_units=ct.ComputeUnit.ALL
     )

     if quantize:
@@ -278,11 +278,11 @@ def convert_decoder(hparams, model, quantize=False):

     model = ct.convert(
         traced_model,
-        convert_to="mlprogram",
+        convert_to="neuralnetwork",
         inputs=[
             ct.TensorType(name="token_data", shape=tokens_shape, dtype=int),
             ct.TensorType(name="audio_data", shape=audio_shape)
-        ],
+        ]
     )

     if quantize:
@@ -1 +1 @@
-d920dfd7da37b22d1eb0813cdaf340c1870d76c3
+7d7aa2dee2eb55dc683af80b769b81a0642226a1
@@ -5527,13 +5527,11 @@ int whisper_full_with_state(
     const int seek_start = params.offset_ms/10;
     const int seek_end = params.duration_ms == 0 ? whisper_n_len_from_state(state) : seek_start + params.duration_ms/10;

-    // if length of spectrogram is less than 100ms (10 frames), then return
-    // basically don't process anything that is less than 100ms
-    // ref: https://github.com/ggml-org/whisper.cpp/issues/2065
-    const int delta_min = 10;
-
-    if (seek_end < seek_start + delta_min) {
-        WHISPER_LOG_WARN("%s: input is too short - %d ms < 100 ms. consider padding the input audio with silence\n", __func__, (seek_end - seek_start)*10);
+    // if length of spectrogram is less than 1.0s (100 frames), then return
+    // basically don't process anything that is less than 1.0s
+    // see issue #39: https://github.com/ggerganov/whisper.cpp/issues/39
+    if (seek_end < seek_start + 100) {
+        WHISPER_LOG_WARN("%s: input is too short - %d ms < 1000 ms. consider padding the input audio with silence\n", __func__, (seek_end - seek_start)*10);
         return 0;
     }

@@ -5677,8 +5675,8 @@ int whisper_full_with_state(
                     ctx, state, progress_cur, params.progress_callback_user_data);
             }

-            // if only 100ms left, then stop
-            if (seek + delta_min >= seek_end) {
+            // if only 1 second left, then stop
+            if (seek + 100 >= seek_end) {
                 break;
             }

@@ -6025,10 +6023,10 @@ int whisper_full_with_state(
                     // end of segment
                     if (token.id == whisper_token_eot(ctx) ||                // end of text token
                         (params.max_tokens > 0 && i >= params.max_tokens) || // max tokens per segment reached
-                        (has_ts && seek + seek_delta + delta_min >= seek_end) // end of audio reached (100ms)
+                        (has_ts && seek + seek_delta + 100 >= seek_end)      // end of audio reached
                         ) {
                         if (result_len == 0 && !params.no_timestamps) {
-                            if (seek + seek_delta + delta_min >= seek_end) {
+                            if (seek + seek_delta + 100 >= seek_end) {
                                 result_len = i + 1;
                             } else {
                                 WHISPER_LOG_DEBUG("%s: decoder %d failed (result_len = 0)\n", __func__, j);
@@ -6377,7 +6375,7 @@ int whisper_full_with_state(
                 }
             }

-            // ref: https://github.com/ggml-org/whisper.cpp/pull/2629
+            // ref: https://github.com/ggerganov/whisper.cpp/pull/2629
             const bool single_timestamp_ending = tokens_cur.size() > 1 &&
                 tokens_cur[tokens_cur.size() - 2].id < whisper_token_beg(ctx) &&
                 tokens_cur[tokens_cur.size() - 1].id > whisper_token_beg(ctx);
tests/librispeech/.gitignore vendored
@@ -1,6 +0,0 @@
-__pycache__
-*.tar.gz
-*.txt
-eval.conf
-venv
-LibriSpeech
@@ -1,15 +0,0 @@
-TAR_URL = https://www.openslr.org/resources/12/test-clean.tar.gz
-
-all: eval
-
-eval:
-	$(MAKE) -f eval.mk
-
-clean:
-	$(MAKE) -f eval.mk clean
-
-get-audio:
-	wget -c $(TAR_URL)
-	tar -xf test-clean.tar.gz
-
-.PHONY: all eval clean setup-venv clean-venv get-audio
@@ -1,60 +0,0 @@
-# whisper.cpp/tests/librispeech
-
-[LibriSpeech](https://www.openslr.org/12) is a standard dataset for
-training and evaluating automatic speech recognition systems.
-
-This directory contains a set of tools to evaluate the recognition
-performance of whisper.cpp on LibriSpeech corpus.
-
-## Quick Start
-
-1. (Pre-requirement) Compile `whisper-cli` and prepare the Whisper
-   model in `ggml` format.
-
-   ```
-   $ # Execute the commands below in the project root dir.
-   $ cmake -B build
-   $ cmake --build build --config Release
-   $ ./models/download-ggml-model.sh tiny
-   ```
-
-   Consult [whisper.cpp/README.md](../../README.md) for more details.
-
-2. Download the audio files from LibriSpeech project.
-
-   ```
-   $ make get-audio
-   ```
-
-3. Set up the environment to compute WER score.
-
-   ```
-   $ pip install -r requirements.txt
-   ```
-
-   For example, if you use `virtualenv`, you can set up it as follows:
-
-   ```
-   $ python3 -m venv venv
-   $ . venv/bin/activate
-   $ pip install -r requirements.txt
-   ```
-
-4. Run the benchmark test.
-
-   ```
-   $ make
-   ```
-
-## How-to guides
-
-### How to change the inference parameters
-
-Create `eval.conf` and override variables.
-
-```
-WHISPER_MODEL = large-v3-turbo
-WHISPER_FLAGS = --no-prints --threads 8 --language en --output-txt
-```
-
-Check out `eval.mk` for more details.
@@ -1,39 +0,0 @@
-PYTHON = python
-
-WHISPER_PREFIX = ../../
-WHISPER_MODEL = tiny
-
-WHISPER_CLI = $(WHISPER_PREFIX)build/bin/whisper-cli
-WHISPER_FLAGS = --no-prints --language en --output-txt
-
-# You can create eval.conf to override the WHISPER_* variables
-# defined above.
--include eval.conf
-
-# This follows the file structure of the LibriSpeech project.
-AUDIO_SRCS = $(sort $(wildcard LibriSpeech/*/*/*/*.flac))
-TRANS_TXTS = $(addsuffix .txt, $(AUDIO_SRCS))
-
-# We output the evaluation result to this file.
-DONE = $(WHISPER_MODEL).txt
-
-all: $(DONE)
-
-$(DONE): $(TRANS_TXTS)
-	$(PYTHON) eval.py > $@.tmp
-	mv $@.tmp $@
-
-# Note: This task writes to a temporary file first to
-# create the target file atomically.
-%.flac.txt: %.flac
-	$(WHISPER_CLI) $(WHISPER_FLAGS) --model $(WHISPER_PREFIX)models/ggml-$(WHISPER_MODEL).bin --file $^ --output-file $^.tmp
-	mv $^.tmp.txt $^.txt
-
-archive:
-	tar -czf $(WHISPER_MODEL).tar.gz --exclude="*.flac" LibriSpeech $(DONE)
-
-clean:
-	@rm -f $(TRANS_TXTS)
-	@rm -f $(DONE)
-
-.PHONY: all clean
@@ -1,47 +0,0 @@
-import os
-import glob
-import jiwer
-from normalizers import EnglishTextNormalizer
-
-def get_reference():
-    ref = {}
-    for path in glob.glob('LibriSpeech/*/*/*/*.trans.txt'):
-        with open(path) as fp:
-            for line in fp:
-                code, text = line.strip().split(" ", maxsplit=1)
-                ref[code] = text
-    return ref
-
-def get_hypothesis():
-    hyp = {}
-    for path in glob.glob('LibriSpeech/*/*/*/*.flac.txt'):
-        with open(path) as fp:
-            text = fp.read().strip()
-        code = os.path.basename(path).replace('.flac.txt', '')
-        hyp[code] = text
-    return hyp
-
-def get_codes():
-    codes = []
-    for path in glob.glob('LibriSpeech/*/*/*/*.flac'):
-        codes.append(os.path.basename(path).replace('.flac', ''))
-    return sorted(codes)
-
-def main():
-    normalizer = EnglishTextNormalizer()
-
-    ref_orig = get_reference()
-    hyp_orig = get_hypothesis()
-
-    ref_clean = []
-    hyp_clean = []
-
-    for code in get_codes():
-        ref_clean.append(normalizer(ref_orig[code]))
-        hyp_clean.append(normalizer(hyp_orig[code]))
-
-    wer = jiwer.wer(ref_clean, hyp_clean)
-    print(f"WER: {wer * 100:.2f}%")
-
-if __name__ == '__main__':
-    main()
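`eval.py` delegates the WER computation to `jiwer.wer`. For reference, word error rate is the word-level edit distance divided by the reference length; a minimal self-contained version (not jiwer's implementation) looks like:

```python
def wer(ref_words, hyp_words):
    # word error rate = Levenshtein distance over words / reference length
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref_words)

assert wer("the quick brown fox".split(), "the quick brown dog".split()) == 0.25
```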
@@ -1,25 +0,0 @@
-Code in this directory is adapted from OpenAI Whisper project
-(https://github.com/openai/whisper) and carries the following
-copyright and license.
-
-MIT License
-
-Copyright (c) 2022 OpenAI
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
@@ -1,2 +0,0 @@
-from .basic import BasicTextNormalizer as BasicTextNormalizer
-from .english import EnglishTextNormalizer as EnglishTextNormalizer
@@ -1,80 +0,0 @@
-import re
-import unicodedata
-
-import regex
-
-# non-ASCII letters that are not separated by "NFKD" normalization
-ADDITIONAL_DIACRITICS = {
-    "œ": "oe",
-    "Œ": "OE",
-    "ø": "o",
-    "Ø": "O",
-    "æ": "ae",
-    "Æ": "AE",
-    "ß": "ss",
-    "ẞ": "SS",
-    "đ": "d",
-    "Đ": "D",
-    "ð": "d",
-    "Ð": "D",
-    "þ": "th",
-    "Þ": "th",
-    "ł": "l",
-    "Ł": "L",
-}
-
-
-def remove_symbols_and_diacritics(s: str, keep=""):
-    """
-    Replace any other markers, symbols, and punctuations with a space,
-    and drop any diacritics (category 'Mn' and some manual mappings)
-    """
-    return "".join(
-        (
-            c
-            if c in keep
-            else (
-                ADDITIONAL_DIACRITICS[c]
-                if c in ADDITIONAL_DIACRITICS
-                else (
-                    ""
-                    if unicodedata.category(c) == "Mn"
-                    else " " if unicodedata.category(c)[0] in "MSP" else c
-                )
-            )
-        )
-        for c in unicodedata.normalize("NFKD", s)
-    )
-
-
-def remove_symbols(s: str):
-    """
-    Replace any other markers, symbols, punctuations with a space, keeping diacritics
-    """
-    return "".join(
-        " " if unicodedata.category(c)[0] in "MSP" else c
-        for c in unicodedata.normalize("NFKC", s)
-    )
-
-
-class BasicTextNormalizer:
-    def __init__(self, remove_diacritics: bool = False, split_letters: bool = False):
-        self.clean = (
-            remove_symbols_and_diacritics if remove_diacritics else remove_symbols
-        )
-        self.split_letters = split_letters
-
-    def __call__(self, s: str):
-        s = s.lower()
-        s = re.sub(r"[<\[][^>\]]*[>\]]", "", s)  # remove words between brackets
-        s = re.sub(r"\(([^)]+?)\)", "", s)  # remove words between parenthesis
-        s = self.clean(s).lower()
-
-        if self.split_letters:
-            s = " ".join(regex.findall(r"\X", s, regex.U))
-
-        s = re.sub(
-            r"\s+", " ", s
-        )  # replace any successive whitespace characters with a space
-
-        return s
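`BasicTextNormalizer` above lowercases the text, strips bracketed and parenthesized spans, replaces Unicode marks/symbols/punctuation with spaces, and squeezes whitespace. A self-contained mini version of the same pipeline (standard library only, not the deleted module itself):

```python
import re
import unicodedata

def remove_symbols(s):
    # as in basic.py: replace marks/symbols/punctuation (categories M, S, P) with a space
    return "".join(
        " " if unicodedata.category(c)[0] in "MSP" else c
        for c in unicodedata.normalize("NFKC", s)
    )

def normalize(s):
    s = s.lower()
    s = re.sub(r"[<\[][^>\]]*[>\]]", "", s)  # drop [bracketed] and <tagged> spans
    s = re.sub(r"\(([^)]+?)\)", "", s)       # drop (parenthesized) spans
    s = remove_symbols(s).lower()
    s = re.sub(r"\s+", " ", s)               # squeeze runs of whitespace
    return s

assert normalize("Hello, [NOISE] World!").strip() == "hello world"
```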
File diff suppressed because it is too large
@@ -1,550 +0,0 @@
-import json
-import os
-import re
-from fractions import Fraction
-from typing import Iterator, List, Match, Optional, Union
-
-from more_itertools import windowed
-
-from .basic import remove_symbols_and_diacritics
-
-
-class EnglishNumberNormalizer:
-    """
-    Convert any spelled-out numbers into arabic numbers, while handling:
-
-    - remove any commas
-    - keep the suffixes such as: `1960s`, `274th`, `32nd`, etc.
-    - spell out currency symbols after the number. e.g. `$20 million` -> `20000000 dollars`
-    - spell out `one` and `ones`
-    - interpret successive single-digit numbers as nominal: `one oh one` -> `101`
-    """
-
-    def __init__(self):
-        super().__init__()
-
-        self.zeros = {"o", "oh", "zero"}
-        self.ones = {
-            name: i
-            for i, name in enumerate(
-                [
-                    "one",
-                    "two",
-                    "three",
-                    "four",
-                    "five",
-                    "six",
-                    "seven",
-                    "eight",
-                    "nine",
-                    "ten",
-                    "eleven",
-                    "twelve",
-                    "thirteen",
-                    "fourteen",
-                    "fifteen",
-                    "sixteen",
-                    "seventeen",
-                    "eighteen",
-                    "nineteen",
-                ],
-                start=1,
-            )
-        }
-        self.ones_plural = {
-            "sixes" if name == "six" else name + "s": (value, "s")
-            for name, value in self.ones.items()
-        }
-        self.ones_ordinal = {
-            "zeroth": (0, "th"),
-            "first": (1, "st"),
-            "second": (2, "nd"),
-            "third": (3, "rd"),
-            "fifth": (5, "th"),
-            "twelfth": (12, "th"),
-            **{
-                name + ("h" if name.endswith("t") else "th"): (value, "th")
-                for name, value in self.ones.items()
-                if value > 3 and value != 5 and value != 12
-            },
-        }
-        self.ones_suffixed = {**self.ones_plural, **self.ones_ordinal}
-
-        self.tens = {
-            "twenty": 20,
-            "thirty": 30,
-            "forty": 40,
-            "fifty": 50,
-            "sixty": 60,
-            "seventy": 70,
-            "eighty": 80,
-            "ninety": 90,
-        }
-        self.tens_plural = {
-            name.replace("y", "ies"): (value, "s") for name, value in self.tens.items()
-        }
-        self.tens_ordinal = {
-            name.replace("y", "ieth"): (value, "th")
-            for name, value in self.tens.items()
-        }
-        self.tens_suffixed = {**self.tens_plural, **self.tens_ordinal}
-
-        self.multipliers = {
-            "hundred": 100,
-            "thousand": 1_000,
-            "million": 1_000_000,
-            "billion": 1_000_000_000,
-            "trillion": 1_000_000_000_000,
-            "quadrillion": 1_000_000_000_000_000,
-            "quintillion": 1_000_000_000_000_000_000,
-            "sextillion": 1_000_000_000_000_000_000_000,
-            "septillion": 1_000_000_000_000_000_000_000_000,
-            "octillion": 1_000_000_000_000_000_000_000_000_000,
-            "nonillion": 1_000_000_000_000_000_000_000_000_000_000,
-            "decillion": 1_000_000_000_000_000_000_000_000_000_000_000,
-        }
-        self.multipliers_plural = {
-            name + "s": (value, "s") for name, value in self.multipliers.items()
-        }
-        self.multipliers_ordinal = {
-            name + "th": (value, "th") for name, value in self.multipliers.items()
-        }
-        self.multipliers_suffixed = {
-            **self.multipliers_plural,
-            **self.multipliers_ordinal,
-        }
-        self.decimals = {*self.ones, *self.tens, *self.zeros}
-
-        self.preceding_prefixers = {
-            "minus": "-",
-            "negative": "-",
-            "plus": "+",
-            "positive": "+",
-        }
-        self.following_prefixers = {
-            "pound": "£",
-            "pounds": "£",
-            "euro": "€",
-            "euros": "€",
-            "dollar": "$",
-            "dollars": "$",
-            "cent": "¢",
-            "cents": "¢",
-        }
-        self.prefixes = set(
-            list(self.preceding_prefixers.values())
-            + list(self.following_prefixers.values())
-        )
-        self.suffixers = {
-            "per": {"cent": "%"},
-            "percent": "%",
-        }
-        self.specials = {"and", "double", "triple", "point"}
-
-        self.words = set(
-            [
-                key
-                for mapping in [
-                    self.zeros,
-                    self.ones,
-                    self.ones_suffixed,
-                    self.tens,
-                    self.tens_suffixed,
-                    self.multipliers,
-                    self.multipliers_suffixed,
-                    self.preceding_prefixers,
-                    self.following_prefixers,
-                    self.suffixers,
-                    self.specials,
-                ]
-                for key in mapping
-            ]
-        )
-        self.literal_words = {"one", "ones"}
-
-    def process_words(self, words: List[str]) -> Iterator[str]:
-        prefix: Optional[str] = None
-        value: Optional[Union[str, int]] = None
-        skip = False
-
-        def to_fraction(s: str):
-            try:
-                return Fraction(s)
-            except ValueError:
-                return None
-
-        def output(result: Union[str, int]):
-            nonlocal prefix, value
-            result = str(result)
-            if prefix is not None:
-                result = prefix + result
-            value = None
-            prefix = None
-            return result
-
-        if len(words) == 0:
-            return
-
-        for prev, current, next in windowed([None] + words + [None], 3):
-            if skip:
-                skip = False
-                continue
-
-            next_is_numeric = next is not None and re.match(r"^\d+(\.\d+)?$", next)
-            has_prefix = current[0] in self.prefixes
-            current_without_prefix = current[1:] if has_prefix else current
-            if re.match(r"^\d+(\.\d+)?$", current_without_prefix):
-                # arabic numbers (potentially with signs and fractions)
-                f = to_fraction(current_without_prefix)
-                assert f is not None
-                if value is not None:
-                    if isinstance(value, str) and value.endswith("."):
-                        # concatenate decimals / ip address components
-                        value = str(value) + str(current)
-                        continue
-                    else:
-                        yield output(value)
-
-                prefix = current[0] if has_prefix else prefix
-                if f.denominator == 1:
-                    value = f.numerator  # store integers as int
-                else:
-                    value = current_without_prefix
-            elif current not in self.words:
-                # non-numeric words
-                if value is not None:
-                    yield output(value)
-                yield output(current)
-            elif current in self.zeros:
-                value = str(value or "") + "0"
-            elif current in self.ones:
-                ones = self.ones[current]
-
-                if value is None:
-                    value = ones
-                elif isinstance(value, str) or prev in self.ones:
-                    if (
-                        prev in self.tens and ones < 10
-                    ):  # replace the last zero with the digit
-                        assert value[-1] == "0"
-                        value = value[:-1] + str(ones)
-                    else:
-                        value = str(value) + str(ones)
-                elif ones < 10:
-                    if value % 10 == 0:
-                        value += ones
-                    else:
-                        value = str(value) + str(ones)
-                else:  # eleven to nineteen
-                    if value % 100 == 0:
-                        value += ones
-                    else:
-                        value = str(value) + str(ones)
-            elif current in self.ones_suffixed:
-                # ordinal or cardinal; yield the number right away
-                ones, suffix = self.ones_suffixed[current]
-                if value is None:
-                    yield output(str(ones) + suffix)
-                elif isinstance(value, str) or prev in self.ones:
-                    if prev in self.tens and ones < 10:
-                        assert value[-1] == "0"
-                        yield output(value[:-1] + str(ones) + suffix)
-                    else:
-                        yield output(str(value) + str(ones) + suffix)
-                elif ones < 10:
-                    if value % 10 == 0:
-                        yield output(str(value + ones) + suffix)
-                    else:
-                        yield output(str(value) + str(ones) + suffix)
-                else:  # eleven to nineteen
-                    if value % 100 == 0:
-                        yield output(str(value + ones) + suffix)
-                    else:
-                        yield output(str(value) + str(ones) + suffix)
-                value = None
-            elif current in self.tens:
-                tens = self.tens[current]
-                if value is None:
-                    value = tens
-                elif isinstance(value, str):
-                    value = str(value) + str(tens)
-                else:
-                    if value % 100 == 0:
-                        value += tens
-                    else:
-                        value = str(value) + str(tens)
-            elif current in self.tens_suffixed:
-                # ordinal or cardinal; yield the number right away
-                tens, suffix = self.tens_suffixed[current]
-                if value is None:
-                    yield output(str(tens) + suffix)
-                elif isinstance(value, str):
-                    yield output(str(value) + str(tens) + suffix)
-                else:
-                    if value % 100 == 0:
-                        yield output(str(value + tens) + suffix)
-                    else:
-                        yield output(str(value) + str(tens) + suffix)
-            elif current in self.multipliers:
-                multiplier = self.multipliers[current]
-                if value is None:
-                    value = multiplier
-                elif isinstance(value, str) or value == 0:
-                    f = to_fraction(value)
-                    p = f * multiplier if f is not None else None
-                    if f is not None and p.denominator == 1:
-                        value = p.numerator
-                    else:
-                        yield output(value)
-                        value = multiplier
-                else:
-                    before = value // 1000 * 1000
-                    residual = value % 1000
-                    value = before + residual * multiplier
-            elif current in self.multipliers_suffixed:
-                multiplier, suffix = self.multipliers_suffixed[current]
-                if value is None:
-                    yield output(str(multiplier) + suffix)
-                elif isinstance(value, str):
-                    f = to_fraction(value)
-                    p = f * multiplier if f is not None else None
-                    if f is not None and p.denominator == 1:
-                        yield output(str(p.numerator) + suffix)
-                    else:
-                        yield output(value)
-                        yield output(str(multiplier) + suffix)
-                else:  # int
-                    before = value // 1000 * 1000
-                    residual = value % 1000
-                    value = before + residual * multiplier
-                    yield output(str(value) + suffix)
-                value = None
-            elif current in self.preceding_prefixers:
-                # apply prefix (positive, minus, etc.) if it precedes a number
-                if value is not None:
-                    yield output(value)
-
-                if next in self.words or next_is_numeric:
-                    prefix = self.preceding_prefixers[current]
-                else:
-                    yield output(current)
-            elif current in self.following_prefixers:
-                # apply prefix (dollars, cents, etc.) only after a number
-                if value is not None:
-                    prefix = self.following_prefixers[current]
-                    yield output(value)
-                else:
-                    yield output(current)
-            elif current in self.suffixers:
|
|
||||||
# apply suffix symbols (percent -> '%')
|
|
||||||
if value is not None:
|
|
||||||
suffix = self.suffixers[current]
|
|
||||||
if isinstance(suffix, dict):
|
|
||||||
if next in suffix:
|
|
||||||
yield output(str(value) + suffix[next])
|
|
||||||
skip = True
|
|
||||||
else:
|
|
||||||
yield output(value)
|
|
||||||
yield output(current)
|
|
||||||
else:
|
|
||||||
yield output(str(value) + suffix)
|
|
||||||
else:
|
|
||||||
yield output(current)
|
|
||||||
elif current in self.specials:
|
|
||||||
if next not in self.words and not next_is_numeric:
|
|
||||||
# apply special handling only if the next word can be numeric
|
|
||||||
if value is not None:
|
|
||||||
yield output(value)
|
|
||||||
yield output(current)
|
|
||||||
elif current == "and":
|
|
||||||
# ignore "and" after hundreds, thousands, etc.
|
|
||||||
if prev not in self.multipliers:
|
|
||||||
if value is not None:
|
|
||||||
yield output(value)
|
|
||||||
yield output(current)
|
|
||||||
elif current == "double" or current == "triple":
|
|
||||||
if next in self.ones or next in self.zeros:
|
|
||||||
repeats = 2 if current == "double" else 3
|
|
||||||
ones = self.ones.get(next, 0)
|
|
||||||
value = str(value or "") + str(ones) * repeats
|
|
||||||
skip = True
|
|
||||||
else:
|
|
||||||
if value is not None:
|
|
||||||
yield output(value)
|
|
||||||
yield output(current)
|
|
||||||
elif current == "point":
|
|
||||||
if next in self.decimals or next_is_numeric:
|
|
||||||
value = str(value or "") + "."
|
|
||||||
else:
|
|
||||||
# should all have been covered at this point
|
|
||||||
raise ValueError(f"Unexpected token: {current}")
|
|
||||||
else:
|
|
||||||
# all should have been covered at this point
|
|
||||||
raise ValueError(f"Unexpected token: {current}")
|
|
||||||
|
|
||||||
if value is not None:
|
|
||||||
yield output(value)
|
|
||||||
|
|
||||||
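The accumulate-then-flush pattern above (build up `value` across number words, then yield it when a non-number token arrives or the input ends) can be sketched in isolation. The `ONES`/`TENS` tables and `tiny_number_join` below are simplified stand-ins I made up for illustration; the real class builds much larger tables covering teens, suffixed forms, multipliers, and currency handling.

```python
# Simplified stand-in tables (illustrative assumptions, not the real ones).
ONES = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
        "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}

def tiny_number_join(words):
    """Accumulate a running value; flush it when a non-number word appears."""
    value = None
    for word in words:
        if word in TENS:
            value = (value or 0) + TENS[word]
        elif word in ONES:
            value = (value or 0) + ONES[word]
        else:
            if value is not None:
                yield str(value)  # flush the accumulated number
                value = None
            yield word
    if value is not None:  # flush a trailing number, like process_words does
        yield str(value)

print(" ".join(tiny_number_join("i saw twenty one birds".split())))
```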
    def preprocess(self, s: str):
        # replace "<number> and a half" with "<number> point five"
        results = []

        segments = re.split(r"\band\s+a\s+half\b", s)
        for i, segment in enumerate(segments):
            if len(segment.strip()) == 0:
                continue
            if i == len(segments) - 1:
                results.append(segment)
            else:
                results.append(segment)
                last_word = segment.rsplit(maxsplit=2)[-1]
                if last_word in self.decimals or last_word in self.multipliers:
                    results.append("point five")
                else:
                    results.append("and a half")

        s = " ".join(results)

        # put a space at number/letter boundary
        s = re.sub(r"([a-z])([0-9])", r"\1 \2", s)
        s = re.sub(r"([0-9])([a-z])", r"\1 \2", s)

        # but remove spaces which could be a suffix
        s = re.sub(r"([0-9])\s+(st|nd|rd|th|s)\b", r"\1\2", s)

        return s

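The boundary and suffix regexes from `preprocess` can be exercised on their own. This sketch applies the same three substitutions to a made-up sample sentence: letters and digits get separated, but ordinal/plural suffixes that the split would orphan are re-attached.

```python
import re

s = "it took 3hours, on the 2 nd try"
# put a space at number/letter boundaries (same patterns as preprocess)
s = re.sub(r"([a-z])([0-9])", r"\1 \2", s)
s = re.sub(r"([0-9])([a-z])", r"\1 \2", s)
# but remove spaces before what looks like an ordinal/plural suffix
s = re.sub(r"([0-9])\s+(st|nd|rd|th|s)\b", r"\1\2", s)
print(s)  # it took 3 hours, on the 2nd try
```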
    def postprocess(self, s: str):
        def combine_cents(m: Match):
            try:
                currency = m.group(1)
                integer = m.group(2)
                cents = int(m.group(3))
                return f"{currency}{integer}.{cents:02d}"
            except ValueError:
                return m.string

        def extract_cents(m: Match):
            try:
                return f"¢{int(m.group(1))}"
            except ValueError:
                return m.string

        # apply currency postprocessing; "$2 and ¢7" -> "$2.07"
        s = re.sub(r"([€£$])([0-9]+) (?:and )?¢([0-9]{1,2})\b", combine_cents, s)
        s = re.sub(r"[€£$]0.([0-9]{1,2})\b", extract_cents, s)

        # write "one(s)" instead of "1(s)", just for readability
        s = re.sub(r"\b1(s?)\b", r"one\1", s)

        return s

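The currency merge in `postprocess` can be tried directly with the same regexes; the sample strings are made up. This sketch reuses the `combine_cents` shape (without the `try/except` guard) and the `1(s)` readability rewrite.

```python
import re

def combine_cents(m):
    # same shape as postprocess's combine_cents, minus the try/except guard
    return f"{m.group(1)}{m.group(2)}.{int(m.group(3)):02d}"

price = re.sub(r"([€£$])([0-9]+) (?:and )?¢([0-9]{1,2})\b", combine_cents,
               "he paid $2 and ¢7 for it")
print(price)  # he paid $2.07 for it

ones = re.sub(r"\b1(s?)\b", r"one\1", "take 1s and 1 more")
print(ones)  # take ones and one more
```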
    def __call__(self, s: str):
        s = self.preprocess(s)
        s = " ".join(word for word in self.process_words(s.split()) if word is not None)
        s = self.postprocess(s)

        return s


class EnglishSpellingNormalizer:
    """
    Applies British-American spelling mappings as listed in [1].

    [1] https://www.tysto.com/uk-us-spelling-list.html
    """

    def __init__(self):
        mapping_path = os.path.join(os.path.dirname(__file__), "english.json")
        self.mapping = json.load(open(mapping_path))

    def __call__(self, s: str):
        return " ".join(self.mapping.get(word, word) for word in s.split())


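`EnglishSpellingNormalizer.__call__` amounts to a per-word dict lookup with unknown words passing through unchanged. A self-contained sketch with two illustrative entries (the real mapping is loaded from `english.json`, which is not shown here, so these entries are assumptions):

```python
# Two sample UK->US entries; assumptions, not the actual english.json contents.
mapping = {"colour": "color", "normalise": "normalize"}

def normalize_spelling(s: str) -> str:
    # unknown words pass through unchanged, exactly as in __call__
    return " ".join(mapping.get(word, word) for word in s.split())

print(normalize_spelling("normalise the colour values"))
```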
class EnglishTextNormalizer:
    def __init__(self):
        self.ignore_patterns = r"\b(hmm|mm|mhm|mmm|uh|um)\b"
        self.replacers = {
            # common contractions
            r"\bwon't\b": "will not",
            r"\bcan't\b": "can not",
            r"\blet's\b": "let us",
            r"\bain't\b": "aint",
            r"\by'all\b": "you all",
            r"\bwanna\b": "want to",
            r"\bgotta\b": "got to",
            r"\bgonna\b": "going to",
            r"\bi'ma\b": "i am going to",
            r"\bimma\b": "i am going to",
            r"\bwoulda\b": "would have",
            r"\bcoulda\b": "could have",
            r"\bshoulda\b": "should have",
            r"\bma'am\b": "madam",
            # contractions in titles/prefixes
            r"\bmr\b": "mister ",
            r"\bmrs\b": "missus ",
            r"\bst\b": "saint ",
            r"\bdr\b": "doctor ",
            r"\bprof\b": "professor ",
            r"\bcapt\b": "captain ",
            r"\bgov\b": "governor ",
            r"\bald\b": "alderman ",
            r"\bgen\b": "general ",
            r"\bsen\b": "senator ",
            r"\brep\b": "representative ",
            r"\bpres\b": "president ",
            r"\brev\b": "reverend ",
            r"\bhon\b": "honorable ",
            r"\basst\b": "assistant ",
            r"\bassoc\b": "associate ",
            r"\blt\b": "lieutenant ",
            r"\bcol\b": "colonel ",
            r"\bjr\b": "junior ",
            r"\bsr\b": "senior ",
            r"\besq\b": "esquire ",
            # perfect tenses; ideally this would cover any past participle, but that is harder
            r"'d been\b": " had been",
            r"'s been\b": " has been",
            r"'d gone\b": " had gone",
            r"'s gone\b": " has gone",
            r"'d done\b": " had done",  # "'s done" is ambiguous
            r"'s got\b": " has got",
            # general contractions
            r"n't\b": " not",
            r"'re\b": " are",
            r"'s\b": " is",
            r"'d\b": " would",
            r"'ll\b": " will",
            r"'t\b": " not",
            r"'ve\b": " have",
            r"'m\b": " am",
        }
        self.standardize_numbers = EnglishNumberNormalizer()
        self.standardize_spellings = EnglishSpellingNormalizer()

    def __call__(self, s: str):
        s = s.lower()

        s = re.sub(r"[<\[][^>\]]*[>\]]", "", s)  # remove words between brackets
        s = re.sub(r"\(([^)]+?)\)", "", s)  # remove words between parentheses
        s = re.sub(self.ignore_patterns, "", s)
        s = re.sub(r"\s+'", "'", s)  # when there's a space before an apostrophe

        for pattern, replacement in self.replacers.items():
            s = re.sub(pattern, replacement, s)

        s = re.sub(r"(\d),(\d)", r"\1\2", s)  # remove commas between digits
        s = re.sub(r"\.([^0-9]|$)", r" \1", s)  # remove periods not followed by numbers
        s = remove_symbols_and_diacritics(s, keep=".%$¢€£")  # keep numeric symbols

        s = self.standardize_numbers(s)
        s = self.standardize_spellings(s)

        # now remove prefix/suffix symbols that are not preceded/followed by numbers
        s = re.sub(r"[.$¢€£]([^0-9])", r" \1", s)
        s = re.sub(r"([^0-9])%", r"\1 ", s)

        s = re.sub(r"\s+", " ", s)  # replace any successive whitespaces with a space

        return s
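A few of the `replacers` patterns above can be applied standalone to see why their ordering matters: the specific contraction patterns fire before the generic `n't`/`'re` fallbacks, which works because Python dicts preserve insertion order (3.7+). The sample sentence is illustrative.

```python
import re

# A small subset of self.replacers, kept in the same order as above.
replacers = {
    r"\bwon't\b": "will not",
    r"\bcan't\b": "can not",
    r"n't\b": " not",
    r"'re\b": " are",
}

s = "they won't admit they're wrong, but it isn't over"
for pattern, replacement in replacers.items():
    s = re.sub(pattern, replacement, s)
print(s)  # they will not admit they are wrong, but it is not over
```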
@@ -1,6 +0,0 @@
# This is the minimal set of dependencies we need to compute
# WER score. Read Section 3.2 of the original paper
# (https://arxiv.org/abs/2212.04356) for more context.
jiwer
regex
more-itertools