Compare commits


1092 Commits

Author SHA1 Message Date
e400aeb770 examples : add new sources
ggml-ci
2025-04-02 15:52:29 +03:00
cb9a21b957 sync : ggml 2025-04-02 15:52:29 +03:00
dacb7caed6 cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167)
* cpu: refactor SIMD mappings and vectorized op functions into separate files

* Fix warning for ggml_float to float

* Fix warnings

* cpu: move all the operations (except mul_mat) to a separate c++ file

* fix whitespace

* Update ggml/src/ggml-cpu/vec.h

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Fix PR comments - use GGML_UNUSED, use cassert in ops.cpp

* Reverse the order of import for ops.h and vec.h, to match what was present in ggml-cpu.c previously

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-04-02 15:52:28 +03:00
ad4e350933 sync : ggml
ggml-ci
2025-04-02 15:51:57 +03:00
d7a9346ab1 get_rows and dup optimization (llama/12671)
* [CANN]get_rows and dup optimization.

Co-authored-by: hipudding <huafengchun@gmail.com>
Signed-off-by: noemotiovon <noemotiovon@gmail.com>

* [CANN]GET_ROWS and CPY/DUP optimization

Co-authored-by: hipudding <huafengchun@gmail.com>
Signed-off-by: noemotiovon <noemotiovon@gmail.com>

* [CANN]code style adjustment

Signed-off-by: noemotiovon <noemotiovon@gmail.com>

* [CANN]code style adjustment

Signed-off-by: noemotiovon <noemotiovon@gmail.com>

* [CANN]code style adjustment

Signed-off-by: noemotiovon <noemotiovon@gmail.com>

* [CANN]code style adjustment

Signed-off-by: noemotiovon <noemotiovon@gmail.com>

---------

Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: hipudding <huafengchun@gmail.com>
2025-04-02 15:51:57 +03:00
b63d23f728 opencl : fix memory allocation size (llama/12649)
issue:
https://github.com/CodeLinaro/llama.cpp/pull/17#issuecomment-2760611283

This patch fixes the memory allocation size so that it
does not exceed the maximum allocation size of the OpenCL device.
2025-04-02 15:51:57 +03:00
f6ce10e4a1 metal : use F32 prec in FA kernels (llama/12688)
* metal : use F32 prec in FA kernels

ggml-ci

* cont : fix FA vec kernel

ggml-ci
2025-04-02 15:51:57 +03:00
6cb2b86581 Fix clang warning in gguf_check_reserved_keys (llama/12686)
* Fix clang warning in gguf_check_reserved_keys

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Fix typo

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-02 15:51:57 +03:00
801d6bd809 vulkan: fix build when glslc doesn't support coopmat (llama/12683) 2025-04-02 15:51:57 +03:00
ddf7e6a15d SYCL: Rename oneMKL to oneMath (llama/12192)
* Rename oneMKL Interface to oneMath

* Use oneMath for Intel vendor

* Rename occurrences to mkl

* clang-format

* Silence verbose warnings

* Set oneMath HIP_TARGETS

* Fix silence warnings

* Remove step to build oneMath from build instructions

* Use fixed oneMath version

* Remove INTEL_CPU

* Fold CMake oneDNN conditions

* Use Intel oneMKL for Intel devices

* Improve CMake message

* Link against MKL::MKL_SYCL::BLAS only

* Move oneMath documentation to Nvidia and AMD sections
2025-04-02 15:51:57 +03:00
0d42097fd3 SYCL: switch to SYCL namespace (llama/12674) 2025-04-02 15:51:57 +03:00
842b9c984c ggml : faster ssm scan (llama/10558)
* faster ssm_scan

* delete unused comment

* clang format

* add space

* modify unnecessary calculations

* faster ssm conv implementation

* modify file name with dash
2025-04-02 15:51:57 +03:00
0810f02547 Vulkan: Add DP4A MMQ and Q8_1 quantization shader (llama/12135)
* Vulkan: Add DP4A MMQ and Q8_1 quantization shader

* Add q4_0 x q8_1 matrix matrix multiplication support

* Vulkan: Add int8 coopmat MMQ support

* Vulkan: Add q4_1, q5_0 and q5_1 quants, improve integer dot code

* Add GL_EXT_integer_dot_product check

* Remove ggml changes, fix mmq pipeline picker

* Remove ggml changes, restore Intel coopmat behaviour

* Fix glsl compile attempt when integer vec dot is not supported

* Remove redundant code, use non-saturating integer dot, enable all matmul sizes for mmq

* Remove redundant comment

* Fix integer dot check

* Fix compile issue with unsupported int dot glslc

* Update Windows build Vulkan SDK version
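As a rough CPU-side illustration of the integer-dot idea (not the Vulkan shader code from this commit), the C++ sketch below quantizes a block of 32 floats to int8 with one float scale, Q8_1-style, and computes a block dot product by accumulating DP4A-style steps (four int8 products added to a 32-bit integer, with no saturation) before a single float rescale. The struct `BlockQ8_1` and the functions `quantize_q8_1`, `dp4a` and `dot_q8_1` are hypothetical; ggml's actual `block_q8_1` layout (fp16 scales) differs, and its exact rounding is not reproduced here.

```c++
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int QK = 32; // values per block (assumed to match QK8_1)

// Hypothetical Q8_1-style block: int8 quants, a scale d, and s = d * sum(qs).
// s is kept for formats with a min/offset term; this sketch does not use it.
struct BlockQ8_1 {
    float d;
    float s;
    std::array<int8_t, QK> qs;
};

// Symmetric per-block quantization: d = max|x| / 127, q = round(x / d).
BlockQ8_1 quantize_q8_1(const std::array<float, QK> & x) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    BlockQ8_1 b{};
    b.d = amax / 127.0f;
    const float id = b.d != 0.0f ? 1.0f / b.d : 0.0f;
    int sum = 0;
    for (int i = 0; i < QK; ++i) {
        b.qs[i] = static_cast<int8_t>(std::lround(x[i] * id));
        sum += b.qs[i];
    }
    b.s = b.d * static_cast<float>(sum);
    return b;
}

// DP4A-style step: four int8 products added to a 32-bit accumulator
// (non-saturating: plain int32 addition).
int32_t dp4a(const int8_t * a, const int8_t * b, int32_t acc) {
    for (int k = 0; k < 4; ++k) acc += static_cast<int32_t>(a[k]) * static_cast<int32_t>(b[k]);
    return acc;
}

// Block dot product: integer accumulation, then one float rescale per block.
float dot_q8_1(const BlockQ8_1 & a, const BlockQ8_1 & b) {
    int32_t acc = 0;
    for (int i = 0; i < QK; i += 4) acc = dp4a(&a.qs[i], &b.qs[i], acc);
    return a.d * b.d * static_cast<float>(acc);
}
```

Keeping the inner loop in integers and applying `a.d * b.d` once per block is what makes a hardware integer dot instruction worthwhile.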
2025-04-02 15:51:57 +03:00
8c13c78f9d cmake : fix whitespace (llama/0) 2025-04-02 15:51:57 +03:00
f31b404fcb tests : remove gh label test-whisper-cli-tiny-en (#2988)
This commit removes test-whisper-cli-tiny-en from the gh label.

The motivation for this change is that until recently the tests were
disabled. Now that they are enabled, some of the CI jobs, specifically
those that use sanitizers (e.g. thread-sanitizer), take a long time to
run the tests because they are instrumented.
Some of these jobs also use matrices, which means that multiple jobs
are created that all run these tests.
The suggestion here is to limit the number of tests that are run in the
CI jobs to cut down the CI build time.
2025-04-02 10:50:31 +02:00
854c0518bc examples : clarify Core ML encoder model usage [no ci] (#2987)
This commit clarifies the usage of the Core ML encoder model in the
whisper.objc and whisper.swiftui examples.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2783
2025-04-02 08:32:14 +02:00
c8e3968edd ci : remove intermediate build on push to master (#2986)
This commit removes the builds that happen on each push to master.

Refs: https://github.com/ggerganov/whisper.cpp/discussions/2983#discussioncomment-12691424
2025-04-02 08:29:28 +02:00
b358de2458 whisper.objc : fix typo in README.md [no ci] (#2985)
This commit fixes a typo in the README.md file of the whisper.objc
example.

Resolves: https://github.com/ggerganov/whisper.cpp/issues/2984
2025-04-02 08:26:57 +02:00
11688b262f coreml: fix Whisper to CoreML conversion by disabling SDPA [no ci] (#2979)
* coreml: fix Whisper to CoreML conversion by disabling SDPA

This commit disables the use of PyTorch's
`scaled_dot_product_attention` in the Whisper model to avoid
compatibility issues during CoreML conversion.
The issue occurs because coremltools requires PyTorch 2.5.0, but the
Whisper implementation may expect behavior from newer PyTorch versions.

By setting `MultiHeadAttention.use_sdpa = False`, we force Whisper to
use its fallback manual attention implementation, which works correctly
with PyTorch 2.5.0 during the tracing process.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2783

* coreml: fix audio shape in whisper decoder conversion

This commit fixes the audio shape in the whisper decoder conversion
script.

The motivation for this is that the audio shape was incorrect and
was causing the conversion to fail.

* coreml : set -e in generate-coreml-interface.sh

The commit sets the -e flag in the generate-coreml-interface.sh script
to make sure the script fails if any command fails.

* coreml : update generated encoder/decoder interfaces

This commit updates the generated encoder/decoder interfaces for the
whisper model, which are the result of running the
generate-coreml-interface.sh script.
2025-04-01 18:01:23 +02:00
04b9508fb3 ci : add coreml job that converts base.en to coreml [no ci] (#2981)
* ci : add coreml job that converts base.en to coreml [no ci]

This commit adds a new job to the CI pipeline that downloads the base.en
model and converts it to CoreML format. The CoreML model is then packed
into a zip file and uploaded as an artifact.

This will only be done for pushes to master, releases, or pre-releases.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2783

* coreml : remove publishing of coreml model

* ci : add GGML_OPENMP=OFF to ubuntu-22-gcc-sanitized
2025-04-01 17:04:32 +02:00
4200430e75 tests : re-enable tests [no ci] (#2977)
This commit re-enables the tests in the build process, which were
previously commented out.

It is possible to build the tests using `-DWHISPER_BUILD_TESTS=ON` and
then run a single test using:
```console
$ ctest -R test-whisper-cli-tiny.en --test-dir build
Internal ctest changing into directory: /home/danbev/work/ai/whisper-work/build
Test project /home/danbev/work/ai/whisper-work/build
    Start 2: test-whisper-cli-tiny.en
1/1 Test #2: test-whisper-cli-tiny.en .........   Passed    4.44 sec

100% tests passed, 0 tests failed out of 1

Label Time Summary:
en      =   4.44 sec*proc (1 test)
gh      =   4.44 sec*proc (1 test)
tiny    =   4.44 sec*proc (1 test)

Total Test time (real) =   4.44 sec
```

Some of the tests take a long time to run so it might not be a good idea
to enable them in CI, or perhaps we could only run a subset of the tests
in CI.
2025-03-31 17:04:37 +02:00
e153b8eaa2 android.java : re-add ggml source updates (#2975)
This commit updates the ggml source to include the new unary and binary
operations. I merged https://github.com/ggerganov/whisper.cpp/pull/2958
which seems to have overwritten the changes to the ggml source which
were added in https://github.com/ggerganov/whisper.cpp/pull/2972.

Sorry about this.
2025-03-31 16:14:33 +02:00
83af237f0b ci : re-enable freeBSD-latest job (#2973)
This commit re-enables the freeBSD-latest job, which had been commented
out.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2781
2025-03-31 15:24:08 +02:00
7a2e39750a ci : re-enable android_java job (#2958)
This commit re-enables the android_java job in the CI workflow. The job
was disabled because of a failing build.

The motivation for this is that commit
226d344f56 ("whisper.android.java : update
build with ggml source changes") addressed the build issues, so it should
now be possible to re-enable this job.
2025-03-31 15:14:24 +02:00
0a40ae9728 android : add new ggml source files
ggml-ci
2025-03-31 14:56:53 +03:00
32cfdcbf42 ruby : add new ggml sources
ggml-ci
2025-03-31 14:56:53 +03:00
cfa42aca09 sync : ggml
ggml-ci
2025-03-31 14:56:53 +03:00
2e2f0f954b SYCL: Remove misleading ggml_sycl_op_flatten function (llama/12387)
* SYCL: Remove misleading ggml_sycl_op_flatten function

* remove trailing whitespace

* Fix L2 norm from rebase

* remove try catch block from element_wise.cpp

* remove comment from common.hpp

* ggml-sycl.cpp: Add try catch sycl::exception block in compute_forward

* norm.cpp: remove try catch exception block
2025-03-31 14:56:53 +03:00
93631b2be6 metal : use constexpr in FA kernels + fix typedef (llama/12659)
* metal : use constexpr in FA kernels

ggml-ci

* cont

ggml-ci

* cont : fix typedef

ggml-ci
2025-03-31 14:56:53 +03:00
f9015b585b musa: fix all warnings, re-enable -DLLAMA_FATAL_WARNINGS=ON in ci and update doc (llama/12611)
* musa: fix all warnings

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: enable -DLLAMA_FATAL_WARNINGS=ON in run.sh

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: update ci doc (install ccache)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* fix Windows build issue

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Address review comments

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Address review comments

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-31 14:56:53 +03:00
Jay
1880ffd7ff cmake : fix ccache conflict (llama/12522)
If users already set CMAKE_C_COMPILER_LAUNCHER globally, setting it in
cmake again leads to a conflict and the compilation fails.

Signed-off-by: Jay <BusyJay@users.noreply.github.com>
2025-03-31 14:56:53 +03:00
9173932c78 cpu : rm unused variable (ggml/1166) 2025-03-31 14:56:53 +03:00
94c3f3877f cpu: de-duplicate some of the operators and refactor (ggml/1144)
* cpu: de-duplicate some of the operators and refactor

* Fix PR comments

* Fix PR comments
2025-03-31 14:56:53 +03:00
00086469fb cmake: improve Vulkan cooperative matrix support checks (#2966)
Co-authored-by: Sandro Hanea <me@sandro.rocks>
2025-03-31 13:44:36 +03:00
2d8e40e2a0 examples : update README links to point to pages deployment (#2971)
This commit updates the README links to point to the pages deployment
instead of whisper.ggerganov.com.
2025-03-31 12:32:27 +02:00
e17af6524f ci : add github pages workflow for wasm examples (#2969)
* ci : add github pages workflow for wasm examples

This commit adds a github workflow to build and deploy the wasm examples
to github pages. The whisper.wasm example is deployed as the main page.

This workflow is triggered by a push to master and will deploy the
examples to: https://ggerganov.github.io/whisper.cpp/.

This requires that, in the repository settings,
`Settings` -> `Pages` -> `Build and deployment` -> `Source` is set to
`GitHub Actions`.

One thing to note is that this commit removes the `talk` example as I'm
not sure how this example is built yet.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2784
2025-03-31 11:34:40 +02:00
88d13a17a7 feat: add health check endpoint to server (#2968)
2025-03-31 11:03:41 +03:00
f92bd59951 whisper : remove unnecessary GGML_UNUSED macro (#2960)
2025-03-30 05:56:10 +02:00
6e7629b146 sync : ggml
ggml-ci
2025-03-28 21:47:42 +02:00
27533e7f63 metal : improve FA + improve MoE (llama/12612)
* ggml : FA with different K, V head sizes (CPU)

ggml-ci

* metal : add FA with HS=192

* metal : extend FA to support different K and V head sizes

ggml-ci

* metal : add FA vector kernels for heads K 192 and V 128

ggml-ci

* ggml : restrict op on other backends to equal head sizes

ggml-ci

* metal : optimize FA-vec kernel

ggml-ci

* metal : FA remove mq registers

* metal : improve MoE mul_mat_id condition

ggml-ci

* metal : fix comments + remove unnecessary addition

ggml-ci

* metal : avoid too much shared memory usage with mul_mat_id

ggml-ci
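The commit lets K and V use different head sizes (e.g. 192 for K and 128 for V). The scalar C++ sketch below, which is not the Metal kernel, shows why that is possible: attention scores are computed with the K head size, while the output has the V head size. It uses the running-max / running-sum ("online softmax") update that flash-attention kernels are built on and assumes the usual 1/sqrt(d_k) scaling; `attend_one` is a hypothetical name.

```c++
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Single-query attention for one head.
// q: [dk]; K: n_kv rows of [dk]; V: n_kv rows of [dv]. The output has dv
// elements: the V head size, not the K head size, sets the output width.
std::vector<float> attend_one(const std::vector<float> & q,
                              const std::vector<std::vector<float>> & K,
                              const std::vector<std::vector<float>> & V) {
    assert(!K.empty() && K.size() == V.size());
    const std::size_t dk = q.size();
    const std::size_t dv = V[0].size();
    const float scale = 1.0f / std::sqrt(static_cast<float>(dk));

    float m = -INFINITY;              // running max of scores
    float l = 0.0f;                   // running sum of exp(score - m)
    std::vector<float> acc(dv, 0.0f); // running weighted sum of V rows

    for (std::size_t j = 0; j < K.size(); ++j) {
        assert(K[j].size() == dk && V[j].size() == dv);
        float s = 0.0f;
        for (std::size_t i = 0; i < dk; ++i) s += q[i] * K[j][i]; // uses dk
        s *= scale;

        const float m_new = std::max(m, s);
        const float corr  = std::exp(m - m_new); // rescales earlier partial results
        const float p     = std::exp(s - m_new);
        for (std::size_t i = 0; i < dv; ++i) acc[i] = acc[i] * corr + p * V[j][i]; // uses dv
        l = l * corr + p;
        m = m_new;
    }
    for (float & a : acc) a /= l;
    return acc;
}
```

Because the scores are never stored, only the running max, running sum and a dv-sized accumulator have to live in fast memory.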
2025-03-28 21:47:42 +02:00
1b81415963 vulkan: fix coopmat shader generation when cross-compiling (llama/12272)
* vulkan: fix coopmat shader generation when cross-compiling

Previously the status of coopmat{,2} support wasn't passed to the
vulkan-shaders-gen project built on the host, which led to a build
failure because the cross-compiled code expected coopmat{,2}
shaders that were never generated.

Fix this by passing the coopmat{,2} support status to the vulkan-shaders
subproject.

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>

* Only call coop-mat shaders once

* Fix whitespace

---------

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Co-authored-by: bandoti <141645996+bandoti@users.noreply.github.com>
2025-03-28 21:47:42 +02:00
0001ec075f llamafile : ppc64le GEMV forwarding for FP32. (llama/12594)
This patch enables the use of MMA when one of the
dimensions of the matrix (i.e. either M or N) is 1. This
is useful for token generation, where N < 2.

The concept of 'GEMV forwarding' is used: when one
of the matrices has a single row/column, its elements are
broadcast instead of using the packing routine to prepack
the matrix elements.

This change results in a 5% - 15% improvement in total
speed (i.e. all tokens/total time) across various batch
sizes, compared with the corresponding dot product
implementation.

The patch is tested with FP32 models of Meta-Llama-3-8B,
Mistral-7B and Llama-2-7B-chat-hf on an IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
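The dispatch idea behind GEMV forwarding can be sketched in scalar C++. This is a minimal illustration, not the POWER10 MMA code: the functions `gemm_packed`, `gemv`, `gevm` and `matmul` are hypothetical, row-major storage is assumed, and "packing" is reduced to copying B into a contiguous column-major buffer. It shows only the decision: when N == 1 or M == 1, skip packing and use a vector kernel (in the M == 1 case each element a[k] is broadcast across a row of B).

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// General path: C[M x N] = A[M x K] * B[K x N], packing B first.
void gemm_packed(const float * A, const float * B, float * C,
                 std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> Bp(K * N);
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t n = 0; n < N; ++n)
            Bp[n * K + k] = B[k * N + n]; // pack: column n of B becomes contiguous
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) {
            float s = 0.0f;
            for (std::size_t k = 0; k < K; ++k) s += A[m * K + k] * Bp[n * K + k];
            C[m * N + n] = s;
        }
}

// N == 1: x is already contiguous, so no packing; one dot product per row of A.
void gemv(const float * A, const float * x, float * y, std::size_t M, std::size_t K) {
    for (std::size_t m = 0; m < M; ++m) {
        float s = 0.0f;
        for (std::size_t k = 0; k < K; ++k) s += A[m * K + k] * x[k];
        y[m] = s;
    }
}

// M == 1: broadcast each a[k] across row k of B and accumulate.
void gevm(const float * a, const float * B, float * y, std::size_t N, std::size_t K) {
    for (std::size_t n = 0; n < N; ++n) y[n] = 0.0f;
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t n = 0; n < N; ++n) y[n] += a[k] * B[k * N + n];
}

// Dispatcher: "forward" degenerate shapes to the vector kernels.
void matmul(const float * A, const float * B, float * C,
            std::size_t M, std::size_t N, std::size_t K) {
    if (N == 1) {
        gemv(A, B, C, M, K);
    } else if (M == 1) {
        gevm(A, B, C, N, K);
    } else {
        gemm_packed(A, B, C, M, N, K);
    }
}
```

During token generation one operand is a single vector, so the packing cost of the general path buys nothing; that is the case the forwarding targets.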
2025-03-28 21:47:42 +02:00
5bad2e5099 rpc : send hash when tensor data is above some fixed threshold (llama/12496)
* rpc : send hash when tensor data is above some fixed threshold

ref #10095

* rpc : put cache under $HOME/.cache/llama.cpp

* try to fix win32 build

* another try to fix win32 build

* remove llama as dependency
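A minimal sketch of the "send the hash instead of the data" idea, assuming a 64-bit FNV-1a hash and an in-memory map standing in for the on-disk cache under $HOME/.cache/llama.cpp. The hash function, the `threshold` parameter and the names `TensorCache`, `fnv1a64` and `upload` are assumptions for illustration, not the RPC protocol's actual messages.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a: a simple non-cryptographic hash (chosen for this sketch).
uint64_t fnv1a64(const uint8_t * data, std::size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL; // offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ULL;          // FNV prime
    }
    return h;
}

// Hypothetical server-side cache: hash -> tensor bytes.
struct TensorCache {
    std::unordered_map<uint64_t, std::vector<uint8_t>> entries;

    bool contains(uint64_t h) const { return entries.count(h) != 0; }
    void store(uint64_t h, const std::vector<uint8_t> & data) { entries[h] = data; }
};

// Client-side decision. Returns the number of bytes "sent" over the wire.
std::size_t upload(TensorCache & server, const std::vector<uint8_t> & data, std::size_t threshold) {
    if (data.size() < threshold) {
        return data.size();             // small tensor: always sent in full, not cached
    }
    const uint64_t h = fnv1a64(data.data(), data.size());
    if (server.contains(h)) {
        return sizeof(h);               // cache hit: only the 8-byte hash is sent
    }
    server.store(h, data);              // cache miss: hash plus full payload, then cached
    return sizeof(h) + data.size();
}
```

The first upload of a large tensor costs slightly more than before (the extra hash), but every repeat upload of the same bytes shrinks to a fixed 8 bytes.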
2025-03-28 21:47:42 +02:00
6fc0ae2f5a opencl: add multi and vision rope, gelu_quick and im2col (llama/12600)
* opencl: add `im2col`

* opencl: add `gelu_quick`

* opencl: add mrope

* opencl: add vision rope
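Two of the newly added ops are small enough to sketch on the CPU. Below, `gelu_quick` uses the common x * sigmoid(1.702 * x) approximation, and `im2col` is a single-channel 2D version (stride and zero padding, no dilation) that unrolls each kernel window into one row so a convolution becomes a matrix multiplication. Both are illustrative C++ only; the row layout is chosen for readability and is not necessarily the tensor layout used by ggml or the OpenCL kernels.

```c++
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// "Quick" GELU approximation: x * sigmoid(1.702 * x).
float gelu_quick(float x) {
    return x / (1.0f + std::exp(-1.702f * x));
}

// Single-channel 2D im2col over an H x W row-major image.
// Output: OH*OW rows, each holding the KH*KW window values (zeros where padded).
std::vector<float> im2col(const std::vector<float> & img, int H, int W,
                          int KH, int KW, int stride, int pad, int & OH, int & OW) {
    OH = (H + 2 * pad - KH) / stride + 1;
    OW = (W + 2 * pad - KW) / stride + 1;
    std::vector<float> cols(static_cast<std::size_t>(OH) * OW * KH * KW, 0.0f);
    for (int oh = 0; oh < OH; ++oh)
        for (int ow = 0; ow < OW; ++ow)
            for (int kh = 0; kh < KH; ++kh)
                for (int kw = 0; kw < KW; ++kw) {
                    const int ih = oh * stride + kh - pad;
                    const int iw = ow * stride + kw - pad;
                    const std::size_t row = static_cast<std::size_t>(oh) * OW + ow;
                    if (ih >= 0 && ih < H && iw >= 0 && iw < W)
                        cols[row * KH * KW + kh * KW + kw] = img[ih * W + iw];
                }
    return cols;
}
```

Multiplying the OH*OW x (KH*KW) result by a flattened KH*KW kernel vector yields the convolution output, which is why backends add im2col alongside their matrix-multiplication kernels.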
2025-03-28 21:47:42 +02:00
de6b38c6d9 bindings.go : add DetectedLanguage to go bindings (#2947)
Adds DetectedLanguage(), a function to retrieve the language that is
detected while processing audio, along with a unit test for the
success case.

Co-authored-by: Amanda Der Bedrosian <aderbedrosian@sdl.com>
2025-03-28 12:26:22 +01:00
46d6e0abc1 ruby : fix test failures in test_whisper (#2955)
* bindings.ruby : fix test failures in test_whisper

This commit updates the parallel tests to use 2 processors instead of
the number of processors on the system. It also comments out the setting
of the log callback to an empty lambda as this causes a segfault when
enabled.

The motivation for the change to the number of processors is that if one
has a large number of processors, for example I have 16 on the machine I
used to test this, this would cause the following warning to be printed:
```console
whisper_full_with_state: input is too short - 680 ms < 1000 ms. consider padding the input audio with silence
```

This is logged from:
```c++
int whisper_full_with_state(
        struct whisper_context * ctx,
          struct whisper_state * state,
    struct whisper_full_params   params,
                   const float * samples,
                           int   n_samples) {
   ...
    if (seek_end < seek_start + 100) {
        WHISPER_LOG_WARN("%s: input is too short - %d ms < 1000 ms. consider padding the input audio with silence\n", __func__, (seek_end - seek_start)*10);
        return 0;
    }
```
This returns early, so no segment callbacks are invoked, which in turn
causes the tests to fail.

* bindings.ruby : fix warnings in tests

This commit fixes the following warnings in the Ruby tests:
```console
/whisper/bindings/ruby/tests/test_segment.rb:52:
warning: ambiguity between regexp and two divisions:
wrap regexp in parentheses or add a space after `/' operator
```
And also adds a '_' prefix to some unused variables to avoid warnings.

* bindings.ruby : enable Whisper.log_set in tests

The commit reverts the commenting out of the Whisper.log_set call in
the test_whisper.rb tests.

I'm no longer getting segfaults when running the tests with this enabled,
which was the case earlier. One theory could be that I rebased this to
include the latest ggml sync to master to make sure things still worked.
With the latest changes in ggml, I can't reproduce the segfaults.
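The arithmetic behind the "input is too short" warning can be checked with a small C++ sketch. It assumes whisper's 16 kHz sample rate and 160-sample (10 ms) mel hop, and that the parallel path splits the input evenly across processors; `chunk_frames`, `frames_to_ms` and `too_short` are hypothetical helpers. With a hypothetical 11-second input, 16 processors leave each chunk about 68 frames (680 ms, the value in the warning above), while 2 processors leave about 550 frames (5500 ms).

```c++
#include <cassert>

constexpr int kSampleRate = 16000; // WHISPER_SAMPLE_RATE
constexpr int kHopLength  = 160;   // WHISPER_HOP_LENGTH: 160 samples = 10 ms

// Approximate length of one processor's chunk in 10 ms mel frames,
// assuming an even split (remainder samples ignored).
int chunk_frames(int n_samples, int n_processors) {
    assert(n_processors > 0);
    return (n_samples / n_processors) / kHopLength;
}

// Convert mel frames to milliseconds (10 ms per frame at these constants).
int frames_to_ms(int frames) {
    return frames * 1000 * kHopLength / kSampleRate;
}

// Mirrors the quoted check: seek_end < seek_start + 100, i.e. under 1000 ms.
bool too_short(int frames) {
    return frames < 100;
}
```

Fewer processors means longer chunks, which is why dropping to 2 processors keeps every chunk above the 1000 ms limit.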
2025-03-28 17:29:56 +09:00
1279f0d0bc examples : support progress_callback API for addon.node (#2941)
* feat: progress supported

* fix: missing params

* style: Format the code to improve readability

Unified code indentation ensures consistent coding style, enhancing code readability and maintainability.

* feat: support prompt api

---------

Co-authored-by: linxiaodong <calm.lin@wukongsch.com>
2025-03-28 06:34:26 +01:00
f28bf5d186 xcf : fix visionOS build
ref: https://github.com/ggml-org/llama.cpp/pull/12415

ggml-ci
2025-03-27 11:06:03 +02:00
1fbdfb1d36 files : remove old wkv6 (#0)
ggml-ci
2025-03-27 11:06:03 +02:00
ee5581633b sync : ggml
ggml-ci
2025-03-27 11:06:03 +02:00
8ca67df291 ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0) 2025-03-27 11:06:03 +02:00
fc6d343e76 llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)
This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le ISA using MMA
builtins. This patch handles matrix multiplication
between quantised datatypes, block_q4_0 and
block_q8_0.

This change results in a 5%-50% improvement
in total speed (i.e., all tokens / total time) across
various batch sizes.

The patch is tested with Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2025-03-27 11:06:03 +02:00
3199356d3a SYCL: implement memset ggml backend buffer interface (llama/12580)
* SYCL: implement memset ggml backend buffer interface

* use GGML_ABORT macro

* Do not wait for all queues to finish for memset operation
2025-03-27 11:06:03 +02:00
e0c43b0bbf HIP: Add support for RDNA4 targets (llama/12372) 2025-03-27 11:06:03 +02:00
f4f619ea8e metal : refactor mat-vec code (llama/12569)
* metal : refactor mat-vec code

ggml-ci

* metal : rename all_sum -> sum_all

ggml-ci

* metal : fix comments [no ci]

* metal : fix nr constant [no ci]

* metal : mv q6_K support nr0 > 1

ggml-ci

* metal : reduce register pressure

ggml-ci

* metal : fix typo [no ci]

* metal : reduce register pressure

ggml-ci
2025-03-27 11:06:03 +02:00
3c4d363872 ggml : fix MUL_MAT_ID repack with Q8_K (llama/12544)
* ggml : fix MUL_MAT_ID repack with Q8_K

ggml-ci

* ggml : improve repack templates

ggml-ci
2025-03-27 11:06:03 +02:00
15aa189329 ggml-cpu : update KleidiAI to v1.5.0 (llama/12568)
ggml-cpu : bug fix related to KleidiAI LHS packing

Signed-off-by: Dan Johansson <dan.johansson@arm.com>
2025-03-27 11:06:03 +02:00
c53d5c9e85 SYCL: disable Q4_0 reorder optimization (llama/12560)
ggml-ci
2025-03-27 11:06:03 +02:00
ba6f584f30 opencl: simplify kernel embedding logic in cmakefile (llama/12503)
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
2025-03-27 11:06:03 +02:00
a219941812 CUDA: Fix clang warnings (llama/12540)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-27 11:06:03 +02:00
a2cc8c2666 vulkan: fix mul_mat_vec failure in backend tests (llama/12529)
The OOB calculation could be wrong if the last iteration was during one of
the unrolled loops. Adjust the unrolling counts to avoid this. Add a couple of
new backend tests that hit this failure on NVIDIA GPUs.
2025-03-27 11:06:03 +02:00
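The unrolling hazard described in this commit can be illustrated outside the shader. The following is a minimal Python sketch, not the Vulkan code and not the commit's actual fix (which adjusts the unroll counts): a hypothetical `dot_unrolled` helper only enters the unrolled body while a full group of elements is in bounds, and a scalar tail loop covers the remainder, so no index inside the unrolled body can run past the end.

```python
def dot_unrolled(a, b, unroll=4):
    """Dot product with a manually unrolled main loop and a scalar tail (illustrative only)."""
    n = min(len(a), len(b))
    acc = 0.0
    i = 0
    # Main loop: only runs while a whole group of `unroll` elements is in bounds,
    # so every index used inside the unrolled body is < n.
    while i + unroll <= n:
        for k in range(unroll):
            acc += a[i + k] * b[i + k]
        i += unroll
    # Tail loop: handles the remaining n % unroll elements one at a time.
    while i < n:
        acc += a[i] * b[i]
        i += 1
    return acc
```

With `n = 7` and `unroll = 4`, the main loop covers indices 0-3 and the tail covers 4-6.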
388ed98220 ggml : fix quantized cpy op (llama/12310)
* ggml : fix quantized cpy op

ggml-ci

* tests : add cpy tests for all types

ggml-ci

* tests : add BF16 copy tests

ggml-ci

* tests : fix loop for same-type copy

ggml-ci

* tests : add option to permute the dst tensor

ggml-ci
2025-03-27 11:06:03 +02:00
d487a28ae1 musa: refine compute capability (llama/12493)
* musa: refine compute capability

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Address review comments

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-27 11:06:03 +02:00
cbb88c4050 vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505)
* tests: add mul_mat perf/functional tests for p021/nc vulkan shaders

* vulkan: Optimize mul_mat_vec p021 and nc shaders.

These shaders are used in attention calculations, and when the KV cache grows
large they start to dominate the run time. For the nc shader (which is called
with large 'k' dimension), use unrolling and vector loads. For the p021 shader
(which is called with large 'm' and small 'k' dimensions), take advantage of
grouped query attention to reuse loads from the A matrix for the whole group,
and reduce the number of workgroups (too much overhead from tiny dispatches).

Using subgroupAdd in the p021 shader also helps, so it is used conditionally.
2025-03-27 11:06:03 +02:00
13455c0b5f Vulkan: RTE rounding for cpy to quant (llama/12480)
* Vulkan: RTE rounding for cpy to quant

Co-Authored-By: Jeff Bolz <jbolz@nvidia.com>

* remove trailing whitespace

* avoid duplicating pipeline_cpy_f32_quant

* fix copypasting issue

* remove duplicated code

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-03-27 11:06:03 +02:00
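RTE means round to nearest, ties to even. A minimal Python sketch of how that differs from rounding ties away from zero (the behavior of C's `roundf`) when converting scaled floats to integers; the `quantize` helper, its symmetric int8 scheme and the clamp range are illustrative assumptions, not the shader's actual quantization code.

```python
import math


def round_half_away(x: float) -> int:
    """Round to nearest, ties away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def round_half_even(x: float) -> int:
    """Round to nearest, ties to even (RTE); Python's built-in round() does this."""
    return round(x)


def quantize(values, scale):
    """Hypothetical symmetric quantization: q = rte(v / scale), clamped to the int8 range."""
    return [max(-128, min(127, round_half_even(v / scale))) for v in values]
```

For example, `2.5` becomes `2` under RTE but `3` under ties-away rounding, so two implementations that disagree on the rounding mode can produce different quantized blocks from the same input.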
Eve
2f77a9e9bd vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (llama/12472) 2025-03-27 11:06:03 +02:00
fa2b5249ff Fix build on Windows when ccache enabled (ggml/9954) (llama/9976)
* [SYCL] Fix build on Windows when ccache enabled (llama/9954)

* take effect only on windows and force it to icl

---------

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
2025-03-27 11:06:03 +02:00
5b854ebba5 sycl: cleanup oneDNN related code (llama/12097) 2025-03-27 11:06:03 +02:00
8058f19d0b ggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture (llama/12332)
* Add block interleaving support for Q4_K quantization

* Remove whitespaces and fix CI/CD issues

* Update pointer of bsums from int16_t to const int16_t

* Add vector version of quantize_q8_K_4x8 function

* Update code formatting based on review comments
2025-03-27 11:06:03 +02:00
ae6a9bb9a5 CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)
- Find out active blocks per SM using cudaOccupancyMaxActiveBlocksPerMultiprocessor API. Use this value to determine the optimal parallel_blocks value.
- Prefer vector flash attention kernels over MMA kernel for BS=1

Fixes Issue: #12182
---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-03-27 11:06:03 +02:00
24faba9e9b vulkan: optimize iq1 coopmat2 dequant functions (llama/12427) 2025-03-27 11:06:03 +02:00
c722ff84d3 Fix visionOS build and add CI (llama/12415)
* ci: add visionOS build workflow

Add a new GitHub Actions workflow for building on visionOS with CMake and Xcode.

* ggml: Define _DARWIN_C_SOURCE for visionOS to fix missing u_xxx typedefs

* ci: remove define hacks for u_xxx system types

---------

Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
2025-03-27 11:06:03 +02:00
102af79f63 vulkan: Submit once enough matmul work has been recorded (llama/12406)
I've been seeing significantly worse performance for tg with flash attention
enabled vs disabled, and it seems to be related to the submit heuristic.
Change the heuristic to check how many bytes worth of weight matrix are
used and flush every 100MB, and ramp up after the first few submits.
This seems to resolve the issue, and also increases perf for non-FA a bit.
2025-03-27 11:06:03 +02:00
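A minimal Python sketch of a byte-based submit heuristic along the lines described above. The 100MB threshold comes from the commit message; the class name, the number of ramp-up submits and the linear ramp shape are assumptions, since the message does not say how the ramp-up works.

```python
class SubmitScheduler:
    """Hypothetical sketch: flush once enough matmul weight bytes have been recorded."""

    def __init__(self, threshold_bytes=100 * 1024 * 1024, ramp_submits=3):
        self.threshold = threshold_bytes
        self.ramp_submits = ramp_submits  # assumption: first few submits use smaller thresholds
        self.pending_bytes = 0
        self.submits = 0
        self.submitted = []  # bytes flushed per submit, kept for inspection

    def current_threshold(self):
        # Assumption: ramp up linearly so the GPU gets work early in the graph.
        if self.submits < self.ramp_submits:
            return self.threshold * (self.submits + 1) // (self.ramp_submits + 1)
        return self.threshold

    def record_matmul(self, weight_bytes):
        # Count the weight-matrix bytes each recorded matmul will read.
        self.pending_bytes += weight_bytes
        if self.pending_bytes >= self.current_threshold():
            self.flush()

    def flush(self):
        # Stand-in for submitting the recorded command buffer.
        if self.pending_bytes:
            self.submitted.append(self.pending_bytes)
            self.pending_bytes = 0
            self.submits += 1
```

Keying the flush on bytes of weights rather than on the number of recorded nodes means that many small ops do not trigger premature submits, while large matmuls still get flushed promptly.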
03c364557d opencl: improve profiling (llama/12442)
* opencl: more profiling timing

* opencl: generate trace for profiling

* opencl: reduce profiling overhead

* Populate profiling timing info at the end rather than after each
  kernel run

* opencl: fix for chrome tracing
2025-03-27 11:06:03 +02:00
31b62276cf musa: override warp_size of musa device to 32 (llama/12445)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-27 11:06:03 +02:00
97b5a3055d SYCL: using graphs is configurable by environment variable and compile option (llama/12371)
* alberto changes

* enable sycl graphs by env variable

* fixed compilation warnings in ggml-sycl.cpp

* renamed graph variables

* fix markdown in docs/backend/SYCL.md

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>

* fix markdown in docs/backend/SYCL.md again

* compiling graphs by default, renamed graph_enable to graph_disable

---------

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
2025-03-27 11:06:03 +02:00
9993c3f703 ggml : add SVE support for q6_K_q8_K (llama/12361) 2025-03-27 11:06:03 +02:00
fa72479cfb Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (llama/12434) 2025-03-27 11:06:03 +02:00
6c15539c54 fixed compilation warnings in ggml-sycl (llama/12424) 2025-03-27 11:06:03 +02:00
52c4c03b0a llama: Add support for RWKV v7 architecture (llama/12412)
* ggml: Add op l2_norm

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* ggml: Add op rwkv_wkv7

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: Add support for RWKV7 and ARWKV7 models

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: fix inference with RWKV6Qwen2

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: add more (a)rwkv7 variants in size

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Apply code-format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* fix MUSA build

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: fix shape error with rwkv using llama-parallel

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-03-27 11:06:03 +02:00
cfc2560e41 cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
* Enable CUDA Graph on CTK < 12.x

The `cudaGraphExecUpdate` API changed in 12.x, so CUDA Graph support was disabled on older CUDA toolkits. This change enables CUDA Graph support for CTK versions < 12.x by using the older API when CTK < 12.x.

* Fix compilation errors with MUSA

* Disable CUDA Graph for MUSA
2025-03-27 11:06:03 +02:00
db6e8056b5 ggml-vulkan: remove unused find_program(glslc) (llama/12416)
It's already found by FindVulkan.cmake in the parent CMakeLists
2025-03-27 11:06:03 +02:00
b3f3779c1b vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312) 2025-03-27 11:06:03 +02:00
13eeebb1b2 vulkan: subgroup size tuning (llama/12087)
* vulkan: subgroup size test

* Vulkan: Add device architecture enum and logic to recognize AMD generations

* vulkan: use new architecture logic to specify subgroup size

* Initial vulkan subgroup size tuning for RDNA3

* vulkan: commonize RDNA subgroup tuning

* vulkan: override subgroup size if required_subgroup_size = 0

* vulkan: disable warp 32 for RDNA3

* vulkan: fine tuned RDNA1 subgroup sizes

* vulkan: adjusted subgroup size map

* vulkan: fixed RDNA2 subgroup map

---------

Co-authored-by: 0cc4m <picard12@live.de>
2025-03-27 11:06:03 +02:00
905b834af1 vulkan: use fp32 in coopmat2 q4_k dequant function (llama/12309) 2025-03-27 11:06:03 +02:00
2cd3061a23 vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (llama/12273)
* vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking
2025-03-27 11:06:03 +02:00
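Padding removes bounds checks because every tile then lies fully inside the matrix. A minimal Python sketch of the idea, assuming a row-major K x N matrix `B` stored as nested lists and a hypothetical tile width `tile_n`; this is not the coopmat2 shader code. The zero columns added by padding only produce extra output columns, which are discarded.

```python
def round_up(n: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    return (n + multiple - 1) // multiple * multiple


def pad_columns(b, tile_n):
    """Zero-pad each row of B (K x N) so that N becomes a multiple of tile_n.

    After padding, every tile load along N is in bounds, so the inner loop
    needs no per-element bounds check. The padded zero columns only yield
    zero output columns, which the caller ignores.
    """
    n = len(b[0]) if b else 0
    padded_n = round_up(n, tile_n)
    return [row + [0.0] * (padded_n - n) for row in b]
```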
88d59e21b2 vulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258) 2025-03-27 11:06:03 +02:00
4917f122d4 cmake : enable building llama.cpp using system libggml (llama/12321)
* cmake: Factor out compiler flag function from ggml

llama.cpp's build requires it, too, and we may want to make use of it
without add_subdirectory(ggml).

* cmake: Enable building against system ggml

This facilitates package maintenance for Linux distributions, where the
libggml library most likely will be shipped as an individual package
upon which a llama.cpp package depends.
2025-03-27 11:06:03 +02:00
16a1b77249 SYCL: set extras only on GGML_TYPE_Q4_0 (llama/12366)
* SYCL: set extras only on GGML_TYPE_Q4_0

* release tensor_extras in reset buffer interface
2025-03-27 11:06:03 +02:00
51d1398a0a SYCL: Delete redundant plus sign and space (llama/12391) 2025-03-27 11:06:03 +02:00
3499dd83c0 SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (llama/12399)
* sycl : support non-contiguous tensors in binary ops

* sycl : silence unused variable warning

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-27 11:06:03 +02:00
7b7d9ae35e MUL_MAT optimization (llama/12382) 2025-03-27 11:06:03 +02:00
2dcb7181ff sycl : variable sg_size support for mmvq kernels (llama/12336) 2025-03-27 11:06:03 +02:00
96ab3b2465 CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to
selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need
to avoid launching them with parameters for warp64.
2025-03-27 11:06:03 +02:00
08f32992d0 vulkan: fix bug in coopmat1 mul_mat_id (llama/12316)
* tests: run mul_mat_id with a larger N

* vulkan: fix bug in coopmat1 mul_mat_id
2025-03-27 11:06:03 +02:00
394fae57c3 CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (llama/12177)
refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-03-27 11:06:03 +02:00
0708835301 ggml-backend : fix backend search path (llama/12330)
* Fix backend search path

* replace .native() with '/'

* reverted .native()
2025-03-27 11:06:03 +02:00
774c519433 metal : Cache the Metal library at the device context level (llama/12265) 2025-03-27 11:06:03 +02:00
Eve
776cdceb9e mat vec double buffer (llama/12188) 2025-03-27 11:06:03 +02:00
03d050481e musa: support new arch mp_31 and update doc (llama/12296)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-27 11:06:03 +02:00
3d60219622 opencl: use OpenCL C standard supported by the device (llama/12221)
This patch nudges llama.cpp a bit so that it can run on PoCL, which
doesn't support OpenCL C 2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.
2025-03-27 11:06:03 +02:00
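A minimal Python sketch of the selection step described above, assuming the device reports versions as strings in the `CL_DEVICE_OPENCL_C_VERSION` format ("OpenCL C <major>.<minor> <vendor info>"). The helper name `pick_opencl_c_std` is hypothetical and this is not the backend's actual code; it only shows how the highest reported version maps to a `-cl-std=` build option.

```python
import re


def pick_opencl_c_std(versions):
    """Return a -cl-std= option for the highest OpenCL C version in `versions`.

    Each entry mimics a device-reported string such as "OpenCL C 1.2 pocl".
    """
    best = None
    for v in versions:
        m = re.match(r"OpenCL C (\d+)\.(\d+)", v)
        if m:
            ver = (int(m.group(1)), int(m.group(2)))
            # Tuples compare lexicographically: (3, 0) > (2, 0) > (1, 2).
            if best is None or ver > best:
                best = ver
    if best is None:
        raise ValueError("device reports no OpenCL C version")
    return f"-cl-std=CL{best[0]}.{best[1]}"
```

A device that only reports OpenCL C 1.2 then gets `-cl-std=CL1.2` instead of a hard-coded `CL2.0`.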
521d72d76e ggml-backend : make path_str compatible with C++20 (llama/12269) 2025-03-27 11:06:03 +02:00
9fb9025a40 ggml : skip intermediate .air file when compiling .metallib (llama/12247)
This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.

The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.
2025-03-27 11:06:03 +02:00
3c2abb01e8 cmake: Enable specifying exact PowerPC CPU architecture (ggml/1138)
In the process, guard automatic CPU detection with GGML_NATIVE.

https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html#index-mcpu-10
2025-03-27 11:06:03 +02:00
efd9407e22 cmake: Comment out GGML_BIN_DIR for now (ggml/1139)
Nothing installs to it yet, so when attempting to use the cmake package,
set_and_check() triggers an error if the directory doesn't already exist
for other reasons.
2025-03-27 11:06:03 +02:00
3684af2594 scripts : update sync 2025-03-27 11:06:03 +02:00
206459a804 bindings-go : update Makefile to use cmake (#2952)
This commit updates the Makefile to use cmake instead of make to build
whisper.cpp.

The motivation for this change is that currently the make recipe test
will fail with the following error:
```console
$ make test
Mkdir build
Mkdir models
Build whisper
make[1]: Entering directory '/home/danbev/work/ai/whisper-work'
make[1]: *** No rule to make target 'libwhisper.a'.  Stop.
make[1]: Leaving directory '/home/danbev/work/ai/whisper-work'
make: *** [Makefile:33: whisper] Error 2
```
2025-03-26 16:21:07 +01:00
21d890d534 whisper : add support for backends with multiple ggml_backend_buffer_type (#2863)
* whisper : add support for ggml_backend_buffer_type

Signed-off-by: Dan Johansson <dan.johansson@arm.com>

* fix compile error when building on Ubuntu

Signed-off-by: Dan Johansson <dan.johansson@arm.com>

* remove copyright header from include file

Signed-off-by: Dan Johansson <dan.johansson@arm.com>

---------

Signed-off-by: Dan Johansson <dan.johansson@arm.com>
2025-03-26 16:54:02 +02:00
0b43a02be8 bindings.java : enable copyLibs task [no ci] (#2949)
* bindings.java : enable copyLibs task [no ci]

This commit adds a dependency on the copyLibs task to the sourcesJar and
jar tasks. This ensures that the libwhisper.so file is copied to the
correct location before the jar is built.

It also sets the executable bit on the gradlew file.

* bindings.java : add copyLibs dep for processResources [no ci]

This will otherwise cause builds to fail after doing an initial build.

* bindings.java : pass structs by value to native code

This commit refactors the code to pass the structs by value to the
native code. This is done by creating a ByValue class for each struct
and using it in the Java code.

The motivation for this change is that without it the application crashes,
due to what I believe was memory misalignment. When the structs were
passed to the native code they would be at different memory locations.
Passing by value overcomes this issue, and since the structs
hold parameters (context and full params) it is probably fine to do
this. These changes allow all the tests to pass.

* bindings.java : fix javadoc warnings [no ci]

* bindings.java : fix libwhisper.dylib path in build.gradle [no ci]

This commit fixes the copyLibwhisperDynlib task in the build.gradle file
to copy the correct libwhisper.dylib file from build/src.
2025-03-26 15:01:28 +01:00
2699e1485a bindings.javascript : update test instructions [no ci] (#2951)
This commit updates the instructions for running the test in the
JavaScript bindings README file.

The motivation for this is that for Node.js versions after v16.4.0 the
`--experimental-wasm-threads` and `--experimental-wasm-simd` flags are
no longer required, and passing them generates the following errors:
```console
$ node --experimental-wasm-threads --experimental-wasm-simd ../tests/test-whisper.js
node: bad option: --experimental-wasm-threads
node: bad option: --experimental-wasm-simd
```
2025-03-26 14:49:12 +01:00
594a121f3e readme : add note about SDL2 (#2946)
Clarify the README section about real-time audio processing, stating that SDL2 is needed.
2025-03-26 09:30:59 +02:00
996581c5e2 whisper.android : add GGML_USE_CPU compile definition (#2945)
This commit adds GGML_USE_CPU to the built target library to enable the CPU
backend.

The motivation for this is that without the compile definition the CPU
backend is not enabled and the app will crash when trying to use it.
2025-03-25 18:01:18 +01:00
226d344f56 whisper.android.java : update build with ggml source changes (#2942)
* whisper.android.java : update build with ggml source changes

This commit updates the whisper.android.java build to include the
new ggml source files and directories. The gradle build configuration is
also updated to include the aliyun maven repository.
2025-03-25 16:01:59 +01:00
bb9f68129f ci: fix SYCL build (#2943)
2025-03-25 11:20:37 +02:00
30cf30ca82 examples : reduce initial memory to 512MB (#2939)
* examples : reduce initial memory to 512MB

This commit reduces the initial memory size to 512MB. This is done to
avoid WebAssembly memory allocation issues on some platforms. It also
adds a flag to allow the memory to grow dynamically (up to the maximum).

The motivation for this change is that currently the initial memory is
set to 2GB, which might be too large for some platforms. This leads to
an error being thrown from the JavaScript code generated by Emscripten
when it tries to allocate memory. More details can be found in the
referenced issue below.


* examples : set MAXIMUM_MEMORY instead of TOTAL_MEMORY

This commit sets MAXIMUM_MEMORY instead of TOTAL_MEMORY in the
whisper.wasm example.

The motivation for this is that TOTAL_MEMORY and INITIAL_MEMORY are
actually the same thing. Instead we want to set MAXIMUM_MEMORY to
2GB. 
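For reference, the sizes involved in the two changes above work out as follows. This is a small sketch, assuming the fixed 64 KiB WebAssembly page size and reading "MB"/"GB" as powers of two; Emscripten expects memory sizes to be a multiple of the page size. The helper `wasm_pages` is a hypothetical name used only for illustration.

```python
# Memory-size arithmetic for the whisper.wasm settings (illustrative sketch).
WASM_PAGE_SIZE = 64 * 1024  # bytes per WebAssembly memory page
MiB = 1024 * 1024


def wasm_pages(size_bytes: int) -> int:
    """Return the number of 64 KiB pages for a size; reject non-multiples."""
    if size_bytes % WASM_PAGE_SIZE != 0:
        raise ValueError(f"{size_bytes} is not a multiple of the wasm page size")
    return size_bytes // WASM_PAGE_SIZE


initial_memory = 512 * MiB   # new initial size
maximum_memory = 2048 * MiB  # 2GB upper bound the memory may grow to

print(initial_memory, wasm_pages(initial_memory))  # 536870912 8192
print(maximum_memory, wasm_pages(maximum_memory))  # 2147483648 32768
```

The module therefore starts with 8192 pages and, with memory growth enabled, can grow to at most 32768 pages.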

Refs: https://github.com/ggerganov/whisper.cpp/issues/2920
Refs: https://emscripten.org/docs/tools_reference/settings_reference.html#initial-memory
2025-03-24 14:42:12 +01:00
ee6286c35d examples : fix nthread parsing in whisper.wasm (#2938)
This commit fixes the nthread parsing in the whisper.wasm example when
using the `Threads` slider to change the number of threads to be used.

Currently this results in the following error:
```console
main.js:5597 Uncaught TypeError: Cannot convert "5" to int
    at checkAssertions (main.js:5597:21)
    at Object.toWireType (main.js:5611:15)
    at Object.full_default (eval at new_ (main.js:5292:27), <anonymous>:10:26)
    at whisper.wasm/:649:42
```
2025-03-24 14:40:00 +01:00
c7941d5ccc examples : fix request path for local worker files (#2937)
This commit adds a fix to the server.py file to handle requests for
web worker files when running the local python server to test the wasm
examples.

The motivation for this is that the server currently serves files
from the build-em/bin directory, which is where the .worker.js files
exist. But the examples request these resources using the
application context path, for example /whisper.wasm/libmain.worker.js,
and such paths are currently not found.
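Here is a minimal sketch of the kind of rewrite described above, using Python's `http.server`. The helper `map_worker_request` and the class `WorkerAwareHandler` are hypothetical names. The rule (serve any `*.worker.js` request by file name from the served root) is an assumption about the fix, not a copy of `server.py`.

```python
import posixpath
from http.server import SimpleHTTPRequestHandler


def map_worker_request(request_path: str) -> str:
    """Rewrite '/whisper.wasm/libmain.worker.js' to '/libmain.worker.js'.

    Non-worker paths are returned unchanged.
    """
    # Ignore any query string when inspecting the path.
    path = request_path.split("?", 1)[0]
    if path.endswith(".worker.js"):
        # Worker scripts live directly in the served directory (build-em/bin),
        # so only the file name matters, not the application context path.
        return "/" + posixpath.basename(path)
    return request_path


class WorkerAwareHandler(SimpleHTTPRequestHandler):
    def translate_path(self, path):
        # Apply the rewrite before the default path-to-file mapping.
        return super().translate_path(map_worker_request(path))
```

Overriding `translate_path` leaves the rest of the request handling (directory listings, MIME types) to the standard handler.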
2025-03-24 14:33:45 +01:00
b82ac32a6c ggml : add logging for native build options/vars (#2935)
This commit adds debug level logging for the native build options and
variables to ggml/CMakeLists.txt.

The motivation for this is that it can be useful to see the effective
result of `GGML_NATIVE`, `GGML_NATIVE_DEFAULT`, and `INS_ENB` for a
cmake build. I've found myself adding similar logging a few times now,
so I thought it might be a good idea to add this.

For example, specifying `-DCMAKE_MESSAGE_LOG_LEVEL=DEBUG` when running
cmake produces the following output:
```console
-- GGML_NATIVE         : OFF
-- GGML_NATIVE_DEFAULT : OFF
-- INS_ENB             : OFF
```
2025-03-24 09:53:38 +01:00
edf1ee1ef8 whisper : enhance model download scripts functionality and resolve compiler warning (#2925)
* whisper : improve whisper-cli executable path detection in model download shell scripts

If whisper-cli is found on the path, do not suggest invoking from build directory. This improves flexibility and usability for distribution and packaging scenarios.

* whisper : enhance Windows model download batch script to match the functionality and behaviour of the shell scripts

* Download models to the current directory if the script is executed from the \bin\ directory (for future distribution scenarios where the script is in the \bin\ subdirectory of a Windows build)
* Add model_path command line argument
* If whisper-cli is found on the path, do not suggest invoking from build directory

* whisper : resolve compiler warning by removing duplicate definition of NOMINMAX in whisper-cli code
2025-03-24 10:39:50 +02:00
cf5ddb8c21 whisper : initialize decoder's rng with unique seed (#2932)
This change initializes each decoder's random number generator with a
unique seed.

The motivation for this is that currently all decoders are initialized
with the same seed value, 0. The result of this is that for the same
state (logits, probs, and logprobs) they will produce the same output.
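The sketch below shows why identical seeds are a problem. It uses Python's `random.Random` as a stand-in for each decoder's RNG and samples tokens from the same fixed probabilities. Deriving the per-decoder seed as `base_seed + i` is an assumption for illustration; the commit only states that each decoder gets a unique seed.

```python
import random


def make_decoder_rngs(n_decoders, base_seed=0, unique=True):
    """One RNG per decoder (hypothetical helper).

    unique=False mimics the old behaviour: every decoder uses base_seed.
    unique=True gives decoder i the seed base_seed + i (an assumption).
    """
    return [random.Random(base_seed + i if unique else base_seed)
            for i in range(n_decoders)]


def sample_tokens(rng, probs, n):
    """Draw n token ids from a fixed distribution (stand-in for softmaxed logits)."""
    return rng.choices(range(len(probs)), weights=probs, k=n)


probs = [0.4, 0.3, 0.2, 0.1]  # identical state for every decoder

shared = [sample_tokens(r, probs, 32) for r in make_decoder_rngs(3, unique=False)]
unique = [sample_tokens(r, probs, 32) for r in make_decoder_rngs(3)]

# Same seed + same probabilities -> every decoder samples the same tokens.
print(all(s == shared[0] for s in shared))  # True
```

With distinct seeds the decoders draw different token sequences even from identical probabilities, so they can explore different candidates.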
2025-03-24 09:36:07 +01:00
7fe4979f25 ci : remove CMAKE_CUDA_ARCHITECTURES in windows-cublas (#2923)
This commit removes the -DCMAKE_CUDA_ARCHITECTURES=all flag from the
windows-cublas job in the build.yml file.

The motivation for this is that building for all architectures is
unnecessary and takes a long time. Without this flag the architectures
will instead be set by ggml-cuda.

Refs: https://github.com/ggerganov/whisper.cpp/pull/2915#issuecomment-2743160743
2025-03-22 15:40:28 +01:00
9bc0dc7235 whisper : update default model download directory behavior to use current working directory when script is in /bin/ directory (#2924)
This change ensures that when the script is packaged and distributed, models are downloaded to the current directory instead of the script's location, preventing conflicts with system directories. This improves flexibility and usability for distribution and packaging scenarios.
2025-03-22 16:27:57 +02:00
3fc6ad97a3 whisper.swiftui : Add Core ML support to README [no ci] (#2921)
This commit updates the README to include instructions on how to use
a Core ML model with the example.
2025-03-21 11:38:32 +01:00
663cafc1e8 readme : update Python version to 3.11 for Core ML support [no -ci] (#2919)
This commit updates the recommended version of Python to 3.11 for Core
ML conversion support. It also adds the `-e` flag to the
`generate-coreml-model.sh` script to ensure that the script exits on the
first error.

The motivation for this is that when following the installation instructions
using Python 3.10 I get the following error:
```console
(venv) $ ./models/generate-coreml-model.sh base.en

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.3 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/whisper-work/models/convert-whisper-to-coreml.py", line 2, in <module>
    import torch
  File "/whisper-work/venv/lib/python3.10/site-packages/torch/__init__.py", line 870, in <module>
    from . import _masked
  File "/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py", line 420, in <module>
    def sum(input: Tensor,
  File "/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py", line 223, in _apply_docstring_templates
    example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at  /Users/distiller/project/pytorch/torch/csrc/utils/tensor_numpy.cpp:68.)
  example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
Minimum required torch version for importing coremltools.optimize.torch is 2.1.0. Got torch version 1.11.0.
Traceback (most recent call last):
  File "/whisper-work/models/convert-whisper-to-coreml.py", line 4, in <module>
    import coremltools as ct
  File "/whisper-work/venv/lib/python3.10/site-packages/coremltools/__init__.py", line 120, in <module>
    from . import converters, models, optimize, proto
  File "/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/__init__.py", line 7, in <module>
    from . import libsvm, sklearn, xgboost
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/__init__.py", line 6, in <module>
    from ._tree import convert
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/_tree.py", line 9, in <module>
    from ._tree_ensemble import convert_tree_ensemble as _convert_tree_ensemble
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/_tree_ensemble.py", line 11, in <module>
    from ...models.tree_ensemble import TreeEnsembleClassifier
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/__init__.py", line 6, in <module>
    from . import (
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/ml_program/__init__.py", line 6, in <module>
    from . import compression_utils
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/ml_program/compression_utils.py", line 8, in <module>
    from coremltools.converters.mil.mil import Operation as _Operation
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/__init__.py", line 7, in <module>
    from .frontend.tensorflow.tf_op_registry import register_tf_op
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/__init__.py", line 6, in <module>
    from . import tensorflow, tensorflow2, torch
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/__init__.py", line 11, in <module>
    from . import ops, quantization_ops
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 36, in <module>
    from .internal_graph import InternalTorchIRGraph, InternalTorchIRNode
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/internal_graph.py", line 15, in <module>
    from .exir_utils import extract_io_from_exir_program
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/exir_utils.py", line 99, in <module>
    ) -> Dict[str, torch.fx.Node]:
AttributeError: module 'torch' has no attribute 'fx'
```
With Python 3.11 the conversion script runs without any errors.
2025-03-21 10:31:55 +01:00
be9de81171 whisper : add check for CPU backend initialization (#2918)
This commit adds a check for the CPU backend initialization in the
whisper library. If the initialization fails, an exception is thrown.

The motivation for this change is to make the library more robust and
handle the case when the CPU backend initialization fails.

Resolves: https://github.com/ggerganov/whisper.cpp/issues/2917
2025-03-21 09:53:26 +01:00
21fb513ef1 examples : update whisper.objc README.md (#2916)
This commit updates the whisper.objc README.md to reflect the changes of
using the xcframework and the new build process.

Since whisper.cpp is no longer compiled by the example project, and the
library from the xcframework is used instead, the build instructions
have been removed.
2025-03-21 09:52:53 +01:00
4e56747944 ci : increase windows-cublas evict-old-files to 5d (#2915)
This commit updates the evict-old-files parameter for the windows-cublas
build job to 5 days.

The motivation for this change is to avoid the full rebuild which takes
around 1.5 hours for the windows-cublas build job. Considering that
there are periods of low traffic on whisper.cpp (like weekends etc.) it
might be better to have a longer eviction policy to avoid the full
rebuild.
2025-03-21 08:19:24 +01:00
ca75449a92 xcframework : add support for CoreML to ios/macOS (#2912)
* xcframework : add support for CoreML to ios/macOS

This commit adds support for compiling whisper with Core ML support for
iOS and macOS.

The motivation for this change is that it will allow users to use a Core ML
model or fall back to a ggml model if Core ML is not available.

With the updated xcframework, I was able to run the whisper.objc example
and successfully load a Core ML model:
```console
whisper_init_state: loading Core ML model from '/Users/danbev/Library/Developer/CoreSimulator/Devices/25E8C27D-0253-4281-AF17-C3F2A4D1D8F4/data/Containers/Bundle/Application/B81F6FF0-BF1A-40DF-AC2A-3908EC4BCC9A/whisper.objc.app/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
```

* squash! xcframework : add support for CoreML to ios/macOS

Fix grammar in output message.
2025-03-20 18:39:08 +01:00
80dad86b2c examples : add WHISPER_SDL2 check to deprecation executables (#2911)
This commit adds a check for `WHISPER_SDL2` to the deprecation warning
examples. This is to prevent the examples from being built when
WHISPER_SDL2 is not enabled.

The motivation for this is that currently these deprecation executables
are generated, and when run they refer the user to executables with other
names, for example `whisper-command`, but unless the user has built with
`WHISPER_SDL2` those executables will not be present:
```console
$ ls build/bin/
bench  command  main  quantize  stream  whisper-bench  whisper-cli
whisper-server

$ ./build/bin/command

WARNING: The binary 'command' is deprecated.
 Please use 'whisper-command' instead.
 See https://github.com/ggerganov/whisper.cpp/tree/master/examples/deprecation-warning/README.md for more information.
```
2025-03-20 18:36:02 +01:00
485ece6725 ci : use ninja and fix caching for windows-cublas (#2910)
This commit updates the windows-cublas job to use Ninja as the build
system instead of msbuild/msvc.

The motivation for this is that msbuild/msvc does not seem to handle
ccache/sccache well; for example, it ignores the
`CMAKE_C_COMPILER_LAUNCHER` etc. variables. With Ninja as the build
system, caching works: the initial build takes about as long as it does
currently (without caching), and subsequent builds are much faster.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2781
2025-03-20 17:01:48 +01:00
e7d9d8687a examples : update wasm examples to include server.py [no ci] (#2908)
This commit updates the README files for the wasm examples to include
instructions on how to run the examples using the provided server.py
which was included in Commit 6e8242f7fe
("examples : command.wasm updates (#2904)").

The motivation for this is consistency with the command.wasm example.
2025-03-20 09:07:43 +01:00
6e8242f7fe examples : command.wasm updates (#2904)
This commit updates the command.wasm example by adding a server.py script to make it easy to start a local http server to try out the example, updates the build instructions, and also addresses some of the compiler warnings that were being generated.

* emscripten : fix TOTAL_STACK for wasm

This commit moves the TOTAL_STACK setting from the compile flags to the
linker flags. This is because the TOTAL_STACK setting is a linker
setting.

The motivation for this change is that currently the following warnings
are generated when building:
```console
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
```

* examples : suppress C++17 deprecation warning for std::codecvt_utf8

This commit suppresses the C++17 deprecation warning for
std::codecvt_utf8 similar to what is done in
examples/talk-llama/unicode.cpp.

The motivation for this change is to suppress these warnings:
```console
/Users/danbev/work/ai/whisper-work/examples/common.cpp:251:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
  251 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
      |                               ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/codecvt:193:28: note: 'codecvt_utf8<wchar_t>' has been explicitly marked deprecated here
  193 | class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 codecvt_utf8 : public __codecvt_utf8<_Elem> {
      |                            ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
  723 | #    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
      |                                         ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED'
  688 | #      define _LIBCPP_DEPRECATED __attribute__((__deprecated__))
      |                                                 ^
/Users/danbev/work/ai/whisper-work/examples/common.cpp:251:10: warning: 'wstring_convert<std::codecvt_utf8<wchar_t>>' is deprecated [-Wdeprecated-declarations]
  251 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
      |          ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/locale:3145:28: note: 'wstring_convert<std::codecvt_utf8<wchar_t>>' has been explicitly marked deprecated here
 3145 | class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 wstring_convert {
      |                            ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
  723 | #    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
      |                                         ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED'
  688 | #      define _LIBCPP_DEPRECATED __attribute__((__deprecated__))
      |                                                 ^
/Users/danbev/work/ai/whisper-work/examples/common.cpp:257:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
  257 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
      |                               ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/codecvt:193:28: note: 'codecvt_utf8<wchar_t>' has been explicitly marked deprecated here
  193 | class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 codecvt_utf8 : public __codecvt_utf8<_Elem> {
      |                            ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
  723 | #    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
      |                                         ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED'
  688 | #      define _LIBCPP_DEPRECATED __attribute__((__deprecated__))
      |                                                 ^
/Users/danbev/work/ai/whisper-work/examples/common.cpp:257:10: warning: 'wstring_convert<std::codecvt_utf8<wchar_t>>' is deprecated [-Wdeprecated-declarations]
  257 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
      |          ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/locale:3145:28: note: 'wstring_convert<std::codecvt_utf8<wchar_t>>' has been explicitly marked deprecated here
 3145 | class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 wstring_convert {
      |                            ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
  723 | #    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
      |                                         ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED'
  688 | #      define _LIBCPP_DEPRECATED __attribute__((__deprecated__))
      |                                                 ^
4 warnings generated.
```

* ggml : suppress double-promotion warning in GGML_F16x4_REDUCE

This commit adds a cast to `ggml_float` in the `GGML_F16x4_REDUCE` macro
to suppress a double-promotion warning.

Currently the following warning is generated when compiling the
command.wasm example:
```console
/whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:1592:5: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
 1592 |     GGML_F16_VEC_REDUCE(sumf, sum);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/danbev/work/ai/whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:932:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
  932 | #define GGML_F16_VEC_REDUCE         GGML_F16x4_REDUCE
      |                                     ^
/Users/danbev/work/ai/whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:920:44: note: expanded from macro 'GGML_F16x4_REDUCE'
  918 |     res = wasm_f32x4_extract_lane(x[0], 0) +       \
      |         ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  919 |           wasm_f32x4_extract_lane(x[0], 1) +       \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  920 |           wasm_f32x4_extract_lane(x[0], 2) +       \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
  921 |           wasm_f32x4_extract_lane(x[0], 3);        \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:1640:9: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
 1640 |         GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/danbev/work/ai/whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:932:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
  932 | #define GGML_F16_VEC_REDUCE         GGML_F16x4_REDUCE
      |                                     ^
/Users/danbev/work/ai/whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:920:44: note: expanded from macro 'GGML_F16x4_REDUCE'
  918 |     res = wasm_f32x4_extract_lane(x[0], 0) +       \
      |         ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  919 |           wasm_f32x4_extract_lane(x[0], 1) +       \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  920 |           wasm_f32x4_extract_lane(x[0], 2) +       \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
  921 |           wasm_f32x4_extract_lane(x[0], 3);        \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated.
```
wasm_f32x4_extract_lane returns a 32-bit float and this is what the
addition is performed on. But there is an implicit conversion from
32-bit float to 64-bit double when the result is assigned to `res`,
which is of type `ggml_float`. My understanding here is that this is
intentional and adding a cast to `ggml_float` should suppress the
warning.

* emscripten : add -Wno-deprecated for emscripten

This commit adds -Wno-deprecated to the CMAKE_CXX_FLAGS for emscripten
builds.

The motivation for this is that currently a number of warnings are
generated, like the following:
```console
warning: JS library symbol '$print' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
warning: JS library symbol '$printErr' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
em++: warning: warnings in JS library compilation [-Wjs-compiler]
em++: warning: linker setting ignored during compilation: 'ENVIRONMENT' [-Wunused-command-line-argument]
warning: JS library symbol '$print' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
warning: JS library symbol '$printErr' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
em++: warning: warnings in JS library compilation [-Wjs-compiler]
warning: JS library symbol '$print' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
warning: JS library symbol '$printErr' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
em++: warning: warnings in JS library compilation [-Wjs-compiler]
em++: warning: linker setting ignored during compilation: 'ENVIRONMENT' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'ENVIRONMENT' [-Wunused-command-line-argument]
```

The downside of this is that we might miss other deprecation warnings
in the future, so I'm not sure if this is acceptable. But it makes the
wasm example builds cleaner without the warnings.

* examples : fix tautological-compare warning in stb_vorbis.c [no ci]

This commit applies a fix to address a tautological-compare warning
in stb_vorbis.c.

The motivation for this is that currently the following warning is
generated when compiling the command.wasm example:
```console
/Users/danbev/work/ai/whisper-work/examples/stb_vorbis.c:1404:75: warning: pointer comparison always evaluates to false [-Wtautological-compare]
 1404 |       if (f->stream_start + loc >= f->stream_end || f->stream_start + loc < f->stream_start) {
      |                                                                           ^
1 warning generated.
```

This fix was taken from an open pull request on the stb repository
that addresses this issue:
https://github.com/nothings/stb/pull/1746
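A minimal C++ sketch of a bounds check that avoids this warning. It assumes a reduced stream struct holding only `stream_start` and `stream_end`. The `mem_stream` struct and the `seek_in_bounds` helper are hypothetical, and this is not necessarily what the upstream PR does. The idea is to compare integer offsets against the stream length rather than form `stream_start + loc`, so no out-of-range pointer is ever computed and the compiler has nothing to fold to false.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reduction of an in-memory stream: only the two pointers used
// by the bounds check are modelled.
struct mem_stream {
    const uint8_t * stream_start;
    const uint8_t * stream_end;
};

// Returns true if `loc` is a valid byte offset into the stream.
// `stream_start + loc < stream_start` can only be true after pointer overflow,
// which is undefined behaviour, so compilers treat it as always false.
// Comparing against the stream length keeps the check in integer arithmetic.
static bool seek_in_bounds(const mem_stream & f, size_t loc) {
    const size_t len = static_cast<size_t>(f.stream_end - f.stream_start);
    return loc < len;
}
```

The original error branch would then read `if (!seek_in_bounds(*f, loc))`, and very large offsets are still rejected without relying on wrap-around.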

* squash! examples : update command.wasm instructions [no ci]

This commit adds a Python script to serve the wasm examples built
in the `build-em` directory. Initially I thought that it would be enough
to start a simple Python server, but I had not noticed that there was an
error in the browser console when I did that:
```console
command.js:1 Uncaught (in promise) DataCloneError: Failed to execute 'postMessage' on 'Worker': SharedArrayBuffer transfer requires self.crossOriginIsolated.
    at command.js:1:1206224
    at new Promise (<anonymous>)
    at loadWasmModuleToWorker (command.js:1:1204981)
    at Array.map (<anonymous>)
    at Object.loadWasmModuleToAllWorkers (command.js:1:1206428)
    at command.js:1:1204318
    at callRuntimeCallbacks (command.js:1:1202062)
    at preRun (command.js:1:6136)
    at run (command.js:1:1294094)
    at removeRunDependency (command.js:1:7046)
```
We need a few cross-origin isolation headers to be set, and to make this
as easy as possible for users a Python script is added to the examples
directory. It should be able to serve all the wasm examples provided they
have been built. command.wasm's README.md is updated to reflect this change.

* examples : remove unused functions

This commit removes the unused functions convert_to_utf8 and
convert_to_wstring from examples/common.cpp.

* Revert "examples : fix tautological-compare warning in stb_vorbis.c [no ci]"

This reverts commit 8e3c47d961.

We should not make this change here; instead, once the upstream PR is
merged we can sync with it.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2784
2025-03-20 07:02:18 +01:00
e27fd6f0c0 ci : refactor cuda toolkit installation steps (#2902)
The commit updates the CUDA toolkit installation steps to use variables
for the CUDA version and the component versions.

The motivation for this change is that currently the component versions
are used in multiple places, which makes them hard to update and
maintain.
2025-03-19 09:41:14 +01:00
96db0c5a9c go : add Encoder Begin Callback (#2900)
Adds an EncoderBeginCallback to the Context's Process callback.
This optional callback function returns false if computation should
be aborted.
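The abort-on-false contract can be sketched in C++ as follows. The `process` function, the `encoder_begin_callback` alias and the `encoder_runs` counter are hypothetical stand-ins, not the Go binding or the whisper.cpp API. They show that an absent callback means "continue" and that a callback returning false stops the work before the encoder runs.

```cpp
#include <functional>
#include <vector>

// Hypothetical callback type: return false to abort before encoding starts.
using encoder_begin_callback = std::function<bool()>;

// Hypothetical stand-in for a Process call; `encoder_runs` counts how often
// the (simulated) encoder actually ran.
static bool process(const std::vector<float> & samples,
                    const encoder_begin_callback & on_encoder_begin,
                    int & encoder_runs) {
    (void) samples;  // audio would be fed to the model here
    // The callback is optional: an empty std::function means "always continue".
    if (on_encoder_begin && !on_encoder_begin()) {
        return false;  // aborted by the caller
    }
    ++encoder_runs;  // stand-in for running the encoder
    return true;
}
```

Making the callback optional keeps existing callers working: they pass an empty callback and get the previous behaviour.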

Co-authored-by: Amanda Der Bedrosian <aderbedr@gmail.com>
2025-03-19 09:05:04 +02:00
d2aaffd5d9 ci : add ccache action to windows-cublas job (#2893)
* ci : add ccache action to windows-cublas job

This commit adds the ccache action to the windows-cublas job. This will
allow us to cache the build artifacts and hopefully speed up the build
process.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2781
2025-03-19 04:53:08 +01:00
215990abde whisper : fix compiler warnings in whisper.cpp (#2895)
This commit fixes compiler warnings in whisper.cpp by changing the type
of the loop index variable from int64_t to size_t.

Currently the following warnings are generated by the compiler:
```console
/whisper.cpp/src/whisper.cpp:209:27: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  209 |     for (int64_t i = 0; i < nels; ++i) {
      |                         ~ ^ ~~~~
/whisper.cpp/src/whisper.cpp:219:27: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  219 |     for (int64_t i = 0; i < nels; ++i) {
      |                         ~ ^ ~~~~
```
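A minimal, self-contained illustration of the fix. It assumes the bound `nels` is a `size_t`, as the warning reports. The `sum_elements` function is hypothetical and is not the code at whisper.cpp:209. Giving the loop index the same unsigned type as the bound removes the mixed-sign comparison.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper that sums `nels` elements.
// Before: `for (int64_t i = 0; i < nels; ++i)` compares a signed index with an
// unsigned bound and triggers -Wsign-compare.
// After: the index is size_t, the same type as the bound.
static double sum_elements(const std::vector<float> & data) {
    const size_t nels = data.size();
    double sum = 0.0;
    for (size_t i = 0; i < nels; ++i) {
        sum += data[i];
    }
    return sum;
}
```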
2025-03-18 13:38:41 +01:00
7e23d8c64a ci : add missing env.branch_name to build.yml (#2896)
This commit adds the missing env.branch_name to the build.yml file.

The motivation for this is that currently the build is failing
during the release job because branch_name is not set and
an invalid tag is being used.
2025-03-18 13:38:21 +01:00
740bf7f6a1 whisper : enable compiler warnings for src (#2891)
* whisper : enable compiler warnings for src

This commit enables compiler warnings for the src directory. Currently
when the WHISPER_ALL_WARNINGS flag is set to ON it only enables warnings
in ggml, by setting GGML_ALL_WARNINGS to ON. This commit adds the same
compiler flags for whisper's src directory.

The motivation for this is to catch potential bugs and issues early on
in the development process.

* squash! whisper : enable compiler warnings for src

Remove GF_C_FLAGS and GF_CXX_FLAGS from add_compile_options.
2025-03-18 05:19:18 +01:00
c8e12f59dd ci : add release job and include xcframework (#2889)
* ci : add release job and include xcframework

This commit adds a release job that uploads the xcframework as an
artifact and creates a release with the xcframework as an asset.

This job can be triggered manually and enables a pre-release tag name to
be specified so that these releases can be distinguished from the
regular releases more easily.

Resolves: https://github.com/ggerganov/whisper.cpp/issues/2886
2025-03-18 05:18:20 +01:00
83b14c357c examples : use xcframework in whisper.objc example (#2882)
* examples : use xcframework in whisper.objc example

This commit updates the whisper.objc example to use the xcframework.

The motivation for this is to be consistent with the Swift example and to
also act as a reference for how to use the xcframework in an objc
project.

Resolves: https://github.com/ggerganov/whisper.cpp/issues/2881

* examples : set up audio session in viewDidLoad

This commit adds the setup of the audio session in the viewDidLoad
method of the ViewController.m file. This is necessary to allow the app
to record audio.

The motivation for this is that without this it was not possible to
capture audio from the microphone. It was possible to click on the
Capture button but nothing happened after that, and the button was not
marked red indicating that the button could be clicked again to stop
capturing. With this change it is possible to capture audio from the
microphone and get it transcribed.
2025-03-17 13:01:24 +01:00
60b481d881 whisper : add option to use system-installed GGML (#2887)
2025-03-17 09:54:48 +02:00
4854789751 convert : update convert-h5-to-ggml.py (#2840)
improved handling of missing max_length
2025-03-17 09:41:05 +02:00
e0f3c9d4dd examples : add GGML_USE_CPU=ON flag to whisper.objc (#2880)
This commit adds the GGML_USE_CPU=ON flag to the whisper.objc project in
order to enable the CPU backend for the whisper.objc project.

The motivation for this change is that currently the following error
is generated when running the example:
```console
ggml_backend_buffer_type_t ggml_backend_get_default_buffer_type(ggml_backend_t backend) {
    return ggml_backend_dev_buffer_type(backend->device); <- Thread 1: EXC_BAD_ACCESS (code=1, address=0x70)
}
```
If we inspect the `backend` variable we can see that it is a `nullptr`.
```console
(lldb) p backend
(ggml_backend_t) nullptr
```
When running in a simulator there will automatically be no GPU, as there
is a check for this in the code. But the CPU backend should still be
present.

The Objective-C code compiles the whisper sources, including the ggml
sources. If `-DGGML_USE_CPU` is not defined then there will be no
CPU backend, and in this particular case no backend at all.

Resolves: https://github.com/ggerganov/whisper.cpp/issues/2870
2025-03-14 15:40:20 +01:00
1f4886b40d ggml-ci: update input env variables to GG_BUILD_ (#2879) 2025-03-14 10:53:29 +02:00
f11de0e73c ggml-ci: add run.sh (#2877)
2025-03-14 09:29:55 +02:00
d5cc27ee4d examples : add dl to the list of libraries linked (#2875)
* examples : add dl to the list of libraries linked

This commit adds the dynamic linker library to the list of libraries
linked by the examples.

The motivation for this change is that when building the examples on
Ubuntu 20.04, which uses GCC 9.4.0, libdl has to be linked explicitly,
otherwise the following error is generated:
```console
[ 64%] Linking CXX executable ../../bin/whisper-cli
cd /app/whisper.cpp/build/examples/cli && /usr/bin/cmake -E cmake_link_script CMakeFiles/whisper-cli.dir/link.txt --verbose=1
/usr/bin/c++  -O3 -DNDEBUG   CMakeFiles/whisper-cli.dir/cli.cpp.o  -o ../../bin/whisper-cli  -Wl,-rpath,/app/whisper.cpp/build/src:/app/whisper.cpp/build/ggml/src: ../libcommon.a ../../src/libwhisper.so.1.7.4 -pthread ../../ggml/src/libggml.so ../../ggml/src/libggml-cpu.so ../../ggml/src/libggml-base.so
/usr/bin/ld: ../libcommon.a(common-whisper.cpp.o): undefined reference to symbol 'dlclose@@GLIBC_2.2.5'
/usr/bin/ld: /lib/x86_64-linux-gnu/libdl.so.2: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [examples/cli/CMakeFiles/whisper-cli.dir/build.make:89: bin/whisper-cli] Error 1
make[2]: Leaving directory '/app/whisper.cpp/build'
make[1]: *** [CMakeFiles/Makefile2:433: examples/cli/CMakeFiles/whisper-cli.dir/all] Error 2
make[1]: Leaving directory '/app/whisper.cpp/build'
make: *** [Makefile:130: all] Error 2
```

Resolves: https://github.com/ggerganov/whisper.cpp/issues/2854
2025-03-14 04:42:20 +01:00
5bb1d58c6a whisper: add xcframework build script (#2873)
* whisper: add xcframework build script

* added apple validation scripts

* fixed Readme

* validation script fix
2025-03-13 13:56:39 +01:00
7d14005717 objc : fix build, tmp remove GPU support, use C++17
2025-03-08 15:13:01 +02:00
4ffb8e3e4d cmake : fix ggml-config (ggml/0) 2025-03-08 15:13:01 +02:00
1d8d8ae55e sync : ggml 2025-03-08 15:13:01 +02:00
eebf6bc0bd ggml-cpu: faster AVX2 variant for IQ1_M (llama/12216) 2025-03-08 15:13:01 +02:00
dc8f423b40 metal : simplify kernel arguments using a struct (ggml/3229) (llama/12194)
* metal : refactor im2col parameters into a struct

* metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets

* metal : refactor sum_rows parameters into a struct

* metal : refactor soft_max parameters into a struct

* metal : refactor diag_mask_inf parameters into a struct

* metal : refactor ssm_conv parameters into a struct

* metal : refactor ssm_scan parameters into a struct

* metal : refactor get_rows parameters into a struct

* metal : refactor group_norm parameters into a struct

* metal : refactor conv_transpose_1d parameters into a struct

* metal : refactor upscale parameters into a struct

* metal : refactor pad parameters into a struct

* metal : refactor pad_reflect_1d parameters into a struct

* metal : refactor arange parameters into a struct

* metal : refactor timestep_embedding parameters into a struct

* metal : refactor argsort parameters into a struct

* metal : refactor leaky_relu parameters into a struct

* metal : refactor pool_2d parameters into a struct

* metal : fix trailing whitespace
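This squash combines two ideas: one argument struct per kernel instead of many scalar arguments, and 64-bit offsets. Both can be sketched on the host side as below. The `im2col_args` struct, its field names and `make_im2col_args` are hypothetical, not ggml's definitions. The example shows how a byte offset for a large input can exceed `INT32_MAX` and therefore needs `uint64_t`.

```cpp
#include <cstdint>

// Hypothetical argument block for an im2col-style kernel. Bundling the
// parameters lets the host bind a single struct instead of many scalars, and
// keeps the host and kernel views of the layout in one place.
struct im2col_args {
    uint64_t ofs0;  // bytes between consecutive batches
    uint64_t ofs1;  // bytes between consecutive channels
    int32_t  IW;    // input width
    int32_t  IH;    // input height
    int32_t  KW;    // kernel width
    int32_t  KH;    // kernel height
};

// Offsets are computed in 64-bit arithmetic; for large inputs the values
// would not fit in int32_t fields.
static im2col_args make_im2col_args(uint64_t channels, uint64_t ih, uint64_t iw,
                                    int32_t kw, int32_t kh) {
    im2col_args args{};
    args.ofs1 = ih * iw * sizeof(float);
    args.ofs0 = channels * args.ofs1;
    args.IW   = static_cast<int32_t>(iw);
    args.IH   = static_cast<int32_t>(ih);
    args.KW   = kw;
    args.KH   = kh;
    return args;
}
```

For example, 64 channels of 4096 x 4096 floats give a batch offset of 4294967296 bytes, which is beyond what an `int32_t` can hold.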

---------

Co-authored-by: alexju <alexju@tencent.com>
2025-03-08 15:13:01 +02:00
548e7052f1 metal : fix default.metallib build (llama/12224)
This commit updates the custom command to build the default.metallib
file to use the correct path to ../ggml-common.h by using the variable
METALLIB_COMMON.

The motivation for this change is that currently when building and
specifying GGML_METAL_EMBED_LIBRARY=OFF the following error is
generated:
```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'.  Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```

With the above change the build progressed further, but there was a
follow-on error: the ggml-common.h file could not be found from
ggml-metal.metal, where it was included via a relative path:
```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
         ^~~~~~~~~~~~~~~~~~
         "ggml-common.h"
1 error generated.
```
Removing the relative path then allowed the build to complete
successfully.
2025-03-08 15:13:01 +02:00
a34cb73dc2 opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops (llama/12217)
* opencl: support noncontiguous `norm`

* opencl: support noncontiguous `rms_norm`

* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`
2025-03-08 15:13:01 +02:00
82f9496657 cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (llama/12094)
Signed-off-by: Ray Lee <hburaylee@gmail.com>
Co-authored-by: Ray Lee <hburaylee@gmail.com>
2025-03-08 15:13:01 +02:00
e3c85e75bd CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222) 2025-03-08 15:13:01 +02:00
b9eab73fa2 HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (llama/12209)
This avoids conflicting with the memory-management behavior of the internal CUDA/HIP runtimes.
2025-03-08 15:13:01 +02:00
76385c8311 opencl : fix buffer alignment (llama/12197)
Fix the following error:

```
ggml-alloc.c:99: not enough space in the buffer
ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024)
```

which occurs when `ggml_backend_opencl_context::alignment` is larger
than `cl_ptr_base` (hard-coded to `0x1000`).

Also fix `ggml_backend_opencl_context::alignment`, which was set to
`CL_DEVICE_MEM_BASE_ADDR_ALIGN` and treated as a byte count, even though
OpenCL reports that value in bits.
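The unit mix-up can be illustrated with a minimal C sketch. It first converts the reported bit count to bytes, then rounds offsets up to that byte alignment. The helper names are hypothetical and this is not the backend's actual code. For example, a device reporting 1024 bits requires 128-byte alignment, not 1024-byte alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in bits; convert it to bytes
 * before using it as an allocation alignment. */
static size_t example_align_bytes_from_bits(uint32_t align_bits) {
    return (size_t)align_bits / 8;
}

/* Round `offset` up to the next multiple of `align`.
 * `align` must be a non-zero power of two. */
static size_t example_align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}
```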
2025-03-08 15:13:01 +02:00
442cd1d2e7 opencl : fix ulong kernel args being set from int variables (llama/12174)
Previously, `ulong` kernel arguments were set from `int` variables, which
left garbage bits in the upper half of the kernel arguments. This caused
segmentation faults when running on PoCL.
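This class of bug can be simulated in plain C without OpenCL. Here `example_set_arg` is a hypothetical stand-in for `clSetKernelArg`: it copies `size` bytes into an 8-byte argument slot that still holds stale data. If only the 4 bytes of an `int` are supplied, half of the 64-bit value is undefined. Widening to a 64-bit variable first keeps every byte defined. In the real bug the extra bytes were read from past the `int`, rather than left stale in the slot, but the kernel sees the same kind of garbage.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for clSetKernelArg: copies `size` bytes (at most 8)
 * into an 8-byte argument slot that still holds stale data. */
static uint64_t example_set_arg(const void *value, size_t size) {
    unsigned char slot[sizeof(uint64_t)];
    memset(slot, 0xAB, sizeof slot);  /* simulate leftover garbage */
    memcpy(slot, value, size);
    uint64_t kernel_sees;
    memcpy(&kernel_sees, slot, sizeof kernel_sees);
    return kernel_sees;               /* what a `ulong` kernel parameter would read */
}

/* Buggy pattern: only sizeof(int) of the 8 bytes come from the variable. */
static uint64_t example_pass_int(int offset) {
    return example_set_arg(&offset, sizeof offset);
}

/* Fixed pattern: widen into a 64-bit variable, then pass all 8 bytes. */
static uint64_t example_pass_ulong(int offset) {
    uint64_t offset_ul = (uint64_t)offset;
    return example_set_arg(&offset_ul, sizeof offset_ul);
}
```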
2025-03-08 15:13:01 +02:00
bc8cb97e02 opencl : fix profile-related errors (llama/12095)
Co-authored-by: ubuntu <ubuntu@localhost.localdomain>
2025-03-08 15:13:01 +02:00
8dcadf736b ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154)
* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions

* cmake: Add GGML_BMI2 build option

* ggml: enable BMI2 on relevant CPU variants

* ggml-cpu: include BMI2 in backend score

* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features

* ggml-cpu: add __BMI2__ define when using MSVC
2025-03-08 15:13:01 +02:00
93986b61e0 SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12201) 2025-03-08 15:13:01 +02:00
bd1a9e34c9 ggml : fix GGMLMetalClass ODR (llama/12200)
This guards against an ODR violation, which can happen if ggml is loaded from two separate libraries, since each of them exposes the class. It is mainly a precaution: Metal is only meant to be used as an embedded library, and the other case is not a concern.
2025-03-08 15:13:01 +02:00
cc03608e78 ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118)
* ggml_compute_forward_concat() for arbitrary tensor type

* Check that tensors' type match

* ggml-cpu.c: check type of source tensors

* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()

* ggml.c: check concatenated tensor type

* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c

..., as it was moved to ggml.c.
2025-03-08 15:13:01 +02:00
54a54faee4 vulkan : sync (llama/0)
ggml-ci
2025-03-08 15:13:01 +02:00
96a92ecc4c ggml : portability fixes for VS 2017 (llama/12150)
* Add include files for std::min/max and std::toupper/tolower

* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined

* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode

* win32: only use __restrict in MSVC if C11/C17 support is not enabled
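The two portability fixes above can be sketched in C. `_USE_MATH_DEFINES` must be defined before `<math.h>` is first included, and a macro selects `restrict` or `__restrict` depending on the compiler mode. `EXAMPLE_RESTRICT` is a hypothetical name and is not the real `GGML_RESTRICT` definition. Treating "C11/C17 not enabled" as "`__STDC_VERSION__` undefined" is also an assumption.

```c
#if defined(_WIN32)
#  define _USE_MATH_DEFINES  /* must come before <math.h> for MSVC to define M_PI */
#endif
#include <math.h>

#ifndef M_PI  /* strict ISO C modes may still hide M_PI */
#  define M_PI 3.14159265358979323846
#endif

/* Hypothetical restrict macro following the commit's description. */
#if defined(__cplusplus)
#  define EXAMPLE_RESTRICT __restrict   /* C++ has no `restrict` keyword */
#elif defined(_MSC_VER) && !defined(__STDC_VERSION__)
#  define EXAMPLE_RESTRICT __restrict   /* MSVC plain C mode without C11/C17 (assumed check) */
#else
#  define EXAMPLE_RESTRICT restrict     /* C99 and later */
#endif

/* The qualifiers promise the compiler that z, x and y do not alias. */
static void example_vec_add(int n, float *EXAMPLE_RESTRICT z,
                            const float *EXAMPLE_RESTRICT x,
                            const float *EXAMPLE_RESTRICT y) {
    for (int i = 0; i < n; ++i) {
        z[i] = x[i] + y[i];
    }
}
```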

---------

Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
2025-03-08 15:13:01 +02:00
edd1d8686a HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)
Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16
2025-03-08 15:13:01 +02:00
dc6f4e7c05 ggml : fix kleidiai build (llama/12159)
The libggml API had changed, but the KleidiAI code had not been updated accordingly.
2025-03-08 15:13:01 +02:00
74c85d154e SYCL: Move CPY kernels to a separate file and add few missing kernels (llama/12133)
* SYCL: refactor and move cpy kernels to a separate file

* Add few missing cpy kernels

* refactor and add debug logs
2025-03-08 15:13:01 +02:00
eb2d8b6ffd ggml-backend : keep paths in native string type when possible (llama/12144) 2025-03-08 15:13:01 +02:00
b442dcd598 CUDA: compress mode option and default to size (llama/12029)
CUDA 12.8 added an option to request stronger compression for binaries, so we now default to "size".
2025-03-08 15:13:01 +02:00
c98681e6d5 ggml : upgrade init_tensor API to return a ggml_status (llama/11854)
* Upgrade init_tensor API to return a ggml_status

To prepare for an 'abort-free' ggml
(ggml not aborting on OOM but returning an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.

* misc fixes
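The "abort-free" idea described above is that initialization reports failure to the caller instead of aborting. A minimal C sketch of that pattern follows. The enum, struct and function are hypothetical stand-ins in the spirit of `ggml_status`, not the real ggml types or signatures.

```c
#include <stdlib.h>

/* Hypothetical status codes in the spirit of ggml_status; names are illustrative. */
enum example_status {
    EXAMPLE_STATUS_ALLOC_FAILED = -2,
    EXAMPLE_STATUS_FAILED       = -1,
    EXAMPLE_STATUS_SUCCESS      =  0,
};

struct example_tensor {
    size_t nbytes;
    void  *data;
};

/* Initialize a tensor's storage and report failure to the caller
 * instead of aborting the whole process. */
static enum example_status example_init_tensor(struct example_tensor *t) {
    if (t == NULL) {
        return EXAMPLE_STATUS_FAILED;
    }
    t->data = malloc(t->nbytes);
    if (t->data == NULL) {
        return EXAMPLE_STATUS_ALLOC_FAILED;  /* out of memory: let the caller decide */
    }
    return EXAMPLE_STATUS_SUCCESS;
}
```

A caller checks the returned status and propagates it upward, so an out-of-memory condition becomes a recoverable error rather than a crash.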

---------

Co-authored-by: slaren <slarengh@gmail.com>
2025-03-08 15:13:01 +02:00
3bab804981 vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (llama/11595)
* vulkan: implement specialized MMV kernels for IQ2 quantizations

* vulkan: add MMV kernels for IQ3 quants

* vulkan: Increase MMV batch size and unroll IQ LUT setup

* vulkan: fix init_iq_shmem for WG sizes larger than tables

* vulkan: common batch size for all I-quants
2025-03-08 15:13:01 +02:00
c927830a70 CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098) 2025-03-08 15:13:01 +02:00
992b51b3d5 ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (llama/12064)
* Added SVE Support for Q2_K Quantized Models

* Use 4-space indentation in the switch cases

* Remove comment lines

* Remove the loop; retain the curly braces for readability

* Remove the comment line added for the q3_k_q8_k kernel

---------

Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
2025-03-08 15:13:01 +02:00
2c882cbe4c CANN: Fix build error with GCC 13 (llama/11990)
Remove unused header file that causes compilation failure on ARM
platform with GCC 13.
2025-03-08 15:13:01 +02:00
Eve
1fbb119b1e vulkan: matmul dequantization improvements (llama/12015)
* faster dequant for old quants

* don't use unpack for iq4_nl

* vec2 unpack for q8
2025-03-08 15:13:01 +02:00
40dea850fd vulkan: improve im2col (llama/11826)
* vulkan: improve im2col performance
2025-03-08 15:13:01 +02:00
8255a830a8 cmake: Fix ggml backend dependencies and installation (llama/11818)
* Fix dependencies between ggml and backends

ggml backends link only to ggml-base and ggml links to all backends.

* Fix installation of ggml backends

Set up GNUInstallDirs before setting the installation directory of ggml backends
2025-03-08 15:13:01 +02:00
a0f76b2da7 vulkan: fix assertion when qy_needs_dequant (llama/12068)
Looks like a copy/paste bug from qx_needs_dequant.
2025-03-08 15:13:01 +02:00
394768c48b ggml-cpu: Fix build with sve (llama/12059)
* ggml-cpu: Fix build with sve

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* ggml-cpu: Remove unused variable in sve q3_k vec dot

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-03-08 15:13:01 +02:00
846e01b2c0 cuda: unary ops as float + de-duplicate (ggml/1130) 2025-03-08 15:13:01 +02:00
6ac8e6b2ce cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)
* cuda: restrict SILU_BACK to fp32, since fp16 exceeds the desired test threshold

* vulkan: specify fp32-only support for certain ops (that are now tested for fp16 as well)

* f32 sigmoid in vulkan supports op

* Revert "f32 sigmoid in vulkan supports op"

This reverts commit c6f04b3c19bf4504c2776149c6d8cd84e0b48acb.
2025-03-08 15:13:01 +02:00
60d2ddebdf cuda/cpu: Increase support for fp16 unary operations (ggml/1125)
* Support fp16 unary operations in the CUDA backend

* cpu: increase fp16 support for unary operators in the CPU backend

* cuda: increase fp16 support for unary operators in the CUDA backend

* Add test cases for fp16 unary operators

* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing

* metal: fix PR comments for unary op support after fp16 unary tests
2025-03-08 15:13:01 +02:00
2e180184a8 Told cmake to install ggml-cpp.h as a public header file. (ggml/1126)
It is used by the whisper talk-llama example.

Co-authored-by: Petter Reinholdtsen <pere@debian.org>
2025-03-08 15:13:01 +02:00
ef40950c4a common : more general m_audio_len update logic (#2855)
Co-authored-by: Ivy233 <wangjinrun@uniontech.com>
2025-03-07 10:10:03 +02:00
c774eec709 go : improve model download (#2756)
* Updated models download URL

* Updated list of models available

All of the high efficiency quantized models are rejected when trying to download. They exist on the server. Let's allow them.

* Added a path prefix for whisper-cli in the message to the user. The message was misleading when this script is called from another script in a different folder, so it had to be fixed.

* undid download URL change I made earlier. Fixed filepath.Join(urlPath, model) bug.

* Undid download URL change I made earlier.

Seems that the old URL works but only when provided a model to download. Still doesn't explain why there's a different download URL that also works. Please elucidate in docs.

* Fixed bug in the URLForModel function

filepath.Join is designed for filesystem paths, and it uses backslashes (\) on Windows. URLs, however, require forward slashes (/), so the use of filepath.Join is inappropriate for constructing URLs.

The fmt.Sprintf function ensures that forward slashes are used.

* Fixed URL trailing / double slash bug

Ensure no double slash by trimming trailing '/' from srcUrl if present

* Fixed bad download URL, missing ggml prefix

Not sure if that was a bug I introduced but it was trying to download without the prefix.

* Added question before downloading all models. Added download size estimate

HEAD requests: efficiently fetch file sizes without downloading the content.
Interactive workflow: lets the user make an informed decision before downloading all models.
Safe defaults: abort unless the user explicitly confirms.

* Fixed unbuffered channel warning.

Warning in context.go: misuse of unbuffered os.Signal channel as argument to signal.Notify.

The warning indicates that the unbuffered channel passed to signal.Notify in context.go may be misused. In Go, signal.Notify does not block when sending to the channel, so with an unbuffered channel a signal can be dropped if the receiver is not ready at that moment; a buffer of size 1 is enough for a single notification.

* Fixed download size calculation, download URL prefix bug, added link to models URL for user.

The URL formatter was prepending the model name to the formatted model name in the URL

* Added logs and exes to gitignore

* Delete bindings/go/examples/go-model-download/go-model-download.exe

* Delete whisper_build.log
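The URL rules in the bullets above do not depend on Go: always use `/` as the separator, and never produce a double slash when the base ends in `/`. A minimal C sketch of the same joining rule follows. `example_url_join` and the example URL are hypothetical, and the Go change itself used `fmt.Sprintf`.

```c
#include <stdio.h>
#include <string.h>

/* Join a base URL and a file name with exactly one '/', on every OS.
 * Returns 0 on success, -1 if `out` is too small. */
static int example_url_join(char *out, size_t out_size, const char *base, const char *name) {
    size_t len = strlen(base);
    while (len > 0 && base[len - 1] == '/') {
        len--;  /* drop trailing slashes so the join point never has "//" */
    }
    int n = snprintf(out, out_size, "%.*s/%s", (int)len, base, name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Unlike a filesystem join, this never switches to `\` on Windows, because the separator is fixed in the format string.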
2025-03-07 10:03:51 +02:00
5b481a27a6 common : fix audio loading by miniaudio (#2862)
2025-03-04 19:05:21 +02:00
fc7b1ee521 fix: missing include common-whisper (#2858)
2025-03-02 20:55:11 +02:00
c42f67e2d2 ruby : follow audio library change (#2851)
* Enable CPU

* Follow audio lib change
2025-02-28 08:09:02 +02:00
339a1cba5d whisper : support GGML_BACKEND_DL (#2843)
* whisper : support GGML_BACKEND_DL

* fix DTW crash

* whisper.objc : fix build - add ggml-cpp.h

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-27 13:35:07 +01:00
c64f3e8ada common : separate whisper sources (#2846)
* common : separate whisper sources

* examples : add chrono

* examples : add more headers
2025-02-27 12:50:32 +02:00
9f83f67221 common : fix build min/max (#2845)
* common : try to fix build

* cont : try another fix
2025-02-27 10:39:13 +02:00
7d3da68f79 examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759) 2025-02-27 09:06:54 +02:00
b5d21359c1 stream : stop on ^C when no audio is received (#2822)
Add a check for Ctrl-C in the potentially endless loop that calls audio.get()
to receive sound.
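The general pattern can be sketched in C: a SIGINT handler sets a flag, and the polling loop checks that flag on every iteration. This is not the actual `stream` example code. The `example_*` names and the polling callback are hypothetical stand-ins for `audio.get()`.

```c
#include <signal.h>

/* Set from the SIGINT handler; writing a volatile sig_atomic_t is safe there. */
static volatile sig_atomic_t example_interrupted = 0;

static void example_on_sigint(int signo) {
    (void)signo;
    example_interrupted = 1;
}

static void example_install_sigint(void) {
    signal(SIGINT, example_on_sigint);
}

/* Hypothetical audio source that never delivers any samples. */
static int example_no_audio(void) {
    return 0;
}

/* Poll for audio until some arrives, max_iters is reached, or Ctrl-C is pressed.
 * Returns 0 if audio arrived, -1 if interrupted, 1 if it gave up. */
static int example_wait_for_audio(int (*poll_audio)(void), long max_iters) {
    for (long i = 0; i < max_iters; ++i) {
        if (example_interrupted) {
            return -1;  /* the check that keeps the loop from running forever */
        }
        if (poll_audio()) {
            return 0;
        }
    }
    return 1;
}
```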

Co-authored-by: Petter Reinholdtsen <pere@debian.org>
2025-02-27 08:59:51 +02:00
17addf7104 sync : ggml 2025-02-27 08:55:36 +02:00
cdaee8b4bd Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121)
* Support float16-to-float16 add/sub/mul/div operations in the CUDA backend

* Add fp16 support for add/sub/mul/div on the CPU backend

* Add test cases for fp16 add/sub/mul/div
2025-02-27 08:55:36 +02:00
4b60ff4f92 metal : copy kernels for quant to F32/F16 conversions (llama/12017)
metal: use dequantize_q templates

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-27 08:55:36 +02:00
b43b9d928c opencl: fix for small models (llama/11950)
* opencl: fix small shape gemv, remove unused extensions

* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size

* opencl: fix for token length < 4

* opencl: use wave size of 64 for all Adreno GPUs

---------

Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com>
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
2025-02-27 08:55:36 +02:00
e3cb412a59 Optimize mul_mat for Q4_0 on Intel GPU (llama/12035)
* optimize performance by reordering for Intel GPU

* detect hw type, save the optimization feature, and print it

* correct name

* optimize the graph only once when computing it, record the optimization status in tensor->extra, make CI pass

* add env variable GGML_SYCL_DISABLE_OPT for debugging

* use syclex::architecture to replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT

* add performance data

* move getrows functions to separate files

* fix global variables

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2025-02-27 08:55:36 +02:00
ac301a7d9b SYCL: Fix GGML_SYCL_DEBUG macro (llama/11995) 2025-02-27 08:55:36 +02:00
82e04e7670 ggml-cpu: Support s390x SIMD Instruction Set (llama/12019)
* ggml: add s390x ARCH_FLAGS for compilation

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: add SIMD for s390x using vector intrinsics

SIMD is activated for:
* ggml_vec_dot_f32
* ggml_vec_dot_f16
* ggml_vec_mad_f32
* ggml_vec_mad_f16
* ggml_vec_mad_f32_unroll
* ggml_vec_scale_f32
* ggml_vec_scale_f16

SIMD is NOT activated for:
* ggml_vec_dot_f16_unroll (pending bugfix)

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: fix missing escape character in GGML_F32x4_REDUCE

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: fix s390x GGML_F32x4_REDUCE

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: full SIMD activation for F32,F16 s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: add option to disable s390x VXE/VXE2

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: change vecintrin.h include to ggml-cpu-impl

* add __VXE__ and __VXE2__ macros

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* cmake: add s390x target detection for VX/VXE/VXE2

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: move s390x vector intrinsics to ggml-cpu-impl.h

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x Q8_0 SIMD

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: correct documentation for Q8_0

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x reduce code complexity Q8_0

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x bugfix typo Q8_0

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x SIMD activated for Q4_1

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x inline vec_reve

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x SIMD activation for Q4_0

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: add VXE backend feature

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: remove test.py

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x SIMD activation for quantize_row_q8_0

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x SIMD activation for quantize_row_q8_1

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x SIMD activation for iq4_xs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: bugfix iq4_xs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x SIMD activation for iq4_nl

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: add float, double, and long vector data type

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: clean up iq4_xs SIMD

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: fix improper use of restrict keyword

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: update warning message for ggml_vec_tbl

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: switch to restrict for iq4_nl

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: slight dot product speed improvement for q4_1_q8_1

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x SIMD activation for q6_K

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: add missing `_t` to ggml_int8x16x4_t

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: fix missing `_t` for ggml_vec_xl_s8x4

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: fix more missing `_t`

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: add unroll and prefetch to Q8_0

increase of 3.86% for prompt processing and 32.22% for token generation

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: patch Q8_0 to use proper vector sizes

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: optimise Q8_0 dot prod compute kernel further

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: add unroll and prefetch to Q4_1

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: refactor Q6_K variable naming for readability

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: fix Q6_K typos

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x SIMD activation for Q5_K

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: fix wrong char*x16_t naming

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: Q5_K y0 wrong signedness

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: fix Q5_K invalid uchar type

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: fix Q5_K invalid uchar type

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: s390x SIMD activation for Q4_K

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: fix Q4_K invalid vector intrinsics

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: simplify ggml_padd_s16 compute kernel

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: correct ggml-cpu vxe wording

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: change ggml_aligned_malloc alignment to 256

256 is the cache line size for s390x platforms

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: resolve pr merge via cherry-pick 225bbbf

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)

* ggml: resolve pr merge via cherry-pick 4571953

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: cmake remove fork when determining s390x machine type

thank you @ericcurtin

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
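One bullet above notes that s390x uses 256-byte cache lines. A minimal sketch of allocating with that alignment uses C11 `aligned_alloc`. This is not ggml's `ggml_aligned_malloc`, which may use a different allocator per platform, and `example_aligned_malloc` is hypothetical. MSVC does not provide `aligned_alloc`.

```c
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_S390X_CACHE_LINE 256  /* s390x cache line size in bytes */

/* Allocate `size` bytes aligned to the s390x cache line size.
 * Strict C11 requires the size to be a multiple of the alignment,
 * so round it up first. Free the result with free(). */
static void *example_aligned_malloc(size_t size) {
    size_t rounded = (size + EXAMPLE_S390X_CACHE_LINE - 1)
                     / EXAMPLE_S390X_CACHE_LINE * EXAMPLE_S390X_CACHE_LINE;
    if (rounded == 0) {
        rounded = EXAMPLE_S390X_CACHE_LINE;
    }
    return aligned_alloc(EXAMPLE_S390X_CACHE_LINE, rounded);
}
```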

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Jinyang He <hejinyang@loongson.cn>
Co-authored-by: junchao-zhao <68935141+junchao-loongson@users.noreply.github.com>
2025-02-27 08:55:36 +02:00
38ac47cd4d CUDA: add option to compile without FlashAttention (llama/12025) 2025-02-27 08:55:36 +02:00
2d70cd36d7 CUDA: optimize FA for GQA + large batches (llama/12014) 2025-02-27 08:55:36 +02:00
98dab49b9a cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000) 2025-02-27 08:55:36 +02:00
b1385e9aa9 CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984)
* CUDA: correct the lowest Maxwell supported by CUDA 12

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-02-27 08:55:36 +02:00
48f5e893f5 MUSA: support ARM64 and enable dp4a .etc (llama/11843)
* MUSA: support ARM64 and enable __dp4a, etc.

* fix cross entropy loss op for musa

* update

* add cc info log for musa

* add comment for the MUSA .cc calculation block

---------

Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>
2025-02-27 08:55:36 +02:00
dc21871fcb ggml-cpu: Add CPU backend support for KleidiAI library (llama/11390)
* ggml-cpu: Add CPU backend support for KleidiAI library

* Add environment variable GGML_KLEIDIAI_SME

* Add support for multithread LHS conversion

* Switch kernel selection order to dotprod and i8mm

* updates for review comments

* More updates for review comments

* Reorganize and rename KleidiAI files

* Move ggml-cpu-traits.h to source file

* Update cmake for SME build and add alignment for SME

* Stop appending GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
2025-02-27 08:55:36 +02:00
64a430bc81 ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (llama/11917)
* Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file

* Improved formatting of code in ggml-cpu-quants.c file

* style : minor fixes

* style : less whitespaces

* style : ptr spacing

---------

Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-27 08:55:36 +02:00
51a3580c79 CUDA: use async data loading for FlashAttention (llama/11894)
* CUDA: use async data loading for FlashAttention

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-27 08:55:36 +02:00
37a21dd43d vulkan: implement several ops relevant for ggml_opt (llama/11769)
* vulkan: support memset_tensor

* vulkan: support GGML_OP_SUM

* vulkan: implement GGML_OP_ARGMAX

* vulkan: implement GGML_OP_SUB

* vulkan: implement GGML_OP_COUNT_EQUAL

* vulkan: implement GGML_OP_OPT_STEP_ADAMW

* vulkan: fix check_results RWKV_WKV6 crash and memory leaks

* vulkan: implement GGML_OP_REPEAT_BACK

* tests: remove invalid test-backend-ops REPEAT_BACK tests

* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
2025-02-27 08:55:36 +02:00
8a22a8b17f vulkan: support multi/vision rope, and noncontiguous rope (llama/11902) 2025-02-27 08:55:36 +02:00
fcbcad0c90 metal : fix the crash caused by the lack of residency set support on Intel Macs. (llama/11904) 2025-02-27 08:55:36 +02:00
4444db7360 metal : optimize dequant q6_K kernel (llama/11892) 2025-02-27 08:55:36 +02:00
a7fc1038ca repo : update links to new url (llama/11886)
* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci
2025-02-27 08:55:36 +02:00
1689aaf854 vulkan: initial support for IQ1_S and IQ1_M quantizations (llama/11528)
* vulkan: initial support for IQ1_S and IQ1_M quantizations

* vulkan: define MMV kernels for IQ1 quantizations

* devops: increase timeout of Vulkan tests again

* vulkan: simplify ifdef for init_iq_shmem
2025-02-27 08:55:36 +02:00
4b48fe449a opencl: Fix rope and softmax (llama/11833)
* opencl: fix `ROPE`

* opencl: fix `SOFT_MAX`

* Add fp16 variant

* opencl: enforce subgroup size for `soft_max`
2025-02-27 08:55:36 +02:00
47cc043e69 cuda : add ampere to the list of default architectures (llama/11870) 2025-02-27 08:55:36 +02:00
e3d9ffb98b ggml: optimize some vec dot functions for LoongArch ASX (llama/11842)
* Optimize ggml_vec_dot_q3_K_q8_K for LoongArch ASX

* Optimize ggml_vec_dot_q4_K_q8_K for LoongArch ASX

* Optimize ggml_vec_dot_q6_K_q8_K for LoongArch ASX

* Optimize ggml_vec_dot_q5_K_q8_K for LoongArch ASX

* Optimize ggml_vec_dot_q2_K_q8_K for LoongArch ASX

* Optimize mul_sum_i8_pairs_float for LoongArch ASX

* Optimize ggml_vec_dot_iq4_xs_q8_K for LoongArch ASX
2025-02-27 08:55:36 +02:00
Eve
e22d69839d vulkan: linux builds + small subgroup size fixes (llama/11767)
* mm subgroup size

* upload vulkan x86 builds
2025-02-27 08:55:36 +02:00
defe731263 llamafile: use member variable instead of constant for iq4nlt (llama/11780) 2025-02-27 08:55:36 +02:00
4e07957bf9 musa: bump MUSA SDK version to rc3.1.1 (llama/11822)
* musa: Update MUSA SDK version to rc3.1.1

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: Remove workaround in PR #10042

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-27 08:55:36 +02:00
d2c5154bb5 ggml-cpu : add chunking support to mul_mat_id (llama/11666)
* ggml-cpu : add chunking support to mul_mat_id

* allocate chunk counter in wdata
parallelize src1 quantization by column to allow parallelization even when there is only one row

* disable for arm

* cleanup

* better way to disable for arm

* fix uninitialized counter when using 1 thread only

* revert test-backend-ops changes
2025-02-27 08:55:36 +02:00
4fac43fe00 ggml : x2 speed for WASM by optimizing SIMD (llama/11453)
* ggml : x2 speed for WASM by optimizing SIMD

* fix bad merging

* rm trailing spaces

* rm redundant clamp

* better quantize_row_q8_K

Co-authored-by: camel-cdr <camel-cdr@protonmail.com>

* remove memset that causes buffer overflow
Co-authored-by: camel-cdr <camel-cdr@protonmail.com>

---------

Co-authored-by: camel-cdr <camel-cdr@protonmail.com>
2025-02-27 08:55:36 +02:00
3be9670f17 HIP: Remove GCN from list of devices that avoid MMQ (llama/11831) 2025-02-27 08:55:36 +02:00
86729fcd6d HIP: Switch to std::vector in rocblas version check (llama/11820) 2025-02-27 08:55:36 +02:00
7fbca6304e cleanup: fix compile warnings associated with gnu_printf (llama/11811) 2025-02-27 08:55:36 +02:00
d597f83e1a ggml : fix multi-threaded clamp_f32 (llama/11824)
* Bug fix for clamp_f32

For tensors larger than 1D, the clamp operation did not work because the function returned early whenever ith was not 0.

* Bug fix for clamp_f32

* Bug fix for clamp_f32
2025-02-27 08:55:36 +02:00
e5edcc6259 ggml-cpu: Fix duplicate MATMUL_INT8 (llama/11817)
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
2025-02-27 08:55:36 +02:00
556f773d53 CUDA: fix CUDART_VERSION checks (llama/11821) 2025-02-27 08:55:36 +02:00
91d02de332 Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (llama/11803)
* Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx

* Fix #11802: PR #11803 - keep RegQueryValueExA, remove TEXT macro, description needs to be ANSI string
2025-02-27 08:55:36 +02:00
1b67d72f87 CUDA: use arch list for compatibility check (llama/11775)
* CUDA: use arch list for feature availability check

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-27 08:55:36 +02:00
14d7c0368d fix: typos in documentation files (llama/11791)
* Update ggml.c

* Update arg.cpp

* Update speculative.h
2025-02-27 08:55:36 +02:00
db6e19188a vulkan: Make Vulkan optional at runtime (ggml/11493). (llama/11494)
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-02-27 08:55:36 +02:00
b4b063a5c9 vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (llama/11592) 2025-02-27 08:55:36 +02:00
930b739e7a vulkan: account for lookup tables when checking shared memory size (llama/11502) 2025-02-27 08:55:36 +02:00
5981352bb5 ggml: Fix data race in ggml threadpool (llama/11736)
After the barrier in the last iteration has executed, the loop termination
condition is still evaluated. By then the main thread may already have destroyed
the cgraph object and its nodes, so another thread can access memory that is already gone.
Trouble can also happen when n_nodes == 0 or when abort is called, but I'm not sure whether
the prior situation is possible.

The last synchronization should be done after the loop to ensure the cgraph/cplan won't be
accessed after the main thread exits from the function.
2025-02-27 08:55:36 +02:00
7561da244e CUDA: fix min. version for movmatrix (llama/11751) 2025-02-27 08:55:36 +02:00
be83f342fb vulkan: print shared memory size (llama/11719) 2025-02-27 08:55:36 +02:00
fd369871f7 SYCL: remove XMX info from print devices (llama/11712) 2025-02-27 08:55:36 +02:00
bbd8364f5e ggml : optimize and build warning fix for LoongArch (llama/11709)
* ggml : optimize convert f32<->f16 for loongarch_asx

* ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16

* ggml : Fix warnings when run cpu CI locally on LoongArch
2025-02-27 08:55:36 +02:00
e4102440ef SYCL: Adjust support condition for norm operators (llama/11674)
SYCL does not support non-contiguous tensors for norm operations
2025-02-27 08:55:36 +02:00
f8242ec483 ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701) 2025-02-27 08:55:36 +02:00
ef51b4cba4 vulkan: optimize coopmat2 iq2/iq3 callbacks (llama/11521)
* vulkan: optimize coopmat2 iq2/iq3 callbacks

* build: trigger CI on GLSL compute shader changes
2025-02-27 08:55:36 +02:00
6f08b24146 vulkan: initial support for IQ4_XS quantization (llama/11501) 2025-02-27 08:55:36 +02:00
7c165d7fa8 vulkan: use smaller combined allocations to avoid fragmentation (llama/11551) 2025-02-27 08:55:36 +02:00
2f0cf44915 metal : avoid breaking build when metal API predates TARGET_OS_VISION (llama/11690)
Avoids breakage in nix flake build introduced by b0569130c5e9c671152c913d82803b7c2f014ff9
2025-02-27 08:55:36 +02:00
b9c972fd0d metal : adjust support conditions for norm operators (llama/11671)
cont #11659

ggml-ci
2025-02-27 08:55:36 +02:00
01c9aafbfd CUDA: support for mat. mul. with ne03 != ne13 (llama/11656) 2025-02-27 08:55:36 +02:00
bae6bbf487 CUDA: non-contiguous (RMS) norm support (llama/11659)
* CUDA: non-contiguous (RMS) norm support

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-27 08:55:36 +02:00
c310272fa0 HIP: force max threads per block to be 1024 (llama/11621)
Some old or vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM.

Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-27 08:55:36 +02:00
bd0b55dbe0 metal : use residency set for other platforms (llama/11648) 2025-02-27 08:55:36 +02:00
ba4645db2c rpc: fix known RCE in rpc-server (ggml/1103)
Add bounds checking in `rpc_server::copy_tensor` to prevent out-of-bounds writes:
* Check whether `(uint8_t *)dst->data + ggml_nbytes(src)` remains within the destination buffer’s allocated region.
2025-02-27 08:55:36 +02:00
dfc6ca62f3 stream : add beam size parameter(#2836)
* feat: Add beam size parameter to stream.cpp for beam search configuration

* feat: Add beam size parameter to whisper full params in stream example

* fix: Remove duplicate beam search size assignment in server.cpp
2025-02-25 11:39:33 +02:00
47e14c0529 whisper : restore big endian support (#2816)
* whisper : fix BYTESWAP whitespace

* whisper : make byteswap usable with C++17

* cmake : define WHISPER_BIG_ENDIAN for big-endian targets

* ci : fix (again) arm64 build fails

* docker : attempt fixing arm64 build on ci

* qemu v7.0.0-28

[imported from https://github.com/ggml-org/llama.cpp/commit/818a340ea8be55b3706e1772527cb8738e90a8c7 (#11895)]

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-25 11:38:13 +02:00
d682e15090 Fixes for Windows (#2790)
Fixes for Windows:

* MSVC defaults to UTF-8 without BOM.
* Console output code page changed to UTF-8.

---------

Co-authored-by: Judd <foldl@boxvest.com>
2025-02-06 15:37:21 +08:00
46d07b9c85 cmake : fix compile assumptions for power9/etc (#2777)
* Add small comment re: VSX to readme

Co-authored-by: midnight <midnight@example.com>
2025-02-05 14:41:10 +02:00
33ea03f131 authors : update
2025-02-04 13:03:40 +02:00
dbcc669e1a sync : ggml 2025-02-04 13:03:09 +02:00
16245b35e4 cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
This makes git an optional dependency, which is useful when ggml is
built not from git but from a tarball or a distribution source
package.

This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be
using it, though, so there doesn't seem to be much value in factoring it
out, or even requiring it.
2025-02-04 13:03:03 +02:00
898c0cb9d1 readme : add maintenance roadmap
2025-02-04 10:50:10 +02:00
eb9e5032c4 ci : add stalebot 2025-02-04 09:30:20 +02:00
cadfc50eab node : add max_len params in node addon (#2760)
2025-02-03 22:49:06 +02:00
3f91832352 talk-llama : sync llama.cpp 2025-02-03 22:42:26 +02:00
cff8868b5f coreml : always convert to "neuralnetwork" (#2770) 2025-02-03 22:36:32 +02:00
90e3c5fc40 ci : more git 2025-02-03 22:00:57 +02:00
e0f4cef867 ci : install git 2025-02-03 22:00:57 +02:00
234460987e ci : use ubuntu-22.04 instead of ubuntu-latest 2025-02-03 22:00:57 +02:00
b8ab126343 cmake : sync cmake scripts 2025-02-03 22:00:57 +02:00
edc5d9267c sync : ggml 2025-02-03 22:00:57 +02:00
344b98a44f scripts : fix sync paths 2025-02-03 22:00:57 +02:00
dbeb7916b8 CUDA: fix Volta FlashAttention logic (llama/11615) 2025-02-03 22:00:57 +02:00
fad2806352 HIP: fix flash_attn_stream_k_fixup warning (llama/11604) 2025-02-03 22:00:57 +02:00
9906792ec3 CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
CUDA/HIP: add support for selectable warp size to mmv
2025-02-03 22:00:57 +02:00
c49ee07ff4 HIP: add GGML_CUDA_CC_IS_* for amd familys as increasing cc archtectures for amd gpus are not supersets of eatch other (llama/11601)
This fixes a bug where RDNA1 GPUs other than gfx1010 were not handled correctly
2025-02-03 22:00:57 +02:00
f8a831779e CUDA: use mma PTX instructions for FlashAttention (llama/11583)
* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-03 22:00:57 +02:00
85451e3612 ci: use sccache on windows instead of ccache (llama/11545)
* Use sccache on ci for windows

* Detect sccache in cmake
2025-02-03 22:00:57 +02:00
43c744ce8b HIP: require at least HIP 5.5 2025-02-03 22:00:57 +02:00
fc2e44490d HIP: Prepare reduction operators for wave 64 2025-02-03 22:00:57 +02:00
f41fdad200 CUDA/HIP: add warp_size to cuda_device_info 2025-02-03 22:00:57 +02:00
80fa576254 vulkan: implement initial support for IQ2 and IQ3 quantizations (llama/11360)
* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-02-03 22:00:57 +02:00
75e7d0585e vulkan: Catch pipeline creation failure and print an error message (llama/11436)
* vulkan: Catch pipeline creation failure and print an error message

Also, fix some warnings from my on-demand compile change.

* vulkan: fix pipeline creation logging
2025-02-03 22:00:57 +02:00
682a6f5f87 HIP: Supress transformation warning in softmax.cu
Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the loop bounds are not constexpr, so LLVM can't unroll the loops here.
2025-02-03 22:00:57 +02:00
115716d109 HIP: Only call rocblas_initialize on rocblas versions with the multiple instantation bug (llama/11080)
This disables the workaround on fixed rocBLAS versions (>= 4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all Tensile objects.
2025-02-03 22:00:57 +02:00
b2cfef655b cmake : don't fail on GGML_CPU=OFF (llama/11457) 2025-02-03 22:00:57 +02:00
22e3df0afa SYCL : SOFTMAX F16 mask support and other fixes (llama/11261)
Implemented F16 src1 (mask) support in ggml_sycl_op_soft_max(), for which a pragma deprecation warning had been added in #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always treated src1 as fp32 (many OP functions depend on it).

* SYCL: SOFTMAX F16 mask support and other fixes

* test-backend-ops: Add F16 mask test cases
2025-02-03 22:00:57 +02:00
028511d349 AMD: parse the architecture as supplied by gcnArchName (llama/11244)
The value provided by minor doesn't include the stepping for AMD, so parse the value returned by gcnArchName instead to retrieve an accurate ID.
2025-02-03 22:00:57 +02:00
70c4038842 metal: Handle null returned from MTLCreateSystemDefaultDevice() (llama/11441)
This fixes a segmentation fault when running tests with no Metal
devices available (for example, when not linked with the Core Graphics
framework or otherwise).
2025-02-03 22:00:57 +02:00
8639c003a9 metal : use residency sets (llama/11427)
* metal : use residency sets

ggml-ci

* metal : restore commandBufferWithUnretainedReferences calls [no ci]

* metal : release descriptors

ggml-ci

* metal : check env GGML_METAL_NO_RESIDENCY

ggml-ci

* metal : fix build + clean-up

ggml-ci
2025-02-03 22:00:57 +02:00
d5d831da65 cmake: add ggml find package (llama/11369)
* Add initial ggml cmake package

* Add build numbers to ggml find-package

* Expand variables with GGML_ prefix

* Guard against adding to cache variable twice

* Add git to msys2 workflow

* Handle ggml-cpu-* variants

* Link ggml/ggml-base libraries to their targets

* Replace main-cmake-pkg with simple-cmake-pkg

* Interface features require c_std_90

* Fix typo

* Removed unnecessary bracket from status message

* Update examples/simple-cmake-pkg/README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update examples/simple-cmake-pkg/README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-03 22:00:57 +02:00
7230a6e1c8 vulkan: compile shaders on-demand (llama/11406)
Reduce first-run startup time and memory consumption.

Should fix #11339.
2025-02-03 22:00:57 +02:00
a160fa0f3a Hip: disable VMM on hip as it seams that it dosent work in some configurations (llama/11420) 2025-02-03 22:00:57 +02:00
0282ad8fd1 hip : Add hipGraph and VMM support to ROCM (llama/11362)
* Add hipGraph support

* Enable VMM on rocm
2025-02-03 22:00:57 +02:00
9e467815d4 CUDA: fix FP16 cuBLAS GEMM (llama/11396) 2025-02-03 22:00:57 +02:00
727891d9bf rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (llama/11356) 2025-02-03 22:00:57 +02:00
c262dc80e2 CPU/CUDA: fix (GQA) mul mat back, add CUDA support (llama/11380) 2025-02-03 22:00:57 +02:00
30767b4c4e cmake : avoid -march=native when reproducible build is wanted (llama/11366)
See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.

Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.

Fixes: #11317

This patch was done while working on reproducible builds for openSUSE.
2025-02-03 22:00:57 +02:00
16eeb31933 Vulkan-run-test: fix mmq_wg_denoms (llama/11343)
There appears to be a copy-and-paste error here.

*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.
2025-02-03 22:00:57 +02:00
ba523d5e22 vulkan: sort shaders for more deterministic binary (llama/11315)
Fixes #11306.
2025-02-03 22:00:57 +02:00
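Sorting makes the generated output independent of the order in which shader files happen to be enumerated (directory order can differ between file systems and machines). A minimal C sketch, assuming shaders are identified by file name; `sort_shader_names` and the example names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_names(const void *a, const void *b) {
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort shader names in place so the generated output does not depend on the
 * order in which the files were discovered. */
void sort_shader_names(const char **names, size_t n) {
    assert(names != NULL || n == 0);
    qsort(names, n, sizeof names[0], cmp_names);
}
```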
3736706139 vulkan: fix diag_mask_inf (llama/11323)
With robustbufferaccess disabled, this shader was showing OOB stores. There
is a bounds check in the code, but the workgroup dimensions were reversed vs
CUDA and it was running the wrong number of threads. So fix the workgroup
dimensions and disable robustness for this pipeline.
2025-02-03 22:00:57 +02:00
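For reference, the operation itself is simple: as in ggml's CPU implementation, in row `r` every column `c` with `c > n_past + r` is set to negative infinity. The C sketch below is a CPU reference, not the shader; it emulates a 2D dispatch whose grid may be rounded up past the matrix size, so each iteration bounds-checks before storing. The function name and the `grid_cols`/`grid_rows` parameters are assumptions for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* CPU reference for diag_mask_inf on a row-major nrows x ncols matrix.
 * The loops emulate one "thread" per grid cell; cells outside the matrix
 * must not store anything. */
void diag_mask_inf(float *x, int ncols, int nrows, int n_past,
                   int grid_cols, int grid_rows) {
    assert(x != NULL);
    assert(grid_cols >= ncols && grid_rows >= nrows); /* grid must cover the matrix */
    for (int r = 0; r < grid_rows; r++) {
        for (int c = 0; c < grid_cols; c++) {
            if (r >= nrows || c >= ncols) {
                continue; /* out of bounds: skip, never store */
            }
            if (c > n_past + r) {
                x[(size_t) r * ncols + c] = -INFINITY;
            }
        }
    }
}
```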
58640aa456 rpc : better caching of the base buffer pointer (llama/11331)
There is no need to use map, just store the base pointer in the buffer
context.
2025-02-03 22:00:57 +02:00
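The idea is to capture the base pointer once, when the buffer is created, and read it from the buffer context afterwards instead of looking it up in a map. A minimal C sketch under that assumption; the struct and function names are hypothetical, and `malloc` stands in for the remote allocation made by the RPC server:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-buffer context: the base pointer is stored at creation. */
typedef struct {
    void  *base;
    size_t size;
} rpc_buffer_context;

rpc_buffer_context *rpc_buffer_create(size_t size) {
    rpc_buffer_context *ctx = malloc(sizeof *ctx);
    if (ctx == NULL) {
        return NULL;
    }
    ctx->base = malloc(size); /* stands in for the remote allocation */
    if (ctx->base == NULL) {
        free(ctx);
        return NULL;
    }
    ctx->size = size;
    return ctx;
}

/* O(1): no map lookup, just read the cached pointer. */
void *rpc_buffer_get_base(const rpc_buffer_context *ctx) {
    assert(ctx != NULL);
    return ctx->base;
}

void rpc_buffer_free(rpc_buffer_context *ctx) {
    if (ctx == NULL) {
        return;
    }
    free(ctx->base);
    free(ctx);
}
```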
5183a05e56 metal : fix out-of-bounds write (llama/11314)
ggml-ci
2025-02-03 22:00:57 +02:00
0dcada42d4 vulkan: fix coopmat2 validation failures (llama/11284)
mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load it as an Accumulator matrix and convert it; this
is not in the inner loop, so it is cheap enough. For mul mat, it's more
efficient to do this conversion in a separate pass and have the input(s)
be f16.

coopmat2 requires SPIR-V 1.6 (related to using LocalSizeId). LocalSizeId
requires maintenance4 be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
2025-02-03 22:00:57 +02:00
d507b4cebe SYCL: Introducing memory host pool (llama/11251)
* Implement host pool for matrix_info

Creating a new memory pool on the host to store memory location for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp

* Remove unnecessary headers and cast

* Reorder member variable to avoid warning on initialization

* Formatting

* Remove unused variable

* Address PR review feedback - remove warning

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-02-03 22:00:57 +02:00
90171055f3 cmake : add sanitizer flags for llama.cpp (llama/11279)
* cmake : add sanitizer flags for llama.cpp

ggml-ci

* tests : fix compile warnings

ggml-ci

* cmake : move sanitizer flags to llama_add_compile_flags

ggml-ci

* cmake : move llama.cpp compile flags to top level lists

ggml-ci

* cmake : apply only sanitizer flags at top level

ggml-ci

* tests : fix gguf context use in same_tensor_data

* gguf-test: tensor data comparison

* dummy : trigger ggml-ci

* unicode : silence gcc warnings

ggml-ci

* ci : use sanitizer builds only in Debug mode

ggml-ci

* cmake : add status messages [no ci]

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-02-03 22:00:57 +02:00
668306ff2b vulkan: fix coopmat2 flash attention for non-contiguous inputs (llama/11281)
Add code similar to mul_mm_cm2 to force alignment of strides, to avoid
a performance regression.

Add noncontiguous FA tests in test-backend-ops.

Fixes #11268.
2025-02-03 22:00:57 +02:00
fdc21fc87b rpc : early register backend devices (llama/11262)
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.

ref: #10609
2025-02-03 22:00:57 +02:00
7183a1eb72 vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (llama/11166)
* vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl

Shaders are based on cpy.cu.

* vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32

* ggml: copy q->f32 assumes some contiguity in the destination
2025-02-03 22:00:57 +02:00
09f3c66648 vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (llama/11206)
Do masking on whole dwords, fetch all scales at once.
2025-02-03 22:00:57 +02:00
62e2414620 vulkan: optimize coopmat2 q2_k dequant function (llama/11130) 2025-02-03 22:00:57 +02:00
de49024e49 CUDA: backwards pass for misc. ops, add tests (llama/11257)
* CUDA: backwards pass for misc. ops, add tests

* remove restrict from pointers
2025-02-03 22:00:57 +02:00
db6383094c ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (llama/11227)
* Add SVE support for q4_K_q8_K

* Update ggml/src/ggml-cpu/ggml-cpu-quants.c

change to use K_SCALE_SIZE

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-03 22:00:57 +02:00
Eve
164f13c6a9 vulkan: scale caching for k quants + misc fixes (llama/11081)
* q6_k scale caching

* 16 bit unpack

* q4_k test (slow)

* revert it

* q3_k

* q2_k

* little stuff

* try precalculating products of a and q2_k scales

* Revert "try precalculating products of a and q2_k scales"

This reverts commit 65110b81f23f66331a50c6e889a7c1ab9470a86b.

* unpack should be u16, add vim swap to gitignore (about time)

* better q4_k scales

* q5_k

* better q6_k with separate paths for all threads and partial threads in use, plus some more optimizations

* q2_k better dequant

* q3_k optimizations

* q3_k use hmask simd from cpu avx version

* make the caches happy

* q3_k separate out calculation

* q2_k separate out

* little stuff

* use calc_superblock everywhere

* q2_k optimize scale calculation

* more barriers
2025-02-03 22:00:57 +02:00
02aa86230a fix: ggml: fix vulkan-shaders-gen build (llama/10448)
* fix: ggml: fix vulkan-shaders-gen build

The vulkan-shaders-gen target was not being built correctly
in case of cross-compilation.
Other outputs need to be built for the cross compile target,
but vulkan-shaders-gen needs to be built for the host.

* refactor: ggml: Improve vulkan-shaders-gen toolchain setup

- Add GGML_SHADERS_GEN_TOOLCHAIN CMake option.
- Auto-detect host toolchain if not set.

* refactor: ggml: Improve vulkan-shaders-gen toolchain setup

Use configure_file to generate host_toolchain.cmake from template

* fix: ggml: Fix compile error

Fix compile error not finding vulkan-shaders-gen

* fix: vulkan-shaders-gen build and path handling

Fix build issues with vulkan-shaders-gen:
- Add target dependency for correct build order
- Use CMAKE_HOST_SYSTEM_NAME for executable suffix
- Fix MSVC output directory in host toolchain
- Normalize path handling for cross-compilation

* fix: improve host compiler detection in vulkan shader build

Improve host compiler detection for vulkan shader generation:
- Add NO_CMAKE_FIND_ROOT_PATH to all compiler searches
- Consolidate compiler detection logic
- Fix Windows-specific MSVC detection
- Ensure correct compiler search in cross-compilation

* refactor: Simplify CMake function for detecting host compiler

Simplified the CMake function to improve the process of detecting the host compiler.

* fix: Remove unnecessary Vulkan library linkage in CMakeLists.txt

Since `vulkan-shaders-gen.cpp` only requires the `glslc` executable
and not the Vulkan headers or libraries, CMakeLists.txt needs to
be corrected.
(See: ecc93d0558fc3ecb8a5af69d2ece02fae4710ade)

* refactor: Rename host_toolchain.cmake.in

- Rename host_toolchain.cmake.in to cmake/host-toolchain.cmake.in

* refactor: GGML_VULKAN_SHADERS_GEN_TOOLCHAIN

Rename the macro GGML_SHADERS_GEN_TOOLCHAIN to GGML_VULKAN_SHADERS_GEN_TOOLCHAIN
2025-02-03 22:00:57 +02:00
54a2ee648f RoPE: fix back, CUDA support for back + noncont. (llama/11240)
* RoPE: fix back, CUDA support for back + noncont.

* fix comments reg. non-cont. RoPE support [no-ci]
2025-02-03 22:00:57 +02:00
9700cfb0a3 SYCL: Add gated linear attention kernel (llama/11175)
* SYCL: Add Gated Linear attention kernel

* glahpp: add a space at the end of file

* gla: Put the barrier inside the main logic loop
2025-02-03 22:00:57 +02:00
8e0143e205 ggml : add option to not print stack on abort (ggml/1081)
* Add option to not print stack on abort

Add option/envvar to disable stack printing on abort.
Also link some unittests with Threads to fix link errors on
ubuntu/g++11.

* Update ggml/src/ggml.c

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-03 22:00:57 +02:00
f12559d590 ggml-cpu : fix ggml_graph_compute_thread not terminating on abort (ggml/1065)
Some threads kept looping and failed to terminate properly after an abort during CPU execution.

Co-authored-by: issi <issi@gmail.com>
2025-02-03 22:00:57 +02:00
589b40810a ci : dummy commit to trigger CI
2025-02-03 16:32:48 +02:00
7ffcd05267 ruby : Make context accept initial parameters, API to retrieve a segment and more (#2749)
* Fix type signature for Whisper.log_set

* Use cache file for model when offline

* Extract ruby_whisper_transcribe() into a file

* Extract Whisper::Error

* Use FileList for ext/*.{c,cpp,h}

* Extract Whisper::Segment

* Extract Whisper::Model

* Extract Whisper::Params

* Extract Whisper::Context

* Extract log_callback function

* Write base code in C rather than C++

* Use chdir instead of Dir.chdir in Rakefile

* Define alloc func for Whisper::Model

* Define Whisper::Params' callback and user data reader

* Add test for Whisper::Params.new with keyword arguments

* Make Whisper::Params.new accept keyword arguments

* Update type signatures

* Update README

* Update CLEAN targets

* Fix document comment for Whisper::Params#new_segment_callback=

* Use macro to define params

* Fix dependency of build task

* Set Whisper.finalize_log_callback visibility to private

* Make Whisper::Context#full and full_parallel return self

* Add test for Whisper::Context#full_get_segment

* Add Whisper::Context#full_get_segment

* Update signatures

* Update README

* Fix signature

* Replace #initialize with .new in signature file [skip ci]

* Fix potential overflow
2025-01-21 09:39:54 +02:00
7a423f1c00 whisper.objc : fix build and CI
2025-01-18 12:06:06 +02:00
99b011a9f5 talk-llama : sync llama.cpp
2025-01-14 10:38:01 +02:00
19d95f9f9a sync : ggml 2025-01-14 10:38:01 +02:00
d5ef1737d8 GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030)
ggml-ci
2025-01-14 10:38:01 +02:00
1deb41f0e7 ggml : add opencl backend (skip) (llama/10693)
---------

Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
Co-authored-by: Alexander Angus <quic_aangus@quicinc.com>
Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com>
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
2025-01-14 10:38:01 +02:00
2425caf4fd cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (llama/11042)
* Refactor: Moves cuda graph executable update step to separate function.

* Refactor: Moves cuda graph update check to separate function.

* Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability.

* Fix: Adds missing reference to maintain_cuda_graph() definition.

* Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function.

* Refactor: Moves node graph checks and copy ops into individual function for improved readability.

* Refactor: Removes code permanently excluded from compilation to increase readability.

* Style: Adds missing newline

* Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one

* Refactor: Makes 'cuda_graph_update_required' a local variable

* remove double lines between functions

---------

Co-authored-by: slaren <slarengh@gmail.com>
2025-01-14 10:38:01 +02:00
a4b00bcaaf ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (llama/11211)
Build fails when using HIP and GGML_BACKEND_DL:
```
/usr/bin/ld: ../ggml/src/libggml.so: undefined reference to `ggml_backend_cuda_reg'
collect2: error: ld returned 1 exit status
```
This patch fixes this.
2025-01-14 10:38:01 +02:00
cdb8aa2f2e Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (llama/11161)
* Vulkan: Remove float16 use in shaders

* Fix validation error about subgroup_size_control extension
2025-01-14 10:38:01 +02:00
06209f6683 llama: add support for QRWKV6 model architecture (llama/11001)
* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix some typos

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* code format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix cuda warning

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update README.md

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-14 10:38:01 +02:00
c3235bd81e SYCL: Refactor ggml_sycl_compute_forward (llama/11121)
* SYCL: refactor ggml_sycl_compute_forward

* SYCL: add back GGML_USED(dst) to ggml_sycl_cpy

* SYCL: add function name to noop debug

* SYCL: Some device info print refactoring and add details of XMX availability
2025-01-14 10:38:01 +02:00
262d0abc87 fix: add missing msg in static_assert (llama/11143)
Signed-off-by: hydai <z54981220@gmail.com>
2025-01-14 10:38:01 +02:00
124eec1664 llamafile : ppc64le MMA INT8 implementation (llama/10912)
This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le using MMA
builtins for quantised int8 datatype.

This change results in a 10% - 70% improvement
in total speed (i.e., all tokens / total time) across
various batch sizes.

The patch was tested with Meta-Llama-3-8B,
Mistral-7B, Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2025-01-14 10:38:01 +02:00
b08c3a88c8 Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (llama/11117)
* Disable GL_KHR_cooperative_matrix Vulkan extension if not available.

* Perform Vulkan extensions checks in a more sensible order

* Remove unnecessary #ifdef directive
2025-01-14 10:38:01 +02:00
0afce25a69 fix: Vulkan shader gen binary path when Cross-compiling (llama/11096)
* fix: Vulkan shader gen binary path when cross compiling
2025-01-14 10:38:01 +02:00
acdbe58631 GGUF: C++ refactor, backend support, misc fixes (llama/11030)
* GGUF: C++ refactor, backend support, misc fixes

remove ggml_tensor.backend

update CODEOWNERS [no ci]

remove gguf_get_data from API

revise GGUF API data types
2025-01-14 10:38:01 +02:00
09fabffdf5 ggml-backend : only offload from host buffers (fix) (llama/11124) 2025-01-14 10:38:01 +02:00
3988d6396b ggml-backend : only offload from host buffers (llama/11120) 2025-01-14 10:38:01 +02:00
c8c63eeec0 rpc : code cleanup (llama/11107)
Remove duplicated macros, use GGML_LOG_ERROR for errors
2025-01-14 10:38:01 +02:00
abf7f24410 SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (llama/11087)
* SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6

* Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6"

This reverts commit f62dc45f318e48d375e7734b34cbddee81deed52.

* Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6
2025-01-14 10:38:01 +02:00
341f5c28e6 CUDA: add BF16 support (llama/11093)
* CUDA: add BF16 support
2025-01-14 10:38:01 +02:00
5377099524 Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (llama/11074)
* Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver

* Add (TM) to AMD name check
2025-01-14 10:38:01 +02:00
dcbb375779 Support for models with non-512-aligned tensors over RPC. (llama/11047)
* Added init tensor calling code

* Added get_alloc_size forwarding

* Cleaned up and improved type/error handling.

* fix: remove trailing whitespaces.

* Cleanup and use GGML error logging functions.

* Handle potentially dangerous edge cases.

* Apply suggestions from code review

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-14 10:38:01 +02:00
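Forwarding `get_alloc_size` matters because a backend may allocate more than a tensor's raw byte count, for example by padding to an alignment boundary. A minimal C sketch of such round-up padding, assuming a power-of-two alignment (similar in spirit to ggml's `GGML_PAD` macro); `pad_to_alignment` is a hypothetical name, and real per-backend rules may differ:

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `align` (a power of two). */
size_t pad_to_alignment(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two only */
    return (size + align - 1) & ~(align - 1);
}
```

For a 1000-byte tensor and 512-byte alignment this yields 1024, so the server must be told the padded size rather than the raw one.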
4334c71aed fix: Vulkan shader gen binary path (llama/11037) 2025-01-14 10:38:01 +02:00
e875a82473 ggml : allow loading backend with env variable (ggml/1059)
ref: #1058
2025-01-14 10:38:01 +02:00
507e230f1e scripts : sync opencl, gguf 2025-01-14 09:42:16 +02:00
eb68324c86 whisper : fix gpu device selection (#2728)
2025-01-13 13:11:37 +02:00
e940fbf283 server : fix build (#2718)
2025-01-13 08:57:33 +02:00
35d0e02c72 talk-llama : sync llama.cpp (#2709) 2025-01-13 08:55:48 +02:00
45d3faf961 server : generate unique tmp filenames (#2718)
# Summary

This Merge Request adds a mechanism to generate unique filenames for FFmpeg conversions in whisper_server.cpp. Previously, a single fixed filename was used (e.g., whisper-server-tmp.wav), which could result in unexpected file overwrites under certain circumstances. Generating a unique filename per request eliminates the risk of overwriting temporary files.

# Background / Motivation
- Problem: Relying on a static filename for temporary audio files may lead to overwrites if multiple operations occur simultaneously or if the same filename is reused.
- Goal: Dynamically generate unique filenames, ensuring each request or operation uses an isolated temporary file.
2025-01-13 08:55:21 +02:00
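A minimal C sketch of per-request unique names, assuming a process-wide atomic counter is enough (names are then unique only within one server process); `make_tmp_wav_name` is hypothetical, and the actual patch may use a different scheme. Where the file must also be created atomically, POSIX `mkstemp` is an alternative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Process-wide counter; every call gets a distinct value, even across threads. */
static atomic_ulong tmp_counter;

/* Writes "<prefix>-<n>.wav" into buf. Returns 0 on success, -1 if truncated. */
int make_tmp_wav_name(char *buf, size_t buf_size, const char *prefix) {
    assert(buf != NULL && prefix != NULL);
    const unsigned long n = atomic_fetch_add(&tmp_counter, 1);
    const int len = snprintf(buf, buf_size, "%s-%lu.wav", prefix, n);
    return (len < 0 || (size_t) len >= buf_size) ? -1 : 0;
}
```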
2ab2eb5110 whisper : add whisper_full_get_segment_no_speech_prob_from_state (#2716)
2025-01-09 16:21:07 +02:00
b82d305282 readme : add docker instructions (#2711)
I found the Docker instructions in the README.md useful, including the explanation of the differences between Docker variants such as ffmpeg and CUDA support. However, this section was removed in v1.7.4, and I would vote to bring it back.

This pull request adds that section back.
2025-01-07 13:20:51 +02:00
885e31368d docs: Fix main -> whisper-cli in download scripts (#2707)
2025-01-06 15:17:57 +02:00
8a9ad7844d release : v1.7.4 2025-01-06 15:13:48 +02:00
eb874b3a3c ci : cont
2025-01-06 10:46:10 +02:00
eb78e3a3f1 ci : fix ubuntu runner names 2025-01-06 09:29:10 +02:00
ece3ff88f6 cli : fix segfault on missing argument (#2700)
2025-01-04 10:47:41 +02:00
9366544991 ci : fix arm builds 2025-01-04 10:45:01 +02:00
95583942ed sync : ggml
ggml-ci
2025-01-04 10:45:01 +02:00
2e93cb6a2f ggml : do not install metal source when embed library (ggml/1054) 2025-01-04 10:45:01 +02:00
de5cd60d1c metal : avoid uint (llama/11019) 2025-01-04 10:45:01 +02:00
3fcba3e58b ggml : fixes for AVXVNNI instruction set with MSVC and Clang (llama/11027)
* Fixes for clang AVX VNNI

* enable AVX VNNI and alder lake build for MSVC

* Apply suggestions from code review

---------

Co-authored-by: slaren <slarengh@gmail.com>
2025-01-04 10:45:01 +02:00
cea5f1c52f vulkan: optimize mul_mat for small values of N (llama/10991)
Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where
the batch_strides are overloaded to hold the row strides. Put the loads from the
B matrix in the innermost loop because it should cache better.

Share some code for reducing the result values to memory in mul_mat_vec_base.
2025-01-04 10:45:01 +02:00
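As a rough CPU-side analogue (not the actual GLSL shader), the C sketch below shows a matrix-vector product generalized to a few right-hand columns. The function name `mul_mat_vec_ncols`, the row-major layouts, and the `b_stride` parameter are illustrative assumptions. In the shader, the column count is a specialization constant (NUM_COLS), which likely lets the compiler unroll the inner loop. Here, each element of A is loaded once and reused for every column of B, and the B loads sit in the innermost loop.

```c
#include <stddef.h>

/* Hypothetical CPU analogue of a mul_mat_vec kernel with N > 1.
 * A: nrows x k matrix, row-major.
 * B: ncols vectors of length k; vector j starts at B + j * b_stride
 *    (a row stride, in the role the batch stride used to play).
 * D: output, D[j * nrows + r] = dot(row r of A, vector j of B). */
void mul_mat_vec_ncols(const float *A, const float *B, float *D,
                       size_t nrows, size_t k, size_t ncols, size_t b_stride)
{
    for (size_t r = 0; r < nrows; ++r) {
        for (size_t j = 0; j < ncols; ++j) {
            D[j * nrows + r] = 0.0f;
        }
        for (size_t i = 0; i < k; ++i) {
            const float a = A[r * k + i];        /* load each A element once */
            for (size_t j = 0; j < ncols; ++j) { /* and reuse it for every column */
                D[j * nrows + r] += a * B[j * b_stride + i];
            }
        }
    }
}
```

With `ncols == 1`, this reduces to an ordinary matrix-vector product, which is why one kernel can serve both cases.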
2112462db4 vulkan: im2col and matmul optimizations for stable diffusion (llama/10942)
* tests: Add im2col perf tests

* vulkan: optimize im2col, more elements per thread

* vulkan: increase small tile size for NV_coopmat2

* vulkan: change im2col to 512 elements per workgroup
2025-01-04 10:45:01 +02:00
fc84ecd445 vulkan: Use push constant offset to handle misaligned descriptors (llama/10987) 2025-01-04 10:45:01 +02:00
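Vulkan requires storage-buffer descriptor offsets to be multiples of a device limit, `minStorageBufferOffsetAlignment`, which is a power of two. One way to handle a tensor that starts at a misaligned offset is to bind the descriptor at the offset rounded down to that alignment and pass the remainder, in elements, to the shader as a push constant. The C sketch below illustrates this general technique. The names `split_offset` and `split_misaligned_offset` are hypothetical, and this is not the ggml-vulkan code.

```c
#include <assert.h>
#include <stdint.h>

/* Result of splitting a tensor's byte offset for binding. */
typedef struct {
    uint64_t descriptor_offset; /* aligned byte offset used in the descriptor */
    uint32_t element_offset;    /* remainder in elements, sent as a push constant */
} split_offset;

/* alignment: the device's descriptor offset alignment (a power of two).
 * element_size: bytes per tensor element; the remainder is assumed to be
 * a whole number of elements. */
split_offset split_misaligned_offset(uint64_t byte_offset, uint64_t alignment,
                                     uint32_t element_size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    assert(element_size != 0);

    split_offset s;
    s.descriptor_offset = byte_offset & ~(alignment - 1);   /* round down */
    const uint64_t rem = byte_offset - s.descriptor_offset; /* 0 <= rem < alignment */
    assert(rem % element_size == 0);
    s.element_offset = (uint32_t)(rem / element_size);
    return s;
}
```

The shader would then add `element_offset` to every index it uses into that buffer.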
Eve
8de1e99907 vulkan: multi-row k quants (llama/10846)
* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default
2025-01-04 10:45:01 +02:00
499af9294a examples, ggml : fix GCC compiler warnings (llama/10983)
Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers]  (emitted for all struct field except first)
2025-01-04 10:45:01 +02:00
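The first warning occurs because `%ld` expects a `long`, while `size_t` can be a different type. For example, on 64-bit Windows `long` is 32 bits but `size_t` is 64 bits. C99 defines the `%zu` conversion specifically for `size_t`. The helper below uses it; `format_size` is a hypothetical name. This shows one common fix, not necessarily the exact change made in this commit.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats a size_t with %zu, the C99 length modifier for size_t,
 * instead of %ld, which expects long. Returns snprintf's result. */
int format_size(char *buf, size_t buflen, size_t n)
{
    return snprintf(buf, buflen, "n = %zu", n);
}
```

Another portable option is to cast to `unsigned long long` and print with `%llu`.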
bcf937c216 ggml : more perfo with llamafile tinyblas on x86_64 (llama/10714)
* more performance with llamafile tinyblas on x86_64.

- add bf16 support
- change dispatch strategy (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71 )
- reduce memory bandwidth

simple tinyblas dispatch and more cache-friendly

* tinyblas dynamic dispatching

* sgemm: add M blocks.

* - git 2.47 uses short IDs of length 9.
- show-progress is not part of GNU Wget2

* remove unstable test
2025-01-04 10:45:01 +02:00
b8d90953d7 ggml : use wstring for backend search paths (llama/10960)
ggml-ci
2025-01-04 10:45:01 +02:00
60a422147b ggml : fix arm enabled features check (llama/10961) 2025-01-04 10:45:01 +02:00
3387415bad ggml : fix const usage in SSE path (llama/10962) 2025-01-04 10:45:01 +02:00
536ca3ec89 ggml : fix run-time on FreeBSD in get_executable_path() (llama/10948) 2025-01-04 10:45:01 +02:00
a4bb983190 vulkan: build fixes for 32b (llama/10927)
* vulkan: build fixes for 32b

Should fix #10923

* vulkan: initialize some buffer/offset variables
2025-01-04 10:45:01 +02:00
39c205f555 vulkan: optimize coopmat2 dequant functions (llama/10855)
Change the code to do 16b loads when possible and extract the appropriate
component late, so the code is effectively decoding a pair of elements and
then selecting one. This can allow more commoning to happen in the compiler
when neighboring elements are loaded.
2025-01-04 10:45:01 +02:00
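Below is a scalar C analogue of this "load a pair, select late" pattern. It is an illustration only. It assumes signed 8-bit quants with a single float scale and little-endian byte order, and it uses a hypothetical function name. The real change is in the coopmat2 GLSL dequant functions.

```c
#include <stdint.h>
#include <string.h>

/* Sign-extends the low 8 bits of v without implementation-defined casts. */
static int sext8(unsigned v)
{
    v &= 0xFFu;
    return (v & 0x80u) ? (int)v - 256 : (int)v;
}

/* Dequantizes element i of an int8 quant array. Instead of a single byte
 * load, it does one 16-bit load of the aligned pair (i & ~1, i | 1), decodes
 * both elements, and only then selects the requested one. Calls for i and
 * i ^ 1 issue the same load, which gives the compiler a chance to share it.
 * Assumes little-endian order and that element (i | 1) is in bounds. */
float dequant_pair_select(const int8_t *qs, float scale, uint32_t i)
{
    uint16_t word;
    memcpy(&word, qs + (i & ~1u), sizeof word);       /* 16-bit load via memcpy */
    const float lo = (float)sext8(word) * scale;      /* element i & ~1 */
    const float hi = (float)sext8(word >> 8) * scale; /* element i | 1 */
    return (i & 1u) ? hi : lo;                        /* select late */
}
```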
6d502f33dc ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (llama/10874)
* ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0()

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* ggml-cpu: format code

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-01-04 10:45:01 +02:00
5ea27d089d SYCL: Migrate away from deprecated ggml_tensor->backend (llama/10840)
* Migrate to tensor->buffer for checking backend buffer type: 1

* SYCL: common.cpp try to migrate away from tensor->backend

* SYCL: fix assertions and add proper comments

* SYCL: remove extra space

* SYCL: Add back static to ggml_backend_buffer_is_sycl_split function

* SYCL: Add pragma directive to suppress warning spam

* SYCL: Integrate debug logs with GGML_LOG and other fixes

* Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes"

This reverts commit 2607b7de0f0d2f4f1f690226f86fa861aa39cb97.
Let's keep the current SYCL-specific logging mechanism for now

* SYCL: Use GGML_SYCL_DEBUG after reverting

* SYCL: reg_get_proc_address func, update to the current func signature

* SYCL: Refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d
2025-01-04 10:45:01 +02:00
1462d92588 ggml : add test for SVE and disable when it fails (llama/10906) 2025-01-04 10:45:01 +02:00
7ba1a41f47 ggml: fix arm build with gcc (llama/10895)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-01-04 10:45:01 +02:00
5ea088636f ggml : fix arm build (llama/10890)
* ggml: GGML_NATIVE uses -mcpu=native on ARM

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* ggml: Show detected features with GGML_NATIVE

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* remove msvc support, add GGML_CPU_ARM_ARCH option

* disable llamafile in android example

* march -> mcpu, skip adding feature macros

ggml-ci

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Co-authored-by: Adrien Gallouët <angt@huggingface.co>
2025-01-04 10:45:01 +02:00
f32ddb3b1c tts : add OuteTTS support (llama/10784)
* server : add "tokens" output

ggml-ci

* server : output embeddings for all tokens when pooling = none

ggml-ci

* server : be explicit about the pooling type in the tests

ggml-ci

* server : do not normalize embeddings when there is no pooling

ggml-ci

* llama : add OuteTTS support (wip)

* wip

* extract features

* first conv

* group norm

* resnet conv

* resnet

* attn

* pos net

* layer norm

* convnext

* head

* hann window

* fix n_embd + remove llama.cpp hacks

* compute hann window

* fft

* spectrum processing

* clean-up

* tts : receive input text and generate codes

* clip : fix new conv name

* tts : minor fix

* tts : add header + minor fixes

ggml-ci

* tts : add mathematical constant

ggml-ci

* tts : fix sampling + cut initial noise

* tts : fixes

* tts : update default samplers

ggml-ci

* tts : text pre-processing

* tts : outetts-voc -> wavtokenizer-dec

* tts : remove hardcoded constants

ggml-ci

* tts : fix tensor shapes

* llama : refactor wavtokenizer tensors

ggml-ci

* cont

ggml-ci

* cont [no ci]

* llama : update WavTokenizer to non-causal attn

* llama : handle no-vocab detokenization

* tts : add Python example for OuteTTS (wip)

* tts : extend python example to generate spectrogram

ggml-ci

* server : fix rebase artifacts

* tts : enable "return_tokens" in Python example

ggml-ci

* tts : minor fixes

* common : support HF download for vocoder
2025-01-04 10:45:01 +02:00
79b75ece03 tests: add tests for GGUF (llama/10830) 2025-01-04 10:45:01 +02:00
6348d73e55 ggml : improve inputs log sched_print_assignments (ggml/1053)
This commit attempts to improve the log message for the inputs of the
splits in the sched_print_assignments function.

The motivation for this change is that currently, even if there are no
inputs, a colon is displayed at the end of the line. This can make the
output a little confusing to read, because the lines below could be
interpreted as inputs when they are in fact nodes. With this change, the
colon is only printed if there actually are inputs.
2025-01-04 10:45:01 +02:00
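The C sketch below illustrates the idea. `format_split_header` is a hypothetical name, and its output format only approximates the real sched_print_assignments() log line. The colon is written only before the first input, so a split without inputs has no trailing colon.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical formatter for one split's header line. Returns the
 * untruncated length, like snprintf; the output is truncated (and
 * NUL-terminated, if buflen > 0) when buflen is too small. */
size_t format_split_header(char *buf, size_t buflen, int split_id,
                           const char *backend,
                           const char *const *inputs, int n_inputs)
{
    int n = snprintf(buf, buflen, "## SPLIT #%d: %s # %d inputs",
                     split_id, backend, n_inputs);
    if (n < 0) {
        return 0;
    }
    size_t len = (size_t)n;

    for (int i = 0; i < n_inputs; ++i) {
        /* once the buffer is full, keep counting via snprintf(NULL, 0, ...) */
        char  *dst  = len < buflen ? buf + len : NULL;
        size_t room = len < buflen ? buflen - len : 0;
        /* the colon precedes the first input only */
        n = snprintf(dst, room, "%s[%s]", i == 0 ? ": " : " ", inputs[i]);
        if (n < 0) {
            break;
        }
        len += (size_t)n;
    }
    return len;
}
```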
fb36a1538a readme : fix real-time audio input example build instructions (#2692)
2025-01-02 12:05:38 +02:00
c81b8b910b objc : rename ggml-cpu-aarch64.c to .cpp (#2687) 2025-01-02 12:05:09 +02:00
85b60f31d0 docs : replace Core ML with OpenVINO (#2686) 2025-01-02 12:03:02 +02:00
227b5ffa36 make : fix "main" -> "whisper-cli"
2024-12-31 11:46:17 +02:00
36a64a253f ci : re-enable Windows cublas build (#2676)
* Enable Windows cublas build

* Re-add v12 cuda
2024-12-31 11:11:42 +02:00
c84b83c370 ruby : Fix of C++ header guard name, model URI support, type signature and more (#2683)
* Add test to make Whisper::Context.new accept URI string

* Add test to make Whisper::Context.new accept URI

* Make Whisper::Context.new accept URI string and URI

* Update README

Revert "Fix argument of rb_undefine_finalizer"

* Fix typos

* Add type signature file

* Assign literal to const variable

* Load Whisper::Model::URI from Init_whisper

* Simplify .gitignore

* Don't load whisper.so from whisper/model/uri.rb

* Use each_with_object instead of each

* Add Development section to README

* Rename header guard to conform to C++ naming convention
2024-12-30 14:26:35 +02:00
5136fd92c2 examples : handle "main.exe" deprecation 2024-12-30 13:00:18 +02:00
7d55637f0b cli : add --suppress_nst support (#2664)
2024-12-24 09:30:07 +02:00
0994506054 cli : add no_speech_thold (#2663) 2024-12-24 09:29:19 +02:00
53c9a3a984 cmake : remove hardcoded install rpath
2024-12-23 21:22:10 +02:00
ed09075ca0 server : fix help print
2024-12-22 15:32:05 +02:00
f07a81aa9f ruby : bug fix on callbacks and no_speech_prob (#2656)
* Don't generate documentation on test

* Move .startup to TestBase class

* Extract new_segment_callback as a function

* Extract progress_callback as a function

* Extract abort_callback as a function

* Extract register_callbacks as a function

* Call callbacks in Whisper::Context#full and #full_parallel

* Fix README

* Handle the cases where content-size is nil and TTY is not available

* Add tests for no_speech_prob

* Add Whisper::Context#full_get_segment_no_speech_prob and Whisper::Segment#no_speech_prob
2024-12-21 21:52:06 +02:00
4183517076 server : add no-speech threshold parameter and functionality (#2654) 2024-12-21 17:00:08 +02:00
f4668169a0 whisper : rename suppress_non_speech_tokens to suppress_nst (#2653)
2024-12-21 12:54:35 +02:00
944ce49439 server : add option to suppress non-speech tokens (#2649)
* The parameter will suppress non-speech tokens like [LAUGH], [SIGH], etc. from the output when enabled.

* add to whisper_params_parse

* add missing param
2024-12-21 12:05:05 +02:00
2e59dced12 whisper : rename binaries + fix install (#2648)
* whisper : rename binaries + fix install

* cont : try to fix ci

* cont : fix emscripten builds
2024-12-21 09:43:49 +02:00
e4e05981d6 ruby : update gem version to v1.3.1
2024-12-20 11:53:27 +02:00
3de9deead5 release : v1.7.3
2024-12-18 18:12:40 +02:00
47f989f9b3 ci : msys enable SDL2 build (#2635)
2024-12-18 12:52:41 +02:00
acc4e13dee ruby : sync ggml (#2643) 2024-12-18 12:52:16 +02:00
ba6c2a8fd9 android : try to fix build 2024-12-18 12:52:16 +02:00
6576af00d7 files : remove old sources 2024-12-18 12:52:16 +02:00
8ac5db0169 sync : ggml 2024-12-18 12:52:16 +02:00
61edb117a0 talk-llama : sync llama.cpp 2024-12-18 12:52:16 +02:00
eb97b257eb sync : ggml 2024-12-18 12:52:16 +02:00
479499dc0e ggml : update ggml_backend_cpu_device_supports_op (llama/10867)
* ggml : fix cpy op for IQ-quants to use reference impl

ggml-ci

* ggml : disable tests involving i-matrix quantization

* ggml : update ggml_backend_cpu_device_supports_op

ggml-ci
2024-12-18 12:52:16 +02:00
Eve
d420a759c5 vulkan: bugfixes for small subgroup size systems + llvmpipe test (llama/10809)
* ensure mul mat shaders work on systems with subgroup size less than 32

more fixes

add test

* only s_warptile_mmq needs to be run with 32 threads or more
2024-12-18 12:52:16 +02:00
a1ab9b5e91 rwkv6: add wkv6 support for Vulkan backend (llama/10829)
* rwkv_wkv6 vulkan shader

* RWKV_WKV6 Vulkan op tests passed

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Apply code format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* add [[unroll]] and remove unnecessary conditions

* add uma support

* fix errors in EditorConfig Checker

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Molly Sophia <mollysophia379@gmail.com>
2024-12-18 12:52:16 +02:00
e22d38e4f2 llama : add Qwen2VL support + multimodal RoPE (llama/10361)
* Barebone Qwen2VL LLM convertor

* Add Qwen2VL cli entrypoint

* [WIP] add qwen2vl arch

* Verify m-rope output

* Add vl-rope/2d-rope support for qwen2vl ViT

* update qwen2vl cli tool

* update 5D tensor op workaround

* [WIP] qwen2vl vision model

* make batch and clip utils compatible with qwen2vl

* [WIP] create inference workflow, gguf convert script but fix

* correcting vision-rope behavior, add the missing last layer back to ViT

* add arg parser to qwen2vl_surgery

* replace variable size array with vector

* cuda-gdb cmake preset

* add fp32 mrope, vision rope kernel

* add fp16 support for qwen2vl and m-rope

* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`

* fix rope op mode switching, outdated func args

* update `llama_hparams`

* update to keep up with upstream changes

* resolve linter, test errors

* add makefile entry, update special image padding token

* add mrope unit test, fix few compiler warnings

* rename `mrope` related function, params

* minor updates on debug util, bug fixes

* add `m-rope` testcase to `test-backend-ops`

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix trailing whitespace

* store `llama_hparams.rope_sections` with fixed size array

* update position id tensor size check in GGML_OP_ROPE

* minor updates

* update `ggml_backend_*_supports_op` of unsupported backends

* remove old `rope_section` compare operator

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-18 12:52:16 +02:00
856fbaa92f Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (llama/10693)
* [cl][adreno] Add Adreno GPU support

Add new OpenCL backend to support Adreno GPUs

---------

Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
Co-authored-by: Alexander Angus <quic_aangus@quicinc.com>
Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com>
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>

* [cl][ci] Add workflow for CL

* [cl][adreno] Fix memory leak for non SMALL_ALLOC path

* opencl: integrate backend dyn.load interface and fix compiler and format warnings

* opencl: remove small-alloc support and fix build errors for non-opencl platforms

* opencl: fixed merge conflict (MUSA added twice in cmake)

* opencl-ci: use RUNNER_TEMP instead of github.workspace

* opencl: fix embed tool invocation with python3

* opencl: CI workflow fixes

* opencl: Clean up small-alloc in CMake files

* opencl: cleanup ggml-opencl2 header file

* opencl: use ulong for offsets and strides in ADD kernel

* opencl: use cl_ulong for all offsets

* opencl: use cl_ulong for sizes and strides

* opencl: use `GGML_LOG_xxx` instead of `fprintf(stderr, ...)`

* opencl: rename backend `opencl2` -> `opencl`

* opencl: rename kernel files `ggml-opencl2` -> `ggml-opencl`

* opencl: make OpenCL required, remove redundant lib and inc directories

* `ggml-base`, `..` and `.` are added by `ggml_add_backend_library`

* opencl: rename backend - funcs, structs, etc `opencl2` -> `opencl`

* opencl: remove copyright marker since the main license already covers it

* opencl: replace some more OPENCL2 leftovers

* opencl: remove limits on `tensor_extra`

* opencl: use pools for `tensor_extra`

* opencl: fix compiler warnings with GCC and Clang

Still getting the warning about clCreateCmdQueue being obsolete.
Will fix that separately.

* opencl: fail gracefully if opencl devices are not available

Also for unsupported GPUs.

* opencl: fix MSVC builds (string length error)

* opencl: check for various requirements, allow deprecated API

* opencl: update log message for unsupported GPUs

---------

Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
Co-authored-by: Alexander Angus <quic_aangus@quicinc.com>
Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com>
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
2024-12-18 12:52:16 +02:00
2c05efa4b1 Fix crash caused by ggml_backend_load_all when launching on Android Activity (llama/10812)
* Fix crash caused by ggml_backend_load_all when launching on AndroidActivity.

Details:
Calling ggml_backend_load_all during initialization in the AndroidActivity project leads to a crash with the error:
terminating with uncaught exception of type std::__ndk1::__fs::filesystem::filesystem_error: filesystem error: in directory_iterator::directory_iterator(...): Permission denied [./].
This issue occurs because AndroidActivity restricts file access due to sandboxing.

Reproduction:
In the example folder, the LlamaAndroid project can reproduce the crash by calling ggml_backend_load_all first in Java_android_llama_cpp_LLamaAndroid_backend_1init.

* Update ggml/src/ggml-backend-reg.cpp

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-12-18 12:52:16 +02:00
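A minimal C++17 sketch of the general technique behind this kind of fix, assuming the crash comes from the throwing form of `std::filesystem::directory_iterator`: the `std::error_code` overloads report failures such as `Permission denied` without throwing, so an unreadable directory simply yields no candidates. The helper `list_candidates` and the `ggml-` prefix are hypothetical; this is not the code in `ggml/src/ggml-backend-reg.cpp`.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: list regular files in `dir` whose names start with `prefix`.
// Every filesystem call uses a std::error_code overload, so a sandboxed or missing
// directory produces an empty result instead of a std::filesystem::filesystem_error.
std::vector<fs::path> list_candidates(const fs::path & dir, const std::string & prefix) {
    std::vector<fs::path> out;
    std::error_code ec;
    fs::directory_iterator it(dir, fs::directory_options::skip_permission_denied, ec);
    if (ec) {
        return out; // could not open the directory (missing, permission denied, ...)
    }
    for (auto end = fs::directory_iterator(); it != end; it.increment(ec)) {
        if (ec) {
            break; // stop scanning on a mid-iteration error
        }
        std::error_code type_ec;
        if (!it->is_regular_file(type_ec) || type_ec) {
            continue; // skip directories, broken entries, etc.
        }
        const std::string name = it->path().filename().string();
        if (name.rfind(prefix, 0) == 0) { // "starts with" check
            out.push_back(it->path());
        }
    }
    return out;
}
```

Returning an empty list keeps the caller's logic simple: "no backends found here" and "not allowed to look here" are handled the same way.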
Eve
c21fb10b28 vulkan: small mul_mat_vec optimizations (llama/10665)
* double the number of rows per workgroup

* Update ggml-vulkan.cpp

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* only increase the number of rows for amd and subgroup size 64

* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

* use subgroup min and max to check for gcn (requires https://github.com/ggerganov/llama.cpp/pull/10721)

* manual merge ggml-vulkan.cpp

* set min and max subgroup size in any case

* Also double the number of rows for Intel GPUs
2024-12-18 12:52:16 +02:00
26c9fd0cdc SYCL: Reduce most of the compiler warnings (llama/10748)
* Try to reduce some unused and typecast warnings

* Reduce compiler warnings step 2

* add a newline at the end of the file

* Initialize nreduce as size_t

* [SYCL] Remove pragma directives from mmq.cpp

* SYCL: mmq add condition to prevent blocks_per_tile_x_row variable from becoming 0

* SYCL softmax: Initialize nreduce as size_t

* ggml-sycl.cpp: fix some trailing whitespaces

* SYCL: remove the unused variables instead of commenting it out

* SYCL pool2d kernel: set NAN for invalid pooling op

* SYCL gemm.hpp: remove pragma directives

* SYCL gemm.hpp: use const cast to properly support dnnl::memory

* SYCL: wkv6 remove a comment

* SYCL: clean comments step 2

* SYCL: clean comments and variables step 3

* SYCL: Use GGML_UNUSED for unused variables

* SYCL: remove extra empty lines and a comment

* Remove TODO

* cleanup spaces

* add a stdout for unsupported op

* use sycl printf over fprintf

* remove prints for CI

* SYCL ggml-sycl: pool2D use sycl::nan and remove if-else block

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-12-18 12:52:16 +02:00
e6eed605cf ggml : Fix compilation issues on ARM platform when building without fp16 (llama/10811) 2024-12-18 12:52:16 +02:00
abe3102cb7 CUDA: faster non-contiguous concat (llama/10760)
* faster non-contiguous concat

* Use a lambda to avoid code duplication

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Update ggml/src/ggml-cuda/concat.cu

* add constexpr and static assert

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-12-18 12:52:16 +02:00
1193e494a9 remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (llama/10797)
Other Windows build fixes
2024-12-18 12:52:16 +02:00
e5e951672e Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (llama/10798) 2024-12-18 12:52:16 +02:00
0e24559ad9 Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats (llama/10721)
* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* Fix subgroup size control extension support check

Add accf32 and accf16 checks for coopmats

* Also disable coopmats on amdvlk
2024-12-18 12:52:16 +02:00
527ac800cf ggml: load all backends from a user-provided search path (llama/10699)
* feat: load all backends from a user-provided search path

* fix: Windows search path

* refactor: rename `ggml_backend_load_all_in_search_path` to `ggml_backend_load_all_from_path`

* refactor: rename `search_path` to `dir_path`

* fix: change `NULL` to `nullptr`

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* fix: change `NULL` to `nullptr`

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-12-18 12:52:16 +02:00
479bd77169 vulkan: request round-to-even for fp16 in im2col/rope_head (llama/10767)
Vulkan doesn't mandate a specific rounding mode, but the shader_float_controls
feature allows rounding mode to be requested if the implementation supports it.
2024-12-18 12:52:16 +02:00
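For readers unfamiliar with the term: round-to-even (ties-to-even) resolves a value lying exactly halfway between two representable results toward the one whose last digit is even. The sketch below only illustrates the rule on integers with standard `<cmath>` functions; it does not involve Vulkan, `shader_float_controls` or fp16, and the helper names are hypothetical.

```cpp
#include <cmath>

// Round-to-nearest, ties-to-even. std::nearbyint honours the current
// floating-point rounding mode; this assumes the default mode (FE_TONEAREST).
double ties_to_even(double x) {
    return std::nearbyint(x);
}

// Round-to-nearest, ties-away-from-zero: std::round ignores the rounding mode.
double ties_away(double x) {
    return std::round(x);
}
```

For example, `ties_to_even(2.5)` gives `2.0` while `ties_away(2.5)` gives `3.0`. Ties-to-even avoids the systematic upward bias that always rounding halves away from zero introduces over many conversions.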
Eve
d8bf63a41b vulkan: dynamic subgroup size for the remaining k quants (llama/10745)
* q5_k

q4_k

q3_k

q2_k

q6_k multi row example

* revert as multi row isn't faster for k quants
2024-12-18 12:52:16 +02:00
b82c8d76dc CUDA: rename macros to avoid conflicts with WinAPI (llama/10736)
* Renames NVIDIA GPU-architecture flags to avoid name clashes with WinAPI. (e.g. CC_PASCAL, GPU architecture or WinAPI pascal compiler flag?)

* Reverts erroneous rename in SYCL-code.

* Renames GGML_CUDA_MIN_CC_DP4A to GGML_CUDA_CC_DP4A.

* Renames the rest of the compute capability macros for consistency.
2024-12-18 12:52:16 +02:00
86346f811e vulkan: disable spirv-opt for coopmat shaders (llama/10763)
There are some bugs in the 1.3.296 SDK, so disable this. It isn't strictly
necessary anyway.

Add missing dependency on vulkan-shaders-gen, so shaders get recompiled when it
changes.

Fix coopmat support reporting when glslc doesn't support NV_coopmat2.
2024-12-18 12:52:16 +02:00
c635f40a34 ggml : remove return from ggml_gallocr_allocate_node (ggml/1048)
This commit removes the return statement from ggml_gallocr_allocate_node
function.

The motivation behind this change is to make the code more readable and
consistent.
2024-12-18 12:52:16 +02:00
e0be0de1ee ggml : add check for grad_accs (ggml/1046)
* ggml : add check for grad_accs

This commit adds a check for grad_accs in ggml_graph_get_grad and
ggml_graph_get_grad_acc functions. This is necessary to avoid segfaults
when grad_accs is not initialized.

The motivation for this change is that I find it nice to be able to
print out a computation graph using ggml_graph_print but this function
segfaults when grad_accs is not initialized:
```console
(gdb) p g1
$2 = (ggml_cgraph *) 0x7ffff66004b0
(gdb) p *g1
$3 = {size = 2048, n_nodes = 1, n_leafs = 2, nodes = 0x7ffff6600500,
grads = 0x0, grad_accs = 0x0, leafs = 0x7ffff6604500,
visited_hash_set = {size = 4099, used = 0x7ffff6610518,
keys = 0x7ffff6608500}, order = GGML_CGRAPH_EVAL_ORDER_LEFT_TO_RIGHT}
(gdb) p ggml_graph_print(g1)
=== GRAPH ===
n_nodes = 1

Program received signal SIGSEGV, Segmentation fault.
0x0000555555579775 in ggml_graph_get_grad
(cgraph=0x7ffff66004b0,node=0x7ffff6600340)
    at /ggml/ggml/src/ggml.c:5990
5990  return igrad != GGML_HASHSET_FULL &&
          ggml_bitset_get(cgraph->visited_hash_set.used, igrad) ?
          cgraph->grads[igrad] : NULL;
```

* squash! ggml : add check for grad_accs

Fix the check in ggml_graph_get_grad. The check was incorrectly using
cgraph->grad_accs instead of cgraph->grads.
2024-12-18 12:52:16 +02:00
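A minimal sketch of the guard pattern this commit describes, using a hypothetical `toy_graph` type rather than `ggml_cgraph` (the real lookup goes through a visited hash set, which is omitted here): check that the optional gradient array exists before indexing into it.

```cpp
#include <cstddef>

// Hypothetical, heavily simplified stand-in for a computation graph.
// `grads` stays nullptr when the graph was built without gradient storage.
struct toy_graph {
    std::size_t n_nodes = 0;
    float ** grads = nullptr; // one gradient pointer per node, or nullptr if absent
};

// Return the gradient of node `i`, or nullptr when gradients were never
// allocated or the index is out of range. Testing `grads` first is what avoids
// dereferencing a null array, the failure mode shown in the gdb session above.
float * toy_graph_get_grad(const toy_graph & g, std::size_t i) {
    if (g.grads == nullptr || i >= g.n_nodes) {
        return nullptr;
    }
    return g.grads[i];
}
```

As the follow-up fix notes, each accessor must check the array it actually reads (`grads` for the gradient lookup, `grad_accs` for the accumulator lookup).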
60dc6d003f common : remove old types
ggml-ci
2024-12-18 12:52:16 +02:00
eb27e0d834 CUDA: fix shared memory access condition for mmv (llama/10740) 2024-12-18 12:52:16 +02:00
a682fdce0c vulkan: fix compile warnings (llama/10731) 2024-12-18 12:52:16 +02:00
9ffbd3d969 Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (llama/10723)
* Vulkan: fix NaN in tanh.comp

* Faster NaN-free tanh
2024-12-18 12:52:16 +02:00
6585a890b4 vulkan: compile a test shader in cmake to check for coopmat2 support (llama/10713) 2024-12-18 12:52:16 +02:00
d0a050b51f ggml : disable iq4_nl interleave size 8 (llama/10709)
ggml-ci
2024-12-18 12:52:16 +02:00
e990d1b791 ggml : refactor online repacking (llama/10446)
* rename ggml-cpu-aarch64.c to .cpp

* reformat extra cpu backend.

- clean Q4_0_N_M and IQ4_0_N_M
  - remove from "file" tensor type
  - allow only with dynamic repack

- extract cpu extra bufts and convert to C++
  - hbm
  - "aarch64"

- more generic use of extra buffer
  - generalise extra_supports_op
  - new API for "cpu-accel":
     - amx
     - aarch64

* clang-format

* Clean Q4_0_N_M ref

Enable restrict on C++

* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack

* added/corrected control on tensor size for Q4 repacking.

* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* add debug logs on repacks.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-18 12:52:16 +02:00
4a6d52efe6 Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (llama/10597)
* Vulkan: Implement VK_KHR_cooperative_matrix support in the matrix matrix multiplication shader

* Improve performance with better q4_k and q5_k dequant and store unrolling

* Add Vulkan MUL_MAT and MUL_MAT_ID accumulator precision selection

* Rework mulmat shader selection and compilation logic, avoid compiling shaders that won't get used by device

* Vulkan: Implement accumulator switch for specific mul mat mat shaders

* Vulkan: Unroll more loops for more mul mat mat performance

* Vulkan: Add VK_AMD_shader_core_properties2 support to read Compute Unit count for split_k logic

* Disable coopmat support on AMD proprietary driver

* Remove redundant checks

* Add environment variable GGML_VK_DISABLE_COOPMAT to disable VK_KHR_cooperative_matrix support

* Fix rebase typo

* Fix coopmat2 MUL_MAT_ID pipeline selection
2024-12-18 12:52:16 +02:00
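A sketch of how an opt-out environment variable such as `GGML_VK_DISABLE_COOPMAT` can gate a feature, assuming the variable acts as a presence flag (any value disables it). `coopmat_enabled` is a hypothetical helper, not the function in `ggml-vulkan.cpp`, whose exact semantics may differ.

```cpp
#include <cstdlib>

// Assumption: setting GGML_VK_DISABLE_COOPMAT to any value turns off
// VK_KHR_cooperative_matrix usage, even if the device reports support.
bool coopmat_enabled(bool device_supports_coopmat) {
    const bool disabled_by_env = std::getenv("GGML_VK_DISABLE_COOPMAT") != nullptr;
    return device_supports_coopmat && !disabled_by_env;
}
```

A presence flag like this makes it easy to rule out a driver-specific coopmat issue without rebuilding.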
8b841d430a metal : Extend how Llama.cpp locates metal resources (llama/10676)
* metal : Extend how Llama.cpp locates metal resources (llama/10675)

  * It searches the resource file in the directory where the current
    binary is located as well.
  * Resolves symbolic links.

Rationale:

When we plug this dependency into a Bazel build and run it in the
context of Bazel (e.g. testing):

  * the execution directory is often very different from where the files
    are located, and there is no direct control over this (Bazel sandboxing),
  * the Bazel sandbox often uses symbolic links to make files available.

With this patch, we can add the resource file to the target and
build and run tests in the context of Bazel.

* Update ggml/src/ggml-metal/ggml-metal.m

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-metal/ggml-metal.m

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-18 12:52:16 +02:00
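A minimal C++17 sketch of the two lookup ideas listed in this commit, assuming the path of the running binary is already known (obtaining it is platform-specific and not shown): resolve symbolic links with `std::filesystem::canonical`, then look for the resource in the real binary's directory. `resource_next_to_binary` is a hypothetical name; the actual change is Objective-C code in `ggml-metal.m`.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: resolve symlinks in `binary` and return where a resource
// named `name` would sit next to the real (non-symlinked) binary.
// Returns an empty path if the binary path cannot be resolved.
fs::path resource_next_to_binary(const fs::path & binary, const std::string & name) {
    std::error_code ec;
    const fs::path real = fs::canonical(binary, ec); // follows symlinks, normalizes
    if (ec) {
        return {};
    }
    return real.parent_path() / name;
}
```

Resolving the link first matters in a sandbox like Bazel's: the symlinked binary's directory usually does not contain the resource, while the real binary's directory does.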
b74b68212a vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (llama/10206) 2024-12-18 12:52:16 +02:00
3a27b2b91b ruby : Add no_speech_thold (#2641)
* Remove Whisper::Model.[]

* Fix Whisper::Model::URI#request

* Make Whisper::Context#initialize accept pre-converted model name

* Use downloading pre-converted model feature for testing

* Update README

* Remove unnecessary task

* Move whisper/model.rb -> whisper/model/uri.rb

* Update document comment of Whisper::Context#initialize

* Don't show download progress when not tty

* Pass String to raise

* Use cache model file if download fails

* Add test for auto download

* Specify required Ruby version

* Fix a typo

* Remove unnecessary flags

* Initialize Whisper::Params#diarize explicitly

* Remove redundant code from README for simplicity

* Add Whisper::Params#no_speech_thold attribute

* Add test for Whisper::Params#no_speech_thold
2024-12-18 11:00:50 +02:00
d34445e960 stream : improve consistency in README (#2642) 2024-12-18 08:43:48 +02:00
f897eb7670 whisper : support no_speech_thold (#2625)
* Implement no_speech_thold

no_speech_thold functionality is on par with OpenAI's whisper

* Addressed review comments
2024-12-17 19:15:47 +02:00
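The commit states that `no_speech_thold` behaves as in OpenAI's whisper. In OpenAI's `transcribe.py`, a window is skipped as silence when its no-speech probability exceeds `no_speech_threshold`, unless the average token log-probability is above `logprob_threshold`. The C++ sketch below restates that rule. `should_skip_segment` and its parameter names are hypothetical, and whether whisper.cpp combines the two thresholds in exactly this way is an assumption.

```cpp
#include <cassert>

// Hypothetical helper mirroring the skip rule in OpenAI's transcribe.py:
// a window is treated as silence when the no-speech probability is above
// no_speech_thold, unless the decoded text is confident enough
// (average token log-probability above logprob_thold).
bool should_skip_segment(float no_speech_prob, float avg_logprob,
                         float no_speech_thold, float logprob_thold) {
    assert(no_speech_prob >= 0.0f && no_speech_prob <= 1.0f);
    bool skip = no_speech_prob > no_speech_thold;
    if (avg_logprob > logprob_thold) {
        skip = false; // confident text overrides the no-speech signal
    }
    return skip;
}
```

With thresholds of 0.6 and -1.0, a window with no-speech probability 0.9 and average log-probability -1.5 is skipped, while the same window with average log-probability -0.2 is kept.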
2f2841bfce whisper : add single-timestamp logic (#2629)
* Fix hallucinations during silence

When the predicted tokens end with a single timestamp, the entire 30-second segment should be considered done, to avoid hallucinations in the remaining part of the segment.
This behaviour is on par with OpenAI's whisper. Refer to the logic related to `single_timestamp_ending` in https://github.com/openai/whisper/blob/main/whisper/transcribe.py

* Accept review comments related to formatting.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-17 19:07:08 +02:00
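OpenAI's `transcribe.py` sets `single_timestamp_ending` when the last decoded token is a timestamp and the token before it is not; in that case the whole window is treated as consumed. A minimal C++ sketch of that check follows, assuming (hypothetically) that every token id greater than or equal to `timestamp_begin` is a timestamp token. The function name is illustrative, not whisper.cpp's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check modelled on `single_timestamp_ending` in OpenAI's
// transcribe.py: the last token is a timestamp and the one before it is not.
// Token ids >= timestamp_begin are assumed to be timestamp tokens.
bool single_timestamp_ending(const std::vector<int> & tokens, int timestamp_begin) {
    assert(timestamp_begin >= 0);
    const std::size_t n = tokens.size();
    if (n < 2) {
        return false;
    }
    const bool last_is_ts = tokens[n - 1] >= timestamp_begin;
    const bool prev_is_ts = tokens[n - 2] >= timestamp_begin;
    return last_is_ts && !prev_is_ts;
}
```

When this returns true, a decoder following the rule would advance its seek position by the full window instead of stopping at the last timestamp.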
09a1b61218 readme : fix typo (#2637) 2024-12-17 19:05:35 +02:00
94e7da1ff2 cmake : fix "amd64" processor string (#2638) 2024-12-17 18:34:32 +02:00
c4aed6831e vulkan : fix soft_max.comp division by zero (#2633)
This change prevents a division by zero error when p.KY is 0.
2024-12-16 12:34:38 +02:00
199579652e common : add cstdio header 2024-12-16 08:57:04 +02:00
d17e7139d8 stream : update build instructions
2024-12-15 21:55:36 +02:00
6a52eaea74 android : fix build and ci (#2624)
* Adding missing CMakeLists.txt include for ggml-cpu needed by whisper.android

* attempt to re-enable CI for JNI android

---------

Co-authored-by: Your Name <you@example.com>
2024-12-14 17:25:53 +02:00
6aa1d7b892 models : fix typo in download-ggml-model.sh (#2623)
Introduced in #2589
2024-12-12 18:02:00 +02:00
262e865a70 ruby : Sync whisper.cpp and model download feature (#2617)
* Use C++17

* Add test for Pathname of model

* Make Whisper::Context#initialize accept Pathname

* Add shorthand for pre-converted models

* Update documents

* Add headings to API section in README [skip ci]

* Remove unused function

* Ignore files that are no longer included

* Cosmetic fix

* Use conditional GET when fetching model files
2024-12-09 13:17:50 +02:00
ed733e85a1 scripts : update to new build system 2024-12-09 11:30:16 +02:00
5980b1ae77 devops : add cmake 2024-12-08 23:09:26 +02:00
0415a66044 devops : update make commands 2024-12-08 23:07:29 +02:00
7d134e3737 ggml : remove old files (skip) (#0) 2024-12-08 23:04:26 +02:00
9df53b357e ggml : sync remnants (skip) (#0) 2024-12-08 22:48:25 +02:00
b2115b4d9b scripts : remove amx from sync 2024-12-08 22:48:14 +02:00
0164427dd5 ci : disable freeBSD builds [no ci] 2024-12-08 20:14:35 +02:00
627b11c78a readme : update build instructions 2024-12-08 20:14:35 +02:00
472464453d ci : disable CUDA and Android builds 2024-12-08 20:14:35 +02:00
11dddfbc9e ci : disable Obj-C build + fixes 2024-12-08 20:14:35 +02:00
384e214cc7 make : shim cmake 2024-12-08 20:14:35 +02:00
f2c680f893 talk-llama : sync llama.cpp 2024-12-08 20:14:35 +02:00
fbe66da0e5 sync : ggml 2024-12-08 20:14:35 +02:00
a815940e0e ggml : add predefined list of CPU backend variants to build (llama/10626)
* ggml : add predefined list of CPU backend variants to build

* update CPU dockerfiles
2024-12-08 20:14:35 +02:00
904e307bce ggml-cpu : fix HWCAP2_I8MM value (llama/10646) 2024-12-08 20:14:35 +02:00
491ec076b4 vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (llama/10642) 2024-12-08 20:14:35 +02:00
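A sketch of the multiply-and-shift idea behind "fast divide": for a fixed divisor d, precompute a magic multiplier and a shift once, then replace each division by a 32-bit multiply-high, an add and a shift. The constants follow the classic Granlund-Montgomery construction. That the Vulkan shaders use exactly these constants is an assumption, and `FastDiv`, `make_fastdiv` and `fastdiv` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed constants for dividing 32-bit unsigned integers by a fixed d.
struct FastDiv {
    uint32_t mp; // magic multiplier: floor(2^32 * (2^l - d) / d) + 1
    uint32_t l;  // shift: smallest l such that 2^l >= d
};

FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t l = 0;
    while ((uint64_t{1} << l) < d) {
        ++l;
    }
    // (2^l - d) < 2^31 here, so the product fits in 64 bits.
    const uint64_t num = (uint64_t{1} << 32) * ((uint64_t{1} << l) - d);
    return FastDiv{static_cast<uint32_t>(num / d + 1), l};
}

// n / d without a hardware divide. The add is done in 64 bits so the
// result is exact for every 32-bit n.
uint32_t fastdiv(uint32_t n, FastDiv f) {
    const uint32_t hi = static_cast<uint32_t>((uint64_t{f.mp} * n) >> 32);
    return static_cast<uint32_t>((uint64_t{hi} + n) >> f.l);
}
```

The precomputation happens once per divisor on the host, which is why the technique pays off when a shader divides many indices by the same tensor extents.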
966433fdf2 SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (llama/10584)
* [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend

Move to compile-time backend selection to avoid latency at run time.
Apply it to all MKL GEMM calls, for the NVIDIA backend only.

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>

* Formatting

* Address PR comments to increase readability

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2024-12-08 20:14:35 +02:00
6f1ba9d82d Avoid using __fp16 on ARM with old nvcc (llama/10616) 2024-12-08 20:14:35 +02:00
015ecd0001 vulkan: optimize and reenable split_k (llama/10637)
Use vector loads when possible in mul_mat_split_k_reduce. Use split_k
when there aren't enough workgroups to fill the shaders.
2024-12-08 20:14:35 +02:00
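Split-K parallelizes a matrix multiplication along the shared K dimension when M x N alone yields too few workgroups: each slice of K produces a partial result and a separate reduce pass sums them. The CPU sketch below illustrates only the idea; `matmul_split_k` is a hypothetical function and does not mirror the shader's tiling or vector loads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C (M x N) = A (M x K) * B (K x N), all row-major.
// K is cut into `splits` slices; each slice writes its own partial M x N
// result (on a GPU, a separate set of workgroups), then a reduce pass sums them.
std::vector<float> matmul_split_k(const std::vector<float> & A, const std::vector<float> & B,
                                  std::size_t M, std::size_t N, std::size_t K,
                                  std::size_t splits) {
    assert(A.size() == M * K && B.size() == K * N && splits > 0);
    const std::size_t k_per_split = (K + splits - 1) / splits;
    std::vector<float> partial(splits * M * N, 0.0f);

    for (std::size_t s = 0; s < splits; ++s) {
        const std::size_t k0 = s * k_per_split;
        const std::size_t k1 = std::min(K, k0 + k_per_split); // empty if k0 >= K
        float * P = partial.data() + s * M * N;
        for (std::size_t i = 0; i < M; ++i) {
            for (std::size_t j = 0; j < N; ++j) {
                float acc = 0.0f;
                for (std::size_t k = k0; k < k1; ++k) {
                    acc += A[i * K + k] * B[k * N + j];
                }
                P[i * N + j] = acc;
            }
        }
    }

    // Reduce pass, analogous in role to mul_mat_split_k_reduce.
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t s = 0; s < splits; ++s) {
        for (std::size_t idx = 0; idx < M * N; ++idx) {
            C[idx] += partial[s * M * N + idx];
        }
    }
    return C;
}
```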
PAB
b7c64a4352 ggml: add GGML_SET Metal kernel + i32 CPU kernel (ggml/1037)
* implemented cpu kernel

* add i32 test cases in test-backend-ops

* typedef `ggml_metal_kargs_set`

* implemented `kernel_set`

* memcpy
2024-12-08 20:14:35 +02:00
PAB
7895d39508 ggml : add GGML_PAD_REFLECT_1D operation (ggml/1034)
* ggml_pad_reflect_1d defined in header

* implemented on CPU

* called the forward pass

* impl Metal kernel

* added Metal kernel

* added OP_PAD_REFLECT_1D in test-backend-ops.cpp

* add test-pad-reflect-1d test case

* test case support multiple backend
2024-12-08 20:14:35 +02:00
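Reflect padding mirrors a signal around its edges without repeating the edge sample. The sketch below assumes `GGML_PAD_REFLECT_1D` follows the same convention as PyTorch's `ReflectionPad1d`; `pad_reflect_1d` here is a hypothetical standalone function, not the ggml entry point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reflect padding with p0 elements on the left and p1 on the right.
// The edge element is not repeated: [1 2 3 4] padded by (2, 2) is
// [3 2 1 2 3 4 3 2]. Requires p0 < n and p1 < n.
std::vector<float> pad_reflect_1d(const std::vector<float> & x, std::size_t p0, std::size_t p1) {
    const std::size_t n = x.size();
    assert(p0 < n && p1 < n);
    std::vector<float> y(p0 + n + p1);
    for (std::size_t i = 0; i < p0; ++i) {
        y[i] = x[p0 - i];             // left mirror
    }
    for (std::size_t i = 0; i < n; ++i) {
        y[p0 + i] = x[i];             // original data
    }
    for (std::size_t i = 0; i < p1; ++i) {
        y[p0 + n + i] = x[n - 2 - i]; // right mirror
    }
    return y;
}
```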
22616f00f9 files : remove make artifacts 2024-12-08 20:14:35 +02:00
02c6fcbc2c common : fix compile warning
ggml-ci
2024-12-08 20:14:35 +02:00
3daeacad24 ggml : move AMX to the CPU backend (llama/10570)
ggml : automatic selection of best CPU backend (llama/10606)
2024-12-08 20:14:35 +02:00
4d73962da4 metal : small-batch mat-mul kernels (llama/10581)
* metal : small-batch mat-mul kernels

ggml-ci

* metal : add rest of types

ggml-ci

* metal : final adjustments

ggml-ci

* metal : add comments

ggml-ci
2024-12-08 20:14:35 +02:00
068812650e SYCL: Fix and switch to GGML_LOG system instead of fprintf (llama/10579)
* Switched to GGML_LOG

* Fix missing semicolon
2024-12-08 20:14:35 +02:00
4b7e059e15 ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (llama/10567)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-12-08 20:14:35 +02:00
Eve
30e35d7271 vulkan: Dynamic subgroup size support for Q6_K mat_vec (llama/10536)
* subgroup 64 version with subgroup add. 15% faster

scalable version

tested for subgroup sizes 16-128

* check for subgroup multiple of 16 and greater than 16

* subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45)

* force 16 sequential threads per block

* make 16 subgroup size a constant
2024-12-08 20:14:35 +02:00
3623bd58f2 ggml : fix I8MM Q4_1 scaling factor conversion (llama/10562)
ggml-ci
2024-12-08 20:14:35 +02:00
cb847c20a7 ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (llama/10580) 2024-12-08 20:14:35 +02:00
964b154a2a sycl : offload of get_rows set to 0 (llama/10432) 2024-12-08 20:14:35 +02:00
d7c2a04bce sycl : Reroute permuted mul_mats through oneMKL (llama/10408)
This PR fixes the failing MUL_MAT tests for the sycl backend.
2024-12-08 20:14:35 +02:00
2bb4ca9cba CANN: RoPE operator optimization (llama/10563)
* [cann] RoPE operator optimization

* [CANN]Code Formatting

---------

Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-12-08 20:14:35 +02:00
a753a82462 vulkan: get the first command buffer submitted sooner (llama/10499)
This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.

With these changes I get around 1-2% speedup on RTX 4070 combined with my
old Haswell-era CPU.
2024-12-08 20:14:35 +02:00
276b08d8f0 ggml : remove redundant copyright notice + update authors 2024-12-08 20:14:35 +02:00
4ca1e72fe0 ggml : fix row condition for i8mm kernels (llama/10561)
ggml-ci
2024-12-08 20:14:35 +02:00
16a66f103f cmake : fix ARM feature detection (llama/10543)
ggml-ci
2024-12-08 20:14:35 +02:00
330273901f ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541)
* ggml-cpu: support IQ4_NL_4_4 by runtime repack

* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
2024-12-08 20:14:35 +02:00
42099a9342 kompute : improve backend to pass test_backend_ops (llama/10542)
* kompute: op_unary: reject unsupported parameters

Signed-off-by: Sergio Lopez <slp@redhat.com>

* kompute: softmax: implement ALiBi support

Signed-off-by: Sergio Lopez <slp@redhat.com>

* kompute: rope: implement neox and phi3 support

Signed-off-by: Sergio Lopez <slp@redhat.com>

* kompute: op_mul_mat_q4_k permuted support

Signed-off-by: Sergio Lopez <slp@redhat.com>

* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support

Signed-off-by: Sergio Lopez <slp@redhat.com>

* kompute: op_mul_mat_f16 permuted support

Signed-off-by: Sergio Lopez <slp@redhat.com>

* kompute: op_mul_mat_q6_k permuted support

Signed-off-by: Sergio Lopez <slp@redhat.com>

---------

Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-12-08 20:14:35 +02:00
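The kompute commit adds NeoX (and Phi-3) RoPE support. Normal and NeoX-style RoPE differ only in which elements are paired for rotation, as this hypothetical C++ reference sketches. The frequency formula `theta_i = pos * base^(-2i/d)` is the standard RoPE choice and is assumed here; frequency scaling and other extensions are omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rotates one head vector x (dimension d) in place for position pos.
// Pair i (0 <= i < d/2) is rotated by theta_i = pos * base^(-2i/d).
//   normal: pairs are adjacent elements (x[2i], x[2i+1])
//   NeoX:   pairs are split across halves (x[i], x[i + d/2])
void rope(std::vector<float> & x, int pos, float base, bool neox) {
    const std::size_t d = x.size();
    assert(d % 2 == 0);
    for (std::size_t i = 0; i < d / 2; ++i) {
        const float theta = pos * std::pow(base, -2.0f * i / d);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const std::size_t a = neox ? i : 2 * i;
        const std::size_t b = neox ? i + d / 2 : 2 * i + 1;
        const float x0 = x[a];
        const float x1 = x[b];
        x[a] = x0 * c - x1 * s;
        x[b] = x0 * s + x1 * c;
    }
}
```

Because each step is a 2-D rotation, the norm of every rotated pair is preserved; only the pairing differs between the two modes.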
90dd5fca9c CANN: Fix SOC_TYPE compile bug (llama/10519)
* CANN: Fix build failures on Ascend310P in two cases:
1) Manually specifying SOC_TYPE
2) Some unusual compile environments

* Update the CANN backend News section: support F16 and F32 models on the Ascend 310P NPU.

* fix CANN compile failure: the assert in the Ascend kernel function is not supported on some CANN versions
2024-12-08 20:14:35 +02:00
2490f2a7f8 CANN: ROPE operator optimization (llama/10540)
* [cann] ROPE operator optimization

Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-12-08 20:14:35 +02:00
230e985633 Add some minimal optimizations for CDNA (llama/10498)
* Add some minimal optimizations for CDNA

* ggml_cuda: set launch bounds also for GCN as it helps there too
2024-12-08 20:14:35 +02:00
ae24083f23 metal : fix group_norm support condition (llama/0) 2024-12-08 20:14:35 +02:00
6463e36369 vulkan: define all quant data structures in types.comp (llama/10440) 2024-12-08 20:14:35 +02:00
b3301f7d82 vulkan: Handle GPUs with less shared memory (llama/10468)
There have been reports of failure to compile on systems with <= 32KB
of shared memory (e.g. #10037). This change makes the large tile size
fall back to a smaller size if necessary, and makes mul_mat_id fall
back to CPU if there's only 16KB of shared memory.
2024-12-08 20:14:35 +02:00
ab5d4d93ec vulkan: further optimize q5_k mul_mat_vec (llama/10479) 2024-12-08 20:14:35 +02:00
2d6e9dd723 vulkan: skip integer div/mod in get_offsets for batch_idx==0 (llama/10506) 2024-12-08 20:14:35 +02:00
2f16e51553 vulkan: optimize Q2_K and Q3_K mul_mat_vec (llama/10459) 2024-12-08 20:14:35 +02:00
0f0994902f mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (llama/10516)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-12-08 20:14:35 +02:00
5e1fcc1780 vulkan: fix group_norm (llama/10496)
Fix bad calculation of the end of the range. Add a backend test that
covers the bad case (taken from stable diffusion).

Fixes https://github.com/leejet/stable-diffusion.cpp/issues/439.
2024-12-08 20:14:35 +02:00
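The log does not show the exact shader change, but a common way to compute a group's channel range is to round the group size up and clamp the last group's end to the channel count. The C++ sketch below assumes a `[C][HW]` layout and uses that clamped range; `group_norm` and its parameters are hypothetical, and the affine scale and bias are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Group normalization over a [C][HW] tensor. Channels are split into groups
// of ceil(C / groups); each group's end is clamped to C so the last group
// never reads past the tensor when C is not divisible by groups.
std::vector<float> group_norm(const std::vector<float> & x, std::size_t C, std::size_t HW,
                              std::size_t groups, float eps) {
    assert(x.size() == C * HW && groups > 0);
    std::vector<float> y(x.size());
    const std::size_t per_group = (C + groups - 1) / groups;
    for (std::size_t g = 0; g < groups; ++g) {
        const std::size_t c0 = g * per_group;
        const std::size_t c1 = std::min(C, c0 + per_group); // clamped end
        if (c0 >= c1) {
            continue; // empty trailing group
        }
        const std::size_t begin = c0 * HW;
        const std::size_t end = c1 * HW;
        double mean = 0.0;
        for (std::size_t i = begin; i < end; ++i) mean += x[i];
        mean /= double(end - begin);
        double var = 0.0;
        for (std::size_t i = begin; i < end; ++i) var += (x[i] - mean) * (x[i] - mean);
        var /= double(end - begin);
        const double inv_std = 1.0 / std::sqrt(var + eps);
        for (std::size_t i = begin; i < end; ++i) y[i] = float((x[i] - mean) * inv_std);
    }
    return y;
}
```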
48f421de23 cmake : enable warnings in llama (llama/10474)
* cmake : enable warnings in llama

ggml-ci

* cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS

* cmake : get_flags -> ggml_get_flags

* speculative-simple : fix warnings

* cmake : reuse ggml_get_flags

ggml-ci

* speculative-simple : fix compile warning

ggml-ci
2024-12-08 20:14:35 +02:00
e7afb2b991 ggml-cpu: cmake add arm64 cpu feature check for macos (llama/10487)
* ggml-cpu: cmake add arm64 cpu feature check for macos

* use vmmlaq_s32 for compile option i8mm check
2024-12-08 20:14:35 +02:00
9a5ef7b169 CANN: Improve the Inferencing Performance for Ascend NPU Device (llama/10454)
* improve inferencing performance for ascend npu.

Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com>

* some modification after review

* some modifications after review

* restore some modifications

* restore some modifications

---------

Co-authored-by: shanshan shen <shanshanshen333@gmail.com>
Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com>
2024-12-08 20:14:35 +02:00
453cc0fcf1 CANN: RoPE and CONCAT operator optimization (llama/10488)
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-12-08 20:14:35 +02:00
78dfec6bc5 vulkan: Fix a vulkan-shaders-gen argument parsing error (llama/10484)
vulkan-shaders-gen was not parsing the --no-clean argument correctly.
The previous code only handled arguments that take a value, and --no-clean
does not take one, so it was not parsed correctly. This commit correctly
parses arguments that don't have values.
2024-12-08 20:14:35 +02:00
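The fix amounts to distinguishing value-less flags from options that consume the next argument. Here is a hypothetical C++ sketch of such a parser. Only `--no-clean` comes from the commit; `--input-dir` and `--output-dir` in the usage note are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Options listed in `flags` take no value; every other "--name" option
// consumes the following argument as its value.
std::map<std::string, std::string> parse_args(const std::vector<std::string> & argv,
                                              const std::set<std::string> & flags) {
    std::map<std::string, std::string> out;
    for (std::size_t i = 0; i < argv.size(); ++i) {
        const std::string & arg = argv[i];
        if (arg.rfind("--", 0) != 0) {
            continue; // positional arguments are ignored in this sketch
        }
        if (flags.count(arg) != 0) {
            out[arg] = "";        // flag present, no value
        } else if (i + 1 < argv.size()) {
            out[arg] = argv[++i]; // option with a value
        }
    }
    return out;
}
```

With `flags = {"--no-clean"}`, the arguments `--input-dir in --no-clean --output-dir out` parse into three entries; a parser that assumed every option takes a value would instead treat `--output-dir` as the value of `--no-clean`.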
f6d518fc4c metal : enable mat-vec kernels for bs <= 4 (llama/10491) 2024-12-08 20:14:35 +02:00
ac33379a35 llama : accept a list of devices to use to offload a model (llama/10497)
* llama : accept a list of devices to use to offload a model

* accept `--dev none` to completely disable offloading

* fix dev list with dl backends

* rename env parameter to LLAMA_ARG_DEVICE for consistency
2024-12-08 20:14:35 +02:00
77e3e4a090 ggml : add support for dynamic loading of backends (llama/10469)
* ggml : add support for dynamic loading of backends

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-08 20:14:35 +02:00
b840bb09be metal : minor code formatting 2024-12-08 20:14:35 +02:00
8b1c1c30a7 ggml : do not use ARM features not included in the build (llama/10457) 2024-12-08 20:14:35 +02:00
4b81335f75 CANN: Support Ascend310P to accelerate F32 and F16 Model (llama/10216)
* CANN Support Ascend310P to accelerate F32 and F16 Model

* Add compile option soc type macro ASCEND_310P to ggml-cann lib

* Remove unused code

* Remove the ascend soc_type hard code compile option in CMakeLists.txt
2024-12-08 20:14:35 +02:00
2a4b5c9d7e cuda : optimize argmax (llama/10441)
* cuda : optimize argmax

* remove unused parameter

ggml-ci

* fixup : use full warps

ggml-ci

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* fix ub

* ggml : check ne00 <= INT32_MAX in argmax and argsort

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-12-08 20:14:35 +02:00
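A CPU emulation of a warp-wide argmax, assuming the CUDA kernel follows the common pattern of a per-lane strided scan followed by an XOR-shuffle (`__shfl_xor_sync`-style) reduction across 32 lanes. `argmax_warp` is a hypothetical name. Indices are `int32_t`, which is why a check such as `ne00 <= INT32_MAX` is needed. Ties resolve to the lowest index in this sketch, which may differ from ggml's own tie-breaking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

int32_t argmax_warp(const std::vector<float> & row) {
    constexpr int WARP = 32;
    assert(!row.empty());
    // Indices are stored as int32_t, so the row length must fit.
    assert(row.size() <= static_cast<std::size_t>(std::numeric_limits<int32_t>::max()));
    const int64_t n = static_cast<int64_t>(row.size());

    float   val[WARP];
    int32_t idx[WARP];
    // Per-lane strided scan; strict '>' keeps the first maximum in each lane.
    for (int lane = 0; lane < WARP; ++lane) {
        val[lane] = 0.0f;
        idx[lane] = -1; // -1 marks a lane that saw no elements
        for (int64_t i = lane; i < n; i += WARP) {
            if (idx[lane] < 0 || row[i] > val[lane]) {
                val[lane] = row[i];
                idx[lane] = static_cast<int32_t>(i);
            }
        }
    }
    // Butterfly reduction: after log2(32) = 5 rounds every lane holds the result.
    for (int offset = WARP / 2; offset > 0; offset /= 2) {
        float   nval[WARP];
        int32_t nidx[WARP];
        for (int lane = 0; lane < WARP; ++lane) {
            const int other = lane ^ offset;
            const bool take = idx[other] >= 0 &&
                (idx[lane] < 0 || val[other] > val[lane] ||
                 (val[other] == val[lane] && idx[other] < idx[lane]));
            nval[lane] = take ? val[other] : val[lane];
            nidx[lane] = take ? idx[other] : idx[lane];
        }
        for (int lane = 0; lane < WARP; ++lane) {
            val[lane] = nval[lane];
            idx[lane] = nidx[lane];
        }
    }
    return idx[0];
}
```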
04662748aa vulkan: predicate max operation in soft_max shaders/soft_max (llama/10437)
Fixes #10434
2024-12-08 20:14:35 +02:00
a117279e13 vulkan: copy iq4_nl LUT into shared memory (llama/10409) 2024-12-08 20:14:35 +02:00
bbb292ed38 vulkan: further optimize mul_mat_vec using larger loads (llama/10387)
* vulkan: Use pipeline_robustness to disable robustness in mul_mat_vec.

Add some early returns for nonexistent rows in mul_mat_vec shaders. These
can only be hit when dispatching a 2D grid of workgroups. Fix the logic
for the 2D grid of workgroups to round up.

Enable the pipeline robustness extension if it's available, and use it to
disable robustness for these pipelines. The instructions to do the bounds
checking contend for the same ALU resources as the bit twiddling dequant
instructions.

* vulkan: Add GLSL structure aliases for quant types to allow larger loads

In Vulkan it's not possible to cast pointer types, so instead you have to
declare an aliased binding for the memory with a different type. This
commit adds aliases for the quant formats using 16b ints, and in a few
places where the struct size is a multiple of 4 also using 32b ints.
Currently only q4_k's aliases are used, but others will be used in
subsequent commits.

* vulkan: use larger loads in q5_k and q6_k shaders.

Similar to the optimization I did in q4_k recently, this vectorizes some loads
and reduces the number of bit twiddling instructions.

* vulkan: use larger K step per iteration in mul_mat_vec.

Add vec4 dequantization functions, and use them to do K=8 per iteration in
mul_mat_vec. This uses 16b loads for the quant values and 128b loads for B
which helps reduce the load on the memory system.

The K_PER_ITER==2 logic is still there, just for F16/F32, and really only
because they support unaligned sizes.

Tweak the num_iters/unrolling logic to be simpler and catch a couple missed
unrolling opportunities.
2024-12-08 20:14:35 +02:00
95e8901e71 add cmake rvv support (llama/10411) 2024-12-08 20:14:35 +02:00
4af9626702 CUDA: remove unnecessary warp reduce in FA (ggml/1032)
* kqmax_new_j is the same in every thread of the warp after the operation at line 199, so this reduction can be omitted

* same problem in vec32

---------

Co-authored-by: ZhaoXiaoYu <zhao.xiaoyu@zte.com.cn>
2024-12-08 20:14:35 +02:00
PAB
c52d1035de feat: add GGML_UNARY_OP_ARGMAX Metal kernel (ggml/1019)
* implemented argmax kernel

* tpig -> tgpig

* change to strides

* contiguous assertions

* kernel working and tested

* argmax simd parallel implementation

* added 2 new tests for argmax in test-backend-ops

* cosmit

* added 3 tests cases for perf eval

* add test_argmax in make_test_cases_perf

* Update test-backend-ops.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-12-08 20:14:35 +02:00
PAB
5773a14980 metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026)
* wip

* wip implementation f32

* kernel conv transpose 1d f32 working

* initial commit
2024-12-08 20:14:35 +02:00
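A transposed 1-D convolution scatters a scaled copy of the kernel for every input element. This single-channel C++ reference, with stride but no padding or dilation, is a hypothetical illustration (`conv_transpose_1d` here is not the ggml function). It omits the sum over input channels that a real kernel performs. The output length is `(L - 1) * stride + K`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-channel transposed convolution: input element i adds x[i] * w
// into the output starting at position i * stride.
std::vector<float> conv_transpose_1d(const std::vector<float> & x,
                                     const std::vector<float> & w, std::size_t stride) {
    assert(!x.empty() && !w.empty() && stride > 0);
    const std::size_t L = x.size();
    const std::size_t K = w.size();
    std::vector<float> y((L - 1) * stride + K, 0.0f);
    for (std::size_t i = 0; i < L; ++i) {
        for (std::size_t k = 0; k < K; ++k) {
            y[i * stride + k] += x[i] * w[k];
        }
    }
    return y;
}
```

With stride 1 this reduces to a full 1-D convolution of `x` and `w`; larger strides spread the kernel copies apart, which is how the operation upsamples.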
6939147c47 Do not include arm_neon.h when compiling CUDA code (ggml/1028) 2024-12-08 20:14:35 +02:00
98f9916c9f ggml-opt: fix data corruption (ggml/1022) 2024-12-08 20:14:35 +02:00
021eef1000 ruby : Add low-level methods to transcribe (#2585)
* Add tests for Whisper::Context#full

* Add Whisper::Context#full

* Add tests for Whisper::Error

* Add document of Whisper::Context#full [skip ci]

* Add additional signature for Whisper::Context#full

* Add description to Whisper::Context#full

* Add test for Whisper::Context#full_parallel

* Add Whisper::Context#full_parallel

* Hide Whisper's instance methods from Ruby code

* Add class to test MemoryView

* Build test class before running test

* Add test for MemoryView

* Make Whisper::Context#full and #full_parallel accept MemoryView

* Use Ruby 3.1 on CI

* Add comment on samples data type

* Update README

* Update README

* Remove unused code
2024-11-28 10:33:07 +02:00
a9d06ce151 models : add q8_0 models to download-ggml-model.sh (#2589) 2024-11-28 10:31:54 +02:00
8c6a9b8bb6 ruby : Follow source tree change (#2580)
* Follow whisper.cpp source tree change

* Update whispercpp.gemspec

* Follow whisper.cpp log level change

* Fix paths in GitHub workflow for Ruby bindings

* Use GitHub workflow setting for dependency definition

* Use ternary operator
2024-11-21 17:04:29 +02:00
37c88027e1 whisper : use backend registry (#0) 2024-11-20 21:00:08 +02:00
9db070a3c5 ggml/sched : do not skip views in pre-assignments 2024-11-20 21:00:08 +02:00
7fd8d9c220 whisper : adapt to new ggml (wip) 2024-11-20 21:00:08 +02:00
06e059b8f8 talk-llama : sync llama.cpp 2024-11-20 21:00:08 +02:00
c9f49d5f9d sync : ggml 2024-11-20 21:00:08 +02:00
f4c1d7df39 ggml : sync resolve (skip) (#0) 2024-11-20 21:00:08 +02:00
339b8e559c Add required ggml-base and backend libs to cmake pkg (llama/10407) 2024-11-20 21:00:08 +02:00
5f6d6919b4 cuda : fix CUDA_FLAGS not being applied (llama/10403) 2024-11-20 21:00:08 +02:00
8ee767732f sycl : Add option to set the SYCL architecture for all targets (llama/10266)
* Add option to set the SYCL architecture for all targets
* Convert GGML_SYCL_HIP_TARGET to the more generic GGML_SYCL_ARCH option
* Document that setting GGML_SYCL_ARCH can improve the performance
2024-11-20 21:00:08 +02:00
45f1f9144f vulkan: Optimize soft_max (llama/10301)
* vulkan: Optimize soft_max

Large soft_max could already saturate memory, but small/medium sizes were
pretty slow. The bulk of the gains for them comes from using a smaller
workgroup size, and making the workgroup size match the subgroup size also
makes the barriers much cheaper.

Cache some values in locals to avoid refetching/recomputing. And stamp
out a few "template instantiations" so smaller cases will fully unroll.

Add a missing early return for OOB rows. This happens when there are more
than 512 rows and the dispatch is 512 x H.

* vulkan: Further soft_max optimizations

Restore the workgroup size of 512 case, use it for >1024.

Use unrollable loops for more iteration counts.
2024-11-20 21:00:08 +02:00
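The early-return fix and the usual numerical-stability trick can be shown together in a small C++ reference. `soft_max_row` is hypothetical: `row` stands in for a workgroup index in a dispatch that may be rounded up past the last row, and subtracting the row maximum before `exp` keeps large logits such as 1000 from overflowing to infinity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-wise softmax over a rows x cols matrix, one call per "workgroup".
void soft_max_row(std::vector<float> & m, std::size_t rows, std::size_t cols, std::size_t row) {
    assert(cols > 0 && m.size() == rows * cols);
    if (row >= rows) {
        return; // out-of-bounds workgroup: nothing to do
    }
    float * x = m.data() + row * cols;
    const float mx = *std::max_element(x, x + cols); // subtract max for stability
    float sum = 0.0f;
    for (std::size_t i = 0; i < cols; ++i) {
        x[i] = std::exp(x[i] - mx);
        sum += x[i];
    }
    for (std::size_t i = 0; i < cols; ++i) {
        x[i] /= sum;
    }
}
```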
53589c8f12 sycl: Revert MUL_MAT_OP support changes (llama/10385) 2024-11-20 21:00:08 +02:00
7ac2f17fac cuda : only use native when supported by cmake (llama/10389) 2024-11-20 21:00:08 +02:00
48862c7b27 vulkan: remove use of null initializer (llama/10372)
This does not seem to work for vulkan-over-metal when the array is sized
by a spec constant. Possibly a spirv-cross limitation?
2024-11-20 21:00:08 +02:00
44f7d9f4e3 metal : fix offset integer overflows in im2col (ggml/1015)
-- While running StableDiffusion.cpp locally with Metal, some offsets overflow and result in incorrect calculations
2024-11-20 21:00:08 +02:00
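The commit does not say which expression overflowed, so this C++ sketch only illustrates the general class of bug: an element offset such as `row * row_stride + col` can exceed the range of a 32-bit integer for large tensors. Both functions and the sizes in the usage note are hypothetical. The 32-bit variant uses unsigned arithmetic so the wraparound is well defined rather than undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

// Promote to 64 bits before multiplying: no overflow for these input ranges.
int64_t offset_64(int32_t row, int32_t row_stride, int32_t col) {
    return static_cast<int64_t>(row) * row_stride + col;
}

// 32-bit unsigned arithmetic: silently wraps modulo 2^32.
uint32_t offset_32_wrapped(uint32_t row, uint32_t row_stride, uint32_t col) {
    return row * row_stride + col;
}
```

For `row = row_stride = 70000` and `col = 5`, the 64-bit offset is 4900000005, while the 32-bit version wraps to 605032709.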
fd12302587 Vulkan: Fix device info output format specifiers (llama/10366)
* Vulkan: Fix device info output format specifiers

* Vulkan: Use zu printf specifier for size_t instead of ld
2024-11-20 21:00:08 +02:00
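For reference, `%zu` is the standard printf conversion for `size_t`, whereas `%ld` expects a `long`, which is only 32 bits wide on 64-bit Windows. A small hypothetical C++ helper using it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a size_t with %zu, the printf length modifier defined for size_t.
std::string format_size(std::size_t bytes) {
    char buf[32];
    const int n = std::snprintf(buf, sizeof(buf), "%zu", bytes);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
    return std::string(buf, static_cast<std::size_t>(n));
}
```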
PAB
f80bef4630 metal : add GGML_UNARY_OP_ELU kernel (ggml/1018) 2024-11-20 21:00:08 +02:00
161b443514 CUDA: fix MMV kernel being used for FP16 src1 (llama/10357) 2024-11-20 21:00:08 +02:00
ef7fbe1c66 CMake: fix typo in comment [no ci] (llama/10360) 2024-11-20 21:00:08 +02:00
0879d3599e llama : only use default buffer types for the KV cache (llama/10358) 2024-11-20 21:00:08 +02:00
2a444dc5bd metal : refactor kernel args into structs (llama/10238)
* metal : add kernel arg structs (wip)

* metal : fattn args

ggml-ci

* metal : cont + avoid potential int overflow [no ci]

* metal : mul mat struct (wip)

* cont : mul mat vec

* cont : pass by reference

* cont : args is first argument

* cont : use char ptr

* cont : shmem style

* cont : thread counters style

* cont : mul mm id

ggml-ci

* cont : int safety + register optimizations

ggml-ci

* metal : GGML_OP_CONCAT

ggml-ci

* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV

* metal : GGML_OP_REPEAT

* metal : GGML_OP_CPY

* metal : GGML_OP_RMS_NORM

* metal : GGML_OP_NORM

* metal : add TODOs for rest of ops

* ggml : add ggml-metal-impl.h

ggml-ci
2024-11-20 21:00:08 +02:00
45cf1634dc ggml : fix undefined reference to 'getcpu' (llama/10354)
https://github.com/ggerganov/llama.cpp/issues/10352
2024-11-20 21:00:08 +02:00
dcb2922d1d CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318) 2024-11-20 21:00:08 +02:00
3c5c751174 CMake: default to -arch=native for CUDA build (llama/10320) 2024-11-20 21:00:08 +02:00
24ad19d0e9 ggml : fix possible buffer use after free in sched reserve (llama/9930) 2024-11-20 21:00:08 +02:00
bd574b05af ggml : inttypes.h -> cinttypes (llama/0)
ggml-ci
2024-11-20 21:00:08 +02:00
7e0eafcb1e ggml : adapt AMX to tensor->grad removal (llama/0)
ggml-ci
2024-11-20 21:00:08 +02:00
75670ae673 ggml : fix compile warnings (llama/0)
ggml-ci
2024-11-20 21:00:08 +02:00
d4fcdf602b llamafile : fix include path (llama/0)
ggml-ci
2024-11-20 21:00:08 +02:00
1bebb1a116 vulkan: Optimize some mat-vec mul quant shaders (llama/10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
2024-11-20 21:00:08 +02:00
ee437cde59 ggml : optimize Q4_0 into Q4_0_X_Y repack (llama/10324) 2024-11-20 21:00:08 +02:00
c1506d38cf Make updates to fix issues with clang-cl builds while using AVX512 flags (llama/10314) 2024-11-20 21:00:08 +02:00
c9541741e6 ggml: new optimization interface (ggml/988)
* ggml: new optimization interface

remove test2.c, test3.c

store adamw params in tensor

move grads from tensor to graph

* avoid segfault upon API misuse

* add ggml-opt.h to public headers

* remove dependence of ggml-opt.cpp on ggml-cpu.h
2024-11-20 21:00:08 +02:00
6a55015dc4 ggml : remove duplicated sources from the last sync (ggml/1017)
* ggml : remove duplicated sources from the last sync

ggml-ci

* cont : remove FindSIMD.cmake [no ci]
2024-11-20 21:00:08 +02:00
7e86030d4d ggml : fix some build issues 2024-11-20 21:00:08 +02:00
401fbea326 sync : leftovers (ggml/0)
ggml-ci
2024-11-20 21:00:08 +02:00
44d1cbdfe9 cmake : restore CMakeLists.txt (llama/10256)
ggml-ci
2024-11-20 21:00:08 +02:00
Eve
3216efef2e AVX BF16 and single scale quant optimizations (llama/10212)
* use 128 bit loads (i've tried 256->128 to death and it's slower)

* double accumulator

* avx bf16 vec dot

* +3% q4_0 inference

* +7% tg +5% pp compared to master

* slower f16c version, kept for reference

* 256b version, also slow. i tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16 bit add for q4_0 only

* merge
2024-11-20 21:00:08 +02:00
2c0484ebf7 sycl: Use syclcompat::dp4a (llama/10267)
* sycl: Use syclcompat::dp4a

* Using the syclcompat version allows the compiler to optimize the
  operation with a native function

* Update news section

* Update CI Windows oneAPI version to 2025.0

* Reword doc

* Call syclcompat::dp4a inside dpct::dp4a

This reverts commit 90cb61d692d61360b46954a1c7f780bd2e569b73.
2024-11-20 21:00:08 +02:00
3298916e5e backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921)
* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-11-20 21:00:08 +02:00
746bf2596f ggml : build backends as libraries (llama/10256)
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-20 21:00:08 +02:00
5f7e094ccb scripts : update sync 2024-11-20 21:00:08 +02:00
6266a9f9e5 release : v1.7.2 2024-11-19 18:54:22 +02:00
d24f981fb2 sycl: fix example build (#2570) 2024-11-18 14:57:23 +02:00
01d3bd7d5c ci : use local ggml in Android build (#2567) 2024-11-16 20:45:41 +02:00
bb12cd9b77 ggml : tmp workaround for whisper.cpp (skip) (#2565) 2024-11-16 20:21:24 +02:00
f02b40bcb4 update : readme 2024-11-15 16:00:10 +02:00
83ac2842bd scripts : fix sync path 2024-11-15 15:24:09 +02:00
c4e95fb74d whisper.swiftui : switch Mac dest to Mac (Designed for iPad) (#2562) 2024-11-15 15:21:53 +02:00
e23721f3fb cmake : fix ppc64 check (#0) 2024-11-15 15:21:04 +02:00
c0a9f8ef85 whisper : include ggml-cpu.h (#0) 2024-11-15 15:21:04 +02:00
6477b84eb6 build : fixes 2024-11-15 15:21:04 +02:00
24d706774d talk-llama : sync llama.cpp 2024-11-15 15:21:04 +02:00
5089ab2d6a whisper : fix build (#0) 2024-11-15 15:21:04 +02:00
bdbb906817 sync : ggml 2024-11-15 15:21:04 +02:00
fa2ebd336e sycl : Fixes to broken builds and test-backend-ops (llama/10257)
* Fixes broken build for the SYCL CUDA backend caused by a non-explicit gemm call in outprod (merged in with RWKV6 in Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration #10133)

* Marks permuted MUL_MAT as unsupported to be able to run test-backend-ops

* Fixes asserts in norm to fix debug builds.
2024-11-15 15:21:04 +02:00
21b01a21b6 vulkan: Optimize contiguous copies (llama/10254)
* tests: Fix memory bandwidth calculation for perf tests

Add a flops calculation for flash attention.

Add one GGML_OP_CPY perf test.

* vulkan: Optimize contiguous copies

Add a variant of the copy shader for when the tensors are contiguous. Avoid
the complex addressing calculations, and do four elements per invocation
to hide some other overhead.

Apply similar changes to the scale shader, since scale is always contiguous.

Add a "progress bar" for shader compiles.
2024-11-15 15:21:04 +02:00
b54ce5edc5 vulkan: Throttle the number of shader compiles during the build step. (llama/10222)
Fixes #9582

Spawning too many concurrent copies of glslc leads to "Failed to create pipes"
errors on Linux. This change applies the same throttling we use for
multithreaded pipeline creation.
2024-11-15 15:21:04 +02:00
26a31b78e9 metal : more precise Q*K in FA vec kernel (llama/10247) 2024-11-15 15:21:04 +02:00
14d13c5f9f vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (llama/10226) 2024-11-15 15:21:04 +02:00
5e110c2eb5 metal : reorder write loop in mul mat kernel + style (llama/10231)
* metal : reorder write loop

* metal : int -> short, style

ggml-ci
2024-11-15 15:21:04 +02:00
4a9926d521 metal : fix build and some more comments (llama/10229) 2024-11-15 15:21:04 +02:00
ae3c5642d0 metal : fix F32 accumulation in FA vec kernel (llama/10232) 2024-11-15 15:21:04 +02:00
e287a3b627 metal : hide debug messages from normal log 2024-11-15 15:21:04 +02:00
SXX
b890243690 ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) 2024-11-15 15:21:04 +02:00
b7b38f7d68 ggml : optimize llamafile cpu matrix multiplication for ppc64le (llama/10156)
This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le using MMA
builtins for FP32 datatype.

This change results in a consistent 90%
improvement in input processing time, and 20%
to 80% improvement in output processing time,
across various batch sizes.

The patch was tested with Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-11-15 15:21:04 +02:00
9f67aab211 metal : opt-in compile flag for BF16 (llama/10218)
* metal : opt-in compile flag for BF16

ggml-ci

* ci : use BF16

ggml-ci

* swift : switch back to v12

* metal : has_float -> use_float

ggml-ci

* metal : fix BF16 check in MSL

ggml-ci
2024-11-15 15:21:04 +02:00
8f0f785d88 metal : improve clarity (minor) (llama/10171) 2024-11-15 15:21:04 +02:00
d0b8335789 metal : optimize FA kernels (llama/10171)
* ggml : add ggml_flash_attn_ext_get_prec

* metal : use F16 precision in FA kernels

ggml-ci

* metal : minor clean-up

* metal : compile-guard bf16 FA kernels

ggml-ci

* build : remove obsolete compile flag [no ci]

* metal : prevent int overflows [no ci]

* cuda : disable BF16 FA

ggml-ci

* metal : fix BF16 requirement for FA kernels

ggml-ci

* make : clean-up [no ci]
2024-11-15 15:21:04 +02:00
1550be79f1 ggml : add ggml-cpu.h to the public headers (llama/10204) 2024-11-15 15:21:04 +02:00
807f848c2f fix q4_0_8_8 format for corrupted tokens issue (llama/10198)
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-62-167.us-west-2.compute.internal>
2024-11-15 15:21:04 +02:00
42398f13b0 Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (llama/10133)
* rwkv6: rename to wkv6

* rwkv6: support avx2 avx512 armv8 armv9

* rwkv6: update cuda file name

* rwkv6: rename params

* wkv on sycl

* sycl: add some ops

* sycl: Enhance OP support judgment

* wkv6: drop armv9 and transfer to GGML style

ggml-ci

* sync : ggml

* update the function to use appropriate types

* fix define error

* Update ggml/src/ggml-cpu.c

* add appropriate asserts

* move element-wise functions outside

* put the declaration outside the loop

* rewrite to be more inline with the common pattern for distributing threads

* use recommended way GGML_TENSOR_LOCALS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Plamen Minev <pacominev@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-11-15 15:21:04 +02:00
31c3482a4e metal : add BF16 support (llama/8439)
* ggml : add initial BF16 support

ggml-ci

* metal : add mul_mat_id BF16 support

ggml-ci

* metal : check for bfloat support on the Metal device

ggml-ci

* metal : better var names [no ci]

* metal : do not build bfloat kernels when not supported

ggml-ci

* metal : try to fix BF16 support check

ggml-ci

* metal : this should correctly check bfloat support
2024-11-15 15:21:04 +02:00
50257af686 metal : fix from ptr buffer name (llama/10189) 2024-11-15 15:21:04 +02:00
d111a0987e ggml : adjust is_first_call init value (llama/10193)
ggml-ci
2024-11-15 15:21:04 +02:00
915bcd2c63 metal : add quantized FA support (llama/10149)
* metal : add quantized FA (vec) support

ggml-ci

* metal : add quantized FA (non-vec) support

* metal : fix support check

ggml-ci

* metal : clean-up

* metal : clean-up (cont)

* metal : fix shared memory calc + reduce smem + comments

* metal : float-correctness

* metal : minor [no ci]
2024-11-15 15:21:04 +02:00
f69c8b6f1b ggml : fix arch check in bf16_to_fp32 (llama/10164) 2024-11-15 15:21:04 +02:00
Eve
8c9044bef0 Q6_K AVX improvements (llama/10118)
* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesn't work now. 13 is still x86
2024-11-15 15:21:04 +02:00
5f8e928194 ggml : fix gelu tables initialization (llama/10172) 2024-11-15 15:21:04 +02:00
25da30bd60 ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (llama/10167) 2024-11-15 15:21:04 +02:00
542734100e fix build break on arm64 linux (llama/10166)
This fixes the build break from the recent changes
to move the CPU backend to separate files
https://github.com/ggerganov/llama.cpp/pull/10144
2024-11-15 15:21:04 +02:00
b06b4c0c08 cuda : clear error after changing peer access (llama/10153) 2024-11-15 15:21:04 +02:00
939d36fb4c metal : simplify f16 and f32 dequant kernels (llama/0) 2024-11-15 15:21:04 +02:00
1471e41180 metal : move dequantize templates to beginning of MSL source (llama/0) 2024-11-15 15:21:04 +02:00
35949192e9 CANN: adjust backend registry refactor. (llama/10158)
remove buffer->iface.get_name, which was used in CANN, since it was removed in the backend registry refactor PR.
2024-11-15 15:21:04 +02:00
9c817edb48 ggml : move CPU backend to a separate file (llama/10144) 2024-11-15 15:21:04 +02:00
24a0feb5d9 metal : minor fixup in FA kernel (llama/10143)
* metal : minor fixup in FA kernel

ggml-ci

* metal : use the unrolled loop variable

* metal : remove unused var
2024-11-15 15:21:04 +02:00
2ab8cce7e3 llama : add simple-chat example (llama/10124)
* llama : add simple-chat example

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-11-15 15:21:04 +02:00
b40c255e98 llama : use smart pointers for ggml resources (llama/10117) 2024-11-15 15:21:04 +02:00
ec3e16445e vulkan : improve ggml_vk_create_buffer error handling (llama/9898) 2024-11-15 15:21:04 +02:00
0665168ef3 ggml : remove ggml_scratch (llama/10121)
ggml-ci
2024-11-15 15:21:04 +02:00
5f6b992eea build: fix build error in Windows env with OneAPI setup (llama/10107) 2024-11-15 15:21:04 +02:00
3e231ab9cc llama : fix buffer checks for mamba and rwkv (llama/10111)
* llama : fix buffer checks for mamba and rwkv

* llama : fix missing worst case flag during reserve

* cuda : fix supports_op for norm

* disable sched SET_CAUSE
2024-11-15 15:21:04 +02:00
371bfaca8c ggml : check tensor name lengths in gguf files (llama/10100) 2024-11-15 15:21:04 +02:00
91e30a3a23 kompute: add mul_mat_q4_k shader (llama/10097)
This is a more or less direct translation from the Metal implementation
to GLSL.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-11-15 15:21:04 +02:00
1e122d66f9 kompute: add backend registry / device interfaces (llama/10045)
Get in line with the other backends by supporting the newer
backend/device registry interfaces.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-11-15 15:21:04 +02:00
63a4e09a0f ggml : fix memory leaks when loading invalid gguf files (llama/10094)
* ggml : fix gguf string leak when reading kv pairs fails

* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type

* ggml : avoid crashing on failed memory allocations when loading a gguf file
2024-11-15 15:21:04 +02:00
75dd198870 ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (llama/10029)
* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM
2024-11-15 15:21:04 +02:00
1d48457aa6 llama : refactor model loader with backend registry (llama/10026) 2024-11-15 15:21:04 +02:00
307712a903 ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (llama/9763)
* ggml: Add POOL2D OP for GPU ACC to the Vulkan.

- The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
- A GGML_OP_POOL_2D shader has been added. (Pooling)
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* [fix] Correct the incorrect order of the parameters.

fix casting to int.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

---------

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
2024-11-15 15:21:04 +02:00
fbc9a05ddf musa: workaround for Guilty Lockup in cleaning src0 (llama/10042)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-11-15 15:21:04 +02:00
28496ac55e cmake : make it possible linking ggml as external lib (ggml/1003) 2024-11-15 15:21:04 +02:00
b1c06c09b0 metal : fix minor string leaks (ggml/1004) 2024-11-15 15:21:04 +02:00
498ac0dc27 scripts : update sync 2024-11-15 15:21:04 +02:00
03af461de8 ci : fix building workflow for linux/arm64 container (#2555) 2024-11-15 11:07:17 +02:00
f19463ece2 ruby : extend API (#2551)
* Handle objs in Ruby code

* Add task to make Makefile

* Share common constants in test suites

* Add model-related APIs

* Add Whisper::Model class

* Add tests for Whisper::Model

* Add missing LDFLAG -lstdc++

* Add tests for Whisper.log_set

* Add Whisper.set_log

* Define log level

* Add document on logging

* Add license section to README

* Add document on Whisper::Model

* Fix examples in README

* Add test for Model with GC

* Make dependency on Makefile more accurate

* Fix bug about Whisper::Model and GC
2024-11-13 21:52:56 +02:00
5f8a086e22 whisper.swiftui : add model download list & bench methods (#2546)
* swift : fix resources & exclude build

* whisper : impl whisper_timings struct & api

* whisper.swiftui : model list & bench methods

* whisper : return ptr for whisper_get_timings

* revert unnecessary change

* whisper : avoid designated initializer

* whisper.swiftui: code style changes

* whisper.swiftui : get device name / os from UIDevice

* whisper.swiftui : fix UIDevice usage

* whisper.swiftui : add memcpy and ggml_mul_mat (commented)
2024-11-13 21:51:34 +02:00
a28d82e373 ruby : fix the instructions (#2548)
#prompt doesn't exist but #initial_prompt does
2024-11-13 21:47:42 +02:00
5ccca19f0c ggml : vulkan logs (#2547) 2024-11-13 21:47:15 +02:00
300c07b94d examples : fix ffmpeg v5 build (#2543)
remove call to 'av_register_all()' which does not exist in ffmpeg v5
anymore.
2024-11-13 21:41:52 +02:00
31aea563a8 whisper : fix extra memory usage (#2534)
* passing samples_padded by ref to the threads.

* passing samples_padded by ref to the threads.

---------

Co-authored-by: Vinith Misra <physicsdemon@gmail.com>
2024-11-06 23:02:11 +02:00
0377596b77 whisper : backend registry init before model load 2024-11-01 10:19:05 +02:00
c65d0fd3c8 talk-llama : sync llama.cpp 2024-11-01 10:19:05 +02:00
d9efb664ac sync : ggml 2024-11-01 10:19:05 +02:00
b5b4b0f5de ggml : add AMX backend (llama/8998) 2024-11-01 10:19:05 +02:00
ab36d02560 metal : support permuted matrix multiplications (llama/10033)
* metal : support permuted matrix multiplications

ggml-ci

* cont : use nb01 directly for row steps

ggml-ci

* cont : add comments [no ci]

* metal : minor refactor

* metal : minor
2024-11-01 10:19:05 +02:00
6e67749c00 CUDA: fix insufficient buffer clearing for MMQ (llama/10032) 2024-11-01 10:19:05 +02:00
ab0385f43b CUDA: fix MMQ for non-contiguous src0, add tests (llama/10021)
* CUDA: fix MMQ for non-contiguous src0, add tests

* revise test code
2024-11-01 10:19:05 +02:00
10eb603a3c increase cuda_cpy block size (ggml/996)
Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-11-01 10:19:05 +02:00
a3231b2f2e metal : add POOL2D and fix IM2COL (llama/9943)
* add pool_2d

Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>

* fix im2col and add unittest for N>=1024

Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>

* add tests for N % 1024 != 0

Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>

* remove trailing whitespaces

Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>

* apply suggestions

Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>

* apply more optimization

- original IM2COL kernel + _ext with MIN()

Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>

* apply review: change kernel name of pool_2d

Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>

* apply review

Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>

* fix more formatting and enhance readability

Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>

---------

Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
2024-11-01 10:19:05 +02:00
13db492f83 Adapt to dynamically loadable backends mechanism (llama/9970)
* [CANN] Adapt to dynamically loadable backends mechanism

* Fix the bug where inference output is garbled when running LM models of the Q4_0 type class in debug mode

* Handle the review comments of this pull request
2024-11-01 10:19:05 +02:00
741c138aa1 ggml : add asserts for type conversion in fattn kernels (llama/9971)
ggml-ci
2024-11-01 10:19:05 +02:00
25f9fee6fb rpc : pack only RPC structs (llama/9959) 2024-11-01 10:19:05 +02:00
7c1570bee6 fix mul_mat_vec_q and *_vec_q error (llama/9939)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-11-01 10:19:05 +02:00
4078e4c388 rpc : backend refactoring (llama/9912)
* rpc : refactor backend

Use structs for RPC request/response messages

* rpc : refactor server
2024-11-01 10:19:05 +02:00
a4a22daa8f Add SYCL Backend registry, device and Event Interfaces (llama/9705)
* implemented missing SYCL event APIs

* sycl : Added device and backend reg interfaces

* Restructured ggml-sycl.cpp
2024-11-01 10:19:05 +02:00
e1936eb2a5 add amx kernel for gemm (llama/8998)
add intel amx isa detection

add vnni kernel for gemv cases

add vnni and amx kernel support for block_q8_0

code cleanup

fix packing B issue

enable openmp

fine tune amx kernel

switch to aten parallel pattern

add error message for nested parallelism

code cleanup

add f16 support in ggml-amx

add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS

update CMakeList

update README

fix some compilation warnings

fix compiler warning when amx is not enabled

minor change

ggml-ci

move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp

ggml-ci

update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16

ggml-ci

add amx as a ggml-backend

update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h

minor change

update CMakeLists.txt

minor change

apply weight prepacking in set_tensor method in ggml-backend

fix compile error

ggml-ci

minor change

ggml-ci

update CMakeLists.txt

ggml-ci

add march dependency

minor change

ggml-ci

change ggml_backend_buffer_is_host to return false for amx backend

ggml-ci

fix supports_op

use device reg for AMX backend

ggml-ci

minor change

ggml-ci

minor change

fix rebase

set .buffer_from_host_ptr to be false for AMX backend
2024-11-01 10:19:05 +02:00
28b044dad9 vulkan : add backend registry / device interfaces (llama/9721)
* vulkan : add backend registry / device interfaces

* llama : print devices used on model load
2024-11-01 10:19:05 +02:00
b8f11a0a17 fix: allocating CPU buffer with size 0 (llama/9917) 2024-11-01 10:19:05 +02:00
ff5a838099 fix: use vm_allocate to allocate CPU backend buffer on macOS (llama/9875)
* fix: use `vm_allocate` to allocate CPU backend buffer on macOS

* fix: switch to `posix_memalign` to keep existing `free()` usages working

* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS

* style: formatting

* fix: move const outside of `#ifndef`

* style: formatting

* fix: unused var

* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`

* fix: unused var

* fix: page align to `GGUF_DEFAULT_ALIGNMENT`

* fix: page align to `TENSOR_ALIGNMENT`

* fix: convert `TENSOR_ALIGNMENT` to a macro

* fix: increase page size to `32` on iOS

* fix: iOS page size

* fix: `hbw_posix_memalign` alignment
2024-11-01 10:19:05 +02:00
84713613be CUDA: fix 1D im2col, add tests (ggml/993) 2024-11-01 10:19:05 +02:00
ded89c9d08 Fix cann compilation error (llama/9891)
Fix CANN compilation error after merging the llama.cpp change that supports dynamically loadable backends.
2024-11-01 10:19:05 +02:00
042e95d92f Vectorize load instructions in dmmv f16 CUDA kernel (llama/9816)
* Vectorize load instructions in dmmv f16 CUDA kernel

Replaces scalar with vector load instructions, which substantially
improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall
speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on
H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup.

* addressed comment

* Update ggml/src/ggml-cuda/dmmv.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-11-01 10:19:05 +02:00
81110c0174 ggml : move more prints to the ggml log system (llama/9839)
* ggml : move more prints to the ggml log system

* show BLAS OpenMP warnings in all builds using debug print
2024-11-01 10:19:05 +02:00
c313723860 rpc : add backend registry / device interfaces (llama/9812)
* rpc : add backend registry / device interfaces

* llama : add llama_supports_rpc API

* ggml_backend_rpc_start_rpc_server -> ggml_backend_rpc_start_server
2024-11-01 10:19:05 +02:00
e69b2371e2 musa: add docker image support (llama/9685)
* mtgpu: add docker image support

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* mtgpu: enable docker workflow

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-11-01 10:19:05 +02:00
1531259b2c ggml : fix BLAS with unsupported types (llama/9775)
* ggml : do not use BLAS with types without to_float

* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies

* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits

it's not really internal if everybody uses it
2024-11-01 10:19:05 +02:00
44bc2767fd ggml : add backend registry / device interfaces to BLAS backend (llama/9752)
* ggml : add backend registry / device interfaces to BLAS backend

* fix mmap usage when using host buffers
2024-11-01 10:19:05 +02:00
bd7ace7adc Update building for Android (llama/9672)
* docs : clarify building Android on Termux

* docs : update building Android on Termux

* docs : add cross-compiling for Android

* cmake : link dl explicitly for Android
2024-11-01 10:19:05 +02:00
315364d7de ggml : add metal backend registry / device (llama/9713)
* ggml : add metal backend registry / device

ggml-ci

* metal : fix names [no ci]

* metal : global registry and device instances

ggml-ci

* cont : alternative initialization of global objects

ggml-ci

* llama : adapt to backend changes

ggml-ci

* fixes

* metal : fix indent

* metal : fix build when MTLGPUFamilyApple3 is not available

ggml-ci

* fix merge

* metal : avoid unnecessary singleton accesses

ggml-ci

* metal : minor fix [no ci]

* metal : g_state -> g_ggml_ctx_dev_main [no ci]

* metal : avoid reference of device context in the backend context

ggml-ci

* metal : minor [no ci]

* metal : fix maxTransferRate check

* metal : remove transfer rate stuff

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-11-01 10:19:05 +02:00
80753d4da8 metal : single allocation of encode_async block (llama/9747)
* Single allocation of encode_async block with non-ARC capture in ggml-metal.m

* Moving Block_release to the deallocation code

* Release encode block when re-setting encoding buffer count if needed

* Update ggml/src/ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-01 10:19:05 +02:00
8f9bdca4c4 ggml-alloc : remove buffer_id from leaf_alloc (ggml/987)
This commit removes the buffer_id field from the leaf_alloc struct.

The motivation for this is that this field is only written to and never
read/used as far as I can tell. Each tensor_alloc has a buffer_id field
and this is what caused me to look into this more closely, to
understand what the buffer_id in leaf_alloc was used for.
2024-11-01 10:19:05 +02:00
4e10afb5a9 scripts : sync amx 2024-10-31 22:13:24 +02:00
aa037a60f3 ggml : alloc ggml_contexts on the heap (#2525)
* whisper : reduce ggml_context usage

* ggml : allocate contexts on the heap (v2)

* ggml : aligned malloc -> malloc
2024-10-31 22:00:09 +02:00
19dca2bb14 ci : fix openblas build (#2511)
* ci : fix openblas build

* cont : would this work?

* ci : I'm sorry, windows

* cont : disabled wrong build

* ci : fix openblas build with pkgconfiglite (#2517)

- choco install pkgconfiglite (vcpkg-pkgconf doesn't contain pkg-config executable?)
- vcpkg install openblas (otherwise it is not detected now)

---------

Co-authored-by: Tamotsu Takahashi <ttakah+github@gmail.com>
2024-10-30 12:58:26 +02:00
55e422109b scripts : add turbo-q8_0 to the benchmark 2024-10-29 19:37:24 +02:00
3f020fac9d whisper : minor compile warning 2024-10-29 19:30:26 +02:00
1626b73b03 whisper : move new-segment callback after DTW step (#2515) 2024-10-29 08:47:21 +02:00
850f7b19d3 ruby : fix installation test (#2519) 2024-10-29 08:45:37 +02:00
d4bc413505 ruby : add more APIs (#2518)
* Add test for built package existence

* Add more tests for Whisper::Params

* Add more Whisper::Params attributes

* Add tests for callbacks

* Add progress and abort callback features

* [skip ci] Add prompt usage in README

* Change prompt text in example
2024-10-28 19:23:23 +02:00
fc49ee4479 ruby : support new-segment callback (#2506)
* Add Params#new_segment_callback= method

* Add tests for Params#new_segment_callback=

* Group tests for #transcribe

* Don't use static for thread-safety

* Set new_segment_callback only when necessary

* Remove redundant check

* [skip ci] Add Ruby version README

* Revert "Group tests for #transcribe"

This reverts commit 71b65b00cc.

* Revert "Add tests for Params#new_segment_callback="

This reverts commit 81e6df3bab.

* Add test for Context#full_n_segments

* Add Context#full_n_segments

* Add tests for lang API

* Add lang API

* Add tests for Context#full_lang_id API

* Add Context#full_lang_id

* Add abnormal test cases for lang

* Raise appropriate errors from lang APIs

* Add tests for Context#full_get_segment_t{0,1} API

* Add Context#full_get_segment_t{0,1}

* Add tests for Context#full_get_segment_speaker_turn_next API

* Add Context#full_get_segment_speaker_turn_next

* Add tests for Context#full_get_segment_text

* Add Context#full_get_segment_text

* Add tests for Params#new_segment_callback=

* Run new segment callback

* Split tests to multiple files

* Use container struct for new segment callback

* Add tests for Params#new_segment_callback_user_data=

* Add Whisper::Params#new_segment_callback_user_data=

* Add GC-related test for new segment callback

* Protect new segment callback related structs from GC

* Add meaningful test for build

* Rename: new_segment_callback_user_data -> new_segment_callback_container

* Add tests for Whisper::Segment

* Add Whisper::Segment and Whisper::Context#each_segment

* Extract c_ruby_whisper_callback_container_allocate()

* Add test for Whisper::Params#on_new_segment

* Add Whisper::Params#on_new_segment

* Assign symbol IDs to variables

* Make extsources.yaml simpler

* Update README

* Add document comments

* Add test for calling Whisper::Params#on_new_segment multiple times

* Add file dependencies to GitHub actions config and .gitignore

* Add more files to ext/.gitignore
2024-10-28 15:43:27 +02:00
c0ea41f6b2 ruby : add Metal support (#2516) 2024-10-28 13:08:09 +02:00
0fbaac9c89 whisper : fix index overflow in token-level timestamp logic (#2505) 2024-10-23 15:14:03 +03:00
a5abfe6a90 readme : update links and make commands (#2489)
* Update links to headers in README.md

* Add link to Vulkan section in README.md

* Add "-j" for parallelism for "make" in README.md

* Update README.md
2024-10-17 13:25:18 +03:00
d3f7137cc9 ruby : fix bindings (#2484)
* Improve Rakefile

* Remove intermediate files

* Remove unnecessary manipulations from extconf.rb

* Add README and LICENSE to source files

* Manage ext source files using YAML file

* Use extsources.yaml to include files into gem package file

* Add git-managed source files to build dependency

* Add test task

* Download model for test if not exists

* Add test for build

* Ignore gem package directory

* Enable GitHub action for Ruby binding

* Fix model name

* Build lib file for test

* Use extension for each platform

* Use extension for each platform on testing

* Move built lib file rather than copy

* Add intermediate files to clean targets
2024-10-16 18:44:04 +03:00
f7c99e49b3 readme : add Vulkan notice (#2488)
* Add Vulkan notice in README.md

* Fix formatting for Vulkan section in README.md

* Fix formatting in README.md
2024-10-16 18:43:26 +03:00
1d5752fa42 make : fix GGML_VULKAN=1 build (#2485) 2024-10-16 18:42:47 +03:00
b6049060dd whisper : add dtw preset for large-v3-turbo (#2481) 2024-10-15 21:00:21 +03:00
06a1da9daf convert : handle max_target_positions (#2477)
as needed, e.g. for
https://huggingface.co/primeline/whisper-large-v3-turbo-german/blob/main/config.json
2024-10-14 10:46:33 +03:00
746d173592 readme : update the Quick Start section (#2475)
navigating into the directory
2024-10-14 10:44:57 +03:00
fdbfb460ed whisper : add OpenVINO init with state (#2464)
* Fixed OpenVINO init with state

* Removed an empty line

* Fixed typo

* Replaced tabs with spaces

---------

Co-authored-by: Sandro Hanea <sandrohanea@users.noreply.github.com>
2024-10-08 20:08:00 +03:00
ebca09a3d1 release : v1.7.1 2024-10-07 13:06:48 +03:00
9f346d0084 vulkan : retry allocation with fallback flags (#2451)
Co-authored-by: Samuel Morris <samuel.morris@artlist.io>
2024-10-06 10:34:20 +03:00
6a94163b91 release : v1.7.0 2024-10-05 16:43:26 +03:00
8a35b58c4f scripts : bench v3-turbo 2024-10-05 16:22:53 +03:00
1789abca84 whisper : remove mel leftover constants (396089f) 2024-10-05 16:13:03 +03:00
847f94fdeb whisper : zero-out the KV cache upon clear (#2445) 2024-10-05 15:23:51 +03:00
6e40108a59 objc : fix build 2024-10-05 15:23:51 +03:00
1ba185f4af metal : zero-init buffer contexts (#0) 2024-10-05 15:23:51 +03:00
396089f3cf whisper : revert mel-related changes (#0)
too much extra logic and complexity for small benefit
2024-10-05 15:23:51 +03:00
941912467d whisper : adapt to latest ggml (skip) (#0) 2024-10-05 15:23:51 +03:00
0b1b094a67 ggml : fix typo in example usage ggml_gallocr_new (ggml/984) 2024-10-05 15:23:51 +03:00
40e52a76b9 ggml : fixes after sync (ggml/983)
ggml : remove test-backend-buffer

ggml : fix CUDA build warnings
2024-10-05 15:23:51 +03:00
cf977670e6 ggml-backend : add device and backend reg interfaces (llama/9707)
Also:

- metal : fix compute pass descriptor autorelease crash
- ggml-backend : add device description to CPU backend
- ggml: unify backend logging mechanism
2024-10-05 15:23:51 +03:00
df2c364de7 Fixed dequant precision issues in Q4_1 and Q5_1 (llama/9711) 2024-10-05 15:23:51 +03:00
1acfadb721 ggml-backend : add device and backend reg interfaces (llama/9707)
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-05 15:23:51 +03:00
ea642144d2 Initial cmake support of SYCL for AMD GPUs (llama/9658)
sycl: initial cmake support of SYCL for AMD GPUs
2024-10-05 15:23:51 +03:00
282a8654c4 vulkan : do not use tensor->extra (llama/9407)
* vulkan : do not use tensor->extra

This patch allows using the Vulkan backend with the RPC backend as
tensor->extra is no longer used.

Ref: #8536

* Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (llama/2)

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-10-05 15:23:51 +03:00
936cf3beb7 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 2024-10-05 15:23:51 +03:00
bc92c2f8f0 ggml: refactor cross entropy loss CPU impl. (ggml/976) 2024-10-05 15:23:51 +03:00
f7d55e0614 scripts : sync ggml-backend.cpp 2024-10-05 15:23:51 +03:00
f62a546e03 whisper : fix excessive memory usage (#2443)
* whisper : fix KV cache allocation

* whisper : reduce memory overhead from unused input tensors
2024-10-05 12:36:40 +03:00
2944cb72d9 examples : update dr_wav.h to newer version (#2449) 2024-10-04 11:04:51 +03:00
ccc2547210 talk-llama : sync llama.cpp 2024-10-03 12:22:17 +03:00
162a455402 metal : reduce command encoding overhead (llama/9698) 2024-10-03 12:22:17 +03:00
ff2cb0811f sync : ggml 2024-10-03 12:22:17 +03:00
5e9d6baa48 test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974) 2024-10-03 12:22:17 +03:00
845f8d663e vulkan : mul_mat: fix UB with small warps (ggml/952)
When the device's warp size is less than 16,
it is possible for loadstride_a (mul_mm.comp:114)
and loadstride_b (mul_mm.comp:115) to be set to 0,
because they are calculated as the workgroup size
multiplied by LOAD_VEC_* (which can be 1) and divided by 16,
and the workgroup size is set to be the same as the
warp/subgroup size.

The loadstride_* variables are used as increments in the
loops that populate the buffers used for the multiplication.

When they are 0 they cause an infinite loop.
But infinite loops without side-effects are UB and the
values of loadstride_* are known at compile time.
So, the compiler quietly optimizes all the loops away.
As a consequence, the buffers are not populated and
the multiplication result is just a matrix with all elements
set to 0.

We prevent the UB by making sure that the workgroup size
will never be less than 16, even if our device has a
smaller warp size (e.g. 8).

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-10-03 12:22:17 +03:00
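A minimal C sketch of the stride arithmetic described in the mul_mat commit above, assuming integer division. The helper names `loadstride`, `workgroup_size_for` and `fill_iterations` are hypothetical and do not come from the shader: with a subgroup size of 8 and LOAD_VEC_* = 1 the stride truncates to 0, and clamping the workgroup size to at least 16 restores a positive stride.

```c
#include <assert.h>

/* Stride formula as described in the commit message, using integer division:
   workgroup size * LOAD_VEC_* / 16. */
static unsigned loadstride(unsigned workgroup_size, unsigned load_vec) {
    return workgroup_size * load_vec / 16;
}

/* Sketch of the fix: the workgroup size never drops below 16,
   even when the device's subgroup (warp) size is smaller, e.g. 8. */
static unsigned workgroup_size_for(unsigned subgroup_size) {
    return subgroup_size < 16 ? 16 : subgroup_size;
}

/* Counts iterations of a strided fill loop like the ones that populate the
   shared buffers. A zero stride would make this loop never terminate,
   so the precondition is checked explicitly. */
static unsigned fill_iterations(unsigned n, unsigned stride) {
    assert(stride > 0);
    unsigned iters = 0;
    for (unsigned i = 0; i < n; i += stride) {
        iters++;
    }
    return iters;
}
```

For example, `loadstride(8, 1)` is 0, while `loadstride(workgroup_size_for(8), 1)` is 1.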
31fdf05fda ggml : fix ggml_cast (ggml/973) 2024-10-03 12:22:17 +03:00
0ac6666cd2 ggml: fix gradient allocation logic (ggml/966)
* ggml: fix gradient allocation logic

* gradient allocation in ggml_build_backward_expand

* fixup

* fix test-backend-ops grad

* suggestions by slaren

* fix test1.c

* fix legacy opt API

* fix test-grad0

* remove keep arg
2024-10-03 12:22:17 +03:00
6c91da80b8 ggml : define missing HWCAP flags (llama/9684)
ggml-ci

Co-authored-by: Willy Tarreau <w@1wt.eu>
2024-10-03 12:22:17 +03:00
c245168ba3 ggml : add run-time detection of neon, i8mm and sve (llama/9331)
* ggml: Added run-time detection of neon, i8mm and sve

Adds run-time detection of the Arm instructions set features
neon, i8mm and sve for Linux and Apple build targets.

* ggml: Extend feature detection to include non aarch64 Arm arch

* ggml: Move definition of ggml_arm_arch_features to the global data section
2024-10-03 12:22:17 +03:00
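The Linux side of the run-time detection above can be sketched as a pure function over the auxiliary-vector bits. This sketch assumes the arm64 `<asm/hwcap.h>` bit values (HWCAP_ASIMD = bit 1, HWCAP_SVE = bit 22, HWCAP2_I8MM = bit 13). The names `arm_features` and `decode_hwcaps` are hypothetical; this is not ggml's code, and the Apple path is not covered.

```c
#include <stdbool.h>

/* Bit values assumed from the Linux arm64 uapi header <asm/hwcap.h>.
   The #ifndef guards mirror the idea of defining flags that an older
   header might be missing. */
#ifndef HWCAP_ASIMD
#define HWCAP_ASIMD (1UL << 1)   /* Advanced SIMD, i.e. NEON */
#endif
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)    /* Scalable Vector Extension */
#endif
#ifndef HWCAP2_I8MM
#define HWCAP2_I8MM (1UL << 13)  /* Int8 matrix multiply */
#endif

/* Hypothetical feature record, not ggml_arm_arch_features itself. */
struct arm_features {
    bool neon;
    bool sve;
    bool i8mm;
};

/* Decode the two auxiliary-vector words into feature flags. */
static struct arm_features decode_hwcaps(unsigned long hwcap, unsigned long hwcap2) {
    struct arm_features f;
    f.neon = (hwcap & HWCAP_ASIMD) != 0;
    f.sve  = (hwcap & HWCAP_SVE) != 0;
    f.i8mm = (hwcap2 & HWCAP2_I8MM) != 0;
    return f;
}
```

On Linux the inputs would come from `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)` (declared in `<sys/auxv.h>`). Keeping the decoding separate from that call makes it testable on any host.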
280fee8fa0 Enable use of the rebar feature to upload buffers to the device. (llama/9251) 2024-10-03 12:22:17 +03:00
78b4c1c25f mtgpu: enable VMM (llama/9597)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-03 12:22:17 +03:00
1edea2eb4b ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (llama/9217)
* ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels

* added fallback mechanism when the offline re-quantized model is not
optimized for the underlying target.

* fix for build errors

* remove prints from the low-level code

* Rebase to the latest upstream
2024-10-03 12:22:17 +03:00
96808786b7 cann: fix crash when llama-bench is running on multiple cann devices (llama/9627) 2024-10-03 12:22:17 +03:00
bb57ecb85e CUDA: remove bad assert (ggml/972) 2024-10-03 12:22:17 +03:00
abdb73c7cc vulkan : multithread pipeline creation (ggml/963) 2024-10-03 12:22:17 +03:00
391e548a43 vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (ggml/961) 2024-10-03 12:22:17 +03:00
2a29afd4c6 vulkan : argsort barriers must be under uniform control flow (ggml/951)
A return before a barrier (that happens only in some threads in
a workgroup) leads to UB.
While the old code actually works on some devices,
it fails on some others (e.g. "smaller" GPUs).

BTW, I think it would be better to set specialization constants
when the graph is built, in that way the local workgroup
could be sized appropriately.
But it would take a lot of work.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-10-03 12:22:17 +03:00
5963004ff9 ggml : fix GGML_MAX_N_THREADS + improve formatting (ggml/969) 2024-10-03 12:22:17 +03:00
ede1718f6d server : ffmpeg overwrite leftover temp file (#2431)
* Remove possible leftover ffmpeg temp file from a previous failed conversion

* Revert "Remove possible leftover ffmpeg temp file from a previous failed conversion"

This reverts commit 00797403bd.

* Flag to force ffmpeg to overwrite output file if it exists
2024-10-02 15:06:40 +03:00
2ef717b293 whisper : add large-v3-turbo (#2440) 2024-10-01 15:57:06 +03:00
8feb375fbd tests : remove test-backend-ops (#2434) 2024-09-27 11:49:01 +03:00
69339af2d1 ci : disable failing CUDA and Java builds 2024-09-25 10:05:04 +03:00
0d2e2aed80 readme : fix references to download-ggml-model.sh (#2427)
The script itself has a hashbang indicating that it is a shell script,
but the README indicates that it must be executed with `bash`.

I checked the script itself, and it seems to be valid POSIX shell. I can
confirm that it works with busybox sh.

Clarify the reference on the README, so it is clear that bash is not
actually a dependency for this script.
2024-09-24 21:07:51 +03:00
451e9ee92c make : remove "talk" target until updated 2024-09-24 19:45:08 +03:00
1133ac98a8 ggml : add ggml-cpu-impl.h (skip) (#0) 2024-09-24 19:45:08 +03:00
76d27eec9a sync : ggml 2024-09-24 19:45:08 +03:00
fe18c29ab8 talk-llama : sync llama.cpp 2024-09-24 19:45:08 +03:00
234f9bd320 ggml : add AVX512DQ requirement for AVX512 builds (llama/9622) 2024-09-24 19:45:08 +03:00
3b183cfae7 log : add CONT level for continuing previous log entry (llama/9610) 2024-09-24 19:45:08 +03:00
02285dff81 threads: fix msvc build without openmp (llama/9615)
We're missing atomic_thread_fence() in MSVC builds when openmp is disabled.
2024-09-24 19:45:08 +03:00
2fc1d20f9e cuda: add q8_0->f32 cpy operation (llama/9571)
llama: enable K-shift for quantized KV cache
It will fail on unsupported backends or quant types.
2024-09-24 19:45:08 +03:00
08e8414f27 threads: improve ggml_barrier scaling with large number of threads (llama/9598)
Make sure n_barrier and n_barrier_passed do not share the cache line to avoid cache line bouncing.
This optimization shows performance improvements even for n_threads <= 8 cases.

Resurrect TSAN (Thread Sanitizer) check so that we can avoid doing expensive read-modify-write
in the normal case and just use thread-fence as originally intended.
2024-09-24 19:45:08 +03:00
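A sketch of the cache-line separation described in the ggml_barrier scaling commit above. It assumes a 64-byte cache line (a common value; some CPUs use 128) and a hypothetical struct name. Aligning each counter to its own line means threads incrementing `n_barrier` do not keep invalidating the line that waiting threads poll for `n_barrier_passed`.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed cache line size; 64 bytes is common, some CPUs use 128. */
#define CACHE_LINE_SIZE 64

/* Hypothetical layout, not ggml's struct: each counter starts its own cache
   line, so arriving threads bumping n_barrier do not invalidate the line
   that waiting threads poll for n_barrier_passed. */
struct barrier_counters {
    alignas(CACHE_LINE_SIZE) atomic_int n_barrier;        /* arrivals */
    alignas(CACHE_LINE_SIZE) atomic_int n_barrier_passed; /* completed generations */
};

static void barrier_counters_init(struct barrier_counters *c) {
    atomic_init(&c->n_barrier, 0);
    atomic_init(&c->n_barrier_passed, 0);
}
```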
05c6139625 ggml : AVX512 gemm for Q4_0_8_8 (llama/9532)
* AVX512 version of ggml_gemm_q4_0_8x8_q8_0

* Remove zero vector parameter passing

* Rename functions and rearrange order of macros

* Edit comments

* style : minor adjustments

* Update x to start from 0

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-24 19:45:08 +03:00
896c41ef30 metal : use F32 prec for K*Q in vec FA (llama/9595)
ggml-ci
2024-09-24 19:45:08 +03:00
c36ddc43c6 Revert "[SYCL] fallback mmvq (ggml/9088)" (llama/9579)
This reverts commit 50addec9a532a6518146ab837a85504850627316.
2024-09-24 19:45:08 +03:00
13f41af43e musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (llama/9526)
* mtgpu: add mp_21 support

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* mtgpu: disable flash attention on qy1 (MTT S80); disable q3_k and mul_mat_batched_cublas

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* mtgpu: enable unified memory

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* mtgpu: map cublasOperation_t to mublasOperation_t (sync code to latest)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-09-24 19:45:08 +03:00
3fc5306b82 Fix merge error in #9454 (llama/9589)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-09-24 19:45:08 +03:00
adf2474b10 CUDA: enable Gemma FA for HIP/Pascal (llama/9581) 2024-09-24 19:45:08 +03:00
008816a257 RWKV v6: RWKV_WKV op CUDA implementation (llama/9454)
* ggml: CUDA unary op EXP

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* ggml: rwkv_wkv op CUDA impl

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-09-24 19:45:08 +03:00
33e5a6612e ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (llama/9573) 2024-09-24 19:45:08 +03:00
f0a7d65b3d Update CUDA graph on scale change plus clear nodes/params (llama/9550)
* Avoid using saved CUDA graph if scale changes and reset nodes/params on update

Fixes https://github.com/ggerganov/llama.cpp/issues/9451

* clear before resize
2024-09-24 19:45:08 +03:00
54e5095765 examples : adapt to ggml.h changes (ggml/0)
ggml-ci
2024-09-24 19:45:08 +03:00
34291099fb ggml : refactoring (llama/#0)
- d6a04f87
- 23e0d70b
2024-09-24 19:45:08 +03:00
d245d7aec7 ggml : fix builds (llama/0)
ggml-ci
2024-09-24 19:45:08 +03:00
d661283e68 ggml : fix trailing whitespace (llama/0)
ggml-ci
2024-09-24 19:45:08 +03:00
c0761c95f5 CUDA: fix sum.cu compilation for CUDA < 11.7 (llama/9562) 2024-09-24 19:45:08 +03:00
138e20b697 ggml : fix n_threads_cur initialization with one thread (llama/9538)
* ggml : fix n_threads_cur initialization with one thread

* Update ggml/src/ggml.c

---------

Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
2024-09-24 19:45:08 +03:00
a8d9abfa22 threadpool : skip polling for unused threads (llama/9461)
* threadpool: skip polling for unused threads

Currently all threads do N polling rounds even if only 1 thread is active (n_threads_cur == 1).
This commit adds a check to skip the polling for unused threads (ith >= n_threads_cur).

n_threads_cur is now an atomic_int to explicitly tell thread sanitizer that it is written
from one thread and read from other threads (not a race condition).

* threadpool: further simplify and improve ggml_barrier

Avoid using strict memory order while polling, yet make sure that all threads go through
full memory barrier (memory fence) on ggml_barrier entrance and exit.

* threads: add simple barrier test

This test does lots of small, parallel matmul ops where the barriers in between dominate the overhead.

* threadpool: improve thread sync for new-graphs

Using the same tricks as ggml_barrier. All the polling is done with relaxed memory order
to keep it efficient, once the new graph is detected we do full fence using read-modify-write
with strict memory order.

* threadpool: improve abort handling

Do not use threadpool->ec (exit code) to decide whether to exit the compute loop.
threadpool->ec is not atomic which makes thread-sanitizer rightfully unhappy about it.

Instead introduce atomic threadpool->abort flag used for this. This is consistent with
how we handle threadpool->stop or pause.

While at it add an explicit atomic_load for n_threads_cur for consistency.

* test-barrier: release threadpool before releasing the context

fixes use-after-free detected by gcc thread-sanitizer on x86-64;
for some reason the llvm sanitizer is not detecting this issue.
2024-09-24 19:45:08 +03:00
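A minimal C11 sketch of the barrier pattern described in the threadpool commit above. The names are hypothetical and this is not ggml's exact code. Each arriving thread performs one sequentially consistent read-modify-write, waiting threads poll with relaxed loads, and a single full fence on exit orders everything that follows. The last thread to arrive resets the counter and publishes a new generation.

```c
#include <stdatomic.h>

/* Hypothetical barrier state: names echo the commit messages, but this is
   a sketch rather than ggml's implementation. */
struct barrier {
    atomic_int n_barrier;        /* threads that have arrived in this generation */
    atomic_int n_barrier_passed; /* number of completed generations */
    int        n_threads;        /* threads expected at the barrier */
};

static void barrier_init(struct barrier *b, int n_threads) {
    atomic_init(&b->n_barrier, 0);
    atomic_init(&b->n_barrier_passed, 0);
    b->n_threads = n_threads;
}

static void barrier_wait(struct barrier *b) {
    /* Remember the current generation before arriving. */
    const int passed_old = atomic_load_explicit(&b->n_barrier_passed, memory_order_relaxed);

    /* Entry: one full-strength read-modify-write per thread. */
    if (atomic_fetch_add_explicit(&b->n_barrier, 1, memory_order_seq_cst) == b->n_threads - 1) {
        /* Last thread to arrive: reset the arrival counter, then release the others. */
        atomic_store_explicit(&b->n_barrier, 0, memory_order_relaxed);
        atomic_fetch_add_explicit(&b->n_barrier_passed, 1, memory_order_seq_cst);
        return;
    }

    /* Waiting threads poll with cheap relaxed loads... */
    while (atomic_load_explicit(&b->n_barrier_passed, memory_order_relaxed) == passed_old) {
        /* spin; a real implementation might add a CPU pause hint here */
    }

    /* ...and issue a single full fence on exit, so their subsequent reads
       observe everything written before the barrier. */
    atomic_thread_fence(memory_order_seq_cst);
}
```

With `n_threads == 1` the caller is always the last to arrive, so each call just advances the generation. With more threads, each worker would call `barrier_wait` from its own thread.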
195afd6dc1 ggml : link MATH_LIBRARY not by its full path (llama/9339) 2024-09-24 19:45:08 +03:00
1fd78999e8 cmake : do not hide GGML options + rename option (llama/9465)
* cmake : do not hide GGML options

ggml-ci

* build : rename flag GGML_CUDA_USE_GRAPHS -> GGML_CUDA_GRAPHS

for consistency

ggml-ci
2024-09-24 19:45:08 +03:00
374e9e0c5e ggml : IQ4_NL sgemm + Q4_0 AVX optimization (llama/9422)
* squashed

re-add my iq4_nl sgemm PR https://github.com/ggerganov/llama.cpp/pull/8049

have ggml_vec_dot_q4_0 do two blocks per loop for avx

try out f16c ggml_vec_dot_iq4_nl, but it's not really faster. As per https://github.com/ggerganov/llama.cpp/pull/8549, we can calculate several blocks at a time with no issue

* shuffle

* remove f16c iq4_nl as I can't make it faster than before
2024-09-24 19:45:08 +03:00
a2cb5b4183 metal : handle zero-sized allocs (llama/9466) 2024-09-24 19:45:08 +03:00
288ae5176e common : reimplement logging (llama/9418)
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-24 19:45:08 +03:00
d868122a5a cmake : correct order of sycl flags (llama/9497) 2024-09-24 19:45:08 +03:00
2ba25fb122 cmake : try to fix sycl+intel build (llama/9487) 2024-09-24 19:45:08 +03:00
4f4687cb74 ggml : ggml_type_name return "NONE" for invalid values (llama/9458)
When running on Windows, the quantization utility attempts to print the types that are not set, which leads to a crash.
2024-09-24 19:45:08 +03:00
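The ggml_type_name fix above amounts to a bounds check before the table lookup. Here is a sketch with a hypothetical three-entry enum standing in for the full `ggml_type` list:

```c
#include <stddef.h>

/* Hypothetical, abbreviated type enum standing in for ggml_type. */
enum my_type {
    MY_TYPE_F32,
    MY_TYPE_F16,
    MY_TYPE_Q4_0,
    MY_TYPE_COUNT,
};

static const char *const type_names[MY_TYPE_COUNT] = {
    [MY_TYPE_F32]  = "f32",
    [MY_TYPE_F16]  = "f16",
    [MY_TYPE_Q4_0] = "q4_0",
};

/* Out-of-range values, e.g. an unset or garbage type field, return "NONE"
   instead of reading past the end of the table. */
static const char *type_name(int type) {
    if (type < 0 || type >= MY_TYPE_COUNT || type_names[type] == NULL) {
        return "NONE";
    }
    return type_names[type];
}
```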
66b00fad0d cmake : use list(APPEND ...) instead of set() + dedup linker (llama/9463)
* cmake : use list(APPEND ...) instead of set() + dedup linker

ggml-ci

* cmake : try fix sycl

* cmake : try to fix sycl 2

* cmake : fix sycl build (llama/9469)

* try fix sycl build

* use CMAKE_CXX_FLAGS as a string variable

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* one more CMAKE_CXX_FLAGS fix (llama/9471)

---------

Co-authored-by: Michael Podvitskiy <podvitskiymichael@gmail.com>
2024-09-24 19:45:08 +03:00
c6cc8d16c3 cann: Add host buffer type for Ascend NPU (llama/9406)
* feat: Add host buffer type for Ascend NPU(CANN backend)

* fix some checking errors

* Add a few comments
2024-09-24 19:45:08 +03:00
3f8f8a78a2 riscv : modify Makefile and add a RISCV_VECT to print log info (llama/9442)
- Added ggml_cpu_has_riscv_v() in GGML to print system info in log
- Modified Makefile to only use flag when cross compiling for RISC-V
2024-09-24 19:45:08 +03:00
3e47686919 cann: Fix error when running a non-existent op (llama/9424) 2024-09-24 19:45:08 +03:00
a53b69a003 CUDA: fix --split-mode row race condition (llama/9413) 2024-09-24 19:45:08 +03:00
d1c9b47360 musa: remove Clang builtins mapping (llama/9421)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-09-24 19:45:08 +03:00
32f659861a sycl : update support conditions (llama/9394)
* sycl : update support condition to im2col

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* Added TODO to remind supporting FP32 im2col

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
2024-09-24 19:45:08 +03:00
a785232bf9 metal : fix compile warning with GGML_METAL_NDEBUG (llama/0) 2024-09-24 19:45:08 +03:00
0677293503 rpc : fix segfault with nkvo (llama/9389)
* rpc : fix nkvo

* rpc : buf_size must not be static

ref: #9337

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-09-24 19:45:08 +03:00
1fbdb813c0 ggml : vector length agnostic SVE support (llama/9290)
* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths

* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths

* Removed WhiteSpaces

* ggml : style changes + fix 512-bit nb loop check

- fix local scope in switch cases
- consistent predicate names
- empty lines when necessary
- opening braces, spaces
- const-correctness
- add asserts

* Update ggml/src/ggml-quants.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-24 19:45:08 +03:00
67725ac8f3 CUDA: fix variable name conflict for Windows build (llama/9382) 2024-09-24 19:45:08 +03:00
dac89af357 Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (llama/9118)
* Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.

* fix compile issues

* Fix issues where the last submit wasn't executed or handled properly.

* remove trailing whitespace

* Repair GGML_VULKAN_CHECK_RESULTS

* Increase submit counter only if actual work has been submitted and increase submit count to 100.

* Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.
2024-09-24 19:45:08 +03:00
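The Vulkan command buffer batching described in the commit above can be modeled as a counter. The counter advances only for nodes that record work, an early submit happens every 100 such nodes, and a final flush handles the remainder. The function and macro names are hypothetical, and the value 100 comes from the commit message. The sketch only counts submissions and makes no Vulkan calls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Threshold from the commit message ("increase submit count to 100");
   the name is hypothetical. */
#define NODES_PER_SUBMIT 100

/* Hypothetical model of the scheme: returns how many command buffers would
   be submitted for a graph of n_nodes, where has_work[i] says whether node i
   records any GPU commands. No Vulkan calls are made. */
static size_t count_submits(const bool *has_work, size_t n_nodes) {
    size_t submits = 0;
    size_t pending = 0; /* nodes with recorded work since the last submit */
    for (size_t i = 0; i < n_nodes; i++) {
        if (!has_work[i]) {
            continue; /* nodes without work do not advance the counter */
        }
        if (++pending == NODES_PER_SUBMIT) {
            submits++; /* submit early so the GPU starts while recording continues */
            pending = 0;
        }
    }
    if (pending > 0) {
        submits++; /* flush the tail, otherwise the last batch would never run */
    }
    return submits;
}
```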
26225f1fb0 cuda : fix FA Q src index (1 -> 0) (llama/9374) 2024-09-24 19:45:08 +03:00
3468983315 add check malloc result on device (llama/9346)
* add check malloc result on device

* update for review comments, check all malloc_device() result

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-24 19:45:08 +03:00
c7515b0995 ggml/examples: add backend support for numerical optimization (ggml/949)
* CUDA eval works

* stochastic gradient descent op

* Adam except decay

* CUDA CROSS_ENTROPY_LOSS_BACK

* CUDA mnist-fc training works

* backend CLI arg

* refactor gguf load

* remove sched from opt_step_adam

* implement l1 regularization (weight decay)

* extra call to add optimizer

* initialize gradients with ggml_graph_reset

* gradient accumulation

* increment iter per eval instead of epoch

* adjust backend interfaces

* fix ggml_graph_reset without backend

* fix ggml graph export/import

* fixup

* rename

* revert ggml_opt changes

* more general CUDA repeat_back

* update documentation, fix CNN

* validation split

* add clarifying comment

* optimize PyTorch training

* adjust buffer size, thread count

* fix 0.0f validation split

* Update examples/mnist/mnist-common.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix gradient accumulation

* tensor flag for accumulators -> tensor hash set

* Update include/ggml.h

Co-authored-by: slaren <slarengh@gmail.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* fix test prints

* Update src/ggml-backend.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* better CUDA support for noncontiguous out_prod

* add comment

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-09-24 19:45:08 +03:00
253ce30004 examples : add null threadpool args where needed (ggml/0)
ggml-ci
2024-09-24 19:45:08 +03:00
03a6fae484 metal : update support condition for im2col + fix warning (llama/0) 2024-09-24 19:45:08 +03:00
d37fd275fd ggml : always check bounds on get_rows operations (llama/9354) 2024-09-24 19:45:08 +03:00
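Here is a sketch of a bounds-checked row gather in the spirit of the get_rows commit above. It uses hypothetical names and a simplified row-major float layout. This version returns -1 on a bad index so that it can be tested; a real backend might abort through an assertion instead. Rows copied before the bad index are left in `dst`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified get_rows: gathers rows of a row-major
   n_rows x n_cols float matrix, selected by idx, into dst.
   Every index is checked, not only in debug builds. Returns 0 on success
   and -1 on the first out-of-range index (rows copied before it stay in dst). */
static int get_rows_checked(const float *src, int64_t n_rows, int64_t n_cols,
                            const int32_t *idx, int64_t n_idx, float *dst) {
    for (int64_t i = 0; i < n_idx; i++) {
        const int64_t r = idx[i];
        if (r < 0 || r >= n_rows) {
            return -1; /* reject instead of reading outside src */
        }
        memcpy(dst + i * n_cols, src + r * n_cols, (size_t) n_cols * sizeof(float));
    }
    return 0;
}
```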
195877fd72 ggml : fix missing cpu_set_t on emscripten (llama/9336)
* ggml : fix missing cpu_set_t on emscripten

* better version

* bring back android part
2024-09-24 19:45:08 +03:00
9e715e1b96 Improve Vulkan shader build system (llama/9239)
* Improve Vulkan shader build system

- Add dependency to vulkan-shaders-gen to rebuild shaders when changing the shader compilation utility.
- Add option to generate debug info for Vulkan shaders to provide shader source to Vulkan shader profiling tools

* remove not required self dependency
2024-09-24 19:45:08 +03:00
6f5514b6e2 ggml-quants : ternary packing for TriLMs and BitNet b1.58 (llama/8151)
* ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b

* ggml-quants : faster 1.625 bpw AVX2 vec_dot

Not using a lookup table anymore makes it match q4_0 speed.

* gguf-py : fix formatting

* llama : remove spaces on empty line

* ggml-quants : subtract 1 when back in epi8

This makes the 1.625 bpw type go faster than q4_0. Still not the fastest.

* ggml-quants : Q2_2 now faster than Q4_K with AVX2

* ggml-quants : cleanup Q1_3 code formatting

* ggml-quants : ARM NEON vec_dot for q2_2 and q1_3

* ggml-quants : use ceiling division when quantizing q1_3

* convert-hf : simplify BitNet pre-quantization

This still results in the exact same tensor weights and scales,
but it reveals some weirdness in the current algorithm.

* convert-hf : allow converting the weird BitNet 1.3B

Its FFN size is 5460 which is not convenient.
The offending tensors are kept in F16,
which makes the final model 5.01 bpw.

* bitnet : replace 1.58b with b1.58, as in the paper

* ggml-quants : fix build failure on Windows

* ggml-quants : attempt to fix Arm 32-bit support

* ggml : add some informative comments in q1_3 vec_dot

* ggml : add TQ1_0 and TQ2_0 ternary quantization types

* ggml : even faster TQ2_0

* ggml : also faster TQ1_0

Same optimization as for TQ2_0 by offsetting the sum instead of the weights.
This makes TQ1_0 almost as fast as Q8_0 on AVX2.

* ggml : fix build issues in certain environments

* ggml : add NEON vec_dot implementation for TQ1_0 and TQ2_0

* ggml : avoid directly using vmlal_high_s8, for 32-bit ARM compat

The compiler seems smart enough to use the same instruction
even when using vget_high_s8 instead.

* ggml : remove q1_3 and q2_2

No more 1.625 bpw and 2.000 bpw,
now instead using 1.6875 bpw and 2.0625 bpw
with TQ1_0 and TQ2_0, respectively.

* llama : remove the separate scale tensors of BitNet b1.58

They won't be needed, since the remaining ternary quant types have
built-in scales.

* ggml-quants : rename fields of TQ1_0 and TQ2_0 structs for consistency

* ggml-quants : allow using vdotq_s32 in TQ2_0 vec_dot

Not yet tested on hardware which supports it,
might not work or might not even compile. But also it might.
It should make the performance better on recent ARM CPUs.

* ggml-quants : remove comment about possible format change of TQ2_0

Making it slightly more convenient for AVX512
but less convenient for everything else is not worth the trouble.

* gguf-py : Numpy (de)quantization for TQ1_0 and TQ2_0

* ggml-quants : use roundf instead of nearest_int for TQ1_0 and TQ2_0

This does not change anything for ternary models,
since their values should never end up being in halfway cases anyway.

* convert : allow direct conversion to TQ1_0 and TQ2_0

The token embeddings and output tensors are kept in F16
to allow quantizing them to Q4_K and Q6_K with llama-quantize.

* llama : handle fallback for TQ1_0 and TQ2_0 with Q4_0

Q4_0 is not completely symmetric (so not lossless for ternary models),
but it should be good enough.

* ggml-quants : allow using ARM dot product instructions for TQ1_0

* ggml-quants : deduplicate TQ1_0 and TQ2_0 __ARM_FEATURE_DOTPROD support

* ggml : remove unused ggml_mul special case

It would otherwise conflict with the more general
optimization coming with Mamba-2.

* ggml : handle TQ1_0 and TQ2_0 in dequantization-based operators

* test-backend-ops : add TQ1_0 and TQ2_0 comments for later

Not yet adding uncommented, because some backends like SYCL and Metal
do not properly handle unknown types in supports_op for GGML_OP_MUL_MAT.
(and Metal also doesn't handle it with GGML_OP_GET_ROWS)
Support for TQ1_0 and TQ2_0 for other backends than CPU
will be added in follow-up pull requests.
2024-09-24 19:45:08 +03:00
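The bpw figures in the ternary packing commit above and its Q4_0 fallback caveat can be checked with a little arithmetic. This sketch assumes 256-weight blocks with a 2-byte fp16 scale. TQ2_0 is assumed to store 4 weights per byte. TQ1_0 is assumed to store 240 weights at 5 per byte plus 16 weights at 4 per byte. The plain base-3 `pack5`/`unpack5` helpers are hypothetical and only illustrate why 5 trits fit in one byte (3^5 = 243 ≤ 256); they are not ggml's bit layout. `q4_0_roundtrip` assumes the Q4_0 reference scheme (d = max / -8, q = min(15, (int)(x/d + 8.5))) applied to a single value.

```c
#include <stdint.h>

#define QK_K 256 /* weights per block, assumed for TQ1_0 and TQ2_0 */

/* Bits per weight for a block of QK_K weights stored in `bytes` bytes. */
static double bpw(unsigned bytes) {
    return bytes * 8.0 / QK_K;
}

/* Assumed layouts (each ends with a 2-byte fp16 scale):
   TQ2_0: 4 weights per byte.
   TQ1_0: 240 weights at 5 per byte, plus 16 weights at 4 per byte. */
static unsigned tq2_0_bytes(void) { return QK_K / 4 + 2; }
static unsigned tq1_0_bytes(void) { return (QK_K - 4 * QK_K / 64) / 5 + QK_K / 64 + 2; }

/* Plain base-3 packing of 5 trits in {-1, 0, 1} into one byte.
   3^5 = 243 <= 256 is why 5 trits fit in 8 bits; ggml's own encoding
   may arrange the digits differently. */
static uint8_t pack5(const int8_t t[5]) {
    unsigned v = 0;
    for (int i = 4; i >= 0; i--) {
        v = v * 3 + (unsigned) (t[i] + 1); /* t[0] ends up as the lowest digit */
    }
    return (uint8_t) v;
}

static void unpack5(uint8_t b, int8_t t[5]) {
    unsigned v = b;
    for (int i = 0; i < 5; i++) {
        t[i] = (int8_t) (v % 3) - 1;
        v /= 3;
    }
}

/* Single-value Q4_0-style round trip, assuming the reference scheme:
   d = max / -8 (max is the signed value of largest magnitude),
   q = min(15, (int) (x / d + 8.5)), x' = (q - 8) * d. */
static float q4_0_roundtrip(float x, float max) {
    const float d  = max / -8.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    int q = (int8_t) (x * id + 8.5f);
    if (q > 15) {
        q = 15;
    }
    return (float) (q - 8) * d;
}
```

The two layouts give 66 and 54 bytes per block, i.e. 2.0625 and 1.6875 bpw. With max = 1, an input of -1 comes back as -0.875. The opposite sign would need q - 8 = +8, which the clamp to 15 turns into +7; that is the asymmetry that makes Q4_0 lossy for ternary weights.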
709a22b92d cuda : fix defrag with quantized KV (llama/9319) 2024-09-24 19:45:08 +03:00
01e214a1d7 ggml : AVX2 support for Q4_0_8_8 (llama/8713)
* Add AVX2 based implementations for quantize_q8_0_4x8, ggml_gemv_q4_0_8x8_q8_0 and ggml_gemm_q4_0_8x8_q8_0 functions

* Update code to fix issues occurring due to non-alignment of elements to be processed as a multiple of 16 in MSVC

* Update comments and indentation

* Make updates to reduce number of load instructions
2024-09-24 19:45:08 +03:00
1cecfe6a02 Fix DMMV dequantization (llama/9279)
Fixed dmmv dequant for ncols == GGML_SYCL_DMMV_X
2024-09-24 19:45:08 +03:00
3764bc974c ggml : add pthread includes on FreeBSD (llama/9258) 2024-09-24 19:45:08 +03:00
fcffc912a9 llama : support RWKV v6 models (llama/8980)
* convert_hf_to_gguf: Add support for RWKV v6

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Add RWKV tokenization

* Fix build

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Do not use special tokens when matching in RWKV tokenizer

* Fix model loading

* Add (broken) placeholder graph builder for RWKV

* Add workaround for kv cache

* Add logits conversion to rwkv5

* Add rwkv5 layer norms

* Add time mix KVRG & correct merge mistake

* Add remaining time mix parameters

* Add time mix output loading

* Add placeholder llm_build_time_mix

* Fix build

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Load more tensors for rwkv v6

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix rwkv tokenizer

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* ggml: Add unary operator Exp

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV v6 graph building

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Add ``rescale_every_n_layers`` parameter

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Add ``wkv.head_size`` key for RWKV

so it doesn't reuse Mamba ssm parameters

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix offloading layers to CUDA

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix parallel inferencing for RWKV

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Remove trailing whitespaces

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* build_rwkv: Avoid using inplace operations

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* convert_hf_to_gguf: rwkv: Avoid using ``eval``

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* convert_hf_to_gguf: rwkv tokenizer: Don't escape sequences manually

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* ggml: Add backward computation for unary op ``exp``

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* Use MODEL_ARCH.RWKV6 instead of MODEL_ARCH.RWKV

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* build_rwkv6: Simplify graph

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Detect model.type

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Fix tensor loading for 7B/14B models

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Fix group_norm assertion failure with Metal

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Clean up

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Add quantization tensor exclusion

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Use the new advanced batch splits

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* llama: rwkv6: Use ``ggml_norm`` instead of ``ggml_group_norm``

Co-authored-by: compilade <git@compilade.net>

* llama: rwkv6: Apply code style and misc changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* converter: Use class name ``Rwkv6Model``

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Make use of key ``feed_forward_length``

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Add kv ``time_mix_extra_dim`` and ``time_decay_extra_dim``

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* converter: Match ``new_name`` instead of ``name`` for float32 explicit tensors

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Keep ``time_mix_w1/w2`` as F32

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Remove unused nodes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Apply code format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: rwkv6: Add lora for some supported tensors

Currently att.key/receptance/value/gate/output, ffn.receptance/key/value, as well as head.weight

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* rwkv : speed-up tokenization using trie

* minor : style + indentation

* llama: rwkv6: Avoid division by zero

Co-authored-by: compilade <git@compilade.net>

* ggml: rwkv_wkv: Avoid copying the state

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Layl Bongers <3094382+LaylBongers@users.noreply.github.com>
Co-authored-by: compilade <git@compilade.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-24 19:45:08 +03:00
38d40b9972 Threadpool: take 2 (llama/8672)
* Introduce ggml_compute_threadpool

- OpenMP functional: check
- Vanilla ggml functional: Check
- ggml w/threadpool functional: Check
- OpenMP no regression: No glaring problems
- Vanilla ggml no regression: No glaring problems
- ggml w/threadpool no regression: No glaring problems

* Minor fixes

* fixed use after release bug

* fixed a harmless race condition

* Fix Android build issue

* fix more race conditions

* fix deadlock for cases where cgraph.n_nodes == 1

and fix --poll case

* threadpool: use cpu_get_num_math to set the default number of threadpool threads

This way we avoid using E-Cores and Hyperthreaded siblings.

* bench: create fresh threadpool for each test

For benchmarking it's better to start a fresh pool for each test with the exact number of threads
needed for that test. Having larger pools is suboptimal (causes more load, etc).

* atomics: always use stdatomics with clang and use relaxed memory order when polling in ggml_barrier

This also removes sched_yield() calls from ggml_barrier() to match OpenMP behavior.

* threadpool: make polling the default to match openmp behavior

All command line args now allow for setting poll to 0 (false).

* threadpool: do not wakeup threads in already paused threadpool

* fix potential race condition in check_for_work

* threadpool: do not create two threadpools if their params are identical

* threadpool: reduce pause/resume/wakeup overhead in common cases

We now start threadpool in paused state only if we have two.
The resume is now implicit (i.e. new work), which allows for reduced locking and context-switch overhead.

* threadpool: add support for hybrid polling

poll params (--poll, ...) now specify "polling level", i.e. how aggressively we poll before waiting on cond.var.
poll=0 means no polling, 1 means poll for 128K rounds then wait, 2 for 256K rounds, ...

The default value of 50 (i.e. 50x128K rounds) seems like a decent default across modern platforms.
We can tune this further as things evolve.

* threadpool: reduce the number of barriers required

New work is now indicated with an atomic counter that is incremented for
each new graph that needs to be computed.
This removes the need for extra barrier for clearing the "new_work" and
removes the special case for trivial graphs.

* threadpool: remove special-casing for disposable threadpools

With the efficient hybrid polling there is no need to make disposable pools any different.
This simplifies the overall logic and reduces branching.

Include n_threads in debug print for disposable threadpool.

Declare pause and stop flags as atomic_bool
This doesn't actually generate any memory barriers and simply informs
the thread sanitizer that these flags can be written & read by different
threads without locking.

* threadpool: do not clear barrier counters between graph computes (fixes race with small graphs)

This fixes the race condition with very small graphs where the main thread happens to
start a new graph while the workers are just about to exit from barriers.

* threadpool: use relaxed order for chunk sync

A full memory barrier is overkill here, since each thread works on a different chunk.

* threadpool: remove abort_callback from threadpool state

* threadpool: better naming for thread/cpumask related functions

* threadpool: consistent use of int type for n_threads params

* threadpool: add support for ggml_threadpool_params_default/init

Also removes the need for explicit mask_specified param.
all-zero cpumask means use default (usually inherited) cpu affinity mask.

* threadpool: move typedef into ggml.h

* threadpool: fix apply_priority() function name

* threadpool: fix swift wrapper errors due to n_threads int type cleanup

* threadpool: enable --cpu-mask and other threadpool related options only if threadpool is enabled

* threadpool: replace checks for compute_thread ret code with proper status check

* threadpool: simplify threadpool init logic and fix main thread affinity application

Most of the init code is now exactly the same between threadpool and openmp.

* threadpool: update threadpool resume/pause function names

* threadpool: enable openmp by default for now

* threadpool: don't forget to free workers state when omp is enabled

* threadpool: avoid updating process priority on the platforms that do not require it

On Windows we need to change overall process priority class in order to set thread priorities,
but on Linux, Mac, etc we do not need to touch the overall process settings.

* threadpool: update calling thread prio and affinity only at start/resume

This avoids extra syscalls for each graph_compute()

* llama-bench: turn threadpool params into vectors, add output headers, etc

* llama-bench: add support for cool off between tests --delay

This helps for long running tests on platforms that are thermally limited (phones, laptops, etc).
--delay (disabled by default) introduces a sleep of N seconds before starting each test.

* threadpool: move process priority setting into the apps (bench and cli)

This avoids changing the overall process priority on Windows for the apps
that use ggml/llama.cpp directly.

* threadpool: move all pause/resume logic into ggml

* threadpool: further api cleanup and prep for future refactoring

All threadpool related functions and structs use ggml_threadpool prefix.

* threadpool: minor indent fixes

* threadpool: improve setpriority error message

* Update examples/llama-bench/llama-bench.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* threadpool: fix indent in set_threadpool call

* use int32_t for n_thread type in public llama.cpp API

* threadpool: use _new and _free instead of _create and _release

* fix two more public APIs to use int32_t for n_threads

* build: set _GNU_SOURCE for Android

---------

Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
Co-authored-by: fmz <quic_fzaghlou@quic.com>
Co-authored-by: Max Krasnyansky <max.krasnyansky@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-09-24 19:45:08 +03:00
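Several of the barrier and chunking changes fit together in one picture:

- a "passed" counter that is never reset between graphs;
- relaxed polling followed by a fence;
- a relaxed counter for handing out chunks.

The C11 + pthreads sketch below shows how these pieces combine. It is an illustration, not the ggml threadpool. `pool_state`, `barrier_wait`, `worker_main` and `run_rounds` are hypothetical names. Waiting is a plain spin with no cond-var fallback, and each "graph" is just one increment pass over an array.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared pool state; field names are illustrative. */
typedef struct {
    atomic_int n_barrier;        /* arrivals in the current barrier round */
    atomic_int n_barrier_passed; /* completed rounds; never reset */
    atomic_int current_chunk;    /* next chunk index to hand out */
    int n_threads, n_rounds, n_chunks, chunk_size, n;
    int *data;
} pool_state;

typedef struct { pool_state *st; int ith; } worker_arg;

static void barrier_wait(pool_state *st) {
    const int passed_old = atomic_load_explicit(&st->n_barrier_passed, memory_order_relaxed);
    if (atomic_fetch_add(&st->n_barrier, 1) == st->n_threads - 1) {
        /* Last arrival: reset the arrival count, then release the others. */
        atomic_store_explicit(&st->n_barrier, 0, memory_order_relaxed);
        atomic_fetch_add(&st->n_barrier_passed, 1);
        return;
    }
    /* Poll with relaxed loads; the fence afterwards orders later accesses. */
    while (atomic_load_explicit(&st->n_barrier_passed, memory_order_relaxed) == passed_old) {
        /* spin */
    }
    atomic_thread_fence(memory_order_seq_cst);
}

static void *worker_main(void *p) {
    const worker_arg *a = p;
    pool_state *st = a->st;
    for (int r = 0; r < st->n_rounds; ++r) {
        int c;
        /* Relaxed is enough: the counter only hands out distinct chunk
         * indices; the barrier below publishes the results. */
        while ((c = atomic_fetch_add_explicit(&st->current_chunk, 1, memory_order_relaxed)) < st->n_chunks) {
            const int begin = c * st->chunk_size;
            const int end = begin + st->chunk_size < st->n ? begin + st->chunk_size : st->n;
            for (int i = begin; i < end; ++i) st->data[i] += 1;
        }
        barrier_wait(st); /* every chunk of round r is done */
        if (a->ith == 0) atomic_store_explicit(&st->current_chunk, 0, memory_order_relaxed);
        barrier_wait(st); /* the reset is visible before round r + 1 */
    }
    return NULL;
}

/* Runs n_rounds passes over data on n_threads threads; each pass adds 1 to
 * every element. Failure to start a thread is fatal (abort), because the
 * already-started workers would otherwise wait at the barrier forever. */
static void run_rounds(int n_threads, int n_rounds, int *data, int n, int chunk_size) {
    pool_state st;
    atomic_init(&st.n_barrier, 0);
    atomic_init(&st.n_barrier_passed, 0);
    atomic_init(&st.current_chunk, 0);
    st.n_threads = n_threads;
    st.n_rounds = n_rounds;
    st.chunk_size = chunk_size;
    st.n = n;
    st.data = data;
    st.n_chunks = (n + chunk_size - 1) / chunk_size;

    pthread_t *tids = malloc(sizeof(pthread_t) * (size_t) n_threads);
    worker_arg *args = malloc(sizeof(worker_arg) * (size_t) n_threads);
    if (!tids || !args) abort();
    for (int i = 0; i < n_threads; ++i) {
        args[i].st = &st;
        args[i].ith = i;
        if (pthread_create(&tids[i], NULL, worker_main, &args[i]) != 0) abort();
    }
    for (int i = 0; i < n_threads; ++i) pthread_join(tids[i], NULL);
    free(tids);
    free(args);
}
```

Because `n_barrier_passed` only ever grows, a worker that arrives late in round `r` compares against its own snapshot. It cannot confuse round `r` with round `r + 1`, which is the kind of race the "do not clear barrier counters" change describes.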
09149ee0ae vulkan: fix compilation with GGML_VULKAN_DEBUG=ON (ggml/948)
the old code was trying to print a non-existent field (size)
and the struct as a whole (which doesn't have an operator<<
override defined).
Probably a typo happened during refactoring.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-24 19:45:08 +03:00
6b7f37dd5c vulkan: add dryrun support to sin and cos ops (ggml/947)
sin and cos failed test-backend-ops because they
tried to dereference a context pointer that is null
on dry runs.

This commit prevents that segfault.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-24 19:45:08 +03:00
791812fb54 vulkan: correctly report support for OP_CONT (ggml/946)
test-backend-ops fails because ggml_cont aborts
when invoked passing an unsupported type.

This commit makes ggml_cont tests pass

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-24 19:45:08 +03:00
5d6dc19f04 tests: add gradient tests for all backends (ggml/932)
* tests: add gradient checking to test-backend-ops

* remove old comment

* reorder includes

* adjust SIN/COS parameters

* add documentation, use supports_op if possible
2024-09-24 19:45:08 +03:00
34972dbe22 go : add temperature options (#2417)
* Fixed go cuda bindings building

* Added note to go bindings Readme to build using cuda support

* Added temperature bindings for Go

---------

Co-authored-by: Binozo <entwickler@binozoworks.de>
2024-09-20 15:45:36 +03:00
bea43e0c64 docker : add libsdl2-dev for container builds (#2424)
* Added libsdl2-dev for SDL samples

Building talk-llama seems to fail here as there is no sdl.h.

* Adding libsdl2-dev for sdl.h

* Adding libsdl2-dev for sdl.h
2024-09-20 15:36:43 +03:00
3853d83d73 go : add tests and update bindings (#2425)
Update Go version to ^1.23, the actions/setup-go
to v5, actions/checkout to v4 and
github.com/stretchr/testify to v1.9.0.

Add test cases for the following model
struct methods:
 - New
 - Close
 - NewContext
 - IsMultilingual
 - Languages

Add test cases for the following context
struct methods:
 - SetLanguage
 - IsMultilingual
 - Language
 - Process
2024-09-20 15:36:12 +03:00
5b1ce40fa8 server : use OS-generated temp file name for converted files (#2419) 2024-09-17 15:56:32 +03:00
049b3a0e53 go : fix CUDA build (#2416)
* Fixed go cuda bindings building

* Added note to go bindings Readme to build using cuda support

---------

Co-authored-by: Binozo <entwickler@binozoworks.de>
2024-09-15 12:23:56 +03:00
a551933542 cann : add Ascend NPU instructions (#2410) 2024-09-11 15:59:24 +03:00
5caa19240d cmake: Fix libdir value in pkgconfig file (#2407)
Depending on the OS the lib dir can vary, on Fedora for instance it is
"${prefix}/lib64". Instead of hard-coding the directory name, let CMake fill
this variable for us.
2024-09-07 11:18:17 +03:00
5236f02784 revert : cmake : set MSVC to use UTF-8 on source files (#2346)
This reverts commit c96906d84d.
2024-09-02 15:24:50 +03:00
2abaf19e0d sync : ggml 2024-09-02 15:24:50 +03:00
6eb7a0ffbd ggml: fix ggml_graph_cpy undefined behavior (ggml/943) 2024-09-02 15:24:50 +03:00
e8f0f9b5f0 cann : fix doxy (ggml/0) 2024-09-02 15:24:50 +03:00
d8e24b877d vulkan : fix build (llama/0)
ggml-ci
2024-09-02 15:24:50 +03:00
cc68f31577 cuda : mark BF16 CONT as unsupported 2024-09-02 15:24:50 +03:00
4a4a52bf98 ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934)
* ggml_cont: fix issue with transposed tensors when one dimension is 1

when using multiple threads, it is not enough
to check for the tensors to be contiguous for
ggml_compute_forward_dup_same_cont to work correctly.
The tensors strides also need to match.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Add ggml_cont tests

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Remove dead code

it isn't possible to reach this code because
all these functions are invoked by ggml_compute_forward_dup
if and only if src0->type != dst->type

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Make ggml_compute_forward_dup_same_cont work with contiguous tensors

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

---------

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-02 15:24:50 +03:00
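The ggml_cont fix says contiguity alone is not enough; the strides must match as well. The sketch below shows why, assuming a contiguity rule that skips size-1 dimensions. `tensor_view`, `view_is_contiguous` and `can_use_same_cont_copy` are hypothetical. A transposed 4x1 tensor is then "contiguous", yet its dim-0 stride is 16 bytes rather than 4. A copy that computes byte offsets as index × source stride would therefore read the wrong addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-D view: ne = elements per dim, nb = stride in bytes. */
typedef struct {
    int64_t ne[4];
    size_t  nb[4];
    size_t  elem_size;
} tensor_view;

/* Contiguous if the non-trivial dims are densely packed in order; the
 * stride of a size-1 dim is never used to address data, so it is skipped. */
static bool view_is_contiguous(const tensor_view *t) {
    size_t next = t->elem_size;
    for (int i = 0; i < 4; ++i) {
        if (t->ne[i] == 1) continue;
        if (t->nb[i] != next) return false;
        next *= (size_t) t->ne[i];
    }
    return true;
}

/* The fast same-layout copy additionally requires identical shapes and
 * strides, so offsets derived from one tensor are valid for the other. */
static bool can_use_same_cont_copy(const tensor_view *src, const tensor_view *dst) {
    if (!view_is_contiguous(src) || !view_is_contiguous(dst)) return false;
    for (int i = 0; i < 4; ++i) {
        if (src->ne[i] != dst->ne[i] || src->nb[i] != dst->nb[i]) return false;
    }
    return true;
}
```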
c96906d84d cmake : set MSVC to use UTF-8 on source files (#2346) 2024-08-30 14:04:04 +03:00
9600fc3eb1 readme : remove invalid flag from Python example (#2396)
* Update README.md

Fix broken C-style API link

* Update whisper_processor.py

Update examples/python/whisper_processor.py to remove nonexistent flag "-np" from subprocess.Popen call.

* Add pywhispercpp to the Pybind11 Python wrapper list

abdeladim-s/pywhispercpp wasn't added to the list / was removed at some point (?)

It was referenced in issue #9, so I feel like it's worthy of being added as it's the first if not one of the first Python wrappers for whisper.cpp
2024-08-30 14:00:38 +03:00
e2e55a6fed readme : fix link (#2394) 2024-08-30 13:58:22 +03:00
c4e1861d2c go : add beamsize/entropythold/maxcontext to context interface (#2350)
* feat(go binding): add beamsize/entropythold/maxcontext to context interface

fixes: #2349

* fix go building build

* fix dynamic link .so and header.h

* remove LD_LIBRARY_PATH

* remove ggml obj from whisper dynamic lib

* drop LIB_GGML
2024-08-28 17:09:01 +03:00
da9809f243 talk-llama : sync llama.cpp 2024-08-28 13:22:20 +03:00
9d754a56cf whisper : update FA call 2024-08-28 13:22:20 +03:00
8cc90a0e80 sync : ggml 2024-08-28 13:22:20 +03:00
82b5c56f63 sync : vulkan (skip) (llama/0) 2024-08-28 13:22:20 +03:00
b2ad484c89 ggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192) 2024-08-28 13:22:20 +03:00
d96a17848f metal : separate scale and mask from QKT in FA kernel (llama/9189)
* metal : separate scale and mask from QKT in FA kernel

* metal : ne01 check no longer necessary

* metal : keep data in local memory
2024-08-28 13:22:20 +03:00
0e7798677a ggml : add SSM Metal kernels (llama/8546)
* ggml : add ggml_ssm_conv metal impl

* ggml : add ssm_scan metal impl

ggml-ci
2024-08-28 13:22:20 +03:00
58a36d2e3b metal : gemma2 flash attention support (llama/9159) 2024-08-28 13:22:20 +03:00
24d8534bd8 CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel

* disable logit softcapping tests on Metal

* remove metal check
2024-08-28 13:22:20 +03:00
9b16ddd3a5 Add a space to suppress a cmake warning (llama/9133) 2024-08-28 13:22:20 +03:00
32f88af17b Add oneDNN primitive support (llama/9091)
* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc
2024-08-28 13:22:20 +03:00
9bf7250bf9 llama : simplify Mamba with advanced batch splits (llama/8526)
* llama : advanced batch splits

This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.

* llama : apply suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

This will make it easier to more cleanly support RWKV-v6 and Mamba-2.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-28 13:22:20 +03:00
17e49d3ab2 fallback mmvq (llama/9088)
* fallback mmvq to mul_mat

* mmvq in cuda path

* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>

---------

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>
2024-08-28 13:22:20 +03:00
58b725282a Fix SYCL im2col and convert Overflow with Large Dims (llama/9052)
* sycl: fix im2col overflow and sync with cuda

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* sycl: fix convert overflow

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* sycl: fix convert and dequantize

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* sycl: fix ib in dmmv

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* sycl:refine convert

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* sycl: move downsample global_range into common

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* test: add im2col and convert test cases

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* test: make new cases only in sycl

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

---------

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-08-28 13:22:20 +03:00
7e59afa1e0 rpc : print error message when failed to connect endpoint (llama/9042) 2024-08-28 13:22:20 +03:00
5ac022140e rpc : prevent crashes on invalid input (llama/9040)
Add more checks which prevent RPC server from crashing if invalid input
is received from client
2024-08-28 13:22:20 +03:00
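Checks that keep an RPC server from crashing on malicious input usually validate every client-supplied field before it is used. The sketch below is hypothetical and does not show the ggml-rpc wire format: `rpc_tensor_msg`, `SKETCH_TYPE_COUNT` and `rpc_msg_is_valid` are invented for the example. The bounds check is written so that a huge attacker-controlled offset cannot wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tensor description as received from a client. */
typedef struct {
    uint32_t type;   /* element type id */
    uint64_t offset; /* byte offset into the target buffer */
    uint64_t size;   /* number of bytes to read or write */
} rpc_tensor_msg;

#define SKETCH_TYPE_COUNT 40u /* hypothetical number of valid type ids */

/* `size > buf_size - offset` is used instead of `offset + size > buf_size`,
 * because the latter can overflow and wrap to a small value. */
static bool rpc_msg_is_valid(const rpc_tensor_msg *m, uint64_t buf_size) {
    if (m->type >= SKETCH_TYPE_COUNT) return false;
    if (m->offset > buf_size) return false;
    if (m->size > buf_size - m->offset) return false;
    return true;
}
```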
0eaa67280c ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)
* ggml : Dynamic ggml_sched_max_splits based on graph_size

* Fixed and readded debug code for causes
2024-08-28 13:22:20 +03:00
5a62fdb735 cmake : remove unused option GGML_CURL (llama/9011) 2024-08-28 13:22:20 +03:00
60098d6204 ggml : move rope type enum to ggml.h (llama/8949)
* ggml : move rope type enum to ggml.h

This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.

The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.

Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.

* squash! ggml : move rope type enum to ggml.h

This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and moves them back into the llama_rope_type enum.

I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.

* squash! ggml : move rope type enum to ggml.h

This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.

* squash! ggml : move rope type enum to ggml.h

This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.

* squash! ggml : move rope type enum to ggml.h

This commit fixes the editorconfig-checker warnings.

* squash! ggml : move rope type enum to ggml.h

Update comment for ggml_rope function.

* Revert "squash! ggml : move rope type enum to ggml.h"

This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6.

* squash! ggml : move rope type enum to ggml.h

Add GGML_ROPE_TYPE_NEOX to rope_common.comp.

* remove extra line

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-08-28 13:22:20 +03:00
317293e6a7 ggml: fix div-by-zero (llama/9003)
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724

In order to access the above bug you need to login using one of the
emails in
https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5

Signed-off-by: David Korczynski <david@adalogics.com>
2024-08-28 13:22:20 +03:00
488a966c07 Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943)
* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduces a full pipeline sync which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader read/writes and transfers seems to be sufficient looking at the code which either launches compute kernels or copies tensors.

* Fix small typo

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-08-28 13:22:20 +03:00
8954769aa2 feat: ref. cross entropy, add CUDA, fix grad test (ggml/929) 2024-08-28 13:22:20 +03:00
df06468d9e ggml: remove bad assert (ggml/928) 2024-08-28 13:22:20 +03:00
1fbd828a5d examples: add MNIST training + missing ops 2024-08-28 13:22:20 +03:00
d2986f8b07 models : add support for wget2 for fedora (#2387) 2024-08-28 11:46:01 +03:00
8bfa8574e2 readme : update the path to bench.py (#2386) 2024-08-28 11:45:05 +03:00
376567bf4f readme : fix typo (#2383) 2024-08-28 11:42:18 +03:00
c0fd64a9c0 readme : fix broken links in implementation details section (#2382) 2024-08-28 11:41:51 +03:00
6e9596f6de whisper : fix compile warning for unused params 2024-08-28 11:40:11 +03:00
9e3c5345cd sync : ggml vulkan (ggml/0)
ggml-ci
2024-08-21 11:07:13 +03:00
b6c05ce82f yolo : add backend support (ggml/924)
* yolo : add backend support

* metal : add sub and sqrt kernels

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-21 11:07:13 +03:00
52c80cac00 ggml : fix typo in ggml-quants.c comment (ggml/922) 2024-08-21 11:07:13 +03:00
3643120690 feat: add new sin and cos operators (ggml/919)
* ggml : add sin/cos operators

* ggml-cuda : add sin/cos operators

* ggml : add corresponding tests for sin/cos

* ggml : add backward computation for sin/cos operators

* ggml-vulkan : add sin/cos operators

* ggml-vulkan : add sin/cos shader source

* metal : add sin, cos

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-21 11:07:13 +03:00
d65786ea54 readme : fix broken links (#2358)
For whisper.cpp and whisper.h files
2024-08-20 10:57:45 +03:00
7f78675008 examples : use colorblind friendly TTY color scheme (#2360)
This change updates the -pc flag, so that a new xterm256 color scheme is
used. This color scheme is believed to be better for three reasons:

1. It should be friendlier to the colorblind. The scheme was designed by
   Paul Tol (see: https://personal.sron.nl/~pault/). TensorBoard has used it
   since 2017, so it's already popular in the machine learning community

2. It should appear to be the same colors as before to people who aren't
   colorblind, i.e. it's still a red-green spectrum like before, but lightly modified

3. It is readable in both white and black background terminals. The neon
   colors before were probably a bit too intense for white backgrounds.
2024-08-20 10:49:10 +03:00
22fcd5fd11 sync : ggml 2024-08-12 11:59:15 +03:00
993f0df419 ggml : support forward pass broadcasting in ggml_sub (ggml/914)
* ggml: support forward pass broadcasting in ggml_sub

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32

The check is already performed in ggml_sub_impl

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

---------

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-08-12 11:58:49 +03:00
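Forward broadcasting in a subtraction usually means that the second operand is repeated along each dimension where the first one is a whole multiple of it. Indexing the second operand with a modulo achieves this. Below is a 2-D, row-major sketch of that idea. `sub_broadcast` is a hypothetical name, and the sketch makes no claim to match ggml's 4-D kernel.

```c
#include <stdint.h>

/* dst = a - b, with b repeated so that it covers a. Returns -1 if a's
 * extents are not whole multiples of b's. */
static int sub_broadcast(const float *a, int64_t a_rows, int64_t a_cols,
                         const float *b, int64_t b_rows, int64_t b_cols,
                         float *dst) {
    if (b_rows <= 0 || b_cols <= 0 || a_rows % b_rows != 0 || a_cols % b_cols != 0) return -1;
    for (int64_t r = 0; r < a_rows; ++r) {
        const float *brow = b + (r % b_rows) * b_cols; /* repeat b's rows */
        for (int64_t c = 0; c < a_cols; ++c) {
            dst[r * a_cols + c] = a[r * a_cols + c] - brow[c % b_cols]; /* repeat b's cols */
        }
    }
    return 0;
}
```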
9b1788483c metal : fix uninitialized abort_callback (llama/8968) 2024-08-12 11:58:49 +03:00
ad37d26983 rpc : sanitize tensor data + warnings (llama/0)
Co-authored-by: slaren <slarengh@gmail.com>
2024-08-12 11:58:46 +03:00
81c999fe0a cann : add Ascend NPU support (#2336)
* enable Ascend NPU in src/whisper.cpp
  * sync test-backend-ops with llama.cpp
2024-08-09 15:21:56 +03:00
4b7de08bfd whisper : fix compile warning (#0) 2024-08-09 09:58:16 +03:00
4b9c4de1ad sync : ggml 2024-08-09 09:58:16 +03:00
be88ee1d75 ggml : add CANN backend (llama/0)
ggml-ci
2024-08-09 09:58:16 +03:00
3ab19c744e scripts : sync cann 2024-08-09 09:58:16 +03:00
6eac06759b ci : disable ruby workflow (#0) 2024-08-08 22:48:46 +03:00
2e9a5bd2c4 ci : try to fix FreeBSD (#0) 2024-08-08 22:48:46 +03:00
58323bf8ed build : fix aarch64 (#0) 2024-08-08 22:48:46 +03:00
22058f2dbc talk-llama : sync llama.cpp 2024-08-08 22:48:46 +03:00
5b7979a1e6 sync : ggml 2024-08-08 22:48:46 +03:00
ee14c02365 ggml-backend : fix async copy from CPU (llama/8897)
* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same
2024-08-08 22:48:46 +03:00
ab39dd34e1 Updated SYCL device filtering (llama/8901)
* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme
2024-08-08 22:48:46 +03:00
b1348d3530 CUDA/HIP: fix tests/test-backend-ops (llama/8896) 2024-08-08 22:48:46 +03:00
90641b5cf4 CUDA: fix padding logic for FP16/FP32 (llama/8884) 2024-08-08 22:48:46 +03:00
4160b930f1 ggml : add epsilon as a parameter for group_norm (llama/8818)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-08 22:48:46 +03:00
7a96e661e4 ggml : fix overflows in elu function (llama/8866)
It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.
2024-08-08 22:48:46 +03:00
a902fb4ab2 ggml : reading the runtime sve config of the cpu (llama/8709)
* ggml : reading the runtime sve config of the cpu

* change to one time init to prevent performance drop

* prefix variable to avoid possible conflicts

* revert xxhash fix and add brackets

---------

Co-authored-by: domke <673751-domke@users.noreply.gitlab.com>
2024-08-08 22:48:46 +03:00
6cb38c3673 Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-08-08 22:48:46 +03:00
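The reference fp32-to-bf16 conversion rounds to nearest-even instead of truncating, and it does not flush subnormals to zero. The sketch below shows that rounding; it is believed to match ggml's approach but should be treated as an illustration. NaNs are forced quiet so that rounding cannot turn them into infinities. `fp32_to_bf16` and `bf16_to_fp32` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

static uint16_t fp32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u) {       /* NaN */
        return (uint16_t) ((u >> 16) | 64);       /* set the bf16 quiet bit */
    }
    /* Add 0x7fff plus the lowest kept bit, then truncate: ties go to even.
     * Subnormal inputs go through the same path (no flush to zero). */
    return (uint16_t) ((u + (0x7fffu + ((u >> 16) & 1u))) >> 16);
}

static float bf16_to_fp32(uint16_t h) {
    const uint32_t u = (uint32_t) h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```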
9cf14ebcbc Fixing wrong VDR iq4nl value (llama/8812) 2024-08-08 22:48:46 +03:00
8e39ee171f ggml-cuda: Adding support for unified memory (llama/8035)
* Adding support for unified memory

* adding again the documentation about unified memory

* refactoring: Moved the unified memory code in the correct location.

* Fixed compilation error when using hipblas

* cleaning up the documentation

* Updating the documentation

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <matteo.serva@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-08-08 22:48:46 +03:00
d26250f78c Build: Only include execinfo.h on linux systems that support it (llama/8783)
* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one
2024-08-08 22:48:46 +03:00
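Gating `execinfo.h` on glibc relies on the `__GLIBC__` macro. That macro only exists after some libc header has been included, because glibc defines it in `<features.h>`. This sketch uses the real glibc functions `backtrace` and `backtrace_symbols_fd`; the wrapper name `print_backtrace_if_supported` is hypothetical.

```c
#include <stdio.h> /* on glibc this pulls in <features.h>, which defines __GLIBC__ */

#if defined(__linux__) && defined(__GLIBC__)
#include <execinfo.h>
#include <unistd.h>
#endif

/* Print the call stack to stderr on glibc Linux; elsewhere, report that
 * it is unavailable. Returns the number of frames printed. */
static int print_backtrace_if_supported(void) {
#if defined(__linux__) && defined(__GLIBC__)
    void *frames[64];
    const int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
#else
    fputs("backtrace not available on this platform\n", stderr);
    return 0;
#endif
}
```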
5218ea21b8 cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)
* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X

* update asserts

* only use dmmv for supported types

* add test
2024-08-08 22:48:46 +03:00
e60be821ce added android implementation of ggml_print_backtrace_symbols (llama/8751)
* added android implementation of ggml_print_backtrace_symbols

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-08-08 22:48:46 +03:00
19708df884 cann: update cmake (llama/8765) 2024-08-08 22:48:46 +03:00
3f190addda Add TIMESTEP_EMBEDDING OP (llama/8707)
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-08-08 22:48:46 +03:00
b355ee7cfa ggml: bugfix: fix the inactive elements is agnostic for risc-v vector (llama/8748)
In this code, we want inactive elements (where mask[i] is false) to keep
the values they previously held, so we should use the mask-undisturbed
policy. With the default mask-agnostic policy of the RVV intrinsics, those
elements may either keep their values or be overwritten with all 1s.

Co-authored-by: carter.li <carter.li@starfivetech.com>
2024-08-08 22:48:46 +03:00
49ac8872b4 cuda : organize vendor-specific headers into vendors directory (llama/8746)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-08-08 22:48:46 +03:00
8ef98ae7e3 add conv support (llama/8688) 2024-08-08 22:48:46 +03:00
e471adcfa5 feat: Support Moore Threads GPU (llama/8383)
* Update doc for MUSA

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* CUDA => MUSA

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Fix CI build failure

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-08-08 22:48:46 +03:00
aa816c922c ggml : ignore more msvc warnings (ggml/906) 2024-08-08 22:48:46 +03:00
b3264eb266 metal : fix struct name (ggml/912)
ggml-ci
2024-08-08 22:48:46 +03:00
eb2eb87a58 metal : add abort callback (ggml/905) 2024-08-08 22:48:46 +03:00
83fcb0e486 vulkan : implement Stable Diffusion operators (ggml/904)
* Fix Vulkan repeat op

* Implement Vulkan concat op

* Delete old Vulkan shader generator

* Implement Vulkan im2col op

* Implement Vulkan unary gelu_quick op

* Implement Vulkan group_norm op

* Implement Vulkan timestep_embedding op

* Implement Vulkan upscale op

* Fix Vulkan vk_context tensor extra index issue

* Fix Vulkan matmul shader parameter bug

* Properly fix Vulkan matmul shader parameter bug

* Add Vulkan ADD f16 + f32 -> f16 operator support

* Implement Vulkan tanh op

* Fix Vulkan group count too large Validation error on non-Nvidia GPUs

* Throw error when too much memory is requested

* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs

* Fix matmul MMQ condition

* Implement Vulkan pad op

* Fix Vulkan crash when tensor is used multiple times in a compute graph

* Add Vulkan CONCAT f16 + f16 -> f16 op

* Add Vulkan LEAKY_RELU op
2024-08-08 22:48:46 +03:00
f7bb412878 ggml : move c parameter comment to ggml_rope_ext (ggml/901)
This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-08-08 22:48:46 +03:00
ef6dcf0d0c ggml : resolve sync conflicts (ggml/0)
ggml-ci
2024-08-08 22:48:46 +03:00
c7ea4fd235 common : handle new quant types (ggml/0) 2024-08-08 22:48:46 +03:00
525f190917 ggml : add ggml-aarch64 (ggml/0) 2024-08-08 22:48:46 +03:00
dd916a2852 ggml : reduce hash table reset cost (llama/8698)
* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string
2024-08-08 22:48:46 +03:00
0620fe00ec ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_on_alloc`.

This fixes it by bailing out if no context is found.
2024-08-08 22:48:46 +03:00
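The commit describes `ggml_init` failing when no unused context is left. The fix is for the caller to check for NULL before touching the context. The sketch assumes a fixed pool of context slots. `sketch_context`, `sketch_init`, `sketch_free` and `load_something` are hypothetical stand-ins, not ggml's API, and the pool is not thread-safe.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_CONTEXTS 4 /* hypothetical pool size */

typedef struct { bool used; size_t mem_size; } sketch_context;

static sketch_context g_pool[SKETCH_MAX_CONTEXTS];

/* Returns a free slot, or NULL when every slot is in use. */
static sketch_context *sketch_init(size_t mem_size) {
    for (int i = 0; i < SKETCH_MAX_CONTEXTS; ++i) {
        if (!g_pool[i].used) {
            g_pool[i].used = true;
            g_pool[i].mem_size = mem_size;
            return &g_pool[i];
        }
    }
    return NULL;
}

static void sketch_free(sketch_context *ctx) {
    if (ctx) ctx->used = false;
}

/* Caller pattern: bail out on NULL instead of dereferencing it later. */
static int load_something(size_t mem_size, sketch_context **out) {
    sketch_context *ctx = sketch_init(mem_size);
    if (ctx == NULL) return -1;
    *out = ctx;
    return 0;
}
```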
31d0a9a14f fix multi-gpu issue on sycl (llama/8554)
---------

Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-08-08 22:48:46 +03:00
c06970dd72 ggml : add and use ggml_cpu_has_llamafile() (llama/8664) 2024-08-08 22:48:46 +03:00
7598acf525 Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667) 2024-08-08 22:48:46 +03:00
43ddfce969 sycl : Add support for non-release DPC++ & oneMKL (llama/8644)
* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-08-08 22:48:46 +03:00
a7e6d2cd9c Vulkan IQ4_NL Support (llama/8613)
* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
2024-08-08 22:48:46 +03:00
86506b0c5c Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)
The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
2024-08-08 22:48:46 +03:00
11182fae34 fix scratch size of softmax (llama/8642) 2024-08-08 22:48:46 +03:00
0bc8bffe1d ggml: fix compile error for RISC-V (llama/8623) 2024-08-08 22:48:46 +03:00
8c4f30497a CUDA: MMQ code deduplication + iquant support (llama/8495)
* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build
2024-08-08 22:48:46 +03:00
b1ee3a8444 gguf : handle null name during init (llama/8587) 2024-08-08 22:48:46 +03:00
be9a16fd3f ggml : fix quant dot product with odd number of blocks (llama/8549)
* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (llama/8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-08 22:48:46 +03:00
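Several SIMD paths unroll the block loop by two, so with an odd number of blocks the last block needs a separate tail. A scalar sketch of that loop shape, assuming plain float blocks of 32 values (`BLOCK` and `dot_blocks` are hypothetical; the real kernels operate on quantized blocks):

```c
#include <stddef.h>

#define BLOCK 32 /* hypothetical block size, in elements */

/* Dot product over nb blocks: two blocks per iteration, as an unrolled
 * SIMD loop would do, followed by a tail for a leftover block. */
static float dot_blocks(const float * x, const float * y, size_t nb) {
    float sum0 = 0.0f, sum1 = 0.0f;
    size_t ib = 0;
    for (; ib + 1 < nb; ib += 2) {
        for (size_t j = 0; j < BLOCK; j++) {
            sum0 += x[ib * BLOCK + j] * y[ib * BLOCK + j];
            sum1 += x[(ib + 1) * BLOCK + j] * y[(ib + 1) * BLOCK + j];
        }
    }
    /* Tail: without this loop an odd nb silently drops the last block. */
    for (; ib < nb; ib++) {
        for (size_t j = 0; j < BLOCK; j++) {
            sum0 += x[ib * BLOCK + j] * y[ib * BLOCK + j];
        }
    }
    return sum0 + sum1;
}
```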
f4d9a95b0f ggml : add friendlier error message to fopen errors (llama/8575)
* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.
2024-08-08 22:48:46 +03:00
a8ab3abe09 CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572) 2024-08-08 22:48:46 +03:00
fb6a835938 cmake : install all ggml public headers (llama/8480)
Co-authored-by: 65a <65a@65a.invalid>
2024-08-08 22:48:46 +03:00
8923bb4292 Add Ascend NPU backend (llama/6035)
* [CANN] Add Ascend NPU backend

Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.

CANN (Compute Architecture of Neural Networks), developed by
Huawei, is a heterogeneous computing architecture for AI.

Co-authored-by: wangshuai09 <391746016@qq.com>

* delete trailing whitespaces

* Modify the code based on review comment

* Rename LLAMA_CANN to GGML_CANN

* Make ggml-common.h private

* add ggml_cann prefix for acl funcs

* Add logging for CANN backend

* Delete Trailing whitespace

---------

Co-authored-by: wangshuai09 <391746016@qq.com>
2024-08-08 22:48:46 +03:00
fcba6aa352 make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515) 2024-08-08 22:48:46 +03:00
8807fe608b Refactor lora adapter support (llama/8332)
* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <slarengh@gmail.com>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-08-08 22:48:46 +03:00
3e94c7a81d add concat through dim 1/2 (llama/8483)
* add concat through dim 1/2
2024-08-08 22:48:46 +03:00
77af3254e1 Vulkan MMQ Fix (llama/8479)
* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error
2024-08-08 22:48:46 +03:00
d4b3cffec4 vulkan : cmake integration (llama/8119)
* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg
2024-08-08 22:48:46 +03:00
b852a4c5ca metal : template-ify some of the kernels (llama/8447)
ggml-ci
2024-08-08 22:48:46 +03:00
2157abaab4 ggml : minor naming changes (llama/8433)
* ggml : minor naming changes

ggml-ci

* ggml : use PRId64 [no ci]

* ggml : revert FA K/Q names
2024-08-08 22:48:46 +03:00
68d609a12c fix the mul_mat_id ut issues (llama/8427)
* fix part of mul_mat_id

* skip the bfloat16 sycl ut

Signed-off-by: Chen Xi <xi2chen@intel.com>

---------

Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
Co-authored-by: Chen Xi <xi2chen@intel.com>
2024-08-08 22:48:46 +03:00
5a8ae474f0 ggml : add NVPL BLAS support (ggml/8329) (llama/8425)
* ggml : add NVPL BLAS support

* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`

---------

Co-authored-by: ntukanov <ntukanov@nvidia.com>
2024-08-08 22:48:46 +03:00
84493d7f3e cuda : suppress 'noreturn' warn in no_device_code (llama/8414)
* cuda : suppress 'noreturn' warn in no_device_code

This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:

```console
/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
  346 | }
      | ^
```

The motivation for this is to reduce the number of warnings when
compiling with GGML_HIPBLAS=ON.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! cuda : suppress 'noreturn' warn in no_device_code

Update __trap macro instead of using a while loop to suppress the
warning.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-08-08 22:48:46 +03:00
15d71189e9 CUDA: optimize and refactor MMQ (llama/8416)
* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation
2024-08-08 22:48:46 +03:00
37e962580f Use multi_ptr to clean up deprecated warnings (llama/8256) 2024-08-08 22:48:46 +03:00
db0ea7a2f2 ggml : move sgemm sources to llamafile subfolder (llama/8394)
ggml-ci
2024-08-08 22:48:46 +03:00
5498b0e6c0 ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780)
* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files

* Arm AArch64: minor code refactoring for rebase

* Arm AArch64: minor code refactoring for resolving a build issue with cmake

* Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code change for resolving a build issue with server-windows

* retrigger checks

* Arm AArch64: minor code changes for rebase

* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits

* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig

* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code refactoring

* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat

* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat

* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types

* Arm AArch64: minor code refactoring

* Arm AArch64: minor code refactoring

* Arm AArch64: minor code refactoring

* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure

* Arm AArch64: remove a redundant comment

* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off

* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels

* Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type
2024-08-08 22:48:46 +03:00
2af4a52c39 sycl : Reenabled mmvq path for the SYCL Nvidia Backend (llama/8372)
* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend

* Reduced verbosity of comment
2024-08-08 22:48:46 +03:00
eee2fe882e sycl : fix powf call in device code (llama/8368) 2024-08-08 22:48:46 +03:00
0d1a11e5e2 ggml : loop tiling optimizations for scalar path (ggml/898)
Apply a loop tiling technique to the generic path, which provides
performance upside for ISAs with enough registers to take advantage
of it. Also helps the compiler optimize this path.
2024-08-08 22:48:46 +03:00
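Loop tiling here means computing a small tile of outputs at once, so each value loaded in the inner loop feeds several accumulators held in registers. A minimal scalar sketch, assuming row-major float matrices where `dst[i][j]` is the dot product of row `i` of `a` and row `j` of `b`, with a hypothetical 2x2 tile (`matmul_tiled` and `dot_row` are illustrative, not the ggml kernel):

```c
#include <stddef.h>

/* Reference dot product of two rows of length k. */
static float dot_row(const float * a, const float * b, size_t k) {
    float s = 0.0f;
    for (size_t l = 0; l < k; l++) s += a[l] * b[l];
    return s;
}

/* 2x2 tiling: four dot products share each pair of loads from a and b. */
static void matmul_tiled(const float * a, const float * b, float * dst,
                         size_t m, size_t n, size_t k) {
    size_t i = 0;
    for (; i + 1 < m; i += 2) {
        size_t j = 0;
        for (; j + 1 < n; j += 2) {
            float s00 = 0, s01 = 0, s10 = 0, s11 = 0;
            for (size_t l = 0; l < k; l++) {
                const float a0 = a[i * k + l], a1 = a[(i + 1) * k + l];
                const float b0 = b[j * k + l], b1 = b[(j + 1) * k + l];
                s00 += a0 * b0; s01 += a0 * b1;
                s10 += a1 * b0; s11 += a1 * b1;
            }
            dst[i * n + j] = s00;       dst[i * n + j + 1] = s01;
            dst[(i + 1) * n + j] = s10; dst[(i + 1) * n + j + 1] = s11;
        }
        for (; j < n; j++) { /* leftover column */
            dst[i * n + j] = dot_row(a + i * k, b + j * k, k);
            dst[(i + 1) * n + j] = dot_row(a + (i + 1) * k, b + j * k, k);
        }
    }
    for (; i < m; i++) { /* leftover row */
        for (size_t j = 0; j < n; j++) {
            dst[i * n + j] = dot_row(a + i * k, b + j * k, k);
        }
    }
}
```

Each accumulator still sums in the same order as `dot_row`, so the tiled result matches the untiled one exactly; the gain comes from fewer loads per multiply.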
b2ead7d6f4 ggml: add support for float16 input tensors in pooling operations (ggml/895)
* Add support for float16 tensors in 1d pooling operations

* Add support for float16 input tensors in 2d pooling operations

* code cleanup

remove unnecessary casting during srow ptr initialization

---------

Co-authored-by: vanaka11 <vanaka1189@gmail.com>
2024-08-08 22:48:46 +03:00
8da6fd4dff vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893)
This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.

Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
2024-08-08 22:48:46 +03:00
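The principle is that a destroy routine can safely run on a half-built object as long as every handle starts out as a null sentinel. A plain-C sketch of the pattern, using heap pointers in place of Vulkan handles (`struct buffer`, `buffer_create`, `buffer_destroy`, and the `fail_second` flag are hypothetical):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for vk_buffer_struct: two separately acquired resources. */
struct buffer {
    void * handle; /* plays the role of the VkBuffer */
    void * memory; /* plays the role of the VkDeviceMemory */
};

/* Releases whatever was acquired; members still at NULL are skipped. */
static void buffer_destroy(struct buffer * buf) {
    if (buf->memory != NULL) { free(buf->memory); buf->memory = NULL; }
    if (buf->handle != NULL) { free(buf->handle); buf->handle = NULL; }
}

/* fail_second simulates running out of device memory on the second allocation. */
static bool buffer_create(struct buffer * buf, size_t size, bool fail_second) {
    /* Set every member to the null sentinel before anything can fail. */
    buf->handle = NULL;
    buf->memory = NULL;

    buf->handle = malloc(16);
    if (buf->handle == NULL) {
        return false;
    }
    buf->memory = fail_second ? NULL : malloc(size);
    if (buf->memory == NULL) {
        buffer_destroy(buf); /* safe: memory is NULL, not stack garbage */
        return false;
    }
    return true;
}
```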
ab8ec9e940 cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885) 2024-08-08 22:48:46 +03:00
701265bf38 scripts : sync new files (#0) 2024-08-08 22:48:46 +03:00
fe36c90971 cmake : fix compile in xcode (#2311) 2024-08-05 09:48:26 +03:00
6739eb83c3 whisper : handle empty mel (#2324) 2024-07-27 20:35:04 +03:00
f68298ce06 whisper : use vulkan as gpu backend when available (#2302)
* ggml: use vulkan as gpu backend when available

Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>

* whisper: enable using vk as default buffer type

Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>

---------

Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
2024-07-16 10:21:09 +03:00
7ae885c1ef whisper : fix DTW assert (#2299) 2024-07-15 15:50:36 +03:00
d207c68822 cmake : use WHISPER_EXTRA_FLAGS (#2294) 2024-07-09 18:54:18 +03:00
16d72504fe cmake : allow external ggml 2024-07-09 11:38:15 +03:00
1c31f9d4a8 cmake : try to fix openvino build (#2281) 2024-07-08 15:36:51 +03:00
8ecb2f1f68 cmake : remove install of llama convert script [no ci] (#2266) 2024-07-08 14:53:55 +03:00
5226c3d45c make : remove llama prints [no ci] (#2265) 2024-07-08 14:53:55 +03:00
dbf9c15e30 talk-llama : sync llama.cpp 2024-07-08 14:53:55 +03:00
d3f6c34976 examples : fix compile warnings [no ci] (#0) 2024-07-08 14:53:55 +03:00
425e2910a3 sync : ggml 2024-07-08 14:53:55 +03:00
49868aa851 ggml : sync sycl (skip) (#0) 2024-07-08 14:53:55 +03:00
ff08e30ab5 scripts : fix sync scripts 2024-07-08 14:53:55 +03:00
95f2a191c0 ggml : remove unnecessary UNUSED macro call (ggml/880)
This commit removes an UNUSED macro call that is not needed as the
variable n0 is used in the code and will not produce a warning.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-08 14:53:55 +03:00
00422ec3cf cmake : add GGML_BUILD and GGML_SHARED macro definitions (llama/8281) 2024-07-08 14:53:55 +03:00
c5b05321e9 Enabled more data types for oneMKL gemm_batch (llama/8236) 2024-07-08 14:53:55 +03:00
5dc636a65a CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278) 2024-07-08 14:53:55 +03:00
73703a144f CUDA: revert part of the RDNA1 optimizations (llama/8309)
The change to launch_bounds caused a small performance drop of 25 t/s when running perplexity
2024-07-08 14:53:55 +03:00
e89fdceec2 CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311) 2024-07-08 14:53:55 +03:00
29a2739d27 Fix WARP_SIZE=16 bug of Intel GPU (llama/8266)
* fix group_norm ut

* split softmax

* fix softmax

* add concat support condition

* revert debug code

* move QK_WARP_SIZE to presets.hpp
2024-07-08 14:53:55 +03:00
ee6d17f6b4 rm get_work_group_size() by local cache for performance (llama/8286)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-07-08 14:53:55 +03:00
95e90823d9 Define and optimize RDNA1 (llama/8085) 2024-07-08 14:53:55 +03:00
005cc45df3 fix typo (llama/8267)
Co-authored-by: Judd <foldl@boxvest.com>
2024-07-08 14:53:55 +03:00
c2c60dc9ba Removes multiple newlines at the end of files that were breaking the editorconfig step of CI. (llama/8258) 2024-07-08 14:53:55 +03:00
4af3194b7c cuda : update supports_op for matrix multiplication (llama/8245) 2024-07-08 14:53:55 +03:00
4a2ba1a065 Fix win build conflict of math library (llama/8230)
* fix win build conflict of math library

* fix the condition: !(win32 & SYCL)

* revert warp_size=16
2024-07-08 14:53:55 +03:00
f096cc6807 Fix the sub group size of Intel (llama/8106)
* use warp_size macro for all sycl kernels

* fix mask of permute_sub_group_by_xor

* fix rms_norm with correct warp number

* fix rms_norm_f32/group_norm_f32

* move norm to norm.cpp file

* fix quantize bug

* fix mmvq's batch size
2024-07-08 14:53:55 +03:00
e4bc83ab47 CUDA: refactor and optimize IQ MMVQ (llama/8215)
* CUDA: refactor and optimize IQ MMVQ

* uint -> uint32_t

* __dp4a -> ggml_cuda_dp4a

* remove MIN_CC_DP4A checks

* change default

* try CI fix
2024-07-08 14:53:55 +03:00
db7e0dbe6e Update SYCL-Rope op and Refactor (llama/8157)
* align with rope.cu and move sycl-op to a single file
2024-07-08 14:53:55 +03:00
bf88c94da9 CUDA: fix MMQ stream-k for --split-mode row (llama/8167) 2024-07-08 14:53:55 +03:00
3eea171cab feat: cuda implementation for ggml_conv_transpose_1d (ggml/854)
* conv transpose 1d passing test for 1d input and kernel

* working for different input and output channel counts, added test for variable stride

* initial draft appears to work with stride other than 1

* working with all old and new conv1d tests

* added a test for large tensors

* removed use cuda hardcoding

* restored test-conv-transpose.c

* removed unused arguments, and fixed bug where test failure would cause subsequent tests to fail

* fixed accumulator bug

* added test to test-backend-ops

* fixed mistake

* addressed review

* fixed includes

* removed blank lines

* style and warning fixes

* return failure when test fails

* fix supports_op

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-07-08 14:53:55 +03:00
64a56ebf13 ci : disable java build 2024-07-08 14:26:59 +03:00
bec9836849 server : add inference path to make OAI API compatible (#2270) 2024-07-08 14:24:58 +03:00
c118733a29 sync : ggml + fix sync script 2024-06-26 23:20:19 +03:00
bb3dd45524 make : disable CUDA graphs 2024-06-26 23:20:13 +03:00
04e7fa6f4f ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140) 2024-06-26 23:18:11 +03:00
9f7f36d4c9 make : disable CUDA mel build 2024-06-26 22:25:25 +03:00
4a62efbb95 cmake : minor fixes 2024-06-26 21:42:39 +03:00
0a55a70b9b make : fix missing -O3
same as https://github.com/ggerganov/llama.cpp/pull/8143
2024-06-26 21:21:12 +03:00
dc8cc2dd6f whisper : disable CUDA mel + fix FFMPEG 2024-06-26 20:11:38 +03:00
3efedb9511 sync : ggml 2024-06-26 19:40:23 +03:00
e30c679928 whisper : reorganize source code + improve CMake (#2256)
* scripts : update sync [no ci]

* files : reorganize [no ci]

* sync : llama.cpp

* cmake : link math library

* cmake : build normal ggml library

* files : move headers to include

* objc : fix path to ggml-metal.h

* ci : fix WHISPER_CUDA -> GGML_CUDA

* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00
bf4cb4abad whisper : optimize fft() function (#2242)
Co-authored-by: Mike Fan <60965742+mike-fzy@users.noreply.github.com>
2024-06-18 18:10:33 +03:00
e293f17d34 talk-llama : sync llama.cpp 2024-06-18 09:45:37 +03:00
5d950c4b8d whisper : use ggml_backend_sched (#2239)
* whisper : use ggml_backend_sched (wip)

* use sched in whisper_allocr

* whisper : single backend in whisper_context

* whisper : remove whisper_state->backends_used

* whisper : remove whisper_context->backend

* whisper : reset scheduler after init

* whisper : fix external encoder (e.g. CoreML)

* whisper : cleanup

* whisper : handle null GPU buffer types + fix sycl

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-18 09:39:40 +03:00
820446e230 fix : remove extra files 2024-06-18 09:39:40 +03:00
54d5823ebe scripts : sync ggml-blas 2024-06-18 09:39:40 +03:00
5181494e9f build : update make / cmake 2024-06-18 09:39:40 +03:00
4a6e6e8b30 sync : ggml 2024-06-18 09:39:40 +03:00
de29b193f6 move BLAS to a separate backend (cont) (llama/6210)
ggml-ci
2024-06-18 09:39:40 +03:00
922971041b Vulkan Shader Refactor, Memory Debugging Option (llama/7947)
* Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into vulkan-shaders directory

* Improve debug log code

* Add memory debug output option

* Fix flake8

* Fix unnecessarily high llama-3 VRAM use
2024-06-18 09:39:40 +03:00
63a767a134 scripts : stop sync whisper example from ggml 2024-06-18 09:39:40 +03:00
30841fa786 cmake : fix sycl build (#0) 2024-06-16 18:19:48 +03:00
3b1ac03828 ggml : remove OpenCL (#0) 2024-06-16 18:19:48 +03:00
990de617b5 sycl : sync (#0) 2024-06-16 18:19:48 +03:00
6975600b4b cuda : enable CUDA graphs (#0) 2024-06-16 18:19:48 +03:00
061eeb9f61 talk-llama : sync llama.cpp 2024-06-16 18:19:48 +03:00
4942b1b428 cmake : fix CUDA build (#0) 2024-06-16 18:19:48 +03:00
3c7cc5c437 sync : ggml
ggml-ci
2024-06-16 18:19:48 +03:00
5cd42ee2cc ggml : fix and optimize ppc64le (ggml/849)
* fix compile issues introduced by loongarch_asx

* restore quant changes to merge

* fix compile issues introduced by loongarch_asx

* further optimize by using vec_msum & vec_sum4s on ppc64le
2024-06-16 18:19:48 +03:00
ee718f3da6 ggml : remove duplicate include of ggml-common.h (ggml/853)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-06-16 18:19:48 +03:00
63eac1f608 remove global variables (llama/7710)
* separate DPCT helpers outside

* replace global variables with context

* remove useless extra

* update mul_mat condition

* remove duplicate buft initialization

* remove duplicate extra and global work group size

* remove useless backend check

* remove duplicated extras

* use macro for group_size and remove cuda-related
2024-06-16 18:19:48 +03:00
b17ba2815b CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
* CUDA: faster q2_K, q3_K MMQ + int8 tensor cores

* try CI fix

* try CI fix

* try CI fix

* fix data race

* revert q2_K precision related changes
2024-06-16 18:19:48 +03:00
7a489af2f3 metal : utilize max shared memory for mul_mat_id (llama/7935) 2024-06-16 18:19:48 +03:00
4a4ea13d6d rpc : fix ggml_backend_rpc_supports_buft() (llama/7918) 2024-06-16 18:19:48 +03:00
174a461fc6 move BLAS to a separate backend (llama/6210)
* move BLAS to a separate backend

* rename GGML_USE_OPENBLAS to GGML_USE_BLAS

* alloc : reuse same buffer when the same buffer type is used multiple times

* set number of threads automatically for openblas and blis

* sched : print assignments when GGML_SCHED_DEBUG env variable is set

* sched : allow ops with weights on an incompatible buffer type

This will cause the weight to be copied to a backend that supports the
op, which is very costly. The weight should have been stored in a buffer
of a backend that can run the op, but llama.cpp cannot do this
automatically at the moment.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-16 18:19:48 +03:00
d8b7a24bc9 CUDA: fix broken oob check for FA vec f32 kernel (llama/7904) 2024-06-16 18:19:48 +03:00
acf3832c9c tests : add non-cont unary tests (llama/7857)
* tests : add non-cont unary tests

* ggml : update unary asserts and "supports_op"

ggml-ci
2024-06-16 18:19:48 +03:00
d29ac44303 ggml : improve ggml_is_contiguous logic (llama/7856)
* ggml : improve ggml_is_contiguous logic

ggml-ci

* ggml : support more contiguous cases

ggml-ci
2024-06-16 18:19:48 +03:00
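A ggml tensor is described by element counts `ne[i]` and byte strides `nb[i]`. One way to accept more contiguous cases is to ignore the stride of any dimension with a single element, since it is never stepped over. A minimal sketch of such a check, assuming 4 dimensions and a non-quantized element type (`is_contiguous_sketch` is hypothetical, not the ggml function):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ne: elements per dimension, nb: stride in bytes per dimension. */
static bool is_contiguous_sketch(const int64_t ne[4], const size_t nb[4], size_t type_size) {
    size_t expected = type_size; /* stride a fully packed layout would have */
    for (int i = 0; i < 4; i++) {
        /* A size-1 dimension is never stepped over, so its stride
         * cannot affect where the data lives. */
        if (ne[i] != 1 && nb[i] != expected) {
            return false;
        }
        expected *= (size_t) ne[i];
    }
    return true;
}
```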
12638dfef0 vulkan: select only one device for single gpu with multiple drivers (llama/7582) 2024-06-16 18:19:48 +03:00
f100b3b523 Update Vulkan RoPE implementation (llama/7818)
* Update Vulkan RoPE implementation

* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception

Minor fixes

* Fix segfault when running out of VRAM

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-16 18:19:48 +03:00
a99e213a82 CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860) 2024-06-16 18:19:48 +03:00
7483d2b61c CUDA: use tensor cores for MMQ (llama/7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early
2024-06-16 18:19:48 +03:00
1fe5948227 use the correct SYCL context for host USM allocations (llama/7777)
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-16 18:19:48 +03:00
760497e1ab CUDA: revise q8_1 data layout for mul_mat_q (llama/7824) 2024-06-16 18:19:48 +03:00
b172e7714c vulkan : reuse parent extra for views (llama/7806)
* vulkan : reuse parent extra for views

* Fix validation error when multiple compute contexts are used in a graph

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-06-16 18:19:48 +03:00
dc01aadb18 fix softmax r2r result wrong issue (llama/7811) 2024-06-16 18:19:48 +03:00
e08c62149b CUDA: refactor mmq, dmmv, mmvq (llama/7716)
* CUDA: refactor mmq, dmmv, mmvq

* fix out-of-bounds write

* struct for qk, qr, qi

* fix cmake build

* mmq_type_traits
2024-06-16 18:19:48 +03:00
abab4500fa ggml : refactor rope norm/neox (llama/7634)
* ggml : unify rope norm/neox (CPU)

* ggml : fix compile warning

* ggml : remove GLM rope mode

ggml-ci

* metal : better rope implementation

ggml-ci

* cuda : better rope implementation

ggml-ci

* naming : n_orig_ctx -> n_ctx_orig

ggml-ci

* dev : add reminders to update backends

ggml-ci

* vulkan : fix ggml_rope_ext() usage

* cuda : fix array size + indents

ggml-ci
2024-06-16 18:19:48 +03:00
e666315fa8 Allow number of nodes in CUDA graph to change (llama/7738)
Previously the code would have failed to cope in the case that the
number of nodes changes in an existing CUDA graph. This fixes the
issue by removing an unnecessary conditional.
2024-06-16 18:19:48 +03:00
3f869af14c ggml : remove OpenCL (llama/7735)
ggml-ci
2024-06-16 18:19:48 +03:00
cbacb7634c ggml : prevent builds with -ffinite-math-only (llama/7726)
This enforces a check that -fno-finite-math-only was set, i.e. that the compiler
is not operating in finite-math mode. While SiLU and softmax were being rewritten
for the CPU in #7154, @JohannesGaessler found that the results observed with more
than one slot were nondeterministic.

@LostRuins narrowed the problem down to -ffinite-math-only; the theory was that
SiLU, instead of flushing small values to 0, returned NaN or some other garbage.
@jart proposed a fix that @ggerganov then implemented in this commit.

ref https://github.com/ggerganov/llama.cpp/pull/7154#issuecomment-2145661825
2024-06-16 18:19:48 +03:00
6cc3b022ee llama : offload to RPC in addition to other backends (llama/7640)
* llama : offload to RPC in addition to other backends

* - fix copy_tensor being called on the src buffer instead of the dst buffer

- always initialize views in the view_src buffer

- add RPC backend to Makefile build

- add endpoint to all RPC object names

* add rpc-server to Makefile

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-16 18:19:48 +03:00
e5e38d4920 ggml : use OpenMP as a thread pool (llama/7606)
* ggml: Added OpenMP for multi-threads processing

* ggml : Limit the number of threads used to avoid deadlock

* update shared state n_threads in parallel region

* clear numa affinity for main thread even with openmp

* enable openmp by default

* fix msvc build

* disable openmp on macos

* ci : disable openmp with thread sanitizer

* Update ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-16 18:19:48 +03:00
2a6bab5655 Vulkan Mixture of Experts (MoE) support (llama/7628)
* Finish Vulkan mul_mat_id implementation

* Add Vulkan sum_rows and div ops

* Fix MUL_MAT_ID matrix matrix shader

* Fix MUL_MAT_ID matrix vector shader dispatch size

* Fix MUL_MAT_ID matrix vector shader and dispatch code

* Update Vulkan CPU offload for MUL_MAT_ID

* Fix crash when using split mode none and setting a main GPU
2024-06-16 18:19:48 +03:00
8c01c9b85c kompute : implement op_getrows_f32 (llama/6403)
op_getrows_f32 is required since https://github.com/ggerganov/llama.cpp/pull/6122
for the Vulkan w/ Kompute backend to be functional.

As such, implement this op to make this backend functional again.
2024-06-16 18:19:48 +03:00
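`get_rows` gathers whole rows of a matrix by index, which is how token embeddings are looked up. A minimal scalar sketch of what an f32 variant computes, assuming a row-major `src` with `ncols` columns and `int32_t` indices (`get_rows_f32` is a hypothetical name, not the Kompute shader):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Row r of dst is a copy of row ids[r] of src, for r in [0, nids). */
static void get_rows_f32(const float * src, size_t ncols,
                         const int32_t * ids, size_t nids, float * dst) {
    for (size_t r = 0; r < nids; r++) {
        memcpy(dst + r * ncols, src + (size_t) ids[r] * ncols, ncols * sizeof(float));
    }
}
```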
d1123d795e fix bug introduced in using calloc (llama/7701)
compilade pointed this out on the previous MR
2024-06-16 18:19:48 +03:00
9b3d784020 Fix FlashAttention debug test, FP32 assert (llama/7684) 2024-06-16 18:19:48 +03:00
a16137d13d CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (llama/7681) 2024-06-16 18:19:48 +03:00
5582039d0a CUDA: quantized KV support for FA vec (llama/7527)
* CUDA: quantized KV support for FA vec

* try CI fix

* fix commented-out kernel variants

* add q8_0 q4_0 tests

* fix nwarps > batch size

* split fattn compile via extern templates

* fix flake8

* fix metal tests

* fix cmake

* make generate_cu_files.py executable

* add autogenerated .cu files

* fix AMD

* error if type_v != FP16 and not flash_attn

* remove obsolete code
2024-06-16 18:19:48 +03:00
9a16c643e2 ggml : fix loongson compile warnings (llama/7537)
* ggml : fix loongson compile warnings

ggml-ci

* Fix loongarch quantize test fail.

Fix an unexpected error introduced while rebasing the code.

* tests : disable json test due to lack of python on the CI node

ggml-ci

---------

Co-authored-by: junchao-loongson <zhaojunchao@loongson.cn>
2024-06-16 18:19:48 +03:00
10a8a23100 faster avx512 exp implementation (llama/7551)
* faster avx512 exp implementation

* x->r

* improve accuracy, handle special cases

* remove `e`
2024-06-16 18:19:48 +03:00
29cfeef77f ggml : fix loongarch build (O2 issue) (llama/7636) 2024-06-16 18:19:48 +03:00
e66e9ea25b metal : remove invalid asserts (llama/7617) 2024-06-16 18:19:48 +03:00
276779a849 metal : add missing asserts (llama/7617) 2024-06-16 18:19:48 +03:00
1f35ce61c1 ggml : fix YARN + add tests + add asserts (llama/7617)
* tests : add rope tests

ggml-ci

* ggml : fixes (hopefully)

ggml-ci

* tests : add non-cont tests

ggml-ci

* cuda : add asserts for rope/norm + fix DS2

ggml-ci

* ggml : assert contiguousness

* tests : reduce RoPE tests

ggml-ci
2024-06-16 18:19:48 +03:00
4b19cc3ed4 cuda : non-cont concat support (llama/7610)
* tests : add non-cont concat tests

* cuda : non-cont concat support

ggml-ci
2024-06-16 18:19:48 +03:00
a535d348dd llama-bench : add support for the RPC backend (llama/7435) 2024-06-16 18:19:48 +03:00
8f5dc729d9 ggml : use atomic_flag for critical section (llama/7598)
* ggml : use atomic_flag for critical section

* add windows shims
2024-06-16 18:19:48 +03:00
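`atomic_flag` is the only C11 atomic type guaranteed to be lock-free, which makes it a natural basis for a spinlock. A minimal sketch of a critical section built on it (the names are illustrative, not necessarily ggml's; Windows shims are not shown):

```c
#include <stdatomic.h>

static atomic_flag critical_section = ATOMIC_FLAG_INIT;

static void critical_section_start(void) {
    /* test_and_set returns the previous value: spin while another thread holds the flag. */
    while (atomic_flag_test_and_set(&critical_section)) {
        /* busy-wait */
    }
}

static void critical_section_end(void) {
    atomic_flag_clear(&critical_section);
}
```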
02fc147a0b examples : adapt to new ggml_concat (ggml/0) 2024-06-16 18:19:48 +03:00
109148ac84 ggml : fix typo in ggml.c (llama/7603) 2024-06-16 18:19:48 +03:00
3563473d2c Align GEMM dispatch (llama/7566)
* align GEMM dispatch
2024-06-16 18:19:48 +03:00
046834198d sycl : fix assert (llama/7563) 2024-06-16 18:19:48 +03:00
0a2ad9de06 vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (llama/7552) 2024-06-16 18:19:48 +03:00
39b0640b09 rpc : resource management rework (llama/7562)
* rpc : resource management rework

* address review comments
2024-06-16 18:19:48 +03:00
8dca71de64 fix ggml_sycl_mul_mat_id() to match the change of api (llama/7436)
* fix mul_mat_id to match the change of api

* rm comment

* rm unused or duplicated code, rename as review comment
2024-06-16 18:19:48 +03:00
812787cbc5 ggml : generalize GGML_OP_CONCAT (llama/7563)
* ggml : generalize GGML_OP_CONCAT (WIP)

ggml-ci

* tests : add dim != 2 tests

* metal : generalize concat kernel

* tests : naming

* cuda : generalize concat kernel

ggml-ci

* sycl : add warning and assert

* ggml : fix op params handling

* metal : bugfix kernel

ggml-ci

* ggml : reimplement CPU and Metal

* cuda : add asserts

ggml-ci

* ggml : fix ptrs

ggml-ci
2024-06-16 18:19:48 +03:00
68ef10805e update HIP_UMA #7399 (llama/7414)
* update HIP_UMA #7399

add use of hipMemAdviseSetCoarseGrain when LLAMA_HIP_UMA is enabled.
- gives 2x on prompt eval and 1.5x on token generation with ROCm 6.0 on a Ryzen 7940HX iGPU (780M/gfx1103)

* simplify code, more consistent style

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-16 18:19:48 +03:00
96fdb90f5f Allow multiple copy function pointers for CUDA graph kernel param updates (llama/7565)
CUDA graphs require parameter updates to kernels associated with
GGML_OP_CPY nodes. Previously the implementation only checked for a
single CUDA kernel in such nodes, but this caused a bug in cases where
2 such kernels exist. This fixes the issue by using a vector to allow
multiple function pointers to be stored and checked against.

Fixes #7942
2024-06-16 18:19:48 +03:00
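The fix replaces a single stored kernel pointer with a collection that is searched. A C sketch of the same idea, using a fixed-capacity array in place of the vector described above (`copy_fn`, `struct copy_fn_list`, and the kernel names are hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*copy_fn)(void); /* hypothetical kernel signature */

static void cpy_f32_f16(void) {}
static void cpy_f32_f32(void) {}
static void other_kernel(void) {}

#define MAX_COPY_FNS 8

/* Hypothetical registry of copy kernels found in the graph. */
struct copy_fn_list {
    copy_fn fns[MAX_COPY_FNS];
    size_t  count;
};

static void copy_fn_list_add(struct copy_fn_list * l, copy_fn f) {
    for (size_t i = 0; i < l->count; i++) {
        if (l->fns[i] == f) return; /* already recorded */
    }
    if (l->count < MAX_COPY_FNS) l->fns[l->count++] = f;
}

/* A node needs its parameters updated if its kernel matches any recorded copy kernel. */
static bool copy_fn_list_contains(const struct copy_fn_list * l, copy_fn f) {
    for (size_t i = 0; i < l->count; i++) {
        if (l->fns[i] == f) return true;
    }
    return false;
}
```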
e98f9ac554 Fix q_xxs using mul_mat_q (llama/7459) 2024-06-16 18:19:48 +03:00
02d481595b Add freq factors (llama/7495) 2024-06-16 18:19:48 +03:00
7091c7ab5a metal : add GGML_OP_REPEAT kernels (llama/7557)
ggml-ci
2024-06-16 18:19:48 +03:00
d70ccb75f5 metal : disable FA kernel for HS=256 (llama/7556)
ggml-ci
2024-06-16 18:19:48 +03:00
5ee048eb67 ggml : restore ggml_rope_xpos_inplace (ggml/0)
ggml-ci
2024-06-16 18:19:48 +03:00
37ed71c964 ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (llama/7433)
* Add SVE support for q4_0_q8_0 q8_0_q8_0

* remove ifdef
2024-06-16 18:19:48 +03:00
8cd7a3df37 ggml : silence UB sanitizer error during iq2_xxs quantization (llama/0) 2024-06-16 18:19:48 +03:00
04a3279320 ggml : remove ggml_flash_attn and ggml_flash_ff (llama/7463)
ggml-ci
2024-06-16 18:19:48 +03:00
45ddda8e0c ggml : drop support for QK_K=64 (llama/7473)
* ggml : drop support for QK_K=64

ggml-ci

* opencl : restore QK_K=256 define
2024-06-16 18:19:48 +03:00
c41317fd66 Update vulkan rope implementation to support frequency factors (llama/7475) 2024-06-16 18:19:48 +03:00
96b8419b27 CUDA: fix FA out-of-bounds reads (llama/7479) 2024-06-16 18:19:48 +03:00
3c63f4cf35 CUDA: fix FA out-of-bounds writes (llama/7465) 2024-06-16 18:19:48 +03:00
5848dfd9c8 cuda : fix compile warning (llama/7454) 2024-06-16 18:19:48 +03:00
29ab5d0326 CUDA: remove incorrect precision check (llama/7454) 2024-06-16 18:19:48 +03:00
c4d6958b3e cuda : fix rope + add tests (llama/7452)
* cuda : fix rope pos data

ggml-ci

* ggml : drop mode & 1 == 1 support for ggml_rope

ggml-ci

* ggml : support freq_factors for f16 rope (CPU)

ggml-ci

* tests : add rope tests using frequency factors

ggml-ci
2024-06-16 18:19:48 +03:00
c9dcb75118 llama : add phi3 128K model support (llama/7225)
* add phi3 128k support in convert-hf-to-gguf

* add phi3 128k support in cuda

* address build warnings on llama.cpp

* adjust index value in cuda long rope freq factors

* add long rope support in ggml cpu backend

* make freq factors only depend on ctx size

* remove unused rope scaling type 'su' from gguf converter

* fix flint warnings on convert-hf-to-gguf.py

* set to the short freq factor when context size is smaller than the trained context size

* add one line of comments

* metal : support rope freq_factors

* ggml : update ggml_rope_ext API to support freq. factors

* backends : add dev messages to support rope freq. factors

* minor : style

* tests : update to use new rope API

* backends : fix pragma semicolons

* minor : cleanup

* llama : move rope factors from KV header to tensors

* llama : remove tmp assert

* cuda : fix compile warning

* convert : read/write n_head_kv

* llama : fix uninitialized tensors

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-16 18:19:48 +03:00
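The phi3 128K commit above switches between short and long RoPE frequency factors based only on the context size. Below is a minimal sketch of that selection rule. The struct and function names are hypothetical, and treating a context exactly equal to the trained size as "short" is an assumption.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container for per-layer RoPE frequency factors.
struct rope_factors {
    std::vector<float> short_factors;  // used within the trained context
    std::vector<float> long_factors;   // used for extended contexts
};

// Use the long factors only when the requested context exceeds the
// original training context; otherwise fall back to the short factors.
const std::vector<float> &select_rope_factors(const rope_factors &f,
                                              uint32_t n_ctx,
                                              uint32_t n_ctx_train) {
    return n_ctx > n_ctx_train ? f.long_factors : f.short_factors;
}
```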
bbdbc3fc62 metal : handle F16 inf values, fix FA partial offload (llama/7434)
ggml-ci
2024-06-16 18:19:48 +03:00
28c207a541 CUDA: fix unused warning in mmq.cu (llama/7442) 2024-06-16 18:19:48 +03:00
c23f830983 CUDA: deduplicate mmq code (llama/7397) 2024-06-16 18:19:48 +03:00
caeeb32b41 rpc : track allocated buffers (llama/7411)
* rpc : track allocated buffers

ref: #7407

* rpc : pack rpc_tensor tightly
2024-06-16 18:19:48 +03:00
584cc1177a Update SYCL upscale operation (llama/7321)
* Update SYCL upscale operation

* Formatting

* Remove messages
2024-06-16 18:19:48 +03:00
cc1ae10989 ggml-opencl, llama: using reserve() if count already known (llama/7272) 2024-06-16 18:19:48 +03:00
eb26f55b40 ggml : add loongarch lsx and lasx support (llama/6454)
* add loongarch lsx and lasx optimize code

* Add loongarch compilation support to makefile

* revert stb_image.h

* opt bytes_from_nibbles_32 and sum_i16_pairs_float

* fix undeclared

* format code

* update

* update 2

---------

Co-authored-by: Jinyang He <hejinyang@loongson.cn>
2024-06-16 18:19:48 +03:00
eb2b086584 Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (llama/7258) 2024-06-16 18:19:48 +03:00
67919cfe11 Vulkan Embedding Fix (llama/7360)
* Fix empty Vulkan host buffers

Add fp32 fp16 matmul shader

Fix matmul shader alignment

* Remove deprecated tensor->backend uses

* Fix Vulkan validation errors on embedding models with no offloaded layers

* Fix Vulkan llava segfault when not offloading layers
2024-06-16 18:19:48 +03:00
bf5fc81a8a ggml : fix another case of quants nans (llama/7387) 2024-06-16 18:19:48 +03:00
2b07dc3186 ggml: implement quantized KV cache for FA (llama/7372) 2024-06-16 18:19:48 +03:00
951c463d39 cuda : clear error after buffer allocation failure (llama/7376) 2024-06-16 18:19:48 +03:00
7f257b210f Capture CUDA logging output (llama/7298)
* logging: output capture in cuda module

* fix compile error

* fix: vsnprintf terminates with 0, string use not correct

* post review

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-16 18:19:48 +03:00
705fe30a02 android : use "ci-android" branch for CI (llama/7341)
* android : use "ci-android" branch for CI

* ggml : disable SIMD exp and silu for 32-bit ARM

ggml-ci

* android : do not fetch, use add_subdirectory instead

* cmake : provide binary dir
2024-06-16 18:19:48 +03:00
45b5b95e29 CUDA: deduplicate FlashAttention code (llama/7352) 2024-06-16 18:19:48 +03:00
f2c47d1e6a cuda : add half2 __shfl_xor() for ROCm 5.5 (llama/7263) 2024-06-16 18:19:48 +03:00
b4bb9b9036 Update and fix Vulkan soft_max and argsort implementations (llama/7237)
* Update and fix Vulkan softmax implementation

* Update and fix Vulkan argsort implementation
2024-06-16 18:19:48 +03:00
2bc6483299 ggml : fix quants nans when all the group weights are very close to zero (llama/7313) 2024-06-16 18:19:48 +03:00
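The commit above addresses NaNs that appear when every weight in a quantization group is close to zero. The sketch below illustrates the failure mode and a guard for it. It uses a plain symmetric 8-bit block quantizer, not ggml's actual k-quant routines. The epsilon `kGroupMaxEps` and the function name are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical threshold for "all weights are effectively zero".
constexpr float kGroupMaxEps = 1e-15f;

// Symmetric 8-bit quantization of one block: q[i] = round(x[i] / d) with
// d = amax / 127. Returns the scale d. If every weight is (nearly) zero,
// 1/d would overflow to inf and 0 * inf yields NaN, so the block is
// written as all zeros with a zero scale instead.
float quantize_block_q8(const float *x, int8_t *q, int n) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) amax = std::max(amax, std::fabs(x[i]));

    if (amax < kGroupMaxEps) {
        std::fill(q, q + n, static_cast<int8_t>(0));
        return 0.0f;
    }

    const float d = amax / 127.0f;
    const float inv = 1.0f / d;
    for (int i = 0; i < n; ++i) {
        q[i] = static_cast<int8_t>(std::lround(x[i] * inv));
    }
    return d;
}
```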
ec52f900e4 CUDA: faster large batch FA without tensor cores (llama/7314) 2024-06-16 18:19:48 +03:00
77d708fabb rpc : set SO_REUSEADDR for the server socket (llama/7320)
ref: #7293
2024-06-16 18:19:48 +03:00
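SO_REUSEADDR lets a restarted server bind its listening port even while sockets from a previous run are still in TIME_WAIT. Below is a minimal POSIX-only sketch of setting it on a server socket. The function name and the loopback address are assumptions for illustration, and Windows handling is not shown.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstring>

// Create a listening TCP socket on the given port with SO_REUSEADDR set.
// Returns the socket fd, or -1 on failure.
int create_server_socket(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int flag = 1;
    // Must be set before bind() to take effect.
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &flag, sizeof(flag)) < 0) {
        close(fd);
        return -1;
    }

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```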
c00149c861 ggml-quants, llama : removed excess checks (llama/7274) 2024-06-16 18:19:48 +03:00
574661f2e6 ggml : rewrite silu and softmax for cpu (llama/7154)
This change upstreams llamafile's vectorized expf() functions. This lets
us compute softmax and silu more accurately than the short[65536] lookup
table that GGML previously used to make this operation go faster. We can
support aarch64 and sse2+ with a worst-case rounding error of 2 ulp. It
makes make -j8 tests && ./tests/test-backend-ops -o SOFT_MAX -b CPU perf
run 1.5x faster for SSE2+FMA, 1.9x faster for AVX2+FMA and 2.1x faster on AVX512.
2024-06-16 18:19:48 +03:00
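The commit above computes silu and softmax with vectorized expf() instead of a lookup table. The sketch below shows only the underlying math with scalar std::exp. It does not reproduce the SIMD kernels or their 2 ulp error bound. Subtracting the row maximum keeps exp() from overflowing.

```cpp
#include <cmath>
#include <cstddef>

// SiLU(x) = x * sigmoid(x) = x / (1 + exp(-x)).
inline float silu(float x) {
    return x / (1.0f + std::exp(-x));
}

// Numerically stable softmax: subtract the row max before exponentiating
// so exp() never overflows, then normalize by the sum.
void softmax(const float *x, float *y, std::size_t n) {
    if (n == 0) return;
    float max = x[0];
    for (std::size_t i = 1; i < n; ++i) {
        if (x[i] > max) max = x[i];
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = std::exp(x[i] - max);
        sum += y[i];
    }
    const float inv = static_cast<float>(1.0 / sum);
    for (std::size_t i = 0; i < n; ++i) y[i] *= inv;
}
```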
7bd69349bf rpc : add command line arg for specifying backend memory
ref: #7293
2024-06-16 18:19:48 +03:00
488ad99c13 Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (llama/7191)
* logging: add proper checks for clang to avoid errors and warnings with VA_ARGS

* build: add CMake Presets and toolchain files for Windows ARM64

* matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings

* ci: add support for optimized Windows ARM64 builds with MSVC and LLVM

* matmul-int8: fixed typos in q8_0_q8_0 matmuls

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* matmul-int8: remove unnecessary casts in q8_0_q8_0

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-16 18:19:48 +03:00
7178cceeaa ggml : use dynamic thread scheduling for matrix multiplication (llama/6915)
* Just reordering some structs.

* Adding in the calls to mm_pause

* Passing around the state

* Renaming and moving a bunch of variables around.

* Extracting the logic to its own function.

* Moving some variable definitions into the chunk function.

* Moving some variables around

* moving src1_cont inside

* Moving row_size

* adding the current_chunk

* Reorg the code.

* Formatting to match the orig patch

* starting to setup the chunking variables

* Starting the buildup of the loop

* The yield shouldn't be necessary.

* adding the looping structure based on the chunk configuration.

* Add in the re-chunking code.

* Making it much more likely to rechunk.

* disable resizing if numa is enabled.

* Updating comments with what we've learned.

* Fix formatting

* Couple more formatting fixes.

* More style fixes.

* Fix Warnings

* Going with unused because there's conditional logic that needs it.

* Update ggml.c

* Update ggml.c

---------
2024-06-16 18:19:48 +03:00
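The dynamic scheduling commit above splits matrix multiplication into chunks that threads claim at run time through a shared "current chunk" counter. The following is a minimal sketch of that pattern using a matrix-vector product. The function name, chunk size and thread count are hypothetical, and ggml's re-chunking heuristics and NUMA handling are not modeled.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// y = A * x for a row-major (rows x cols) matrix, split into row chunks
// (chunk_rows must be > 0; y must already hold `rows` elements).
// Threads claim chunks from a shared atomic counter, so faster threads
// simply process more chunks instead of idling.
void matvec_dynamic(const std::vector<float> &A, const std::vector<float> &x,
                    std::vector<float> &y, std::size_t rows, std::size_t cols,
                    std::size_t chunk_rows, unsigned n_threads) {
    std::atomic<std::size_t> next_chunk{0};
    const std::size_t n_chunks = (rows + chunk_rows - 1) / chunk_rows;

    auto worker = [&]() {
        for (;;) {
            const std::size_t c = next_chunk.fetch_add(1, std::memory_order_relaxed);
            if (c >= n_chunks) break;
            const std::size_t r0 = c * chunk_rows;
            const std::size_t r1 = std::min(rows, r0 + chunk_rows);
            for (std::size_t r = r0; r < r1; ++r) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < cols; ++k) acc += A[r * cols + k] * x[k];
                y[r] = acc;
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) pool.emplace_back(worker);
    for (auto &th : pool) th.join();  // join() makes all writes to y visible
}
```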
8d55ccdb8c Avoid unnecessarily disabling CUDA graphs (llama/7302)
As discussed in PR #6766, CUDA graphs were being disabled in the presence of long prompts.
This fixes the issue by preventing the consecutive update counter from incrementing unnecessarily
for tokens in which CUDA graphs are disabled due to batch size > 1.
2024-06-16 18:19:48 +03:00
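The commit above stops counting graph updates for tokens where CUDA graphs are skipped because the batch size is greater than 1. Below is a minimal sketch of that counter logic. The struct, the function and the threshold of 4 are hypothetical illustrations, not the ggml-cuda implementation.

```cpp
// Hypothetical per-context state; the threshold is illustrative.
struct cuda_graph_state {
    int  consecutive_updates = 0;
    bool disabled = false;
};

constexpr int kMaxConsecutiveUpdates = 4;

// Called once per evaluated batch.
void on_token(cuda_graph_state &s, int batch_size, bool graph_changed) {
    if (s.disabled) return;
    if (batch_size > 1) {
        // CUDA graphs are not used for this batch anyway, so it must not
        // count toward the "graph keeps changing" heuristic.
        return;
    }
    if (graph_changed) {
        if (++s.consecutive_updates >= kMaxConsecutiveUpdates) {
            s.disabled = true;  // give up on graphs for this context
        }
    } else {
        s.consecutive_updates = 0;
    }
}
```

A long prompt processed in large batches then leaves the counter untouched, and graphs stay enabled for single-token generation.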
37a72cb170 ggml : tag ggml_tensor::backend as deprecated (llama/7290) 2024-06-16 18:19:48 +03:00
bf9b69284f Add missing " (llama/7303) 2024-06-16 18:19:48 +03:00
c4de1e19df ggml : add ggml_upscale_ext (ggml/814)
* initial commit with CPU implementation of upscale to shape and test, cuda implementation next

* experimental commit to see if dst shape is correct

* test version

* test

* removed unnecessary params

* refactor

* fixed tests

* ggml : metal impl + cleanup + sycl dev warnings

* patched ggml_upscale cuda op to handle non-contiguous tensors, added test for non-contiguous behavior

* metal : fix upscale op to support nb00 + style

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-16 18:19:48 +03:00
5b7073cae1 scripts : update sync 2024-06-16 12:41:42 +03:00
b29b3b2924 whisper : use ggml-cuda in mel calc, set appropriate device (#2236)
* whisper : use ggml-cuda in mel calc, set appropriate device

* whisper : forbid cuda mel calc on devices with compute < 600, workaround for #2230
2024-06-13 13:16:07 +03:00
420b6abc54 cuda : fix HIPBLAS build (#2234) 2024-06-11 19:14:38 +03:00
99804b0f3e cuda : fix bounds check for src0 rows in MMVQ kernel (#2231)
* cuda : fix bounds check for src0 rows in MMVQ kernel

* Update ggml-cuda/mmvq.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-06-11 17:39:01 +03:00
c55964c956 ci : fix CUDA builds (#2232) 2024-06-11 17:21:30 +03:00
20c542c713 whisper : auto-grow working areas for mel_calc_cuda (#2227)
* whisper : auto-grow working areas for mel_calc_cuda, fixes #2226

* whisper : only calculate mel spectrogram on GPU if audio is <= 5 min
2024-06-10 21:51:32 +03:00
c2bdb960cd whisper : free whisper_mel instances (#2220) 2024-06-10 11:00:15 +03:00
87acd6d629 whisper : whisper_state/backend fixes (#2217)
* whisper : fixes

* ci : WHISPER_CUBLAS -> WHISPER_CUDA
2024-06-06 18:51:36 +03:00
f842d31171 whisper : calculate mel spectrogram directly into a ggml_tensor (#2208)
* whisper : calculate mel spectrogram directly into a ggml_tensor

* whisper : remove unused temp buffer from state

* whisper : fix not initializing wstate.embd_enc
2024-06-06 16:20:46 +03:00
ffef323c4c whisper : add CUDA-specific computation mel spectrograms (#2206)
* whisper : use polymorphic class to calculate mel spectrogram

* whisper : add cuda-specific mel spectrogram calculation

* whisper : conditionally compile cufftGetErrorString to avoid warnings

* build : add new files to makefile

* ruby : add new files to conf script

* build : fix typo in makefile

* whisper : suppress cub warning for deprecated C++ std in whisper-mel-cuda
2024-06-04 09:32:23 +03:00
af5833e298 whisper : remove speed_up and phase_vocoder* functions (#2198)
* whisper : fix cast warning

* whisper : remove phase_vocoder functions, ref #2195

* whisper : remove speed_up from whisper_full_params, closes #2195
2024-05-31 11:37:29 +03:00
b87494bb8f readme : add conan badge (#2196)
* Add conan badge

* Fix markdown formatting
2024-05-30 15:43:28 +03:00
ad130431aa readme : add install instructions for Conan (#2189) 2024-05-30 15:06:15 +03:00
e130b66642 whisper: use global cache for sin/cos vals and Hann window (#2194)
- also rename Hanning to Hann as it's named after Julius von Hann
 as per Wikipedia
2024-05-29 19:09:21 +03:00
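The commit above caches sin/cos values and the Hann window globally instead of recomputing them. Below is a minimal sketch of such a cache, built once on first use. Several details are assumptions for illustration: the 400-sample size, the periodic form w[i] = 0.5 * (1 - cos(2*pi*i/N)), and the struct and function names.

```cpp
#include <cmath>
#include <vector>

// Assumed FFT/window size for illustration.
constexpr int kNFft = 400;

struct fft_cache {
    std::vector<float> sin_vals, cos_vals, hann;

    fft_cache() : sin_vals(kNFft), cos_vals(kNFft), hann(kNFft) {
        const double pi = 3.14159265358979323846;
        for (int i = 0; i < kNFft; ++i) {
            const double theta = 2.0 * pi * i / kNFft;
            sin_vals[i] = static_cast<float>(std::sin(theta));
            cos_vals[i] = static_cast<float>(std::cos(theta));
            // Periodic Hann window: w[i] = 0.5 * (1 - cos(2*pi*i/N)).
            hann[i] = static_cast<float>(0.5 * (1.0 - std::cos(theta)));
        }
    }
};

// Built once on first call (thread-safe since C++11) and shared afterwards.
const fft_cache &get_fft_cache() {
    static const fft_cache cache;
    return cache;
}
```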
838 changed files with 309613 additions and 136861 deletions

@@ -12,7 +12,7 @@ FROM ${BASE_CUDA_DEV_CONTAINER} as build
ARG CUDA_DOCKER_ARCH=all
RUN apt-get update && \
apt-get install -y build-essential git cmake
apt-get install -y build-essential git cmake libsdl2-dev wget git
WORKDIR /app
@@ -21,8 +21,8 @@ COPY . .
# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable cuBLAS
ENV WHISPER_CUBLAS=1
ENV GGML_CUDA=1
RUN make
RUN make base.en
ENTRYPOINT ["/app/main"]

@@ -14,10 +14,10 @@ ARG CUDA_DOCKER_ARCH=all
# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable cuBLAS
ENV WHISPER_CUBLAS=1
ENV GGML_CUDA=1
RUN apt-get update && \
apt-get install -y build-essential \
apt-get install -y build-essential libsdl2-dev wget cmake git \
&& rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
# Ref: https://stackoverflow.com/a/53464012
@@ -25,7 +25,7 @@ ENV CUDA_MAIN_VERSION=12.3
ENV LD_LIBRARY_PATH /usr/local/cuda-${CUDA_MAIN_VERSION}/compat:$LD_LIBRARY_PATH
COPY .. .
RUN make
RUN make base.en
FROM ${BASE_CUDA_RUN_CONTAINER} AS runtime
ENV CUDA_MAIN_VERSION=12.3
@@ -33,7 +33,7 @@ ENV LD_LIBRARY_PATH /usr/local/cuda-${CUDA_MAIN_VERSION}/compat:$LD_LIBRARY_PATH
WORKDIR /app
RUN apt-get update && \
apt-get install -y curl ffmpeg \
apt-get install -y curl ffmpeg wget cmake git \
&& rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
COPY --from=build /app /app

@@ -2,17 +2,17 @@ FROM ubuntu:22.04 AS build
WORKDIR /app
RUN apt-get update && \
apt-get install -y build-essential \
apt-get install -y build-essential wget cmake git \
&& rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
COPY .. .
RUN make
RUN make base.en
FROM ubuntu:22.04 AS runtime
WORKDIR /app
RUN apt-get update && \
apt-get install -y curl ffmpeg \
apt-get install -y curl ffmpeg libsdl2-dev wget cmake git \
&& rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
COPY --from=build /app /app

@@ -10,13 +10,13 @@ on:
- whisper.h
jobs:
ubuntu-latest:
runs-on: ubuntu-latest
ubuntu-22:
runs-on: ubuntu-22.04
steps:
- uses: actions/setup-go@v3
- uses: actions/setup-go@v5
with:
go-version: '^1.19'
- uses: actions/checkout@v1
go-version: '^1.23'
- uses: actions/checkout@v4
- run: |
cd bindings/go
make test

@@ -3,20 +3,63 @@ on:
push:
paths:
- bindings/ruby/**
- whisper.h
- src/**/*.c
- src/**/*.cpp
- src/**/*.h
- src/**/*.m
- src/**/*.metal
- include/**/*.c
- include/**/*.cpp
- include/**/*.h
- include/**/*.m
- include/**/*.metal
- ggml/**/*.c
- ggml/**/*.cpp
- ggml/**/*.h
- ggml/**/*.m
- ggml/**/*.metal
- scripts/get-flags.mk
- examples/common.h
- examples/common.cpp
- examples/common-whisper.h
- examples/common-whisper.cpp
- examples/stb_vorbis.c
- examples/miniaudio.h
pull_request:
paths:
- bindings/ruby/**
- whisper.h
- src/**/*.c
- src/**/*.cpp
- src/**/*.h
- src/**/*.m
- src/**/*.metal
- include/**/*.c
- include/**/*.cpp
- include/**/*.h
- include/**/*.m
- include/**/*.metal
- ggml/**/*.c
- ggml/**/*.cpp
- ggml/**/*.h
- ggml/**/*.m
- ggml/**/*.metal
- scripts/get-flags.mk
- examples/common.h
- examples/common.cpp
- examples/common-whisper.h
- examples/common-whisper.cpp
- examples/stb_vorbis.c
- examples/miniaudio.h
jobs:
ubuntu-latest:
runs-on: ubuntu-latest
ubuntu-22:
runs-on: ubuntu-22.04
defaults:
run:
working-directory: bindings/ruby
steps:
- uses: ruby/setup-ruby@v1
with:
ruby-version: '3.0'
- uses: actions/checkout@v1
- run: |
cd bindings/ruby/ext
ruby extconf.rb && make
ruby-version: '3.1'
- uses: actions/checkout@v4
- run: rake test

File diff suppressed because it is too large

@@ -11,14 +11,16 @@ jobs:
name: Push Docker image to Docker Hub
if: github.event.pull_request.draft == false
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
env:
COMMIT_SHA: ${{ github.sha }}
strategy:
matrix:
config:
- { tag: "main", dockerfile: ".devops/main.Dockerfile", platform: "linux/amd64,linux/arm64" }
- { tag: "main-cuda", dockerfile: ".devops/main-cuda.Dockerfile", platform: "linux/amd64" }
- { tag: "main", dockerfile: ".devops/main.Dockerfile", platform: "linux/amd64" }
#TODO: the cuda image keeps failing - disable for now
# https://github.com/ggerganov/whisper.cpp/actions/runs/11019444428/job/30602020339
#- { tag: "main-cuda", dockerfile: ".devops/main-cuda.Dockerfile", platform: "linux/amd64" }
steps:
- name: Check out the repo
@@ -26,6 +28,8 @@ jobs:
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
with:
image: tonistiigi/binfmt:qemu-v7.0.0-28
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
@@ -43,7 +47,7 @@ jobs:
with:
context: .
push: true
platforms: ${{ matrix.config.platforms }}
platforms: ${{ matrix.config.platform }}
tags: "ghcr.io/${{ github.repository }}:${{ matrix.config.tag }}-${{ env.COMMIT_SHA }}"
file: ${{ matrix.config.dockerfile }}
@@ -52,6 +56,6 @@ jobs:
with:
context: .
push: ${{ github.event_name == 'push' }}
platforms: ${{ matrix.config.platforms }}
platforms: ${{ matrix.config.platform }}
tags: "ghcr.io/${{ github.repository }}:${{ matrix.config.tag }}"
file: ${{ matrix.config.dockerfile }}

.github/workflows/examples-wasm.yml

@@ -0,0 +1,91 @@
name: Examples WASM
on:
push:
branches: ["master"]
workflow_dispatch:
permissions:
contents: read
pages: write
id-token: write
concurrency:
group: "pages"
cancel-in-progress: false
jobs:
deploy-wasm-github-pages:
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Pages
uses: actions/configure-pages@v4
- name: Setup emsdk
uses: mymindstorm/setup-emsdk@v14
- name: Build WASM Examples
# Enable for real build later in whisper.cpp
run: |
mkdir -p build-em && cd build-em
emcmake cmake .. -DCMAKE_BUILD_TYPE=Release
make -j
- name: Create staging directory
run: mkdir -p staging
- name: Create .nojekyll file in staging directory
run: touch staging/.nojekyll
- name: Copy application files
run: |
build_dir=build-em/bin
ls ${build_dir}
# command.wasm
target_dir=staging/command.wasm
mkdir -p ${target_dir}
cp ${build_dir}/command.wasm/{index.html,command.js,helpers.js} ${target_dir}
cp ${build_dir}/libcommand.js ${target_dir}
# bench.wasm
target_dir=staging/bench.wasm
mkdir -p ${target_dir}
cp ${build_dir}/bench.wasm/{index.html,bench.js,helpers.js} ${target_dir}
cp ${build_dir}/libbench.js ${target_dir}
# stream.wasm
target_dir=staging/stream.wasm
mkdir -p ${target_dir}
cp ${build_dir}/stream.wasm/{index.html,stream.js,helpers.js} ${target_dir}
cp ${build_dir}/libstream.js ${target_dir}
# whisper.wasm (this will be the main example page)
target_dir=staging
mkdir -p ${target_dir}
cp ${build_dir}/whisper.wasm/{index.html,main.js,helpers.js} ${target_dir}
cp ${build_dir}/libmain.js ${target_dir}
# Copy Cross-Origin Isolation service worker
cp -v examples/coi-serviceworker.js staging/
- name: List files in staging directory (for debugging)
run: |
echo "Files in staging directory:"
find staging -type f | sort
- name: Upload artifact
uses: actions/upload-pages-artifact@v3
with:
path: ./staging
- name: Deploy to GitHub Pages
id: deployment
uses: actions/deploy-pages@v4

@@ -10,8 +10,8 @@ on:
- whisper.h
jobs:
addon_node-ubuntu-latest:
runs-on: ubuntu-latest
addon_node-ubuntu-22:
runs-on: ubuntu-22.04
strategy:
matrix:
node-version: [ 16.x, 18.x ]
@@ -22,7 +22,7 @@ jobs:
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install build-essential git
sudo apt-get install cmake
sudo apt-get install libsdl2-dev

.gitignore

@@ -1,31 +1,28 @@
*.o
*.a
*.d
.cache/
.coreml/
.test/
.venv/
.vs/
.vscode/
.DS_Store
.vimspector.json
/CMakeSettings.json
/talk-llama.dSYM/
build/
build-coreml/
build-em/
build-debug/
build-release/
build-rwdi/
build-static/
build-cublas/
build-no-accel/
build-sanitize-addr/
build-sanitize-thread/
build-*/
# SPM
.build/
.swiftpm
*.metallib
ggml-metal-embed.metal
ggml-metal-embed.metal.tmp
/main
/stream
/command
@@ -61,3 +58,5 @@ cmake-build-debug/
.cxx/
.gradle/
local.properties
.log
.exe

.gitmodules

@@ -1,3 +0,0 @@
[submodule "bindings/ios"]
path = bindings/ios
url = https://github.com/ggerganov/whisper.spm

AUTHORS

@@ -1,34 +1,51 @@
# date: Tue Apr 9 20:27:03 EEST 2024
# date: Tue Feb 4 13:03:35 EET 2025
# this file is auto-generated by scripts/gen-authors.sh
0/0 <zero@imaskeleton.me>
0cc4m <picard12@live.de>
0xsourcecode <134374803+0xsourcecode@users.noreply.github.com>
65a <10104049+65a@users.noreply.github.com>
AIWintermuteAI <32562299+AIWintermuteAI@users.noreply.github.com>
AT <manyoso@users.noreply.github.com>
Aarni Koskela <akx@iki.fi>
Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Aaron Taylor <aaron@exphat.com>
Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Abitofevrything <54505189+abitofevrything@users.noreply.github.com>
Adam Jones <domdomegg+git@gmail.com>
Adrien Gallouët <adrien@gallouet.fr>
Adrien Gallouët <angt@huggingface.co>
AfryMask <AfryMask@163.com>
Ahmad Bilal <ahmad.bilal@empglabs.com>
Ahmad Tameem <113388789+Tameem-10xE@users.noreply.github.com>
AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
AidanBeltonS <aidan.belton@codeplay.com>
Akarshan Biswas <akarshan.biswas@gmail.com>
Akarshan Biswas <akarshanbiswas@fedoraproject.org>
Akash Mahajan <akash7190@gmail.com>
Akash Mahajan <akashmjn@stanford.edu>
Al Hoang <3811822-hoanga@users.noreply.gitlab.com>
Alan <unknown>
Albert Jin <albert.jin@gmail.com>
Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>
Alberto Cabrera Pérez <alberto.cabrera@intel.com>
Aleksander Andrzejewski <18704749+aleksanderandrzejewski@users.noreply.github.com>
Alex Azarov <alex@azarov.by>
Alex Bacart <13940752+alex-bacart@users.noreply.github.com>
Alex Evgrashin <aevgrashin@yandex.ru>
Alex O'Connell <35843486+acon96@users.noreply.github.com>
Alexandr Graschenkov <alexandr.graschenkov91@gmail.com>
Alexandru Mariuti <alex@mariuti.com>
Alexey Kharlamov <alexey@kharlamov.biz>
Alfredo Montesinos <alfredo.montesinos@g.austincc.edu>
Ali Alameh <ali.alameh@isae.edu.lb>
Alter <0x7c48@gmail.com>
Ananta Bastola <anantarajbastola@gmail.com>
Andreas Kieslinger <47689530+aendk@users.noreply.github.com>
Andreas Lubbe <git@lubbe.org>
Andreu Huguet <andreuhuguet@gmail.com>
Andrew Huynh <a5thuynh@gmail.com>
Andrew Minh Nguyen <40281306+amqdn@users.noreply.github.com>
Andrew S <andrews54757@gmail.com>
Andy Maloney <asmaloney@gmail.com>
Anton Kostin <masguit42@users.noreply.github.com>
@@ -40,8 +57,11 @@ AustinMroz <austinmroz@utexas.edu>
Avik Sengupta <avik@sengupta.net>
Bader-eddine Ouaich <49657842+baderouaich@users.noreply.github.com>
Baffin Lee <baffinlee@gmail.com>
Ben Ashbaugh <ben.ashbaugh@intel.com>
Ben Nortier <bjnortier@gmail.com>
Benjamin Heiniger <benjamin.heiniger@bluewin.ch>
Bernhard M. Wiedemann <githubbmwprimary@lsmod.de>
Binozo <70137898+Binozo@users.noreply.github.com>
Bo-Yi Wu <appleboy.tw@gmail.com>
Boris Bliznioukov <blib@mail.com>
Borislav Stanimirov <b.stanimirov@abv.bg>
@@ -49,47 +69,86 @@ Brad Murray <59848399+bradmurray-dt@users.noreply.github.com>
Brian Murray <brian@bmurray.ca>
CRD716 <crd716@gmail.com>
Canis Lupus <Canis-UK@users.noreply.github.com>
Carlos Zoido <mrgalleta@gmail.com>
Carolinabanana <140120812+Carolinabanana@users.noreply.github.com>
CarterLi999 <664681047@qq.com>
ChangSeok Oh <shivamidow@users.noreply.github.com>
Changyeon Kim <cyzero.kim@samsung.com>
Chaoqun <27287694+OpenWaygate@users.noreply.github.com>
Charles Xu <63788048+chaxu01@users.noreply.github.com>
Charles Xu <charles.xu@arm.com>
Chen Xi <xi2.chen@intel.com>
Chen Xi <xixichen08@foxmail.com>
Chenguang Li <87689256+noemotiovon@users.noreply.github.com>
Chia-Hsiang Cheng <88014292+garychia@users.noreply.github.com>
Chidi Williams <williamschidi1@gmail.com>
Chris Elrod <elrodc@gmail.com>
Christian <12550267+iceychris@users.noreply.github.com>
Christian Kastner <ckk@kvr.at>
Clifford Heath <clifford.heath@gmail.com>
Clint Herron <hanclinto@gmail.com>
Colin <github@whoisc.cc>
Conrad Kramer <conrad@conradkramer.com>
Corey Earwood <iamcgn+github@gmail.com>
CrispStrobe <154636388+CrispStrobe@users.noreply.github.com>
DAN™ <dranger003@gmail.com>
DGdev91 <DGdev91@users.noreply.github.com>
Damian Czaja <trojan295@protonmail.com>
Dan Johansson <164997844+eddnjjn@users.noreply.github.com>
Dan Johansson <dan.johansson@arm.com>
Daniel Bevenius <daniel.bevenius@gmail.com>
Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
Daniel Ziegenberg <daniel@ziegenberg.at>
Daniele <57776841+daniandtheweb@users.noreply.github.com>
Dave <dave-fl@users.noreply.github.com>
Dave Airlie <airlied@gmail.com>
Dave Airlie <airlied@redhat.com>
Daven Sanassy <daven@vochlea.co.uk>
David <dnhkng@gmail.com>
David Thorpe <djt@mutablelogic.com>
DavidKorczynski <david@adalogics.com>
Davidson Francis <davidsondfgl@gmail.com>
Dener Stassun <denerstassun@gmail.com>
Dibakar Gope <dibakar.gope@arm.com>
Didzis Gosko <didzis@users.noreply.github.com>
Diego Devesa <slarengh@gmail.com>
Digipom <admin@digipom.com>
Dimo <dimo@ieee.org>
Djip007 <3705339+Djip007@users.noreply.github.com>
Djip007 <djip.perois@free.fr>
Dody Suria Wijaya <dodysw@gmail.com>
Dou Xinpeng <15529241576@163.com>
Dou Xinpeng <81913537+Dou-Git@users.noreply.github.com>
Dr. Tom Murphy VII Ph.D <499244+tom7@users.noreply.github.com>
Duncan McConnell <ddmcconnell4@gmail.com>
Egor Egorov <me@egorfine.com>
Elkana Bardugo <ttv200@gmail.com>
Emmanuel Schmidbauer <eschmidbauer@gmail.com>
Engininja2 <139037756+Engininja2@users.noreply.github.com>
Eric Curtin <ericcurtin17@gmail.com>
Eric Swanson <eswanson@alloscomp.com>
Eric Tendian <erictendian@gmail.com>
Eric Zhang <34133756+EZForever@users.noreply.github.com>
Erik Scholz <Green-Sky@users.noreply.github.com>
Evan Jones <evan.q.jones@gmail.com>
Evan Martin <evan.martin@gmail.com>
Eve <139727413+netrunnereve@users.noreply.github.com>
Evgeny Kuznetsov <evgeny@kuznetsov.md>
F1L1P <78918286+F1L1Pv2@users.noreply.github.com>
Faisal Zaghloul <quic_fzaghlou@quicinc.com>
Fangjun Kuang <csukuangfj@gmail.com>
Felix <stenbackfelix@gmail.com>
Finn Voorhees <finnvoorhees@gmail.com>
FirstTimeEZ <179362031+FirstTimeEZ@users.noreply.github.com>
FlippFuzz <41221030+FlippFuzz@users.noreply.github.com>
Frankie Robertson <frankier@users.noreply.github.com>
Gang Chen <goncha@gmail.com>
Gavin Cai <gavin1818@hotmail.com>
George Hindle <george@georgehindle.com>
Georgi Gerganov <ggerganov@gmail.com>
Gilad S <7817232+giladgd@users.noreply.github.com>
Gilad S <giladgd@users.noreply.github.com>
Gilad S. <7817232+giladgd@users.noreply.github.com>
GitAritron <103900385+GitAritron@users.noreply.github.com>
GiviMAD <GiviMAD@users.noreply.github.com>
Gleicon Moraes <gleicon@gmail.com>
@@ -98,41 +157,66 @@ Guillaume Wenzek <gwenzek@users.noreply.github.com>
HY. Kelvin Lee <34256578+hykelvinlee42@users.noreply.github.com>
Halalaluyafail3 <55773281+Halalaluyafail3@users.noreply.github.com>
Hang <bebound@gmail.com>
Haus1 <haus.xda@gmail.com>
Herman Semenov <GermanAizek@yandex.ru>
HimariO <dsfhe49854@gmail.com>
Hong Bo PENG <penghb@cn.ibm.com>
Hrishikesh Barman <geekodour@users.noreply.github.com>
Hugo <hugo@whynothugo.nl>
Ian Bicking <ian@ianbicking.org>
Ian Bull <irbull@eclipsesource.com>
Ihar Hrachyshka <ihrachys@redhat.com>
Ikko Ashimine <eltociear@gmail.com>
Ikko Eltociear Ashimine <eltociear@gmail.com>
InconsolableCellist <23345188+InconsolableCellist@users.noreply.github.com>
Ismatulla Mansurov <47342870+sapoepsilon@users.noreply.github.com>
Ivan <nekotekina@gmail.com>
Ivan Filipov <159561759+vanaka11@users.noreply.github.com>
Ivan Gorin <ivangorin21@gmail.com>
Ivo von Putzer Reibegg <ivo.putzer@gmail.com>
JJ <103335846+computerscienceiscool@users.noreply.github.com>
Jack Mousseau <jmousseau@users.noreply.github.com>
JacobLinCool <jacoblincool@gmail.com>
Jakub Ráček <blizzcz@gmail.com>
Jared Van Bortel <jared@nomic.ai>
Jay Binks <jaybinks@gmail.com>
Jayant <jayantyadav202@gmail.com>
Jeff Bolz <jbolz@nvidia.com>
Jeroen Mostert <jeroen.mostert@cm.com>
Jhen-Jie Hong <developer@jhen.me>
Jhen-Jie Hong <iainst0409@gmail.com>
JidongZhang-THU <1119708529@qq.com>
Jo Liss <joliss42@gmail.com>
Joe Todd <joe.todd@codeplay.com>
Johan <jr.raffin@gmail.com>
Johannes Gäßler <johannesg@5d6.de>
John Balis <phobossystems@gmail.com>
JohnnyB <jboero@users.noreply.github.com>
Jonathan Soo <jcsoo@agora.com>
Jonno <1160532+razodactyl@users.noreply.github.com>
Joonas Pihlajamaa <joonas.pihlajamaa@iki.fi>
Jose <34888496+Jerry-Master@users.noreply.github.com>
Josh Bleecher Snyder <josharian@gmail.com>
Josscii <jossciiweiyi@gmail.com>
Judd <foldl@users.noreply.github.com>
Jumper775 <78500318+jumpers775@users.noreply.github.com>
Jun Hee Yoo <contact.jhyoo@gmail.com>
Junil Kim <logyourself@gmail.com>
Justina Cho <justcho5@gmail.com>
Justine Tunney <jtunney@gmail.com>
Justine Tunney <jtunney@mozilla.com>
KITAITI Makoto <KitaitiMakoto@gmail.com>
KP Kaiser <kirk@zothcorp.com>
Kamilake <exjang0@gmail.com>
Karol Kontny <82021046+kkontny@users.noreply.github.com>
Karthick <j.karthic2004@gmail.com>
Kartik Saranathan <278928+Kartiku@users.noreply.github.com>
Kasumi <90275229+kasumi-1@users.noreply.github.com>
Kawrakow <48489457+ikawrakow@users.noreply.github.com>
Kendrick Taylor <kendrick@circuitsix.com>
Kevin Brothaler <admin@digipom.com>
Kevin Gibbons <bakkot@gmail.com>
Konosuke Sakai <konosuke@konosuke.work>
Konstantin Zhuravlyov <konstantin.zhuravlyov@amd.com>
Kreijstal <rainb@tfwno.gf>
Kylin <56434533+KyL0N@users.noreply.github.com>
@@ -147,56 +231,110 @@ Luis Herrera <herrera-luis@users.noreply.github.com>
Lukas Rist <glaslos@gmail.com>
M. A. Ali <73258591+MightyStud@users.noreply.github.com>
M. Eren Akbiyik <erenakbiyik@gmail.com>
Ma Mingfei <mingfei.ma@intel.com>
Maciek <maciek.mab122@gmail.com>
Mahesh Madhav <67384846+heshpdx@users.noreply.github.com>
Marcin Mielniczuk <marmistrz.dev@zoho.eu>
Mark Karpelès <MagicalTux@users.noreply.github.com>
Mark Zhuang <zhuangqiubin@gmail.com>
Markus Tavenrath <mtavenrath@users.noreply.github.com>
Martin Delille <martin@delille.org>
Martin Warnaar <martinwarnaar@gmail.com>
Masaya, Kato <62578291+msy-kato@users.noreply.github.com>
Matheus de Sousa <23645013+keyehzy@users.noreply.github.com>
Mathieu Baudier <mbaudier@argeo.org>
Mathijs de Bruin <mathijs@mathijsfietst.nl>
Matija Pevec <mightymatth@users.noreply.github.com>
Matt Stephenson <mstephenson6@users.noreply.github.com>
Max Krasnyansky <max.krasnyansky@gmail.com>
Max Krasnyansky <quic_maxk@quicinc.com>
Maximiliano Levi <8160966+maxilevi@users.noreply.github.com>
Meng, Hengyu <hengyu.meng@intel.com>
Mengqing Cao <cmq0113@163.com>
Michael Podvitskiy <podvitskiymichael@gmail.com>
Michael Rienstra <mrienstra@gmail.com>
Mikhail Grigorev <sleuthhound@gmail.com>
Mohammadreza Hendiani <hendiani.mohammadreza@gmail.com>
Mohit Agarwal <mohit@sdf.org>
Molly Sophia <mollysophia379@gmail.com>
Murilo Santana <mvrilo@gmail.com>
NETZkultur GmbH <mulholland@netzkultur.de>
Natsu <chino@hotococoa.moe>
Neil Chudleigh <nchudleigh@users.noreply.github.com>
Neo Zhang <14088817+arthw@users.noreply.github.com>
Neo Zhang Jianyu <jianyu.zhang@intel.com>
Neuman Vong <neuman.vong@gmail.com>
Nicholai Tukanov <nicholaitukanov@gmail.com>
Nicholas Albion <nalbion@yahoo.com>
Nico Bosshard <nico@bosshome.ch>
Nicolò Scipione <nicolo.scipione@codeplay.com>
Niels Mayer <Niels.Mayer@gmail.com>
Nikita Sarychev <42014488+sARY77@users.noreply.github.com>
Nikolaj Olsson <nikse.dk@gmail.com>
Okabintaro <103938900+Okabintaro@users.noreply.github.com>
Oleg Sidorov <me@whitebox.io>
Oleg Sidorov <oleg@sidorov.nl>
Olivier Chafik <ochafik@users.noreply.github.com>
Ondrej Kokes <ondrej.kokes@gmail.com>
Ouadie EL FAROUKI <ouadie.elfarouki@codeplay.com>
PAB <pierreantoine.bannier@gmail.com>
Paul Tsochantaris <ptsochantaris@icloud.com>
Pedro Probst <pprobst@insiberia.net>
Peng <hzp1024@qq.com>
Peter <peter277@users.noreply.github.com>
Philipp Zabel <philipp.zabel@gmail.com>
Philippe Normand <phil@base-art.net>
Philippe Normand <philn@igalia.com>
Plamen Minev <pacominev@gmail.com>
Prashant Vithule <119530321+Vithulep@users.noreply.github.com>
Przemysław Pawełczyk <przemoc@gmail.com>
Qianhe Chen <54462604+chenqianhe@users.noreply.github.com>
R0CKSTAR <xiaodong.ye@mthreads.com>
R0CKSTAR <yeahdongcn@gmail.com>
Radoslav Gerganov <rgerganov@gmail.com>
Radosław Gryta <radek.gryta@gmail.com>
Rahul Vadhyar <107788610+RahulVadhyar@users.noreply.github.com>
Raiya Araki <83504221+rai62@users.noreply.github.com>
Reinforce-II <fate@eastal.com>
Reinis Muiznieks <muiznieks.reinis@gmail.com>
RelatedTitle <r3latedtitle@gmail.com>
Rémy Oudompheng <oudomphe@phare.normalesup.org>
RhinoDevel <RhinoDevel@users.noreply.github.com>
Rich Jones <miserlou@gmail.com>
Robert Ormandi <52251610+ormandi@users.noreply.github.com>
Robin <robin.xw@hotmail.com>
Roddur Dasgupta <roddurd@gmail.com>
Roland Rabien <figbug@gmail.com>
Romain Biessy <romain.biessy@codeplay.com>
Ronsor <ronsor@ronsor.pw>
Rotem Dan <rotemdan@gmail.com>
Ryan Hitchman <hitchmanr@gmail.com>
Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com>
RyanChang <ftes90015@gmail.com>
SRHMorris <69468379+SRHMorris@users.noreply.github.com>
SXX <sxx1136965276@gmail.com>
Sacha Arbonel <sacha.arbonel@hotmail.fr>
Salman Faroz <stsfaroz@gmail.com>
Salvatore Mesoraca <s.mesoraca16@gmail.com>
Sam <49637763+Onlyartist9@users.noreply.github.com>
Sam Pullara <spullara@gmail.com>
Samuel Durante <44513615+samueldurantes@users.noreply.github.com>
Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Sandro Hanea <40202887+sandrohanea@users.noreply.github.com>
Sergio López <slp@redhat.com>
Sergio López <slp@sinrega.org>
Shanshan Shen <467638484@qq.com>
Shijie <821898965@qq.com>
Shupei Fan <dymarkfan@outlook.com>
Siddharth Ramakrishnan <srr2141@columbia.edu>
Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Simon Moisselin <simon.moisstoll@gmail.com>
Sindre Sorhus <sindresorhus@gmail.com>
Slava Primenko <primenko.s@gmail.com>
Srihari-mcw <96763064+Srihari-mcw@users.noreply.github.com>
Stavros Panakakis <53979866+Stavrospanakakis@users.noreply.github.com>
Stefan Sydow <s.sydow@heinlein-video.de>
Stefan Sydow <stefan@sydow.email>
Syahmi Azhar <prsyahmi@gmail.com>
Syed Jafri <syedjafri97@gmail.com>
Sơn Phan Trung <phantrungson17@gmail.com>
@@ -205,37 +343,63 @@ Takeshi Inoue <inoue.takeshi@gmail.com>
Tamotsu Takahashi <ttakah+github@gmail.com>
Taras Glek <taras@thegp.com>
Tauseef Mohiuddin <35351464+tauseefmohammed2@users.noreply.github.com>
Thamster <Thamster@users.noreply.github.com>
Thijs Raymakers <thijs@raymakers.nl>
Thomas Fitzsimmons <fitzsim@fitzsim.org>
Tiago Fassoni <tiagofassoni@users.noreply.github.com>
Tienshiao Ma <tienshiao@tienshiao.org>
Tim Miller <drasticactions@users.noreply.github.com>
Timothy Cronin <40186632+4imothy@users.noreply.github.com>
Tobrun <tobrun.van.nuland@gmail.com>
Todd <taf2@users.noreply.github.com>
Toliver <teejae@gmail.com>
Tong Li <31761981+litongjava@users.noreply.github.com>
Tony Wasserka <4840017+neobrain@users.noreply.github.com>
Topping1 <78745143+Topping1@users.noreply.github.com>
Travis Cline <travis.cline@gmail.com>
UEXTM.com <84163508+uextm@users.noreply.github.com>
UsernamesLame <156965854+UsernamesLame@users.noreply.github.com>
Vadim Peretokin <vperetokin@hey.com>
Valentin Gosu <1454649+valenting@users.noreply.github.com>
Vin Misra <vinith@alum.mit.edu>
Vulcan <93451215+trholding@users.noreply.github.com>
WhiteOlivierus <36532695+WhiteOlivierus@users.noreply.github.com>
William Tambellini <william.tambellini@gmail.com>
William Tambellini <wtambellini@sdl.com>
Wilson Silva <wilson.dsigns@gmail.com>
Xiang (Kevin) Li <kevinli020508@gmail.com>
Xiao-Yong Jin <jinxiaoyong@gmail.com>
XiaotaoChen <chenxiaotao1234@gmail.com>
Xingchen Song(宋星辰) <xingchensong1996@163.com>
Xinpeng Dou <81913537+Dou-Git@users.noreply.github.com>
Xuan Son Nguyen <thichthat@gmail.com>
Yajing Tang <phillis@google.com>
Yang Shen <aplshenyang@gmail.com>
Yunès <jean.baptiste.yunes@free.fr>
Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Yusuf Redžić <48274562+redzic@users.noreply.github.com>
ZaBlazzingZephyrus <119159668+blazingzephyr@users.noreply.github.com>
Zhenwei Jin <109658203+kylo5aby@users.noreply.github.com>
Zhiyuan Li <lizhiyuan@uniartisan.com>
Zhiyuan Li <uniartisan2017@gmail.com>
Zigfrid Zvezdin <ziggerZZ@gmail.com>
Zollner <24618122+Zolliner@users.noreply.github.com>
a3sh <38979186+A3shTnT@users.noreply.github.com>
ag2s20150909 <19373730+ag2s20150909@users.noreply.github.com>
agray3 <agray3@users.noreply.github.com>
ai-at-home <149282006+ai-at-home@users.noreply.github.com>
aldorof <aldorof@users.noreply.github.com>
alonfaraj <alonfaraj@gmail.com>
amd-dwang <dong.wang@amd.com>
amritahs-ibm <amritahs@linux.vnet.ibm.com>
andypayne <apayne@gmail.com>
ardfork <134447697+ardfork@users.noreply.github.com>
arizhih <40765267+arizhih@users.noreply.github.com>
automaticcat <daogiatuank54@gmail.com>
bandoti <141645996+bandoti@users.noreply.github.com>
be-next <jerome.ramette@gmail.com>
bert hubert <bert@hubertnet.nl>
billyct <billy_allen@126.com>
bmwl <brian.marshall@tolko.com>
bobqianic <129547291+bobqianic@users.noreply.github.com>
bocytko <bocytko+github@gmail.com>
@@ -248,7 +412,9 @@ byte-6174 <88070277+byte-6174@users.noreply.github.com>
cdosoftei <ciprian.dosoftei@gmail.com>
clach04 <Chris.Clark@actian.com>
compilade <113953597+compilade@users.noreply.github.com>
compilade <git@compilade.net>
conradg <conradjgodfrey@gmail.com>
crummyh <elijah@crums.us>
ddpasa <112642920+ddpasa@users.noreply.github.com>
denersc <denerstassun@gmail.com>
dscripka <dscripka@users.noreply.github.com>
@@ -256,28 +422,55 @@ duthils <duthils@duthils.net>
ecneladis <ecneladis@users.noreply.github.com>
faker <nspyia2002@gmail.com>
fitzsim <fitzsim@fitzsim.org>
fj-y-saito <85871716+fj-y-saito@users.noreply.github.com>
fraxy-v <65565042+fraxy-v@users.noreply.github.com>
genevera (she/her) <genevera@users.noreply.github.com>
geniusnut <geniusnut@gmail.com>
gilbertgong <gilbert.gong@gmail.com>
gn64 <yukikaze.jp@gmail.com>
goldwaving <77494627+goldwaving@users.noreply.github.com>
greeshmay <greeshmay@gmail.com>
haopeng <657407891@qq.com>
hipudding <huafengchun@gmail.com>
hsinhoyeh <yhh92u@gmail.com>
hydai <z54981220@gmail.com>
iamthad <thadeus.j.fleming@gmail.com>
issixx <46835150+issixx@users.noreply.github.com>
james wolf <contractorwolf@hotmail.com>
jdomke <28772296+jdomke@users.noreply.github.com>
jettoblack <jettoblack@gmail.com>
jiez <373447296@qq.com>
joecryptotoo <80373433+joecryptotoo@users.noreply.github.com>
jorismertz <35079666+jorismertz@users.noreply.github.com>
junchao-loongson <68935141+junchao-loongson@users.noreply.github.com>
junkfood <69683722+JunkFood02@users.noreply.github.com>
jwijffels <jwijffels@bnosac.be>
k.h.lai <adrian.k.h.lai@outlook.com>
kamranjon <kamranjon@gmail.com>
katsu560 <katsu560oo-@docomo.ne.jp>
kennethge <57784063+kenneth-ge@users.noreply.github.com>
keyehzy <msamuel@aluno.puc-rio.br>
kunnis <kunnis@users.noreply.github.com>
l3utterfly <gc.pthzfoldr@gmail.com>
leejet <leejet714@gmail.com>
leo-pony <nengjunma@outlook.com>
lhez <quic_lih@quicinc.com>
litong <31761981+litongjava@users.noreply.github.com>
liuwei-git <14815172+liuwei-git@users.noreply.github.com>
lnyan <lkwq007@gmail.com>
luoyu-intel <yu.luo@intel.com>
m.bell <m.bell@techsmith.com>
mahorozte <41834471+mahorozte@users.noreply.github.com>
mashizora <30516315+mashizora@users.noreply.github.com>
matt23654 <matthew.webber@protonmail.com>
matteo <matteogeniaccio@yahoo.it>
mgrachten <maarten@grachten.eu>
mkiol <mkiol@users.noreply.github.com>
mky_coder <47767389+mkycoder@users.noreply.github.com>
novag <7754358+novag@users.noreply.github.com>
pajowu <pajowu@pajowu.de>
pengxin99 <pengxin.yuan@intel.com>
petterreinholdtsen <pere-github@hungry.com>
polarmoon <90010972+polarmoon@users.noreply.github.com>
rlapray <lapray.romain@gmail.com>
sandrohanea <40202887+sandrohanea@users.noreply.github.com>
@@ -287,15 +480,31 @@ shikokuchuo <53399081+shikokuchuo@users.noreply.github.com>
slaren <slarengh@gmail.com>
slashlib <slashlib@users.noreply.github.com>
snadampal <87143774+snadampal@users.noreply.github.com>
someone13574 <81528246+someone13574@users.noreply.github.com>
st-gr <38470677+st-gr@users.noreply.github.com>
stduhpf <stephduh@live.fr>
stormofice <58337328+stormofice@users.noreply.github.com>
texmex76 <40733439+texmex76@users.noreply.github.com>
thefinaldegree <thefinaldegree@gmail.com>
thewh1teagle <61390950+thewh1teagle@users.noreply.github.com>
toboil-features <160222185+toboil-features@users.noreply.github.com>
trixirt <trix@redhat.com>
ulatekh <ulatekh@yahoo.com>
undef <undefdev@gmail.com>
uvos <devnull@uvos.xyz>
uvos <philipp@uvos.xyz>
valVk <valVk@users.noreply.github.com>
venkr <venkateshrameshkumar+1@gmail.com>
vicalloy <zbirder@gmail.com>
wangshuai09 <391746016@qq.com>
woachk <24752637+woachk@users.noreply.github.com>
xctan <axunlei@gmail.com>
xdrudis <xavierdrudis@yahoo.es>
yuri@FreeBSD <yuri@FreeBSD>
zhangjixiong <code.zjx@gmail.com>
zhentaoyu <zhentao.yu@intel.com>
zhouwg <6889919+zhouwg@users.noreply.github.com>
zhouwg <zhouwg2000@gmail.com>
谢乃闻 <sienaiwun@users.noreply.github.com>
布客飞龙 <562826179@qq.com>
Артём Земляк <azemlyak@smart-consulting.ru>

@@ -1,25 +1,31 @@
cmake_minimum_required (VERSION 3.5)
cmake_minimum_required(VERSION 3.5) # for add_link_options and implicit target directories.
project("whisper.cpp" C CXX)
project("whisper.cpp" VERSION 1.7.4)
include(CheckIncludeFileCXX)
# Allow for the creation of solution folders.
set_property(GLOBAL PROPERTY USE_FOLDERS ON)
project(whisper.cpp VERSION 1.6.2)
set(SOVERSION 1)
#set(CMAKE_WARN_DEPRECATED YES)
set(CMAKE_WARN_UNUSED_CLI YES)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
if (NOT XCODE AND NOT MSVC AND NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS "Debug" "Release" "MinSizeRel" "RelWithDebInfo")
endif()
# Add path to modules
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
if(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
if (CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
set(WHISPER_STANDALONE ON)
include(GitVars)
include(BuildTypes)
include(git-vars)
# configure project version
if (EXISTS "${CMAKE_SOURCE_DIR}/bindings/ios/Makefile-tmpl")
configure_file(${CMAKE_SOURCE_DIR}/bindings/ios/Makefile-tmpl ${CMAKE_SOURCE_DIR}/bindings/ios/Makefile @ONLY)
endif()
configure_file(${CMAKE_SOURCE_DIR}/bindings/javascript/package-tmpl.json ${CMAKE_SOURCE_DIR}/bindings/javascript/package.json @ONLY)
else()
set(WHISPER_STANDALONE OFF)
@@ -29,6 +35,16 @@ if (EMSCRIPTEN)
set(BUILD_SHARED_LIBS_DEFAULT OFF)
option(WHISPER_WASM_SINGLE_FILE "whisper: embed WASM inside the generated whisper.js" ON)
# TODO: without these, we get the following error:
# wasm-ld: error: --shared-memory is disallowed by whisper.cpp.o because it was not compiled with 'atomics' or 'bulk-memory' features.
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -pthread")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -s TOTAL_STACK=5242880")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -s TOTAL_STACK=5242880")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-deprecated")
else()
if (MINGW)
set(BUILD_SHARED_LIBS_DEFAULT OFF)
@@ -37,751 +53,144 @@ else()
endif()
endif()
# options
option(BUILD_SHARED_LIBS "build shared libraries" ${BUILD_SHARED_LIBS_DEFAULT})
if (APPLE)
set(WHISPER_METAL_DEFAULT ON)
else()
set(WHISPER_METAL_DEFAULT OFF)
endif()
#
# option list
#
option(BUILD_SHARED_LIBS "whisper: build shared libs" ${BUILD_SHARED_LIBS_DEFAULT})
# general
option(WHISPER_CCACHE "whisper: use ccache if available" ON)
# debug
option(WHISPER_ALL_WARNINGS "whisper: enable all compiler warnings" ON)
option(WHISPER_ALL_WARNINGS_3RD_PARTY "whisper: enable all compiler warnings in 3rd party libs" OFF)
option(WHISPER_SANITIZE_THREAD "whisper: enable thread sanitizer" OFF)
option(WHISPER_SANITIZE_ADDRESS "whisper: enable address sanitizer" OFF)
option(WHISPER_SANITIZE_UNDEFINED "whisper: enable undefined sanitizer" OFF)
option(WHISPER_BUILD_TESTS "whisper: build tests" ${WHISPER_STANDALONE})
option(WHISPER_BUILD_EXAMPLES "whisper: build examples" ${WHISPER_STANDALONE})
option(WHISPER_SDL2 "whisper: support for libSDL2" OFF)
if (CMAKE_SYSTEM_NAME MATCHES "Linux")
option(WHISPER_FFMPEG "whisper: support building and linking with ffmpeg libs (avcodec, swresample, ...)" OFF)
endif()
option(WHISPER_NO_AVX "whisper: disable AVX" OFF)
option(WHISPER_NO_AVX2 "whisper: disable AVX2" OFF)
option(WHISPER_NO_AVX512 "whisper: disable AVX512" ON)
option(WHISPER_NO_AVX512_VBMI "whisper: disable AVX512-VBMI" ON)
option(WHISPER_NO_AVX512_VNNI "whisper: disable AVX512-VNNI" ON)
option(WHISPER_NO_FMA "whisper: disable FMA" OFF)
option(WHISPER_NO_F16C "whisper: disable F16c" OFF)
option(WHISPER_OPENVINO "whisper: support for OpenVINO" OFF)
if (APPLE)
option(WHISPER_NO_ACCELERATE "whisper: disable Accelerate framework" OFF)
option(WHISPER_METAL "whisper: use Metal" ${WHISPER_METAL_DEFAULT})
option(WHISPER_METAL_NDEBUG "whisper: disable Metal debugging" OFF)
option(WHISPER_COREML "whisper: enable Core ML framework" OFF)
option(WHISPER_COREML_ALLOW_FALLBACK "whisper: allow non-CoreML fallback" OFF)
option(WHISPER_METAL_EMBED_LIBRARY "whisper: embed Metal library" OFF)
else()
option(WHISPER_BLAS "whisper: use BLAS libraries" OFF)
option(WHISPER_BLAS_VENDOR "whisper: BLAS library vendor" Generic)
option(WHISPER_OPENBLAS "whisper: prefer OpenBLAS" OFF)
option(WHISPER_OPENBLAS_INTERFACE64 "whisper: use OpenBLAS w/ 64-bit interface" OFF)
option(WHISPER_CUDA "whisper: support for CUDA" OFF)
option(WHISPER_CUBLAS "whisper: support for CUDA (deprecated)" OFF)
option(WHISPER_HIPBLAS "whisper: support for hipBLAS" OFF)
option(WHISPER_CLBLAST "whisper: use CLBlast" OFF)
option(WHISPER_MKL "whisper: use Intel Math Kernel Library (MKL)" OFF)
option(WHISPER_SYCL "whisper: use SYCL" OFF)
option(WHISPER_SYCL_F16 "whisper: use 16 bit floats for sycl calculations" OFF)
endif()
option(WHISPER_PERF "whisper: enable perf timings" OFF)
# build
option(WHISPER_FATAL_WARNINGS "whisper: enable -Werror flag" OFF)
option(WHISPER_USE_SYSTEM_GGML "whisper: use system-installed GGML library" OFF)
# sanitizers
option(WHISPER_SANITIZE_THREAD "whisper: enable thread sanitizer" OFF)
option(WHISPER_SANITIZE_ADDRESS "whisper: enable address sanitizer" OFF)
option(WHISPER_SANITIZE_UNDEFINED "whisper: enable undefined sanitizer" OFF)
if (NOT MSVC)
if (WHISPER_SANITIZE_THREAD)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsanitize=thread")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=thread")
endif()
# extra artifacts
option(WHISPER_BUILD_TESTS "whisper: build tests" ${WHISPER_STANDALONE})
option(WHISPER_BUILD_EXAMPLES "whisper: build examples" ${WHISPER_STANDALONE})
option(WHISPER_BUILD_SERVER "whisper: build server example" ${WHISPER_STANDALONE})
if (WHISPER_SANITIZE_ADDRESS)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsanitize=address -fno-omit-frame-pointer")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address -fno-omit-frame-pointer")
endif()
# 3rd party libs
option(WHISPER_CURL "whisper: use libcurl to download model from an URL" OFF)
option(WHISPER_SDL2 "whisper: support for libSDL2" OFF)
if (WHISPER_SANITIZE_UNDEFINED)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsanitize=undefined")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=undefined")
endif()
endif()
#set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -ffast-math")
#set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -march=native")
# dependencies
find_package(Threads REQUIRED)
#compile flag sycl
if (WHISPER_SYCL)
set(CMAKE_CXX_STANDARD 17)
else()
set(CMAKE_CXX_STANDARD 11)
endif()
if (WHISPER_FFMPEG)
# As of cmake 3.27, there is no official cmake support for FindFFmpeg.
# Consequnelty we added a FindFFmpeg.cmake script the cmake subfolder:
# whisper.cpp does not need the full ffmpeg libs, just AVFORMAT AVCODEC AVUTIL SWRESAMPLE
# libswresample performs highly optimized audio resampling, rematrixing and sample format conversion operations
# libavcodec provides a generic encoding/decoding framework and contains multiple decoders and encoders for audio, video and subtitle streams, and several bitstream filters.
# libavformat provides a generic framework for multiplexing and demultiplexing (muxing and demuxing) audio, video and subtitle streams.
find_package(FFmpeg REQUIRED)
if (NOT ${FFMPEG_FOUND})
message(FATAL_ERROR "Cannot find ffmpeg libs/headers")
endif()
message(STATUS "Found ffmpeg libs: ${FFMPEG_LIBRARIES}")
message(STATUS "Found ffmpeg headers in: ${FFMPEG_INCLUDE_DIRS}")
message(STATUS "ffmpeg definitions: ${FFMPEG_DEFINITIONS}")
message(STATUS "Found avformat ${AVFORMAT_VERSION}")
include_directories(${FFMPEG_INCLUDE_DIRS})
add_compile_definitions(WHISPER_FFMPEG)
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} ${FFMPEG_LIBRARIES})
endif()
# on APPLE
if (APPLE)
# include Accelerate framework
if (NOT WHISPER_NO_ACCELERATE)
find_library(ACCELERATE_FRAMEWORK Accelerate)
if (ACCELERATE_FRAMEWORK)
message(STATUS "Accelerate framework found")
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} ${ACCELERATE_FRAMEWORK})
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64)
else()
message(FATAL_ERROR "Accelerate framework not found")
endif()
endif()
if (WHISPER_METAL)
find_library(FOUNDATION_LIBRARY Foundation REQUIRED)
find_library(METAL_FRAMEWORK Metal REQUIRED)
find_library(METALKIT_FRAMEWORK MetalKit REQUIRED)
if (METAL_FRAMEWORK)
message(STATUS "Metal framework found")
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS}
${FOUNDATION_LIBRARY}
${METAL_FRAMEWORK}
${METALKIT_FRAMEWORK}
)
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_USE_METAL)
if (WHISPER_METAL_NDEBUG)
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_METAL_NDEBUG)
endif()
else()
message(FATAL_ERROR "Metal framework not found")
endif()
set(GGML_SOURCES_METAL ggml-metal.m ggml-metal.h)
# copy ggml-common.h and ggml-metal.metal to bin directory
configure_file(ggml-common.h bin/ggml-common.h COPYONLY)
configure_file(ggml-metal.metal bin/ggml-metal.metal COPYONLY)
if (WHISPER_METAL_EMBED_LIBRARY)
enable_language(ASM)
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_METAL_EMBED_LIBRARY)
set(METALLIB_SOURCE "${CMAKE_CURRENT_SOURCE_DIR}/ggml-metal.metal")
set(COMMON_HEADER "${CMAKE_CURRENT_SOURCE_DIR}/ggml-common.h")
file(MAKE_DIRECTORY "${CMAKE_BINARY_DIR}/autogenerated")
set(EMBED_METALLIB_ASSEMBLY "${CMAKE_BINARY_DIR}/autogenerated/ggml-embed-metallib.s")
set(EMBED_METALLIB_SOURCE "${CMAKE_BINARY_DIR}/autogenerated/ggml-metal-combined.metal")
add_custom_command(
OUTPUT ${EMBED_METALLIB_SOURCE}
COMMAND sed -e "/^#include \\\"ggml-common.h\\\"/r ${COMMON_HEADER}" -e "/^#include \\\"ggml-common.h\\\"/d" ${METALLIB_SOURCE} > ${EMBED_METALLIB_SOURCE}
DEPENDS ${METALLIB_SOURCE} ${COMMON_HEADER}
COMMENT "Generating combined Metal library for embedding"
)
add_custom_command(
OUTPUT ${EMBED_METALLIB_ASSEMBLY}
COMMAND echo ".section __DATA,__ggml_metallib" > ${EMBED_METALLIB_ASSEMBLY}
COMMAND echo ".globl _ggml_metallib_start" >> ${EMBED_METALLIB_ASSEMBLY}
COMMAND echo "_ggml_metallib_start:" >> ${EMBED_METALLIB_ASSEMBLY}
COMMAND echo ".incbin \\\"${EMBED_METALLIB_SOURCE}\\\"" >> ${EMBED_METALLIB_ASSEMBLY}
COMMAND echo ".globl _ggml_metallib_end" >> ${EMBED_METALLIB_ASSEMBLY}
COMMAND echo "_ggml_metallib_end:" >> ${EMBED_METALLIB_ASSEMBLY}
DEPENDS ${EMBED_METALLIB_SOURCE}
COMMENT "Generate assembly for embedded Metal library"
)
set(GGML_SOURCES_METAL ${GGML_SOURCES_METAL} ${EMBED_METALLIB_ASSEMBLY})
endif()
endif()
if (WHISPER_COREML)
find_library(FOUNDATION_FRAMEWORK Foundation)
find_library(COREML_FRAMEWORK CoreML)
if (COREML_FRAMEWORK)
message(STATUS "CoreML framework found")
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DWHISPER_USE_COREML)
else()
message(FATAL_ERROR "CoreML framework not found")
endif()
if (WHISPER_COREML_ALLOW_FALLBACK)
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DWHISPER_COREML_ALLOW_FALLBACK)
endif()
endif()
endif()
if (WHISPER_OPENBLAS)
set(WHISPER_BLAS_VENDOR "OpenBLAS")
set(WHISPER_BLAS ON)
# BLA_PKGCONFIG_BLAS is supported since CMake 3.25.
# FindBLAS.cmake pkg-config logic seems incomplete, because when
# BLA_SIZEOF_INTEGER is 8, then it should search for blas64 instead of blas.
# blas.pc/blas64.pc are not always provided, so let's be more specific
# and go with openblas.pc/openblas64.pc if WHISPER_OPENBLAS is on.
if (WHISPER_OPENBLAS_INTERFACE64)
set(WHISPER_BLAS_LIB "openblas64")
else ()
set(WHISPER_BLAS_LIB "openblas")
endif ()
set(BLA_PKGCONFIG_BLAS ${WHISPER_BLAS_LIB})
# OpenBLAS prebuilt libraries for Windows do not have "64" suffix in filename.
# (But .pc file has "64" suffix in filename for USE_64BITINT=1 Windows build.)
if (MSVC)
set(WHISPER_BLAS_LIB "openblas")
endif ()
endif()
if (WHISPER_BLAS)
if (NOT "$ENV{OPENBLAS_PATH}" STREQUAL "")
if (WHISPER_STATIC)
set(WHISPER_BLAS_LIB_PREFIX ${CMAKE_STATIC_LIBRARY_PREFIX})
set(WHISPER_BLAS_LIB_SUFFIX ${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
if (CMAKE_IMPORT_LIBRARY_SUFFIX)
set(WHISPER_BLAS_LIB_PREFIX ${CMAKE_IMPORT_LIBRARY_PREFIX})
set(WHISPER_BLAS_LIB_SUFFIX ${CMAKE_IMPORT_LIBRARY_SUFFIX})
else ()
set(WHISPER_BLAS_LIB_PREFIX ${CMAKE_SHARED_LIBRARY_PREFIX})
set(WHISPER_BLAS_LIB_SUFFIX ${CMAKE_SHARED_LIBRARY_SUFFIX})
endif ()
endif ()
# OpenBLAS prebuilt libraries hardcode "lib" prefix in filename even on Windows
if (WHISPER_OPENBLAS)
set(WHISPER_BLAS_LIB_PREFIX "lib")
endif ()
message(STATUS "BLAS compatible library path provided")
set(BLAS_LIBRARIES "$ENV{OPENBLAS_PATH}/lib/${WHISPER_BLAS_LIB_PREFIX}${WHISPER_BLAS_LIB}${WHISPER_BLAS_LIB_SUFFIX}")
message(STATUS "Libraries ${BLAS_LIBRARIES}")
set(BLAS_INCLUDE_DIRS "$ENV{OPENBLAS_PATH}/include")
message(STATUS "Include dirs ${BLAS_INCLUDE_DIRS}")
if (NOT EXISTS "${BLAS_LIBRARIES}")
message(FATAL_ERROR "BLAS library was not found. Environment variable OPENBLAS_PATH misdefined.")
endif ()
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_USE_OPENBLAS)
include_directories(${BLAS_INCLUDE_DIRS})
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} ${BLAS_LIBRARIES})
else ()
if (WHISPER_STATIC)
# FindBLAS.cmake pkg-config logic seems incomplete, because when
# BLA_STATIC is on, then it should use pkg_check_modules_static
# instead of pkg_check_modules.
# Some manual variable overriding may be necessary if you don't
# achieve desired results.
set(BLA_STATIC 1)
endif ()
set(BLA_VENDOR ${WHISPER_BLAS_VENDOR})
if (WHISPER_OPENBLAS_INTERFACE64)
set(BLA_SIZEOF_INTEGER 8)
else ()
set(BLA_SIZEOF_INTEGER 4)
endif()
set(BLA_PREFER_PKGCONFIG 1)
find_package(BLAS)
if(BLAS_FOUND)
message(STATUS "BLAS compatible library found")
message(STATUS "Libraries ${BLAS_LIBRARIES}")
if (NOT DEFINED BLAS_INCLUDE_DIRS)
if (PKGC_BLAS_FOUND)
set(BLAS_INCLUDE_DIRS "${PKGC_BLAS_INCLUDE_DIRS}")
else ()
find_path(BLAS_INCLUDE_DIRS cblas.h /usr/include/openblas)
endif()
endif()
message(STATUS "Include dirs ${BLAS_INCLUDE_DIRS}")
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_USE_OPENBLAS)
include_directories(${BLAS_INCLUDE_DIRS})
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} ${BLAS_LIBRARIES})
else()
message(FATAL_ERROR "BLAS library was not found")
endif()
endif ()
endif ()
if (WHISPER_MKL)
find_package(MKL CONFIG REQUIRED PATHS $ENV{MKLROOT})
message(STATUS "Imported oneMKL targets: ${MKL_IMPORTED_TARGETS}")
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_USE_OPENBLAS)
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_BLAS_USE_MKL)
endif()
if (WHISPER_CUBLAS)
message(WARNING "WHISPER_CUBLAS is deprecated and will be removed in the future.\nUse WHISPER_CUDA instead")
set(WHISPER_CUDA ON)
endif()
if (WHISPER_CUDA)
cmake_minimum_required(VERSION 3.17)
find_package(CUDAToolkit)
if (CUDAToolkit_FOUND)
message(STATUS "cuBLAS found")
enable_language(CUDA)
file(GLOB GGML_SOURCES_CUDA "ggml-cuda/*.cu")
list(APPEND GGML_SOURCES_CUDA ggml-cuda.h)
list(APPEND GGML_SOURCES_CUDA ggml-cuda.cu)
add_compile_definitions(GGML_USE_CUDA)
if (WHISPER_STATIC)
if (WIN32)
# As of 12.3.1 CUDA Tookit for Windows does not offer a static cublas library
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} CUDA::cudart_static CUDA::cublas CUDA::cublasLt)
else ()
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} CUDA::cudart_static CUDA::cublas_static CUDA::cublasLt_static)
endif()
else()
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} CUDA::cudart CUDA::cublas CUDA::cublasLt)
endif()
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} CUDA::cuda_driver)
else()
message(FATAL_ERROR "cuBLAS not found")
endif()
endif()
if (WHISPER_HIPBLAS)
list(APPEND CMAKE_PREFIX_PATH /opt/rocm)
if (NOT ${CMAKE_C_COMPILER_ID} MATCHES "Clang")
message(WARNING "Only LLVM is supported for HIP, hint: CC=/opt/rocm/llvm/bin/clang")
endif()
if (NOT ${CMAKE_CXX_COMPILER_ID} MATCHES "Clang")
message(WARNING "Only LLVM is supported for HIP, hint: CXX=/opt/rocm/llvm/bin/clang++")
endif()
find_package(hip)
find_package(hipblas)
find_package(rocblas)
if (${hipblas_FOUND} AND ${hip_FOUND})
message(STATUS "HIP and hipBLAS found")
set(GGML_HEADERS_ROCM "ggml-cuda.h")
file(GLOB GGML_SOURCES_ROCM "ggml-cuda/*.cu")
list(APPEND GGML_SOURCES_ROCM "ggml-cuda.cu")
add_compile_definitions(GGML_USE_HIPBLAS GGML_USE_CUDA)
set_source_files_properties(${GGML_SOURCES_ROCM} PROPERTIES LANGUAGE CXX)
if (WHISPER_STATIC)
message(FATAL_ERROR "Static linking not supported for HIP/ROCm")
endif()
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} hip::device PUBLIC hip::host roc::rocblas roc::hipblas)
else()
message(FATAL_ERROR "hipBLAS or HIP not found. Try setting CMAKE_PREFIX_PATH=/opt/rocm")
endif()
endif()
if (WHISPER_CLBLAST)
find_package(CLBlast)
if (CLBlast_FOUND)
message(STATUS "CLBlast found")
set(GGML_SOURCES_OPENCL ggml-opencl.cpp ggml-opencl.h)
add_compile_definitions(GGML_USE_CLBLAST)
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} clblast)
else()
message(FATAL_ERROR "CLBlast not found")
endif()
endif()
if( WHISPER_OPENVINO )
find_package(OpenVINO REQUIRED COMPONENTS Runtime)
endif()
if (WHISPER_SYCL)
if ( NOT DEFINED ENV{ONEAPI_ROOT})
message(FATAL_ERROR "Not detect ENV {ONEAPI_ROOT}, please install oneAPI & source it, like: source /opt/intel/oneapi/setvars.sh")
endif()
#todo: AOT
find_package(IntelSYCL REQUIRED)
if (WHISPER_SYCL_F16)
add_compile_definitions(GGML_SYCL_F16)
endif()
add_compile_definitions(GGML_USE_SYCL)
add_compile_options(-I./) #include DPCT
add_compile_options(-I/${SYCL_INCLUDE_DIR})
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-narrowing")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsycl -L${MKLROOT}/lib")
set(GGML_HEADERS_SYCL ggml-sycl.h)
set(GGML_SOURCES_SYCL ggml-sycl.cpp)
set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} sycl OpenCL mkl_core pthread m dl mkl_sycl_blas mkl_intel_ilp64 mkl_tbb_thread)
endif()
# compiler flags
if (NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS "Debug" "Release" "RelWithDebInfo")
endif ()
if (WHISPER_ALL_WARNINGS)
if (NOT MSVC)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} \
-Wall \
-Wextra \
-Wpedantic \
-Wshadow \
-Wcast-qual \
-Wstrict-prototypes \
-Wpointer-arith \
-Wno-unused-function \
")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} \
-Wall \
-Wextra \
-Wpedantic \
-Wcast-qual \
")
else()
# todo : msvc
endif()
endif()
if (NOT MSVC)
# TODO: temporary disabled until we figure out ggml-metal.m
#set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Werror=vla")
#set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fno-math-errno -ffinite-math-only -funsafe-math-optimizations")
endif()
message(STATUS "CMAKE_SYSTEM_PROCESSOR: ${CMAKE_SYSTEM_PROCESSOR}")
if (${CMAKE_SYSTEM_PROCESSOR} MATCHES "arm" OR ${CMAKE_SYSTEM_PROCESSOR} MATCHES "aarch64")
message(STATUS "ARM detected")
elseif(${CMAKE_SYSTEM_PROCESSOR} MATCHES "ppc64le")
message(STATUS "PowerPC detected")
else()
message(STATUS "x86 detected")
if (MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /utf-8")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /utf-8")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} /utf-8")
if(NOT WHISPER_NO_AVX512)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX512")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /arch:AVX512")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} /arch:AVX512")
# MSVC has no compile-time flags enabling specific
# AVX512 extensions, neither it defines the
# macros corresponding to the extensions.
# Do it manually.
if (NOT WHISPER_NO_AVX512_VBMI)
add_compile_definitions($<$<COMPILE_LANGUAGE:C>:__AVX512VBMI__>)
add_compile_definitions($<$<COMPILE_LANGUAGE:CXX>:__AVX512VBMI__>)
endif()
if (NOT WHISPER_NO_AVX512_VNNI)
add_compile_definitions($<$<COMPILE_LANGUAGE:C>:__AVX512VNNI__>)
add_compile_definitions($<$<COMPILE_LANGUAGE:CXX>:__AVX512VNNI__>)
endif()
elseif(NOT WHISPER_NO_AVX2)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX2")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /arch:AVX2")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} /arch:AVX2")
elseif(NOT WHISPER_NO_AVX)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /arch:AVX")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} /arch:AVX")
endif()
else()
if (EMSCRIPTEN)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -pthread -s TOTAL_STACK=5242880")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread -s TOTAL_STACK=5242880")
else()
if(NOT WHISPER_NO_AVX)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mavx")
endif()
if(NOT WHISPER_NO_AVX2)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mavx2")
endif()
if(NOT WHISPER_NO_AVX512)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw")
if(NOT WHISPER_NO_AVX512_VBMI)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mavx512vbmi")
endif()
if(NOT WHISPER_NO_AVX512_VNNI)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mavx512vnni")
endif()
endif()
if(NOT WHISPER_NO_FMA)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mfma")
endif()
if(NOT WHISPER_NO_F16C)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mf16c")
endif()
endif()
endif()
endif()
#
# POSIX conformance
#
# clock_gettime came in POSIX.1b (1993)
# CLOCK_MONOTONIC came in POSIX.1-2001 / SUSv3 as optional
# posix_memalign came in POSIX.1-2001 / SUSv3
# M_PI is an XSI extension since POSIX.1-2001 / SUSv3, came in XPG1 (1985)
add_compile_definitions(_XOPEN_SOURCE=600)
# Somehow in OpenBSD whenever POSIX conformance is specified
# some string functions rely on locale_t availability,
# which was introduced in POSIX.1-2008, forcing us to go higher
if (CMAKE_SYSTEM_NAME MATCHES "OpenBSD")
remove_definitions(-D_XOPEN_SOURCE=600)
add_compile_definitions(_XOPEN_SOURCE=700)
endif()
# Data types, macros and functions related to controlling CPU affinity
# are available on Linux through GNU extensions in libc
if (CMAKE_SYSTEM_NAME MATCHES "Linux")
add_compile_definitions(_GNU_SOURCE)
option(WHISPER_FFMPEG "whisper: support building and linking with ffmpeg libs (avcodec, swresample, ...)" OFF)
endif()
# RLIMIT_MEMLOCK came in BSD, is not specified in POSIX.1,
# and on macOS its availability depends on enabling Darwin extensions
# similarly on DragonFly, enabling BSD extensions is necessary
if (CMAKE_SYSTEM_NAME MATCHES "Darwin")
add_compile_definitions(_DARWIN_C_SOURCE)
endif()
if (CMAKE_SYSTEM_NAME MATCHES "DragonFly")
add_compile_definitions(_DARWIN_C_SOURCE)
endif()
option(WHISPER_COREML "whisper: enable Core ML framework" OFF)
option(WHISPER_COREML_ALLOW_FALLBACK "whisper: allow non-CoreML fallback" OFF)
option(WHISPER_OPENVINO "whisper: support for OpenVINO" OFF)
# alloca is a non-standard interface that is not visible on BSDs when
# POSIX conformance is specified, but not all of them provide a clean way
# to enable it in such cases
if (CMAKE_SYSTEM_NAME MATCHES "FreeBSD")
add_compile_definitions(__BSD_VISIBLE)
endif()
if (CMAKE_SYSTEM_NAME MATCHES "NetBSD")
add_compile_definitions(_NETBSD_SOURCE)
endif()
if (CMAKE_SYSTEM_NAME MATCHES "OpenBSD")
add_compile_definitions(_BSD_SOURCE)
endif()
# Required for relocatable CMake package
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/build-info.cmake)
if (WHISPER_PERF)
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_PERF)
endif()
# override ggml options
set(GGML_CCACHE ${WHISPER_CCACHE})
set(GGML_SANITIZE_THREAD ${WHISPER_SANITIZE_THREAD})
set(GGML_SANITIZE_ADDRESS ${WHISPER_SANITIZE_ADDRESS})
set(GGML_SANITIZE_UNDEFINED ${WHISPER_SANITIZE_UNDEFINED})
set(GGML_ALL_WARNINGS ${WHISPER_ALL_WARNINGS})
set(GGML_FATAL_WARNINGS ${WHISPER_FATAL_WARNINGS})
#
# whisper.coreml - Core ML support
#
if (WHISPER_COREML)
set(TARGET whisper.coreml)
add_library(${TARGET}
coreml/whisper-encoder.h
coreml/whisper-encoder.mm
coreml/whisper-encoder-impl.h
coreml/whisper-encoder-impl.m
)
include(DefaultTargetOptions)
target_include_directories(${TARGET} PUBLIC
.
)
target_link_libraries(${TARGET} PRIVATE ${FOUNDATION_FRAMEWORK} ${COREML_FRAMEWORK})
set_target_properties(${TARGET} PROPERTIES
COMPILE_FLAGS "-fobjc-arc"
)
set_target_properties(${TARGET} PROPERTIES FOLDER "libs")
endif()
if (WHISPER_OPENVINO)
set(TARGET whisper.openvino)
add_library(${TARGET} OBJECT
openvino/whisper-openvino-encoder.h
openvino/whisper-openvino-encoder.cpp
)
target_include_directories(${TARGET} PUBLIC
.
)
set_property(TARGET ${TARGET} PROPERTY POSITION_INDEPENDENT_CODE ON)
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DWHISPER_USE_OPENVINO)
target_link_libraries(${TARGET} PRIVATE openvino::runtime)
set_target_properties(${TARGET} PROPERTIES FOLDER "libs")
endif()
#
# whisper - this is the main library of the project
#
set(TARGET whisper)
add_library(${TARGET}
ggml.h
ggml.c
ggml-alloc.h
ggml-alloc.c
ggml-backend.h
ggml-backend.c
ggml-quants.h
ggml-quants.c
${GGML_SOURCES_METAL}
${GGML_SOURCES_CUDA}
${GGML_SOURCES_OPENCL}
${GGML_SOURCES_SYCL} ${GGML_HEADERS_SYCL}
${GGML_SOURCES_ROCM} ${GGML_HEADERS_ROCM}
whisper.h
whisper.cpp
)
include_directories (
.
)
# Set the version numbers
set_target_properties(whisper PROPERTIES
VERSION ${PROJECT_VERSION}
SOVERSION ${SOVERSION}
)
include(DefaultTargetOptions)
target_include_directories(${TARGET} PUBLIC
.
)
if (WHISPER_COREML)
target_link_libraries(${TARGET} PRIVATE whisper.coreml)
endif()
if (WHISPER_OPENVINO)
target_link_libraries(${TARGET} PRIVATE whisper.openvino)
endif()
if (WHISPER_MKL)
target_link_libraries(${TARGET} PUBLIC MKL::MKL)
endif()
if (MSVC)
target_link_libraries(${TARGET} PRIVATE ${WHISPER_EXTRA_LIBS} ${CMAKE_THREAD_LIBS_INIT})
set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -D_CRT_SECURE_NO_WARNINGS)
else()
target_link_libraries(${TARGET} PRIVATE m ${WHISPER_EXTRA_LIBS} ${CMAKE_THREAD_LIBS_INIT})
endif()
if (BUILD_SHARED_LIBS)
set_target_properties(${TARGET} PROPERTIES POSITION_INDEPENDENT_CODE ON)
target_link_libraries(${TARGET} PUBLIC
${CMAKE_DL_LIBS}
)
target_compile_definitions(${TARGET} PUBLIC
WHISPER_SHARED
GGML_SHARED
)
target_compile_definitions(${TARGET} PRIVATE
WHISPER_BUILD
GGML_BUILD
)
if (WHISPER_METAL)
# TODO: I think this should make ggml-metal.m "see" the ggml-metal.metal file from the "bin" directory
# but for some reason it does not work here like it does in llama.cpp
set_target_properties(${TARGET} PROPERTIES RESOURCE "${CMAKE_CURRENT_SOURCE_DIR}/ggml-metal.metal")
# transition helpers
function (whisper_option_depr TYPE OLD NEW)
if (${OLD})
message(${TYPE} "${OLD} is deprecated and will be removed in the future.\nUse ${NEW} instead\n")
set(${NEW} ON)
endif()
endif()
endfunction()
if (GGML_SOURCES_CUDA)
message(STATUS "GGML CUDA sources found, configuring CUDA architecture")
# Only configure ggml CUDA architectures if not globally set
if (NOT DEFINED GGML_CUDA_ARCHITECTURES)
# Not overridden by user, so set defaults
set(GGML_CUDA_ARCHITECTURES 52 61 70)
whisper_option_depr(FATAL_ERROR WHISPER_CUBLAS GGML_CUDA)
whisper_option_depr(WARNING WHISPER_CUDA GGML_CUDA)
whisper_option_depr(WARNING WHISPER_KOMPUTE GGML_KOMPUTE)
whisper_option_depr(WARNING WHISPER_METAL GGML_METAL)
whisper_option_depr(WARNING WHISPER_METAL_EMBED_LIBRARY GGML_METAL_EMBED_LIBRARY)
whisper_option_depr(WARNING WHISPER_NATIVE GGML_NATIVE)
whisper_option_depr(WARNING WHISPER_OPENMP GGML_OPENMP)
whisper_option_depr(WARNING WHISPER_RPC GGML_RPC)
whisper_option_depr(WARNING WHISPER_SYCL GGML_SYCL)
whisper_option_depr(WARNING WHISPER_SYCL_F16 GGML_SYCL_F16)
#
# build the library
#
if (NOT TARGET ggml)
if (WHISPER_USE_SYSTEM_GGML)
find_package(ggml REQUIRED)
if (NOT ggml_FOUND)
message(FATAL_ERROR "System-installed GGML library not found.")
endif()
add_library(ggml ALIAS ggml::ggml)
else()
add_subdirectory(ggml)
endif()
message(STATUS "GGML Configuring CUDA architectures ${GGML_CUDA_ARCHITECTURES}")
set_property(TARGET whisper PROPERTY CUDA_ARCHITECTURES ${GGML_CUDA_ARCHITECTURES})
set_property(TARGET whisper PROPERTY CUDA_SELECT_NVCC_ARCH_FLAGS "Auto")
# ... otherwise assume ggml is added by a parent CMakeLists.txt
endif()
add_subdirectory(src)
if (EMSCRIPTEN)
set_target_properties(${TARGET} PROPERTIES COMPILE_FLAGS "-msimd128")
endif()
target_compile_definitions(${TARGET} PUBLIC
${WHISPER_EXTRA_FLAGS}
)
set_target_properties(${TARGET} PROPERTIES PUBLIC_HEADER "ggml.h;whisper.h")
set_target_properties(${TARGET} PROPERTIES FOLDER "libs")
#
# install
#
include(GNUInstallDirs)
include(CMakePackageConfigHelpers)
install(TARGETS ${TARGET}
LIBRARY DESTINATION lib
ARCHIVE DESTINATION lib/static
RUNTIME DESTINATION bin
RESOURCE DESTINATION bin
PUBLIC_HEADER DESTINATION include
)
set(WHISPER_BUILD_NUMBER ${BUILD_NUMBER})
set(WHISPER_BUILD_COMMIT ${BUILD_COMMIT})
set(WHISPER_INSTALL_VERSION ${CMAKE_PROJECT_VERSION})
#
# bindings
#
set(WHISPER_INCLUDE_INSTALL_DIR ${CMAKE_INSTALL_INCLUDEDIR} CACHE PATH "Location of header files")
set(WHISPER_LIB_INSTALL_DIR ${CMAKE_INSTALL_LIBDIR} CACHE PATH "Location of library files")
set(WHISPER_BIN_INSTALL_DIR ${CMAKE_INSTALL_BINDIR} CACHE PATH "Location of binary files")
add_subdirectory(bindings)
get_directory_property(WHISPER_TRANSIENT_DEFINES COMPILE_DEFINITIONS)
set_target_properties(whisper PROPERTIES PUBLIC_HEADER ${CMAKE_CURRENT_SOURCE_DIR}/include/whisper.h)
install(TARGETS whisper LIBRARY PUBLIC_HEADER)
configure_package_config_file(
${CMAKE_CURRENT_SOURCE_DIR}/cmake/whisper-config.cmake.in
${CMAKE_CURRENT_BINARY_DIR}/whisper-config.cmake
INSTALL_DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/whisper
PATH_VARS
WHISPER_INCLUDE_INSTALL_DIR
WHISPER_LIB_INSTALL_DIR
WHISPER_BIN_INSTALL_DIR )
write_basic_package_version_file(
${CMAKE_CURRENT_BINARY_DIR}/whisper-version.cmake
VERSION ${WHISPER_INSTALL_VERSION}
COMPATIBILITY SameMajorVersion)
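`write_basic_package_version_file` with `COMPATIBILITY SameMajorVersion` generates a version file. Per the CMake documentation, that file accepts a `find_package(whisper X.Y)` request when two conditions hold:

- the major versions match, and
- the requested version is not newer than the installed one.

The C sketch below implements that rule only. The generated file also handles details such as version ranges, which are ignored here. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parses up to three dot-separated numbers ("1.7.4", "1.7", "2").
 * Missing components count as 0. Returns false if no number is found. */
static bool parse_version(const char *s, int v[3]) {
    v[0] = v[1] = v[2] = 0;
    return sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) >= 1;
}

/* Compatible when the major numbers match and `requested` is not newer
 * than `installed` (compared component by component). */
bool same_major_compatible(const char *installed, const char *requested) {
    int inst[3], req[3];
    if (!parse_version(installed, inst) || !parse_version(requested, req)) {
        return false;
    }
    if (inst[0] != req[0]) {
        return false;
    }
    for (int i = 1; i < 3; i++) {
        if (req[i] != inst[i]) {
            return req[i] < inst[i];
        }
    }
    return true; /* identical versions */
}
```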
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/whisper-config.cmake
${CMAKE_CURRENT_BINARY_DIR}/whisper-version.cmake
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/whisper)
configure_file(cmake/whisper.pc.in
"${CMAKE_CURRENT_BINARY_DIR}/whisper.pc"
@ONLY)
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/whisper.pc"
DESTINATION lib/pkgconfig)
#
# programs, examples and tests
#
if (WHISPER_BUILD_TESTS AND NOT CMAKE_JS_VERSION)
enable_testing()
include(CTest)
add_subdirectory(tests)
endif ()

Makefile

@@ -1,501 +1,12 @@
default: main bench quantize server
ifndef UNAME_S
UNAME_S := $(shell uname -s)
endif
ifndef UNAME_P
UNAME_P := $(shell uname -p)
endif
ifndef UNAME_M
UNAME_M := $(shell uname -m)
endif
ifndef NVCC_VERSION
ifeq ($(call,$(shell which nvcc))$(.SHELLSTATUS),0)
NVCC_VERSION := $(shell nvcc --version | egrep -o "V[0-9]+.[0-9]+.[0-9]+" | cut -c2-)
endif
endif
# In GNU make default CXX is g++ instead of c++. Let's fix that so that users
# of non-gcc compilers don't have to provide g++ alias or wrapper.
DEFCC := cc
DEFCXX := c++
ifeq ($(origin CC),default)
CC := $(DEFCC)
endif
ifeq ($(origin CXX),default)
CXX := $(DEFCXX)
endif
CCV := $(shell $(CC) --version | head -n 1)
CXXV := $(shell $(CXX) --version | head -n 1)
# Mac OS + Arm can report x86_64
# ref: https://github.com/ggerganov/whisper.cpp/issues/66#issuecomment-1282546789
ifeq ($(UNAME_S),Darwin)
ifneq ($(UNAME_P),arm)
SYSCTL_M := $(shell sysctl -n hw.optional.arm64)
ifeq ($(SYSCTL_M),1)
# UNAME_P := arm
# UNAME_M := arm64
warn := $(warning Your arch is announced as x86_64, but it seems to actually be ARM64. Not fixing that can lead to bad performance. For more info see: https://github.com/ggerganov/whisper.cpp/issues/66\#issuecomment-1282546789)
endif
endif
endif
#
# Compile flags
#
CFLAGS = -I. -O3 -DNDEBUG -std=c11 -fPIC
CXXFLAGS = -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
LDFLAGS =
ifdef MACOSX_DEPLOYMENT_TARGET
CFLAGS += -mmacosx-version-min=$(MACOSX_DEPLOYMENT_TARGET)
CXXFLAGS += -mmacosx-version-min=$(MACOSX_DEPLOYMENT_TARGET)
LDFLAGS += -mmacosx-version-min=$(MACOSX_DEPLOYMENT_TARGET)
endif
# clock_gettime came in POSIX.1b (1993)
# CLOCK_MONOTONIC came in POSIX.1-2001 / SUSv3 as optional
# posix_memalign came in POSIX.1-2001 / SUSv3
# M_PI is an XSI extension since POSIX.1-2001 / SUSv3, came in XPG1 (1985)
CFLAGS += -D_XOPEN_SOURCE=600
CXXFLAGS += -D_XOPEN_SOURCE=600
# Somehow in OpenBSD whenever POSIX conformance is specified
# some string functions rely on locale_t availability,
# which was introduced in POSIX.1-2008, forcing us to go higher
ifeq ($(UNAME_S),OpenBSD)
CFLAGS += -U_XOPEN_SOURCE -D_XOPEN_SOURCE=700
CXXFLAGS += -U_XOPEN_SOURCE -D_XOPEN_SOURCE=700
endif
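The comments above list the interfaces that `_XOPEN_SOURCE=600` is meant to expose. The C sketch below touches each of them and assumes a POSIX system.

The macro has to be defined before the first system header is included. The helper names are hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600 /* must precede every system header */
#endif
#include <math.h>     /* M_PI: XSI extension, not ISO C */
#include <stdint.h>
#include <stdlib.h>   /* posix_memalign */
#include <time.h>     /* clock_gettime, CLOCK_MONOTONIC */

/* Hypothetical helper: allocates `size` bytes aligned to `alignment`,
 * which must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release the buffer with free(). */
void *aligned_buffer(size_t alignment, size_t size) {
    void *p = NULL;
    return posix_memalign(&p, alignment, size) == 0 ? p : NULL;
}

/* Hypothetical helper: monotonic timestamp in nanoseconds, or -1 on failure. */
int64_t monotonic_ns(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return -1;
    }
    return (int64_t) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Uses M_PI, which <math.h> only defines when an X/Open or similar
 * feature macro is in effect. */
double circle_area(double r) {
    return M_PI * r * r;
}
```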
# Data types, macros and functions related to controlling CPU affinity
# are available on Linux through GNU extensions in libc
ifeq ($(UNAME_S),Linux)
CFLAGS += -D_GNU_SOURCE
CXXFLAGS += -D_GNU_SOURCE
endif
# RLIMIT_MEMLOCK came in BSD, is not specified in POSIX.1,
# and on macOS its availability depends on enabling Darwin extensions
# similarly on DragonFly, enabling BSD extensions is necessary
ifeq ($(UNAME_S),Darwin)
CFLAGS += -D_DARWIN_C_SOURCE
CXXFLAGS += -D_DARWIN_C_SOURCE
endif
ifeq ($(UNAME_S),DragonFly)
CFLAGS += -D__BSD_VISIBLE
CXXFLAGS += -D__BSD_VISIBLE
endif
# alloca is a non-standard interface that is not visible on BSDs when
# POSIX conformance is specified, but not all of them provide a clean way
# to enable it in such cases
ifeq ($(UNAME_S),FreeBSD)
CFLAGS += -D__BSD_VISIBLE
CXXFLAGS += -D__BSD_VISIBLE
endif
ifeq ($(UNAME_S),NetBSD)
CFLAGS += -D_NETBSD_SOURCE
CXXFLAGS += -D_NETBSD_SOURCE
endif
ifeq ($(UNAME_S),OpenBSD)
CFLAGS += -D_BSD_SOURCE
CXXFLAGS += -D_BSD_SOURCE
endif
# OS specific
# TODO: support Windows
ifeq ($(filter $(UNAME_S),Linux Darwin DragonFly FreeBSD NetBSD OpenBSD Haiku),$(UNAME_S))
CFLAGS += -pthread
CXXFLAGS += -pthread
endif
# detect Windows
ifneq ($(findstring _NT,$(UNAME_S)),)
_WIN32 := 1
endif
# Windows Sockets 2 (Winsock) for network-capable apps
ifeq ($(_WIN32),1)
LWINSOCK2 := -lws2_32
endif
# Architecture specific
# TODO: probably these flags need to be tweaked on some architectures
# feel free to update the Makefile for your architecture and send a pull request or issue
ifeq ($(UNAME_M),$(filter $(UNAME_M),x86_64 i686 amd64))
ifeq ($(UNAME_S),Darwin)
CPUINFO_CMD := sysctl machdep.cpu.features machdep.cpu.leaf7_features
else ifeq ($(UNAME_S),Linux)
CPUINFO_CMD := cat /proc/cpuinfo
else ifneq (,$(filter MINGW32_NT% MINGW64_NT% MSYS_NT%,$(UNAME_S)))
CPUINFO_CMD := cat /proc/cpuinfo
else ifneq (,$(filter DragonFly FreeBSD,$(UNAME_S)))
CPUINFO_CMD := grep Features /var/run/dmesg.boot
else ifeq ($(UNAME_S),Haiku)
CPUINFO_CMD := sysinfo -cpu
endif
# x86 ISA extensions (chronological order)
ifdef CPUINFO_CMD
SSE3_M := $(shell $(CPUINFO_CMD) | grep -iwE 'PNI|SSE3')
SSSE3_M := $(shell $(CPUINFO_CMD) | grep -iw 'SSSE3')
AVX_M := $(shell $(CPUINFO_CMD) | grep -iwE 'AVX|AVX1.0')
F16C_M := $(shell $(CPUINFO_CMD) | grep -iw 'F16C')
FMA_M := $(shell $(CPUINFO_CMD) | grep -iw 'FMA')
AVX2_M := $(shell $(CPUINFO_CMD) | grep -iw 'AVX2')
AVX512F_M := $(shell $(CPUINFO_CMD) | grep -iw 'AVX512F')
AVX512VBMI_M := $(shell $(CPUINFO_CMD) | grep -iw 'AVX512VBMI')
AVX512VNNI_M := $(shell $(CPUINFO_CMD) | grep -iwE 'AVX512_VNNI|AVX512VNNI')
# AVX-512 has many subsets, so let's make it easy to disable them all
ifneq ($(filter-out 0,$(WHISPER_NO_AVX512)),)
AVX512F_M :=
AVX512VBMI_M :=
AVX512VNNI_M :=
endif
ifneq (,$(SSE3_M))
CFLAGS += -msse3
CXXFLAGS += -msse3
endif
ifneq (,$(SSSE3_M))
CFLAGS += -mssse3
CXXFLAGS += -mssse3
endif
ifneq (,$(AVX_M))
CFLAGS += -mavx
CXXFLAGS += -mavx
endif
ifneq (,$(F16C_M))
CFLAGS += -mf16c
CXXFLAGS += -mf16c
endif
ifneq (,$(FMA_M))
CFLAGS += -mfma
CXXFLAGS += -mfma
endif
ifneq (,$(AVX2_M))
CFLAGS += -mavx2
CXXFLAGS += -mavx2
endif
ifneq (,$(AVX512F_M))
CFLAGS += -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw
CXXFLAGS += -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw
endif
ifneq (,$(AVX512VBMI_M))
CFLAGS += -mavx512vbmi
CXXFLAGS += -mavx512vbmi
endif
ifneq (,$(AVX512VNNI_M))
CFLAGS += -mavx512vnni
CXXFLAGS += -mavx512vnni
endif
endif
endif
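Each `*_M` variable above is non-empty only if `grep -iw` finds the flag as a whole word, ignoring case, in the CPU feature listing. On Linux, SSE3 is reported as `pni`, which is why that test uses the `PNI|SSE3` alternation.

The C function below is a rough equivalent for a single flag, offered as an illustrative sketch. `has_cpu_flag` is a hypothetical name, and it does not handle alternations.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Word constituents as grep -w treats them: letters, digits, underscore. */
static bool is_word_char(char c) {
    return isalnum((unsigned char) c) || c == '_';
}

/* Case-insensitive comparison of the first n characters of s and prefix. */
static bool prefix_equal_ci(const char *s, const char *prefix, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '\0' ||
            tolower((unsigned char) s[i]) != tolower((unsigned char) prefix[i])) {
            return false;
        }
    }
    return true;
}

/* True if `flag` appears in `line` as a whole word, ignoring case. */
bool has_cpu_flag(const char *line, const char *flag) {
    size_t n = strlen(flag);
    if (n == 0) {
        return false;
    }
    for (const char *p = line; *p != '\0'; p++) {
        bool starts_word = (p == line) || !is_word_char(p[-1]);
        /* p[n] is only read after the prefix matched, so it is in bounds */
        if (starts_word && prefix_equal_ci(p, flag, n) && !is_word_char(p[n])) {
            return true;
        }
    }
    return false;
}
```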
ifneq ($(filter ppc64%,$(UNAME_M)),)
POWER9_M := $(shell grep "POWER9" /proc/cpuinfo)
ifneq (,$(findstring POWER9,$(POWER9_M)))
CFLAGS += -mpower9-vector
endif
# Require c++23's std::byteswap for big-endian support.
ifeq ($(UNAME_M),ppc64)
CXXFLAGS += -std=c++23 -DGGML_BIG_ENDIAN
endif
endif
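On big-endian `ppc64`, the Makefile defines `GGML_BIG_ENDIAN` and requires C++23 for `std::byteswap`. The C sketch below shows the same idea: a value stored little-endian is byte-swapped only when the host is big-endian.

`bswap32`, `host_is_big_endian` and `load_le32` are hypothetical helpers, not ggml functions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reverses the byte order of a 32-bit value (C analogue of std::byteswap). */
uint32_t bswap32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}

/* Runtime endianness probe: checks which byte of a known value comes first. */
bool host_is_big_endian(void) {
    const uint32_t probe = 0x01020304u;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Reads a uint32 stored little-endian in buf, swapping only on big-endian hosts. */
uint32_t load_le32(const unsigned char *buf) {
    uint32_t v;
    memcpy(&v, buf, sizeof v);
    return host_is_big_endian() ? bswap32(v) : v;
}
```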
ifndef WHISPER_NO_ACCELERATE
# Mac M1 - include Accelerate framework
ifeq ($(UNAME_S),Darwin)
CFLAGS += -DGGML_USE_ACCELERATE
CFLAGS += -DACCELERATE_NEW_LAPACK
CFLAGS += -DACCELERATE_LAPACK_ILP64
LDFLAGS += -framework Accelerate
endif
endif
ifdef WHISPER_COREML
CXXFLAGS += -DWHISPER_USE_COREML
LDFLAGS += -framework Foundation -framework CoreML
ifdef WHISPER_COREML_ALLOW_FALLBACK
CXXFLAGS += -DWHISPER_COREML_ALLOW_FALLBACK
endif
endif
ifndef WHISPER_NO_METAL
ifeq ($(UNAME_S),Darwin)
WHISPER_METAL := 1
CFLAGS += -DGGML_USE_METAL
CXXFLAGS += -DGGML_USE_METAL
LDFLAGS += -framework Foundation -framework Metal -framework MetalKit
endif
endif
ifneq ($(filter-out 0,$(WHISPER_OPENBLAS)),) # OpenBLAS
WHISPER_OPENBLAS_INTERFACE64 ?= 0 # use 32-bit interface by default
ifneq ($(filter-out 0,$(WHISPER_OPENBLAS_INTERFACE64)),)
WHISPER_BLAS_LIB := openblas64
else
WHISPER_BLAS_LIB := openblas
endif
ifneq ($(OPENBLAS_PATH),)
WHISPER_BLAS_CFLAGS := -I$(OPENBLAS_PATH)/include
WHISPER_BLAS_LDFLAGS := -L$(OPENBLAS_PATH)/lib -l$(WHISPER_BLAS_LIB)
else
WHISPER_BLAS_LIB_PC_EXISTS := $(shell pkg-config --exists $(WHISPER_BLAS_LIB) && echo 1)
ifneq ($(filter-out 0,$(WHISPER_BLAS_LIB_PC_EXISTS)),)
WHISPER_BLAS_CFLAGS := $(shell pkg-config --cflags $(WHISPER_BLAS_LIB))
WHISPER_BLAS_LDFLAGS := $(shell pkg-config --libs $(WHISPER_BLAS_LIB))
else
WHISPER_BLAS_CFLAGS := -I/usr/include/openblas
WHISPER_BLAS_LDFLAGS := -l$(WHISPER_BLAS_LIB)
endif
endif
CFLAGS += $(WHISPER_BLAS_CFLAGS) -DGGML_USE_OPENBLAS
LDFLAGS += $(WHISPER_BLAS_LDFLAGS)
endif
ifdef WHISPER_CUBLAS
# WHISPER_CUBLAS is deprecated and will be removed in the future
WHISPER_CUDA := 1
endif
ifdef WHISPER_CUDA
ifeq ($(shell expr $(NVCC_VERSION) \>= 11.6), 1)
CUDA_ARCH_FLAG ?= native
else
CUDA_ARCH_FLAG ?= all
endif
CFLAGS += -DGGML_USE_CUDA -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/$(UNAME_M)-linux/include
CXXFLAGS += -DGGML_USE_CUDA -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/$(UNAME_M)-linux/include
LDFLAGS += -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/$(UNAME_M)-linux/lib -L/usr/lib/wsl/lib
WHISPER_OBJ += ggml-cuda.o
WHISPER_OBJ += $(patsubst %.cu,%.o,$(wildcard ggml-cuda/*.cu))
NVCC = nvcc
NVCCFLAGS = --forward-unknown-to-host-compiler -arch=$(CUDA_ARCH_FLAG)
ggml-cuda/%.o: ggml-cuda/%.cu ggml-cuda/%.cuh ggml.h ggml-common.h ggml-cuda/common.cuh
$(NVCC) $(NVCCFLAGS) $(CXXFLAGS) -c $< -o $@
ggml-cuda.o: ggml-cuda.cu ggml-cuda.h ggml.h ggml-backend.h ggml-backend-impl.h ggml-common.h $(wildcard ggml-cuda/*.cuh)
$(NVCC) $(NVCCFLAGS) $(CXXFLAGS) -Wno-pedantic -c $< -o $@
endif
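The CUDA branch above selects `-arch=native` only when `nvcc` reports version 11.6 or newer. Values such as `11.8.89` are not integers, so POSIX `expr` compares them as strings. Under string comparison, `9.2` sorts after `11.6` and would wrongly pass the check.

A component-wise numeric comparison avoids that, as in the minimal sketch below. `version_at_least` is a hypothetical helper, not part of the build.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Reads the next dot-separated number and advances *s past it.
 * At the end of the string (or on a non-digit) it returns 0 without advancing. */
static long next_component(const char **s) {
    char *end;
    long v = strtol(*s, &end, 10);
    if (end == *s) {
        return 0;
    }
    *s = (*end == '.') ? end + 1 : end;
    return v;
}

/* True if dotted version `have` (e.g. "11.8.89") is >= `want` (e.g. "11.6"),
 * comparing up to four components numerically from left to right. */
bool version_at_least(const char *have, const char *want) {
    for (int i = 0; i < 4; i++) {
        long h = next_component(&have);
        long w = next_component(&want);
        if (h != w) {
            return h > w;
        }
    }
    return true;
}
```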
ifdef WHISPER_HIPBLAS
ROCM_PATH ?= /opt/rocm
HIPCC ?= $(ROCM_PATH)/bin/hipcc
GPU_TARGETS ?= $(shell $(ROCM_PATH)/llvm/bin/amdgpu-arch)
CFLAGS += -DGGML_USE_HIPBLAS -DGGML_USE_CUDA
CXXFLAGS += -DGGML_USE_HIPBLAS -DGGML_USE_CUDA
LDFLAGS += -L$(ROCM_PATH)/lib -Wl,-rpath=$(ROCM_PATH)/lib
LDFLAGS += -lhipblas -lamdhip64 -lrocblas
HIPFLAGS += $(addprefix --offload-arch=,$(GPU_TARGETS))
WHISPER_OBJ += ggml-cuda.o
WHISPER_OBJ += $(patsubst %.cu,%.o,$(wildcard ggml-cuda/*.cu))
ggml-cuda/%.o: ggml-cuda/%.cu ggml-cuda/%.cuh ggml.h ggml-common.h ggml-cuda/common.cuh
$(HIPCC) $(CXXFLAGS) $(HIPFLAGS) -x hip -c -o $@ $<
ggml-cuda.o: ggml-cuda.cu ggml-cuda.h ggml.h ggml-backend.h ggml-backend-impl.h ggml-common.h $(wildcard ggml-cuda/*.cuh)
$(HIPCC) $(CXXFLAGS) $(HIPFLAGS) -x hip -c -o $@ $<
endif
ifdef WHISPER_CLBLAST
CFLAGS += -DGGML_USE_CLBLAST
CXXFLAGS += -DGGML_USE_CLBLAST
LDFLAGS += -lclblast
ifeq ($(UNAME_S),Darwin)
LDFLAGS += -framework OpenCL
else
LDFLAGS += -lOpenCL
endif
WHISPER_OBJ += ggml-opencl.o
ggml-opencl.o: ggml-opencl.cpp ggml-opencl.h
$(CXX) $(CXXFLAGS) -c $< -o $@
endif
ifdef WHISPER_GPROF
CFLAGS += -pg
CXXFLAGS += -pg
endif
ifneq ($(filter aarch64%,$(UNAME_M)),)
CFLAGS += -mcpu=native
CXXFLAGS += -mcpu=native
endif
ifneq ($(filter armv6%,$(UNAME_M)),)
# 32-bit Raspberry Pi 1, 2, 3
CFLAGS += -mfpu=neon -mfp16-format=ieee -mno-unaligned-access
endif
ifneq ($(filter armv7%,$(UNAME_M)),)
# 32-bit ARM, for example on Armbian or possibly raspbian
#CFLAGS += -mfpu=neon -mfp16-format=ieee -funsafe-math-optimizations -mno-unaligned-access
#CXXFLAGS += -mfpu=neon -mfp16-format=ieee -funsafe-math-optimizations -mno-unaligned-access
# 64-bit ARM on 32-bit OS, use these (TODO: auto-detect 64-bit)
CFLAGS += -mfpu=neon-fp-armv8 -mfp16-format=ieee -funsafe-math-optimizations -mno-unaligned-access
CXXFLAGS += -mfpu=neon-fp-armv8 -mfp16-format=ieee -funsafe-math-optimizations -mno-unaligned-access
endif
ifneq ($(filter armv8%,$(UNAME_M)),)
# Raspberry Pi 4
CFLAGS += -mfpu=neon-fp-armv8 -mfp16-format=ieee -funsafe-math-optimizations -mno-unaligned-access
CXXFLAGS += -mfpu=neon-fp-armv8 -mfp16-format=ieee -funsafe-math-optimizations -mno-unaligned-access
endif
#
# Print build information
#
$(info I whisper.cpp build info: )
$(info I UNAME_S: $(UNAME_S))
$(info I UNAME_P: $(UNAME_P))
$(info I UNAME_M: $(UNAME_M))
$(info I CFLAGS: $(CFLAGS))
$(info I CXXFLAGS: $(CXXFLAGS))
$(info I LDFLAGS: $(LDFLAGS))
$(info I CC: $(CCV))
$(info I CXX: $(CXXV))
$(info )
ifdef WHISPER_CUBLAS
$(info !!!!)
$(info WHISPER_CUBLAS is deprecated and will be removed in the future. Use WHISPER_CUDA instead.)
$(info !!!!)
$(info )
endif
#
# Build library
#
ggml.o: ggml.c ggml.h ggml-cuda.h
$(CC) $(CFLAGS) -c $< -o $@
ggml-alloc.o: ggml-alloc.c ggml.h ggml-alloc.h
$(CC) $(CFLAGS) -c $< -o $@
ggml-backend.o: ggml-backend.c ggml.h ggml-backend.h
$(CC) $(CFLAGS) -c $< -o $@
ggml-quants.o: ggml-quants.c ggml.h ggml-quants.h
$(CC) $(CFLAGS) -c $< -o $@
WHISPER_OBJ += ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o
whisper.o: whisper.cpp whisper.h ggml.h ggml-cuda.h
$(CXX) $(CXXFLAGS) -c $< -o $@
ifndef WHISPER_COREML
WHISPER_OBJ += whisper.o
else
whisper-encoder.o: coreml/whisper-encoder.mm coreml/whisper-encoder.h
$(CXX) -O3 -I . -fobjc-arc -c coreml/whisper-encoder.mm -o whisper-encoder.o
whisper-encoder-impl.o: coreml/whisper-encoder-impl.m coreml/whisper-encoder-impl.h
$(CXX) -O3 -I . -fobjc-arc -c coreml/whisper-encoder-impl.m -o whisper-encoder-impl.o
WHISPER_OBJ += whisper.o whisper-encoder.o whisper-encoder-impl.o
endif
ifdef WHISPER_METAL
ggml-metal.o: ggml-metal.m ggml-metal.h
$(CC) $(CFLAGS) -c $< -o $@
WHISPER_OBJ += ggml-metal.o
ifdef WHISPER_METAL_EMBED_LIBRARY
CFLAGS += -DGGML_METAL_EMBED_LIBRARY
ggml-metal-embed.o: ggml-metal.metal ggml-common.h
@echo "Embedding Metal library"
$(eval TEMP_ASSEMBLY=$(shell mktemp))
$(eval TEMP_METALLIB=$(shell mktemp))
@sed "/^#include \"ggml-common.h\"/{r ggml-common.h"$$'\n'"d;}" ggml-metal.metal > $(TEMP_METALLIB)
@echo ".section __DATA, __ggml_metallib" > $(TEMP_ASSEMBLY)
@echo ".globl _ggml_metallib_start" >> $(TEMP_ASSEMBLY)
@echo "_ggml_metallib_start:" >> $(TEMP_ASSEMBLY)
@echo ".incbin \"$(TEMP_METALLIB)\"" >> $(TEMP_ASSEMBLY)
@echo ".globl _ggml_metallib_end" >> $(TEMP_ASSEMBLY)
@echo "_ggml_metallib_end:" >> $(TEMP_ASSEMBLY)
@$(AS) $(TEMP_ASSEMBLY) -o $@
@rm -f $(TEMP_ASSEMBLY) $(TEMP_METALLIB)
WHISPER_OBJ += ggml-metal-embed.o
endif
endif
libwhisper.a: $(WHISPER_OBJ)
$(AR) rcs libwhisper.a $(WHISPER_OBJ)
libwhisper.so: $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) -shared -o libwhisper.so $(WHISPER_OBJ) $(LDFLAGS)
clean:
rm -f *.o main stream command talk talk-llama bench quantize server lsp libwhisper.a libwhisper.so
#
# Examples
#
CC_SDL=`sdl2-config --cflags --libs`
SRC_COMMON = examples/common.cpp examples/common-ggml.cpp examples/grammar-parser.cpp
SRC_COMMON_SDL = examples/common-sdl.cpp
main: examples/main/main.cpp $(SRC_COMMON) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/main/main.cpp $(SRC_COMMON) $(WHISPER_OBJ) -o main $(LDFLAGS)
./main -h
bench: examples/bench/bench.cpp $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/bench/bench.cpp $(WHISPER_OBJ) -o bench $(LDFLAGS)
quantize: examples/quantize/quantize.cpp $(WHISPER_OBJ) $(SRC_COMMON)
$(CXX) $(CXXFLAGS) examples/quantize/quantize.cpp $(SRC_COMMON) $(WHISPER_OBJ) -o quantize $(LDFLAGS)
server: examples/server/server.cpp $(SRC_COMMON) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/server/server.cpp $(SRC_COMMON) $(WHISPER_OBJ) -o server $(LDFLAGS) $(LWINSOCK2)
stream: examples/stream/stream.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/stream/stream.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ) -o stream $(CC_SDL) $(LDFLAGS)
command: examples/command/command.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/command/command.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ) -o command $(CC_SDL) $(LDFLAGS)
lsp: examples/lsp/lsp.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/lsp/lsp.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ) -o lsp $(CC_SDL) $(LDFLAGS)
talk: examples/talk/talk.cpp examples/talk/gpt-2.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/talk/talk.cpp examples/talk/gpt-2.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ) -o talk $(CC_SDL) $(LDFLAGS)
talk-llama: examples/talk-llama/talk-llama.cpp examples/talk-llama/llama.cpp examples/talk-llama/unicode.cpp examples/talk-llama/unicode-data.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/talk-llama/talk-llama.cpp examples/talk-llama/llama.cpp examples/talk-llama/unicode.cpp examples/talk-llama/unicode-data.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ) -o talk-llama $(CC_SDL) $(LDFLAGS)
#
# Audio samples
#
.PHONY: build
build:
cmake -B build
cmake --build build --config Release
# download a few audio samples into folder "./samples":
.PHONY: samples
samples:
@@ -507,17 +18,6 @@ samples:
@wget --quiet --show-progress -O samples/mm1.wav https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav
@wget --quiet --show-progress -O samples/a13.mp3 https://upload.wikimedia.org/wikipedia/commons/transcoded/6/6f/Apollo13-wehaveaproblem.ogg/Apollo13-wehaveaproblem.ogg.mp3
@wget --quiet --show-progress -O samples/diffusion2023-07-03.flac https://archive.org/download/diffusion2023-07-03/diffusion2023-07-03.flac
@echo "Converting to 16-bit WAV ..."
@ffmpeg -loglevel -0 -y -i samples/gb0.ogg -ar 16000 -ac 1 -c:a pcm_s16le samples/gb0.wav
@ffmpeg -loglevel -0 -y -i samples/gb1.ogg -ar 16000 -ac 1 -c:a pcm_s16le samples/gb1.wav
@ffmpeg -loglevel -0 -y -i samples/hp0.ogg -ar 16000 -ac 1 -c:a pcm_s16le samples/hp0.wav
@rm samples/*.ogg
@ffmpeg -loglevel -0 -y -i samples/mm1.wav -ar 16000 -ac 1 -c:a pcm_s16le samples/mm0.wav
@rm samples/mm1.wav
@ffmpeg -loglevel -0 -y -i samples/a13.mp3 -ar 16000 -ac 1 -c:a pcm_s16le -ss 00:00:00 -to 00:00:30 samples/a13.wav
@rm samples/a13.mp3
@ffmpeg -loglevel -0 -y -i samples/diffusion2023-07-03.flac -ar 16000 -ac 1 -c:a pcm_s16le samples/diffusion2023-07-03.wav
@rm samples/diffusion2023-07-03.flac
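The `samples` target converts every download to 16 kHz, mono, 16-bit PCM WAV (`-ar 16000 -ac 1 -c:a pcm_s16le`). The C sketch below checks that format by reading the fields of a RIFF/WAVE header.

It assumes the `fmt ` chunk starts at byte 12, right after `RIFF<size>WAVE`. That layout is common but not guaranteed for every WAV writer. `is_whisper_ready_wav` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian field readers (WAV headers are little-endian). */
static uint16_t le16(const unsigned char *p) {
    return (uint16_t) (p[0] | (p[1] << 8));
}
static uint32_t le32(const unsigned char *p) {
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* True if hdr describes integer PCM, 1 channel, 16000 Hz, 16 bits per sample. */
bool is_whisper_ready_wav(const unsigned char *hdr, size_t len) {
    if (len < 36 || memcmp(hdr, "RIFF", 4) != 0 ||
        memcmp(hdr + 8, "WAVE", 4) != 0 || memcmp(hdr + 12, "fmt ", 4) != 0) {
        return false;
    }
    return le16(hdr + 20) == 1 &&      /* audio format: 1 = integer PCM */
           le16(hdr + 22) == 1 &&      /* channels: mono */
           le32(hdr + 24) == 16000 &&  /* sample rate in Hz */
           le16(hdr + 34) == 16;       /* bits per sample */
}
```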
#
# Models
@@ -537,27 +37,22 @@ samples:
.PHONY: large-v1
.PHONY: large-v2
.PHONY: large-v3
.PHONY: large-v3-turbo
tiny.en tiny base.en base small.en small medium.en medium large-v1 large-v2 large-v3: main
tiny.en tiny base.en base small.en small medium.en medium large-v1 large-v2 large-v3 large-v3-turbo:
bash ./models/download-ggml-model.sh $@
cmake -B build
cmake --build build --config Release
@echo ""
@echo "==============================================="
@echo "Running $@ on all samples in ./samples ..."
@echo "==============================================="
@echo ""
@for f in samples/*.wav; do \
@for f in samples/*$(.flac .mp3 .ogg .wav); do \
echo "----------------------------------------------" ; \
echo "[+] Running $@ on $$f ... (run 'ffplay $$f' to listen)" ; \
echo "----------------------------------------------" ; \
echo "" ; \
./main -m models/ggml-$@.bin -f $$f ; \
./build/bin/whisper-cli -m models/ggml-$@.bin -f $$f ; \
echo "" ; \
done
#
# Tests
#
.PHONY: tests
tests:
bash ./tests/run-tests.sh $(word 2, $(MAKECMDGOALS))


@@ -1,61 +0,0 @@
// swift-tools-version:5.5
import PackageDescription
let package = Package(
name: "whisper",
platforms: [
.macOS(.v12),
.iOS(.v14),
.watchOS(.v4),
.tvOS(.v14)
],
products: [
.library(name: "whisper", targets: ["whisper"]),
],
targets: [
.target(
name: "whisper",
path: ".",
exclude: [
"bindings",
"cmake",
"coreml",
"examples",
"extra",
"models",
"samples",
"tests",
"CMakeLists.txt",
"ggml-cuda.cu",
"ggml-cuda.h",
"Makefile"
],
sources: [
"ggml.c",
"whisper.cpp",
"ggml-alloc.c",
"ggml-backend.c",
"ggml-quants.c",
"ggml-metal.m"
],
resources: [.process("ggml-metal.metal")],
publicHeadersPath: "spm-headers",
cSettings: [
.unsafeFlags(["-Wno-shorten-64-to-32", "-O3", "-DNDEBUG"]),
.define("GGML_USE_ACCELERATE"),
.unsafeFlags(["-fno-objc-arc"]),
.define("GGML_USE_METAL")
// NOTE: NEW_LAPACK will require iOS version 16.4+
// We should consider adding this in the future when we drop support for iOS 14
// (ref: https://developer.apple.com/documentation/accelerate/1513264-cblas_sgemm?language=objc)
// .define("ACCELERATE_NEW_LAPACK"),
// .define("ACCELERATE_LAPACK_ILP64")
],
linkerSettings: [
.linkedFramework("Accelerate")
]
)
],
cxxLanguageStandard: .cxx11
)

README.md

@@ -4,24 +4,29 @@
[![Actions Status](https://github.com/ggerganov/whisper.cpp/workflows/CI/badge.svg)](https://github.com/ggerganov/whisper.cpp/actions)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Conan Center](https://shields.io/conan/v/whisper-cpp)](https://conan.io/center/whisper-cpp)
[![npm](https://img.shields.io/npm/v/whisper.cpp.svg)](https://www.npmjs.com/package/whisper.cpp/)
Stable: [v1.6.2](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.6.2) / [Roadmap | F.A.Q.](https://github.com/ggerganov/whisper.cpp/discussions/126)
> [!NOTE]
> New maintenance roadmap: https://github.com/ggerganov/whisper.cpp/discussions/2788
Stable: [v1.7.4](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.7.4) / [Roadmap | F.A.Q.](https://github.com/ggerganov/whisper.cpp/discussions/126)
High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper) automatic speech recognition (ASR) model:
- Plain C/C++ implementation without dependencies
- Apple Silicon first-class citizen - optimized via ARM NEON, Accelerate framework, Metal and [Core ML](https://github.com/ggerganov/whisper.cpp#core-ml-support)
- Apple Silicon first-class citizen - optimized via ARM NEON, Accelerate framework, Metal and [Core ML](#core-ml-support)
- AVX intrinsics support for x86 architectures
- VSX intrinsics support for POWER architectures
- [VSX intrinsics support for POWER architectures](#power-vsx-intrinsics)
- Mixed F16 / F32 precision
- [4-bit and 5-bit integer quantization support](https://github.com/ggerganov/whisper.cpp#quantization)
- [Integer quantization support](#quantization)
- Zero memory allocations at runtime
- [Vulkan support](#vulkan-gpu-support)
- Support for CPU-only inference
- [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
- [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
- [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
- [Efficient GPU support for NVIDIA](#nvidia-gpu-support)
- [OpenVINO Support](#openvino-support)
- [Ascend NPU Support](#ascend-npu-support)
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/include/whisper.h)
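Mixed F16 / F32 precision means tensors can be stored as IEEE 754 half-precision values and widened to single precision for arithmetic. ggml has its own conversion routines, which may use hardware instructions.

The portable bit-level conversion below is only an illustrative sketch. `half_to_float` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Converts an IEEE 754 binary16 bit pattern to a float.
 * Handles signed zeros, subnormals, infinities and NaNs. */
float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1F) {                 /* Inf or NaN: all-ones exponent, keep payload */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {             /* normal: rebias exponent from 15 to 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {            /* signed zero */
        bits = sign;
    } else {                           /* subnormal: shift until the implicit 1 appears */
        uint32_t e = 113;              /* float biased exponent for 2^-14 */
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((mant & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```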
Supported platforms:
@@ -33,9 +38,9 @@ Supported platforms:
- [x] [WebAssembly](examples/whisper.wasm)
- [x] Windows ([MSVC](https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggerganov/whisper.cpp/issues/168))
- [x] [Raspberry Pi](https://github.com/ggerganov/whisper.cpp/discussions/166)
- [x] [docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)
- [x] [Docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)
The entire high-level implementation of the model is contained in [whisper.h](whisper.h) and [whisper.cpp](whisper.cpp).
The entire high-level implementation of the model is contained in [whisper.h](include/whisper.h) and [whisper.cpp](src/whisper.cpp).
The rest of the code is part of the [`ggml`](https://github.com/ggerganov/ggml) machine learning library.
Having such a lightweight implementation of the model makes it easy to integrate into different platforms and applications.
@@ -51,18 +56,6 @@ On Apple Silicon, the inference runs fully on the GPU via Metal:
https://github.com/ggerganov/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225
Or you can even run it straight in the browser: [talk.wasm](examples/talk.wasm)
## Implementation details
- The core tensor operations are implemented in C ([ggml.h](ggml.h) / [ggml.c](ggml.c))
- The transformer model and the high-level C-style API are implemented in C++ ([whisper.h](whisper.h) / [whisper.cpp](whisper.cpp))
- Sample usage is demonstrated in [main.cpp](examples/main)
- Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
- Various other examples are available in the [examples](examples) folder
The tensor operators are optimized heavily for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
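The BLAS routines referred to above (`sgemm`, single-precision general matrix multiply) compute C = αAB + βC. The naive row-major version below only shows what that operation computes. It has no SIMD, no blocking, and no transposes, and it is not ggml's code.

`sgemm_ref` is a hypothetical name.

```c
#include <stddef.h>

/* Naive row-major single-precision GEMM: C = alpha * A * B + beta * C.
 * A is M x K, B is K x N, C is M x N, all densely packed. */
void sgemm_ref(size_t M, size_t N, size_t K, float alpha,
               const float *A, const float *B, float beta, float *C) {
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; k++) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

Optimized BLAS implementations compute the same result but reorder the loops and tile the data for caches and vector units. That is why they pay off mainly for larger matrices.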
## Quick start
First clone the repository:
@@ -71,140 +64,38 @@ First clone the repository:
git clone https://github.com/ggerganov/whisper.cpp.git
```
Navigate into the directory:
```
cd whisper.cpp
```
Then, download one of the Whisper [models](models/README.md) converted in [`ggml` format](#ggml-format). For example:
```bash
bash ./models/download-ggml-model.sh base.en
sh ./models/download-ggml-model.sh base.en
```
Now build the [main](examples/main) example and transcribe an audio file like this:
Now build the [whisper-cli](examples/cli) example and transcribe an audio file like this:
```bash
# build the main example
make
# build the project
cmake -B build
cmake --build build --config Release
# transcribe an audio file
./main -f samples/jfk.wav
./build/bin/whisper-cli -f samples/jfk.wav
```
---
For a quick demo, simply run `make base.en`:
```text
$ make base.en
cc -I. -O3 -std=c11 -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -std=c++11 -pthread -c whisper.cpp -o whisper.o
c++ -I. -I./examples -O3 -std=c++11 -pthread examples/main/main.cpp whisper.o ggml.o -o main -framework Accelerate
./main -h
usage: ./main [options] file0.wav file1.wav ...
options:
-h, --help [default] show this help message and exit
-t N, --threads N [4 ] number of threads to use during computation
-p N, --processors N [1 ] number of processors to use during computation
-ot N, --offset-t N [0 ] time offset in milliseconds
-on N, --offset-n N [0 ] segment index offset
-d N, --duration N [0 ] duration of audio to process in milliseconds
-mc N, --max-context N [-1 ] maximum number of text context tokens to store
-ml N, --max-len N [0 ] maximum segment length in characters
-sow, --split-on-word [false ] split on word rather than on token
-bo N, --best-of N [5 ] number of best candidates to keep
-bs N, --beam-size N [5 ] beam size for beam search
-wt N, --word-thold N [0.01 ] word timestamp probability threshold
-et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
-lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
-debug, --debug-mode [false ] enable debug mode (eg. dump log_mel)
-tr, --translate [false ] translate from source language to english
-di, --diarize [false ] stereo audio diarization
-tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
-nf, --no-fallback [false ] do not use temperature fallback while decoding
-otxt, --output-txt [false ] output result in a text file
-ovtt, --output-vtt [false ] output result in a vtt file
-osrt, --output-srt [false ] output result in a srt file
-olrc, --output-lrc [false ] output result in a lrc file
-owts, --output-words [false ] output script for generating karaoke video
-fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
-ocsv, --output-csv [false ] output result in a CSV file
-oj, --output-json [false ] output result in a JSON file
-ojf, --output-json-full [false ] include more information in the JSON file
-of FNAME, --output-file FNAME [ ] output file path (without file extension)
-ps, --print-special [false ] print special tokens
-pc, --print-colors [false ] print colors
-pp, --print-progress [false ] print progress
-nt, --no-timestamps [false ] do not print timestamps
-l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
-dl, --detect-language [false ] exit after automatically detecting language
--prompt PROMPT [ ] initial prompt
-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
-f FNAME, --file FNAME [ ] input WAV file path
-oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
-ls, --log-score [false ] log best decoder scores of tokens
-ng, --no-gpu [false ] disable GPU
bash ./models/download-ggml-model.sh base.en
Downloading ggml model base.en ...
ggml-base.en.bin 100%[========================>] 141.11M 6.34MB/s in 24s
Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
You can now use it like this:
$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
===============================================
Running base.en on all samples in ./samples ...
===============================================
----------------------------------------------
[+] Running base.en on samples/jfk.wav ... (run 'ffplay samples/jfk.wav' to listen)
----------------------------------------------
whisper_init_from_file: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 2
whisper_model_load: mem required = 215.00 MB (+ 6.00 MB per decoder)
whisper_model_load: kv self size = 5.25 MB
whisper_model_load: kv cross size = 17.58 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 140.60 MB
whisper_model_load: model size = 140.54 MB
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: load time = 113.81 ms
whisper_print_timings: mel time = 15.40 ms
whisper_print_timings: sample time = 11.58 ms / 27 runs ( 0.43 ms per run)
whisper_print_timings: encode time = 266.60 ms / 1 runs ( 266.60 ms per run)
whisper_print_timings: decode time = 66.11 ms / 27 runs ( 2.45 ms per run)
whisper_print_timings: total time = 476.31 ms
```
For a quick demo, simply run `make base.en`.
The command downloads the `base.en` model converted to custom `ggml` format and runs the inference on all `.wav` samples in the folder `samples`.
For detailed usage instructions, run: `./main -h`
For detailed usage instructions, run: `./build/bin/whisper-cli -h`
Note that the [main](examples/main) example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool.
Note that the [whisper-cli](examples/cli) example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool.
For example, you can use `ffmpeg` like this:
```bash
@@ -216,7 +107,7 @@ ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
If you want some extra audio samples to play with, simply run:
```
make samples
make -j samples
```
This will download a few more audio files from Wikipedia and convert them to 16-bit WAV format via `ffmpeg`.
@@ -224,17 +115,18 @@ This will download a few more audio files from Wikipedia and convert them to 16-
You can download and run the other models as follows:
```
make tiny.en
make tiny
make base.en
make base
make small.en
make small
make medium.en
make medium
make large-v1
make large-v2
make large-v3
make -j tiny.en
make -j tiny
make -j base.en
make -j base
make -j small.en
make -j small
make -j medium.en
make -j medium
make -j large-v1
make -j large-v2
make -j large-v3
make -j large-v3-turbo
```
## Memory usage
@@ -247,6 +139,20 @@ make large-v3
| medium | 1.5 GiB | ~2.1 GB |
| large | 2.9 GiB | ~3.9 GB |
## POWER VSX Intrinsics
`whisper.cpp` supports POWER architectures and includes code which
significantly speeds up operation on Linux running on POWER9/10, making it
capable of faster-than-realtime transcription on underclocked Raptor
Talos II. Ensure you have a BLAS package installed, and replace the
standard cmake setup with:
```bash
# build with GGML_BLAS defined
cmake -B build -DGGML_BLAS=1
cmake --build build --config Release
./build/bin/whisper-cli [ .. etc .. ]
```
## Quantization
`whisper.cpp` supports integer quantization of the Whisper `ggml` models.
@@ -256,11 +162,12 @@ Here are the steps for creating and using a quantized model:
```bash
# quantize a model with Q5_0 method
make quantize
./quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0
cmake -B build
cmake --build build --config Release
./build/bin/quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0
# run the examples as usual, specifying the quantized model file
./main -m models/ggml-base.en-q5_0.bin ./samples/gb0.wav
./build/bin/whisper-cli -m models/ggml-base.en-q5_0.bin ./samples/gb0.wav
```
## Core ML support
@@ -277,11 +184,11 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
```
- To ensure `coremltools` operates correctly, please confirm that [Xcode](https://developer.apple.com/xcode/) is installed and execute `xcode-select --install` to install the command-line tools.
- Python 3.10 is recommended.
- Python 3.11 is recommended.
- macOS Sonoma (version 14) or newer is recommended, as older versions of macOS might experience issues with transcription hallucination.
- [OPTIONAL] It is recommended to utilize a Python version management system, such as [Miniconda](https://docs.conda.io/en/latest/miniconda.html) for this step:
- To create an environment, use: `conda create -n py310-whisper python=3.10 -y`
- To activate the environment, use: `conda activate py310-whisper`
- To create an environment, use: `conda create -n py311-whisper python=3.11 -y`
- To activate the environment, use: `conda activate py311-whisper`
- Generate a Core ML model. For example, to generate a `base.en` model, use:
@@ -294,10 +201,6 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
- Build `whisper.cpp` with Core ML support:
```bash
# using Makefile
make clean
WHISPER_COREML=1 make -j
# using CMake
cmake -B build -DWHISPER_COREML=1
cmake --build build -j --config Release
@@ -306,7 +209,7 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
- Run the examples as usual. For example:
```text
$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
$ ./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav
...
@@ -390,7 +293,7 @@ This can result in significant speedup in encoder performance. Here are the inst
- Run the examples as usual. For example:
```text
$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
$ ./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav
...
@@ -407,7 +310,7 @@ This can result in significant speedup in encoder performance. Here are the inst
The first run on an OpenVINO device is slow, since the OpenVINO framework compiles the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob is then
cached for subsequent runs.
For more information about the Core ML implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
## NVIDIA GPU support
@@ -417,31 +320,19 @@ First, make sure you have installed `cuda`: https://developer.nvidia.com/cuda-do
Now build `whisper.cpp` with CUDA support:
```
make clean
WHISPER_CUDA=1 make -j
```
## OpenCL GPU support via CLBlast
For cards and integrated GPUs that support OpenCL, the Encoder processing can be largely offloaded to the GPU through CLBlast. This is especially useful for users with AMD APUs or low end devices for up to ~2x speedup.
First, make sure you have installed `CLBlast` for your OS or Distribution: https://github.com/CNugteren/CLBlast
Now build `whisper.cpp` with CLBlast support:
```
Makefile:
cd whisper.cpp
make clean
WHISPER_CLBLAST=1 make -j
CMake:
cd whisper.cpp
cmake -B build -DWHISPER_CLBLAST=ON
cmake -B build -DGGML_CUDA=1
cmake --build build -j --config Release
```
Run all the examples as usual.
## Vulkan GPU support
A cross-vendor solution that lets you accelerate workloads on your GPU.
First, make sure your graphics card driver provides support for Vulkan API.
Now build `whisper.cpp` with Vulkan support:
```
cmake -B build -DGGML_VULKAN=1
cmake --build build -j --config Release
```
## BLAS CPU support via OpenBLAS
@@ -451,25 +342,41 @@ First, make sure you have installed `openblas`: https://www.openblas.net/
Now build `whisper.cpp` with OpenBLAS support:
```
make clean
WHISPER_OPENBLAS=1 make -j
cmake -B build -DGGML_BLAS=1
cmake --build build -j --config Release
```
## BLAS CPU support via Intel MKL
## Ascend NPU support
Encoder processing can be accelerated on the CPU via the BLAS compatible interface of Intel's Math Kernel Library.
First, make sure you have installed Intel's MKL runtime and development packages: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-download.html
Ascend NPU provides inference acceleration via [`CANN`](https://www.hiascend.com/en/software/cann) and AI cores.
Now build `whisper.cpp` with Intel MKL BLAS support:
First, check if your Ascend NPU device is supported:
**Verified devices**
| Ascend NPU | Status |
|:-----------------------------:|:-------:|
| Atlas 300T A2 | Support |
Then, make sure you have installed the [`CANN toolkit`](https://www.hiascend.com/en/software/cann/community). The latest version of CANN is recommended.
Now build `whisper.cpp` with CANN support:
```
source /opt/intel/oneapi/setvars.sh
mkdir build
cd build
cmake -DWHISPER_MKL=ON ..
WHISPER_MKL=1 make -j
cmake -B build -DGGML_CANN=1
cmake --build build -j --config Release
```
Run the inference examples as usual, for example:
```
./build/bin/whisper-cli -f samples/jfk.wav -m models/ggml-base.en.bin -t 8
```
*Notes:*
- If you have trouble with your Ascend NPU device, please create an issue with the **[CANN]** prefix/tag.
- If you run successfully with your Ascend NPU device, please help update the `Verified devices` table.
## Docker
### Prerequisites
@@ -502,102 +409,31 @@ docker run -it --rm \
whisper.cpp:main "./main -m /models/ggml-base.bin -f ./samples/jfk.wav"
```
## Installing with Conan
You can install pre-built binaries for whisper.cpp or build it from source using [Conan](https://conan.io/). Use the following command:
```
conan install --requires="whisper-cpp/[*]" --build=missing
```
For detailed instructions on how to use Conan, please refer to the [Conan documentation](https://docs.conan.io/2/).
## Limitations
- Inference only
## Another example
Here is another example of transcribing a [3:24 min speech](https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg)
in about half a minute on a MacBook M1 Pro, using the `medium.en` model:
<details>
<summary>Expand to see the result</summary>
```text
$ ./main -m models/ggml-medium.en.bin -f samples/gb1.wav -t 8
whisper_init_from_file: loading model from 'models/ggml-medium.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 4
whisper_model_load: mem required = 1720.00 MB (+ 43.00 MB per decoder)
whisper_model_load: kv self size = 42.00 MB
whisper_model_load: kv cross size = 140.62 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 1462.35 MB
whisper_model_load: model size = 1462.12 MB
system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: processing 'samples/gb1.wav' (3179750 samples, 198.7 sec), 8 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:08.000] My fellow Americans, this day has brought terrible news and great sadness to our country.
[00:00:08.000 --> 00:00:17.000] At nine o'clock this morning, Mission Control in Houston lost contact with our Space Shuttle Columbia.
[00:00:17.000 --> 00:00:23.000] A short time later, debris was seen falling from the skies above Texas.
[00:00:23.000 --> 00:00:29.000] The Columbia's lost. There are no survivors.
[00:00:29.000 --> 00:00:32.000] On board was a crew of seven.
[00:00:32.000 --> 00:00:39.000] Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark,
[00:00:39.000 --> 00:00:48.000] Captain David Brown, Commander William McCool, Dr. Kultna Shavla, and Ilan Ramon,
[00:00:48.000 --> 00:00:52.000] a colonel in the Israeli Air Force.
[00:00:52.000 --> 00:00:58.000] These men and women assumed great risk in the service to all humanity.
[00:00:58.000 --> 00:01:03.000] In an age when space flight has come to seem almost routine,
[00:01:03.000 --> 00:01:07.000] it is easy to overlook the dangers of travel by rocket
[00:01:07.000 --> 00:01:12.000] and the difficulties of navigating the fierce outer atmosphere of the Earth.
[00:01:12.000 --> 00:01:18.000] These astronauts knew the dangers, and they faced them willingly,
[00:01:18.000 --> 00:01:23.000] knowing they had a high and noble purpose in life.
[00:01:23.000 --> 00:01:31.000] Because of their courage and daring and idealism, we will miss them all the more.
[00:01:31.000 --> 00:01:36.000] All Americans today are thinking as well of the families of these men and women
[00:01:36.000 --> 00:01:40.000] who have been given this sudden shock and grief.
[00:01:40.000 --> 00:01:45.000] You're not alone. Our entire nation grieves with you,
[00:01:45.000 --> 00:01:52.000] and those you love will always have the respect and gratitude of this country.
[00:01:52.000 --> 00:01:56.000] The cause in which they died will continue.
[00:01:56.000 --> 00:02:04.000] Mankind is led into the darkness beyond our world by the inspiration of discovery
[00:02:04.000 --> 00:02:11.000] and the longing to understand. Our journey into space will go on.
[00:02:11.000 --> 00:02:16.000] In the skies today, we saw destruction and tragedy.
[00:02:16.000 --> 00:02:22.000] Yet farther than we can see, there is comfort and hope.
[00:02:22.000 --> 00:02:29.000] In the words of the prophet Isaiah, "Lift your eyes and look to the heavens
[00:02:29.000 --> 00:02:35.000] who created all these. He who brings out the starry hosts one by one
[00:02:35.000 --> 00:02:39.000] and calls them each by name."
[00:02:39.000 --> 00:02:46.000] Because of His great power and mighty strength, not one of them is missing.
[00:02:46.000 --> 00:02:55.000] The same Creator who names the stars also knows the names of the seven souls we mourn today.
[00:02:55.000 --> 00:03:01.000] The crew of the shuttle Columbia did not return safely to earth,
[00:03:01.000 --> 00:03:05.000] yet we can pray that all are safely home.
[00:03:05.000 --> 00:03:13.000] May God bless the grieving families, and may God continue to bless America.
[00:03:13.000 --> 00:03:19.000] [Silence]
whisper_print_timings: fallbacks = 1 p / 0 h
whisper_print_timings: load time = 569.03 ms
whisper_print_timings: mel time = 146.85 ms
whisper_print_timings: sample time = 238.66 ms / 553 runs ( 0.43 ms per run)
whisper_print_timings: encode time = 18665.10 ms / 9 runs ( 2073.90 ms per run)
whisper_print_timings: decode time = 13090.93 ms / 549 runs ( 23.85 ms per run)
whisper_print_timings: total time = 32733.52 ms
```
</details>
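The `whisper_print_timings` lines are convenient to post-process when comparing runs. The Go sketch below extracts each timing with a regular expression. `parseTimings` is a hypothetical helper, not part of whisper.cpp. The regex assumes the `<name> time = <ms> ms / <runs> runs` layout shown above. From the parsed values the sketch derives the per-run cost and how many times faster than real time the run was, using the 198.7 sec duration reported for `samples/gb1.wav`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// timing is one "whisper_print_timings:" entry: total milliseconds and run count.
type timing struct {
	TotalMs float64
	Runs    int
}

// timingRe matches lines such as
//   whisper_print_timings: encode time = 18665.10 ms / 9 runs ( 2073.90 ms per run)
// The "/ N runs" part is optional (e.g. "load time", "total time").
var timingRe = regexp.MustCompile(`whisper_print_timings:\s+(\w+) time =\s*([0-9.]+) ms(?:\s*/\s*(\d+) runs)?`)

// parseTimings returns a map from timing name ("encode", "total", ...) to its values.
func parseTimings(output string) map[string]timing {
	out := make(map[string]timing)
	for _, line := range strings.Split(output, "\n") {
		m := timingRe.FindStringSubmatch(line)
		if m == nil {
			continue // e.g. the "fallbacks = ..." line
		}
		ms, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue
		}
		runs := 1
		if m[3] != "" {
			if n, err := strconv.Atoi(m[3]); err == nil {
				runs = n
			}
		}
		out[m[1]] = timing{TotalMs: ms, Runs: runs}
	}
	return out
}

func main() {
	// Timing lines copied from the medium.en run on samples/gb1.wav above.
	output := `whisper_print_timings: load time = 569.03 ms
whisper_print_timings: encode time = 18665.10 ms / 9 runs ( 2073.90 ms per run)
whisper_print_timings: decode time = 13090.93 ms / 549 runs ( 23.85 ms per run)
whisper_print_timings: total time = 32733.52 ms`
	tm := parseTimings(output)
	const audioSec = 198.7
	for _, name := range []string{"encode", "decode"} {
		fmt.Printf("%s: %.2f ms per run\n", name, tm[name].TotalMs/float64(tm[name].Runs))
	}
	fmt.Printf("speed: %.1fx real time\n", audioSec/(tm["total"].TotalMs/1000))
}
```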
## Real-time audio input example
This is a naive example of performing real-time inference on audio from your microphone.
The [stream](examples/stream) tool samples the audio every half a second and runs the transcription continuously.
More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
You will need to have [sdl2](https://wiki.libsdl.org/SDL2/Installation) installed for it to work properly.
```bash
make stream
./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
cmake -B build -DWHISPER_SDL2=ON
cmake --build build --config Release
./build/bin/whisper-stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
```
https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4
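The `--step` and `--length` flags can be read as a sliding window over the microphone stream. The Go sketch below illustrates only that idea and is not the stream tool's actual buffering logic, which may differ. It assumes 16 kHz mono samples, matching the `-ar 16000 -ac 1` conversion used elsewhere in this README. Under that assumption, `--step 500` adds 8000 samples per iteration and `--length 5000` caps the window at 80000 samples. The type and function names are hypothetical.

```go
package main

import "fmt"

// sampleRate assumes 16 kHz audio, matching the `-ar 16000` ffmpeg conversion.
const sampleRate = 16000

// slidingWindow keeps only the most recent maxLen samples.
// (Hypothetical type for illustration; not taken from the stream example.)
type slidingWindow struct {
	buf    []float32
	maxLen int
}

// newSlidingWindow sizes the window from a length in milliseconds.
func newSlidingWindow(lengthMs int) *slidingWindow {
	return &slidingWindow{maxLen: lengthMs * sampleRate / 1000}
}

// push appends one step of new samples and drops the oldest ones so that
// the window never exceeds maxLen samples.
func (w *slidingWindow) push(step []float32) {
	w.buf = append(w.buf, step...)
	if extra := len(w.buf) - w.maxLen; extra > 0 {
		n := copy(w.buf, w.buf[extra:])
		w.buf = w.buf[:n]
	}
}

func main() {
	const stepMs, lengthMs = 500, 5000 // as in --step 500 --length 5000
	w := newSlidingWindow(lengthMs)
	step := make([]float32, stepMs*sampleRate/1000) // 8000 samples of silence
	for i := 1; i <= 12; i++ {
		w.push(step)
		// A real-time tool would transcribe w.buf at this point.
		fmt.Printf("step %2d: %5d samples (%.1f s)\n", i, len(w.buf), float64(len(w.buf))/sampleRate)
	}
}
```

The window fills up after 10 steps. From then on, each new half-second step evicts the oldest half-second.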
@@ -608,7 +444,7 @@ Adding the `--print-colors` argument will print the transcribed text using an ex
to highlight words with high or low confidence:
```bash
./main -m models/ggml-base.en.bin -f samples/gb0.wav --print-colors
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/gb0.wav --print-colors
```
<img width="965" alt="image" src="https://user-images.githubusercontent.com/1991296/197356445-311c8643-9397-4e5e-b46e-0b4b4daa2530.png">
@@ -618,7 +454,7 @@ to highlight words with high or low confidence:
For example, to limit the line length to a maximum of 16 characters, simply add `-ml 16`:
```text
$ ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16
$ ./build/bin/whisper-cli -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16
whisper_model_load: loading model from './models/ggml-base.en.bin'
...
@@ -642,7 +478,7 @@ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 pr
The `--max-len` argument can be used to obtain word-level timestamps. Simply use `-ml 1`:
```text
$ ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 1
$ ./build/bin/whisper-cli -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 1
whisper_model_load: loading model from './models/ggml-base.en.bin'
...
@@ -689,7 +525,7 @@ Sample usage:
./models/download-ggml-model.sh small.en-tdrz
# run as usual, adding the "-tdrz" command-line argument
./main -f ./samples/a13.wav -m ./models/ggml-small.en-tdrz.bin -tdrz
./build/bin/whisper-cli -f ./samples/a13.wav -m ./models/ggml-small.en-tdrz.bin -tdrz
...
main: processing './samples/a13.wav' (480000 samples, 30.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, tdrz = 1, timestamps = 1 ...
...
@@ -706,14 +542,14 @@ main: processing './samples/a13.wav' (480000 samples, 30.0 sec), 4 threads, 1 pr
## Karaoke-style movie generation (experimental)
The [main](examples/main) example provides support for output of karaoke-style movies, where the
The [whisper-cli](examples/cli) example provides support for output of karaoke-style movies, where the
currently pronounced word is highlighted. Use the `-owts` argument and run the generated bash script.
This requires `ffmpeg` to be installed.
Here are a few *"typical"* examples:
Here are a few _"typical"_ examples:
```bash
./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -owts
./build/bin/whisper-cli -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -owts
source ./samples/jfk.wav.wts
ffplay ./samples/jfk.wav.mp4
```
@@ -723,7 +559,7 @@ https://user-images.githubusercontent.com/1991296/199337465-dbee4b5e-9aeb-48a3-b
---
```bash
./main -m ./models/ggml-base.en.bin -f ./samples/mm0.wav -owts
./build/bin/whisper-cli -m ./models/ggml-base.en.bin -f ./samples/mm0.wav -owts
source ./samples/mm0.wav.wts
ffplay ./samples/mm0.wav.mp4
```
@@ -733,7 +569,7 @@ https://user-images.githubusercontent.com/1991296/199337504-cc8fd233-0cb7-4920-9
---
```bash
./main -m ./models/ggml-base.en.bin -f ./samples/gb0.wav -owts
./build/bin/whisper-cli -m ./models/ggml-base.en.bin -f ./samples/gb0.wav -owts
source ./samples/gb0.wav.wts
ffplay ./samples/gb0.wav.mp4
```
@@ -758,12 +594,12 @@ https://user-images.githubusercontent.com/1991296/223206245-2d36d903-cf8e-4f09-8
## Benchmarks
In order to have an objective comparison of the performance of the inference across different system configurations,
use the [bench](examples/bench) tool. The tool simply runs the Encoder part of the model and prints how much time it
use the [whisper-bench](examples/bench) tool. The tool simply runs the Encoder part of the model and prints how much time it
took to execute it. The results are summarized in the following GitHub issue:
[Benchmark results](https://github.com/ggerganov/whisper.cpp/issues/89)
Additionally a script to run whisper.cpp with different models and audio files is provided [bench.py](bench.py).
Additionally, a script to run whisper.cpp with different models and audio files is provided: [bench.py](scripts/bench.py).
You can run it with the following command; by default, it runs against any standard model in the models folder.
@@ -810,6 +646,7 @@ For more details, see the conversion script [models/convert-pt-to-ggml.py](model
- [stlukey/whispercpp.py](https://github.com/stlukey/whispercpp.py) (Cython)
- [AIWintermuteAI/whispercpp](https://github.com/AIWintermuteAI/whispercpp) (Updated fork of aarnphm/whispercpp)
- [aarnphm/whispercpp](https://github.com/aarnphm/whispercpp) (Pybind11)
- [abdeladim-s/pywhispercpp](https://github.com/abdeladim-s/pywhispercpp) (Pybind11)
- [x] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
- [x] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)
@@ -820,13 +657,12 @@ Some of the examples are even ported to run in the browser using WebAssembly. Ch
| Example | Web | Description |
| --------------------------------------------------- | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| [main](examples/main) | [whisper.wasm](examples/whisper.wasm) | Tool for translating and transcribing audio using Whisper |
| [bench](examples/bench) | [bench.wasm](examples/bench.wasm) | Benchmark the performance of Whisper on your machine |
| [stream](examples/stream) | [stream.wasm](examples/stream.wasm) | Real-time transcription of raw microphone capture |
| [command](examples/command) | [command.wasm](examples/command.wasm) | Basic voice assistant example for receiving voice commands from the mic |
| [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |
| [talk](examples/talk) | [talk.wasm](examples/talk.wasm) | Talk with a GPT-2 bot |
| [talk-llama](examples/talk-llama) | | Talk with a LLaMA bot |
| [whisper-cli](examples/cli) | [whisper.wasm](examples/whisper.wasm) | Tool for translating and transcribing audio using Whisper |
| [whisper-bench](examples/bench) | [bench.wasm](examples/bench.wasm) | Benchmark the performance of Whisper on your machine |
| [whisper-stream](examples/stream) | [stream.wasm](examples/stream.wasm) | Real-time transcription of raw microphone capture |
| [whisper-command](examples/command) | [command.wasm](examples/command.wasm) | Basic voice assistant example for receiving voice commands from the mic |
| [whisper-server](examples/server) | | HTTP transcription server with OAI-like API |
| [whisper-talk-llama](examples/talk-llama) | | Talk with a LLaMA bot |
| [whisper.objc](examples/whisper.objc) | | iOS mobile application using whisper.cpp |
| [whisper.swiftui](examples/whisper.swiftui) | | SwiftUI iOS / macOS application using whisper.cpp |
| [whisper.android](examples/whisper.android) | | Android mobile application using whisper.cpp |
@@ -834,7 +670,7 @@ Some of the examples are even ported to run in the browser using WebAssembly. Ch
| [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |
| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggerganov/whisper.cpp/issues/185) |
| [yt-wsp.sh](examples/yt-wsp.sh) | | Download + transcribe and/or translate any VOD [(original)](https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818) |
| [server](examples/server) | | HTTP transcription server with OAI-like API |
| [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |
## [Discussions](https://github.com/ggerganov/whisper.cpp/discussions)


@@ -11,11 +11,16 @@ UNAME_M := $(shell uname -m)
endif
GGML_METAL_PATH_RESOURCES := $(abspath ../..)
BUILD_DIR := build
BUILD_DIR := build_go
MODELS_DIR := models
EXAMPLES_DIR := $(wildcard examples/*)
INCLUDE_PATH := $(abspath ../..)
LIBRARY_PATH := $(abspath ../..)
INCLUDE_PATH := $(abspath ../../include):$(abspath ../../ggml/include)
LIBRARY_PATH := $(abspath ../../${BUILD_DIR}/src:$(abspath ../../${BUILD_DIR}/ggml/src))
ifeq ($(GGML_CUDA),1)
LIBRARY_PATH := $(LIBRARY_PATH):$(CUDA_PATH)/targets/$(UNAME_M)-linux/lib/
BUILD_FLAGS := -ldflags "-extldflags '-lcudart -lcuda -lcublas'"
endif
ifeq ($(UNAME_S),Darwin)
EXT_LDFLAGS := -framework Foundation -framework Metal -framework MetalKit
@@ -24,8 +29,10 @@ endif
all: clean whisper examples
whisper: mkdir
@echo Build whisper
@${MAKE} -C ../.. libwhisper.a
cmake -S ../.. -B ../../${BUILD_DIR} \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF
cmake --build ../../${BUILD_DIR} --target whisper
test: model-small whisper modtidy
ifeq ($(UNAME_S),Darwin)


@@ -31,7 +31,7 @@ func main() {
if err != nil {
panic(err)
}
if err := context.Process(samples, nil, nil); err != nil {
if err := context.Process(samples, nil, nil, nil); err != nil {
panic(err)
}
@@ -62,6 +62,12 @@ This will compile a static `libwhisper.a` in a `build` folder, download a model
make examples
```
To build using cuda support add `GGML_CUDA=1`:
```bash
GGML_CUDA=1 make examples
```
The examples are placed in the `build` directory. Once built, you can download all the models with the following command:
```bash


@@ -9,22 +9,23 @@ import (
// ContextForSignal returns a context object which is cancelled when a signal
// is received. It returns nil if no signal parameter is provided.
func ContextForSignal(signals ...os.Signal) context.Context {
if len(signals) == 0 {
return nil
}
if len(signals) == 0 {
return nil
}
ch := make(chan os.Signal)
ctx, cancel := context.WithCancel(context.Background())
ch := make(chan os.Signal, 1) // Buffered channel with space for 1 signal
ctx, cancel := context.WithCancel(context.Background())
// Send message on channel when signal received
signal.Notify(ch, signals...)
// Send message on channel when signal received
signal.Notify(ch, signals...)
// When any signal received, call cancel
go func() {
<-ch
cancel()
}()
// When any signal is received, call cancel
go func() {
<-ch
cancel()
}()
// Return success
return ctx
// Return success
return ctx
}


@@ -9,6 +9,7 @@ import (
"net/url"
"os"
"path/filepath"
"strings"
"syscall"
"time"
)
@@ -17,14 +18,27 @@ import (
// CONSTANTS
const (
srcUrl = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main" // The location of the models
srcExt = ".bin" // Filename extension
bufSize = 1024 * 64 // Size of the buffer used for downloading the model
srcUrl = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/" // The location of the models
srcExt = ".bin" // Filename extension
bufSize = 1024 * 64 // Size of the buffer used for downloading the model
)
var (
// The models which will be downloaded, if no model is specified as an argument
modelNames = []string{"ggml-tiny.en", "ggml-tiny", "ggml-base.en", "ggml-base", "ggml-small.en", "ggml-small", "ggml-medium.en", "ggml-medium", "ggml-large-v1", "ggml-large-v2", "ggml-large-v3"}
modelNames = []string{
"tiny", "tiny-q5_1", "tiny-q8_0",
"tiny.en", "tiny.en-q5_1", "tiny.en-q8_0",
"base", "base-q5_1", "base-q8_0",
"base.en", "base.en-q5_1", "base.en-q8_0",
"small", "small-q5_1", "small-q8_0",
"small.en", "small.en-q5_1", "small.en-q8_0",
"medium", "medium-q5_0", "medium-q8_0",
"medium.en", "medium.en-q5_0", "medium.en-q8_0",
"large-v1",
"large-v2", "large-v2-q5_0", "large-v2-q8_0",
"large-v3", "large-v3-q5_0",
"large-v3-turbo", "large-v3-turbo-q5_0", "large-v3-turbo-q8_0",
}
)
var (
@@ -44,7 +58,25 @@ var (
func main() {
flag.Usage = func() {
name := filepath.Base(flag.CommandLine.Name())
fmt.Fprintf(flag.CommandLine.Output(), "Usage: %s [options] <model>\n\n", name)
fmt.Fprintf(flag.CommandLine.Output(), `
Usage: %s [options] [<model>...]
Options:
-out string Specify the output folder where models will be saved.
Default: Current working directory.
-timeout duration Set the maximum duration for downloading a model.
Example: 10m, 1h (default: 30m0s).
-quiet Suppress all output except errors.
Examples:
1. Download a specific model:
%s -out ./models tiny-q8_0
2. Download all models:
%s -out ./models
`, name, name, name)
flag.PrintDefaults()
}
flag.Parse()
@@ -114,23 +146,87 @@ func GetOut() (string, error) {
// GetModels returns the list of models to download
func GetModels() []string {
if flag.NArg() == 0 {
return modelNames
} else {
return flag.Args()
fmt.Println("No model specified.")
fmt.Println("Preparing to download all models...")
// Calculate total download size
fmt.Println("Calculating total download size...")
totalSize, err := CalculateTotalDownloadSize(modelNames)
if err != nil {
fmt.Println("Error calculating download sizes:", err)
os.Exit(1)
}
fmt.Println("View available models: https://huggingface.co/ggerganov/whisper.cpp/tree/main")
fmt.Printf("Total download size: %.2f GB\n", float64(totalSize)/(1024*1024*1024))
fmt.Println("Would you like to download all models? (y/N)")
// Prompt for user input
var response string
fmt.Scanln(&response)
if response != "y" && response != "Y" {
fmt.Println("Aborting. Specify a model to download.")
os.Exit(0)
}
return modelNames // Return all models if confirmed
}
return flag.Args() // Return specific models if arguments are provided
}
func CalculateTotalDownloadSize(models []string) (int64, error) {
var totalSize int64
client := http.Client{}
for _, model := range models {
modelURL, err := URLForModel(model)
if err != nil {
return 0, err
}
// Issue a HEAD request to get the file size
req, err := http.NewRequest("HEAD", modelURL, nil)
if err != nil {
return 0, err
}
resp, err := client.Do(req)
if err != nil {
return 0, err
}
resp.Body.Close()
if resp.StatusCode != http.StatusOK {
fmt.Printf("Warning: Unable to fetch size for %s (HTTP %d)\n", model, resp.StatusCode)
continue
}
size := resp.ContentLength
totalSize += size
}
return totalSize, nil
}
// URLForModel returns the URL for the given model on huggingface.co
func URLForModel(model string) (string, error) {
// Ensure "ggml-" prefix is added only once
if !strings.HasPrefix(model, "ggml-") {
model = "ggml-" + model
}
// Ensure ".bin" extension is added only once
if filepath.Ext(model) != srcExt {
model += srcExt
}
// Parse the base URL
url, err := url.Parse(srcUrl)
if err != nil {
return "", err
} else {
url.Path = filepath.Join(url.Path, model)
}
// Ensure no trailing slash in the base URL
url.Path = fmt.Sprintf("%s/%s", strings.TrimSuffix(url.Path, "/"), model)
return url.String(), nil
}
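URLForModel normalises a model name to ggml-<name>.bin before appending it to the base URL. URL paths always use forward slashes, so the path package is a safer fit here than path/filepath, whose separator is a backslash on Windows. The sketch below makes that swap. It also names the parsed URL u, so the net/url package is not shadowed. The function name modelURL is hypothetical. The value of srcUrl is not shown here, so any base URL passed in should be treated as a placeholder.

```go
package main

import (
	"net/url"
	"path"
	"strings"
)

// modelURL turns a short model name such as "tiny-q8_0" or "small.en" into
// "ggml-<name>.bin", adding the prefix and extension only when missing, and
// joins it onto base using "/" as the separator on every platform.
func modelURL(base, model string) (string, error) {
	if !strings.HasPrefix(model, "ggml-") {
		model = "ggml-" + model
	}
	if path.Ext(model) != ".bin" {
		model += ".bin"
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = path.Join(u.Path, model) // also collapses a trailing "/" on base
	return u.String(), nil
}
```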

@ -68,10 +68,6 @@ func (flags *Flags) GetOut() string {
return strings.ToLower(flags.Lookup("out").Value.String())
}
func (flags *Flags) IsSpeedup() bool {
return flags.Lookup("speedup").Value.String() == "true"
}
func (flags *Flags) IsTokens() bool {
return flags.Lookup("tokens").Value.String() == "true"
}
@ -111,10 +107,6 @@ func (flags *Flags) SetParams(context whisper.Context) error {
fmt.Fprintf(flags.Output(), "Setting duration to %v\n", duration)
context.SetDuration(duration)
}
if flags.IsSpeedup() {
fmt.Fprintf(flags.Output(), "Setting speedup to true\n")
context.SetSpeedup(true)
}
if threads := flags.GetThreads(); threads != 0 {
fmt.Fprintf(flags.Output(), "Setting threads to %d\n", threads)
context.SetThreads(threads)
@ -146,7 +138,6 @@ func registerFlags(flag *Flags) {
flag.Duration("offset", 0, "Time offset")
flag.Duration("duration", 0, "Duration of audio to process")
flag.Uint("threads", 0, "Number of threads to use")
flag.Bool("speedup", false, "Enable speedup")
flag.Uint("max-len", 0, "Maximum segment length in characters")
flag.Uint("max-tokens", 0, "Maximum tokens per segment")
flag.Float64("word-thold", 0, "Maximum segment score")

@ -67,7 +67,7 @@ func Process(model whisper.Model, path string, flags *Flags) error {
// Process the data
fmt.Fprintf(flags.Output(), " ...processing %q\n", path)
context.ResetTimings()
if err := context.Process(data, cb, nil); err != nil {
if err := context.Process(data, nil, cb, nil); err != nil {
return err
}

@ -1,10 +1,10 @@
module github.com/ggerganov/whisper.cpp/bindings/go
go 1.19
go 1.23
require (
github.com/go-audio/wav v1.1.0
github.com/stretchr/testify v1.8.1
github.com/stretchr/testify v1.9.0
)
require (

@ -1,4 +1,3 @@
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/go-audio/audio v1.0.0 h1:zS9vebldgbQqktK4H0lUqWrG8P0NxCJVqcj7ZpNnwd4=
@ -9,15 +8,9 @@ github.com/go-audio/wav v1.1.0 h1:jQgLtbqBzY7G+BM8fXF7AHUk1uHUviWS4X39d5rsL2g=
github.com/go-audio/wav v1.1.0/go.mod h1:mpe9qfwbScEbkd8uybLuIpTgHyrISw/OTuvjUW2iGtE=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
github.com/stretchr/testify v1.8.1 h1:w7B6lhMri9wdJUVmEZPGGhZzrYTPvgJArz7wNPgYKsk=
github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

@ -47,10 +47,6 @@ func (p *Params) SetPrintTimestamps(v bool) {
p.print_timestamps = toBool(v)
}
func (p *Params) SetSpeedup(v bool) {
p.speed_up = toBool(v)
}
// Set language id
func (p *Params) SetLanguage(lang int) error {
if lang == -1 {
@ -123,6 +119,28 @@ func (p *Params) SetAudioCtx(n int) {
p.audio_ctx = C.int(n)
}
func (p *Params) SetMaxContext(n int) {
p.n_max_text_ctx = C.int(n)
}
func (p *Params) SetBeamSize(n int) {
p.beam_search.beam_size = C.int(n)
}
func (p *Params) SetEntropyThold(t float32) {
p.entropy_thold = C.float(t)
}
func (p *Params) SetTemperature(t float32) {
p.temperature = C.float(t)
}
// Sets the fallback temperature incrementation
// Pass -1.0 to disable this feature
func (p *Params) SetTemperatureFallback(t float32) {
p.temperature_inc = C.float(t)
}
// Set initial prompt
func (p *Params) SetInitialPrompt(prompt string) {
p.initial_prompt = C.CString(prompt)
@ -153,6 +171,10 @@ func (p *Params) String() string {
str += fmt.Sprintf(" duration_ms=%d", p.duration_ms)
str += fmt.Sprintf(" audio_ctx=%d", p.audio_ctx)
str += fmt.Sprintf(" initial_prompt=%s", C.GoString(p.initial_prompt))
str += fmt.Sprintf(" entropy_thold=%f", p.entropy_thold)
str += fmt.Sprintf(" temperature=%f", p.temperature)
str += fmt.Sprintf(" temperature_inc=%f", p.temperature_inc)
str += fmt.Sprintf(" beam_size=%d", p.beam_search.beam_size)
if p.translate {
str += " translate"
}
@ -177,9 +199,6 @@ func (p *Params) String() string {
if p.token_timestamps {
str += " token_timestamps"
}
if p.speed_up {
str += " speed_up"
}
return str + ">"
}
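SetTemperature and SetTemperatureFallback set the starting decoding temperature and its increment, and passing -1.0 disables fallback. As we understand whisper.cpp, a failed decode (for example, one falling below the entropy_thold check) is retried at the next temperature in a schedule running from temperature up to 1.0. The sketch below computes that schedule under this assumption. It is not the library's code, and temperatureSchedule is a hypothetical name.

```go
package main

// temperatureSchedule lists the temperatures that would be tried in order:
// the start value t, then t+inc, t+2*inc, ... while the value stays at or
// just below 1.0 (a small epsilon absorbs float32 rounding). A non-positive
// inc, such as the documented -1.0, disables fallback and leaves only t.
func temperatureSchedule(t, inc float32) []float32 {
	temps := []float32{t}
	if inc <= 0 {
		return temps
	}
	for ti := t + inc; ti < 1.0+1e-6; ti += inc {
		temps = append(temps, ti)
	}
	return temps
}
```

For example, a start of 0 with an increment of 0.2 yields six attempts: 0, 0.2, 0.4, 0.6, 0.8 and 1.0.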

@ -71,16 +71,15 @@ func (context *context) Language() string {
return whisper.Whisper_lang_str(context.params.Language())
}
func (context *context) DetectedLanguage() string {
return whisper.Whisper_lang_str(context.model.ctx.Whisper_full_lang_id())
}
// Set translate flag
func (context *context) SetTranslate(v bool) {
context.params.SetTranslate(v)
}
// Set speedup flag
func (context *context) SetSpeedup(v bool) {
context.params.SetSpeedup(v)
}
func (context *context) SetSplitOnWord(v bool) {
context.params.SetSplitOnWord(v)
}
@ -130,6 +129,32 @@ func (context *context) SetAudioCtx(n uint) {
context.params.SetAudioCtx(int(n))
}
// Set maximum number of text context tokens to store
func (context *context) SetMaxContext(n int) {
context.params.SetMaxContext(n)
}
// Set Beam Size
func (context *context) SetBeamSize(n int) {
context.params.SetBeamSize(n)
}
// Set Entropy threshold
func (context *context) SetEntropyThold(t float32) {
context.params.SetEntropyThold(t)
}
// Set Temperature
func (context *context) SetTemperature(t float32) {
context.params.SetTemperature(t)
}
// Set the fallback temperature incrementation
// Pass -1.0 to disable this feature
func (context *context) SetTemperatureFallback(t float32) {
context.params.SetTemperatureFallback(t)
}
// Set initial prompt
func (context *context) SetInitialPrompt(prompt string) {
context.params.SetInitialPrompt(prompt)
@ -168,6 +193,7 @@ func (context *context) WhisperLangAutoDetect(offset_ms int, n_threads int) ([]f
// Process new sample data and return any errors
func (context *context) Process(
data []float32,
callEncoderBegin EncoderBeginCallback,
callNewSegment SegmentCallback,
callProgress ProgressCallback,
) error {
@ -182,7 +208,20 @@ func (context *context) Process(
// We don't do parallel processing at the moment
processors := 0
if processors > 1 {
if err := context.model.ctx.Whisper_full_parallel(context.params, data, processors, nil, func(new int) {
if err := context.model.ctx.Whisper_full_parallel(context.params, data, processors, callEncoderBegin,
func(new int) {
if callNewSegment != nil {
num_segments := context.model.ctx.Whisper_full_n_segments()
s0 := num_segments - new
for i := s0; i < num_segments; i++ {
callNewSegment(toSegment(context.model.ctx, i))
}
}
}); err != nil {
return err
}
} else if err := context.model.ctx.Whisper_full(context.params, data, callEncoderBegin,
func(new int) {
if callNewSegment != nil {
num_segments := context.model.ctx.Whisper_full_n_segments()
s0 := num_segments - new
@ -190,22 +229,11 @@ func (context *context) Process(
callNewSegment(toSegment(context.model.ctx, i))
}
}
}); err != nil {
return err
}
} else if err := context.model.ctx.Whisper_full(context.params, data, nil, func(new int) {
if callNewSegment != nil {
num_segments := context.model.ctx.Whisper_full_n_segments()
s0 := num_segments - new
for i := s0; i < num_segments; i++ {
callNewSegment(toSegment(context.model.ctx, i))
}, func(progress int) {
if callProgress != nil {
callProgress(progress)
}
}
}, func(progress int) {
if callProgress != nil {
callProgress(progress)
}
}); err != nil {
}); err != nil {
return err
}

@ -4,52 +4,121 @@ import (
"os"
"testing"
// Packages
whisper "github.com/ggerganov/whisper.cpp/bindings/go/pkg/whisper"
"github.com/ggerganov/whisper.cpp/bindings/go/pkg/whisper"
"github.com/go-audio/wav"
assert "github.com/stretchr/testify/assert"
)
const (
ModelPath = "../../models/ggml-tiny.bin"
SamplePath = "../../samples/jfk.wav"
)
func Test_Whisper_000(t *testing.T) {
func TestSetLanguage(t *testing.T) {
assert := assert.New(t)
if _, err := os.Stat(ModelPath); os.IsNotExist(err) {
t.Skip("Skipping test, model not found:", ModelPath)
}
if _, err := os.Stat(SamplePath); os.IsNotExist(err) {
t.Skip("Skipping test, sample not found:", SamplePath)
}
// Load model
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
assert.NoError(model.Close())
t.Log("languages=", model.Languages())
}
func Test_Whisper_001(t *testing.T) {
assert := assert.New(t)
if _, err := os.Stat(ModelPath); os.IsNotExist(err) {
t.Skip("Skipping test, model not found:", ModelPath)
}
if _, err := os.Stat(SamplePath); os.IsNotExist(err) {
t.Skip("Skipping test, sample not found:", SamplePath)
}
// Load model
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
defer model.Close()
// Get context for decoding
ctx, err := model.NewContext()
context, err := model.NewContext()
assert.NoError(err)
assert.NotNil(ctx)
// This returns an error since
// the model 'models/ggml-small.en.bin'
// that is loaded is not multilingual
err = context.SetLanguage("en")
assert.Error(err)
}
func TestContextModelIsMultilingual(t *testing.T) {
assert := assert.New(t)
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
defer model.Close()
context, err := model.NewContext()
assert.NoError(err)
isMultilingual := context.IsMultilingual()
// This returns false since
// the model 'models/ggml-small.en.bin'
// that is loaded is not multilingual
assert.False(isMultilingual)
}
func TestLanguage(t *testing.T) {
assert := assert.New(t)
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
defer model.Close()
context, err := model.NewContext()
assert.NoError(err)
// This always returns en since
// the model 'models/ggml-small.en.bin'
// that is loaded is not multilingual
expectedLanguage := "en"
actualLanguage := context.Language()
assert.Equal(expectedLanguage, actualLanguage)
}
func TestProcess(t *testing.T) {
assert := assert.New(t)
fh, err := os.Open(SamplePath)
assert.NoError(err)
defer fh.Close()
// Decode the WAV file - load the full buffer
dec := wav.NewDecoder(fh)
buf, err := dec.FullPCMBuffer()
assert.NoError(err)
assert.Equal(uint16(1), dec.NumChans)
data := buf.AsFloat32Buffer().Data
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
defer model.Close()
context, err := model.NewContext()
assert.NoError(err)
err = context.Process(data, nil, nil, nil)
assert.NoError(err)
}
func TestDetectedLanguage(t *testing.T) {
assert := assert.New(t)
fh, err := os.Open(SamplePath)
assert.NoError(err)
defer fh.Close()
// Decode the WAV file - load the full buffer
dec := wav.NewDecoder(fh)
buf, err := dec.FullPCMBuffer()
assert.NoError(err)
assert.Equal(uint16(1), dec.NumChans)
data := buf.AsFloat32Buffer().Data
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
defer model.Close()
context, err := model.NewContext()
assert.NoError(err)
err = context.Process(data, nil, nil, nil)
assert.NoError(err)
expectedLanguage := "en"
actualLanguage := context.DetectedLanguage()
assert.Equal(expectedLanguage, actualLanguage)
}

@ -16,6 +16,10 @@ type SegmentCallback func(Segment)
// processing. It is called during the Process function
type ProgressCallback func(int)
// EncoderBeginCallback is the callback function for checking if we want to
// continue processing. It is called during the Process function
type EncoderBeginCallback func() bool
// Model is the interface to a whisper model. Create a new model with the
// function whisper.New(string)
type Model interface {
@ -31,30 +35,35 @@ type Model interface {
Languages() []string
}
// Context is the speach recognition context.
// Context is the speech recognition context.
type Context interface {
SetLanguage(string) error // Set the language to use for speech recognition, use "auto" for auto detect language.
SetTranslate(bool) // Set translate flag
IsMultilingual() bool // Return true if the model is multilingual.
Language() string // Get language
DetectedLanguage() string // Get detected language
SetOffset(time.Duration) // Set offset
SetDuration(time.Duration) // Set duration
SetThreads(uint) // Set number of threads to use
SetSpeedup(bool) // Set speedup flag
SetSplitOnWord(bool) // Set split on word flag
SetTokenThreshold(float32) // Set timestamp token probability threshold
SetTokenSumThreshold(float32) // Set timestamp token sum probability threshold
SetMaxSegmentLength(uint) // Set max segment length in characters
SetTokenTimestamps(bool) // Set token timestamps flag
SetMaxTokensPerSegment(uint) // Set max tokens per segment (0 = no limit)
SetAudioCtx(uint) // Set audio encoder context
SetInitialPrompt(prompt string) // Set initial prompt
SetOffset(time.Duration) // Set offset
SetDuration(time.Duration) // Set duration
SetThreads(uint) // Set number of threads to use
SetSplitOnWord(bool) // Set split on word flag
SetTokenThreshold(float32) // Set timestamp token probability threshold
SetTokenSumThreshold(float32) // Set timestamp token sum probability threshold
SetMaxSegmentLength(uint) // Set max segment length in characters
SetTokenTimestamps(bool) // Set token timestamps flag
SetMaxTokensPerSegment(uint) // Set max tokens per segment (0 = no limit)
SetAudioCtx(uint) // Set audio encoder context
SetMaxContext(n int) // Set maximum number of text context tokens to store
SetBeamSize(n int) // Set Beam Size
SetEntropyThold(t float32) // Set Entropy threshold
SetInitialPrompt(prompt string) // Set initial prompt
SetTemperature(t float32) // Set temperature
SetTemperatureFallback(t float32) // Set temperature incrementation
// Process mono audio data and return any errors.
// If defined, newly generated segments are passed to the
// callback function during processing.
Process([]float32, SegmentCallback, ProgressCallback) error
Process([]float32, EncoderBeginCallback, SegmentCallback, ProgressCallback) error
// After process is called, return segments until the end of the stream
// is reached, when io.EOF is returned.

@ -0,0 +1,91 @@
package whisper_test
import (
"testing"
"github.com/ggerganov/whisper.cpp/bindings/go/pkg/whisper"
assert "github.com/stretchr/testify/assert"
)
func TestNew(t *testing.T) {
assert := assert.New(t)
t.Run("valid model path", func(t *testing.T) {
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
defer model.Close()
})
t.Run("invalid model path", func(t *testing.T) {
invalidModelPath := "invalid-model-path.bin"
model, err := whisper.New(invalidModelPath)
assert.Error(err)
assert.Nil(model)
})
}
func TestClose(t *testing.T) {
assert := assert.New(t)
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
err = model.Close()
assert.NoError(err)
}
func TestNewContext(t *testing.T) {
assert := assert.New(t)
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
defer model.Close()
context, err := model.NewContext()
assert.NoError(err)
assert.NotNil(context)
}
func TestIsMultilingual(t *testing.T) {
assert := assert.New(t)
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
defer model.Close()
isMultilingual := model.IsMultilingual()
// This returns false since
// the model 'models/ggml-small.en.bin'
// that is loaded is not multilingual
assert.False(isMultilingual)
}
func TestLanguages(t *testing.T) {
assert := assert.New(t)
model, err := whisper.New(ModelPath)
assert.NoError(err)
assert.NotNil(model)
defer model.Close()
expectedLanguages := []string{
"en", "zh", "de", "es", "ru", "ko", "fr", "ja", "pt", "tr", "pl",
"ca", "nl", "ar", "sv", "it", "id", "hi", "fi", "vi", "he", "uk",
"el", "ms", "cs", "ro", "da", "hu", "ta", "no", "th", "ur", "hr",
"bg", "lt", "la", "mi", "ml", "cy", "sk", "te", "fa", "lv", "bn",
"sr", "az", "sl", "kn", "et", "mk", "br", "eu", "is", "hy", "ne",
"mn", "bs", "kk", "sq", "sw", "gl", "mr", "pa", "si", "km", "sn",
"yo", "so", "af", "oc", "ka", "be", "tg", "sd", "gu", "am", "yi",
"lo", "uz", "fo", "ht", "ps", "tk", "nn", "mt", "sa", "lb", "my",
"bo", "tl", "mg", "as", "tt", "haw", "ln", "ha", "ba", "jw", "su",
}
actualLanguages := model.Languages()
assert.Equal(expectedLanguages, actualLanguages)
}

@ -0,0 +1,6 @@
package whisper_test
const (
ModelPath = "../../models/ggml-small.en.bin"
SamplePath = "../../samples/jfk.wav"
)

@ -9,7 +9,7 @@ import (
// CGO
/*
#cgo LDFLAGS: -lwhisper -lm -lstdc++
#cgo LDFLAGS: -lwhisper -lggml -lggml-base -lggml-cpu -lm -lstdc++ -fopenmp
#cgo darwin LDFLAGS: -framework Accelerate -framework Metal -framework Foundation -framework CoreGraphics
#include <whisper.h>
#include <stdlib.h>

Submodule bindings/ios deleted from a2085436c2

@ -67,5 +67,5 @@ copy /y ..\..\build\bin\Release\whisper.dll build\generated\resources\main\win32
## License
The license for the Go bindings is the same as the license for the rest of the whisper.cpp project, which is the MIT License. See the `LICENSE` file for more details.
The license for the Java bindings is the same as the license for the rest of the whisper.cpp project, which is the MIT License. See the `LICENSE` file for more details.

@ -25,13 +25,13 @@ sourceSets {
}
tasks.register('copyLibwhisperDynlib', Copy) {
from '../../build'
include 'libwhisper.dynlib'
from '../../build/src'
include 'libwhisper.dylib'
into 'build/generated/resources/main/darwin'
}
tasks.register('copyLibwhisperSo', Copy) {
from '../../build'
from '../../build/src'
include 'libwhisper.so'
into 'build/generated/resources/main/linux-x86-64'
}
@ -55,7 +55,12 @@ java {
withJavadocJar()
}
sourcesJar() {
dependsOn copyLibs
}
jar {
dependsOn copyLibs
exclude '**/whisper_java.exp', '**/whisper_java.lib'
}
@ -67,6 +72,9 @@ tasks.withType(Test) {
useJUnitPlatform()
}
test.dependsOn copyLibs
processResources.dependsOn copyLibs
dependencies {
implementation "net.java.dev.jna:jna:5.13.0"
testImplementation "org.junit.jupiter:junit-jupiter:5.9.2"

bindings/java/gradlew vendored Normal file → Executable file

@ -0,0 +1,24 @@
package io.github.ggerganov.whispercpp;
/**
* Presets for alignment heads in DTW token timestamps
*/
public class WhisperConstants {
// Alignment heads presets
public static final int WHISPER_AHEADS_NONE = 0;
public static final int WHISPER_AHEADS_TINY_EN = 1;
public static final int WHISPER_AHEADS_TINY = 2;
public static final int WHISPER_AHEADS_BASE_EN = 3;
public static final int WHISPER_AHEADS_BASE = 4;
public static final int WHISPER_AHEADS_SMALL_EN = 5;
public static final int WHISPER_AHEADS_SMALL = 6;
public static final int WHISPER_AHEADS_MEDIUM_EN = 7;
public static final int WHISPER_AHEADS_MEDIUM = 8;
public static final int WHISPER_AHEADS_LARGE_V1 = 9;
public static final int WHISPER_AHEADS_LARGE_V2 = 10;
public static final int WHISPER_AHEADS_LARGE_V3 = 11;
public static final int WHISPER_AHEADS_LARGE_V3_TURBO = 12;
public static final int WHISPER_AHEADS_CUSTOM = 13;
public static final int WHISPER_AHEADS_N_TOP_MOST = 14;
public static final int WHISPER_AHEADS_COUNT = 15;
}

@ -1,7 +1,9 @@
package io.github.ggerganov.whispercpp;
import com.sun.jna.NativeLong;
import com.sun.jna.Structure;
import com.sun.jna.ptr.PointerByReference;
import com.sun.jna.Pointer;
import io.github.ggerganov.whispercpp.ggml.GgmlType;
import io.github.ggerganov.whispercpp.WhisperModel;
import io.github.ggerganov.whispercpp.params.WhisperContextParams;
@ -9,33 +11,26 @@ import io.github.ggerganov.whispercpp.params.WhisperContextParams;
import java.util.List;
public class WhisperContext extends Structure {
int t_load_us = 0;
int t_start_us = 0;
public NativeLong t_load_us;
public NativeLong t_start_us;
/** weight type (FP32 / FP16 / QX) */
GgmlType wtype = GgmlType.GGML_TYPE_F16;
public GgmlType wtype = GgmlType.GGML_TYPE_F16;
/** intermediate type (FP32 or FP16) */
GgmlType itype = GgmlType.GGML_TYPE_F16;
public GgmlType itype = GgmlType.GGML_TYPE_F16;
// WhisperModel model;
public PointerByReference model;
// whisper_vocab vocab;
// whisper_state * state = nullptr;
public PointerByReference vocab;
public PointerByReference state;
public WhisperContextParams.ByValue params;
public Pointer model;
public Pointer vocab;
public Pointer state;
/** populated by whisper_init_from_file_with_params() */
String path_model;
WhisperContextParams params;
public Pointer path_model;
// public static class ByReference extends WhisperContext implements Structure.ByReference {
// }
//
// public static class ByValue extends WhisperContext implements Structure.ByValue {
// }
//
// @Override
// protected List<String> getFieldOrder() {
// return List.of("t_load_us", "t_start_us", "wtype", "itype", "model", "vocab", "state", "path_model");
// }
@Override
protected List<String> getFieldOrder() {
return List.of("t_load_us", "t_start_us", "wtype", "itype",
"params", "model", "vocab", "state", "path_model");
}
}

@ -43,11 +43,11 @@ public class WhisperCpp implements AutoCloseable {
* @param modelPath - absolute path, or just the name (eg: "base", "base-en" or "base.en")
* @param params - params to use when initialising the context
*/
public void initContext(String modelPath, WhisperContextParams params) throws FileNotFoundException {
public void initContext(String modelPath, WhisperContextParams.ByValue params) throws FileNotFoundException {
initContextImpl(modelPath, params);
}
private void initContextImpl(String modelPath, WhisperContextParams params) throws FileNotFoundException {
private void initContextImpl(String modelPath, WhisperContextParams.ByValue params) throws FileNotFoundException {
if (ctx != null) {
lib.whisper_free(ctx);
}
@ -69,15 +69,13 @@ public class WhisperCpp implements AutoCloseable {
/**
* Provides default params which can be used with `whisper_init_from_file_with_params()` etc.
* Because this function allocates memory for the params, the caller must call either:
* - call `whisper_free_context_params()`
* - `Native.free(Pointer.nativeValue(pointer));`
* Returns a ByValue instance to ensure proper parameter passing to native code.
*/
public WhisperContextParams getContextDefaultParams() {
paramsPointer = lib.whisper_context_default_params_by_ref();
WhisperContextParams params = new WhisperContextParams(paramsPointer);
params.read();
return params;
public WhisperContextParams.ByValue getContextDefaultParams() {
WhisperContextParams.ByValue valueParams = new WhisperContextParams.ByValue(
lib.whisper_context_default_params_by_ref());
valueParams.read();
return valueParams;
}
/**
@ -88,7 +86,7 @@ public class WhisperCpp implements AutoCloseable {
*
* @param strategy - GREEDY
*/
public WhisperFullParams getFullDefaultParams(WhisperSamplingStrategy strategy) {
public WhisperFullParams.ByValue getFullDefaultParams(WhisperSamplingStrategy strategy) {
Pointer pointer;
// whisper_full_default_params_by_ref allocates memory which we need to delete, so only create max 1 pointer for each strategy.
@ -104,7 +102,7 @@ public class WhisperCpp implements AutoCloseable {
pointer = beamParamsPointer;
}
WhisperFullParams params = new WhisperFullParams(pointer);
WhisperFullParams.ByValue params = new WhisperFullParams.ByValue(pointer);
params.read();
return params;
}
@ -138,15 +136,21 @@ public class WhisperCpp implements AutoCloseable {
}
/**
* Run the entire model: PCM -> log mel spectrogram -> encoder -> decoder -> text.
* Run the entire model: PCM -&gt; log mel spectrogram -&gt; encoder -&gt; decoder -&gt; text.
* Not thread safe for same context
* Uses the specified decoding strategy to obtain the text.
*/
public String fullTranscribe(WhisperFullParams whisperParams, float[] audioData) throws IOException {
public String fullTranscribe(WhisperFullParams.ByValue whisperParams, float[] audioData) throws IOException {
if (ctx == null) {
throw new IllegalStateException("Model not initialised");
}
/*
WhisperFullParams.ByValue valueParams = new WhisperFullParams.ByValue(
lib.whisper_full_default_params_by_ref(WhisperSamplingStrategy.WHISPER_SAMPLING_BEAM_SEARCH.ordinal()));
valueParams.read();
*/
if (lib.whisper_full(ctx, whisperParams, audioData, audioData.length) != 0) {
throw new IOException("Failed to process audio");
}
@ -163,12 +167,17 @@ public class WhisperCpp implements AutoCloseable {
return str.toString().trim();
}
public List<WhisperSegment> fullTranscribeWithTime(WhisperFullParams whisperParams, float[] audioData) throws IOException {
if (ctx == null) {
throw new IllegalStateException("Model not initialised");
}
if (lib.whisper_full(ctx, whisperParams, audioData, audioData.length) != 0) {
WhisperFullParams.ByValue valueParams = new WhisperFullParams.ByValue(
lib.whisper_full_default_params_by_ref(WhisperSamplingStrategy.WHISPER_SAMPLING_BEAM_SEARCH.ordinal()));
valueParams.read();
if (lib.whisper_full(ctx, valueParams, audioData, audioData.length) != 0) {
throw new IOException("Failed to process audio");
}

@ -20,7 +20,7 @@ public interface WhisperCppJnaLibrary extends Library {
* @return Whisper context on success, null on failure
*/
Pointer whisper_init_from_file(String path_model);
/**
* Provides default params which can be used with `whisper_init_from_file_with_params()` etc.
* Because this function allocates memory for the params, the caller must call either:
@ -38,7 +38,7 @@ public interface WhisperCppJnaLibrary extends Library {
* @param params Pointer to whisper_context_params
* @return Whisper context on success, null on failure
*/
Pointer whisper_init_from_file_with_params(String path_model, WhisperContextParams params);
Pointer whisper_init_from_file_with_params(String path_model, WhisperContextParams.ByValue params);
/**
* Allocate (almost) all memory needed for the model by loading from a buffer.
@ -180,12 +180,12 @@ public interface WhisperCppJnaLibrary extends Library {
/**
* @return the id of the specified language, returns -1 if not found.
* Examples:
* "de" -> 2
* "german" -> 2
* "de" -&gt; 2
* "german" -&gt; 2
*/
int whisper_lang_id(String lang);
/** @return the short string of the specified language id (e.g. 2 -> "de"), returns nullptr if not found */
/** @return the short string of the specified language id (e.g. 2 -&gt; "de"), returns nullptr if not found */
String whisper_lang_str(int id);
/**
@ -268,20 +268,21 @@ public interface WhisperCppJnaLibrary extends Library {
void whisper_free_params(Pointer params);
/**
* Run the entire model: PCM -> log mel spectrogram -> encoder -> decoder -> text
* Run the entire model: PCM -&gt; log mel spectrogram -&gt; encoder -&gt; decoder -&gt; text
* Not thread safe for same context
* Uses the specified decoding strategy to obtain the text.
*/
int whisper_full(Pointer ctx, WhisperFullParams params, final float[] samples, int n_samples);
int whisper_full(Pointer ctx, WhisperFullParams.ByValue params, final float[] samples, int n_samples);
int whisper_full_with_state(Pointer ctx, Pointer state, WhisperFullParams params, final float[] samples, int n_samples);
public int whisper_full_with_state(Pointer ctx, Pointer state, WhisperFullParams.ByValue params, float[] samples, int n_samples);
//int whisper_full_with_state(Pointer ctx, Pointer state, WhisperFullParams params, final float[] samples, int n_samples);
// Split the input audio in chunks and process each chunk separately using whisper_full_with_state()
// Result is stored in the default state of the context
// Not thread safe if executed in parallel on the same context.
// It seems this approach can offer some speedup in some cases.
// However, the transcription accuracy can be worse at the beginning and end of each chunk.
int whisper_full_parallel(Pointer ctx, WhisperFullParams params, final float[] samples, int n_samples, int n_processors);
int whisper_full_parallel(Pointer ctx, WhisperFullParams.ByValue params, final float[] samples, int n_samples, int n_processors);
/**
* Number of generated text segments.
@ -304,14 +305,6 @@ public interface WhisperCppJnaLibrary extends Library {
/** Language id associated with the provided state */
int whisper_full_lang_id_from_state(Pointer state);
/**
* Convert RAW PCM audio to log mel spectrogram but applies a Phase Vocoder to speed up the audio x2.
* The resulting spectrogram is stored inside the default state of the provided whisper context.
* @return 0 on success
*/
int whisper_pcm_to_mel_phase_vocoder(Pointer ctx, final float[] samples, int n_samples, int n_threads);
int whisper_pcm_to_mel_phase_vocoder_with_state(Pointer ctx, Pointer state, final float[] samples, int n_samples, int n_threads);
/** Get the start time of the specified segment. */
long whisper_full_get_segment_t0(Pointer ctx, int i_segment);

@ -0,0 +1,17 @@
package io.github.ggerganov.whispercpp.callbacks;
import com.sun.jna.Callback;
/**
* Callback for aborting GGML computation
* Maps to the C typedef: bool (*ggml_abort_callback)(void * data)
*/
public interface GgmlAbortCallback extends Callback {
/**
* Return true to abort the computation, false to continue
*
* @param data User data passed to the callback
* @return true to abort, false to continue
*/
boolean invoke(com.sun.jna.Pointer data);
}

@ -0,0 +1,30 @@
package io.github.ggerganov.whispercpp.params;
import com.sun.jna.*;
import java.util.Arrays;
import java.util.List;
public class WhisperAhead extends Structure {
public int n_text_layer;
public int n_head;
public WhisperAhead() {
super();
}
public WhisperAhead(int textLayer, int head) {
super();
this.n_text_layer = textLayer;
this.n_head = head;
}
@Override
protected List<String> getFieldOrder() {
return Arrays.asList("n_text_layer", "n_head");
}
public static class ByReference extends WhisperAhead implements Structure.ByReference {}
public static class ByValue extends WhisperAhead implements Structure.ByValue {}
}

@ -0,0 +1,41 @@
package io.github.ggerganov.whispercpp.params;
import com.sun.jna.*;
import java.util.Arrays;
import java.util.List;
public class WhisperAheads extends Structure {
public NativeLong n_heads;
public Pointer heads;
public WhisperAheads() {
super();
}
/**
* Create alignment heads from an array of WhisperAhead objects
*/
public void setHeads(WhisperAhead[] aheadsArray) {
this.n_heads = new NativeLong(aheadsArray.length);
int structSize = aheadsArray[0].size();
Memory mem = new Memory(structSize * aheadsArray.length);
for (int i = 0; i < aheadsArray.length; i++) {
aheadsArray[i].write();
byte[] buffer = aheadsArray[i].getPointer().getByteArray(0, structSize);
mem.write(i * structSize, buffer, 0, buffer.length);
}
this.heads = mem;
}
@Override
protected List<String> getFieldOrder() {
return Arrays.asList("n_heads", "heads");
}
public static class ByReference extends WhisperAheads implements Structure.ByReference {}
public static class ByValue extends WhisperAheads implements Structure.ByValue {}
}

@ -1,7 +1,5 @@
package io.github.ggerganov.whispercpp.params;
import com.sun.jna.*;
import java.util.Arrays;
import java.util.List;
@ -11,21 +9,73 @@ import java.util.List;
* whisper_context_default_params()
*/
public class WhisperContextParams extends Structure {
public WhisperContextParams(Pointer p) {
super(p);
}
/** Use GPU for inference Number (default = true) */
public WhisperContextParams() {
super();
}
/** Use GPU for inference (default = true) */
public CBool use_gpu;
/** Use GPU for inference Number (default = true) */
/** Use flash attention (default = false) */
public CBool flash_attn;
/** CUDA device to use (default = 0) */
public int gpu_device;
/** [EXPERIMENTAL] Enable token-level timestamps with DTW (default = false) */
public CBool dtw_token_timestamps;
/** [EXPERIMENTAL] Alignment heads preset for DTW */
public int dtw_aheads_preset;
/** Number of top layers to use for DTW when using WHISPER_AHEADS_N_TOP_MOST preset */
public int dtw_n_top;
/** [EXPERIMENTAL] Custom alignment heads for DTW (used with the WHISPER_AHEADS_CUSTOM preset) */
public WhisperAheads.ByValue dtw_aheads;
/** DTW memory size (internal use) */
public NativeLong dtw_mem_size;
/** Use GPU for inference */
public void useGpu(boolean enable) {
use_gpu = enable ? CBool.TRUE : CBool.FALSE;
}
/** Use flash attention */
public void useFlashAttn(boolean enable) {
flash_attn = enable ? CBool.TRUE : CBool.FALSE;
}
/** Enable DTW token-level timestamps */
public void enableDtwTokenTimestamps(boolean enable) {
dtw_token_timestamps = enable ? CBool.TRUE : CBool.FALSE;
}
/** Set DTW alignment heads preset */
public void setDtwAheadsPreset(int preset) {
dtw_aheads_preset = preset;
}
@Override
protected List<String> getFieldOrder() {
return Arrays.asList("use_gpu");
return Arrays.asList(
"use_gpu",
"flash_attn",
"gpu_device",
"dtw_token_timestamps",
"dtw_aheads_preset",
"dtw_n_top",
"dtw_aheads",
"dtw_mem_size"
);
}
public static class ByValue extends WhisperContextParams implements Structure.ByValue {
public ByValue() { super(); }
public ByValue(Pointer p) { super(p); }
}
}

@ -5,6 +5,7 @@ import io.github.ggerganov.whispercpp.callbacks.WhisperEncoderBeginCallback;
import io.github.ggerganov.whispercpp.callbacks.WhisperLogitsFilterCallback;
import io.github.ggerganov.whispercpp.callbacks.WhisperNewSegmentCallback;
import io.github.ggerganov.whispercpp.callbacks.WhisperProgressCallback;
import io.github.ggerganov.whispercpp.callbacks.GgmlAbortCallback;
import java.util.Arrays;
import java.util.List;
@ -16,10 +17,12 @@ import java.util.List;
*/
public class WhisperFullParams extends Structure {
public WhisperFullParams() {
super();
}
public WhisperFullParams(Pointer p) {
super(p);
// super(p, ALIGN_MSVC);
// super(p, ALIGN_GNUC);
}
/** Sampling strategy for whisper_full() function. */
@ -69,10 +72,10 @@ public class WhisperFullParams extends Structure {
single_segment = single ? CBool.TRUE : CBool.FALSE;
}
/** Flag to print special tokens (e.g., &lt;SOT>, &lt;EOT>, &lt;BEG>, etc.). (default = false) */
/** Flag to print special tokens (e.g., &lt;SOT&gt;, &lt;EOT&gt;, &lt;BEG&gt;, etc.). (default = false) */
public CBool print_special;
/** Flag to print special tokens (e.g., &lt;SOT>, &lt;EOT>, &lt;BEG>, etc.). (default = false) */
/** Flag to print special tokens (e.g., &lt;SOT&gt;, &lt;EOT&gt;, &lt;BEG&gt;, etc.). (default = false) */
public void printSpecial(boolean enable) {
print_special = enable ? CBool.TRUE : CBool.FALSE;
}
@ -129,12 +132,12 @@ public class WhisperFullParams extends Structure {
/** Maximum tokens per segment (0, default = no limit) */
public int max_tokens;
/** Flag to speed up the audio by 2x using Phase Vocoder. (default = false) */
public CBool speed_up;
/** [EXPERIMENTAL] Enable debug mode for extra info */
public CBool debug_mode;
/** Flag to speed up the audio by 2x using Phase Vocoder. (default = false) */
public void speedUp(boolean enable) {
speed_up = enable ? CBool.TRUE : CBool.FALSE;
/** Enable debug mode */
public void enableDebugMode(boolean enable) {
debug_mode = enable ? CBool.TRUE : CBool.FALSE;
}
/** Overwrite the audio context size (0 = use default). */
@ -189,11 +192,11 @@ public class WhisperFullParams extends Structure {
}
/** Flag to suppress non-speech tokens. */
public CBool suppress_non_speech_tokens;
public CBool suppress_nst;
/** Flag to suppress non-speech tokens. */
public void suppressNonSpeechTokens(boolean enable) {
suppress_non_speech_tokens = enable ? CBool.TRUE : CBool.FALSE;
suppress_nst = enable ? CBool.TRUE : CBool.FALSE;
}
/** Initial decoding temperature. */
@ -282,6 +285,16 @@ public class WhisperFullParams extends Structure {
*/
public Pointer encoder_begin_callback_user_data;
/** Callback used to abort GGML computation */
public Pointer abort_callback;
/** User data for the abort_callback */
public Pointer abort_callback_user_data;
public void setAbortCallback(GgmlAbortCallback callback) {
abort_callback = CallbackReference.getFunctionPointer(callback);
}
/**
* Callback by each decoder to filter obtained logits.
* WhisperLogitsFilterCallback
@ -318,17 +331,28 @@ public class WhisperFullParams extends Structure {
@Override
protected List<String> getFieldOrder() {
return Arrays.asList("strategy", "n_threads", "n_max_text_ctx", "offset_ms", "duration_ms", "translate",
"no_context", "single_segment", "no_timestamps",
"print_special", "print_progress", "print_realtime", "print_timestamps", "token_timestamps",
"thold_pt", "thold_ptsum", "max_len", "split_on_word", "max_tokens", "speed_up", "audio_ctx",
"tdrz_enable", "suppress_regex", "initial_prompt", "prompt_tokens", "prompt_n_tokens", "language", "detect_language",
"suppress_blank", "suppress_non_speech_tokens", "temperature", "max_initial_ts", "length_penalty",
"temperature_inc", "entropy_thold", "logprob_thold", "no_speech_thold", "greedy", "beam_search",
"new_segment_callback", "new_segment_callback_user_data",
return Arrays.asList("strategy", "n_threads", "n_max_text_ctx",
"offset_ms", "duration_ms", "translate", "no_context",
"no_timestamps", "single_segment", "print_special",
"print_progress", "print_realtime", "print_timestamps",
"token_timestamps", "thold_pt", "thold_ptsum", "max_len",
"split_on_word", "max_tokens", "debug_mode", "audio_ctx",
"tdrz_enable", "suppress_regex", "initial_prompt",
"prompt_tokens", "prompt_n_tokens", "language", "detect_language",
"suppress_blank", "suppress_nst", "temperature",
"max_initial_ts", "length_penalty", "temperature_inc",
"entropy_thold", "logprob_thold", "no_speech_thold", "greedy",
"beam_search", "new_segment_callback", "new_segment_callback_user_data",
"progress_callback", "progress_callback_user_data",
"encoder_begin_callback", "encoder_begin_callback_user_data",
"abort_callback", "abort_callback_user_data",
"logits_filter_callback", "logits_filter_callback_user_data",
"grammar_rules", "n_grammar_rules", "i_start_rule", "grammar_penalty");
}
public static class ByValue extends WhisperFullParams implements Structure.ByValue {
public ByValue() { super(); }
public ByValue(Pointer p) { super(p); }
}
}

@ -76,7 +76,7 @@ class WhisperCppTest {
float[] floats = new float[b.length / 2];
//WhisperFullParams params = whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_GREEDY);
WhisperFullParams params = whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_BEAM_SEARCH);
WhisperFullParams.ByValue params = whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_BEAM_SEARCH);
params.setProgressCallback((ctx, state, progress, user_data) -> System.out.println("progress: " + progress));
params.print_progress = CBool.FALSE;
//params.initial_prompt = "and so my fellow Americans um, like";

@ -33,6 +33,9 @@ mkdir build-em && cd build-em
emcmake cmake .. && make -j
# run test
node ../tests/test-whisper.js
# For Node.js versions prior to v16.4.0, experimental features need to be enabled:
node --experimental-wasm-threads --experimental-wasm-simd ../tests/test-whisper.js
# publish npm package

@ -1,6 +1,6 @@
{
"name": "whisper.cpp",
"version": "1.6.2",
"version": "1.7.4",
"description": "Whisper speech recognition",
"main": "whisper.js",
"scripts": {

bindings/ruby/.gitignore

@ -0,0 +1,3 @@
LICENSE
pkg/
lib/whisper.*

bindings/ruby/README.md

@ -0,0 +1,245 @@
whispercpp
==========
![whisper.cpp](https://user-images.githubusercontent.com/1991296/235238348-05d0f6a4-da44-4900-a1de-d0707e75b763.jpeg)
Ruby bindings for [whisper.cpp][], an interface to an automatic speech recognition model.
Installation
------------
Install the gem and add it to the application's Gemfile by running:
$ bundle add whispercpp
If bundler is not being used to manage dependencies, install the gem by executing:
$ gem install whispercpp
Usage
-----
```ruby
require "whisper"
whisper = Whisper::Context.new("base")
params = Whisper::Params.new(
language: "en",
offset: 10_000,
duration: 60_000,
max_text_tokens: 300,
translate: true,
print_timestamps: false,
initial_prompt: "Initial prompt here."
)
whisper.transcribe("path/to/audio.wav", params) do |whole_text|
puts whole_text
end
```
### Preparing model ###
Some models are prepared up-front:
```ruby
base_en = Whisper::Model.pre_converted_models["base.en"]
whisper = Whisper::Context.new(base_en)
```
The first time you use a model, it is downloaded automatically; after that, the cached file is used. To clear the cache, call `#clear_cache`:
```ruby
Whisper::Model.pre_converted_models["base"].clear_cache
```
You can also use a shorthand name for pre-converted models:
```ruby
whisper = Whisper::Context.new("base.en")
```
You can list the names of the prepared models with `Whisper::Model.pre_converted_models.keys`:
```ruby
puts Whisper::Model.pre_converted_models.keys
# tiny
# tiny.en
# tiny-q5_1
# tiny.en-q5_1
# tiny-q8_0
# base
# base.en
# base-q5_1
# base.en-q5_1
# base-q8_0
# :
# :
```
You can also use a local model file you have prepared:
```ruby
whisper = Whisper::Context.new("path/to/your/model.bin")
```
Or you can download a model file from a URL:
```ruby
whisper = Whisper::Context.new("https://example.net/uri/of/your/model.bin")
# Or
whisper = Whisper::Context.new(URI("https://example.net/uri/of/your/model.bin"))
```
See the [models][] page for details.
### Preparing audio file ###
Currently, whisper.cpp accepts only 16-bit WAV files.
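Before calling `#transcribe`, you may want to confirm that a file really is 16-bit. The snippet below is a minimal sketch that uses only Ruby's standard library; `wav_info` and `sixteen_bit_wav?` are hypothetical helpers, not part of the whispercpp gem. It walks the RIFF chunks until it finds the `fmt ` chunk and reports the format tag, channel count, sample rate, and bit depth; it does not validate the audio data itself.

```ruby
# Hypothetical helpers (not part of the whispercpp gem): inspect a WAV header
# using only the standard library.

# Reads the RIFF header from +io+ and returns the fields of the "fmt " chunk.
# Raises ArgumentError if the data is not a RIFF/WAVE stream.
def wav_info(io)
  riff, _riff_size, wave = io.read(12).to_s.unpack("a4Va4")
  raise ArgumentError, "not a RIFF/WAVE file" unless riff == "RIFF" && wave == "WAVE"

  loop do
    header = io.read(8)
    raise ArgumentError, "no fmt chunk found" if header.nil? || header.bytesize < 8

    id, size = header.unpack("a4V")
    if id == "fmt "
      format_tag, channels, sample_rate, _byte_rate, _block_align, bits =
        io.read(size).to_s.unpack("vvVVvv")
      return { format_tag: format_tag, channels: channels,
               sample_rate: sample_rate, bits_per_sample: bits }
    end

    # Skip any other chunk (e.g. LIST); chunk bodies are padded to an even size.
    io.seek(size + (size & 1), IO::SEEK_CUR)
  end
end

# True when the WAV file at +path+ declares 16 bits per sample.
def sixteen_bit_wav?(path)
  File.open(path, "rb") { |f| wav_info(f)[:bits_per_sample] == 16 }
end
```

Only the header is inspected, so whisper.cpp may still reject a file that passes this check, for example because of its sample rate or channel layout.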
API
---
### Segments ###
Once `Whisper::Context#transcribe` has been called, you can retrieve segments with `#each_segment`:
```ruby
def format_time(time_ms)
sec, decimal_part = time_ms.divmod(1000)
min, sec = sec.divmod(60)
hour, min = min.divmod(60)
"%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end
whisper
.transcribe("path/to/audio.wav", params)
.each_segment.with_index do |segment, index|
line = "[%{nth}: %{st} --> %{ed}] %{text}" % {
nth: index + 1,
st: format_time(segment.start_time),
ed: format_time(segment.end_time),
text: segment.text
}
line << " (speaker turned)" if segment.speaker_next_turn?
puts line
end
```
You can also add a hook to params that is called for each new segment:
```ruby
# Add hook before calling #transcribe
params.on_new_segment do |segment|
line = "[%{st} --> %{ed}] %{text}" % {
st: format_time(segment.start_time),
ed: format_time(segment.end_time),
text: segment.text
}
line << " (speaker turned)" if segment.speaker_next_turn?
puts line
end
whisper.transcribe("path/to/audio.wav", params)
```
### Models ###
You can inspect model information:
```ruby
whisper = Whisper::Context.new("base")
model = whisper.model
model.n_vocab # => 51864
model.n_audio_ctx # => 1500
model.n_audio_state # => 512
model.n_audio_head # => 8
model.n_audio_layer # => 6
model.n_text_ctx # => 448
model.n_text_state # => 512
model.n_text_head # => 8
model.n_text_layer # => 6
model.n_mels # => 80
model.ftype # => 1
model.type # => "base"
```
### Logging ###
You can set a log callback:
```ruby
prefix = "[MyApp] "
log_callback = ->(level, buffer, user_data) {
case level
when Whisper::LOG_LEVEL_NONE
puts "#{user_data}none: #{buffer}"
when Whisper::LOG_LEVEL_INFO
puts "#{user_data}info: #{buffer}"
when Whisper::LOG_LEVEL_WARN
puts "#{user_data}warn: #{buffer}"
when Whisper::LOG_LEVEL_ERROR
puts "#{user_data}error: #{buffer}"
when Whisper::LOG_LEVEL_DEBUG
puts "#{user_data}debug: #{buffer}"
when Whisper::LOG_LEVEL_CONT
puts "#{user_data}same as previous: #{buffer}"
end
}
Whisper.log_set log_callback, prefix
```
You can also use this feature to suppress logging:
```ruby
Whisper.log_set ->(level, buffer, user_data) {
# do nothing
}, nil
Whisper::Context.new("base")
```
### Low-level API to transcribe ###
You can also call `Whisper::Context#full` and `#full_parallel` with a Ruby array of samples. `#transcribe` with an audio file path is recommended because it extracts the PCM samples in C++ and is fast, but `#full` and `#full_parallel` give you more flexibility.
```ruby
require "whisper"
require "wavefile"
reader = WaveFile::Reader.new("path/to/audio.wav", WaveFile::Format.new(:mono, :float, 16000))
samples = reader.enum_for(:each_buffer).map(&:samples).flatten
whisper = Whisper::Context.new("base")
whisper
.full(Whisper::Params.new, samples)
.each_segment do |segment|
puts segment.text
end
```
The second argument, `samples`, may be an array, an object with `length` and `each` methods, or a MemoryView. If you can prepare audio data as a C array and export it as a MemoryView, whispercpp accepts it and works with it without copying.
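As a sketch of the "object with `length` and `each`" option, the hypothetical class below (not part of the gem) wraps an array of signed 16-bit integers and yields each sample as a Float in the range [-1.0, 1.0) only when `each` is called, so no second, converted array has to be built. It assumes that `#full` needs nothing beyond `length` and `each`, and that floats in this range match what the `WaveFile` example above produces.

```ruby
# Hypothetical samples source (not part of the whispercpp gem): exposes the
# `length` and `each` methods, converting signed 16-bit integer PCM to Float
# on the fly.
class Int16Samples
  SCALE = 32768.0 # 2**15: maps -32768..32767 onto -1.0...1.0

  def initialize(int16_samples)
    @int16_samples = int16_samples
  end

  # Number of samples in the underlying buffer.
  def length
    @int16_samples.length
  end

  # Yields each sample as a Float; returns an Enumerator when no block is given.
  def each
    return enum_for(:each) unless block_given?

    @int16_samples.each { |sample| yield sample / SCALE }
    self
  end
end
```

Usage would then look like `whisper.full(Whisper::Params.new, Int16Samples.new(pcm))`, where `pcm` is assumed to be an array of 16 kHz mono 16-bit integer samples.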
Development
-----------
% git clone https://github.com/ggerganov/whisper.cpp.git
% cd whisper.cpp/bindings/ruby
% rake test
The first run of `rake test` builds the extension and downloads a model for testing. After that, you can add tests in the `tests` directory and modify `ext/ruby_whisper.cpp`.
If something goes wrong during the build, running `rake clean` often fixes it.
License
-------
The same as [whisper.cpp][].
[whisper.cpp]: https://github.com/ggerganov/whisper.cpp
[models]: https://github.com/ggerganov/whisper.cpp/tree/master/models

@ -1,12 +1,66 @@
require 'rake/clean'
require 'rubygems/package'
require "bundler/gem_tasks"
require "rake/testtask"
require_relative "extsources"
desc 'Build gem'
task :package do
spec_source = File.read File.join(File.dirname(__FILE__),'whispercpp.gemspec')
spec = nil
# see: http://gist.github.com/16215
Thread.new { spec = eval("#{spec_source}") }.join
spec.validate
Gem::Package.build(spec)
SOURCES = FileList[]
EXTSOURCES.each do |src|
basename = src.pathmap("%f")
dest = basename == "LICENSE" ? basename : src.pathmap("%{../..,ext}p")
dir = dest.pathmap("%d")
file src
directory dir
file dest => [src, dir] do |t|
cp t.source, t.name
end
SOURCES.include dest
end
CLEAN.include SOURCES
CLEAN.include FileList["ext/**/*.o", "ext/**/*.metal", "ext/**/*.tmp", "ext/whisper.{so,bundle,dll}"]
SRC = FileList["ext/*.{c,cpp,h}"]
task build: SOURCES
directory "pkg"
CLOBBER.include "pkg"
LIB_NAME = "whisper".ext(RbConfig::CONFIG["DLEXT"])
SO_FILE = File.join("ext", LIB_NAME)
LIB_FILE = File.join("lib", LIB_NAME)
file "ext/Makefile" => SRC + ["ext/extconf.rb"] + SOURCES do |t|
chdir "ext" do
ruby "extconf.rb"
end
end
file SO_FILE => "ext/Makefile" do |t|
chdir "ext" do
sh "make"
end
end
CLEAN.include SO_FILE
directory "lib"
file LIB_FILE => [SO_FILE, "lib"] do |t|
copy t.source, t.name
end
CLEAN.include LIB_FILE
Rake::TestTask.new do |t|
t.test_files = FileList["tests/test_*.rb"]
end
TEST_MEMORY_VIEW = "tests/jfk_reader/jfk_reader.#{RbConfig::CONFIG['DLEXT']}"
file TEST_MEMORY_VIEW => "tests/jfk_reader/jfk_reader.c" do |t|
chdir "tests/jfk_reader" do
ruby "extconf.rb"
sh "make"
end
end
CLEAN.include "tests/jfk_reader/jfk_reader.{o,#{RbConfig::CONFIG['DLEXT']}}"
task test: [LIB_FILE, TEST_MEMORY_VIEW]

@ -1,9 +1,11 @@
Makefile
ggml.c
ggml.h
ggml-alloc.c
ggml-alloc.h
whisper.so
whisper.bundle
whisper.cpp
whisper.h
dr_wav.h
whisper.dll
scripts/get-flags.mk
*.o
/*/**/*.c
/*/**/*.cpp
/*/**/*.h
/*/**/*.m
/*/**/*.metal

bindings/ruby/ext/cpu.mk

@ -0,0 +1,13 @@
ggml/src/ggml-cpu/ggml-cpu-cpp.o: \
ggml/src/ggml-cpu/ggml-cpu.cpp \
ggml/src/ggml-cpu/unary-ops.cpp \
ggml/src/ggml-cpu/binary-ops.cpp \
ggml/src/ggml-cpu/vec.cpp \
ggml/src/ggml-cpu/ops.cpp \
ggml/include/ggml-backend.h \
ggml/include/ggml.h \
ggml/include/ggml-alloc.h \
ggml/src/ggml-backend-impl.h \
ggml/include/ggml-cpu.h \
ggml/src/ggml-impl.h
$(CXX) $(CXXFLAGS) -c $< -o $@

@ -1,22 +1,10 @@
require 'mkmf'
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','whisper.cpp')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','whisper.h')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml.h')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml.c')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml-impl.h')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml-alloc.h')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml-alloc.c')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml-backend-impl.h')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml-backend.h')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml-backend.c')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml-common.h')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml-quants.h')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','ggml-quants.c')} .")
system("cp #{File.join(File.dirname(__FILE__),'..','..','..','examples','dr_wav.h')} .")
# need to use c++ compiler flags
$CXXFLAGS << ' -std=c++11'
$CXXFLAGS << ' -std=c++17'
$LDFLAGS << ' -lstdc++'
# Set to true when building binary gems
if enable_config('static-stdlib', false)
$LDFLAGS << ' -static-libgcc -static-libstdc++'
@ -27,4 +15,198 @@ if enable_config('march-tune-native', false)
$CXXFLAGS << ' -march=native -mtune=native'
end
if ENV['WHISPER_METAL']
$GGML_METAL ||= true
$DEPRECATE_WARNING ||= true
end
$UNAME_S = `uname -s`.chomp
$UNAME_P = `uname -p`.chomp
$UNAME_M = `uname -m`.chomp
if $UNAME_S == 'Darwin'
unless ENV['GGML_NO_METAL']
$GGML_METAL ||= true
end
$GGML_NO_OPENMP ||= true
end
if $GGML_METAL
$GGML_METAL_EMBED_LIBRARY = true
end
$MK_CPPFLAGS = '-Iggml/include -Iggml/src -Iggml/src/ggml-cpu -Iinclude -Isrc -Iexamples -DGGML_USE_CPU'
$MK_CFLAGS = '-std=c11 -fPIC'
$MK_CXXFLAGS = '-std=c++17 -fPIC'
$MK_NVCCFLAGS = '-std=c++17'
$MK_LDFLAGS = ''
$OBJ_GGML = []
$OBJ_WHISPER = []
$OBJ_COMMON = []
$OBJ_SDL = []
$MK_CPPFLAGS << ' -D_XOPEN_SOURCE=600'
if $UNAME_S == 'Linux'
$MK_CPPFLAGS << ' -D_GNU_SOURCE'
end
if $UNAME_S == 'Darwin'
$MK_CPPFLAGS << ' -D_DARWIN_C_SOURCE'
end
if ENV['WHISPER_DEBUG']
$MK_CFLAGS << ' -O0 -g'
$MK_CXXFLAGS << ' -O0 -g'
$MK_LDFLAGS << ' -g'
$MK_NVCCFLAGS << ' -O0 -g'
else
$MK_CPPFLAGS << ' -DNDEBUG'
$MK_CFLAGS << ' -O3'
$MK_CXXFLAGS << ' -O3'
$MK_NVCCFLAGS << ' -O3'
end
$WARN_FLAGS =
' -Wall' <<
' -Wextra' <<
' -Wpedantic' <<
' -Wcast-qual' <<
' -Wno-unused-function'
$MK_CFLAGS <<
$WARN_FLAGS <<
' -Wshadow' <<
' -Wstrict-prototypes' <<
' -Wpointer-arith' <<
' -Wmissing-prototypes' <<
' -Werror=implicit-int' <<
' -Werror=implicit-function-declaration'
$MK_CXXFLAGS <<
$WARN_FLAGS <<
' -Wmissing-declarations' <<
' -Wmissing-noreturn'
unless `#{cc_command} #{$LDFLAGS} -Wl,-v 2>&1`.chomp.include? 'dyld-1015.7'
$MK_CPPFLAGS << ' -DHAVE_BUGGY_APPLE_LINKER'
end
if %w[Linux Darwin FreeBSD NetBSD OpenBSD Haiku].include? $UNAME_S
$MK_CFLAGS << ' -pthread'
$MK_CXXFLAGS << ' -pthread'
end
unless $_WIN32
$DSO_EXT = '.so'
else
$DSO_EXT = '.dll'
end
unless ENV['RISCV']
if %w[x86_64 i686 amd64].include? $UNAME_M
$HOST_CXXFLAGS ||= ''
$MK_CFLAGS << ' -march=native -mtune=native'
$HOST_CXXFLAGS << ' -march=native -mtune=native'
end
else
$MK_CFLAGS << ' -march=rv64gcv -mabi=lp64d'
$MK_CXXFLAGS << ' -march=rv64gcv -mabi=lp64d'
end
unless ENV['GGML_NO_ACCELERATE']
if $UNAME_S == 'Darwin'
$MK_CPPFLAGS << ' -DGGML_USE_ACCELERATE -DGGML_USE_BLAS -DGGML_BLAS_USE_ACCELERATE'
$MK_CPPFLAGS << ' -DACCELERATE_NEW_LAPACK'
$MK_CPPFLAGS << ' -DACCELERATE_LAPACK_ILP64'
$MK_LDFLAGS << ' -framework Accelerate'
$OBJ_GGML << 'ggml/src/ggml-blas/ggml-blas.o'
end
end
if ENV['GGML_OPENBLAS']
$MK_CPPFLAGS << " -DGGML_USE_BLAS #{`pkg-config --cflags-only-I openblas`.chomp}"
$MK_CFLAGS << " #{`pkg-config --cflags-only-other openblas`.chomp}"
$MK_LDFLAGS << " #{`pkg-config --libs openblas`.chomp}"
$OBJ_GGML << 'ggml/src/ggml-blas/ggml-blas.o'
end
if ENV['GGML_OPENBLAS64']
$MK_CPPFLAGS << " -DGGML_USE_BLAS #{`pkg-config --cflags-only-I openblas64`.chomp}"
$MK_CFLAGS << " #{`pkg-config --cflags-only-other openblas64`.chomp}"
$MK_LDFLAGS << " #{`pkg-config --libs openblas64`.chomp}"
$OBJ_GGML << 'ggml/src/ggml-blas/ggml-blas.o'
end
if $GGML_METAL
$MK_CPPFLAGS << ' -DGGML_USE_METAL'
$MK_LDFLAGS << ' -framework Foundation -framework Metal -framework MetalKit'
$OBJ_GGML << 'ggml/src/ggml-metal/ggml-metal.o'
if ENV['GGML_METAL_NDEBUG']
$MK_CPPFLAGS << ' -DGGML_METAL_NDEBUG'
end
if $GGML_METAL_EMBED_LIBRARY
$MK_CPPFLAGS << ' -DGGML_METAL_EMBED_LIBRARY'
$OBJ_GGML << 'ggml/src/ggml-metal/ggml-metal-embed.o'
end
end
$OBJ_GGML <<
'ggml/src/ggml.o' <<
'ggml/src/ggml-alloc.o' <<
'ggml/src/ggml-backend.o' <<
'ggml/src/ggml-backend-reg.o' <<
'ggml/src/ggml-opt.o' <<
'ggml/src/ggml-quants.o' <<
'ggml/src/ggml-threading.o' <<
'ggml/src/ggml-cpu/ggml-cpu.o' <<
'ggml/src/ggml-cpu/ggml-cpu-cpp.o' <<
'ggml/src/ggml-cpu/ggml-cpu-aarch64.o' <<
'ggml/src/ggml-cpu/ggml-cpu-hbm.o' <<
'ggml/src/ggml-cpu/ggml-cpu-quants.o' <<
'ggml/src/ggml-cpu/ggml-cpu-traits.o' <<
'ggml/src/ggml-cpu/unary-ops.o' <<
'ggml/src/ggml-cpu/binary-ops.o' <<
'ggml/src/ggml-cpu/vec.o' <<
'ggml/src/ggml-cpu/ops.o'
$OBJ_WHISPER <<
'src/whisper.o' <<
'examples/common.o' <<
'examples/common-whisper.o'
$objs = $OBJ_GGML + $OBJ_WHISPER + $OBJ_COMMON + $OBJ_SDL
$objs <<
"ruby_whisper.o" <<
"ruby_whisper_context.o" <<
"ruby_whisper_transcribe.o" <<
"ruby_whisper_params.o" <<
"ruby_whisper_error.o" <<
"ruby_whisper_segment.o" <<
"ruby_whisper_model.o"
$CPPFLAGS = "#{$MK_CPPFLAGS} #{$CPPFLAGS}"
$CFLAGS = "#{$CPPFLAGS} #{$MK_CFLAGS} #{$GF_CFLAGS} #{$CFLAGS}"
$BASE_CXXFLAGS = "#{$MK_CXXFLAGS} #{$CXXFLAGS}"
$CXXFLAGS = "#{$BASE_CXXFLAGS} #{$HOST_CXXFLAGS} #{$GF_CXXFLAGS} #{$CPPFLAGS}"
$NVCCFLAGS = "#{$MK_NVCCFLAGS} #{$NVCCFLAGS}"
$LDFLAGS = "#{$MK_LDFLAGS} #{$LDFLAGS}"
create_makefile('whisper')
File.open 'Makefile', 'a' do |file|
file.puts 'include scripts/get-flags.mk'
file.puts 'include cpu.mk'
if $GGML_METAL
file.puts 'include metal.mk'
if $GGML_METAL_EMBED_LIBRARY
file.puts 'include metal-embed.mk'
end
end
end

@ -1,141 +0,0 @@
#pragma once
// ggml-backend internal header
#include "ggml-backend.h"
#ifdef __cplusplus
extern "C" {
#endif
//
// Backend buffer
//
// buffer type
typedef void * ggml_backend_buffer_type_context_t;
struct ggml_backend_buffer_type_i {
const char * (*GGML_CALL get_name) (ggml_backend_buffer_type_t buft);
ggml_backend_buffer_t (*GGML_CALL alloc_buffer) (ggml_backend_buffer_type_t buft, size_t size);
size_t (*GGML_CALL get_alignment) (ggml_backend_buffer_type_t buft); // tensor alignment
size_t (*GGML_CALL get_max_size) (ggml_backend_buffer_type_t buft); // allocation max size
size_t (*GGML_CALL get_alloc_size) (ggml_backend_buffer_type_t buft, const struct ggml_tensor * tensor); // data size needed to allocate the tensor, including padding
bool (*GGML_CALL supports_backend)(ggml_backend_buffer_type_t buft, ggml_backend_t backend); // check if the buffer type is usable by the backend
// check if tensor data is in host memory
// should be equivalent to supports_backend(buft, ggml_backend_cpu_init())
bool (*GGML_CALL is_host) (ggml_backend_buffer_type_t buft);
};
struct ggml_backend_buffer_type {
struct ggml_backend_buffer_type_i iface;
ggml_backend_buffer_type_context_t context;
};
// buffer
typedef void * ggml_backend_buffer_context_t;
struct ggml_backend_buffer_i {
const char * (*GGML_CALL get_name) (ggml_backend_buffer_t buffer);
void (*GGML_CALL free_buffer)(ggml_backend_buffer_t buffer);
void * (*GGML_CALL get_base) (ggml_backend_buffer_t buffer);
void (*GGML_CALL init_tensor)(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
void (*GGML_CALL set_tensor) (ggml_backend_buffer_t buffer, struct ggml_tensor * tensor, const void * data, size_t offset, size_t size);
void (*GGML_CALL get_tensor) (ggml_backend_buffer_t buffer, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size);
bool (*GGML_CALL cpy_tensor) (ggml_backend_buffer_t buffer, const struct ggml_tensor * src, struct ggml_tensor * dst); // dst is in the buffer, src may be in any buffer
void (*GGML_CALL clear) (ggml_backend_buffer_t buffer, uint8_t value);
void (*GGML_CALL reset) (ggml_backend_buffer_t buffer); // reset any internal state due to tensor initialization, such as tensor extras
};
struct ggml_backend_buffer {
struct ggml_backend_buffer_i iface;
ggml_backend_buffer_type_t buft;
ggml_backend_buffer_context_t context;
size_t size;
enum ggml_backend_buffer_usage usage;
};
GGML_CALL ggml_backend_buffer_t ggml_backend_buffer_init(
ggml_backend_buffer_type_t buft,
struct ggml_backend_buffer_i iface,
ggml_backend_buffer_context_t context,
size_t size);
// do not use directly, use ggml_backend_tensor_copy instead
bool ggml_backend_buffer_copy_tensor(const struct ggml_tensor * src, struct ggml_tensor * dst);
// buffer that contains a collection of buffers
GGML_CALL ggml_backend_buffer_t ggml_backend_multi_buffer_alloc_buffer(ggml_backend_buffer_t * buffers, size_t n_buffers);
GGML_CALL bool ggml_backend_buffer_is_multi_buffer(ggml_backend_buffer_t buffer);
GGML_CALL void ggml_backend_multi_buffer_set_usage(ggml_backend_buffer_t buffer, enum ggml_backend_buffer_usage usage);
//
// Backend
//
typedef void * ggml_backend_context_t;
struct ggml_backend_i {
const char * (*GGML_CALL get_name)(ggml_backend_t backend);
void (*GGML_CALL free)(ggml_backend_t backend);
// buffer allocation
ggml_backend_buffer_type_t (*GGML_CALL get_default_buffer_type)(ggml_backend_t backend);
// (optional) asynchronous tensor data access
void (*GGML_CALL set_tensor_async)(ggml_backend_t backend, struct ggml_tensor * tensor, const void * data, size_t offset, size_t size);
void (*GGML_CALL get_tensor_async)(ggml_backend_t backend, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size);
bool (*GGML_CALL cpy_tensor_async)(ggml_backend_t backend_src, ggml_backend_t backend_dst, const struct ggml_tensor * src, struct ggml_tensor * dst);
// (optional) complete all pending operations
void (*GGML_CALL synchronize)(ggml_backend_t backend);
// compute graph with a plan (not used currently)
ggml_backend_graph_plan_t (*GGML_CALL graph_plan_create) (ggml_backend_t backend, const struct ggml_cgraph * cgraph);
void (*GGML_CALL graph_plan_free) (ggml_backend_t backend, ggml_backend_graph_plan_t plan);
// compute graph with a plan
enum ggml_status (*GGML_CALL graph_plan_compute)(ggml_backend_t backend, ggml_backend_graph_plan_t plan);
// compute graph without a plan (async)
enum ggml_status (*GGML_CALL graph_compute) (ggml_backend_t backend, struct ggml_cgraph * cgraph);
// check if the backend supports an operation
bool (*GGML_CALL supports_op)(ggml_backend_t backend, const struct ggml_tensor * op);
// check if the backend wants to run an operation, even if the weights are allocated in a CPU buffer
// these should be expensive operations with large batch sizes that may benefit from running on this backend
// even if the weight has to be copied from the CPU temporarily
bool (*GGML_CALL offload_op)(ggml_backend_t backend, const struct ggml_tensor * op);
// (optional) event synchronization
ggml_backend_event_t (*GGML_CALL event_new) (ggml_backend_t backend);
void (*GGML_CALL event_free) (ggml_backend_event_t event);
void (*GGML_CALL event_record) (ggml_backend_event_t event);
void (*GGML_CALL event_wait) (ggml_backend_t backend, ggml_backend_event_t event);
void (*GGML_CALL event_synchronize) (ggml_backend_event_t event);
};
struct ggml_backend {
ggml_guid_t guid;
struct ggml_backend_i iface;
ggml_backend_context_t context;
};
struct ggml_backend_event {
ggml_backend_t backend;
void * context;
};
//
// Backend registry
//
typedef ggml_backend_t (*GGML_CALL ggml_backend_init_fn)(const char * params, void * user_data);
GGML_CALL void ggml_backend_register(const char * name, ggml_backend_init_fn init_fn, ggml_backend_buffer_type_t default_buffer_type, void * user_data);
#ifdef __cplusplus
}
#endif

File diff suppressed because it is too large

@ -1,233 +0,0 @@
#pragma once
#include "ggml.h"
#include "ggml-alloc.h"
#ifdef __cplusplus
extern "C" {
#endif
typedef struct ggml_backend_buffer_type * ggml_backend_buffer_type_t;
typedef struct ggml_backend_buffer * ggml_backend_buffer_t;
typedef struct ggml_backend_event * ggml_backend_event_t;
typedef struct ggml_backend * ggml_backend_t;
typedef void * ggml_backend_graph_plan_t;
//
// Backend buffer
//
// buffer type
GGML_API const char * ggml_backend_buft_name (ggml_backend_buffer_type_t buft);
GGML_API GGML_CALL ggml_backend_buffer_t ggml_backend_buft_alloc_buffer (ggml_backend_buffer_type_t buft, size_t size);
GGML_API size_t ggml_backend_buft_get_alignment (ggml_backend_buffer_type_t buft);
GGML_API size_t ggml_backend_buft_get_max_size (ggml_backend_buffer_type_t buft);
GGML_API GGML_CALL size_t ggml_backend_buft_get_alloc_size (ggml_backend_buffer_type_t buft, struct ggml_tensor * tensor);
GGML_API bool ggml_backend_buft_supports_backend(ggml_backend_buffer_type_t buft, ggml_backend_t backend);
GGML_API bool ggml_backend_buft_is_host (ggml_backend_buffer_type_t buft);
// buffer
enum ggml_backend_buffer_usage {
GGML_BACKEND_BUFFER_USAGE_ANY = 0,
GGML_BACKEND_BUFFER_USAGE_WEIGHTS = 1,
};
GGML_API const char * ggml_backend_buffer_name (ggml_backend_buffer_t buffer);
GGML_API void ggml_backend_buffer_free (ggml_backend_buffer_t buffer);
GGML_API void * ggml_backend_buffer_get_base (ggml_backend_buffer_t buffer);
GGML_API size_t ggml_backend_buffer_get_size (ggml_backend_buffer_t buffer);
GGML_API GGML_CALL void ggml_backend_buffer_init_tensor (ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
GGML_API size_t ggml_backend_buffer_get_alignment (ggml_backend_buffer_t buffer);
GGML_API size_t ggml_backend_buffer_get_max_size (ggml_backend_buffer_t buffer);
GGML_API size_t ggml_backend_buffer_get_alloc_size(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
GGML_API void ggml_backend_buffer_clear (ggml_backend_buffer_t buffer, uint8_t value);
GGML_API bool ggml_backend_buffer_is_host (ggml_backend_buffer_t buffer);
GGML_API void ggml_backend_buffer_set_usage (ggml_backend_buffer_t buffer, enum ggml_backend_buffer_usage usage);
GGML_API ggml_backend_buffer_type_t ggml_backend_buffer_get_type (ggml_backend_buffer_t buffer);
GGML_API void ggml_backend_buffer_reset (ggml_backend_buffer_t buffer);
//
// Backend
//
GGML_API ggml_guid_t ggml_backend_guid(ggml_backend_t backend);
GGML_API const char * ggml_backend_name(ggml_backend_t backend);
GGML_API void ggml_backend_free(ggml_backend_t backend);
GGML_API ggml_backend_buffer_type_t ggml_backend_get_default_buffer_type(ggml_backend_t backend);
GGML_API ggml_backend_buffer_t ggml_backend_alloc_buffer(ggml_backend_t backend, size_t size);
GGML_API size_t ggml_backend_get_alignment(ggml_backend_t backend);
GGML_API size_t ggml_backend_get_max_size(ggml_backend_t backend);
GGML_API void ggml_backend_tensor_set_async(ggml_backend_t backend, struct ggml_tensor * tensor, const void * data, size_t offset, size_t size);
GGML_API void ggml_backend_tensor_get_async(ggml_backend_t backend, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size);
GGML_API GGML_CALL void ggml_backend_tensor_set( struct ggml_tensor * tensor, const void * data, size_t offset, size_t size);
GGML_API GGML_CALL void ggml_backend_tensor_get(const struct ggml_tensor * tensor, void * data, size_t offset, size_t size);
GGML_API void ggml_backend_synchronize(ggml_backend_t backend);
GGML_API ggml_backend_graph_plan_t ggml_backend_graph_plan_create(ggml_backend_t backend, struct ggml_cgraph * cgraph);
GGML_API void ggml_backend_graph_plan_free (ggml_backend_t backend, ggml_backend_graph_plan_t plan);
GGML_API enum ggml_status ggml_backend_graph_plan_compute (ggml_backend_t backend, ggml_backend_graph_plan_t plan);
GGML_API enum ggml_status ggml_backend_graph_compute (ggml_backend_t backend, struct ggml_cgraph * cgraph);
GGML_API enum ggml_status ggml_backend_graph_compute_async(ggml_backend_t backend, struct ggml_cgraph * cgraph);
GGML_API bool ggml_backend_supports_op(ggml_backend_t backend, const struct ggml_tensor * op);
GGML_API bool ggml_backend_offload_op(ggml_backend_t backend, const struct ggml_tensor * op);
// tensor copy between different backends
GGML_API void ggml_backend_tensor_copy(struct ggml_tensor * src, struct ggml_tensor * dst);
// asynchronous copy
// the copy is performed after all the currently queued operations in backend_src
// backend_dst will wait for the copy to complete before performing other operations
// automatic fallback to sync copy if async is not supported
GGML_API void ggml_backend_tensor_copy_async(ggml_backend_t backend_src, ggml_backend_t backend_dst, struct ggml_tensor * src, struct ggml_tensor * dst);
// events
GGML_API ggml_backend_event_t ggml_backend_event_new (ggml_backend_t backend);
GGML_API void ggml_backend_event_free (ggml_backend_event_t event);
GGML_API void ggml_backend_event_record (ggml_backend_event_t event);
GGML_API void ggml_backend_event_synchronize(ggml_backend_event_t event);
GGML_API void ggml_backend_event_wait (ggml_backend_t backend, ggml_backend_event_t event); // wait async on event
//
// CPU backend
//
GGML_API ggml_backend_t ggml_backend_cpu_init(void);
GGML_API GGML_CALL bool ggml_backend_is_cpu (ggml_backend_t backend);
GGML_API void ggml_backend_cpu_set_n_threads (ggml_backend_t backend_cpu, int n_threads);
GGML_API void ggml_backend_cpu_set_abort_callback(ggml_backend_t backend_cpu, ggml_abort_callback abort_callback, void * abort_callback_data);
// Create a backend buffer from an existing pointer
GGML_API GGML_CALL ggml_backend_buffer_t ggml_backend_cpu_buffer_from_ptr(void * ptr, size_t size);
GGML_API GGML_CALL ggml_backend_buffer_type_t ggml_backend_cpu_buffer_type(void);
#ifdef GGML_USE_CPU_HBM
GGML_API ggml_backend_buffer_type_t ggml_backend_cpu_hbm_buffer_type(void);
#endif
//
// Backend registry
//
// The backend registry is a registry of all the available backends, and allows initializing backends in a generic way
GGML_API size_t ggml_backend_reg_get_count(void);
GGML_API size_t ggml_backend_reg_find_by_name(const char * name);
GGML_API ggml_backend_t ggml_backend_reg_init_backend_from_str(const char * backend_str); // str is name[:params]
GGML_API const char * ggml_backend_reg_get_name(size_t i);
GGML_API ggml_backend_t ggml_backend_reg_init_backend(size_t i, const char * params); // params is backend-specific
GGML_API ggml_backend_buffer_type_t ggml_backend_reg_get_default_buffer_type(size_t i);
GGML_API ggml_backend_buffer_t ggml_backend_reg_alloc_buffer(size_t i, size_t size);
//
// Backend scheduler
//
// The backend scheduler allows for multiple backends to be used together
// Handles compute buffer allocation, assignment of tensors to backends, and copying of tensors between backends
// The backends are selected based on:
// - the backend that supports the operation
// - the location of the pre-allocated tensors (e.g. the weights)
/*
Example usage:
// operations that use tensors allocated in a buffer with USAGE_WEIGHTS will be assigned
// preferably to run on the same backend as the buffer
ggml_backend_buffer_set_usage(buf_weights, GGML_BACKEND_BUFFER_USAGE_WEIGHTS);
sched = ggml_backend_sched_new({backend_gpu, backend_gpu2, backend_cpu}, NULL, num_backends, GGML_DEFAULT_GRAPH_SIZE, false);
// initialize buffers from a max size graph (optional)
reserve_graph = build_graph(sched, max_batch_size);
// manually assign nodes to a backend (optional, should not be needed in most cases)
struct ggml_tensor * node = ggml_mul_mat(ctx, ...);
ggml_backend_sched_set_tensor_backend(sched, node, backend_gpu);
ggml_backend_sched_reserve(sched, reserve_graph);
// compute
graph = build_graph(sched);
ggml_backend_sched_graph_compute(sched, graph);
// if there are graph inputs:
ggml_backend_sched_reset(sched);
ggml_backend_sched_alloc_graph(sched, graph);
ggml_backend_tensor_set(input_tensor, ...);
ggml_backend_sched_graph_compute(sched, graph);
*/
struct ggml_backend_sched;
typedef struct ggml_backend_sched * ggml_backend_sched_t;
// when ask == true, the scheduler wants to know if the user wants to observe this node
// this allows the scheduler to batch nodes together in order to evaluate them in a single call
//
// when ask == false, the scheduler is passing the node tensor to the user for observation
// if the user returns false, the scheduler will cancel the graph compute
//
typedef bool (*ggml_backend_sched_eval_callback)(struct ggml_tensor * t, bool ask, void * user_data);
// Initialize a backend scheduler
GGML_API ggml_backend_sched_t ggml_backend_sched_new(ggml_backend_t * backends, ggml_backend_buffer_type_t * bufts, int n_backends, size_t graph_size, bool parallel);
GGML_API void ggml_backend_sched_free(ggml_backend_sched_t sched);
// Initialize backend buffers from a measure graph
GGML_API bool ggml_backend_sched_reserve(ggml_backend_sched_t sched, struct ggml_cgraph * measure_graph);
// Get the number of splits of the last graph
GGML_API int ggml_backend_sched_get_n_splits(ggml_backend_sched_t sched);
GGML_API int ggml_backend_sched_get_n_copies(ggml_backend_sched_t sched);
GGML_API size_t ggml_backend_sched_get_buffer_size(ggml_backend_sched_t sched, ggml_backend_t backend);
GGML_API void ggml_backend_sched_set_tensor_backend(ggml_backend_sched_t sched, struct ggml_tensor * node, ggml_backend_t backend);
GGML_API ggml_backend_t ggml_backend_sched_get_tensor_backend(ggml_backend_sched_t sched, struct ggml_tensor * node);
// Allocate and compute graph on the backend scheduler
GGML_API bool ggml_backend_sched_alloc_graph(ggml_backend_sched_t sched, struct ggml_cgraph * graph);
GGML_API enum ggml_status ggml_backend_sched_graph_compute(ggml_backend_sched_t sched, struct ggml_cgraph * graph);
GGML_API enum ggml_status ggml_backend_sched_graph_compute_async(ggml_backend_sched_t sched, struct ggml_cgraph * graph);
GGML_API void ggml_backend_sched_synchronize(ggml_backend_sched_t sched);
// Reset all assignments and allocators - must be called before changing the node backends
GGML_API void ggml_backend_sched_reset(ggml_backend_sched_t sched);
// Set a callback to be called for each resulting node during graph compute
GGML_API void ggml_backend_sched_set_eval_callback(ggml_backend_sched_t sched, ggml_backend_sched_eval_callback callback, void * user_data);
//
// Utils
//
struct ggml_backend_graph_copy {
    ggml_backend_buffer_t buffer;
    struct ggml_context * ctx_allocated;
    struct ggml_context * ctx_unallocated;
    struct ggml_cgraph * graph;
};
// Copy a graph to a different backend
GGML_API struct ggml_backend_graph_copy ggml_backend_graph_copy(ggml_backend_t backend, struct ggml_cgraph * graph);
GGML_API void ggml_backend_graph_copy_free(struct ggml_backend_graph_copy copy);
typedef bool (*GGML_CALL ggml_backend_eval_callback)(int node_index, struct ggml_tensor * t1, struct ggml_tensor * t2, void * user_data);
// Compare the output of two backends
GGML_API bool ggml_backend_compare_graph_backend(ggml_backend_t backend1, ggml_backend_t backend2, struct ggml_cgraph * graph, ggml_backend_eval_callback callback, void * user_data);
// Tensor initialization
GGML_API void ggml_backend_tensor_alloc(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor, void * addr);
GGML_API void ggml_backend_view_init(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
#ifdef __cplusplus
}
#endif
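`ggml_backend_reg_init_backend_from_str` is documented above as taking a string of the form `name[:params]`. The following is a minimal sketch of how such a string can be split, assuming the split happens at the first `:` and that a missing `:` means empty params. `split_backend_str` is a hypothetical helper, not part of ggml.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not part of ggml): split "name[:params]" at the first ':'.
// The backend name is copied into name_buf (truncated if it does not fit); the
// return value points at the params inside str, or at "" when there are none.
static const char * split_backend_str(const char * str, char * name_buf, size_t name_size) {
    assert(str != NULL && name_buf != NULL && name_size > 0);
    const char * colon = strchr(str, ':');
    if (colon == NULL) {
        snprintf(name_buf, name_size, "%s", str);
        return "";
    }
    // copy only the characters before the ':'
    snprintf(name_buf, name_size, "%.*s", (int)(colon - str), str);
    return colon + 1;
}
```

For example, `"CPU"` yields the name `CPU` with empty params, while `"name:opts"` yields `name` and `opts`.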

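The comment on `ggml_backend_sched_eval_callback` describes a two-phase protocol: with `ask == true` the user picks which nodes to observe, so unobserved nodes can be evaluated together in one call; with `ask == false` an observed node is handed over, and returning `false` cancels the compute. The sketch below imitates that protocol with a hypothetical `mock_tensor` type and a hypothetical `mock_graph_compute` driver; it illustrates the documented contract only and is not the scheduler's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for struct ggml_tensor, for illustration only.
struct mock_tensor {
    const char * name;
    float        value;
};

// Same shape as ggml_backend_sched_eval_callback, but over the mock tensor type.
typedef bool (*mock_eval_callback)(struct mock_tensor * t, bool ask, void * user_data);

// Hypothetical driver: batches nodes nobody wants to observe, evaluates the batch
// up to an observed node, then shows that node to the user. Returns nodes computed.
static int mock_graph_compute(struct mock_tensor * nodes, int n_nodes,
                              mock_eval_callback cb, void * user_data) {
    int computed = 0;
    int batch_start = 0;
    for (int i = 0; i < n_nodes; i++) {
        bool observe = cb != NULL && cb(&nodes[i], true, user_data);
        if (!observe && i + 1 < n_nodes) {
            continue; // not observed: extend the current batch
        }
        // evaluate the whole batch [batch_start, i] in a single "call"
        for (int j = batch_start; j <= i; j++) {
            nodes[j].value += 1.0f; // placeholder for the real computation
            computed++;
        }
        batch_start = i + 1;
        if (observe && !cb(&nodes[i], false, user_data)) {
            break; // user returned false: cancel the remaining nodes
        }
    }
    return computed;
}

// Example user callback: observe only the node named "attn_out", record its
// value, then cancel the rest of the compute.
struct observe_state {
    float seen_value;
    int   n_observed;
};

static bool observe_attn_out(struct mock_tensor * t, bool ask, void * user_data) {
    if (ask) {
        return strcmp(t->name, "attn_out") == 0;
    }
    struct observe_state * st = user_data;
    st->seen_value = t->value;
    st->n_observed++;
    return false;
}
```

Returning `true` in the `ask == false` phase instead would let the driver continue with the next batch.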
@ -1,43 +0,0 @@
#pragma once
#include "ggml.h"
#include "ggml-backend.h"
#ifdef GGML_USE_HIPBLAS
#define GGML_CUDA_NAME "ROCm"
#define GGML_CUBLAS_NAME "hipBLAS"
#else
#define GGML_CUDA_NAME "CUDA"
#define GGML_CUBLAS_NAME "cuBLAS"
#endif
#ifdef __cplusplus
extern "C" {
#endif
#define GGML_CUDA_MAX_DEVICES 16
// backend API
GGML_API GGML_CALL ggml_backend_t ggml_backend_cuda_init(int device);
GGML_API GGML_CALL bool ggml_backend_is_cuda(ggml_backend_t backend);
// device buffer
GGML_API GGML_CALL ggml_backend_buffer_type_t ggml_backend_cuda_buffer_type(int device);
// split tensor buffer that splits matrices by rows across multiple devices
GGML_API GGML_CALL ggml_backend_buffer_type_t ggml_backend_cuda_split_buffer_type(const float * tensor_split);
// pinned host buffer for use with the CPU backend for faster copies between CPU and GPU
GGML_API GGML_CALL ggml_backend_buffer_type_t ggml_backend_cuda_host_buffer_type(void);
GGML_API GGML_CALL int ggml_backend_cuda_get_device_count(void);
GGML_API GGML_CALL void ggml_backend_cuda_get_device_description(int device, char * description, size_t description_size);
GGML_API GGML_CALL void ggml_backend_cuda_get_device_memory(int device, size_t * free, size_t * total);
GGML_API GGML_CALL bool ggml_backend_cuda_register_host_buffer(void * buffer, size_t size);
GGML_API GGML_CALL void ggml_backend_cuda_unregister_host_buffer(void * buffer);
#ifdef __cplusplus
}
#endif
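`ggml_backend_cuda_split_buffer_type` is described above as splitting matrices by rows across multiple devices according to `tensor_split`. As a rough sketch, assuming `tensor_split` acts like a list of relative per-device weights, contiguous row ranges could be derived as below. `compute_row_split` is hypothetical; the actual CUDA backend may normalize the array differently or round boundaries to a kernel-specific granularity.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not the CUDA backend's code): assign each device a
// contiguous range of rows [row_low[i], row_high[i]) in proportion to the
// non-negative weights split[0..n_devices-1].
static void compute_row_split(int64_t nrows, const float * split, int n_devices,
                              int64_t * row_low, int64_t * row_high) {
    assert(n_devices > 0);
    double total = 0.0;
    for (int i = 0; i < n_devices; i++) {
        assert(split[i] >= 0.0f);
        total += split[i];
    }
    assert(total > 0.0);

    double cumulative = 0.0;
    int64_t low = 0;
    for (int i = 0; i < n_devices; i++) {
        cumulative += split[i];
        // the last device always ends at nrows so no row is lost to rounding
        const int64_t high = (i == n_devices - 1)
            ? nrows
            : (int64_t)((double) nrows * (cumulative / total));
        row_low[i]  = low;
        row_high[i] = high;
        low = high; // ranges are contiguous and non-overlapping
    }
}
```

With weights `{3, 1}` and 100 rows, the first device gets rows 0 to 74 and the second gets rows 75 to 99.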

@ -1,272 +0,0 @@
#pragma once
#include "ggml.h"
// GGML internal header
#include <assert.h>
#include <stdlib.h> // load `stdlib.h` before other headers to work around MinGW bug: https://sourceforge.net/p/mingw-w64/bugs/192/
#include <stddef.h>
#include <stdbool.h>
#include <string.h> // memcpy
#include <math.h> // fabsf
#ifdef __cplusplus
extern "C" {
#endif
// static_assert should be a #define, but if it's not,
// fall back to the _Static_assert C11 keyword.
// if C99, static_assert is a no-op
// ref: https://stackoverflow.com/a/53923785/4039976
#ifndef __cplusplus
#ifndef static_assert
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201100L)
#define static_assert(cond, msg) _Static_assert(cond, msg)
#else
#define static_assert(cond, msg) struct global_scope_noop_trick
#endif
#endif
#endif
// __FMA__ and __F16C__ are not defined in MSVC, however they are implied with AVX2/AVX512
#if defined(_MSC_VER) && (defined(__AVX2__) || defined(__AVX512F__))
#ifndef __FMA__
#define __FMA__
#endif
#ifndef __F16C__
#define __F16C__
#endif
#endif
// __SSE3__ and __SSSE3__ are not defined in MSVC, but SSE3/SSSE3 are present when AVX/AVX2/AVX512 are available
#if defined(_MSC_VER) && (defined(__AVX__) || defined(__AVX2__) || defined(__AVX512F__))
#ifndef __SSE3__
#define __SSE3__
#endif
#ifndef __SSSE3__
#define __SSSE3__
#endif
#endif
// 16-bit float
// on Arm, we use __fp16
// on x86, we use uint16_t
#if defined(__ARM_NEON) && !defined(_MSC_VER)
// if YCM cannot find <arm_neon.h>, make a symbolic link to it, for example:
//
// $ ln -sfn /Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/include/arm_neon.h ./src/
//
#include <arm_neon.h>
typedef __fp16 ggml_fp16_internal_t;
#define GGML_COMPUTE_FP16_TO_FP32(x) ggml_compute_fp16_to_fp32(x)
#define GGML_COMPUTE_FP32_TO_FP16(x) ggml_compute_fp32_to_fp16(x)
#define GGML_FP16_TO_FP32(x) ggml_compute_fp16_to_fp32(x)
static inline float ggml_compute_fp16_to_fp32(ggml_fp16_t h) {
    ggml_fp16_internal_t tmp;
    memcpy(&tmp, &h, sizeof(ggml_fp16_t));
    return (float)tmp;
}
static inline ggml_fp16_t ggml_compute_fp32_to_fp16(float f) {
    ggml_fp16_t res;
    ggml_fp16_internal_t tmp = f;
    memcpy(&res, &tmp, sizeof(ggml_fp16_t));
    return res;
}
#else
typedef uint16_t ggml_fp16_internal_t;
#ifdef __wasm_simd128__
#include <wasm_simd128.h>
#else
#ifdef __POWER9_VECTOR__
#include <altivec.h>
#undef bool
#define bool _Bool
#else
#if defined(_MSC_VER) || defined(__MINGW32__)
#include <intrin.h>
#else
#if defined(__AVX__) || defined(__AVX2__) || defined(__AVX512F__) || defined(__SSSE3__) || defined(__SSE3__)
#if !defined(__riscv)
#include <immintrin.h>
#endif
#endif
#endif
#endif
#endif
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif
#ifdef __F16C__
#ifdef _MSC_VER
#define GGML_COMPUTE_FP16_TO_FP32(x) _mm_cvtss_f32(_mm_cvtph_ps(_mm_cvtsi32_si128(x)))
#define GGML_COMPUTE_FP32_TO_FP16(x) _mm_extract_epi16(_mm_cvtps_ph(_mm_set_ss(x), 0), 0)
#else
#define GGML_COMPUTE_FP16_TO_FP32(x) _cvtsh_ss(x)
#define GGML_COMPUTE_FP32_TO_FP16(x) _cvtss_sh(x, 0)
#endif
#elif defined(__POWER9_VECTOR__)
#define GGML_COMPUTE_FP16_TO_FP32(x) ggml_compute_fp16_to_fp32(x)
#define GGML_COMPUTE_FP32_TO_FP16(x) ggml_compute_fp32_to_fp16(x)
/* the inline asm below is about 12% faster than the lookup method */
#define GGML_FP16_TO_FP32(x) GGML_COMPUTE_FP16_TO_FP32(x)
#define GGML_FP32_TO_FP16(x) GGML_COMPUTE_FP32_TO_FP16(x)
static inline float ggml_compute_fp16_to_fp32(ggml_fp16_t h) {
    register float f;
    register double d;
    __asm__(
        "mtfprd %0,%2\n"
        "xscvhpdp %0,%0\n"
        "frsp %1,%0\n" :
        /* temp */ "=d"(d),
        /* out */  "=f"(f):
        /* in */   "r"(h));
    return f;
}
static inline ggml_fp16_t ggml_compute_fp32_to_fp16(float f) {
    register double d;
    register ggml_fp16_t r;
    __asm__( /* xscvdphp can work on double or single precision */
        "xscvdphp %0,%2\n"
        "mffprd %1,%0\n" :
        /* temp */ "=d"(d),
        /* out */  "=r"(r):
        /* in */   "f"(f));
    return r;
}
#else
// FP16 <-> FP32
// ref: https://github.com/Maratyszcza/FP16
static inline float fp32_from_bits(uint32_t w) {
    union {
        uint32_t as_bits;
        float as_value;
    } fp32;
    fp32.as_bits = w;
    return fp32.as_value;
}
static inline uint32_t fp32_to_bits(float f) {
    union {
        float as_value;
        uint32_t as_bits;
    } fp32;
    fp32.as_value = f;
    return fp32.as_bits;
}
static inline float ggml_compute_fp16_to_fp32(ggml_fp16_t h) {
    const uint32_t w = (uint32_t) h << 16;
    const uint32_t sign = w & UINT32_C(0x80000000);
    const uint32_t two_w = w + w;
    const uint32_t exp_offset = UINT32_C(0xE0) << 23;
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) || defined(__GNUC__) && !defined(__STRICT_ANSI__)
    const float exp_scale = 0x1.0p-112f;
#else
    const float exp_scale = fp32_from_bits(UINT32_C(0x7800000));
#endif
    const float normalized_value = fp32_from_bits((two_w >> 4) + exp_offset) * exp_scale;
    const uint32_t magic_mask = UINT32_C(126) << 23;
    const float magic_bias = 0.5f;
    const float denormalized_value = fp32_from_bits((two_w >> 17) | magic_mask) - magic_bias;
    const uint32_t denormalized_cutoff = UINT32_C(1) << 27;
    const uint32_t result = sign |
        (two_w < denormalized_cutoff ? fp32_to_bits(denormalized_value) : fp32_to_bits(normalized_value));
    return fp32_from_bits(result);
}
static inline ggml_fp16_t ggml_compute_fp32_to_fp16(float f) {
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) || defined(__GNUC__) && !defined(__STRICT_ANSI__)
    const float scale_to_inf = 0x1.0p+112f;
    const float scale_to_zero = 0x1.0p-110f;
#else
    const float scale_to_inf = fp32_from_bits(UINT32_C(0x77800000));
    const float scale_to_zero = fp32_from_bits(UINT32_C(0x08800000));
#endif
    float base = (fabsf(f) * scale_to_inf) * scale_to_zero;
    const uint32_t w = fp32_to_bits(f);
    const uint32_t shl1_w = w + w;
    const uint32_t sign = w & UINT32_C(0x80000000);
    uint32_t bias = shl1_w & UINT32_C(0xFF000000);
    if (bias < UINT32_C(0x71000000)) {
        bias = UINT32_C(0x71000000);
    }
    base = fp32_from_bits((bias >> 1) + UINT32_C(0x07800000)) + base;
    const uint32_t bits = fp32_to_bits(base);
    const uint32_t exp_bits = (bits >> 13) & UINT32_C(0x00007C00);
    const uint32_t mantissa_bits = bits & UINT32_C(0x00000FFF);
    const uint32_t nonsign = exp_bits + mantissa_bits;
    return (sign >> 16) | (shl1_w > UINT32_C(0xFF000000) ? UINT16_C(0x7E00) : nonsign);
}
#define GGML_COMPUTE_FP16_TO_FP32(x) ggml_compute_fp16_to_fp32(x)
#define GGML_COMPUTE_FP32_TO_FP16(x) ggml_compute_fp32_to_fp16(x)
#endif // __F16C__
#endif // __ARM_NEON
// precomputed f32 table for f16 (256 KB)
// defined in ggml.c, initialized in ggml_init()
extern float ggml_table_f32_f16[1 << 16];
// On ARM NEON, it is quicker to convert FP16 -> FP32 directly than to call ggml_lookup_fp16_to_fp32,
// so the NEON branch above already defines GGML_FP16_TO_FP32.
// The POWER9 branch likewise defines both GGML_FP16_TO_FP32 and GGML_FP32_TO_FP16.
#if !defined(GGML_FP16_TO_FP32)
inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
    uint16_t s;
    memcpy(&s, &f, sizeof(uint16_t));
    return ggml_table_f32_f16[s];
}
#define GGML_FP16_TO_FP32(x) ggml_lookup_fp16_to_fp32(x)
#endif
#if !defined(GGML_FP32_TO_FP16)
#define GGML_FP32_TO_FP16(x) GGML_COMPUTE_FP32_TO_FP16(x)
#endif
#define GGML_HASHTABLE_FULL ((size_t)-1)
#define GGML_HASHTABLE_ALREADY_EXISTS ((size_t)-2)
struct ggml_hash_set ggml_hash_set_new(size_t size);
bool ggml_hash_contains (const struct ggml_hash_set hash_set, struct ggml_tensor * key);
// returns GGML_HASHTABLE_FULL if table is full, otherwise the current index of the key or where it should be inserted
size_t ggml_hash_find (const struct ggml_hash_set hash_set, struct ggml_tensor * key);
// returns GGML_HASHTABLE_ALREADY_EXISTS if key already exists, index otherwise, asserts if table is full
size_t ggml_hash_insert ( struct ggml_hash_set hash_set, struct ggml_tensor * key);
// returns the index, asserts if table is full
size_t ggml_hash_find_or_insert( struct ggml_hash_set hash_set, struct ggml_tensor * key);
#ifdef __cplusplus
}
#endif
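The portable branch of `ggml_compute_fp16_to_fp32` / `ggml_compute_fp32_to_fp16` above can be exercised outside of ggml. The block below is a standalone copy of that fallback (renamed, and using `memcpy` instead of unions for type punning) plus a 65536-entry lookup table in the spirit of `ggml_table_f32_f16` (one `float` per 16-bit pattern, 256 KB). The names are illustrative only, and how `ggml_init()` actually fills its table is not shown in this header.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

// Bit-cast helpers (memcpy avoids strict-aliasing issues).
static float bits_to_f32(uint32_t w) { float f; memcpy(&f, &w, sizeof f); return f; }
static uint32_t f32_to_bits(float f) { uint32_t w; memcpy(&w, &f, sizeof w); return w; }

// FP16 -> FP32, same algorithm as the portable ggml_compute_fp16_to_fp32 above.
static float half_to_float(uint16_t h) {
    const uint32_t w = (uint32_t) h << 16;
    const uint32_t sign = w & UINT32_C(0x80000000);
    const uint32_t two_w = w + w;
    const uint32_t exp_offset = UINT32_C(0xE0) << 23;
    const float normalized = bits_to_f32((two_w >> 4) + exp_offset) * 0x1.0p-112f;
    const uint32_t magic_mask = UINT32_C(126) << 23;
    const float denormalized = bits_to_f32((two_w >> 17) | magic_mask) - 0.5f;
    const uint32_t denormalized_cutoff = UINT32_C(1) << 27;
    return bits_to_f32(sign | (two_w < denormalized_cutoff ? f32_to_bits(denormalized)
                                                           : f32_to_bits(normalized)));
}

// FP32 -> FP16 (round to nearest even), same algorithm as ggml_compute_fp32_to_fp16 above.
static uint16_t float_to_half(float f) {
    float base = (fabsf(f) * 0x1.0p+112f) * 0x1.0p-110f; // overflows to inf past the FP16 range
    const uint32_t w = f32_to_bits(f);
    const uint32_t shl1_w = w + w;
    const uint32_t sign = w & UINT32_C(0x80000000);
    uint32_t bias = shl1_w & UINT32_C(0xFF000000);
    if (bias < UINT32_C(0x71000000)) {
        bias = UINT32_C(0x71000000);
    }
    base = bits_to_f32((bias >> 1) + UINT32_C(0x07800000)) + base;
    const uint32_t bits = f32_to_bits(base);
    const uint32_t exp_bits = (bits >> 13) & UINT32_C(0x00007C00);
    const uint32_t mantissa_bits = bits & UINT32_C(0x00000FFF);
    const uint32_t nonsign = exp_bits + mantissa_bits;
    // NaN inputs map to the canonical FP16 NaN 0x7E00
    return (uint16_t)((sign >> 16) | (shl1_w > UINT32_C(0xFF000000) ? UINT16_C(0x7E00) : nonsign));
}

// Lookup table indexed by the raw FP16 bit pattern: 65536 floats = 256 KB.
static float table_f32_f16[1 << 16];

static void init_table(void) {
    for (uint32_t i = 0; i < (1u << 16); i++) {
        table_f32_f16[i] = half_to_float((uint16_t) i);
    }
}
```

Round-tripping every finite FP16 bit pattern through both functions should reproduce it exactly, which makes a quick sanity check for any conversion routine.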

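The hash-set comments in ggml-impl.h above define two sentinel return values, `GGML_HASHTABLE_FULL` and `GGML_HASHTABLE_ALREADY_EXISTS`. A minimal open-addressing sketch with the same return conventions follows, assuming linear probing and a hash derived from the pointer value. `struct ptr_hash_set` and its functions are hypothetical, not ggml's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HASHSET_FULL           ((size_t)-1)
#define HASHSET_ALREADY_EXISTS ((size_t)-2)

// Hypothetical pointer-keyed set; NULL marks an empty slot.
struct ptr_hash_set {
    size_t        size;
    const void ** keys;
};

static struct ptr_hash_set ptr_hash_set_new(size_t size) {
    assert(size > 0);
    struct ptr_hash_set set = { size, calloc(size, sizeof(const void *)) };
    assert(set.keys != NULL);
    return set;
}

static size_t ptr_hash(const void * p) {
    return (size_t)((uintptr_t) p >> 4); // low bits are often zero due to alignment
}

// Index of key, or of the empty slot where it would go; HASHSET_FULL if neither exists.
static size_t ptr_hash_find(struct ptr_hash_set set, const void * key) {
    const size_t h = ptr_hash(key) % set.size;
    size_t i = h;
    while (set.keys[i] != NULL && set.keys[i] != key) {
        i = (i + 1) % set.size; // linear probing
        if (i == h) {
            return HASHSET_FULL; // wrapped around: every slot holds another key
        }
    }
    return i;
}

static bool ptr_hash_contains(struct ptr_hash_set set, const void * key) {
    const size_t i = ptr_hash_find(set, key);
    return i != HASHSET_FULL && set.keys[i] == key;
}

// HASHSET_ALREADY_EXISTS if key is present, its new index otherwise; asserts if full.
static size_t ptr_hash_insert(struct ptr_hash_set set, const void * key) {
    const size_t i = ptr_hash_find(set, key);
    assert(i != HASHSET_FULL);
    if (set.keys[i] == key) {
        return HASHSET_ALREADY_EXISTS;
    }
    set.keys[i] = key;
    return i;
}
```

Passing the set by value works here, as in the declarations above, because only the `keys` array it points to is modified.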
@ -1,46 +0,0 @@
#pragma once
#include "ggml.h"
#include "ggml-backend.h"
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __cplusplus
extern "C" {
#endif
struct ggml_vk_device {
    int index;
    int type; // same as VkPhysicalDeviceType
    size_t heapSize;
    const char * name;
    const char * vendor;
    int subgroupSize;
    uint64_t bufferAlignment;
    uint64_t maxAlloc;
};
struct ggml_vk_device * ggml_vk_available_devices(size_t memoryRequired, size_t * count);
bool ggml_vk_get_device(struct ggml_vk_device * device, size_t memoryRequired, const char * name);
bool ggml_vk_has_vulkan(void);
bool ggml_vk_has_device(void);
struct ggml_vk_device ggml_vk_current_device(void);
//
// backend API
//
// forward declaration
typedef struct ggml_backend * ggml_backend_t;
GGML_API ggml_backend_t ggml_backend_kompute_init(int device);
GGML_API bool ggml_backend_is_kompute(ggml_backend_t backend);
GGML_API ggml_backend_buffer_type_t ggml_backend_kompute_buffer_type(int device);
#ifdef __cplusplus
}
#endif
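`ggml_vk_available_devices(memoryRequired, count)` above takes a required memory size and returns a list of devices. Assuming it keeps only devices whose heap is large enough, the filtering step could look like the following sketch, which uses a local copy of the `ggml_vk_device` fields; `filter_devices` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Local copy of the fields declared in struct ggml_vk_device above.
struct vk_device_info {
    int          index;
    int          type;      // same as VkPhysicalDeviceType
    size_t       heapSize;
    const char * name;
    const char * vendor;
    int          subgroupSize;
    uint64_t     bufferAlignment;
    uint64_t     maxAlloc;
};

// Hypothetical filter: copy every device whose heap can hold memoryRequired bytes
// into out[] (which must have room for n entries) and return how many were kept.
static size_t filter_devices(const struct vk_device_info * devs, size_t n,
                             size_t memoryRequired, struct vk_device_info * out) {
    assert(devs != NULL && out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].heapSize >= memoryRequired) {
            out[count++] = devs[i];
        }
    }
    return count;
}
```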

@ -1,66 +0,0 @@
// An interface for computing a ggml_cgraph with Metal
//
// This is a fully functional interface that extends ggml with GPU support for Apple devices.
// A similar interface can be created for other GPU backends (e.g. Vulkan, CUDA, OpenCL, etc.)
//
// How does it work?
//
// As long as your program can create and evaluate a ggml_cgraph on the CPU, you can use this
// interface to evaluate the same graph on the GPU. Instead of using ggml_graph_compute(), you
// use ggml_metal_graph_compute() (or ggml_vulkan_graph_compute(), etc.)
//
// You only need to make sure that all memory buffers that you used during the graph creation
// are mapped to the device memory with the ggml_metal_add_buffer() function. This mapping is
// used during the graph evaluation to determine the arguments of the compute kernels.
//
// Synchronization between device and host memory (for example for input and output tensors)
// is done with the ggml_metal_set_tensor() and ggml_metal_get_tensor() functions.
//
#pragma once
#include "ggml.h"
#include "ggml-backend.h"
#include <stddef.h>
#include <stdbool.h>
// max memory buffers that can be mapped to the device
#define GGML_METAL_MAX_BUFFERS 64
struct ggml_tensor;
struct ggml_cgraph;
#ifdef __cplusplus
extern "C" {
#endif
//
// backend API
// user-code should use only these functions
//
GGML_API void ggml_backend_metal_log_set_callback(ggml_log_callback log_callback, void * user_data);
GGML_API ggml_backend_t ggml_backend_metal_init(void);
GGML_API bool ggml_backend_is_metal(ggml_backend_t backend);
GGML_API GGML_CALL ggml_backend_buffer_t ggml_backend_metal_buffer_from_ptr(void * data, size_t size, size_t max_size);
GGML_API void ggml_backend_metal_set_n_cb(ggml_backend_t backend, int n_cb);
GGML_API GGML_CALL ggml_backend_buffer_type_t ggml_backend_metal_buffer_type(void);
// helper to check if the device supports a specific family
// ideally, the user code should be doing these checks
// ref: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf
GGML_API bool ggml_backend_metal_supports_family(ggml_backend_t backend, int family);
// capture all command buffers committed the next time `ggml_backend_graph_compute` is called
GGML_API void ggml_backend_metal_capture_next_compute(ggml_backend_t backend);
#ifdef __cplusplus
}
#endif

@ -1,36 +0,0 @@
#pragma once
#include "ggml.h"
#include "ggml-backend.h"
#ifdef __cplusplus
extern "C" {
#endif
GGML_API void ggml_cl_init(void);
GGML_API void ggml_cl_mul(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst);
GGML_API void ggml_cl_add(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst);
GGML_API bool ggml_cl_can_mul_mat(const struct ggml_tensor * src0, const struct ggml_tensor * src1, const struct ggml_tensor * dst);
GGML_API size_t ggml_cl_mul_mat_get_wsize(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst);
GGML_API void ggml_cl_mul_mat(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst, void * wdata, size_t wsize);
// GGML_API void * ggml_cl_host_malloc(size_t size);
// GGML_API void ggml_cl_host_free(void * ptr);
GGML_API void ggml_cl_free_data(const struct ggml_tensor* tensor);
GGML_API void ggml_cl_transform_tensor(void * data, struct ggml_tensor * tensor);
// backend API
// GGML_API ggml_backend_t ggml_backend_opencl_init(void);
// GGML_API bool ggml_backend_is_opencl(ggml_backend_t backend);
GGML_API ggml_backend_buffer_type_t ggml_backend_opencl_buffer_type(void);
// GGML_API ggml_backend_buffer_type_t ggml_backend_opencl_host_buffer_type(void);
#ifdef __cplusplus
}
#endif

File diff suppressed because it is too large

@ -1,133 +0,0 @@
#pragma once
#define GGML_COMMON_DECL_C
#include "ggml-common.h"
#include "ggml.h"
// GGML internal header
#ifdef __cplusplus
extern "C" {
#endif
// Quantization
void quantize_row_q4_0_reference(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k);
void quantize_row_q4_1_reference(const float * GGML_RESTRICT x, block_q4_1 * GGML_RESTRICT y, int64_t k);
void quantize_row_q5_0_reference(const float * GGML_RESTRICT x, block_q5_0 * GGML_RESTRICT y, int64_t k);
void quantize_row_q5_1_reference(const float * GGML_RESTRICT x, block_q5_1 * GGML_RESTRICT y, int64_t k);
void quantize_row_q8_0_reference(const float * GGML_RESTRICT x, block_q8_0 * GGML_RESTRICT y, int64_t k);
void quantize_row_q8_1_reference(const float * GGML_RESTRICT x, block_q8_1 * GGML_RESTRICT y, int64_t k);
void quantize_row_q2_K_reference(const float * GGML_RESTRICT x, block_q2_K * GGML_RESTRICT y, int64_t k);
void quantize_row_q3_K_reference(const float * GGML_RESTRICT x, block_q3_K * GGML_RESTRICT y, int64_t k);
void quantize_row_q4_K_reference(const float * GGML_RESTRICT x, block_q4_K * GGML_RESTRICT y, int64_t k);
void quantize_row_q5_K_reference(const float * GGML_RESTRICT x, block_q5_K * GGML_RESTRICT y, int64_t k);
void quantize_row_q6_K_reference(const float * GGML_RESTRICT x, block_q6_K * GGML_RESTRICT y, int64_t k);
void quantize_row_q8_K_reference(const float * GGML_RESTRICT x, block_q8_K * GGML_RESTRICT y, int64_t k);
void quantize_row_iq3_xxs_reference(const float * GGML_RESTRICT x, block_iq3_xxs * GGML_RESTRICT y, int64_t k);
void quantize_row_iq4_nl_reference (const float * GGML_RESTRICT x, block_iq4_nl * GGML_RESTRICT y, int64_t k);
void quantize_row_iq4_xs_reference (const float * GGML_RESTRICT x, block_iq4_xs * GGML_RESTRICT y, int64_t k);
void quantize_row_iq3_s_reference (const float * GGML_RESTRICT x, block_iq3_s * GGML_RESTRICT y, int64_t k);
void quantize_row_iq2_s_reference (const float * GGML_RESTRICT x, block_iq2_s * GGML_RESTRICT y, int64_t k);
void quantize_row_q4_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q4_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q5_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q5_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q8_0(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q8_1(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q2_K(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q3_K(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q4_K(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q5_K(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q6_K(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_q8_K(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_iq3_xxs(const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_iq4_nl (const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_iq4_xs (const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_iq3_s (const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
void quantize_row_iq2_s (const float * GGML_RESTRICT x, void * GGML_RESTRICT y, int64_t k);
// Dequantization
void dequantize_row_q4_0(const block_q4_0 * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_q4_1(const block_q4_1 * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_q5_0(const block_q5_0 * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_q5_1(const block_q5_1 * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_q8_0(const block_q8_0 * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
//void dequantize_row_q8_1(const block_q8_1 * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_q2_K(const block_q2_K * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_q3_K(const block_q3_K * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_q4_K(const block_q4_K * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_q5_K(const block_q5_K * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_q6_K(const block_q6_K * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_q8_K(const block_q8_K * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_iq2_xxs(const block_iq2_xxs * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_iq2_xs (const block_iq2_xs * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_iq2_s (const block_iq2_s * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_iq3_xxs(const block_iq3_xxs * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_iq1_s (const block_iq1_s * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_iq1_m (const block_iq1_m * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_iq4_nl (const block_iq4_nl * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_iq4_xs (const block_iq4_xs * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
void dequantize_row_iq3_s (const block_iq3_s * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
// Dot product
void ggml_vec_dot_q4_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_q4_1_q8_1(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_q5_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_q5_1_q8_1(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_q8_0_q8_0(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_q2_K_q8_K(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_q3_K_q8_K(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_q4_K_q8_K(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_q5_K_q8_K(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_q6_K_q8_K(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_iq2_xxs_q8_K(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_iq2_xs_q8_K (int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_iq2_s_q8_K (int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_iq3_xxs_q8_K(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_iq1_s_q8_K (int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_iq1_m_q8_K (int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_iq4_nl_q8_0 (int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_iq4_xs_q8_K (int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
void ggml_vec_dot_iq3_s_q8_K (int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc);
// Quantization utilizing an importance matrix (a.k.a. "Activation aWare Quantization")
size_t quantize_iq2_xxs(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_iq2_xs (const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_iq2_s (const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_iq3_xxs(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_iq1_s (const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_iq1_m (const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_iq4_nl (const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_iq4_xs (const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_iq3_s (const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_q2_K(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_q3_K(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_q4_K(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_q5_K(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_q6_K(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_q4_0(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_q4_1(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_q5_0(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_q5_1(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
size_t quantize_q8_0(const float * GGML_RESTRICT src, void * GGML_RESTRICT dst, int64_t nrows, int64_t n_per_row, const float * imatrix);
void iq2xs_init_impl(enum ggml_type type);
void iq2xs_free_impl(enum ggml_type type);
void iq3xs_init_impl(int grid_size);
void iq3xs_free_impl(int grid_size);
#ifdef __cplusplus
}
#endif

@@ -1,49 +0,0 @@
//
// MIT license
// Copyright (C) 2024 Intel Corporation
// SPDX-License-Identifier: MIT
//
#pragma once
#include "ggml.h"
#include "ggml-backend.h"
#ifdef __cplusplus
extern "C" {
#endif
#define GGML_SYCL_MAX_DEVICES 48
#define GGML_SYCL_NAME "SYCL"
// backend API
GGML_API ggml_backend_t ggml_backend_sycl_init(int device);
// device buffer
GGML_API ggml_backend_buffer_type_t ggml_backend_sycl_buffer_type(int device);
// split tensor buffer that splits matrices by rows across multiple devices
GGML_API GGML_CALL ggml_backend_buffer_type_t ggml_backend_sycl_split_buffer_type(const float * tensor_split);
// pinned host buffer for use with the CPU backend for faster copies between CPU and GPU
GGML_API ggml_backend_buffer_type_t ggml_backend_sycl_host_buffer_type(void);
GGML_API void ggml_backend_sycl_print_sycl_devices(void);
GGML_API GGML_CALL void ggml_sycl_get_gpu_list(int *id_list, int max_len);
GGML_API GGML_CALL void ggml_sycl_get_device_description(int device, char *description, size_t description_size);
GGML_API GGML_CALL int ggml_backend_sycl_get_device_count();
GGML_API GGML_CALL void ggml_backend_sycl_get_device_memory(int device, size_t *free, size_t *total);
GGML_API GGML_CALL int ggml_backend_sycl_get_device_index(int device_id);
// TODO: these are temporary
// ref: https://github.com/ggerganov/llama.cpp/pull/6022#issuecomment-1992615670
GGML_API GGML_CALL int ggml_backend_sycl_get_device_id(int device_index);
GGML_API GGML_CALL void ggml_backend_sycl_set_single_device_mode(int main_gpu_id);
GGML_API GGML_CALL void ggml_backend_sycl_set_mul_device_mode();
// SYCL doesn't support registering host memory, keep here for reference
// GGML_API GGML_CALL bool ggml_backend_sycl_register_host_buffer(void * buffer, size_t size);
// GGML_API GGML_CALL void ggml_backend_sycl_unregister_host_buffer(void * buffer);
#ifdef __cplusplus
}
#endif

@@ -1,29 +0,0 @@
#pragma once
#include "ggml.h"
#include "ggml-backend.h"
#ifdef __cplusplus
extern "C" {
#endif
#define GGML_VK_NAME "Vulkan"
#define GGML_VK_MAX_DEVICES 16
GGML_API void ggml_vk_instance_init(void);
// backend API
GGML_API GGML_CALL ggml_backend_t ggml_backend_vk_init(size_t dev_num);
GGML_API GGML_CALL bool ggml_backend_is_vk(ggml_backend_t backend);
GGML_API GGML_CALL int ggml_backend_vk_get_device_count(void);
GGML_API GGML_CALL void ggml_backend_vk_get_device_description(int device, char * description, size_t description_size);
GGML_API GGML_CALL void ggml_backend_vk_get_device_memory(int device, size_t * free, size_t * total);
GGML_API GGML_CALL ggml_backend_buffer_type_t ggml_backend_vk_buffer_type(size_t dev_num);
// pinned host buffer for use with the CPU backend for faster copies between CPU and GPU
GGML_API GGML_CALL ggml_backend_buffer_type_t ggml_backend_vk_host_buffer_type(void);
#ifdef __cplusplus
}
#endif

@@ -0,0 +1,17 @@
ggml/src/ggml-metal/ggml-metal-embed.o: \
ggml/src/ggml-metal/ggml-metal.metal \
ggml/src/ggml-metal/ggml-metal-impl.h \
ggml/src/ggml-common.h
	@echo "Embedding Metal library"
	@sed -e '/__embed_ggml-common.h__/r ggml/src/ggml-common.h' -e '/__embed_ggml-common.h__/d' < ggml/src/ggml-metal/ggml-metal.metal > ggml/src/ggml-metal/ggml-metal-embed.metal.tmp
	@sed -e '/#include "ggml-metal-impl.h"/r ggml/src/ggml-metal/ggml-metal-impl.h' -e '/#include "ggml-metal-impl.h"/d' < ggml/src/ggml-metal/ggml-metal-embed.metal.tmp > ggml/src/ggml-metal/ggml-metal-embed.metal
	$(eval TEMP_ASSEMBLY=$(shell mktemp -d))
	@echo ".section __DATA, __ggml_metallib" > $(TEMP_ASSEMBLY)/ggml-metal-embed.s
	@echo ".globl _ggml_metallib_start" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
	@echo "_ggml_metallib_start:" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
	@echo ".incbin \"ggml/src/ggml-metal/ggml-metal-embed.metal\"" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
	@echo ".globl _ggml_metallib_end" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
	@echo "_ggml_metallib_end:" >> $(TEMP_ASSEMBLY)/ggml-metal-embed.s
	$(CC) $(CFLAGS) -c $(TEMP_ASSEMBLY)/ggml-metal-embed.s -o $@
	@rm -f ${TEMP_ASSEMBLY}/ggml-metal-embed.s
	@rmdir ${TEMP_ASSEMBLY}

@@ -0,0 +1,6 @@
ggml/src/ggml-metal/ggml-metal.o: \
ggml/src/ggml-metal/ggml-metal.m \
ggml/src/ggml-metal/ggml-metal-impl.h \
ggml/include/ggml-metal.h \
ggml/include/ggml.h
	$(CC) $(CFLAGS) -c $< -o $@

@@ -0,0 +1,164 @@
#include <ruby.h>
#include <ruby/memory_view.h>
#include "ruby_whisper.h"
VALUE mWhisper;
VALUE cContext;
VALUE cParams;
VALUE eError;
VALUE cSegment;
VALUE cModel;
ID id_to_s;
ID id_call;
ID id___method__;
ID id_to_enum;
ID id_length;
ID id_next;
ID id_new;
ID id_to_path;
ID id_URI;
ID id_pre_converted_models;
static bool is_log_callback_finalized = false;
// High level API
extern VALUE ruby_whisper_segment_allocate(VALUE klass);
extern void init_ruby_whisper_context(VALUE *mWhisper);
extern void init_ruby_whisper_params(VALUE *mWhisper);
extern void init_ruby_whisper_error(VALUE *mWhisper);
extern void init_ruby_whisper_segment(VALUE *mWhisper, VALUE *cSegment);
extern void init_ruby_whisper_model(VALUE *mWhisper);
extern void register_callbacks(ruby_whisper_params *rwp, VALUE *context);
/*
* call-seq:
* lang_max_id -> Integer
*/
static VALUE ruby_whisper_s_lang_max_id(VALUE self) {
return INT2NUM(whisper_lang_max_id());
}
/*
* call-seq:
* lang_id(lang_name) -> Integer
*/
static VALUE ruby_whisper_s_lang_id(VALUE self, VALUE lang) {
const char * lang_str = StringValueCStr(lang);
const int id = whisper_lang_id(lang_str);
if (-1 == id) {
rb_raise(rb_eArgError, "language not found: %s", lang_str);
}
return INT2NUM(id);
}
/*
* call-seq:
* lang_str(lang_id) -> String
*/
static VALUE ruby_whisper_s_lang_str(VALUE self, VALUE id) {
const int lang_id = NUM2INT(id);
const char * str = whisper_lang_str(lang_id);
if (NULL == str) {
rb_raise(rb_eIndexError, "id %d outside of language id", lang_id);
}
return rb_str_new2(str);
}
/*
* call-seq:
* lang_str_full(lang_id) -> String
*/
static VALUE ruby_whisper_s_lang_str_full(VALUE self, VALUE id) {
const int lang_id = NUM2INT(id);
const char * str_full = whisper_lang_str_full(lang_id);
if (NULL == str_full) {
rb_raise(rb_eIndexError, "id %d outside of language id", lang_id);
}
return rb_str_new2(str_full);
}
static VALUE ruby_whisper_s_finalize_log_callback(VALUE self, VALUE id) {
is_log_callback_finalized = true;
return Qnil;
}
static void
ruby_whisper_log_callback(enum ggml_log_level level, const char * buffer, void * user_data) {
if (is_log_callback_finalized) {
return;
}
VALUE log_callback = rb_iv_get(mWhisper, "log_callback");
VALUE udata = rb_iv_get(mWhisper, "user_data");
rb_funcall(log_callback, id_call, 3, INT2NUM(level), rb_str_new2(buffer), udata);
}
/*
* call-seq:
* log_set ->(level, buffer, user_data) { ... }, user_data -> nil
*/
static VALUE ruby_whisper_s_log_set(VALUE self, VALUE log_callback, VALUE user_data) {
VALUE old_callback = rb_iv_get(self, "log_callback");
if (!NIL_P(old_callback)) {
rb_undefine_finalizer(old_callback);
}
rb_iv_set(self, "log_callback", log_callback);
rb_iv_set(self, "user_data", user_data);
VALUE finalize_log_callback = rb_funcall(mWhisper, rb_intern("method"), 1, rb_str_new2("finalize_log_callback"));
rb_define_finalizer(log_callback, finalize_log_callback);
whisper_log_set(ruby_whisper_log_callback, NULL);
return Qnil;
}
static void rb_whisper_model_mark(ruby_whisper_model *rwm) {
rb_gc_mark(rwm->context);
}
static VALUE ruby_whisper_model_allocate(VALUE klass) {
ruby_whisper_model *rwm;
rwm = ALLOC(ruby_whisper_model);
return Data_Wrap_Struct(klass, rb_whisper_model_mark, RUBY_DEFAULT_FREE, rwm);
}
void Init_whisper() {
id_to_s = rb_intern("to_s");
id_call = rb_intern("call");
id___method__ = rb_intern("__method__");
id_to_enum = rb_intern("to_enum");
id_length = rb_intern("length");
id_next = rb_intern("next");
id_new = rb_intern("new");
id_to_path = rb_intern("to_path");
id_URI = rb_intern("URI");
id_pre_converted_models = rb_intern("pre_converted_models");
mWhisper = rb_define_module("Whisper");
rb_define_const(mWhisper, "LOG_LEVEL_NONE", INT2NUM(GGML_LOG_LEVEL_NONE));
rb_define_const(mWhisper, "LOG_LEVEL_INFO", INT2NUM(GGML_LOG_LEVEL_INFO));
rb_define_const(mWhisper, "LOG_LEVEL_WARN", INT2NUM(GGML_LOG_LEVEL_WARN));
rb_define_const(mWhisper, "LOG_LEVEL_ERROR", INT2NUM(GGML_LOG_LEVEL_ERROR));
rb_define_const(mWhisper, "LOG_LEVEL_DEBUG", INT2NUM(GGML_LOG_LEVEL_DEBUG));
rb_define_const(mWhisper, "LOG_LEVEL_CONT", INT2NUM(GGML_LOG_LEVEL_CONT));
rb_define_singleton_method(mWhisper, "lang_max_id", ruby_whisper_s_lang_max_id, 0);
rb_define_singleton_method(mWhisper, "lang_id", ruby_whisper_s_lang_id, 1);
rb_define_singleton_method(mWhisper, "lang_str", ruby_whisper_s_lang_str, 1);
rb_define_singleton_method(mWhisper, "lang_str_full", ruby_whisper_s_lang_str_full, 1);
rb_define_singleton_method(mWhisper, "log_set", ruby_whisper_s_log_set, 2);
rb_define_private_method(rb_singleton_class(mWhisper), "finalize_log_callback", ruby_whisper_s_finalize_log_callback, 1);
init_ruby_whisper_context(&mWhisper);
init_ruby_whisper_params(&mWhisper);
init_ruby_whisper_error(&mWhisper);
init_ruby_whisper_segment(&mWhisper, &cContext);
init_ruby_whisper_model(&mWhisper);
rb_require("whisper/model/uri");
}

@@ -1,426 +0,0 @@
#include <ruby.h>
#include "ruby_whisper.h"
#define DR_WAV_IMPLEMENTATION
#include "dr_wav.h"
#include <cmath>
#include <fstream>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>
#ifdef __cplusplus
extern "C" {
#endif
#define BOOL_PARAMS_SETTER(self, prop, value) \
ruby_whisper_params *rwp; \
Data_Get_Struct(self, ruby_whisper_params, rwp); \
if (value == Qfalse || value == Qnil) { \
rwp->params.prop = false; \
} else { \
rwp->params.prop = true; \
} \
return value;
#define BOOL_PARAMS_GETTER(self, prop) \
ruby_whisper_params *rwp; \
Data_Get_Struct(self, ruby_whisper_params, rwp); \
if (rwp->params.prop) { \
return Qtrue; \
} else { \
return Qfalse; \
}
VALUE mWhisper;
VALUE cContext;
VALUE cParams;
static void ruby_whisper_free(ruby_whisper *rw) {
if (rw->context) {
whisper_free(rw->context);
rw->context = NULL;
}
}
static void ruby_whisper_params_free(ruby_whisper_params *rwp) {
}
void rb_whisper_mark(ruby_whisper *rw) {
// call rb_gc_mark on any ruby references in rw
}
void rb_whisper_free(ruby_whisper *rw) {
ruby_whisper_free(rw);
free(rw);
}
void rb_whisper_params_mark(ruby_whisper_params *rwp) {
}
void rb_whisper_params_free(ruby_whisper_params *rwp) {
ruby_whisper_params_free(rwp);
free(rwp);
}
static VALUE ruby_whisper_allocate(VALUE klass) {
ruby_whisper *rw;
rw = ALLOC(ruby_whisper);
rw->context = NULL;
return Data_Wrap_Struct(klass, rb_whisper_mark, rb_whisper_free, rw);
}
static VALUE ruby_whisper_params_allocate(VALUE klass) {
ruby_whisper_params *rwp;
rwp = ALLOC(ruby_whisper_params);
rwp->params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
return Data_Wrap_Struct(klass, rb_whisper_params_mark, rb_whisper_params_free, rwp);
}
static VALUE ruby_whisper_initialize(int argc, VALUE *argv, VALUE self) {
ruby_whisper *rw;
VALUE whisper_model_file_path;
// TODO: we can support init from buffer here too maybe another ruby object to expose
rb_scan_args(argc, argv, "01", &whisper_model_file_path);
Data_Get_Struct(self, ruby_whisper, rw);
if (!rb_respond_to(whisper_model_file_path, rb_intern("to_s"))) {
rb_raise(rb_eRuntimeError, "Expected file path to model to initialize Whisper::Context");
}
rw->context = whisper_init_from_file_with_params(StringValueCStr(whisper_model_file_path), whisper_context_default_params());
if (rw->context == nullptr) {
rb_raise(rb_eRuntimeError, "error: failed to initialize whisper context");
}
return self;
}
/*
* Transcribe a single file.
* If a block is given, the transcribed text is passed to it.
*
**/
static VALUE ruby_whisper_transcribe(int argc, VALUE *argv, VALUE self) {
ruby_whisper *rw;
ruby_whisper_params *rwp;
VALUE wave_file_path, blk, params;
rb_scan_args(argc, argv, "02&", &wave_file_path, &params, &blk);
Data_Get_Struct(self, ruby_whisper, rw);
Data_Get_Struct(params, ruby_whisper_params, rwp);
if (!rb_respond_to(wave_file_path, rb_intern("to_s"))) {
rb_raise(rb_eRuntimeError, "Expected file path to wave file");
}
std::string fname_inp = StringValueCStr(wave_file_path);
std::vector<float> pcmf32; // mono-channel F32 PCM
std::vector<std::vector<float>> pcmf32s; // stereo-channel F32 PCM
// WAV input - this is directly from main.cpp example
{
drwav wav;
std::vector<uint8_t> wav_data; // used for pipe input from stdin
if (fname_inp == "-") {
{
uint8_t buf[1024];
while (true) {
const size_t n = fread(buf, 1, sizeof(buf), stdin);
if (n == 0) {
break;
}
wav_data.insert(wav_data.end(), buf, buf + n);
}
}
if (drwav_init_memory(&wav, wav_data.data(), wav_data.size(), nullptr) == false) {
fprintf(stderr, "error: failed to open WAV file from stdin\n");
return self;
}
fprintf(stderr, "%s: read %zu bytes from stdin\n", __func__, wav_data.size());
} else if (drwav_init_file(&wav, fname_inp.c_str(), nullptr) == false) {
fprintf(stderr, "error: failed to open '%s' as WAV file\n", fname_inp.c_str());
return self;
}
if (wav.channels != 1 && wav.channels != 2) {
fprintf(stderr, "WAV file '%s' must be mono or stereo\n", fname_inp.c_str());
return self;
}
if (rwp->diarize && wav.channels != 2 && rwp->params.print_timestamps == false) {
fprintf(stderr, "WAV file '%s' must be stereo for diarization and timestamps have to be enabled\n", fname_inp.c_str());
return self;
}
if (wav.sampleRate != WHISPER_SAMPLE_RATE) {
fprintf(stderr, "WAV file '%s' must be %i kHz\n", fname_inp.c_str(), WHISPER_SAMPLE_RATE/1000);
return self;
}
if (wav.bitsPerSample != 16) {
fprintf(stderr, "WAV file '%s' must be 16-bit\n", fname_inp.c_str());
return self;
}
const uint64_t n = wav_data.empty() ? wav.totalPCMFrameCount : wav_data.size()/(wav.channels*wav.bitsPerSample/8);
std::vector<int16_t> pcm16;
pcm16.resize(n*wav.channels);
drwav_read_pcm_frames_s16(&wav, n, pcm16.data());
drwav_uninit(&wav);
// convert to mono, float
pcmf32.resize(n);
if (wav.channels == 1) {
for (uint64_t i = 0; i < n; i++) {
pcmf32[i] = float(pcm16[i])/32768.0f;
}
} else {
for (uint64_t i = 0; i < n; i++) {
pcmf32[i] = float(pcm16[2*i] + pcm16[2*i + 1])/65536.0f;
}
}
if (rwp->diarize) {
// convert to stereo, float
pcmf32s.resize(2);
pcmf32s[0].resize(n);
pcmf32s[1].resize(n);
for (uint64_t i = 0; i < n; i++) {
pcmf32s[0][i] = float(pcm16[2*i])/32768.0f;
pcmf32s[1][i] = float(pcm16[2*i + 1])/32768.0f;
}
}
}
{
static bool is_aborted = false; // NOTE: this should be atomic to avoid data race
rwp->params.encoder_begin_callback = [](struct whisper_context * /*ctx*/, struct whisper_state * /*state*/, void * user_data) {
bool is_aborted = *(bool*)user_data;
return !is_aborted;
};
rwp->params.encoder_begin_callback_user_data = &is_aborted;
}
if (whisper_full_parallel(rw->context, rwp->params, pcmf32.data(), pcmf32.size(), 1) != 0) {
fprintf(stderr, "failed to process audio\n");
return self;
}
const int n_segments = whisper_full_n_segments(rw->context);
VALUE output = rb_str_new2("");
for (int i = 0; i < n_segments; ++i) {
const char * text = whisper_full_get_segment_text(rw->context, i);
output = rb_str_concat(output, rb_str_new2(text));
}
VALUE idCall = rb_intern("call");
if (blk != Qnil) {
rb_funcall(blk, idCall, 1, output);
}
return self;
}
/*
* params.language = "auto" | "en", etc...
*/
static VALUE ruby_whisper_params_set_language(VALUE self, VALUE value) {
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper_params, rwp);
if (value == Qfalse || value == Qnil) {
rwp->params.language = "auto";
} else {
rwp->params.language = StringValueCStr(value);
}
return value;
}
static VALUE ruby_whisper_params_get_language(VALUE self) {
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper_params, rwp);
if (rwp->params.language) {
return rb_str_new2(rwp->params.language);
} else {
return rb_str_new2("auto");
}
}
static VALUE ruby_whisper_params_set_translate(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, translate, value)
}
static VALUE ruby_whisper_params_get_translate(VALUE self) {
BOOL_PARAMS_GETTER(self, translate)
}
static VALUE ruby_whisper_params_set_no_context(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, no_context, value)
}
static VALUE ruby_whisper_params_get_no_context(VALUE self) {
BOOL_PARAMS_GETTER(self, no_context)
}
static VALUE ruby_whisper_params_set_single_segment(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, single_segment, value)
}
static VALUE ruby_whisper_params_get_single_segment(VALUE self) {
BOOL_PARAMS_GETTER(self, single_segment)
}
static VALUE ruby_whisper_params_set_print_special(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, print_special, value)
}
static VALUE ruby_whisper_params_get_print_special(VALUE self) {
BOOL_PARAMS_GETTER(self, print_special)
}
static VALUE ruby_whisper_params_set_print_progress(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, print_progress, value)
}
static VALUE ruby_whisper_params_get_print_progress(VALUE self) {
BOOL_PARAMS_GETTER(self, print_progress)
}
static VALUE ruby_whisper_params_set_print_realtime(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, print_realtime, value)
}
static VALUE ruby_whisper_params_get_print_realtime(VALUE self) {
BOOL_PARAMS_GETTER(self, print_realtime)
}
static VALUE ruby_whisper_params_set_print_timestamps(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, print_timestamps, value)
}
static VALUE ruby_whisper_params_get_print_timestamps(VALUE self) {
BOOL_PARAMS_GETTER(self, print_timestamps)
}
static VALUE ruby_whisper_params_set_suppress_blank(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, suppress_blank, value)
}
static VALUE ruby_whisper_params_get_suppress_blank(VALUE self) {
BOOL_PARAMS_GETTER(self, suppress_blank)
}
static VALUE ruby_whisper_params_set_suppress_non_speech_tokens(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, suppress_non_speech_tokens, value)
}
static VALUE ruby_whisper_params_get_suppress_non_speech_tokens(VALUE self) {
BOOL_PARAMS_GETTER(self, suppress_non_speech_tokens)
}
static VALUE ruby_whisper_params_get_token_timestamps(VALUE self) {
BOOL_PARAMS_GETTER(self, token_timestamps)
}
static VALUE ruby_whisper_params_set_token_timestamps(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, token_timestamps, value)
}
static VALUE ruby_whisper_params_get_split_on_word(VALUE self) {
BOOL_PARAMS_GETTER(self, split_on_word)
}
static VALUE ruby_whisper_params_set_split_on_word(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, split_on_word, value)
}
static VALUE ruby_whisper_params_get_speed_up(VALUE self) {
BOOL_PARAMS_GETTER(self, speed_up)
}
static VALUE ruby_whisper_params_set_speed_up(VALUE self, VALUE value) {
BOOL_PARAMS_SETTER(self, speed_up, value)
}
static VALUE ruby_whisper_params_get_diarize(VALUE self) {
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper_params, rwp);
if (rwp->diarize) {
return Qtrue;
} else {
return Qfalse;
}
}
static VALUE ruby_whisper_params_set_diarize(VALUE self, VALUE value) {
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper_params, rwp);
if (value == Qfalse || value == Qnil) {
rwp->diarize = false;
} else {
rwp->diarize = true;
}
return value;
}
static VALUE ruby_whisper_params_get_offset(VALUE self) {
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper_params, rwp);
return INT2NUM(rwp->params.offset_ms);
}
static VALUE ruby_whisper_params_set_offset(VALUE self, VALUE value) {
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper_params, rwp);
rwp->params.offset_ms = NUM2INT(value);
return value;
}
static VALUE ruby_whisper_params_get_duration(VALUE self) {
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper_params, rwp);
return INT2NUM(rwp->params.duration_ms);
}
static VALUE ruby_whisper_params_set_duration(VALUE self, VALUE value) {
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper_params, rwp);
rwp->params.duration_ms = NUM2INT(value);
return value;
}
static VALUE ruby_whisper_params_get_max_text_tokens(VALUE self) {
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper_params, rwp);
return INT2NUM(rwp->params.n_max_text_ctx);
}
static VALUE ruby_whisper_params_set_max_text_tokens(VALUE self, VALUE value) {
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper_params, rwp);
rwp->params.n_max_text_ctx = NUM2INT(value);
return value;
}
void Init_whisper() {
mWhisper = rb_define_module("Whisper");
cContext = rb_define_class_under(mWhisper, "Context", rb_cObject);
cParams = rb_define_class_under(mWhisper, "Params", rb_cObject);
rb_define_alloc_func(cContext, ruby_whisper_allocate);
rb_define_method(cContext, "initialize", ruby_whisper_initialize, -1);
rb_define_method(cContext, "transcribe", ruby_whisper_transcribe, -1);
rb_define_alloc_func(cParams, ruby_whisper_params_allocate);
rb_define_method(cParams, "language=", ruby_whisper_params_set_language, 1);
rb_define_method(cParams, "language", ruby_whisper_params_get_language, 0);
rb_define_method(cParams, "translate=", ruby_whisper_params_set_translate, 1);
rb_define_method(cParams, "translate", ruby_whisper_params_get_translate, 0);
rb_define_method(cParams, "no_context=", ruby_whisper_params_set_no_context, 1);
rb_define_method(cParams, "no_context", ruby_whisper_params_get_no_context, 0);
rb_define_method(cParams, "single_segment=", ruby_whisper_params_set_single_segment, 1);
rb_define_method(cParams, "single_segment", ruby_whisper_params_get_single_segment, 0);
rb_define_method(cParams, "print_special", ruby_whisper_params_get_print_special, 0);
rb_define_method(cParams, "print_special=", ruby_whisper_params_set_print_special, 1);
rb_define_method(cParams, "print_progress", ruby_whisper_params_get_print_progress, 0);
rb_define_method(cParams, "print_progress=", ruby_whisper_params_set_print_progress, 1);
rb_define_method(cParams, "print_realtime", ruby_whisper_params_get_print_realtime, 0);
rb_define_method(cParams, "print_realtime=", ruby_whisper_params_set_print_realtime, 1);
rb_define_method(cParams, "print_timestamps", ruby_whisper_params_get_print_timestamps, 0);
rb_define_method(cParams, "print_timestamps=", ruby_whisper_params_set_print_timestamps, 1);
rb_define_method(cParams, "suppress_blank", ruby_whisper_params_get_suppress_blank, 0);
rb_define_method(cParams, "suppress_blank=", ruby_whisper_params_set_suppress_blank, 1);
rb_define_method(cParams, "suppress_non_speech_tokens", ruby_whisper_params_get_suppress_non_speech_tokens, 0);
rb_define_method(cParams, "suppress_non_speech_tokens=", ruby_whisper_params_set_suppress_non_speech_tokens, 1);
rb_define_method(cParams, "token_timestamps", ruby_whisper_params_get_token_timestamps, 0);
rb_define_method(cParams, "token_timestamps=", ruby_whisper_params_set_token_timestamps, 1);
rb_define_method(cParams, "split_on_word", ruby_whisper_params_get_split_on_word, 0);
rb_define_method(cParams, "split_on_word=", ruby_whisper_params_set_split_on_word, 1);
rb_define_method(cParams, "speed_up", ruby_whisper_params_get_speed_up, 0);
rb_define_method(cParams, "speed_up=", ruby_whisper_params_set_speed_up, 1);
rb_define_method(cParams, "diarize", ruby_whisper_params_get_diarize, 0);
rb_define_method(cParams, "diarize=", ruby_whisper_params_set_diarize, 1);
rb_define_method(cParams, "offset", ruby_whisper_params_get_offset, 0);
rb_define_method(cParams, "offset=", ruby_whisper_params_set_offset, 1);
rb_define_method(cParams, "duration", ruby_whisper_params_get_duration, 0);
rb_define_method(cParams, "duration=", ruby_whisper_params_set_duration, 1);
rb_define_method(cParams, "max_text_tokens", ruby_whisper_params_get_max_text_tokens, 0);
rb_define_method(cParams, "max_text_tokens=", ruby_whisper_params_set_max_text_tokens, 1);
}
#ifdef __cplusplus
}
#endif

@@ -1,8 +1,15 @@
#ifndef __RUBY_WHISPER_H
#define __RUBY_WHISPER_H
#ifndef RUBY_WHISPER_H
#define RUBY_WHISPER_H
#include "whisper.h"
typedef struct {
VALUE *context;
VALUE user_data;
VALUE callback;
VALUE callbacks;
} ruby_whisper_callback_container;
typedef struct {
struct whisper_context *context;
} ruby_whisper;
@@ -10,6 +17,18 @@ typedef struct {
typedef struct {
struct whisper_full_params params;
bool diarize;
ruby_whisper_callback_container *new_segment_callback_container;
ruby_whisper_callback_container *progress_callback_container;
ruby_whisper_callback_container *abort_callback_container;
} ruby_whisper_params;
typedef struct {
VALUE context;
int index;
} ruby_whisper_segment;
typedef struct {
VALUE context;
} ruby_whisper_model;
#endif

@@ -0,0 +1,613 @@
#include <ruby.h>
#include <ruby/memory_view.h>
#include "ruby_whisper.h"
extern ID id_to_s;
extern ID id___method__;
extern ID id_to_enum;
extern ID id_length;
extern ID id_next;
extern ID id_new;
extern ID id_to_path;
extern ID id_URI;
extern ID id_pre_converted_models;
extern VALUE cContext;
extern VALUE eError;
extern VALUE cModel;
extern VALUE ruby_whisper_transcribe(int argc, VALUE *argv, VALUE self);
extern VALUE rb_whisper_model_initialize(VALUE context);
extern VALUE rb_whisper_segment_initialize(VALUE context, int index);
extern void register_callbacks(ruby_whisper_params *rwp, VALUE *context);
static void
ruby_whisper_free(ruby_whisper *rw)
{
if (rw->context) {
whisper_free(rw->context);
rw->context = NULL;
}
}
void
rb_whisper_mark(ruby_whisper *rw)
{
// call rb_gc_mark on any ruby references in rw
}
void
rb_whisper_free(ruby_whisper *rw)
{
ruby_whisper_free(rw);
free(rw);
}
static VALUE
ruby_whisper_allocate(VALUE klass)
{
ruby_whisper *rw;
rw = ALLOC(ruby_whisper);
rw->context = NULL;
return Data_Wrap_Struct(klass, rb_whisper_mark, rb_whisper_free, rw);
}
/*
* call-seq:
* new("base.en") -> Whisper::Context
* new("path/to/model.bin") -> Whisper::Context
* new(Whisper::Model::URI.new("https://example.net/uri/of/model.bin")) -> Whisper::Context
*/
static VALUE
ruby_whisper_initialize(int argc, VALUE *argv, VALUE self)
{
ruby_whisper *rw;
VALUE whisper_model_file_path;
// TODO: we can support init from buffer here too maybe another ruby object to expose
rb_scan_args(argc, argv, "01", &whisper_model_file_path);
Data_Get_Struct(self, ruby_whisper, rw);
VALUE pre_converted_models = rb_funcall(cModel, id_pre_converted_models, 0);
VALUE pre_converted_model = rb_hash_aref(pre_converted_models, whisper_model_file_path);
if (!NIL_P(pre_converted_model)) {
whisper_model_file_path = pre_converted_model;
}
if (TYPE(whisper_model_file_path) == T_STRING) {
const char * whisper_model_file_path_str = StringValueCStr(whisper_model_file_path);
if (strncmp("http://", whisper_model_file_path_str, 7) == 0 || strncmp("https://", whisper_model_file_path_str, 8) == 0) {
VALUE uri_class = rb_const_get(cModel, id_URI);
whisper_model_file_path = rb_class_new_instance(1, &whisper_model_file_path, uri_class);
}
}
if (rb_obj_is_kind_of(whisper_model_file_path, rb_path2class("URI::HTTP"))) {
VALUE uri_class = rb_const_get(cModel, id_URI);
whisper_model_file_path = rb_class_new_instance(1, &whisper_model_file_path, uri_class);
}
if (rb_respond_to(whisper_model_file_path, id_to_path)) {
whisper_model_file_path = rb_funcall(whisper_model_file_path, id_to_path, 0);
}
if (!rb_respond_to(whisper_model_file_path, id_to_s)) {
rb_raise(rb_eRuntimeError, "Expected file path to model to initialize Whisper::Context");
}
rw->context = whisper_init_from_file_with_params(StringValueCStr(whisper_model_file_path), whisper_context_default_params());
if (rw->context == NULL) {
rb_raise(rb_eRuntimeError, "error: failed to initialize whisper context");
}
return self;
}
/*
* call-seq:
* model_n_vocab -> Integer
*/
VALUE ruby_whisper_model_n_vocab(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_n_vocab(rw->context));
}
/*
* call-seq:
* model_n_audio_ctx -> Integer
*/
VALUE ruby_whisper_model_n_audio_ctx(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_n_audio_ctx(rw->context));
}
/*
* call-seq:
* model_n_audio_state -> Integer
*/
VALUE ruby_whisper_model_n_audio_state(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_n_audio_state(rw->context));
}
/*
* call-seq:
* model_n_audio_head -> Integer
*/
VALUE ruby_whisper_model_n_audio_head(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_n_audio_head(rw->context));
}
/*
* call-seq:
* model_n_audio_layer -> Integer
*/
VALUE ruby_whisper_model_n_audio_layer(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_n_audio_layer(rw->context));
}
/*
* call-seq:
* model_n_text_ctx -> Integer
*/
VALUE ruby_whisper_model_n_text_ctx(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_n_text_ctx(rw->context));
}
/*
* call-seq:
* model_n_text_state -> Integer
*/
VALUE ruby_whisper_model_n_text_state(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_n_text_state(rw->context));
}
/*
* call-seq:
* model_n_text_head -> Integer
*/
VALUE ruby_whisper_model_n_text_head(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_n_text_head(rw->context));
}
/*
* call-seq:
* model_n_text_layer -> Integer
*/
VALUE ruby_whisper_model_n_text_layer(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_n_text_layer(rw->context));
}
/*
* call-seq:
* model_n_mels -> Integer
*/
VALUE ruby_whisper_model_n_mels(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_n_mels(rw->context));
}
/*
* call-seq:
* model_ftype -> Integer
*/
VALUE ruby_whisper_model_ftype(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_model_ftype(rw->context));
}
/*
* call-seq:
* model_type -> String
*/
VALUE ruby_whisper_model_type(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return rb_str_new2(whisper_model_type_readable(rw->context));
}
/*
* Run the entire model: PCM -> log mel spectrogram -> encoder -> decoder -> text.
* Not thread-safe when called concurrently on the same context.
* Uses the decoding strategy specified in +params+ to obtain the text.
*
* call-seq:
* full(params, samples, n_samples) -> self
* full(params, samples) -> self
*
* The second argument +samples+ must be an Array of Floats, an object that responds to :each (and to :length when +n_samples+ is omitted), or a MemoryView of a float array. It must contain 32-bit float PCM audio data.
* Raises Whisper::Error if processing fails.
*/
VALUE ruby_whisper_full(int argc, VALUE *argv, VALUE self)
{
if (argc < 2 || argc > 3) {
rb_raise(rb_eArgError, "wrong number of arguments (given %d, expected 2..3)", argc);
}
ruby_whisper *rw;
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper, rw);
VALUE params = argv[0];
Data_Get_Struct(params, ruby_whisper_params, rwp);
VALUE samples = argv[1];
int n_samples;
rb_memory_view_t view;
const bool memory_view_available_p = rb_memory_view_available_p(samples);
if (memory_view_available_p) {
// Acquire the view up front so that view.data is valid even when n_samples is given
if (!rb_memory_view_get(samples, &view, RUBY_MEMORY_VIEW_SIMPLE)) {
view.obj = Qnil;
rb_raise(rb_eArgError, "unable to get a memory view");
}
}
if (argc == 3) {
n_samples = NUM2INT(argv[2]);
if (TYPE(samples) == T_ARRAY) {
if (RARRAY_LEN(samples) < n_samples) {
rb_raise(rb_eArgError, "samples length %ld is less than n_samples %d", RARRAY_LEN(samples), n_samples);
}
}
// Should check when samples.respond_to?(:length)?
} else {
if (TYPE(samples) == T_ARRAY) {
n_samples = RARRAY_LEN(samples);
} else if (memory_view_available_p) {
n_samples = view.byte_size / view.item_size;
} else if (rb_respond_to(samples, id_length)) {
n_samples = NUM2INT(rb_funcall(samples, id_length, 0));
} else {
rb_raise(rb_eArgError, "samples must respond to :length or be a MemoryView of an array of float when n_samples is not given");
}
}
float * c_samples;
if (memory_view_available_p) {
// Read directly from the exported buffer; no copy is needed
c_samples = (float *)view.data;
} else {
c_samples = (float *)malloc(n_samples * sizeof(float));
if (TYPE(samples) == T_ARRAY) {
for (int i = 0; i < n_samples; i++) {
// NUM2DBL also accepts Integer elements, unlike RFLOAT_VALUE
c_samples[i] = (float)NUM2DBL(rb_ary_entry(samples, i));
}
} else {
// TODO: use rb_block_call
VALUE iter = rb_funcall(samples, id_to_enum, 1, rb_str_new2("each"));
for (int i = 0; i < n_samples; i++) {
// TODO: check if iter is exhausted and raise ArgumentError appropriately
VALUE sample = rb_funcall(iter, id_next, 0);
c_samples[i] = (float)NUM2DBL(sample);
}
}
}
register_callbacks(rwp, &self);
const int result = whisper_full(rw->context, rwp->params, c_samples, n_samples);
// Release the borrowed view, or free the heap copy
if (memory_view_available_p) {
rb_memory_view_release(&view);
} else {
free(c_samples);
}
if (0 == result) {
return self;
} else {
rb_exc_raise(rb_funcall(eError, id_new, 1, INT2NUM(result)));
}
}
/*
* Split the input audio into chunks and process each chunk separately using whisper_full_with_state().
* The result is stored in the default state of the context.
* Not thread-safe when called concurrently on the same context.
* This approach can offer a speedup in some cases.
* However, transcription accuracy can be worse at the beginning and end of each chunk.
* When +n_processors+ is omitted, a single processor is used.
*
* call-seq:
* full_parallel(params, samples) -> self
* full_parallel(params, samples, n_samples) -> self
* full_parallel(params, samples, n_samples, n_processors) -> self
* full_parallel(params, samples, nil, n_processors) -> self
*/
static VALUE
ruby_whisper_full_parallel(int argc, VALUE *argv,VALUE self)
{
if (argc < 2 || argc > 4) {
rb_raise(rb_eArgError, "wrong number of arguments (given %d, expected 2..4)", argc);
}
ruby_whisper *rw;
ruby_whisper_params *rwp;
Data_Get_Struct(self, ruby_whisper, rw);
VALUE params = argv[0];
Data_Get_Struct(params, ruby_whisper_params, rwp);
VALUE samples = argv[1];
int n_samples;
// n_processors is only given as the fourth argument; default to 1
const int n_processors = (argc == 4) ? NUM2INT(argv[3]) : 1;
rb_memory_view_t view;
const bool memory_view_available_p = rb_memory_view_available_p(samples);
if (memory_view_available_p) {
// Acquire the view up front so that view.data is valid even when n_samples is given
if (!rb_memory_view_get(samples, &view, RUBY_MEMORY_VIEW_SIMPLE)) {
view.obj = Qnil;
rb_raise(rb_eArgError, "unable to get a memory view");
}
}
if (argc >= 3 && !NIL_P(argv[2])) {
n_samples = NUM2INT(argv[2]);
if (TYPE(samples) == T_ARRAY) {
if (RARRAY_LEN(samples) < n_samples) {
rb_raise(rb_eArgError, "samples length %ld is less than n_samples %d", RARRAY_LEN(samples), n_samples);
}
}
// Should check when samples.respond_to?(:length)?
} else if (memory_view_available_p) {
n_samples = view.byte_size / view.item_size;
} else {
if (TYPE(samples) == T_ARRAY) {
n_samples = RARRAY_LEN(samples);
} else if (rb_respond_to(samples, id_length)) {
n_samples = NUM2INT(rb_funcall(samples, id_length, 0));
} else {
rb_raise(rb_eArgError, "samples must respond to :length or be a MemoryView of an array of float when n_samples is not given");
}
}
float * c_samples;
if (memory_view_available_p) {
// Read directly from the exported buffer; no copy is needed
c_samples = (float *)view.data;
} else {
c_samples = (float *)malloc(n_samples * sizeof(float));
if (TYPE(samples) == T_ARRAY) {
for (int i = 0; i < n_samples; i++) {
// NUM2DBL also accepts Integer elements, unlike RFLOAT_VALUE
c_samples[i] = (float)NUM2DBL(rb_ary_entry(samples, i));
}
} else {
// FIXME: use rb_block_call
VALUE iter = rb_funcall(samples, id_to_enum, 1, rb_str_new2("each"));
for (int i = 0; i < n_samples; i++) {
// TODO: check if iter is exhausted and raise ArgumentError
VALUE sample = rb_funcall(iter, id_next, 0);
c_samples[i] = (float)NUM2DBL(sample);
}
}
}
register_callbacks(rwp, &self);
const int result = whisper_full_parallel(rw->context, rwp->params, c_samples, n_samples, n_processors);
// Release the borrowed view, or free the heap copy
if (memory_view_available_p) {
rb_memory_view_release(&view);
} else {
free(c_samples);
}
if (0 == result) {
return self;
} else {
rb_exc_raise(rb_funcall(eError, id_new, 1, INT2NUM(result)));
}
}
/*
* Number of segments.
*
* call-seq:
* full_n_segments -> Integer
*/
static VALUE
ruby_whisper_full_n_segments(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_full_n_segments(rw->context));
}
/*
* Language ID, which can be converted to string by Whisper.lang_str and Whisper.lang_str_full.
*
* call-seq:
* full_lang_id -> Integer
*/
static VALUE
ruby_whisper_full_lang_id(VALUE self)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
return INT2NUM(whisper_full_lang_id(rw->context));
}
static int ruby_whisper_full_check_segment_index(const ruby_whisper * rw, const VALUE i_segment)
{
const int c_i_segment = NUM2INT(i_segment);
if (c_i_segment < 0 || c_i_segment >= whisper_full_n_segments(rw->context)) {
rb_raise(rb_eIndexError, "segment index %d out of range", c_i_segment);
}
return c_i_segment;
}
/*
* Start time of a segment indexed by +segment_index+ in centiseconds (1 centisecond = 10 milliseconds).
*
* full_get_segment_t0(3) # => 1668 (16680 ms)
*
* call-seq:
* full_get_segment_t0(segment_index) -> Integer
*/
static VALUE
ruby_whisper_full_get_segment_t0(VALUE self, VALUE i_segment)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
const int c_i_segment = ruby_whisper_full_check_segment_index(rw, i_segment);
const int64_t t0 = whisper_full_get_segment_t0(rw->context, c_i_segment);
return LL2NUM(t0);
}
/*
* End time of a segment indexed by +segment_index+ in centiseconds (1 centisecond = 10 milliseconds).
*
* full_get_segment_t1(3) # => 1668 (16680 ms)
*
* call-seq:
* full_get_segment_t1(segment_index) -> Integer
*/
static VALUE
ruby_whisper_full_get_segment_t1(VALUE self, VALUE i_segment)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
const int c_i_segment = ruby_whisper_full_check_segment_index(rw, i_segment);
const int64_t t1 = whisper_full_get_segment_t1(rw->context, c_i_segment);
return LL2NUM(t1);
}
/*
* Whether the segment after the one indexed by +segment_index+ is predicted as a speaker turn.
*
* full_get_segment_speaker_turn_next(3) # => true
*
* call-seq:
* full_get_segment_speaker_turn_next(segment_index) -> bool
*/
static VALUE
ruby_whisper_full_get_segment_speaker_turn_next(VALUE self, VALUE i_segment)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
const int c_i_segment = ruby_whisper_full_check_segment_index(rw, i_segment);
const bool speaker_turn_next = whisper_full_get_segment_speaker_turn_next(rw->context, c_i_segment);
return speaker_turn_next ? Qtrue : Qfalse;
}
/*
* Text of a segment indexed by +segment_index+.
*
* full_get_segment_text(3) # => "ask not what your country can do for you, ..."
*
* call-seq:
* full_get_segment_text(segment_index) -> String
*/
static VALUE
ruby_whisper_full_get_segment_text(VALUE self, VALUE i_segment)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
const int c_i_segment = ruby_whisper_full_check_segment_index(rw, i_segment);
const char * text = whisper_full_get_segment_text(rw->context, c_i_segment);
return rb_str_new2(text);
}
/*
* call-seq:
* full_get_segment_no_speech_prob(segment_index) -> Float
*/
static VALUE
ruby_whisper_full_get_segment_no_speech_prob(VALUE self, VALUE i_segment)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
const int c_i_segment = ruby_whisper_full_check_segment_index(rw, i_segment);
const float no_speech_prob = whisper_full_get_segment_no_speech_prob(rw->context, c_i_segment);
return DBL2NUM(no_speech_prob);
}
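// High level API
/*
* Whisper::Segment indexed by +segment_index+.
*
* call-seq:
* full_get_segment(segment_index) -> Whisper::Segment
*/
static VALUE
ruby_whisper_full_get_segment(VALUE self, VALUE i_segment)
{
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
// Validate here so that Segment accessors never read out of range
const int c_i_segment = ruby_whisper_full_check_segment_index(rw, i_segment);
return rb_whisper_segment_initialize(self, c_i_segment);
}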
/*
* Yields each Whisper::Segment:
*
* whisper.transcribe("path/to/audio.wav", params)
* whisper.each_segment do |segment|
* puts segment.text
* end
*
* Returns an Enumerator if no block given:
*
* whisper.transcribe("path/to/audio.wav", params)
* enum = whisper.each_segment
* enum.to_a # => [#<Whisper::Segment>, ...]
*
* call-seq:
* each_segment {|segment| ... }
* each_segment -> Enumerator
*/
static VALUE
ruby_whisper_each_segment(VALUE self)
{
if (!rb_block_given_p()) {
const VALUE method_name = rb_funcall(self, id___method__, 0);
return rb_funcall(self, id_to_enum, 1, method_name);
}
ruby_whisper *rw;
Data_Get_Struct(self, ruby_whisper, rw);
const int n_segments = whisper_full_n_segments(rw->context);
for (int i = 0; i < n_segments; ++i) {
rb_yield(rb_whisper_segment_initialize(self, i));
}
return self;
}
/*
* call-seq:
* model -> Whisper::Model
*/
static VALUE
ruby_whisper_get_model(VALUE self)
{
return rb_whisper_model_initialize(self);
}
void
init_ruby_whisper_context(VALUE *mWhisper)
{
cContext = rb_define_class_under(*mWhisper, "Context", rb_cObject);
rb_define_alloc_func(cContext, ruby_whisper_allocate);
rb_define_method(cContext, "initialize", ruby_whisper_initialize, -1);
rb_define_method(cContext, "transcribe", ruby_whisper_transcribe, -1);
rb_define_method(cContext, "model_n_vocab", ruby_whisper_model_n_vocab, 0);
rb_define_method(cContext, "model_n_audio_ctx", ruby_whisper_model_n_audio_ctx, 0);
rb_define_method(cContext, "model_n_audio_state", ruby_whisper_model_n_audio_state, 0);
rb_define_method(cContext, "model_n_audio_head", ruby_whisper_model_n_audio_head, 0);
rb_define_method(cContext, "model_n_audio_layer", ruby_whisper_model_n_audio_layer, 0);
rb_define_method(cContext, "model_n_text_ctx", ruby_whisper_model_n_text_ctx, 0);
rb_define_method(cContext, "model_n_text_state", ruby_whisper_model_n_text_state, 0);
rb_define_method(cContext, "model_n_text_head", ruby_whisper_model_n_text_head, 0);
rb_define_method(cContext, "model_n_text_layer", ruby_whisper_model_n_text_layer, 0);
rb_define_method(cContext, "model_n_mels", ruby_whisper_model_n_mels, 0);
rb_define_method(cContext, "model_ftype", ruby_whisper_model_ftype, 0);
rb_define_method(cContext, "model_type", ruby_whisper_model_type, 0);
rb_define_method(cContext, "full_n_segments", ruby_whisper_full_n_segments, 0);
rb_define_method(cContext, "full_lang_id", ruby_whisper_full_lang_id, 0);
rb_define_method(cContext, "full_get_segment_t0", ruby_whisper_full_get_segment_t0, 1);
rb_define_method(cContext, "full_get_segment_t1", ruby_whisper_full_get_segment_t1, 1);
rb_define_method(cContext, "full_get_segment_speaker_turn_next", ruby_whisper_full_get_segment_speaker_turn_next, 1);
rb_define_method(cContext, "full_get_segment_text", ruby_whisper_full_get_segment_text, 1);
rb_define_method(cContext, "full_get_segment_no_speech_prob", ruby_whisper_full_get_segment_no_speech_prob, 1);
rb_define_method(cContext, "full", ruby_whisper_full, -1);
rb_define_method(cContext, "full_parallel", ruby_whisper_full_parallel, -1);
// High level API
rb_define_method(cContext, "full_get_segment", ruby_whisper_full_get_segment, 1);
rb_define_method(cContext, "each_segment", ruby_whisper_each_segment, 0);
rb_define_method(cContext, "model", ruby_whisper_get_model, 0);
}
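`Context#full_get_segment_t0` and `#full_get_segment_t1` return centiseconds. Below is a minimal pure-Ruby sketch for turning such a value into an `HH:MM:SS.mmm` string. The helper name `centiseconds_to_timestamp` is hypothetical (it is not part of the gem), and the output format is assumed to follow the `to_timestamp()` helper in whisper.cpp that the segment code mentions.

```ruby
# Hypothetical helper (not part of the gem): formats a segment timestamp given
# in centiseconds, as returned by Context#full_get_segment_t0 / #full_get_segment_t1.
def centiseconds_to_timestamp(t, comma: false)
  msec = t * 10                              # 1 centisecond = 10 ms
  hr, msec = msec.divmod(60 * 60 * 1000)     # whole hours, remaining ms
  min, msec = msec.divmod(60 * 1000)         # whole minutes, remaining ms
  sec, msec = msec.divmod(1000)              # whole seconds, remaining ms
  format("%02d:%02d:%02d%s%03d", hr, min, sec, comma ? "," : ".", msec)
end

puts centiseconds_to_timestamp(1668) # prints 00:00:16.680
```

The `comma:` flag is only there because SRT-style subtitles use a comma before the milliseconds; drop it if you only need one style.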

View File

@ -0,0 +1,52 @@
#include <ruby.h>
extern VALUE eError;
VALUE ruby_whisper_error_initialize(VALUE self, VALUE code)
{
const int c_code = NUM2INT(code);
const char *raw_message;
switch (c_code) {
case -2:
raw_message = "failed to compute log mel spectrogram";
break;
case -3:
raw_message = "failed to auto-detect language";
break;
case -4:
raw_message = "too many decoders requested";
break;
case -5:
raw_message = "audio_ctx is larger than the maximum allowed";
break;
case -6:
raw_message = "failed to encode";
break;
case -7:
raw_message = "whisper_kv_cache_init() failed for self-attention cache";
break;
case -8:
raw_message = "failed to decode";
break;
case -9:
raw_message = "failed to decode";
break;
default:
raw_message = "unknown error";
break;
}
const VALUE message = rb_str_new2(raw_message);
rb_call_super(1, &message);
rb_iv_set(self, "@code", code);
return self;
}
void
init_ruby_whisper_error(VALUE *mWhisper)
{
eError = rb_define_class_under(*mWhisper, "Error", rb_eStandardError);
rb_define_attr(eError, "code", true, false);
rb_define_method(eError, "initialize", ruby_whisper_error_initialize, 1);
}
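`Whisper::Error` maps the negative return codes of `whisper_full()` to messages and keeps the numeric code in `#code`. As an illustration only, here is a pure-Ruby mirror of that C class (the name `SketchWhisperError` is hypothetical), showing how a caller might rescue it and inspect the code.

```ruby
# Pure-Ruby mirror of Whisper::Error (illustration only; the real class is defined in C).
# The messages are copied from the switch in ruby_whisper_error_initialize.
class SketchWhisperError < StandardError
  MESSAGES = {
    -2 => "failed to compute log mel spectrogram",
    -3 => "failed to auto-detect language",
    -4 => "too many decoders requested",
    -5 => "audio_ctx is larger than the maximum allowed",
    -6 => "failed to encode",
    -7 => "whisper_kv_cache_init() failed for self-attention cache",
    -8 => "failed to decode",
    -9 => "failed to decode"
  }.freeze

  attr_reader :code

  def initialize(code)
    # Unlisted codes fall back to the same default as the C switch
    super(MESSAGES.fetch(code, "unknown error"))
    @code = code
  end
end

begin
  raise SketchWhisperError.new(-6)
rescue SketchWhisperError => e
  puts "#{e.code}: #{e.message}" # prints "-6: failed to encode"
end
```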

View File

@ -0,0 +1,210 @@
#include <ruby.h>
#include "ruby_whisper.h"
extern VALUE cModel;
static void rb_whisper_model_mark(ruby_whisper_model *rwm) {
rb_gc_mark(rwm->context);
}
static VALUE ruby_whisper_model_allocate(VALUE klass) {
ruby_whisper_model *rwm;
rwm = ALLOC(ruby_whisper_model);
// Initialize before wrapping: the mark function reads context during GC
rwm->context = Qnil;
return Data_Wrap_Struct(klass, rb_whisper_model_mark, RUBY_DEFAULT_FREE, rwm);
}
VALUE rb_whisper_model_initialize(VALUE context) {
ruby_whisper_model *rwm;
const VALUE model = ruby_whisper_model_allocate(cModel);
Data_Get_Struct(model, ruby_whisper_model, rwm);
rwm->context = context;
return model;
}
/*
* call-seq:
* n_vocab -> Integer
*/
static VALUE
ruby_whisper_model_n_vocab(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_n_vocab(rw->context));
}
/*
* call-seq:
* n_audio_ctx -> Integer
*/
static VALUE
ruby_whisper_model_n_audio_ctx(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_n_audio_ctx(rw->context));
}
/*
* call-seq:
* n_audio_state -> Integer
*/
static VALUE
ruby_whisper_model_n_audio_state(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_n_audio_state(rw->context));
}
/*
* call-seq:
* n_audio_head -> Integer
*/
static VALUE
ruby_whisper_model_n_audio_head(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_n_audio_head(rw->context));
}
/*
* call-seq:
* n_audio_layer -> Integer
*/
static VALUE
ruby_whisper_model_n_audio_layer(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_n_audio_layer(rw->context));
}
/*
* call-seq:
* n_text_ctx -> Integer
*/
static VALUE
ruby_whisper_model_n_text_ctx(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_n_text_ctx(rw->context));
}
/*
* call-seq:
* n_text_state -> Integer
*/
static VALUE
ruby_whisper_model_n_text_state(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_n_text_state(rw->context));
}
/*
* call-seq:
* n_text_head -> Integer
*/
static VALUE
ruby_whisper_model_n_text_head(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_n_text_head(rw->context));
}
/*
* call-seq:
* n_text_layer -> Integer
*/
static VALUE
ruby_whisper_model_n_text_layer(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_n_text_layer(rw->context));
}
/*
* call-seq:
* n_mels -> Integer
*/
static VALUE
ruby_whisper_model_n_mels(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_n_mels(rw->context));
}
/*
* call-seq:
* ftype -> Integer
*/
static VALUE
ruby_whisper_model_ftype(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return INT2NUM(whisper_model_ftype(rw->context));
}
/*
* call-seq:
* type -> String
*/
static VALUE
ruby_whisper_model_type(VALUE self)
{
ruby_whisper_model *rwm;
Data_Get_Struct(self, ruby_whisper_model, rwm);
ruby_whisper *rw;
Data_Get_Struct(rwm->context, ruby_whisper, rw);
return rb_str_new2(whisper_model_type_readable(rw->context));
}
void
init_ruby_whisper_model(VALUE *mWhisper)
{
cModel = rb_define_class_under(*mWhisper, "Model", rb_cObject);
rb_define_alloc_func(cModel, ruby_whisper_model_allocate);
rb_define_method(cModel, "n_vocab", ruby_whisper_model_n_vocab, 0);
rb_define_method(cModel, "n_audio_ctx", ruby_whisper_model_n_audio_ctx, 0);
rb_define_method(cModel, "n_audio_state", ruby_whisper_model_n_audio_state, 0);
rb_define_method(cModel, "n_audio_head", ruby_whisper_model_n_audio_head, 0);
rb_define_method(cModel, "n_audio_layer", ruby_whisper_model_n_audio_layer, 0);
rb_define_method(cModel, "n_text_ctx", ruby_whisper_model_n_text_ctx, 0);
rb_define_method(cModel, "n_text_state", ruby_whisper_model_n_text_state, 0);
rb_define_method(cModel, "n_text_head", ruby_whisper_model_n_text_head, 0);
rb_define_method(cModel, "n_text_layer", ruby_whisper_model_n_text_layer, 0);
rb_define_method(cModel, "n_mels", ruby_whisper_model_n_mels, 0);
rb_define_method(cModel, "ftype", ruby_whisper_model_ftype, 0);
rb_define_method(cModel, "type", ruby_whisper_model_type, 0);
}
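`Whisper::Model` stores nothing but a reference to its context, which its mark function keeps alive, and each reader forwards to the matching `whisper_model_*` call. The sketch below models that design in plain Ruby with `Forwardable`. `StubContext` and `ModelView` are hypothetical stand-ins, and the numbers are arbitrary placeholders, not real model dimensions.

```ruby
require "forwardable"

# Stand-in for Whisper::Context exposing a few model_* readers (illustration only).
StubContext = Struct.new(:model_n_vocab, :model_n_mels, :model_type)

# Sketch of the Model-as-view design: the object only keeps a reference to its
# context (keeping it reachable, as rb_gc_mark does in C) and forwards each reader.
class ModelView
  extend Forwardable

  def initialize(context)
    @context = context
  end

  def_delegator :@context, :model_n_vocab, :n_vocab
  def_delegator :@context, :model_n_mels, :n_mels
  def_delegator :@context, :model_type, :type
end

model = ModelView.new(StubContext.new(1000, 10, "placeholder"))
puts model.n_vocab # prints 1000
```

Because the view holds only a reference, its readers always reflect whatever model the context currently has loaded, and no data is copied.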

File diff suppressed because it is too large

View File

@ -0,0 +1,123 @@
#include <ruby.h>
#include "ruby_whisper.h"
extern VALUE cSegment;
static void
rb_whisper_segment_mark(ruby_whisper_segment *rws)
{
rb_gc_mark(rws->context);
}
VALUE
ruby_whisper_segment_allocate(VALUE klass)
{
ruby_whisper_segment *rws;
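rws = ALLOC(ruby_whisper_segment);
// Initialize before wrapping: the mark function reads context during GC
rws->context = Qnil;
rws->index = 0;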
return Data_Wrap_Struct(klass, rb_whisper_segment_mark, RUBY_DEFAULT_FREE, rws);
}
VALUE
rb_whisper_segment_initialize(VALUE context, int index)
{
ruby_whisper_segment *rws;
const VALUE segment = ruby_whisper_segment_allocate(cSegment);
Data_Get_Struct(segment, ruby_whisper_segment, rws);
rws->context = context;
rws->index = index;
return segment;
}
/*
* Start time in milliseconds.
*
* call-seq:
* start_time -> Integer
*/
static VALUE
ruby_whisper_segment_get_start_time(VALUE self)
{
ruby_whisper_segment *rws;
Data_Get_Struct(self, ruby_whisper_segment, rws);
ruby_whisper *rw;
Data_Get_Struct(rws->context, ruby_whisper, rw);
const int64_t t0 = whisper_full_get_segment_t0(rw->context, rws->index);
// Convert centiseconds to milliseconds. Multiplying by 10 does not overflow here; to_timestamp() in whisper.cpp does the same.
return LL2NUM(t0 * 10);
}
/*
* End time in milliseconds.
*
* call-seq:
* end_time -> Integer
*/
static VALUE
ruby_whisper_segment_get_end_time(VALUE self)
{
ruby_whisper_segment *rws;
Data_Get_Struct(self, ruby_whisper_segment, rws);
ruby_whisper *rw;
Data_Get_Struct(rws->context, ruby_whisper, rw);
const int64_t t1 = whisper_full_get_segment_t1(rw->context, rws->index);
// Convert centiseconds to milliseconds. Multiplying by 10 does not overflow here; to_timestamp() in whisper.cpp does the same.
return LL2NUM(t1 * 10);
}
/*
* Whether the next segment is predicted as a speaker turn.
*
* call-seq:
* speaker_next_turn? -> bool
*/
static VALUE
ruby_whisper_segment_get_speaker_turn_next(VALUE self)
{
ruby_whisper_segment *rws;
Data_Get_Struct(self, ruby_whisper_segment, rws);
ruby_whisper *rw;
Data_Get_Struct(rws->context, ruby_whisper, rw);
return whisper_full_get_segment_speaker_turn_next(rw->context, rws->index) ? Qtrue : Qfalse;
}
/*
* call-seq:
* text -> String
*/
static VALUE
ruby_whisper_segment_get_text(VALUE self)
{
ruby_whisper_segment *rws;
Data_Get_Struct(self, ruby_whisper_segment, rws);
ruby_whisper *rw;
Data_Get_Struct(rws->context, ruby_whisper, rw);
const char * text = whisper_full_get_segment_text(rw->context, rws->index);
return rb_str_new2(text);
}
/*
* call-seq:
* no_speech_prob -> Float
*/
static VALUE
ruby_whisper_segment_get_no_speech_prob(VALUE self)
{
ruby_whisper_segment *rws;
Data_Get_Struct(self, ruby_whisper_segment, rws);
ruby_whisper *rw;
Data_Get_Struct(rws->context, ruby_whisper, rw);
return DBL2NUM(whisper_full_get_segment_no_speech_prob(rw->context, rws->index));
}
void
init_ruby_whisper_segment(VALUE *mWhisper, VALUE *cContext)
{
cSegment = rb_define_class_under(*mWhisper, "Segment", rb_cObject);
rb_define_alloc_func(cSegment, ruby_whisper_segment_allocate);
rb_define_method(cSegment, "start_time", ruby_whisper_segment_get_start_time, 0);
rb_define_method(cSegment, "end_time", ruby_whisper_segment_get_end_time, 0);
rb_define_method(cSegment, "speaker_next_turn?", ruby_whisper_segment_get_speaker_turn_next, 0);
rb_define_method(cSegment, "text", ruby_whisper_segment_get_text, 0);
rb_define_method(cSegment, "no_speech_prob", ruby_whisper_segment_get_no_speech_prob, 0);
}
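`Segment#speaker_next_turn?` reports whether a speaker turn is predicted after a segment. The flag is presumably only meaningful with a tinydiarize model such as `small.en-tdrz` from `Whisper::Model.pre_converted_models`; treat that as an assumption. Below is a minimal sketch that groups consecutive segments into turns. `SegmentStub` is a hypothetical stand-in for `Whisper::Segment`, and the input is illustrative, not real output.

```ruby
# Stand-in for Whisper::Segment (illustration only). Times are in milliseconds,
# and speaker_next_turn plays the role of Segment#speaker_next_turn?.
SegmentStub = Struct.new(:start_time, :end_time, :text, :speaker_next_turn)

# Groups consecutive segments into speaker turns: a new turn begins right after
# any segment whose speaker_next_turn flag is set.
def group_into_turns(segments)
  turns = [[]]
  segments.each do |seg|
    turns.last << seg
    turns << [] if seg.speaker_next_turn
  end
  turns.reject(&:empty?) # drop the empty turn opened after a trailing flag
end

# Illustrative input, not real transcription output
segments = [
  SegmentStub.new(0, 1_500, "Hello.", true),
  SegmentStub.new(1_500, 3_200, "Hi, how are you?", false),
  SegmentStub.new(3_200, 4_100, "Fine, thanks.", false)
]

group_into_turns(segments).each_with_index do |turn, i|
  span = "#{turn.first.start_time}-#{turn.last.end_time} ms"
  puts "Turn #{i + 1} (#{span}): #{turn.map(&:text).join(' ')}"
end
```

With real data, the same loop would run over `whisper.each_segment.to_a` after a transcription.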

View File

@ -0,0 +1,83 @@
#include <ruby.h>
#include "ruby_whisper.h"
#include "common-whisper.h"
#include <string>
#include <vector>
#ifdef __cplusplus
extern "C" {
#endif
extern ID id_to_s;
extern ID id_call;
extern void
register_callbacks(ruby_whisper_params * rwp, VALUE * self);
/*
* Transcribe a single audio file.
* If a block is given, it is called with the full transcribed text.
*
* params = Whisper::Params.new
* params.duration = 60_000
* whisper.transcribe "path/to/audio.wav", params do |text|
* puts text
* end
*
* call-seq:
* transcribe(path_to_audio, params) {|text| ...}
**/
VALUE
ruby_whisper_transcribe(int argc, VALUE *argv, VALUE self) {
ruby_whisper *rw;
ruby_whisper_params *rwp;
VALUE wave_file_path, blk, params;
rb_scan_args(argc, argv, "02&", &wave_file_path, &params, &blk);
Data_Get_Struct(self, ruby_whisper, rw);
Data_Get_Struct(params, ruby_whisper_params, rwp);
if (!rb_respond_to(wave_file_path, id_to_s)) {
rb_raise(rb_eRuntimeError, "Expected file path to wave file");
}
std::string fname_inp = StringValueCStr(wave_file_path);
std::vector<float> pcmf32; // mono-channel F32 PCM
std::vector<std::vector<float>> pcmf32s; // stereo-channel F32 PCM
if (!read_audio_data(fname_inp, pcmf32, pcmf32s, rwp->diarize)) {
fprintf(stderr, "error: failed to open '%s' as WAV file\n", fname_inp.c_str());
return self;
}
{
static bool is_aborted = false; // NOTE: this should be atomic to avoid data race
rwp->params.encoder_begin_callback = [](struct whisper_context * /*ctx*/, struct whisper_state * /*state*/, void * user_data) {
bool is_aborted = *(bool*)user_data;
return !is_aborted;
};
rwp->params.encoder_begin_callback_user_data = &is_aborted;
}
register_callbacks(rwp, &self);
if (whisper_full_parallel(rw->context, rwp->params, pcmf32.data(), pcmf32.size(), 1) != 0) {
fprintf(stderr, "failed to process audio\n");
return self;
}
const int n_segments = whisper_full_n_segments(rw->context);
VALUE output = rb_str_new2("");
for (int i = 0; i < n_segments; ++i) {
const char * text = whisper_full_get_segment_text(rw->context, i);
output = rb_str_concat(output, rb_str_new2(text));
}
if (!NIL_P(blk)) {
rb_funcall(blk, id_call, 1, output);
}
return self;
}
#ifdef __cplusplus
}
#endif
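`transcribe` reads the audio file itself, but `Context#full` expects 32-bit float samples that you supply. A minimal sketch, assuming raw 16-bit little-endian PCM that is already at 16 kHz (the sample rate whisper.cpp models expect). `pcm16_to_float` is a hypothetical helper: it uses the common convention of scaling by 1/32768 and averages interleaved channels down to mono. It does not resample.

```ruby
# Hypothetical helper: converts raw 16-bit little-endian PCM bytes into the
# Float samples that Context#full expects. It does NOT resample; the input is
# assumed to already be 16 kHz.
def pcm16_to_float(bytes, channels: 1)
  # "s<*" unpacks signed 16-bit little-endian integers
  floats = bytes.unpack("s<*").map { |s| s / 32768.0 }
  return floats if channels == 1

  # Average each interleaved frame down to a single mono sample
  floats.each_slice(channels).map { |frame| frame.sum / channels }
end

mono = pcm16_to_float([0, 16_384, -32_768].pack("s<*"))
p mono # => [0.0, 0.5, -1.0]
```

The resulting Array could then be passed as `whisper.full(params, mono)`.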

View File

@ -0,0 +1,6 @@
require "yaml"
sources = `git ls-files -z ../..`.split("\x0")
# The workflow's `on:` key is read as `true` under YAML 1.1 boolean rules, hence [true]
paths = YAML.load_file("../../.github/workflows/bindings-ruby.yml")[true]["push"]["paths"]
paths.delete "bindings/ruby/**"
EXTSOURCES = (Dir.glob(paths, base: "../..").collect {|path| "../../#{path}"} << "../../LICENSE") & sources
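The `[true]` lookup above is not a typo: Psych follows YAML 1.1, where a bare `on` is a boolean, so the GitHub Actions `on:` key becomes `true`. The snippet below demonstrates this on a trimmed-down, made-up workflow (not the real `bindings-ruby.yml`).

```ruby
require "yaml"

# A trimmed-down workflow snippet (illustrative, not the real bindings-ruby.yml)
workflow = <<~YAML
  on:
    push:
      paths:
        - bindings/ruby/**
        - src/*.cpp
YAML

config = YAML.safe_load(workflow)
p config.keys # => [true]

paths = config[true]["push"]["paths"]
paths.delete "bindings/ruby/**"
p paths # => ["src/*.cpp"]
```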

View File

@ -0,0 +1,170 @@
require "uri"
require "net/http"
require "time"
require "pathname"
require "io/console/size"
module Whisper
class Model
class URI
def initialize(uri)
@uri = URI(uri)
end
def to_path
cache
cache_path.to_path
end
def clear_cache
path = cache_path
path.delete if path.exist?
end
private
def cache_path
base_cache_dir/@uri.host/@uri.path[1..]
end
def base_cache_dir
base = case RUBY_PLATFORM
when /mswin|mingw/
ENV.key?("LOCALAPPDATA") ? Pathname(ENV["LOCALAPPDATA"]) : Pathname(Dir.home)/"AppData/Local"
when /darwin/
Pathname(Dir.home)/"Library/Caches"
else
ENV.key?("XDG_CACHE_HOME") ? Pathname(ENV["XDG_CACHE_HOME"]) : Pathname(Dir.home)/".cache"
end
base/"whisper.cpp"
end
def cache
path = cache_path
headers = {}
headers["if-modified-since"] = path.mtime.httpdate if path.exist?
request @uri, headers
path
end
def request(uri, headers)
Net::HTTP.start uri.host, uri.port, use_ssl: uri.scheme == "https" do |http|
request = Net::HTTP::Get.new(uri, headers)
http.request request do |response|
case response
when Net::HTTPNotModified
# noop
when Net::HTTPOK
download response
when Net::HTTPRedirection
request URI(response["location"]), headers
else
return if headers.key?("if-modified-since") # Use cache file
raise "#{response.code} #{response.message}\n#{response.body}"
end
end
end
rescue => err
if cache_path.exist?
warn err
# Use cache file
else
raise
end
end
def download(response)
path = cache_path
path.dirname.mkpath unless path.dirname.exist?
downloading_path = Pathname("#{path}.downloading")
size = response.content_length
downloading_path.open "wb" do |file|
downloaded = 0
response.read_body do |chunk|
file << chunk
downloaded += chunk.bytesize
show_progress downloaded, size
end
$stderr.puts
end
downloading_path.rename path
end
def show_progress(current, size)
progress_rate_available = size && $stderr.tty?
unless @prev
@prev = Time.now
$stderr.puts "Downloading #{@uri} to #{cache_path}"
end
now = Time.now
if progress_rate_available
return if now - @prev < 1 && current < size
progress_width = 20
progress = current.to_f / size
arrow_length = progress * progress_width
arrow = "=" * (arrow_length - 1) + ">" + " " * (progress_width - arrow_length)
line = "[#{arrow}] (#{format_bytesize(current)} / #{format_bytesize(size)})"
padding = ' ' * [$stderr.winsize[1] - line.size, 0].max
$stderr.print "\r#{line}#{padding}"
else
return if now - @prev < 1
$stderr.print "."
end
@prev = now
end
def format_bytesize(bytesize)
return "0.0 B" if bytesize.zero?
units = %w[B KiB MiB GiB TiB]
exp = (Math.log(bytesize) / Math.log(1024)).to_i
format("%.1f %s", bytesize.to_f / 1024 ** exp, units[exp])
end
end
@pre_converted_models = %w[
tiny
tiny.en
tiny-q5_1
tiny.en-q5_1
tiny-q8_0
base
base.en
base-q5_1
base.en-q5_1
base-q8_0
small
small.en
small.en-tdrz
small-q5_1
small.en-q5_1
small-q8_0
medium
medium.en
medium-q5_0
medium.en-q5_0
medium-q8_0
large-v1
large-v2
large-v2-q5_0
large-v2-q8_0
large-v3
large-v3-q5_0
large-v3-turbo
large-v3-turbo-q5_0
large-v3-turbo-q8_0
].each_with_object({}) {|name, models|
models[name] = URI.new("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-#{name}.bin")
}
class << self
attr_reader :pre_converted_models
end
end
end
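`Whisper::Model::URI` caches each download under a per-platform base directory, then `whisper.cpp/<host>/<path>`. The function below is a testable sketch of that lookup. `sketch_cache_path` is a hypothetical name, and unlike the class it takes the platform, environment, and home directory as arguments so the branches can be exercised directly.

```ruby
require "pathname"
require "uri"

# Sketch of how Whisper::Model::URI derives its cache location; the platform,
# environment and home directory are injected so each branch can be tested.
def sketch_cache_path(url, platform: RUBY_PLATFORM, env: ENV, home: Dir.home)
  uri = URI(url)
  base = case platform
         when /mswin|mingw/
           env.key?("LOCALAPPDATA") ? Pathname(env["LOCALAPPDATA"]) : Pathname(home)/"AppData/Local"
         when /darwin/
           Pathname(home)/"Library/Caches"
         else
           env.key?("XDG_CACHE_HOME") ? Pathname(env["XDG_CACHE_HOME"]) : Pathname(home)/".cache"
         end
  # Drop the leading "/" of the URI path so it is joined as a relative path
  base/"whisper.cpp"/uri.host/uri.path[1..]
end

url = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin"
puts sketch_cache_path(url, platform: "x86_64-linux", env: {}, home: "/home/alice")
```

Wrapping `XDG_CACHE_HOME` in `Pathname(...)` matters here: a plain String has no `/` method, so `base/"whisper.cpp"` would fail.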

View File

@ -0,0 +1,189 @@
module Whisper
interface _Samples
def length: () -> Integer
def each: { (Float) -> void } -> void
end
type log_callback = ^(Integer level, String message, Object user_data) -> void
type new_segment_callback = ^(Whisper::Context, void, Integer n_new, Object user_data) -> void
type progress_callback = ^(Whisper::Context, void, Integer progress, Object user_data) -> void
type abort_callback = ^(Whisper::Context, void, Object user_data) -> boolish
LOG_LEVEL_NONE: Integer
LOG_LEVEL_INFO: Integer
LOG_LEVEL_WARN: Integer
LOG_LEVEL_ERROR: Integer
LOG_LEVEL_DEBUG: Integer
LOG_LEVEL_CONT: Integer
def self.lang_max_id: () -> Integer
def self.lang_id: (string name) -> Integer
def self.lang_str: (Integer id) -> String
def self.lang_str_full: (Integer id) -> String
def self.log_set: (log_callback, Object? user_data) -> log_callback
class Context
def self.new: (string | _ToPath | ::URI::HTTP) -> instance
def transcribe: (string, Params) -> self
| (string, Params) { (String) -> void } -> self
def model_n_vocab: () -> Integer
def model_n_audio_ctx: () -> Integer
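def model_n_audio_state: () -> Integer
def model_n_audio_head: () -> Integer
def model_n_audio_layer: () -> Integer
def model_n_text_ctx: () -> Integer
def model_n_text_state: () -> Integer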
def model_n_text_head: () -> Integer
def model_n_text_layer: () -> Integer
def model_n_mels: () -> Integer
def model_ftype: () -> Integer
def model_type: () -> String
def each_segment: { (Segment) -> void } -> void
| () -> Enumerator[Segment]
def model: () -> Model
def full_get_segment: (Integer nth) -> Segment
def full_n_segments: () -> Integer
def full_lang_id: () -> Integer
def full_get_segment_t0: (Integer) -> Integer
def full_get_segment_t1: (Integer) -> Integer
def full_get_segment_speaker_turn_next: (Integer) -> (true | false)
def full_get_segment_text: (Integer) -> String
def full_get_segment_no_speech_prob: (Integer) -> Float
def full: (Params, Array[Float] samples, ?Integer n_samples) -> self
| (Params, _Samples, ?Integer n_samples) -> self
def full_parallel: (Params, Array[Float], ?Integer n_samples) -> self
| (Params, _Samples, ?Integer n_samples) -> self
| (Params, _Samples, ?Integer? n_samples, Integer n_processors) -> self
end
class Params
def self.new: (
?language: string,
?translate: boolish,
?no_context: boolish,
?single_segment: boolish,
?print_special: boolish,
?print_progress: boolish,
?print_realtime: boolish,
?print_timestamps: boolish,
?suppress_blank: boolish,
?suppress_nst: boolish,
?token_timestamps: boolish,
?split_on_word: boolish,
?initial_prompt: string | nil,
?diarize: boolish,
?offset: Integer,
?duration: Integer,
?max_text_tokens: Integer,
?temperature: Float,
?max_initial_ts: Float,
?length_penalty: Float,
?temperature_inc: Float,
?entropy_thold: Float,
?logprob_thold: Float,
?no_speech_thold: Float,
?new_segment_callback: new_segment_callback,
?new_segment_callback_user_data: Object,
?progress_callback: progress_callback,
?progress_callback_user_data: Object,
?abort_callback: abort_callback,
?abort_callback_user_data: Object
) -> instance
def language=: (String) -> String # TODO: Enumerate lang names
def language: () -> String
def translate=: (boolish) -> boolish
def translate: () -> (true | false)
def no_context=: (boolish) -> boolish
def no_context: () -> (true | false)
def single_segment=: (boolish) -> boolish
def single_segment: () -> (true | false)
def print_special=: (boolish) -> boolish
def print_special: () -> (true | false)
def print_progress=: (boolish) -> boolish
def print_progress: () -> (true | false)
def print_realtime=: (boolish) -> boolish
def print_realtime: () -> (true | false)
def print_timestamps=: (boolish) -> boolish
def print_timestamps: () -> (true | false)
def suppress_blank=: (boolish) -> boolish
def suppress_blank: () -> (true | false)
def suppress_nst=: (boolish) -> boolish
def suppress_nst: () -> (true | false)
def token_timestamps=: (boolish) -> boolish
def token_timestamps: () -> (true | false)
def split_on_word=: (boolish) -> boolish
def split_on_word: () -> (true | false)
def initial_prompt=: (_ToS) -> _ToS
def initial_prompt: () -> (String | nil)
def diarize=: (boolish) -> boolish
def diarize: () -> (true | false)
def offset=: (Integer) -> Integer
def offset: () -> Integer
def duration=: (Integer) -> Integer
def duration: () -> Integer
def max_text_tokens=: (Integer) -> Integer
def max_text_tokens: () -> Integer
def temperature=: (Float) -> Float
def temperature: () -> Float
def max_initial_ts=: (Float) -> Float
def max_initial_ts: () -> Float
def length_penalty=: (Float) -> Float
def length_penalty: () -> Float
def temperature_inc=: (Float) -> Float
def temperature_inc: () -> Float
def entropy_thold=: (Float) -> Float
def entropy_thold: () -> Float
def logprob_thold=: (Float) -> Float
def logprob_thold: () -> Float
def no_speech_thold=: (Float) -> Float
def no_speech_thold: () -> Float
def new_segment_callback=: (new_segment_callback) -> new_segment_callback
def new_segment_callback: () -> (new_segment_callback | nil)
def new_segment_callback_user_data=: (Object) -> Object
def new_segment_callback_user_data: () -> Object
def progress_callback=: (progress_callback) -> progress_callback
def progress_callback: () -> (progress_callback | nil)
def progress_callback_user_data=: (Object) -> Object
def progress_callback_user_data: () -> Object
def abort_callback=: (abort_callback) -> abort_callback
def abort_callback: () -> (abort_callback | nil)
def abort_callback_user_data=: (Object) -> Object
def abort_callback_user_data: () -> Object
def on_new_segment: { (Segment) -> void } -> void
def on_progress: { (Integer progress) -> void } -> void
def abort_on: { (Object user_data) -> boolish } -> void
end
class Model
def self.pre_converted_models: () -> Hash[String, Model::URI]
def self.new: () -> instance
def n_vocab: () -> Integer
def n_audio_ctx: () -> Integer
def n_audio_state: () -> Integer
def n_audio_head: () -> Integer
def n_audio_layer: () -> Integer
def n_text_ctx: () -> Integer
def n_text_state: () -> Integer
def n_text_head: () -> Integer
def n_text_layer: () -> Integer
def n_mels: () -> Integer
def ftype: () -> Integer
def type: () -> String
class URI
def self.new: (string | ::URI::HTTP) -> self
def to_path: -> String
def clear_cache: -> void
end
end
class Segment
def start_time: () -> Integer
def end_time: () -> Integer
def speaker_next_turn?: () -> (true | false)
def text: () -> String
def no_speech_prob: () -> Float
end
class Error < StandardError
attr_reader code: Integer
def self.new: (Integer code) -> instance
end
end
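The callback and segment tests in this change compare `Segment#start_time`/`#end_time` with `full_get_segment_t0`/`full_get_segment_t1` multiplied by 10. This suggests the low-level getters count in 10 ms units while `Segment` reports milliseconds. Below is a small pure-Ruby sketch of that conversion plus a display formatter. `centiseconds_to_ms` and `format_ms` are hypothetical helpers, not part of the gem, and the 10 ms unit is an assumption inferred from the tests.

```ruby
# Hypothetical helpers, not part of the whispercpp gem.
# Assumption: low-level segment timestamps are in 10 ms units, as implied by
# the tests multiplying full_get_segment_t0/t1 by 10 to get milliseconds.
def centiseconds_to_ms(t)
  Integer(t) * 10
end

# Format a millisecond count as HH:MM:SS.mmm.
def format_ms(ms)
  hours, rem = ms.divmod(3_600_000)
  minutes, rem = rem.divmod(60_000)
  seconds, millis = rem.divmod(1_000)
  format("%02d:%02d:%02d.%03d", hours, minutes, seconds, millis)
end

puts format_ms(centiseconds_to_ms(1_100)) # prints 00:00:11.000
```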

@ -0,0 +1,24 @@
require "test/unit"
require "whisper"
require_relative "jfk_reader/jfk_reader"
class TestBase < Test::Unit::TestCase
AUDIO = File.join(__dir__, "..", "..", "..", "samples", "jfk.wav")
class << self
attr_reader :whisper
def startup
@whisper = Whisper::Context.new("base.en")
params = Whisper::Params.new
params.print_timestamps = false
@whisper.transcribe(TestBase::AUDIO, params)
end
end
private
def whisper
self.class.whisper
end
end

@ -0,0 +1,5 @@
Makefile
jfk_reader.o
jfk_reader.so
jfk_reader.bundle
jfk_reader.dll

@ -0,0 +1,3 @@
require "mkmf"
create_makefile("jfk_reader")

@ -0,0 +1,68 @@
#include <ruby.h>
#include <ruby/memory_view.h>
#include <ruby/encoding.h>
static VALUE
jfk_reader_initialize(VALUE self, VALUE audio_path)
{
rb_iv_set(self, "audio_path", audio_path);
return Qnil;
}
static bool
jfk_reader_get_memory_view(const VALUE obj, rb_memory_view_t *view, int flags)
{
VALUE audio_path = rb_iv_get(obj, "audio_path");
const char *audio_path_str = StringValueCStr(audio_path);
/* Hard-coded for samples/jfk.wav: a 78-byte header followed by 176000 16-bit samples. */
const int n_samples = 176000;
float *data = (float *)malloc(n_samples * sizeof(float));
short *samples = (short *)malloc(n_samples * sizeof(short));
FILE *file = fopen(audio_path_str, "rb");
if (data == NULL || samples == NULL || file == NULL) {
free(data);
free(samples);
if (file != NULL) fclose(file);
return false;
}
fseek(file, 78, SEEK_SET);
size_t n_read = fread(samples, sizeof(short), n_samples, file);
fclose(file);
/* Normalize signed 16-bit PCM to floats in [-1.0, 1.0). */
for (size_t i = 0; i < n_read; i++) {
data[i] = samples[i]/32768.0f;
}
free(samples);
view->obj = obj;
view->data = (void *)data;
view->byte_size = sizeof(float) * n_read;
view->readonly = true;
view->format = "f";
view->item_size = sizeof(float);
view->item_desc.components = NULL;
view->item_desc.length = 0;
view->ndim = 1;
view->shape = NULL;
view->sub_offsets = NULL;
view->private_data = NULL;
return true;
}
static bool
jfk_reader_release_memory_view(const VALUE obj, rb_memory_view_t *view)
{
/* Free the buffer allocated in jfk_reader_get_memory_view. */
free(view->data);
view->data = NULL;
return true;
}
static bool
jfk_reader_memory_view_available_p(const VALUE obj)
{
return true;
}
static const rb_memory_view_entry_t jfk_reader_view_entry = {
jfk_reader_get_memory_view,
jfk_reader_release_memory_view,
jfk_reader_memory_view_available_p
};
void Init_jfk_reader(void)
{
VALUE cJFKReader = rb_define_class("JFKReader", rb_cObject);
rb_memory_view_register(cJFKReader, &jfk_reader_view_entry);
rb_define_method(cJFKReader, "initialize", jfk_reader_initialize, 1);
}
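`jfk_reader_get_memory_view` skips a fixed 78-byte header, reads 176000 signed 16-bit samples (presumably 11 s of 16 kHz mono audio), and divides each by 32768 to get floats in [-1.0, 1.0). The sketch below does the same conversion in pure Ruby with `String#unpack("s<*")` (signed 16-bit, little-endian). The helper name is hypothetical. The 78-byte offset is specific to `samples/jfk.wav`, not a general WAV property.

```ruby
# Convert little-endian signed 16-bit PCM into floats in [-1.0, 1.0),
# mirroring the `samples[i] / 32768.0` loop in jfk_reader.c.
# header_bytes is the number of leading bytes to skip (jfk_reader.c uses 78).
def pcm16le_to_floats(bytes, header_bytes: 0)
  return [] if bytes.bytesize <= header_bytes
  payload = bytes.byteslice(header_bytes, bytes.bytesize - header_bytes)
  # Ignore a trailing odd byte so only whole 16-bit samples are decoded.
  payload = payload.byteslice(0, payload.bytesize - payload.bytesize % 2)
  payload.unpack("s<*").map { |s| s / 32768.0 }
end

raw = [0, 16_384, -32_768, 32_767].pack("s<*")
p pcm16le_to_floats(raw) # => [0.0, 0.5, -1.0, 0.999969482421875]
```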

@ -0,0 +1,160 @@
require_relative "helper"
class TestCallback < TestBase
def setup
GC.start
@params = Whisper::Params.new
@whisper = Whisper::Context.new("base.en")
@audio = File.join(AUDIO)
end
def test_new_segment_callback
@params.new_segment_callback = ->(context, state, n_new, user_data) {
assert_kind_of Integer, n_new
assert n_new > 0
assert_same @whisper, context
n_segments = context.full_n_segments
n_new.times do |i|
i_segment = n_segments - n_new + i
start_time = context.full_get_segment_t0(i_segment) * 10
end_time = context.full_get_segment_t1(i_segment) * 10
text = context.full_get_segment_text(i_segment)
assert_kind_of Integer, start_time
assert start_time >= 0
assert_kind_of Integer, end_time
assert end_time > 0
assert_match(/ask not what your country can do for you, ask what you can do for your country/, text) if i_segment == 0
end
}
@whisper.transcribe(@audio, @params)
end
def test_new_segment_callback_closure
search_word = "what"
@params.new_segment_callback = ->(context, state, n_new, user_data) {
n_segments = context.full_n_segments
n_new.times do |i|
i_segment = n_segments - n_new + i
text = context.full_get_segment_text(i_segment)
if text.include?(search_word)
t0 = context.full_get_segment_t0(i_segment)
t1 = context.full_get_segment_t1(i_segment)
raise "search word '#{search_word}' found between #{t0} and #{t1}"
end
end
}
assert_raise RuntimeError do
@whisper.transcribe(@audio, @params)
end
end
def test_new_segment_callback_user_data
udata = Object.new
@params.new_segment_callback_user_data = udata
@params.new_segment_callback = ->(context, state, n_new, user_data) {
assert_same udata, user_data
}
@whisper.transcribe(@audio, @params)
end
def test_new_segment_callback_user_data_gc
@params.new_segment_callback_user_data = "My user data"
@params.new_segment_callback = ->(context, state, n_new, user_data) {
assert_equal "My user data", user_data
}
GC.start
assert_same @whisper, @whisper.transcribe(@audio, @params)
end
def test_progress_callback
first = nil
last = nil
@params.progress_callback = ->(context, state, progress, user_data) {
assert_kind_of Integer, progress
assert 0 <= progress && progress <= 100
assert_same @whisper, context
first = progress if first.nil?
last = progress
}
@whisper.transcribe(@audio, @params)
assert_equal 0, first
assert_equal 100, last
end
def test_progress_callback_user_data
udata = Object.new
@params.progress_callback_user_data = udata
@params.progress_callback = ->(context, state, progress, user_data) {
assert_same udata, user_data
}
@whisper.transcribe(@audio, @params)
end
def test_on_progress
first = nil
last = nil
@params.on_progress do |progress|
assert_kind_of Integer, progress
assert 0 <= progress && progress <= 100
first = progress if first.nil?
last = progress
end
@whisper.transcribe(@audio, @params)
assert_equal 0, first
assert_equal 100, last
end
def test_abort_callback
i = 0
@params.abort_callback = ->(user_data) {
assert_nil user_data
i += 1
return false
}
@whisper.transcribe(@audio, @params)
assert i > 0
end
def test_abort_callback_abort
i = 0
@params.abort_callback = ->(user_data) {
i += 1
return i == 3
}
@whisper.transcribe(@audio, @params)
assert_equal 3, i
end
def test_abort_callback_user_data
udata = Object.new
@params.abort_callback_user_data = udata
yielded = nil
@params.abort_callback = ->(user_data) {
yielded = user_data
}
@whisper.transcribe(@audio, @params)
assert_same udata, yielded
end
def test_abort_on
do_abort = false
_aborted_from_callback = false
@params.on_new_segment do |segment|
do_abort = true if segment.text.match?(/ask/)
end
i = 0
@params.abort_on do
i += 1
do_abort
end
@whisper.transcribe(@audio, @params)
assert i > 0
end
end
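The new-segment callback receives `n_new`, the number of segments added since the previous call, while `full_n_segments` reports the running total. The new segments therefore occupy the last `n_new` indices. Below is a minimal pure-Ruby sketch of that index arithmetic. `new_segment_indices` is a hypothetical helper, not a gem method.

```ruby
# Indices of the segments that are new in this callback invocation,
# given the total segment count and the number of new segments.
def new_segment_indices(n_segments, n_new)
  raise ArgumentError, "n_new out of range" unless n_new.between?(0, n_segments)
  ((n_segments - n_new)...n_segments).to_a
end

p new_segment_indices(5, 2) # => [3, 4]
```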

@ -0,0 +1,20 @@
require_relative "helper"
class TestError < TestBase
def test_error
error = Whisper::Error.new(-2)
assert_equal "failed to compute log mel spectrogram", error.message
assert_equal(-2, error.code)
end
def test_unknown_error
error = Whisper::Error.new(-20)
assert_equal "unknown error", error.message
end
def test_non_int_code
assert_raise TypeError do
_error = Whisper::Error.new("non int")
end
end
end
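The error tests above expect three things from `Whisper::Error`: known codes map to fixed messages, unknown codes fall back to "unknown error", and non-Integer codes raise `TypeError`. The following is a pure-Ruby sketch of that contract using a hypothetical `ExampleWhisperError` class. Only the `-2` message comes from the tests; the gem's actual code table is not reproduced here.

```ruby
# Hypothetical stand-in for an error class that maps integer codes to messages.
# Only the -2 message is taken from the tests; other codes fall back.
class ExampleWhisperError < StandardError
  MESSAGES = {
    -2 => "failed to compute log mel spectrogram",
  }.freeze

  attr_reader :code

  def initialize(code)
    raise TypeError, "code must be an Integer" unless code.is_a?(Integer)
    @code = code
    super(MESSAGES.fetch(code, "unknown error"))
  end
end

err = ExampleWhisperError.new(-2)
puts err.message # prints failed to compute log mel spectrogram
```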

@ -0,0 +1,109 @@
require_relative "helper"
require "pathname"
class TestModel < TestBase
def test_model
whisper = Whisper::Context.new("base.en")
assert_instance_of Whisper::Model, whisper.model
end
def test_attributes
whisper = Whisper::Context.new("base.en")
model = whisper.model
assert_equal 51864, model.n_vocab
assert_equal 1500, model.n_audio_ctx
assert_equal 512, model.n_audio_state
assert_equal 8, model.n_audio_head
assert_equal 6, model.n_audio_layer
assert_equal 448, model.n_text_ctx
assert_equal 512, model.n_text_state
assert_equal 8, model.n_text_head
assert_equal 6, model.n_text_layer
assert_equal 80, model.n_mels
assert_equal 1, model.ftype
assert_equal "base", model.type
end
def test_gc
model = Whisper::Context.new("base.en").model
GC.start
assert_equal 51864, model.n_vocab
assert_equal 1500, model.n_audio_ctx
assert_equal 512, model.n_audio_state
assert_equal 8, model.n_audio_head
assert_equal 6, model.n_audio_layer
assert_equal 448, model.n_text_ctx
assert_equal 512, model.n_text_state
assert_equal 8, model.n_text_head
assert_equal 6, model.n_text_layer
assert_equal 80, model.n_mels
assert_equal 1, model.ftype
assert_equal "base", model.type
end
def test_pathname
path = Pathname(Whisper::Model.pre_converted_models["base.en"].to_path)
whisper = Whisper::Context.new(path)
model = whisper.model
assert_equal 51864, model.n_vocab
assert_equal 1500, model.n_audio_ctx
assert_equal 512, model.n_audio_state
assert_equal 8, model.n_audio_head
assert_equal 6, model.n_audio_layer
assert_equal 448, model.n_text_ctx
assert_equal 512, model.n_text_state
assert_equal 8, model.n_text_head
assert_equal 6, model.n_text_layer
assert_equal 80, model.n_mels
assert_equal 1, model.ftype
assert_equal "base", model.type
end
def test_auto_download
path = Whisper::Model.pre_converted_models["base.en"].to_path
assert_path_exist path
assert_equal 147964211, File.size(path)
end
def test_uri_string
path = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"
whisper = Whisper::Context.new(path)
model = whisper.model
assert_equal 51864, model.n_vocab
assert_equal 1500, model.n_audio_ctx
assert_equal 512, model.n_audio_state
assert_equal 8, model.n_audio_head
assert_equal 6, model.n_audio_layer
assert_equal 448, model.n_text_ctx
assert_equal 512, model.n_text_state
assert_equal 8, model.n_text_head
assert_equal 6, model.n_text_layer
assert_equal 80, model.n_mels
assert_equal 1, model.ftype
assert_equal "base", model.type
end
def test_uri
path = URI("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin")
whisper = Whisper::Context.new(path)
model = whisper.model
assert_equal 51864, model.n_vocab
assert_equal 1500, model.n_audio_ctx
assert_equal 512, model.n_audio_state
assert_equal 8, model.n_audio_head
assert_equal 6, model.n_audio_layer
assert_equal 448, model.n_text_ctx
assert_equal 512, model.n_text_state
assert_equal 8, model.n_text_head
assert_equal 6, model.n_text_layer
assert_equal 80, model.n_mels
assert_equal 1, model.ftype
assert_equal "base", model.type
end
end
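The model tests above repeat the same twelve `base.en` assertions for each way of loading the model. A table-driven helper is one way to keep those expectations in one place. Below is a sketch that returns every mismatch as `{attribute => [expected, actual]}`. `FakeModel` is a `Struct` stand-in so the example runs without the gem. Inside a test, `assert_empty model_mismatches(whisper.model)` would be the assumed usage.

```ruby
# Expected hyperparameters for the base.en model, as asserted in the tests above.
BASE_EN_EXPECTED = {
  n_vocab: 51864, n_audio_ctx: 1500, n_audio_state: 512, n_audio_head: 8,
  n_audio_layer: 6, n_text_ctx: 448, n_text_state: 512, n_text_head: 8,
  n_text_layer: 6, n_mels: 80, ftype: 1, type: "base"
}.freeze

# Returns {attribute => [expected, actual]} for every attribute that differs.
def model_mismatches(model, expected = BASE_EN_EXPECTED)
  expected.each_with_object({}) do |(name, want), diff|
    got = model.public_send(name)
    diff[name] = [want, got] unless got == want
  end
end

# Stand-in for Whisper::Model so the sketch runs without the gem.
FakeModel = Struct.new(*BASE_EN_EXPECTED.keys, keyword_init: true)

p model_mismatches(FakeModel.new(**BASE_EN_EXPECTED)) # => {}
```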

@ -0,0 +1,31 @@
require_relative "helper"
require 'tempfile'
require 'tmpdir'
require 'shellwords'
class TestPackage < TestBase
def test_build
Tempfile.create do |file|
assert system("gem", "build", "whispercpp.gemspec", "--output", file.to_path, exception: true)
assert file.size > 0
assert_path_exist file.to_path
end
end
sub_test_case "Building binary on installation" do
def setup
system "rake", "build", exception: true
end
def test_install
match_data = `rake -Tbuild`.match(/(whispercpp-(.+)\.gem)/)
filename = match_data[1]
version = match_data[2]
basename = "whisper.#{RbConfig::CONFIG["DLEXT"]}"
Dir.mktmpdir do |dir|
system "gem", "install", "--install-dir", dir, "--no-document", "pkg/#{filename}", exception: true
assert_path_exist File.join(dir, "gems/whispercpp-#{version}/lib", basename)
end
end
end
end
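`test_install` extracts the gem file name and version from `rake -Tbuild` output with `/(whispercpp-(.+)\.gem)/` and indexes the match without checking for `nil`. Below is a slightly stricter sketch. It uses a narrower, non-greedy version pattern, requires a word boundary after `.gem` (so `.gemspec` does not match), and raises a clear error when nothing matches. The sample line is illustrative only, because the real output depends on the Rake tasks defined.

```ruby
# Extract the gem file name and version from `rake -T build` style output.
GEM_FILE_PATTERN = /(whispercpp-([0-9A-Za-z.\-]+?)\.gem)\b/

def gem_file_and_version(rake_output)
  match = rake_output.match(GEM_FILE_PATTERN)
  raise ArgumentError, "no whispercpp-*.gem in output" unless match
  [match[1], match[2]]
end

sample = "rake build  # Build whispercpp-1.3.1.gem into the pkg directory"
p gem_file_and_version(sample) # => ["whispercpp-1.3.1.gem", "1.3.1"]
```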

@ -0,0 +1,246 @@
require_relative "helper"
class TestParams < TestBase
PARAM_NAMES = [
:language,
:translate,
:no_context,
:single_segment,
:print_special,
:print_progress,
:print_realtime,
:print_timestamps,
:suppress_blank,
:suppress_nst,
:token_timestamps,
:split_on_word,
:initial_prompt,
:diarize,
:offset,
:duration,
:max_text_tokens,
:temperature,
:max_initial_ts,
:length_penalty,
:temperature_inc,
:entropy_thold,
:logprob_thold,
:no_speech_thold,
:new_segment_callback,
:new_segment_callback_user_data,
:progress_callback,
:progress_callback_user_data,
:abort_callback,
:abort_callback_user_data,
]
def setup
@params = Whisper::Params.new
end
def test_language
@params.language = "en"
assert_equal "en", @params.language
@params.language = "auto"
assert_equal "auto", @params.language
end
def test_offset
@params.offset = 10_000
assert_equal 10_000, @params.offset
@params.offset = 0
assert_equal 0, @params.offset
end
def test_duration
@params.duration = 60_000
assert_equal 60_000, @params.duration
@params.duration = 0
assert_equal 0, @params.duration
end
def test_max_text_tokens
@params.max_text_tokens = 300
assert_equal 300, @params.max_text_tokens
@params.max_text_tokens = 0
assert_equal 0, @params.max_text_tokens
def test_translate
@params.translate = true
assert @params.translate
@params.translate = false
assert !@params.translate
end
def test_no_context
@params.no_context = true
assert @params.no_context
@params.no_context = false
assert !@params.no_context
end
def test_single_segment
@params.single_segment = true
assert @params.single_segment
@params.single_segment = false
assert !@params.single_segment
end
def test_print_special
@params.print_special = true
assert @params.print_special
@params.print_special = false
assert !@params.print_special
end
def test_print_progress
@params.print_progress = true
assert @params.print_progress
@params.print_progress = false
assert !@params.print_progress
end
def test_print_realtime
@params.print_realtime = true
assert @params.print_realtime
@params.print_realtime = false
assert !@params.print_realtime
end
def test_print_timestamps
@params.print_timestamps = true
assert @params.print_timestamps
@params.print_timestamps = false
assert !@params.print_timestamps
end
def test_suppress_blank
@params.suppress_blank = true
assert @params.suppress_blank
@params.suppress_blank = false
assert !@params.suppress_blank
end
def test_suppress_nst
@params.suppress_nst = true
assert @params.suppress_nst
@params.suppress_nst = false
assert !@params.suppress_nst
end
def test_token_timestamps
@params.token_timestamps = true
assert @params.token_timestamps
@params.token_timestamps = false
assert !@params.token_timestamps
end
def test_split_on_word
@params.split_on_word = true
assert @params.split_on_word
@params.split_on_word = false
assert !@params.split_on_word
end
def test_initial_prompt
assert_nil @params.initial_prompt
@params.initial_prompt = "You are a polite person."
assert_equal "You are a polite person.", @params.initial_prompt
end
def test_temperature
assert_equal 0.0, @params.temperature
@params.temperature = 0.5
assert_equal 0.5, @params.temperature
end
def test_max_initial_ts
assert_equal 1.0, @params.max_initial_ts
@params.max_initial_ts = 600.0
assert_equal 600.0, @params.max_initial_ts
end
def test_length_penalty
assert_equal(-1.0, @params.length_penalty)
@params.length_penalty = 0.5
assert_equal 0.5, @params.length_penalty
end
def test_temperature_inc
assert_in_delta 0.2, @params.temperature_inc
@params.temperature_inc = 0.5
assert_in_delta 0.5, @params.temperature_inc
end
def test_entropy_thold
assert_in_delta 2.4, @params.entropy_thold
@params.entropy_thold = 3.0
assert_in_delta 3.0, @params.entropy_thold
end
def test_logprob_thold
assert_in_delta(-1.0, @params.logprob_thold)
@params.logprob_thold = -0.5
assert_in_delta(-0.5, @params.logprob_thold)
end
def test_no_speech_thold
assert_in_delta 0.6, @params.no_speech_thold
@params.no_speech_thold = 0.2
assert_in_delta 0.2, @params.no_speech_thold
end
def test_new_with_kw_args
params = Whisper::Params.new(language: "es")
assert_equal "es", params.language
assert_equal 1.0, params.max_initial_ts
end
def test_new_with_kw_args_non_existent
assert_raise ArgumentError do
Whisper::Params.new(non_existent: "value")
end
end
def test_new_with_kw_args_wrong_type
assert_raise TypeError do
Whisper::Params.new(language: 3)
end
end
data(PARAM_NAMES.collect {|param| [param, param]}.to_h)
def test_new_with_kw_args_default_values(param)
default_value = @params.send(param)
value = case [param, default_value]
in [*, true | false]
!default_value
in [*, Integer | Float]
default_value + 1
in [:language, *]
"es"
in [:initial_prompt, *]
"Initial prompt"
in [/_callback\Z/, *]
proc {}
in [/_user_data\Z/, *]
Object.new
end
params = Whisper::Params.new(param => value)
if Float === value
assert_in_delta value, params.send(param)
else
assert_equal value, params.send(param)
end
PARAM_NAMES.reject {|name| name == param}.each do |name|
expected = @params.send(name)
actual = params.send(name)
if Float === expected
assert_in_delta expected, actual
else
assert_equal expected, actual
end
end
end
end
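The keyword-argument tests above expect `Whisper::Params.new` to override defaults for the keywords given, reject unknown keywords with `ArgumentError`, and reject wrongly typed values with `TypeError`. Below is a pure-Ruby sketch of that contract using a hypothetical `ExampleParams` class with a small, illustrative subset of parameters. It is not the binding's implementation, which lives in the C++ extension.

```ruby
# Hypothetical parameter object: keyword arguments override defaults,
# unknown keys raise ArgumentError and wrongly typed values raise TypeError.
class ExampleParams
  DEFAULTS = { language: "en", temperature: 0.0, max_initial_ts: 1.0, translate: false }.freeze
  TYPES = {
    language: String, temperature: Float, max_initial_ts: Float,
    translate: [TrueClass, FalseClass]
  }.freeze

  attr_reader(*DEFAULTS.keys)

  def initialize(**kwargs)
    unknown = kwargs.keys - DEFAULTS.keys
    raise ArgumentError, "unknown keyword(s): #{unknown.join(', ')}" unless unknown.empty?
    DEFAULTS.merge(kwargs).each do |name, value|
      allowed = Array(TYPES.fetch(name))
      unless allowed.any? { |klass| value.is_a?(klass) }
        raise TypeError, "#{name} must be #{allowed.join(' or ')}, got #{value.class}"
      end
      instance_variable_set("@#{name}", value)
    end
  end
end

params = ExampleParams.new(language: "es")
p [params.language, params.max_initial_ts] # => ["es", 1.0]
```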

@ -0,0 +1,74 @@
require_relative "helper"
class TestSegment < TestBase
def test_iteration
whisper.each_segment do |segment|
assert_instance_of Whisper::Segment, segment
end
end
def test_enumerator
enum = whisper.each_segment
assert_instance_of Enumerator, enum
enum.to_a.each_with_index do |segment, index|
assert_instance_of Whisper::Segment, segment
assert_kind_of Integer, index
end
end
def test_start_time
i = 0
whisper.each_segment do |segment|
assert_equal 0, segment.start_time if i == 0
i += 1
end
end
def test_end_time
i = 0
whisper.each_segment do |segment|
assert_equal whisper.full_get_segment_t1(i) * 10, segment.end_time
i += 1
end
end
def test_no_speech_prob
no_speech_prob = nil
whisper.each_segment do |segment|
no_speech_prob = segment.no_speech_prob
end
assert no_speech_prob > 0.0
end
def test_on_new_segment
params = Whisper::Params.new
seg = nil
index = 0
params.on_new_segment do |segment|
assert_instance_of Whisper::Segment, segment
if index == 0
seg = segment
assert_equal 0, segment.start_time
assert_match(/ask not what your country can do for you, ask what you can do for your country/, segment.text)
end
index += 1
end
whisper.transcribe(AUDIO, params)
assert_equal 0, seg.start_time
assert_match(/ask not what your country can do for you, ask what you can do for your country/, seg.text)
end
def test_on_new_segment_twice
params = Whisper::Params.new
seg = nil
params.on_new_segment do |segment|
seg = segment
end
params.on_new_segment do |segment|
assert_same seg, segment
end
whisper.transcribe(AUDIO, params)
end
end
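The segment tests rely on two Ruby idioms. First, `each_segment` returns an `Enumerator` when called without a block. Second, every handler registered with `on_new_segment` receives the same segment object. Below is a pure-Ruby sketch of both, using hypothetical `ExampleTranscript` and `ExampleSegment` classes rather than the gem's classes. Inside such a handler, `next` leaves only the block. By contrast, `return` would try to return from the method in which the block was written.

```ruby
# Hypothetical classes illustrating the idioms; not the whispercpp API.
ExampleSegment = Struct.new(:start_time, :end_time, :text)

class ExampleTranscript
  def initialize
    @segments = []
    @handlers = []
  end

  # Register a handler; all handlers are called with the same segment object.
  def on_new_segment(&handler)
    @handlers << handler
    self
  end

  def add(segment)
    @segments << segment
    @handlers.each { |h| h.call(segment) }
    self
  end

  # Without a block, return an Enumerator (the enum_for idiom).
  def each_segment
    return enum_for(:each_segment) unless block_given?
    @segments.each { |s| yield s }
    self
  end
end

transcript = ExampleTranscript.new
transcript.on_new_segment do |seg|
  next if seg.text.empty? # `next` leaves only this block
  puts seg.text
end
transcript.add(ExampleSegment.new(0, 1_100, "hello"))
p transcript.each_segment.map(&:end_time) # => [1100]
```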

@ -1,138 +1,226 @@
TOPDIR = File.expand_path(File.join(File.dirname(__FILE__), '..'))
EXTDIR = File.join(TOPDIR, 'ext')
#$LIBDIR = File.join(TOPDIR, 'lib')
#$:.unshift(LIBDIR)
$:.unshift(EXTDIR)
require_relative "helper"
require "stringio"
require "etc"
require 'whisper'
require 'test/unit'
# Exists to detect memory-related bug
Whisper.log_set ->(level, buffer, user_data) {}, nil
class TestWhisper < Test::Unit::TestCase
class TestWhisper < TestBase
def setup
@params = Whisper::Params.new
end
def test_language
@params.language = "en"
assert_equal @params.language, "en"
@params.language = "auto"
assert_equal @params.language, "auto"
end
def test_offset
@params.offset = 10_000
assert_equal @params.offset, 10_000
@params.offset = 0
assert_equal @params.offset, 0
end
def test_duration
@params.duration = 60_000
assert_equal @params.duration, 60_000
@params.duration = 0
assert_equal @params.duration, 0
end
def test_max_text_tokens
@params.max_text_tokens = 300
assert_equal @params.max_text_tokens, 300
@params.max_text_tokens = 0
assert_equal @params.max_text_tokens, 0
end
def test_translate
@params.translate = true
assert @params.translate
@params.translate = false
assert !@params.translate
end
def test_no_context
@params.no_context = true
assert @params.no_context
@params.no_context = false
assert !@params.no_context
end
def test_single_segment
@params.single_segment = true
assert @params.single_segment
@params.single_segment = false
assert !@params.single_segment
end
def test_print_special
@params.print_special = true
assert @params.print_special
@params.print_special = false
assert !@params.print_special
end
def test_print_progress
@params.print_progress = true
assert @params.print_progress
@params.print_progress = false
assert !@params.print_progress
end
def test_print_realtime
@params.print_realtime = true
assert @params.print_realtime
@params.print_realtime = false
assert !@params.print_realtime
end
def test_print_timestamps
@params.print_timestamps = true
assert @params.print_timestamps
@params.print_timestamps = false
assert !@params.print_timestamps
end
def test_suppress_blank
@params.suppress_blank = true
assert @params.suppress_blank
@params.suppress_blank = false
assert !@params.suppress_blank
end
def test_suppress_non_speech_tokens
@params.suppress_non_speech_tokens = true
assert @params.suppress_non_speech_tokens
@params.suppress_non_speech_tokens = false
assert !@params.suppress_non_speech_tokens
end
def test_token_timestamps
@params.token_timestamps = true
assert @params.token_timestamps
@params.token_timestamps = false
assert !@params.token_timestamps
end
def test_split_on_word
@params.split_on_word = true
assert @params.split_on_word
@params.split_on_word = false
assert !@params.split_on_word
end
def test_speed_up
@params.speed_up = true
assert @params.speed_up
@params.speed_up = false
assert !@params.speed_up
end
def test_whisper
@whisper = Whisper::Context.new(File.join(TOPDIR, '..', '..', 'models', 'ggml-base.en.bin'))
@whisper = Whisper::Context.new("base.en")
params = Whisper::Params.new
params.print_timestamps = false
jfk = File.join(TOPDIR, '..', '..', 'samples', 'jfk.wav')
@whisper.transcribe(jfk, params) {|text|
assert_match /ask not what your country can do for you, ask what you can do for your country/, text
@whisper.transcribe(AUDIO, params) {|text|
assert_match(/ask not what your country can do for you, ask what you can do for your country/, text)
}
end
sub_test_case "After transcription" do
def test_full_n_segments
assert_equal 1, whisper.full_n_segments
end
def test_full_lang_id
assert_equal 0, whisper.full_lang_id
end
def test_full_get_segment
segment = whisper.full_get_segment(0)
assert_equal 0, segment.start_time
assert_match(/ask not what your country can do for you, ask what you can do for your country/, segment.text)
end
def test_full_get_segment_t0
assert_equal 0, whisper.full_get_segment_t0(0)
assert_raise IndexError do
whisper.full_get_segment_t0(whisper.full_n_segments)
end
assert_raise IndexError do
whisper.full_get_segment_t0(-1)
end
end
def test_full_get_segment_t1
t1 = whisper.full_get_segment_t1(0)
assert_kind_of Integer, t1
assert t1 > 0
assert_raise IndexError do
whisper.full_get_segment_t1(whisper.full_n_segments)
end
end
def test_full_get_segment_speaker_turn_next
assert_false whisper.full_get_segment_speaker_turn_next(0)
end
def test_full_get_segment_text
assert_match(/ask not what your country can do for you, ask what you can do for your country/, whisper.full_get_segment_text(0))
end
def test_full_get_segment_no_speech_prob
prob = whisper.full_get_segment_no_speech_prob(0)
assert prob > 0.0
assert prob < 1.0
end
end
def test_lang_max_id
assert_kind_of Integer, Whisper.lang_max_id
end
def test_lang_id
assert_equal 0, Whisper.lang_id("en")
assert_raise ArgumentError do
Whisper.lang_id("non existing language")
end
end
def test_lang_str
assert_equal "en", Whisper.lang_str(0)
assert_raise IndexError do
Whisper.lang_str(Whisper.lang_max_id + 1)
end
end
def test_lang_str_full
assert_equal "english", Whisper.lang_str_full(0)
assert_raise IndexError do
Whisper.lang_str_full(Whisper.lang_max_id + 1)
end
end
def test_log_set
user_data = Object.new
logs = []
log_callback = ->(level, buffer, udata) {
logs << [level, buffer, udata]
}
Whisper.log_set log_callback, user_data
Whisper::Context.new("base.en")
assert logs.length > 30
logs.each do |log|
assert_include [Whisper::LOG_LEVEL_DEBUG, Whisper::LOG_LEVEL_INFO, Whisper::LOG_LEVEL_WARN], log[0]
assert_same user_data, log[2]
end
end
def test_log_suppress
stderr = $stderr
Whisper.log_set ->(level, buffer, user_data) {
# do nothing
}, nil
dev = StringIO.new("")
$stderr = dev
Whisper::Context.new("base.en")
assert_empty dev.string
ensure
$stderr = stderr
end
sub_test_case "full" do
def setup
super
@whisper = Whisper::Context.new("base.en")
@samples = File.read(AUDIO, nil, 78).unpack("s<*").collect {|i| i.to_f / 2**15}
end
def test_full
@whisper.full(@params, @samples, @samples.length)
assert_equal 1, @whisper.full_n_segments
assert_match(/ask not what your country can do for you, ask what you can do for your country/, @whisper.each_segment.first.text)
end
def test_full_without_length
@whisper.full(@params, @samples)
assert_equal 1, @whisper.full_n_segments
assert_match(/ask not what your country can do for you, ask what you can do for your country/, @whisper.each_segment.first.text)
end
def test_full_enumerator
samples = @samples.each
@whisper.full(@params, samples, @samples.length)
assert_equal 1, @whisper.full_n_segments
assert_match(/ask not what your country can do for you, ask what you can do for your country/, @whisper.each_segment.first.text)
end
def test_full_enumerator_without_length
samples = @samples.each
assert_raise ArgumentError do
@whisper.full(@params, samples)
end
end
def test_full_enumerator_with_too_large_length
samples = @samples.each.take(10).to_enum
assert_raise StopIteration do
@whisper.full(@params, samples, 11)
end
end
def test_full_with_memory_view
samples = JFKReader.new(AUDIO)
@whisper.full(@params, samples)
assert_equal 1, @whisper.full_n_segments
assert_match(/ask not what your country can do for you, ask what you can do for your country/, @whisper.each_segment.first.text)
end
def test_full_parallel
nprocessors = 2
@whisper.full_parallel(@params, @samples, @samples.length, nprocessors)
assert_equal nprocessors, @whisper.full_n_segments
text = @whisper.each_segment.collect(&:text).join
assert_match(/ask what you can do/i, text)
assert_match(/for your country/i, text)
end
def test_full_parallel_with_memory_view
nprocessors = 2
samples = JFKReader.new(AUDIO)
@whisper.full_parallel(@params, samples, nil, nprocessors)
assert_equal nprocessors, @whisper.full_n_segments
text = @whisper.each_segment.collect(&:text).join
assert_match(/ask what you can do/i, text)
assert_match(/for your country/i, text)
end
def test_full_parallel_without_length_and_n_processors
@whisper.full_parallel(@params, @samples)
assert_equal 1, @whisper.full_n_segments
text = @whisper.each_segment.collect(&:text).join
assert_match(/ask what you can do/i, text)
assert_match(/for your country/i, text)
end
def test_full_parallel_without_length
nprocessors = 2
@whisper.full_parallel(@params, @samples, nil, nprocessors)
assert_equal nprocessors, @whisper.full_n_segments
text = @whisper.each_segment.collect(&:text).join
assert_match(/ask what you can do/i, text)
assert_match(/for your country/i, text)
end
def test_full_parallel_without_n_processors
@whisper.full_parallel(@params, @samples, @samples.length)
assert_equal 1, @whisper.full_n_segments
text = @whisper.each_segment.collect(&:text).join
assert_match(/ask what you can do/i, text)
assert_match(/for your country/i, text)
end
end
end
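The `full` tests pin down how sample sources are consumed. An `Array` may omit the length, an `Enumerator` must supply it (otherwise `ArgumentError`), and an enumerator that runs out early raises `StopIteration`. Below is a pure-Ruby sketch of that behaviour using a hypothetical `collect_samples` helper. Rejecting an array length larger than the array is an added assumption; the tests do not cover that case.

```ruby
# Hypothetical helper mirroring the behaviour the `full` tests expect.
def collect_samples(samples, n_samples = nil)
  case samples
  when Array
    n = n_samples || samples.length
    # Assumption: reject a length larger than the array instead of truncating.
    raise ArgumentError, "n_samples exceeds array length" if n > samples.length
    samples.first(n)
  when Enumerator
    raise ArgumentError, "n_samples is required for an Enumerator" if n_samples.nil?
    # Enumerator#next raises StopIteration if the source runs out early.
    Array.new(n_samples) { samples.next }
  else
    raise TypeError, "unsupported sample source: #{samples.class}"
  end
end

p collect_samples([0.1, 0.2, 0.3].each, 2) # => [0.1, 0.2]
```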

@ -1,28 +1,36 @@
require_relative "extsources"
Gem::Specification.new do |s|
s.name = "whispercpp"
s.authors = ["Georgi Gerganov", "Todd A. Fisher"]
s.version = '1.3.0'
s.date = '2024-05-14'
s.version = '1.3.1'
s.date = '2024-12-19'
s.description = %q{High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model via Ruby}
s.email = 'todd.fisher@gmail.com'
s.extra_rdoc_files = ['LICENSE', 'README.md']
s.files = ["LICENSE", "README.md", "Rakefile", "ext/extconf.rb", "ext/ggml.c", "ext/ruby_whisper.cpp", "ext/whisper.cpp", "ext/dr_wav.h", "ext/ggml.h", "ext/ruby_whisper.h", "ext/whisper.h"]
#### Load-time details
s.require_paths = ['lib','ext']
s.files = `git ls-files . -z`.split("\x0") +
EXTSOURCES.collect {|file|
basename = File.basename(file)
if s.extra_rdoc_files.include?(basename)
basename
else
file.sub("../..", "ext")
end
}
s.summary = %q{Ruby whisper.cpp bindings}
s.test_files = ["tests/test_whisper.rb"]
s.test_files = s.files.select {|file| file.start_with? "tests/"}
s.extensions << 'ext/extconf.rb'
s.required_ruby_version = '>= 3.1.0'
#### Documentation and testing.
s.homepage = 'https://github.com/ggerganov/whisper.cpp'
s.rdoc_options = ['--main', '../../README.md']
s.rdoc_options = ['--main', 'README.md']
s.platform = Gem::Platform::RUBY
s.licenses = ['MIT']
end
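The new gemspec builds `s.files` from `EXTSOURCES`. Top-level documentation files listed in `extra_rdoc_files` keep their basename. Every other path, presumably relative to the repository root via `../..`, is moved under `ext/`. Below is a pure-Ruby sketch of that mapping using a hypothetical `gem_path_for` helper. The example paths are illustrative, not the actual contents of `EXTSOURCES`.

```ruby
# Map a source path (e.g. "../../src/whisper.cpp") to its location inside the
# packaged gem, following the gemspec's logic: documentation files keep their
# basename, everything else is moved under ext/.
def gem_path_for(source, rdoc_files = ["LICENSE", "README.md"])
  basename = File.basename(source)
  rdoc_files.include?(basename) ? basename : source.sub("../..", "ext")
end

p gem_path_for("../../src/whisper.cpp") # => "ext/src/whisper.cpp"
```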

build-xcframework.sh Executable file

@ -0,0 +1,535 @@
#!/bin/bash
#
# Options
IOS_MIN_OS_VERSION=16.4
MACOS_MIN_OS_VERSION=13.3
VISIONOS_MIN_OS_VERSION=1.0
TVOS_MIN_OS_VERSION=16.4
BUILD_SHARED_LIBS=OFF
WHISPER_BUILD_EXAMPLES=OFF
WHISPER_BUILD_TESTS=OFF
WHISPER_BUILD_SERVER=OFF
GGML_METAL=ON
GGML_METAL_EMBED_LIBRARY=ON
GGML_BLAS_DEFAULT=ON
GGML_METAL_USE_BF16=ON
GGML_OPENMP=OFF
COMMON_C_FLAGS="-Wno-macro-redefined -Wno-shorten-64-to-32 -Wno-unused-command-line-argument -g"
COMMON_CXX_FLAGS="-Wno-macro-redefined -Wno-shorten-64-to-32 -Wno-unused-command-line-argument -g"
# Common options for all builds
COMMON_CMAKE_ARGS=(
-DCMAKE_XCODE_ATTRIBUTE_CODE_SIGNING_REQUIRED=NO
-DCMAKE_XCODE_ATTRIBUTE_CODE_SIGN_IDENTITY=""
-DCMAKE_XCODE_ATTRIBUTE_CODE_SIGNING_ALLOWED=NO
-DCMAKE_XCODE_ATTRIBUTE_DEBUG_INFORMATION_FORMAT="dwarf-with-dsym"
-DCMAKE_XCODE_ATTRIBUTE_GCC_GENERATE_DEBUGGING_SYMBOLS=YES
-DCMAKE_XCODE_ATTRIBUTE_COPY_PHASE_STRIP=NO
-DCMAKE_XCODE_ATTRIBUTE_STRIP_INSTALLED_PRODUCT=NO
-DCMAKE_XCODE_ATTRIBUTE_DEVELOPMENT_TEAM=ggml
-DBUILD_SHARED_LIBS=${BUILD_SHARED_LIBS}
-DWHISPER_BUILD_EXAMPLES=${WHISPER_BUILD_EXAMPLES}
-DWHISPER_BUILD_TESTS=${WHISPER_BUILD_TESTS}
-DWHISPER_BUILD_SERVER=${WHISPER_BUILD_SERVER}
-DGGML_METAL_EMBED_LIBRARY=${GGML_METAL_EMBED_LIBRARY}
-DGGML_BLAS_DEFAULT=${GGML_BLAS_DEFAULT}
-DGGML_METAL=${GGML_METAL}
-DGGML_METAL_USE_BF16=${GGML_METAL_USE_BF16}
-DGGML_NATIVE=OFF
-DGGML_OPENMP=${GGML_OPENMP}
)
check_required_tool() {
local tool=$1
local install_message=$2
if ! command -v "$tool" &> /dev/null; then
echo "Error: $tool is required but not found." >&2
echo "$install_message" >&2
exit 1
fi
}
echo "Checking for required tools..."
check_required_tool "cmake" "Please install CMake 3.28.0 or later (brew install cmake)"
check_required_tool "xcodebuild" "Please install Xcode and Xcode Command Line Tools (xcode-select --install)"
check_required_tool "libtool" "Please install libtool which should be available with Xcode Command Line Tools (CLT). Make sure Xcode CLT is installed (xcode-select --install)"
check_required_tool "dsymutil" "Please install Xcode and Xcode Command Line Tools (xcode-select --install)"
set -e
## Clean up previous builds
rm -rf build-apple
rm -rf build-ios-sim
rm -rf build-ios-device
rm -rf build-macos
rm -rf build-visionos
rm -rf build-visionos-sim
rm -rf build-tvos-sim
rm -rf build-tvos-device
# Setup the xcframework build directory structure
setup_framework_structure() {
local build_dir=$1
local min_os_version=$2
local platform=$3 # "ios", "macos", "visionos", or "tvos"
local framework_name="whisper"
echo "Creating ${platform}-style framework structure for ${build_dir}"
if [[ "$platform" == "macos" ]]; then
# macOS versioned structure uses versioned directories
mkdir -p ${build_dir}/framework/${framework_name}.framework/Versions/A/Headers
mkdir -p ${build_dir}/framework/${framework_name}.framework/Versions/A/Modules
mkdir -p ${build_dir}/framework/${framework_name}.framework/Versions/A/Resources
# Create symbolic links
ln -sf A ${build_dir}/framework/${framework_name}.framework/Versions/Current
ln -sf Versions/Current/Headers ${build_dir}/framework/${framework_name}.framework/Headers
ln -sf Versions/Current/Modules ${build_dir}/framework/${framework_name}.framework/Modules
ln -sf Versions/Current/Resources ${build_dir}/framework/${framework_name}.framework/Resources
ln -sf Versions/Current/${framework_name} ${build_dir}/framework/${framework_name}.framework/${framework_name}
# Set header and module paths
local header_path=${build_dir}/framework/${framework_name}.framework/Versions/A/Headers/
local module_path=${build_dir}/framework/${framework_name}.framework/Versions/A/Modules/
else
# iOS/VisionOS/tvOS use a flat structure
mkdir -p ${build_dir}/framework/${framework_name}.framework/Headers
mkdir -p ${build_dir}/framework/${framework_name}.framework/Modules
# Remove any existing structure to ensure clean build
rm -rf ${build_dir}/framework/${framework_name}.framework/Versions
# Set header and module paths
local header_path=${build_dir}/framework/${framework_name}.framework/Headers/
local module_path=${build_dir}/framework/${framework_name}.framework/Modules/
fi
# Copy all required headers (common for all platforms)
cp include/whisper.h ${header_path}
cp ggml/include/ggml.h ${header_path}
cp ggml/include/ggml-alloc.h ${header_path}
cp ggml/include/ggml-backend.h ${header_path}
cp ggml/include/ggml-metal.h ${header_path}
cp ggml/include/ggml-cpu.h ${header_path}
cp ggml/include/ggml-blas.h ${header_path}
cp ggml/include/gguf.h ${header_path}
# Create module map (common for all platforms)
cat > ${module_path}module.modulemap << EOF
framework module whisper {
header "whisper.h"
header "ggml.h"
header "ggml-alloc.h"
header "ggml-backend.h"
header "ggml-metal.h"
header "ggml-cpu.h"
header "ggml-blas.h"
header "gguf.h"
link "c++"
link framework "Accelerate"
link framework "Metal"
link framework "Foundation"
export *
}
EOF
# Platform-specific settings for Info.plist
local platform_name=""
local sdk_name=""
local supported_platform=""
case "$platform" in
"ios")
platform_name="iphoneos"
sdk_name="iphoneos${min_os_version}"
supported_platform="iPhoneOS"
local plist_path="${build_dir}/framework/${framework_name}.framework/Info.plist"
local device_family=' <key>UIDeviceFamily</key>
<array>
<integer>1</integer>
<integer>2</integer>
</array>'
;;
"macos")
platform_name="macosx"
sdk_name="macosx${min_os_version}"
supported_platform="MacOSX"
local plist_path="${build_dir}/framework/${framework_name}.framework/Versions/A/Resources/Info.plist"
local device_family=""
;;
"visionos")
platform_name="xros"
sdk_name="xros${min_os_version}"
supported_platform="XRPlatform"
local plist_path="${build_dir}/framework/${framework_name}.framework/Info.plist"
local device_family=""
;;
"tvos")
platform_name="appletvos"
sdk_name="appletvos${min_os_version}"
supported_platform="AppleTVOS"
local plist_path="${build_dir}/framework/${framework_name}.framework/Info.plist"
local device_family=' <key>UIDeviceFamily</key>
<array>
<integer>3</integer>
</array>'
;;
esac
# Create Info.plist
cat > ${plist_path} << EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleDevelopmentRegion</key>
<string>en</string>
<key>CFBundleExecutable</key>
<string>whisper</string>
<key>CFBundleIdentifier</key>
<string>org.ggml.whisper</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleName</key>
<string>whisper</string>
<key>CFBundlePackageType</key>
<string>FMWK</string>
<key>CFBundleShortVersionString</key>
<string>1.0</string>
<key>CFBundleVersion</key>
<string>1</string>
<key>MinimumOSVersion</key>
<string>${min_os_version}</string>
<key>CFBundleSupportedPlatforms</key>
<array>
<string>${supported_platform}</string>
</array>${device_family}
<key>DTPlatformName</key>
<string>${platform_name}</string>
<key>DTSDKName</key>
<string>${sdk_name}</string>
</dict>
</plist>
EOF
}
# Create dynamic libraries from static libraries.
combine_static_libraries() {
local build_dir="$1"
local release_dir="$2"
local platform="$3" # "ios", "macos", "visionos", or "tvos"
local is_simulator="$4"
local base_dir="$(pwd)"
local framework_name="whisper"
# Determine output path based on platform
local output_lib=""
if [[ "$platform" == "macos" ]]; then
# macOS uses versioned structure
output_lib="${build_dir}/framework/${framework_name}.framework/Versions/A/${framework_name}"
else
# iOS, visionOS, and tvOS use a flat directory structure
output_lib="${build_dir}/framework/${framework_name}.framework/${framework_name}"
fi
local libs=(
"${base_dir}/${build_dir}/src/${release_dir}/libwhisper.a"
"${base_dir}/${build_dir}/ggml/src/${release_dir}/libggml.a"
"${base_dir}/${build_dir}/ggml/src/${release_dir}/libggml-base.a"
"${base_dir}/${build_dir}/ggml/src/${release_dir}/libggml-cpu.a"
"${base_dir}/${build_dir}/ggml/src/ggml-metal/${release_dir}/libggml-metal.a"
"${base_dir}/${build_dir}/ggml/src/ggml-blas/${release_dir}/libggml-blas.a"
)
if [[ "$platform" == "macos" || "$platform" == "ios" ]]; then
echo "Adding libwhisper.coreml library to the build."
libs+=(
"${base_dir}/${build_dir}/src/${release_dir}/libwhisper.coreml.a"
)
fi
# Create temporary directory for processing
local temp_dir="${base_dir}/${build_dir}/temp"
echo "Creating temporary directory: ${temp_dir}"
mkdir -p "${temp_dir}"
# Since we have multiple architectures, libtool will find object files that do not
# match the target architecture. We suppress these warnings.
libtool -static -o "${temp_dir}/combined.a" "${libs[@]}" 2> /dev/null
# Determine SDK, architectures, and install_name based on platform and simulator flag.
local sdk=""
local archs=""
local min_version_flag=""
local install_name=""
local frameworks="-framework Foundation -framework Metal -framework Accelerate"
case "$platform" in
"ios")
if [[ "$is_simulator" == "true" ]]; then
sdk="iphonesimulator"
archs="arm64 x86_64"
min_version_flag="-mios-simulator-version-min=${IOS_MIN_OS_VERSION}"
else
sdk="iphoneos"
archs="arm64"
min_version_flag="-mios-version-min=${IOS_MIN_OS_VERSION}"
fi
install_name="@rpath/whisper.framework/whisper"
frameworks+=" -framework CoreML"
;;
"macos")
sdk="macosx"
archs="arm64 x86_64"
min_version_flag="-mmacosx-version-min=${MACOS_MIN_OS_VERSION}"
install_name="@rpath/whisper.framework/Versions/Current/whisper"
frameworks+=" -framework CoreML"
;;
"visionos")
if [[ "$is_simulator" == "true" ]]; then
sdk="xrsimulator"
archs="arm64 x86_64"
min_version_flag="-mtargetos=xros${VISIONOS_MIN_OS_VERSION}-simulator"
else
sdk="xros"
archs="arm64"
min_version_flag="-mtargetos=xros${VISIONOS_MIN_OS_VERSION}"
fi
# Use flat structure for visionOS, same as iOS
install_name="@rpath/whisper.framework/whisper"
;;
"tvos")
if [[ "$is_simulator" == "true" ]]; then
sdk="appletvsimulator"
archs="arm64 x86_64"
min_version_flag="-mtvos-simulator-version-min=${TVOS_MIN_OS_VERSION}"
else
sdk="appletvos"
archs="arm64"
min_version_flag="-mtvos-version-min=${TVOS_MIN_OS_VERSION}"
fi
install_name="@rpath/whisper.framework/whisper"
;;
esac
# Build architecture flags
local arch_flags=""
for arch in $archs; do
arch_flags+=" -arch $arch"
done
# Create dynamic library
echo "Creating dynamic library for ${platform}."
xcrun -sdk $sdk clang++ -dynamiclib \
-isysroot $(xcrun --sdk $sdk --show-sdk-path) \
$arch_flags \
$min_version_flag \
-Wl,-force_load,"${temp_dir}/combined.a" \
$frameworks \
-install_name "$install_name" \
-o "${base_dir}/${output_lib}"
# Platform-specific post-processing for device builds
if [[ "$is_simulator" == "false" ]]; then
if command -v vtool &>/dev/null; then
case "$platform" in
"ios")
echo "Marking binary as a framework binary for iOS..."
vtool -set-build-version ios ${IOS_MIN_OS_VERSION} ${IOS_MIN_OS_VERSION} -replace \
-output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
;;
"visionos")
echo "Marking binary as a framework binary for visionOS..."
vtool -set-build-version xros ${VISIONOS_MIN_OS_VERSION} ${VISIONOS_MIN_OS_VERSION} -replace \
-output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
;;
"tvos")
echo "Marking binary as a framework binary for tvOS..."
vtool -set-build-version tvos ${TVOS_MIN_OS_VERSION} ${TVOS_MIN_OS_VERSION} -replace \
-output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
;;
esac
else
echo "Warning: vtool not found. Binary may not pass App Store validation."
fi
fi
echo "Creating properly formatted dSYM..."
# Create a separate directory for dSYMs for all platforms
mkdir -p "${base_dir}/${build_dir}/dSYMs"
# iOS, visionOS, and tvOS style dSYM (flat structure)
if [[ "$platform" == "ios" || "$platform" == "visionos" || "$platform" == "tvos" ]]; then
# Generate dSYM in the dSYMs directory
xcrun dsymutil "${base_dir}/${output_lib}" -o "${base_dir}/${build_dir}/dSYMs/whisper.dSYM"
# Create a copy of the binary that will be stripped
cp "${base_dir}/${output_lib}" "${temp_dir}/binary_to_strip"
# Strip debug symbols from the copy
xcrun strip -S "${temp_dir}/binary_to_strip" -o "${temp_dir}/stripped_lib"
# Replace the original with the stripped version
mv "${temp_dir}/stripped_lib" "${base_dir}/${output_lib}"
else
# macOS style dSYM
# First write a stripped copy of the binary to a temporary file
xcrun strip -S "${base_dir}/${output_lib}" -o "${temp_dir}/stripped_lib"
# Generate dSYM in the dSYMs directory
xcrun dsymutil "${base_dir}/${output_lib}" -o "${base_dir}/${build_dir}/dSYMs/whisper.dSYM"
# Replace original binary with stripped version
mv "${temp_dir}/stripped_lib" "${base_dir}/${output_lib}"
fi
# Remove any automatically generated dSYM files in the framework structure as they will
# otherwise cause Invalid Bundle Structure validation errors.
if [ -d "${base_dir}/${output_lib}.dSYM" ]; then
echo "Removing generated dSYM file in framework structure: ${base_dir}/${output_lib}.dSYM"
rm -rf "${base_dir}/${output_lib}.dSYM"
fi
# Clean up
rm -rf "${temp_dir}"
}
echo "Building for iOS simulator..."
cmake -B build-ios-sim -G Xcode \
"${COMMON_CMAKE_ARGS[@]}" \
-DCMAKE_OSX_DEPLOYMENT_TARGET=${IOS_MIN_OS_VERSION} \
-DIOS=ON \
-DCMAKE_SYSTEM_NAME=iOS \
-DCMAKE_OSX_SYSROOT=iphonesimulator \
-DCMAKE_OSX_ARCHITECTURES="arm64;x86_64" \
-DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=iphonesimulator \
-DCMAKE_C_FLAGS="${COMMON_C_FLAGS}" \
-DCMAKE_CXX_FLAGS="${COMMON_CXX_FLAGS}" \
-DWHISPER_COREML="ON" \
-DWHISPER_COREML_ALLOW_FALLBACK="ON" \
-S .
cmake --build build-ios-sim --config Release -- -quiet
echo "Building for iOS devices..."
cmake -B build-ios-device -G Xcode \
"${COMMON_CMAKE_ARGS[@]}" \
-DCMAKE_OSX_DEPLOYMENT_TARGET=${IOS_MIN_OS_VERSION} \
-DCMAKE_OSX_SYSROOT=iphoneos \
-DCMAKE_OSX_ARCHITECTURES="arm64" \
-DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=iphoneos \
-DCMAKE_C_FLAGS="${COMMON_C_FLAGS}" \
-DCMAKE_CXX_FLAGS="${COMMON_CXX_FLAGS}" \
-DWHISPER_COREML="ON" \
-DWHISPER_COREML_ALLOW_FALLBACK="ON" \
-S .
cmake --build build-ios-device --config Release -- -quiet
echo "Building for macOS..."
cmake -B build-macos -G Xcode \
"${COMMON_CMAKE_ARGS[@]}" \
-DCMAKE_OSX_DEPLOYMENT_TARGET=${MACOS_MIN_OS_VERSION} \
-DCMAKE_OSX_ARCHITECTURES="arm64;x86_64" \
-DCMAKE_C_FLAGS="${COMMON_C_FLAGS}" \
-DCMAKE_CXX_FLAGS="${COMMON_CXX_FLAGS}" \
-DWHISPER_COREML="ON" \
-DWHISPER_COREML_ALLOW_FALLBACK="ON" \
-S .
cmake --build build-macos --config Release -- -quiet
echo "Building for visionOS..."
cmake -B build-visionos -G Xcode \
"${COMMON_CMAKE_ARGS[@]}" \
-DCMAKE_OSX_DEPLOYMENT_TARGET=${VISIONOS_MIN_OS_VERSION} \
-DCMAKE_OSX_ARCHITECTURES="arm64" \
-DCMAKE_SYSTEM_NAME=visionOS \
-DCMAKE_OSX_SYSROOT=xros \
-DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=xros \
-DCMAKE_C_FLAGS="-D_XOPEN_SOURCE=700 ${COMMON_C_FLAGS}" \
-DCMAKE_CXX_FLAGS="-D_XOPEN_SOURCE=700 ${COMMON_CXX_FLAGS}" \
-S .
cmake --build build-visionos --config Release -- -quiet
echo "Building for visionOS simulator..."
cmake -B build-visionos-sim -G Xcode \
"${COMMON_CMAKE_ARGS[@]}" \
-DCMAKE_OSX_DEPLOYMENT_TARGET=${VISIONOS_MIN_OS_VERSION} \
-DCMAKE_OSX_ARCHITECTURES="arm64;x86_64" \
-DCMAKE_SYSTEM_NAME=visionOS \
-DCMAKE_OSX_SYSROOT=xrsimulator \
-DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=xrsimulator \
-DCMAKE_C_FLAGS="-D_XOPEN_SOURCE=700 ${COMMON_C_FLAGS}" \
-DCMAKE_CXX_FLAGS="-D_XOPEN_SOURCE=700 ${COMMON_CXX_FLAGS}" \
-S .
cmake --build build-visionos-sim --config Release -- -quiet
# Add tvOS builds (might need the same u_int definitions as watchOS and visionOS)
echo "Building for tvOS simulator..."
cmake -B build-tvos-sim -G Xcode \
"${COMMON_CMAKE_ARGS[@]}" \
-DCMAKE_OSX_DEPLOYMENT_TARGET=${TVOS_MIN_OS_VERSION} \
-DCMAKE_SYSTEM_NAME=tvOS \
-DCMAKE_OSX_SYSROOT=appletvsimulator \
-DCMAKE_OSX_ARCHITECTURES="arm64;x86_64" \
-DGGML_METAL=ON \
-DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=appletvsimulator \
-DCMAKE_C_FLAGS="${COMMON_C_FLAGS}" \
-DCMAKE_CXX_FLAGS="${COMMON_CXX_FLAGS}" \
-S .
cmake --build build-tvos-sim --config Release -- -quiet
echo "Building for tvOS devices..."
cmake -B build-tvos-device -G Xcode \
"${COMMON_CMAKE_ARGS[@]}" \
-DCMAKE_OSX_DEPLOYMENT_TARGET=${TVOS_MIN_OS_VERSION} \
-DCMAKE_SYSTEM_NAME=tvOS \
-DCMAKE_OSX_SYSROOT=appletvos \
-DCMAKE_OSX_ARCHITECTURES="arm64" \
-DGGML_METAL=ON \
-DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=appletvos \
-DCMAKE_C_FLAGS="${COMMON_C_FLAGS}" \
-DCMAKE_CXX_FLAGS="${COMMON_CXX_FLAGS}" \
-S .
cmake --build build-tvos-device --config Release -- -quiet
# Setup frameworks and copy binaries and headers
echo "Setting up framework structures..."
setup_framework_structure "build-ios-sim" ${IOS_MIN_OS_VERSION} "ios"
setup_framework_structure "build-ios-device" ${IOS_MIN_OS_VERSION} "ios"
setup_framework_structure "build-macos" ${MACOS_MIN_OS_VERSION} "macos"
setup_framework_structure "build-visionos" ${VISIONOS_MIN_OS_VERSION} "visionos"
setup_framework_structure "build-visionos-sim" ${VISIONOS_MIN_OS_VERSION} "visionos"
setup_framework_structure "build-tvos-sim" ${TVOS_MIN_OS_VERSION} "tvos"
setup_framework_structure "build-tvos-device" ${TVOS_MIN_OS_VERSION} "tvos"
# Create dynamic libraries from static libraries
echo "Creating dynamic libraries from static libraries..."
combine_static_libraries "build-ios-sim" "Release-iphonesimulator" "ios" "true"
combine_static_libraries "build-ios-device" "Release-iphoneos" "ios" "false"
combine_static_libraries "build-macos" "Release" "macos" "false"
combine_static_libraries "build-visionos" "Release-xros" "visionos" "false"
combine_static_libraries "build-visionos-sim" "Release-xrsimulator" "visionos" "true"
combine_static_libraries "build-tvos-sim" "Release-appletvsimulator" "tvos" "true"
combine_static_libraries "build-tvos-device" "Release-appletvos" "tvos" "false"
# Create XCFramework with correct debug symbols paths
echo "Creating XCFramework..."
xcodebuild -create-xcframework \
-framework $(pwd)/build-ios-sim/framework/whisper.framework \
-debug-symbols $(pwd)/build-ios-sim/dSYMs/whisper.dSYM \
-framework $(pwd)/build-ios-device/framework/whisper.framework \
-debug-symbols $(pwd)/build-ios-device/dSYMs/whisper.dSYM \
-framework $(pwd)/build-macos/framework/whisper.framework \
-debug-symbols $(pwd)/build-macos/dSYMs/whisper.dSYM \
-framework $(pwd)/build-visionos/framework/whisper.framework \
-debug-symbols $(pwd)/build-visionos/dSYMs/whisper.dSYM \
-framework $(pwd)/build-visionos-sim/framework/whisper.framework \
-debug-symbols $(pwd)/build-visionos-sim/dSYMs/whisper.dSYM \
-framework $(pwd)/build-tvos-device/framework/whisper.framework \
-debug-symbols $(pwd)/build-tvos-device/dSYMs/whisper.dSYM \
-framework $(pwd)/build-tvos-sim/framework/whisper.framework \
-debug-symbols $(pwd)/build-tvos-sim/dSYMs/whisper.dSYM \
-output $(pwd)/build-apple/whisper.xcframework

ci/README.md

@ -0,0 +1,41 @@
# CI
In addition to [GitHub Actions](https://github.com/ggerganov/whisper.cpp/actions), `whisper.cpp` uses a custom CI framework:
https://github.com/ggml-org/ci
It monitors the `master` branch for new commits and runs the
[ci/run.sh](https://github.com/ggerganov/whisper.cpp/blob/master/ci/run.sh) script on dedicated cloud instances. This allows us
to execute heavier workloads than GitHub Actions alone would allow. Over time, the cloud instances will be scaled
to cover various hardware architectures, including GPU and Apple Silicon instances.
Collaborators can optionally trigger the CI run by adding the `ggml-ci` keyword to their commit message.
Only the branches of this repo are monitored for this keyword.
It is good practice to execute the full CI locally on your machine before publishing changes:
```bash
mkdir tmp
# CPU-only build
bash ./ci/run.sh ./tmp/results ./tmp/mnt
# with CUDA support
GG_BUILD_CUDA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
```
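Each step of a run records its exit status in `<output-dir>/<step>.exit` (for example `ctest_debug.exit`, `ctest_release.exit` and `bench.exit`, plus `bench-memcpy.exit` and `bench-mul_mat.exit` for the micro-benchmarks) and appends its summary to `<output-dir>/README.md`. Assuming that layout, here is a minimal sketch of a hypothetical `check_ci_results` helper, which is not part of `ci/run.sh`, for inspecting a local run:

```bash
# Hypothetical helper, not part of ci/run.sh: print the status recorded in
# each <step>.exit file and return non-zero if any step failed.
check_ci_results() {
    local out_dir=$1
    local failed=0
    local exit_file step status

    for exit_file in "$out_dir"/*.exit; do
        # If nothing matches, bash leaves the pattern unexpanded; skip it
        [ -e "$exit_file" ] || continue
        step=$(basename "$exit_file" .exit)
        status=$(cat "$exit_file")
        if [ "$status" -eq 0 ]; then
            echo "PASS ${step}"
        else
            echo "FAIL ${step} (exit status ${status})"
            failed=1
        fi
    done

    return "$failed"
}
```

For example, `check_ci_results ./tmp/results` prints one line per recorded step and returns a non-zero status if any step failed. The full logs remain in `./tmp/results/README.md` and the per-step `.log` files.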
## Environment Variables
The CI script supports several environment variables to control the build:
| Variable | Description |
|----------|-------------|
| `GG_BUILD_CUDA` | Enable NVIDIA CUDA GPU acceleration |
| `GG_BUILD_SYCL` | Enable Intel SYCL acceleration |
| `GG_BUILD_VULKAN` | Enable Vulkan GPU acceleration |
| `GG_BUILD_METAL` | Enable Metal acceleration on Apple Silicon |
| `GG_BUILD_BLAS` | Enable BLAS CPU acceleration |
| `GG_BUILD_OPENVINO` | Enable OpenVINO support |
| `GG_BUILD_COREML` | Enable Core ML support for Apple Neural Engine |
| `GG_BUILD_LOW_PERF` | Limit tests for low-performance hardware |
| `GG_BUILD_TEST_MODELS` | Comma-separated list of models to test (e.g. "tiny.en,tiny,base,medium", defaults to all models unless `GG_BUILD_LOW_PERF` is set) |
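
`ci/run.sh` resolves the model list in a fixed order. An explicit `GG_BUILD_TEST_MODELS` list takes precedence. Otherwise, `GG_BUILD_LOW_PERF` restricts the run to `tiny`, `base` and `small`. If neither is set, every model is tested. The sketch below mirrors that logic as a standalone function so it can be tried in isolation. The name `select_models` is hypothetical and does not exist in the script:

```bash
# Hypothetical standalone version of the model selection in ci/run.sh.
# Prints the selected models, one per line.
select_models() {
    local all_models=( "tiny.en" "tiny" "base.en" "base" "small.en" "small"
                       "medium.en" "medium" "large-v1" "large-v2" "large-v3" "large-v3-turbo" )
    local models=()

    if [ -n "${GG_BUILD_TEST_MODELS}" ]; then
        # An explicit comma-separated list always takes precedence
        IFS=',' read -r -a models <<< "${GG_BUILD_TEST_MODELS}"
    elif [ -n "${GG_BUILD_LOW_PERF}" ]; then
        # Low-performance hardware: only the smaller models
        models=( "tiny" "base" "small" )
    else
        models=( "${all_models[@]}" )
    fi

    printf '%s\n' "${models[@]}"
}
```

For example, `GG_BUILD_LOW_PERF=1 select_models` prints `tiny`, `base` and `small`. Setting `GG_BUILD_TEST_MODELS` overrides `GG_BUILD_LOW_PERF` entirely.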

ci/run.sh

@ -0,0 +1,336 @@
#!/bin/bash
#
# sample usage:
#
# mkdir tmp
#
# # CPU-only build
# bash ./ci/run.sh ./tmp/results ./tmp/mnt
#
# # with CUDA support
# GG_BUILD_CUDA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
#
# # with SYCL support
# GG_BUILD_SYCL=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
if [ -z "$2" ]; then
echo "usage: $0 <output-dir> <mnt-dir>"
exit 1
fi
mkdir -p "$1"
mkdir -p "$2"
OUT=$(realpath "$1")
MNT=$(realpath "$2")
rm -f "$OUT"/*.log
rm -f "$OUT"/*.exit
rm -f "$OUT"/*.md
sd=`dirname $0`
cd $sd/../
SRC=`pwd`
ALL_MODELS=( "tiny.en" "tiny" "base.en" "base" "small.en" "small" "medium.en" "medium" "large-v1" "large-v2" "large-v3" "large-v3-turbo" )
BENCH_N_THREADS=4
BENCH_ENCODER_ONLY=0
BENCH_FLASH_ATTN=0
# check for user-specified models first. if not specified, use fast models
if [ ! -z ${GG_BUILD_TEST_MODELS} ]; then
IFS=',' read -r -a MODELS <<< "${GG_BUILD_TEST_MODELS}"
else
if [ ! -z ${GG_BUILD_LOW_PERF} ]; then
MODELS=( "tiny" "base" "small" )
else
MODELS=("${ALL_MODELS[@]}")
fi
fi
CMAKE_EXTRA="-DWHISPER_FATAL_WARNINGS=ON"
if [ ! -z ${GG_BUILD_CUDA} ]; then
CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native"
fi
if [ ! -z ${GG_BUILD_SYCL} ]; then
if [ -z ${ONEAPI_ROOT} ]; then
echo "ONEAPI_ROOT not detected. Please install the oneAPI Base Toolkit and enable it with:"
echo "source /opt/intel/oneapi/setvars.sh"
exit 1
fi
CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON"
fi
if [ ! -z ${GG_BUILD_OPENVINO} ]; then
CMAKE_EXTRA="${CMAKE_EXTRA} -DWHISPER_OPENVINO=ON"
fi
if [ ! -z ${GG_BUILD_METAL} ]; then
CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_METAL=ON"
fi
if [ ! -z ${GG_BUILD_VULKAN} ]; then
CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_VULKAN=ON"
fi
if [ ! -z ${GG_BUILD_BLAS} ]; then
CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_BLAS=ON"
fi
if [ ! -z ${GG_BUILD_COREML} ]; then
CMAKE_EXTRA="${CMAKE_EXTRA} -DWHISPER_COREML=ON"
fi
## helpers
# download a file if it does not exist or if it is outdated
function gg_wget {
local out=$1
local url=$2
local cwd=`pwd`
mkdir -p $out
cd $out
# should not re-download if file is the same
wget -nv -N $url
cd $cwd
}
function gg_download_model {
local model_name=$1
local model_file="$MNT/models/ggml-${model_name}.bin"
if [ ! -f ${model_file} ]; then
local cwd=`pwd`
mkdir -p "$MNT/models"
cd "$MNT/models"
bash "$cwd/models/download-ggml-model.sh" ${model_name} .
cd "$cwd"
fi
}
function gg_printf {
printf -- "$@" >> $OUT/README.md
}
# Helper function to check command exit status
function gg_check_last_command_status {
# capture $? first: any command that runs before this (including `local`) resets it
local exit_status=$?
local exit_file=$1
local command_name=$2
echo "$exit_status" > "$exit_file"
if [ $exit_status -ne 0 ]; then
echo "Error: Command $command_name failed with exit status $exit_status"
return 1
fi
return 0
}
# Usage: gg_run <test_name> [additional_args...]
#
# Parameters:
# test_name - Name of the test to run (calls gg_run_<test_name>)
# additional_args - Any additional arguments to pass to the test function (first argument is appended to the log filename)
function gg_run {
ci=$1
if [ $# -gt 1 ]; then
ci="${ci}_${2}"
fi
set -o pipefail
set -x
gg_run_$1 "$@" | tee $OUT/$ci.log
cur=$?
echo "$cur" > $OUT/$ci.exit
set +x
set +o pipefail
gg_sum_$1 "$@"
ret=$((ret | cur))
}
function gg_check_build_requirements {
if ! command -v cmake &> /dev/null; then
gg_printf 'cmake not found, please install'
fi
if ! command -v make &> /dev/null; then
gg_printf 'make not found, please install'
fi
}
## ci
function gg_run_ctest {
mode=$2
cd ${SRC}
rm -rf build-ci-${mode} && mkdir build-ci-${mode} && cd build-ci-${mode}
set -e
gg_check_build_requirements
(time cmake -DCMAKE_BUILD_TYPE=${mode} ${CMAKE_EXTRA} .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
(time make -j$(nproc) ) 2>&1 | tee -a $OUT/${ci}-make.log
(time ctest --output-on-failure -L main -E test-opt ) 2>&1 | tee -a $OUT/${ci}-ctest.log
set +e
}
function gg_sum_ctest {
mode=$2
gg_printf '### %s\n\n' "${ci}"
gg_printf 'Runs ctest in '${mode}' mode\n'
gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
gg_printf '```\n'
gg_printf '%s\n' "$(cat $OUT/${ci}-ctest.log)"
gg_printf '```\n'
}
function gg_run_bench {
cd ${SRC}
# set flash attention flag if enabled
fattn=""
if [ "$BENCH_FLASH_ATTN" -eq 1 ]; then
fattn="-fa"
fi
# run memcpy benchmark if not encoder-only mode
if [ "$BENCH_ENCODER_ONLY" -eq 0 ]; then
echo "Running memcpy benchmark"
(time ./build-ci-release/bin/whisper-bench -w 1 -t $BENCH_N_THREADS 2>&1) | tee -a $OUT/${ci}-memcpy.log
gg_check_last_command_status "$OUT/${ci}-memcpy.exit" "memcpy benchmark"
echo "Running ggml_mul_mat benchmark with $BENCH_N_THREADS threads"
(time ./build-ci-release/bin/whisper-bench -w 2 -t $BENCH_N_THREADS 2>&1) | tee -a $OUT/${ci}-mul_mat.log
gg_check_last_command_status "$OUT/${ci}-mul_mat.exit" "ggml_mul_mat benchmark"
fi
echo "Running benchmark for all models"
# generate header for the benchmark table
{
printf "| %16s | %13s | %3s | %3s | %7s | %7s | %7s | %7s | %7s |\n" "Config" "Model" "Th" "FA" "Enc." "Dec." "Bch5" "PP" "Commit"
printf "| %16s | %13s | %3s | %3s | %7s | %7s | %7s | %7s | %7s |\n" "---" "---" "---" "---" "---" "---" "---" "---" "---"
} | tee -a $OUT/${ci}-models-table.log
# run benchmark for each model
for model in "${MODELS[@]}"; do
echo "Benchmarking model: $model"
# run the benchmark and capture output
output=$(./build-ci-release/bin/whisper-bench -m $MNT/models/ggml-$model.bin -t $BENCH_N_THREADS $fattn 2>&1)
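# use a local variable so the global aggregate status `ret` is not clobbered
local bench_status=$?
# save the raw output
echo "$output" > $OUT/${ci}-bench-$model.log
if [ $bench_status -eq 0 ]; then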
# parse the benchmark results
encode_time=$(echo "$output" | grep "encode time" | awk '{print $11}')
decode_time=$(echo "$output" | grep "decode time" | awk '{print $11}')
batchd_time=$(echo "$output" | grep "batchd time" | awk '{print $11}')
prompt_time=$(echo "$output" | grep "prompt time" | awk '{print $11}')
system_info=$(echo "$output" | grep "system_info")
actual_threads=$(echo "$output" | grep "system_info" | awk '{print $4}')
# determine configuration
config=""
if [[ $system_info == *"AVX2 = 1"* ]]; then
config="$config AVX2"
fi
if [[ $system_info == *"NEON = 1"* ]]; then
config="$config NEON"
fi
if [[ $system_info == *"BLAS = 1"* ]]; then
config="$config BLAS"
fi
if [[ $system_info == *"COREML = 1"* ]]; then
config="$config COREML"
fi
if [[ $system_info == *"CUDA = 1"* ]]; then
config="$config CUDA"
fi
if [[ $system_info == *"METAL = 1"* ]]; then
config="$config METAL"
fi
# get commit hash
commit=$(git rev-parse --short HEAD)
# add row to benchmark table
printf "| %16s | %13s | %3s | %3s | %7s | %7s | %7s | %7s | %7s |\n" \
"$config" "$model" "$actual_threads" "$BENCH_FLASH_ATTN" "$encode_time" "$decode_time" "$batchd_time" "$prompt_time" "$commit" \
| tee -a $OUT/${ci}-models-table.log
else
echo "Benchmark failed for model: $model" | tee -a $OUT/${ci}-bench-errors.log
fi
done
}
function gg_sum_bench {
gg_printf '### %s\n\n' "${ci}"
gg_printf 'Whisper Benchmark Results\n'
gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
# show memcpy and ggml_mul_mat benchmark results if available
if [ "$BENCH_ENCODER_ONLY" -eq 0 ]; then
if [ -f "$OUT/${ci}-memcpy.log" ]; then
gg_printf '#### memcpy Benchmark\n\n'
gg_printf '```\n%s\n```\n\n' "$(cat $OUT/${ci}-memcpy.log)"
fi
if [ -f "$OUT/${ci}-mul_mat.log" ]; then
gg_printf '#### ggml_mul_mat Benchmark\n\n'
gg_printf '```\n%s\n```\n\n' "$(cat $OUT/${ci}-mul_mat.log)"
fi
fi
# show model benchmark results
gg_printf '#### Model Benchmarks\n\n'
if [ -f "$OUT/${ci}-models-table.log" ]; then
gg_printf '%s\n\n' "$(cat $OUT/${ci}-models-table.log)"
else
gg_printf 'No model benchmark results available.\n\n'
fi
# show any errors that occurred
if [ -f "$OUT/${ci}-bench-errors.log" ]; then
gg_printf '#### Benchmark Errors\n\n'
gg_printf '```\n%s\n```\n\n' "$(cat $OUT/${ci}-bench-errors.log)"
fi
}
ret=0
for model in "${MODELS[@]}"; do
test $ret -eq 0 && gg_download_model ${model}
done
if [ -z "${GG_BUILD_SYCL}" ]; then
test $ret -eq 0 && gg_run ctest debug
fi
test $ret -eq 0 && gg_run ctest release
test $ret -eq 0 && gg_run bench
exit $ret

close-issue.yml

@ -0,0 +1,28 @@
name: Close inactive issues
on:
schedule:
- cron: "42 0 * * *"
# Fine-grained permissions
# https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#modifying-the-permissions-for-the-github_token
permissions:
issues: write
jobs:
close-issues:
runs-on: ubuntu-latest
permissions:
issues: write
pull-requests: write
steps:
- uses: actions/stale@v5
with:
exempt-issue-labels: "refactor,help wanted,good first issue,research,bug,roadmap"
days-before-issue-stale: 30
days-before-issue-close: 14
stale-issue-label: "stale"
close-issue-message: "This issue was closed because it has been inactive for 14 days since being marked as stale."
days-before-pr-stale: -1
days-before-pr-close: -1
operations-per-run: 10000
repo-token: ${{ secrets.GITHUB_TOKEN }}

@ -13,5 +13,4 @@ set_target_properties(${TARGET}
PROPERTIES
EXPORT_COMPILE_COMMANDS ON
RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin"
INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib"
)

@ -36,7 +36,7 @@ include(FindPackageHandleStandardArgs)
# The default components were taken from a survey over other FindFFMPEG.cmake files
if (NOT FFmpeg_FIND_COMPONENTS)
set(FFmpeg_FIND_COMPONENTS AVFORMAT AVCODEC AVUTIL SWRESAMPLE)
endif()
#
@ -84,7 +84,7 @@ macro(find_component _component _pkgconfig _library _header)
# CMake's default is to search first for shared libraries and then for static libraries.
# Todo later: add option to prefer static libs over dynamic:
find_library(${_component}_LIBRARIES NAMES ${_library} lib${_library}.a
HINTS
${PC_${_component}_LIBDIR}
${PC_${_component}_LIBRARY_DIRS}

cmake/build-info.cmake

@ -0,0 +1,60 @@
set(BUILD_NUMBER 0)
set(BUILD_COMMIT "unknown")
set(BUILD_COMPILER "unknown")
set(BUILD_TARGET "unknown")
# Look for git
find_package(Git)
if(NOT Git_FOUND)
find_program(GIT_EXECUTABLE NAMES git git.exe)
if(GIT_EXECUTABLE)
set(Git_FOUND TRUE)
message(STATUS "Found Git: ${GIT_EXECUTABLE}")
else()
message(WARNING "Git not found. Build info will not be accurate.")
endif()
endif()
# Get the commit count and hash
if(Git_FOUND)
execute_process(
COMMAND ${GIT_EXECUTABLE} rev-parse --short HEAD
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE HEAD
OUTPUT_STRIP_TRAILING_WHITESPACE
RESULT_VARIABLE RES
)
if (RES EQUAL 0)
set(BUILD_COMMIT ${HEAD})
endif()
execute_process(
COMMAND ${GIT_EXECUTABLE} rev-list --count HEAD
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE COUNT
OUTPUT_STRIP_TRAILING_WHITESPACE
RESULT_VARIABLE RES
)
if (RES EQUAL 0)
set(BUILD_NUMBER ${COUNT})
endif()
endif()
if(MSVC)
set(BUILD_COMPILER "${CMAKE_C_COMPILER_ID} ${CMAKE_C_COMPILER_VERSION}")
set(BUILD_TARGET ${CMAKE_VS_PLATFORM_NAME})
add_compile_options("$<$<COMPILE_LANGUAGE:C>:/utf-8>")
add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/utf-8>")
else()
execute_process(
COMMAND sh -c "$@ --version | head -1" _ ${CMAKE_C_COMPILER}
OUTPUT_VARIABLE OUT
OUTPUT_STRIP_TRAILING_WHITESPACE
)
set(BUILD_COMPILER ${OUT})
execute_process(
COMMAND ${CMAKE_C_COMPILER} -dumpmachine
OUTPUT_VARIABLE OUT
OUTPUT_STRIP_TRAILING_WHITESPACE
)
set(BUILD_TARGET ${OUT})
endif()

@ -0,0 +1,65 @@
set(WHISPER_VERSION @WHISPER_INSTALL_VERSION@)
set(WHISPER_BUILD_COMMIT @WHISPER_BUILD_COMMIT@)
set(WHISPER_BUILD_NUMBER @WHISPER_BUILD_NUMBER@)
set(WHISPER_SHARED_LIB @BUILD_SHARED_LIBS@)
set(GGML_BLAS @GGML_BLAS@)
set(GGML_CUDA @GGML_CUDA@)
set(GGML_METAL @GGML_METAL@)
set(GGML_HIPBLAS @GGML_HIPBLAS@)
set(GGML_ACCELERATE @GGML_ACCELERATE@)
@PACKAGE_INIT@
set_and_check(WHISPER_INCLUDE_DIR "@PACKAGE_WHISPER_INCLUDE_INSTALL_DIR@")
set_and_check(WHISPER_LIB_DIR "@PACKAGE_WHISPER_LIB_INSTALL_DIR@")
set_and_check(WHISPER_BIN_DIR "@PACKAGE_WHISPER_BIN_INSTALL_DIR@")
# Ensure transitive dependencies are satisfied
find_package(Threads REQUIRED)
if (APPLE AND GGML_ACCELERATE)
find_library(ACCELERATE_FRAMEWORK Accelerate REQUIRED)
endif()
if (GGML_BLAS)
find_package(BLAS REQUIRED)
endif()
if (GGML_CUDA)
find_package(CUDAToolkit REQUIRED)
endif()
if (GGML_METAL)
find_library(FOUNDATION_LIBRARY Foundation REQUIRED)
find_library(METAL_FRAMEWORK Metal REQUIRED)
find_library(METALKIT_FRAMEWORK MetalKit REQUIRED)
endif()
if (GGML_HIPBLAS)
find_package(hip REQUIRED)
find_package(hipblas REQUIRED)
find_package(rocblas REQUIRED)
endif()
find_library(whisper_LIBRARY whisper
REQUIRED
HINTS ${WHISPER_LIB_DIR})
set(_whisper_link_deps "Threads::Threads" "@WHISPER_EXTRA_LIBS@")
set(_whisper_transient_defines "@WHISPER_TRANSIENT_DEFINES@")
add_library(whisper UNKNOWN IMPORTED)
set_target_properties(whisper
PROPERTIES
INTERFACE_INCLUDE_DIRECTORIES "${WHISPER_INCLUDE_DIR}"
INTERFACE_LINK_LIBRARIES "${_whisper_link_deps}"
INTERFACE_COMPILE_DEFINITIONS "${_whisper_transient_defines}"
IMPORTED_LINK_INTERFACE_LANGUAGES "CXX"
IMPORTED_LOCATION "${whisper_LIBRARY}"
INTERFACE_COMPILE_FEATURES cxx_std_11
POSITION_INDEPENDENT_CODE ON )
check_required_components(whisper)

Some files were not shown because too many files have changed in this diff.