whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-24 09:10:57 +00:00

Author	SHA1	Message	Date
R0CKSTAR	121d27a495	musa: correct MUSA SDK rc4.0.1 download URL (#3217 ) * musa: correct MUSA SDK rc4.0.1 download URL Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Fix typo Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-06-03 06:02:12 +02:00
Fujimoto Seiji	b9d27b1358	tests : add a new benchmark test for long-form audio (#3185 ) * tests : add a new benchmark test for long-form audio Based on "Earnings-21" corpus by Del Rio et al. Earnings-21: A Practical Benchmark for ASR in the Wild (2021) https://arxiv.org/abs/2104.11348 This dataset contains 39 hours of long-form speech, sourced from public earning calls. Each recording contains roughly 50 minutes of English dialogues between multiple speakers (2-20 persons). This benchmark suite should allow us to evaluate the performance of whisper.cpp on long-form audio data. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * tests : apply PR feedback to 'earnings21/README.md' Based on feedback from Daniel Bevenius. - Simplify how to download & prepare a Silero VAD model. - Fix typo: inferece -> inference Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * tests : avoid crashing on non-UTF-8 characters Based on feedback from Daniel Bevenius. Add 'errors' parameter to open() in order to avoid unhandled exception on invalid UTF-8 bytes. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * tests : try to interpret the hypothesis as Windows-1252 Based on the discussion in PR#3185. Evidently Whisper.cpp can represent a quotation mark as '0x93', which implies Windows-1252 (Microsoft's ASCII extension), and cannot be decoded by UTF-8. Add an explicit decoding loop to address the issue. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> --------- Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-05-28 07:08:44 +02:00
Daniel Bevenius	e41bc5c61a	vad : add initial Voice Activity Detection (VAD) support (#3065 ) * vad : add initial Voice Activity Detection (VAD) support This commit adds support for Voice Activity Detection (VAD). When enabled, this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-12 16:10:11 +02:00
Fujimoto Seiji	448f3d3b93	tests : add script to benchmark whisper.cpp on LibriSpeech corpus (#2999 ) * tests : add script to benchmark whisper.cpp on LibriSpeech corpus LibriSpeech is a widely-used benchmark dataset for training and testing speech recognition models. This adds a set of scripts to measure the recognition accuracy of whisper.cpp models, following the common benchmark standards. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * Document how to prepare `whisper-cli` and model files Feedback from Daniel Bevenius. This adds a short code example how to prepare the `whisper-cli` command, to make the initial setup step a little bit clearer. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * tests : Simplify how to set up Python environment Based on a feedback from Georgi Gerganov. Instead of setting up a virtual environment in Makefile, let users set up the Python environment. This is better since users may have their own preferred workflow/toolkit. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> --------- Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-04 19:51:26 +03:00
Daniel Bevenius	f31b404fcb	tests : remove gh label test-whisper-cli-tiny-en (#2988 ) This commit removes test-whisper-cli-tiny-en from the gh label. The motivation for this change is that until recently the tests were disabled. But now that they are enabled, some of the tests, specifically the CI jobs that use sanitizers (e.g. thread-sanitizer), take a long time to run as they are instrumented. Some of these jobs also have matrices, which means that multiple jobs are created that all run these tests. The suggestion here is to limit the number of tests that are run in the CI jobs to cut down the CI build time.	2025-04-02 10:50:31 +02:00
Daniel Bevenius	4200430e75	tests : re-enable tests [no ci] (#2977 ) This commit re-enables the tests in the build process which are currently commented out. It is possible to build the tests using `-DWHISPER_BUILD_TESTS=ON` and then run a single test using: ```console $ ctest -R test-whisper-cli-tiny.en --test-dir build Internal ctest changing into directory: /home/danbev/work/ai/whisper-work/build Test project /home/danbev/work/ai/whisper-work/build Start 2: test-whisper-cli-tiny.en 1/1 Test #2: test-whisper-cli-tiny.en ......... Passed 4.44 sec 100% tests passed, 0 tests failed out of 1 Label Time Summary: en = 4.44 sec*proc (1 test) gh = 4.44 sec*proc (1 test) tiny = 4.44 sec*proc (1 test) Total Test time (real) = 4.44 sec ``` Some of the tests take a long time to run so it might not be a good idea to enable them in CI, or perhaps we could only run a subset of the tests in CI.	2025-03-31 17:04:37 +02:00
Georgi Gerganov	2e59dced12	whisper : rename binaries + fix install (#2648 ) * whisper : rename binaries + fix install * cont : try to fix ci * cont : fix emscripten builds	2024-12-21 09:43:49 +02:00
Georgi Gerganov	ed733e85a1	scripts : update to new build system	2024-12-09 11:30:16 +02:00
Georgi Gerganov	55e422109b	scripts : add turbo-q8_0 to the benchmark	2024-10-29 19:37:24 +02:00
Georgi Gerganov	2ef717b293	whisper : add large-v3-turbo (#2440 )	2024-10-01 15:57:06 +03:00
Georgi Gerganov	8feb375fbd	tests : remove test-backend-ops (#2434 )	2024-09-27 11:49:01 +03:00
Mengqing Cao	81c999fe0a	cann : add Ascend NPU support (#2336 ) * enable Ascend NPU in src/whisper.cpp * sync test-backend-ops with llama.cpp	2024-08-09 15:21:56 +03:00
Georgi Gerganov	e30c679928	whisper : reorganize source code + improve CMake (#2256 ) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci]	2024-06-26 19:34:09 +03:00
William Tambellini	1b51fdf170	examples : add support for decoding input with ffmpeg (Linux) (#2133 ) - search for ffmpeg libs/headers at cmake time - added ffmpeg-transcode.cpp into libcommon if ffmpeg on - hooked ffmpeg transcoding in common read_wav(...) - passed test: ./main -m ggml-base.en.bin -f samples/jfk.mp3	2024-05-21 18:31:41 +03:00
Georgi Gerganov	bfbaa4dce5	whisper : make large version explicit + fix data size units (#1493 )	2023-11-15 19:42:25 +02:00
Georgi Gerganov	2cdfc4e025	whisper : add support for large v3 (#1444 ) * whisper : add support for large v3 * bench : fix build + fix go bindings * bench : fix n_mels * models : update readme	2023-11-07 15:30:18 +02:00
Georgi Gerganov	bab97c83d0	tests : add "threads" to run-tests.sh	2023-04-29 12:32:28 +03:00
Georgi Gerganov	f309f97df6	Node.js package (#260 ) * npm : preparing infra for node package * npm : package infra ready * npm : initial version ready * npm : change name to whisper.cpp whisper.js is taken	2022-12-12 20:17:27 +02:00
Georgi Gerganov	9fe7306f4b	models : add the new "large" model release by OpenAI The old "large" model is now renamed "large-v1". If you have been using it, make sure to rename it and download the new "large" model for best results.	2022-12-06 18:48:57 +02:00
Georgi Gerganov	9b7df68753	tests : adding transcription tests	2022-12-02 21:40:02 +02:00
Georgi Gerganov	e905c6f827	wip : initial WASM port Works but it is very slow because no SIMD is used. For example, jfk.wav is processed in ~23 seconds using "tiny.en" model	2022-10-22 18:54:01 +03:00
Georgi Gerganov	b43b36e006	Update tests	2022-10-08 11:43:42 +03:00
Georgi Gerganov	37110d693e	ci : add base model tests to GH Actions	2022-10-08 11:43:42 +03:00
Georgi Gerganov	2f069335ab	Adding sanitizer tests	2022-10-08 11:43:42 +03:00
Georgi Gerganov	877c058179	Add CMake support	2022-10-08 09:02:41 +03:00

25 Commits
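
Commit b9d27b1358 above mentions an explicit decoding loop that falls back to Windows-1252 when a hypothesis is not valid UTF-8 (for example, a quotation mark stored as the byte 0x93). The following is only a minimal sketch of that idea, not the code from the commit: the function name `decode_hypothesis`, the order of encodings, and the final replacement fallback are assumptions made for illustration.

```python
def decode_hypothesis(raw: bytes) -> str:
    """Decode transcript bytes, trying UTF-8 first and then Windows-1252.

    Hypothetical helper: the name and the fallback policy are illustrative
    assumptions, not taken from the whisper.cpp test scripts.
    """
    for encoding in ("utf-8", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
    # Neither encoding fits: substitute U+FFFD for undecodable bytes
    # instead of raising, so a benchmark run is not aborted.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # 0x93 and 0x94 are curly quotation marks in Windows-1252
    # and are not valid on their own in UTF-8.
    print(decode_hypothesis(b"\x93hello\x94"))
```

Trying UTF-8 first matters because most valid UTF-8 text would also decode under Windows-1252 without error, but as mojibake; the stricter encoding therefore has to be attempted before the more permissive one.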