Commit Graph

193 Commits

Author SHA1 Message Date
8159a9ab99 whisper : reuse whisper_decode_with_state (#1521) 2023-11-20 13:16:11 +02:00
46cc26d1b9 whisper : fix with_state methods to use the correct state (#1519)
Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>
2023-11-19 11:25:30 +02:00
f784f9fa12 whisper : fix overriding the audio context 2023-11-19 10:32:32 +02:00
848e54f3ad bench : fix memcpy bench size 2023-11-16 10:59:32 +02:00
bfbaa4dce5 whisper : make large version explicit + fix data size units (#1493) 2023-11-15 19:42:25 +02:00
b6c5f49b78 whisper : add batched decoding (#1486)
* whisper : add whisper_batch

* whisper : move kv_self to whisper_state

* whisper : full batched decoding support

* whisper : fix memory leak in whisper_batch

* whisper : fix mem leak again + remove oboslete function

* whisper : clear kv cache when using whisper_decode API

* whisper : speed-up sampling

* whisper : fix decoders initializer

* bench : add batch size 5 bench

* whisper : add comment about the KV cache size

* whisper : add check for max number of decoders

* whisper : avoid starting sampling threads with bs=1

* whisper : enable beam-search by default

* cuda : sync llama.cpp fixes
2023-11-15 16:12:52 +02:00
3e5c7feeff whisper : add grammar-based sampling (#1229)
* whisper : add grammar-based sampling

* build : fix after master merge

* command : fix exception when recognizing the command

* whisper : fine-tuning grammar functionality

* command : grammar-related improvements

- option to read grammar from file
- add sample grammars for colors and chess moves
- fine-tune the performance further

* grammars : add assistant + update comments

* command : enable beam-search, add "no_timestamps", add "context", add p

* whisper : remove comment

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-13 10:51:34 +02:00
3172006a24 ggml : fix some compile warnings 2023-11-12 16:36:20 +02:00
b0502836b8 whisper : add full CUDA and Metal offloading (#1472)
* whisper : migrate to ggml-backend

* whisper : fix logit reading

* whisper : fix tensor allocation during load

* whisper : fix beam-search with CUDA

* whisper : free backends + fix compile warning

* whisper : print when CUDA is enabled

* whisper : fix CoreML

* make : clean-up

* talk : fix compile warning

* whisper : support ggml_conv with CUDA and Metal (#1473)

* ggml : add CUDA support for ggml_conv

* whisper : remove ggml_repeat for conv bias + single backend

* cuda : fix im2col kernel

* metal : add im2col support + mul mat-vec f16 x f16

* bench-all : add q4 models

* whisper : clean-up

* quantize-all : fix

* ggml : im2col opts

* whisper : avoid whisper_model_data wrapper

* whisper : add note that ggml_mul_mat_pad does not work with CUDA

* whisper : factor out graph compute in common function

* whisper : fixes

* whisper : fix UB with measure buffers

* whisper : try to fix the parallel whisper_state functionality (#1479)

* whisper : try to fix the parallel whisper_state functionality

* whisper : fix multi-state Metal

* whisper : free backend instances in whisper_state
2023-11-12 15:31:08 +02:00
ec7a6f04f9 whisper : return with error from whisper_encode_internal and whisper_decode_internal when abort callback is true (#1456)
Co-authored-by: Ben Nortier <ben@bjnortier.com>
2023-11-10 13:51:16 +02:00
0de8582f65 coreml : use the correct n_mel value (#1458) 2023-11-08 20:01:41 +00:00
baeb733691 whisper : reset mel time when resetting timings (#1452)
Co-authored-by: Ben Nortier <ben@bjnortier.com>
2023-11-08 15:52:23 +02:00
2cdfc4e025 whisper : add support for large v3 (#1444)
* whisper : add support for large v3

* bench : fix build + fix go bindings

* bench : fix n_mels

* models : update readme
2023-11-07 15:30:18 +02:00
11b503055e whisper : reset ctx->t_start_us when calling whisper_reset_timings() (#1434)
Co-authored-by: Ben Nortier <ben@bjnortier.com>
2023-11-07 11:04:32 +02:00
0c91aef2d8 whisper : add missing about callback initializers 2023-11-07 10:49:51 +02:00
0463028bc2 whisper : add context param to disable gpu (#1293)
* whisper : check state->ctx_metal not null

* whisper : add whisper_context_params { use_gpu }

* whisper : new API with params & deprecate old API

* examples : use no-gpu param && whisper_init_from_file_with_params

* whisper.objc : enable metal & disable on simulator

* whisper.swiftui, metal : enable metal & support load default.metallib

* whisper.android : use new API

* bindings : use new API

* addon.node : fix build & test

* bindings : updata java binding

* bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java

* metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load

* metal : move bundle var into block

* metal : use SWIFT_PACKAGE instead of GGML_SWIFT

* style : minor updates

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-06 11:04:24 +02:00
39cfad0dee whisper : add support for new distilled Whisper models (#1424)
* whisper : add support for new distilled Whisper models

* whisper : print log when using distilled models
2023-11-05 19:43:45 +02:00
f96e1c5b78 sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) (#1422)
* sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.)

* metal : allow env metal variable to override resource path (#1415)

* Allow env variable to override resource path

* Update ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* sync : restore common / main from `master`

* sync : restore whisper from `master`

* talk-llama : update to latest llama.cpp

* ruby : fix build

* ggml : fix 32-bit ARM build

* ggml : fix MIN / MAX macro collisions + update ios bindings

* ggml : fix ifdefs and MIN / MAX again

* exampels : fix Obj-C and Swift examples

* ggml : fix 32-bit ARM compatibility

* ggml : one more attempt to fix 32-bit ARM compat

* whisper : fix support for larger graphs

---------

Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>
2023-11-03 21:35:05 +02:00
940cdb1396 whisper : abort callback improvements (#1345)
* whisper : initialize abort_callback to null

* whisper : add example how to use abort_callback
2023-10-08 17:22:24 +03:00
2f668c330e whisper : add abort callback (#1335) 2023-10-04 11:57:55 +03:00
4037705531 whisper : add missing speaker turn API function for whisper_state (#1330) 2023-10-03 22:55:48 +03:00
951a119926 whisper : increase tokenizer buffer (close #1259) 2023-09-15 21:11:43 +03:00
b8432f28f4 metal : add F32 support + update bench output 2023-09-15 13:56:08 +03:00
93935980f8 whisper : Metal and ggml-alloc support (#1270)
* metal : init

* whisper : factor out graph builds

* whisper : allocate encoder and decoder using ggml-alloc

* whisper : ggml-alloc is now supported

* whisper : CoreML support ggml-alloc

* build : fix ggml-alloc

* ios : update submodule

* extra : update sync-ggml.sh script to also sync ggml-alloc

* ci : see if this is causing the crash

* whisper : refactor ggml-alloc init

* whisper.android : try to fix build

* whisper : initial Metal version

* ci : try to debug vmem issue

* metal : decoder works on GPU!

* metal : add multi-decoder support

* ggml : fix ggml_nbytes (probably temp solution)

* metal : run "cross" step on the GPU

* whisper : remove ggml_repeat in the encoder

* whisper : offload the Encoder to Metal

* ggml : use simpler ggml_bytes() implementation

* ggml-alloc : try to make CI happy by reducing vram to 128GB

* whisper : add whisper_allocr to wrap ggml_allocr

* whisper : factor out alloc init in a function

* cmake : update to support Metal build

* whisper : add <functional> header

* objc : fix build (no Metal yet)

* ios : add Metal support

* swiftui : fix build

* metal : speed-up KQ multiplication

* metal : sync latest llama.cpp kernels

* readme : add Metal info

* ios : update submodule

* coreml : add code to toggle Core ML config (CPU, ANE, GPU)

* bench : fix timings by running a pre-heat

* bench : start benching the decoder

* whisper : add ggml_mul_mat_pad

* bench : fix uninitialized vars

* whisper : add comment for disabling mul-mat padding

* whisper : add description of ggml_mul_mat_pad

* whisper : clean-up ggml_mul_mat_pad

* metal : remove the "concurrent" flag

* bench : variable n_past

* ios : update SPM package
2023-09-15 12:18:18 +03:00
3fec2119e6 whisper : fix bench regression + fix performance when using CPU BLAS (#1275)
* whisper : fix bench regression

* ggml : use sched_yield when using BLAS + add comment
2023-09-12 13:54:04 +03:00
9b14418863 whisper : faster beam_search sampling via reduced KV cache copies (#1243)
* Faster `beam_search` sampling

Refine the KV cache update logic for more intelligent and efficient updating.

* Faster `whisper_sample_token_topk`

* Update whisper.cpp

* Update whisper.cpp

* Update whisper.cpp

* Reduce `memory allocation`

* Add `pointer swapping`

* Fixed some bugs

* Update whisper.cpp

* Apply suggestions from code review

* Updated the logic for determining `two-copy`

* Updated the logic for determining `two-copy` v2

* whisper : add debug logs + coding style

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-10 16:04:27 +03:00
59a3d0cb57 ggml : sync (ggml-alloc, GPU, eps, etc.) (#1220)
* ggml : sync (ggml-alloc, GPU, eps, etc.)

* ggml : fix build

* wasm : fix build
2023-09-05 13:54:40 +03:00
8e30bf3c02 ggml : fix compilation errors incurred by -Werror (#1227)
The -Werror warning option turns all warnings into errors. This PR makes
the compiler happy to build ggml.c and whisper.cpp with the stricter option.
2023-08-30 22:09:15 +03:00
601c2d2181 ggml : detect SSSE3 (#1211)
* ggml : add ggml_cpu_has_ssse3

* whisper : show SSSE3 in system info

* make : detect SSSE3 via cpuinfo
2023-08-27 21:36:41 +03:00
b5bb5c85d4 whisper : allow whisper_full from mel spectrogram - no audio (#1214)
Co-authored-by: jbrough <jamie1612@gmail.com>
2023-08-27 20:02:57 +03:00
7e54df414e whisper : significantly improve the inference quality (#1148)
* Fix MSVC compile error C3688

Instead of simply using 'add_compile_options(/utf-8)' to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing '/utf-8' to NVCC.

* Significantly improve inference quality

In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference.

* Significantly improve inference quality

At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue.

* Addressed a few minor issues

Fixed the issue of `fft_out` continuously expanding. Resolved the fallback caused by using 'break' instead of `fft_in[j] = 0`.

* Significantly improve inference quality 

Thanks for your patience everyone. It's finally sorted out. Now, the right side of the FFT spectrum is being flipped over to the left, and the amplitudes at corresponding positions on the left and right are added together (the spectrum on the left needs to be shifted by one position), then the average is calculated. FFT_OUT[0] is no longer discarded, making full use of the limited space to pack in more information.

* Add annotation and performance improvement

* Calculate FFT only when fft_in are not all zero

* Some minor performance improvement

* Fixed a bug impacting inference quality

* The first version after all the analysis is completed.

* Fix some bugs and add debug mode

* Fixed several bugs

* Temporarily disable speed-up mode and add debug mode.

* Add debug mode

* Disable speed-up mode and add debug mode

* Fix CI error (#1)

* Fix error

* Fix error

* Fixed several bugs including [BLANK_AUDIO] problem

* Remove Hard-coded hann window

* Some Final Fix (#2)

* Fix error

* Fix error

* Probably the last commit

* Probably the last commit

* whisper : minor coding style changes

* whisper : remove debug from public API

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-27 19:51:33 +03:00
aad2dad38a whisper : minor fixes (#1154) 2023-08-27 19:02:00 +03:00
c84cf87261 whisper : add precalculated values of sin/cos for speeding up FFT (#1142)
* Add sin/cos precalculated values to speedup FFT

* Update whisper.cpp

Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>

* Update whisper.cpp

Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>
2023-08-25 15:51:14 +03:00
fabf79fc67 whisper : expose API to let user control log output (#1060)
* expose api to let user control log output

Add
  whisper_set_log_callback()
that lets user set a callback for log messages.

Change all the
  fprintf(stderr, ...)
to call via the above.

* whisper : add <cstdarg>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-25 18:58:25 +03:00
925915ae37 whisper : move progress calculation out of whisper.cpp (#1081)
Current `progress_step` was hardcoded into whisper.cpp, this resulted in
bindings having to access progress only at that step even if progress
callback was being called at every iteration.

With this change we get greater granularity progress reporting from
whisper.cpp and bindings/implementations can define their own progress step.
2023-07-25 18:53:34 +03:00
4774d2feb0 whisper : minor OpenVINO refactoring (#1037)
Hopefully I didn't break something - haven't tested
2023-07-04 20:28:27 +03:00
62b81276e0 whisper : add OpenVINO support (#1037)
* openvino: use OpenVINO encoder inference

* openvino: add python script for OpenVINO model generation

* whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* whisper: Fix compilation error

* whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures

* cmake: Add openvino-encoder as separate object target

* whisper : minor style fixes

* minor : indentation fixes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-04 15:56:11 +03:00
c8d0f5fe98 whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058)
* add HuggingFace mirror to download  ggml model

* support tdrz via simple hack overriding solm tokens

* fix incorrect translate/transcribe token_ids that are not static const

* add apollo 13 sample for tdrz demo

* render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token

* extend whisper_segment with speaker_turn_next field and save in json output

* fix failing go build

* slipped in some python syntax whoops

* whisper : finalize tinydiarize support (add flag + fixes)

* whisper : tdrz support for word-level timestamps (respect max_len)

* java : try to fix tests after adding tdrz_enable flag

* main : remove TODO leftover

* java : fix params order list after adding "tdrz_enable"

* whisper : fix solm and add nosp token

* main : print tinydiarize help

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-04 09:45:00 +03:00
d6509bf78d ggml : sync latest repo (mostly refactoring changes) 2023-07-02 21:46:09 +03:00
72deb41eb2 whisper : split_on_word no longer trims (#1046) 2023-06-25 23:51:01 +03:00
44cb044e66 whisper : fix build with -Werror=undef (#1045) 2023-06-25 15:30:39 +03:00
5feb0dffba ggml : sync latest ggml lib 2023-06-25 14:30:44 +03:00
d7c936b44a Feature/java bindings2 (#944)
* Java needs to call `whisper_full_default_params_by_ref()`, returning struct by val does not seem to work.
* added convenience methods to WhisperFullParams
* Remove unused WhisperJavaParams
2023-05-29 09:38:58 +10:00
56a87ba45d whisper : fix hebrew language code (#935) 2023-05-20 18:17:54 +03:00
fd01209d09 coreml : support quantized model files 2023-05-14 18:09:44 +03:00
e693074aa6 ggml : sync latest ggml
- New Q4 and Q5 formats
- Various improvements
2023-05-14 18:04:23 +03:00
b806420873 whisper : add detect-language mode (#853)
* add detectlanguage flag

* renaming and help

* no idea why that last one didn't commit

* run language detection if dl is set

* help message fix

* various fixes

* fix quitting

* fix language being english on print
2023-05-02 19:51:52 +03:00
d375d73b2e bench : improve benchmarks 2023-05-01 14:44:39 +03:00
7765770f89 whisper : add memory sizes for Q8_0 (close #846) 2023-05-01 10:03:56 +03:00
c94c469592 whisper : fix quantize bug (#842)
* whisper : debug

* whisper : fix bug during quantization
2023-04-30 22:50:04 +03:00