Commit Graph

663 Commits

Author SHA1 Message Date
6108d3cc58 whisper : use correct seek_end when offset is used (#833)
Whenever an `offset_ms` is provided, the value of `seek_end` is
calculated incorrectly. This causes Whisper to keep transcribing
after the end of the file.

The current behavior looks like
```
[00:34:40.000 --> 00:34:47.000]   This is an example audio file.
[00:34:47.000 --> 00:34:49.000]   The text has been redacted
[00:34:49.000 --> 00:34:51.000]   This is the end of the audio.
[00:34:51.000 --> 00:34:52.000]   ***
[00:34:52.000 --> 00:34:53.000]   ***
[00:34:53.000 --> 00:34:54.000]   ***
[00:34:55.000 --> 00:34:56.000]   ***
...
```

The expected behavior should be
```
[00:34:40.000 --> 00:34:47.000]   This is an example audio file.
[00:34:47.000 --> 00:34:49.000]   The text has been redacted
[00:34:49.000 --> 00:34:51.000]   This is the end of the audio.
- end of program -
```

This commit changes the calculation of the `seek_end` variable to
only add `seek_start` if a custom `duration_ms` is provided.
Otherwise, it defaults to the end of the file.

Signed-off-by: Thijs Raymakers <thijs@raymakers.nl>
2023-04-29 18:55:37 +03:00
bab97c83d0 tests : add "threads" to run-tests.sh 2023-04-29 12:32:28 +03:00
3eaeb030ff extra : add sync-ggml.sh script 2023-04-29 12:32:28 +03:00
acec73ab6e ggml : sync latest ggml + llama.cpp updates (quantization) 2023-04-29 12:32:28 +03:00
5cc17418c7 whisper.android : add some tips (#816) 2023-04-29 11:00:20 +03:00
3efb81dec6 build : add WHISPER_COREML_ALLOW_FALLBACK to make / CMake (#812) 2023-04-29 10:55:24 +03:00
94a7cd2a07 whisper : allow non-CoreML fallback when Core ML cannot be loaded (#812)
if the Core ML model cannot be loaded, continue without Core ML instead of
returning. This allows a single build to transcribe using Core ML models
where available, and regular models when not.
2023-04-29 10:49:02 +03:00
3e82ff4747 whisper : fix bug from previous commit 2023-04-29 10:42:14 +03:00
b5bd2f43c5 whisper : avoid designated initializers 2023-04-29 10:36:50 +03:00
94aa56f19e minor : improve C++ and Python style (#768)
* use some STL functions

* use self.field than setattr, use pathlib.Path

* recover some format

* const some iter

* Keep the original

* 2 space
2023-04-29 10:06:25 +03:00
4d89ee2e59 readme : add logo 2023-04-28 22:41:29 +03:00
70567eff23 main : escape quotes in csv output (#815) 2023-04-23 19:01:59 +03:00
02ec83c5d5 stream : flush upon finishing inference (#811) 2023-04-23 17:00:30 +03:00
2bd4b8d577 examples : add missing #include <cstdint> (#798)
common.cpp uses uint8_t and uint64_t, which are defined in <cstdint>.
2023-04-23 16:52:52 +03:00
eecf2c3d41 main : update escape_double_quotes() function (#776)
Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string.

Changes Made:

- Renamed the function to escape_quotes_and_backslashes

- Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes.

- Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash.

Resolves: #769
2023-04-23 16:47:30 +03:00
c23588cc4b release : v1.3.0 v1.3.0 2023-04-15 17:30:44 +03:00
5108b30e6d whisper : pad audio instead of spectrogram (#579)
Also, fallback only if more temperatures are available and if we are
at least 3 seconds before the end of the audio
2023-04-15 17:19:19 +03:00
f19e23fbd1 whisper : restore decoder temperature fallbacks
I disabled this because there were many complaints about slow decoding.
The current implementation does not allow batching the decoders when
using the "best of" or "beam size" parameters, so the decoding time is
proportional to the number of decoders, which is obviously not great.

However, now there are even more complaints about wrong decodings and
repetition.

So, making a compromise by re-enabling the fallbacks, but defaulting to
just 2 "best of" / "beam size" decoders. Also, the temperature step is
increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum
of 2.

Also, the stream example now has fallbacks enabled by default.

close #471 #477 #508 #612 #719 #731
2023-04-15 16:12:55 +03:00
ea1f8a50d4 ggml, ci : fix build on whisper.android (ARM_NEON) + add CI (#764)
* ggml : fix undefined symbol by remove inline handle

* ggml : make own ggml_aligned_malloc function

* ci: add ios/android build
2023-04-15 14:21:58 +03:00
3dead611bb whisper : slightly faster Log Mel computation + n-1 FFT threads (#568) 2023-04-15 14:18:46 +03:00
355da83690 readme : fix link 2023-04-15 13:30:36 +03:00
3e5c49e59a readme : add usage instructions for Core ML 2023-04-15 13:30:07 +03:00
5e47e223bd whisper : add Core ML support (#566)
* coreml : use Core ML encoder inference

* coreml : simlpify whisper_encode + log messages

* whisper : resolve rebase conflicts

* coreml : add scripts for CoreML model generation

* bench-all : recognize COREML flag
2023-04-15 13:21:27 +03:00
794ff3074a whisper : do not launch log_mel threads when n_thread is 1 (#763) 2023-04-14 22:35:34 +03:00
7e2afa4384 whisper : fix the bug related to word splitting errors in the "tokenize" function. (#760)
Co-authored-by: AfryMask <afrymask@gmail.com>
2023-04-14 20:35:03 +03:00
1c5edc3cb3 readme : add SwiftWhisper to listed bindings (#755) 2023-04-14 20:24:00 +03:00
34b772727d gitignore : add .test 2023-04-14 20:13:47 +03:00
2c856fb9e5 whisper : fix potential memory leaks (#740)
* fix potential memory leak if whisper_init_state failed

* fix potential memory leak if gpt2_init failed
2023-04-14 20:05:56 +03:00
7727a40dc9 license : update year (#739) 2023-04-14 20:04:42 +03:00
b5639ed313 whisper : fix typos in whisper.h (#737)
Fixed a couple of typos (in comments, so nothing major). Keep up the great work 😄
2023-04-14 20:03:16 +03:00
2c4ac2627d stream : support language auto-detect (#501)
#445  fix Language auto-detect "auto" flag does not work using the stream tool
2023-04-14 20:02:18 +03:00
674a8e579b readme : add unity bindings (#733) 2023-04-14 19:59:44 +03:00
001083a769 talk, talk-llama : add basic example script for eleven-labs tts (#728) 2023-04-14 19:53:58 +03:00
62b51c3070 models : change convert-pt-to-ggml to use .tiktoken tokenizer files (#725) 2023-04-14 19:50:39 +03:00
61128870b8 cmake : add msvc compiler args /utf-8 fix error C3688 (#721)
* force msvc compiler use utf-8 encode

* only enable on msvc
2023-04-14 19:36:38 +03:00
78548dc03f talk-llama : correct default speak.sh path (#720)
There is `speak.sh` file in `./examples/talk-llama` as described in README.
However `./examples/talk/speak.sh` is used in `talk-llama.cpp`, this commit corrects that.
2023-04-14 19:36:09 +03:00
66110dafcc main : add lrc output support (#718)
* add lrc output support.

* fix wrong comment
2023-04-14 19:35:33 +03:00
Sam
b73a4638ac readme : make the quick start instructions clearer. (#716)
Users wanting to make use of this implementation of the whisper model with no prior knowledge of C/C++ may download the Whisper model but fail to use of the "make" command as specified given that they forgot or didn't know they needed to clone the repository first. Hope this modification clears things up.
2023-04-14 19:33:06 +03:00
5f16420333 make : disable avx in case f16c is not available (#706)
Why:

* ggml.c does not support AVX without F16C
2023-04-14 19:31:51 +03:00
ccb47e7e10 readme : add shell command example for --print-colors (#710)
The section of the readme file explaining `--print-colors` includes only a screenshot with directories that are inconsistent with other examples. This commit adds an example shell command, consistent with the remaining examples.
2023-04-14 19:25:23 +03:00
677ad754a0 ggml : sync latest ggml 2023-04-14 19:20:39 +03:00
514cd04452 whisper : fix bug in prompt processing (close #705)
Was dereferencing a dangling pointer
2023-04-14 19:17:07 +03:00
6704a81255 go : exposed various parts to the Go Interface (#697) 2023-04-14 18:52:10 +03:00
463e46338c ggml : fix q4_1 dot product types (#759)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-14 13:34:20 +03:00
2f889132c6 ggml : sync latest changes from ggml and llama.cpp 2023-04-13 18:53:44 +03:00
ebef1e8620 ggml : fix WASM build 2023-04-10 23:18:29 +03:00
114df388fe talk-llama : increase context to 2048 2023-04-10 23:09:15 +03:00
ea36831459 talk-llama : update to latest llama.cpp (improved performance) 2023-04-10 22:59:13 +03:00
69b8503935 ggml : backport llama.cpp updates (close #709)
- About x2 overall performance improvement on Apple Silicon
- Results should now be the same for different number of threads (not
  tested)
2023-04-10 22:28:54 +03:00
0a2d1210bc whisper : add progress callback (#600) 2023-03-30 20:29:29 +03:00