* Allow a regular expression to describe tokens to suppress.
Example: --suppress-tokens-re "[,\.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens.
Technique inspired by https://github.com/openai/whisper/discussions/1041
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
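A minimal C++ sketch of the idea above, not the code added by this change: every token whose full text matches the expression gets its logit set to -inf, so it can no longer be sampled. The token map, the logits vector and the name `suppress_tokens_re` are hypothetical stand-ins for the real vocabulary and decoder state.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: suppress every token whose text fully matches `pattern`.
// `token_to_id` maps token text to an index into `logits` (assumed layout).
// Returns the number of suppressed tokens.
inline int suppress_tokens_re(const std::map<std::string, int> & token_to_id,
                              const std::string & pattern,
                              std::vector<float> & logits) {
    const std::regex re(pattern); // ECMAScript syntax by default
    int n_suppressed = 0;
    for (const auto & kv : token_to_id) {
        assert(kv.second >= 0 && static_cast<std::size_t>(kv.second) < logits.size());
        // regex_match requires the whole token text to match, not a substring
        if (std::regex_match(kv.first, re)) {
            logits[kv.second] = -INFINITY;
            ++n_suppressed;
        }
    }
    return n_suppressed;
}
```

Because the match is anchored to the whole token, a pattern such as "[,\.]|[ ]?[0-9]+" leaves a token like "hello" untouched while suppressing ",", "." and " 42".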
* Blind change to fix Java test.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Implemented command-style grammar in the main example.
Mostly just copied the relevant parts from the command example.
* main : code style
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* whisper.cpp: impl dtw algo
* WIP: producing and placing DTW timestamps on tokens
* Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false.
* Fix mistake causing incorrect alignment of dtw timestamps
* implement N_TOP_MOST and CUSTOM alignment heads setting
* whisper: fix typo on alignment heads enum
* Fix issues related to changes in whisper.cpp
* Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function
* decoder: save cross QKs only if requested
* Calling median filter with ggml_map_custom1
* Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads
* Copying cross QKs from decoder backend correctly
* dtw: cleanup
* Fix incorrect n_frames passed to dtw when near end of audio
* Fix aheads_masks_init for backend != CPU
* whisper : minor style
* main : add dtw (wip)
* whisper: fix invalid memory access in aheads_masks_init
* main : add dtw (cont)
* whisper : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
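For readers unfamiliar with DTW, a textbook dynamic-time-warping sketch in C++ follows. It is not the implementation from this change: the cost matrix is a plain hypothetical input (rows could stand for tokens, columns for audio frames), and how the real code derives it from the cross QKs of the alignment heads is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

struct dtw_result {
    float cost;                             // accumulated cost of the best path
    std::vector<std::pair<int, int>> path;  // (row, column) pairs, start to end
};

// Classic DTW over a rectangular cost matrix with steps: diagonal, up, left.
inline dtw_result dtw(const std::vector<std::vector<float>> & c) {
    assert(!c.empty() && !c[0].empty());
    const int n = static_cast<int>(c.size());
    const int m = static_cast<int>(c[0].size());
    const float inf = std::numeric_limits<float>::infinity();

    // d[i][j] = cheapest cost of aligning the first i rows with the first j columns
    std::vector<std::vector<float>> d(n + 1, std::vector<float>(m + 1, inf));
    d[0][0] = 0.0f;
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            d[i][j] = c[i - 1][j - 1] + std::min({d[i - 1][j - 1], d[i - 1][j], d[i][j - 1]});
        }
    }

    // Backtrace from the bottom-right corner to recover the alignment path.
    dtw_result res;
    res.cost = d[n][m];
    int i = n, j = m;
    while (i > 0 && j > 0) {
        res.path.push_back({i - 1, j - 1});
        const float diag = d[i - 1][j - 1], up = d[i - 1][j], left = d[i][j - 1];
        if (diag <= up && diag <= left) { --i; --j; }
        else if (up <= left)            { --i; }
        else                            { --j; }
    }
    std::reverse(res.path.begin(), res.path.end());
    return res;
}
```

Each row's timestamp can then be read off the columns the path assigns to it, which is the general idea behind placing DTW timestamps on tokens.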
* talk-llama: pass file instead of arg
it is too hard to quote text in a portable way
* talk-llama: pass heard_ok as a file
* talk-llama: let eleven-labs.py accept options
Options: -v voice, -s savefile, -p (--play)
* talk-llama: check installed commands in "speak"
Pass "-q" to eleven-labs.py to skip checking whether elevenlabs is installed
* talk-llama: pass voice_id again
in order to sync talk with talk-llama
* talk: sync with talk-llama
Passing text_to_speak as a file is safer and more portable
cf. https://stackoverflow.com/a/59036879/45375
* talk and talk-llama: get all installed voices in speak.ps1
* talk and talk-llama: get voices from api
* talk and talk-llama: add more options to eleven-labs.py
and remove DEFAULT_VOICE because it is deprecated (https://www.reddit.com/r/ElevenLabs/comments/1830abt/what_happened_to_bella/)
```
usage: eleven-labs.py [-q] [-l] [-h] [-n NAME | -v NUMBER] [-f KEY=VAL] [-s FILE | -p] [TEXTFILE]

options:
  -q, --quick           skip checking the required library

action:
  TEXTFILE              read the text file (default: stdin)
  -l, --list            show the list of voices and exit
  -h, --help            show this help and exit

voice selection:
  -n NAME, --name NAME  get a voice object by name (default: Arnold)
  -v NUMBER, --voice NUMBER
                        get a voice object by number (see --list)
  -f KEY=VAL, --filter KEY=VAL
                        filter voices by labels (default: "use case=narration")
                        this option can be used multiple times
                        filtering will be disabled if the first -f has no "=" (e.g. -f "any")

output:
  -s FILE, --save FILE  save the TTS to a file (default: audio.mp3)
  -p, --play            play the TTS with ffplay
```
* examples: add speak_with_file()
as suggested in the review
* talk and talk-llama: ignore to_speak.txt
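The file-passing approach above can be sketched in C++ as follows. This is an illustration under assumptions, not the actual speak_with_file(): the helper name `prepare_speak_command`, the argument order (voice id, then file path) and the unquoted path (assumed to contain no spaces, e.g. to_speak.txt) are all hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write `text` verbatim to `path`, then build the command
// line that hands the file (not the text itself) to the speak script.
// Returns an empty string if the file could not be written.
inline std::string prepare_speak_command(const std::string & command,
                                         const std::string & text,
                                         const std::string & path,
                                         int voice_id) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return "";
    }
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    out.close();
    if (out.fail()) {
        return "";
    }
    // Only the voice id and the file path reach the shell, so quotes,
    // backticks or dollar signs in `text` never need escaping.
    return command + " " + std::to_string(voice_id) + " " + path;
}
```

The returned string could then be passed to system(); the text itself never appears on the command line.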
In commit dda4b0e of PR #1872, I introduced a check for the
existence of input files before loading the model. However, I did not
consider the case where whisper.cpp reads from stdin; in that case
the check should ignore the "-" argument, because it does not
represent a regular file.
Additionally, this commit removes the usage of 'stat()' in favor of
the recently introduced function 'is_file_exist()' in common.cpp from
PR #1871.
Apologies for the bug introduced in the previous PR and any
inconvenience it may have caused.
Until the most recent commit (3d42463), the main.cpp example did
not check whether the input files exist. Consequently, the model was
loaded first, and a failure to open a file was reported only
afterwards. On systems with an HDD, loading can take about 50 seconds
or more, depending on the model.
This commit addresses this issue by checking in advance whether the
input files exist or not.
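A minimal C++ sketch of the up-front check described in the two messages above, including the "-" (stdin) exception. The helper names `file_exists` and `missing_inputs` are hypothetical and only mirror the idea; they are not the functions in common.cpp or main.cpp.

```cpp
#include <fstream>
#include <string>
#include <vector>

// True if `path` can be opened for reading.
inline bool file_exists(const std::string & path) {
    std::ifstream f(path);
    return f.good();
}

// Return the inputs that cannot be opened, skipping "-" because it
// denotes stdin rather than a regular file.
inline std::vector<std::string> missing_inputs(const std::vector<std::string> & inputs) {
    std::vector<std::string> missing;
    for (const auto & path : inputs) {
        if (path == "-") {
            continue;
        }
        if (!file_exists(path)) {
            missing.push_back(path);
        }
    }
    return missing;
}
```

Running such a check before the model is loaded lets the program fail immediately on a bad path instead of after a slow model load.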
* added audio_ctx argument to main and server examples
* Better default value
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* better default value (again)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
The Whisper plugin for Obsidian requires an API key, which is
then sent in an Authorization header.
However, the presence of an Authorization header triggers
a CORS preflight request, so both the OPTIONS method and
the Access-Control-Allow-Headers: authorization header must be
handled.
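The preflight handling described above can be sketched in C++ without any HTTP library. Everything here is an assumption made for illustration: the `http_response` struct, the function name, the 204 status, the wildcard origin and the allowed-methods list are not taken from the server code.

```cpp
#include <map>
#include <string>

// Hypothetical, library-free stand-in for an HTTP response.
struct http_response {
    int status = 200;
    std::map<std::string, std::string> headers;
};

// Build the CORS-related part of a response for a request with `method`.
inline http_response cors_response(const std::string & method) {
    http_response res;
    // Sent on every response so the browser accepts the cross-origin reply.
    res.headers["Access-Control-Allow-Origin"] = "*";
    if (method == "OPTIONS") {
        // Preflight: no body, just state what the real request may use.
        res.status = 204;
        res.headers["Access-Control-Allow-Methods"] = "POST, OPTIONS";
        // The Authorization header must be allowed explicitly.
        res.headers["Access-Control-Allow-Headers"] = "authorization";
    }
    return res;
}
```

A browser sends the OPTIONS request first and only issues the real POST with the API key once the preflight response allows the authorization header.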
* server: include additional fields in the verbose_json response as OpenAI does
* server: show request examples on home page
* server: todo note for compression_ratio and no_speech_prob
* server: add simple demo form to the homepage
Benchmarks are failing because JNI expects a jstring and the benchmarks
are missing a return statement (i.e., returning null). The functions
actually build a jstring but don't return it, so this seems to have been
an oversight.
This patch returns the jstring and now the benchmarks run successfully.
Fixes #1783.
* examples/server: implement "verbose_json" format with token details.
This is intended to mirror the format of openai's Python
whisper.transcribe() return values.
* server: don't write WAV to a temporary file if not converting
* server: use std::lock_guard instead of manual lock/unlock
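To illustrate why std::lock_guard is preferable to manual lock/unlock, here is a small self-contained C++ sketch. The `guarded_counter` type is hypothetical and unrelated to the server's actual state; it only shows that the mutex is released on every exit path, including an exception.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical shared state protected by a mutex.
struct guarded_counter {
    std::mutex mtx;
    int value = 0;

    // With manual mtx.lock()/mtx.unlock(), the throw below would skip the
    // unlock and leave the mutex held. lock_guard unlocks in its destructor,
    // so both the normal return and the exception release the mutex.
    int increment(bool fail) {
        std::lock_guard<std::mutex> lock(mtx);
        if (fail) {
            throw std::runtime_error("simulated failure while holding the lock");
        }
        return ++value;
    }
};
```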
* make : fix server example building on MSYS2 environments (Windows)
It had not been working since commit eff3570f78,
when the server was introduced.
* cmake : simplify server example lib deps on Windows
server uses httplib::Server, not httplib::SSLServer, so there is no need
to mention cryptographic libraries in target_link_libraries.
Winsock (ws2_32) suffices here.
Also use plain library names like we use in other places.
Add a commented-out command to optionally use Piper (https://github.com/rhasspy/piper) as the text-to-speech solution for the talk-llama example. Piper voices sound almost like real people, which is a big improvement over something like espeak.
Since we use PrefetchVirtualMemory, we specify that we target Windows 8 or above; otherwise other compilers will refuse to use the PrefetchVirtualMemory API. (I understand you are loading it dynamically, but the header definition has this limitation.)
- The code comes from examples/main
- The output mimetype is set to text/vtt
Example usage:
```shell
curl 127.0.0.1:8080/inference \
-H "Content-Type: multipart/form-data" \
-F file="@samples/jfk.wav" \
-F temperature="0.2" \
-F response-format="vtt"
```
This commit adds support for the .srt format to the Whisper server. The code is
effectively backported from examples/main. The output mimetype is set to
application/x-subrip as per https://en.wikipedia.org/wiki/SubRip.
Example usage:
curl 127.0.0.1:8080/inference \
-H "Content-Type: multipart/form-data" \
-F file="@<file-path>" \
-F temperature="0.2" \
-F response-format="srt"
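The two subtitle formats above differ mainly in the millisecond separator (a dot for .vtt, a comma for .srt). A hedged C++ sketch of the formatting follows; it assumes segment times are given in units of 10 ms, and the helper names `to_timestamp` and `srt_cue` are illustrative rather than a copy of the code in examples/main.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a time given in 10 ms units as HH:MM:SS.mmm (VTT) or HH:MM:SS,mmm (SRT).
inline std::string to_timestamp(int64_t t, bool comma) {
    assert(t >= 0);
    int64_t msec = t * 10;
    const int64_t hr  = msec / (1000 * 60 * 60); msec -= hr  * (1000 * 60 * 60);
    const int64_t min = msec / (1000 * 60);      msec -= min * (1000 * 60);
    const int64_t sec = msec / 1000;             msec -= sec * 1000;

    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02d:%02d:%02d%s%03d",
                  (int) hr, (int) min, (int) sec, comma ? "," : ".", (int) msec);
    return std::string(buf);
}

// Build one SRT cue: index line, time range line, text, blank line.
inline std::string srt_cue(int index, int64_t t0, int64_t t1, const std::string & text) {
    return std::to_string(index) + "\n" +
           to_timestamp(t0, true) + " --> " + to_timestamp(t1, true) + "\n" +
           text + "\n\n";
}
```

A VTT cue would use `to_timestamp(t, false)` and omit the index line, with a single "WEBVTT" header at the top of the output.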
* server : automatically convert audio on the server
* server : remove redundant comments
* server : automatic conversion refactor
* server : update server readme
* server : remove unnecessary comments and tabs
* server : put back remove calling
* server : apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : check ffmpeg before the server launch
* server : fix indentation
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : fix function typo calling
* server : fix function typo calling
* server : add warning in readme
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Add first draft of server
* Added json support and base funcs for server.cpp
* Add more user input via api-request
also some clean up
* Add request params and load post function
Also some general clean up
* Remove unused function
* Add readme
* Add exception handlers
* Update examples/server/server.cpp
* make : add server target
* Add magic curl syntax
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* ISSUE-1329: replace " with ' so it doesn't try to execute code in backticks.
* Typo
* Update to keep possessives in the output
Each ' in the text is escaped by closing the single-quoted string, emitting a ' inside double quotes, and then reopening the single-quoted string.
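The quoting trick from the messages above can be sketched in C++ as follows. `shell_single_quote` is a hypothetical helper written for illustration; it assumes a POSIX shell, where nothing inside single quotes is expanded, so backticks are passed through as plain text.

```cpp
#include <string>

// Wrap `text` in single quotes for a POSIX shell. Each embedded ' is replaced
// by '"'"' : close the single-quoted string, emit a ' inside double quotes,
// then reopen the single-quoted string.
inline std::string shell_single_quote(const std::string & text) {
    std::string out = "'";
    for (const char ch : text) {
        if (ch == '\'') {
            out += "'\"'\"'";
        } else {
            out += ch;
        }
    }
    out += "'";
    return out;
}
```

For example, the text it's `x` becomes 'it'"'"'s `x`', which keeps the possessive in the output and prevents the backticks from being executed.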
* save the recorded audio to a file
* Alignment -help
* Save the correct audio
* change to a consistent coding style
* Correct typo
* Update examples/stream/stream.cpp
* Update examples/stream/stream.cpp
* Correct variable misuse
* Update examples/stream/stream.cpp
* Update examples/stream/stream.cpp
* Update examples/stream/stream.cpp
* Update examples/stream/stream.cpp
* add *.bin .cxx/ .gradle/ cmake-build-debug/ to gitignore
* add whisper.android.java
* Added support for older versions of Android and Java
* add examples for android java
* add README.md for android java
* add fullTranscribeWithTime
* add toString() method and tests
* change return type to void
* update to v1.4.1
* add WhisperService
* change to whisper_full_get_segment_t1
* add method transcribeDataWithTime
* modified toString
```
return "[" + start + " --> " + end + "]:" + sentence;
```
* Optimize code logic
* update text view on handle
* set max lines
* change Chinese to English
* Update bindings/java/build.gradle
* Update .gitignore
* add android.java to github action
* change android.java to android_java in build.yml
* remove gradle
* change jdk to temurin in android_java of CI
* change jdk to temurin 11 in android_java of CI
* add x to gradlew
* set api-level for android_java of CI
* Update examples/whisper.android.java/app/src/main/jni/whisper/CMakeLists.txt
* add ndk version in build.gradle
* remove local.properties
* add testFullTranscribeWithTime
---------
Co-authored-by: litongmacos <litongjava@qq.com>
Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>