Commit Graph

58 Commits

Author SHA1 Message Date
eb23f4ef16 openvino : fix convert-whisper-to-openvino.py (#1890)
Fix issue: Conversion from Whisper to OpenVino failed #1870

convert-whisper-to-openvino.py stopped working with OpenVINO version 2023.0.0-10926-b4452d56304-releases/2023/0 .

Error was: TypeError: load(): incompatible function arguments. The following argument types are supported:
    1. (self: openvino._pyopenvino.FrontEnd, path: object) -> ov::frontend::InputModel

Tested successfully with a large-v3 conversion.

Co-authored-by: Stefan Grundmann <grundmanns@sandiego.gov>
2024-02-22 15:11:35 +02:00
3d42463845 models : add update py requirements 2024-02-13 11:51:32 +02:00
4bbb60efce docs : make model options / model install methods clearer (#1806)
* Make models more "discoverable"

* Clean up code block language identifiers

* make 3 options clearer

* undo Prettier formatter change

* docs: `$` shell prompt, consistently

* docs: minor changes
2024-01-26 17:39:54 +02:00
d05b7ee90e models : make all scripts to be POSIX Compliant (#1725)
* download-coreml-model: make it POSIX-compliant

* download-ggml-model: posix compliant (2nd)

* minor edit

* forgot to add newline

* generate-coreml-interface: far more straightforward

* generate-coreml-model: done with the posix thingy

* typo

* Update download-ggml-model.sh

* fix

* fix typo

* another fix

* Update download-coreml-model.sh

* Update download-ggml-model.sh

* Update download-coreml-model.sh
2024-01-12 14:11:04 +02:00
ba5bcde874 coreml : fix ANE optimized encoder (#1716) 2024-01-04 16:28:30 +02:00
a5cc3dc8a2 download : fix large q5 model name (#1695)
fixed typo in large-v3-q5-0 model name to match HF link
2023-12-29 11:14:32 +02:00
d2ee117a0a docker : Dockerize whisper.cpp (#1674)
* build: add dockerfile for ci

* ci: add action to build/push docker image

* fix: lowercase repository to fix ci

* ci: update cuBLAS flag

* build: install curl and ffmped in image

* docs: add docker section

* fix: improve args check when download model
2023-12-22 11:16:02 +00:00
c7606b47df models : add info about distilled models 2023-11-15 21:10:13 +02:00
bfbaa4dce5 whisper : make large version explicit + fix data size units (#1493) 2023-11-15 19:42:25 +02:00
953419c69a openvino : update convert-whisper-to-openvino.py to support v3 (#1459) 2023-11-09 12:42:39 +02:00
0de8582f65 coreml : use the correct n_mel value (#1458) 2023-11-08 20:01:41 +00:00
2cdfc4e025 whisper : add support for large v3 (#1444)
* whisper : add support for large v3

* bench : fix build + fix go bindings

* bench : fix n_mels

* models : update readme
2023-11-07 15:30:18 +02:00
8a2bee6717 models : use absolute paths for the converted model (#1356) 2023-11-03 10:44:27 +02:00
45c87b5481 models : Faster download for models on windows using BitTransfer (#1404) 2023-10-30 19:18:12 +00:00
91c0b23384 models : add conversion scripts from HuggingFace models to CoreML (#1304) 2023-10-04 12:00:25 +03:00
aed5d40607 models : add quantum models to download-ggml-model.sh (#1235)
* Add quantized models to download-ggml-model.sh

* Update names in download-ggml-model script to normalized
2023-09-07 12:16:58 +03:00
62b81276e0 whisper : add OpenVINO support (#1037)
* openvino: use OpenVINO encoder inference

* openvino: add python script for OpenVINO model generation

* whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* whisper: Fix compilation error

* whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures

* cmake: Add openvino-encoder as separate object target

* whisper : minor style fixes

* minor : indentation fixes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-04 15:56:11 +03:00
c8d0f5fe98 whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058)
* add HuggingFace mirror to download  ggml model

* support tdrz via simple hack overriding solm tokens

* fix incorrect translate/transcribe token_ids that are not static const

* add apollo 13 sample for tdrz demo

* render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token

* extend whisper_segment with speaker_turn_next field and save in json output

* fix failing go build

* slipped in some python syntax whoops

* whisper : finalize tinydiarize support (add flag + fixes)

* whisper : tdrz support for word-level timestamps (respect max_len)

* java : try to fix tests after adding tdrz_enable flag

* main : remove TODO leftover

* java : fix params order list after adding "tdrz_enable"

* whisper : fix solm and add nosp token

* main : print tinydiarize help

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-04 09:45:00 +03:00
6c68218e3c models : add ggml_to_pt script (#1042)
* adding ggml_to_pt

* typo sys too many args

* fixing swap errors dimensions

---------

Co-authored-by: simonMoisselin <simon.moisselin@gmail.com>
2023-06-25 15:29:54 +03:00
f11f33f1c0 models : cd statements are quoted to allow spaces in path (#1041) 2023-06-25 15:27:28 +03:00
8ac23c9f77 models : handle paths with spaces in download script (close #1038) 2023-06-25 15:23:23 +03:00
3ec7bfffe0 py : make convert-pt-to-ggml.py backwards compatible with older vocab.json tokenizer files (#1001)
* patch checkpoint convert script to keep compatibility with older hf_transformers whisper tokenizer

* typo fix
2023-06-25 13:50:14 +03:00
9b926844e3 models : fix README.md (#964)
Fixes typo on line 76 of models/README.md
2023-05-27 10:40:28 +03:00
95b02d76b0 coreml : add support of large-v1 model (#926) 2023-05-15 18:36:06 +03:00
9931d66400 readme : add instructions on converting to GGML + "--no-config" to wget (#874) 2023-05-08 20:58:36 +03:00
94aa56f19e minor : improve C++ and Python style (#768)
* use some STL functions

* use self.field than setattr, use pathlib.Path

* recover some format

* const some iter

* Keep the original

* 2 space
2023-04-29 10:06:25 +03:00
5e47e223bd whisper : add Core ML support (#566)
* coreml : use Core ML encoder inference

* coreml : simlpify whisper_encode + log messages

* whisper : resolve rebase conflicts

* coreml : add scripts for CoreML model generation

* bench-all : recognize COREML flag
2023-04-15 13:21:27 +03:00
62b51c3070 models : change convert-pt-to-ggml to use .tiktoken tokenizer files (#725) 2023-04-14 19:50:39 +03:00
18e6fb0287 models : handle spaces and special characters in shell script paths (#677)
This commit modifies the `get_script_path` function to correctly handle
spaces and special characters in directory paths. The fix involves adding
double quotes around variables and commands where needed to ensure proper
parsing of paths with spaces and special characters.
2023-03-29 23:38:33 +03:00
992aa2cd1b models : change default encoding to utf8 (#605) 2023-03-22 21:17:24 +02:00
1beff6f66d models : change HF hosting from dataset to model 2023-03-22 20:44:56 +02:00
d629c034a4 models : fix HF model URL (close #356) 2023-01-02 09:54:43 +02:00
3467230a77 models : fix typo in convert-h5-to-ggml.py
signficant -> significant
2022-12-31 09:49:01 +02:00
77226aa89d models : fix support for spaces in path (close #315) 2022-12-23 11:11:38 +02:00
a613f16aec talk : improve prompting 2022-12-12 23:44:36 +02:00
d91c001120 Fix paths echoed after the download
Was using models path instead of root path
2022-12-08 09:23:52 +02:00
9fe7306f4b models : add the new "large" model release by OpenAI
The old "large" model is now renamed "large-v1".
If you have been using it, make sure to rename it and download the new
"large" model for best results.
2022-12-06 18:48:57 +02:00
abce28ea99 talk.wasm : move to https://whisper.ggerganov.com/talk
This way, we can share the same models across different WASM examples
and not have to download them for each page
2022-11-24 18:24:06 +02:00
a2ecd54455 models : add instructions for using HF fine-tuned models 2022-11-24 17:54:41 +02:00
00f46dbc1d models : add usage comments to the HF convert script (#157) 2022-11-23 23:22:40 +02:00
5698bddbc9 models : fix HF fine-tuned model conversion script (#157)
It works now
2022-11-23 23:14:11 +02:00
d64d6ca3fd models : minor changes to the HF convert script (#157) 2022-11-23 22:07:20 +02:00
93482d0373 models : add "convert-h5-to-ggml.py" script (#157)
Converts transformers models to ggml.
Although the conversion is successful, it does not work for some reason.
Not sure why
2022-11-23 17:19:22 +02:00
e70e5c8b53 models : simplify the conversion script
"transformers" dependency is not actually needed
2022-11-16 19:22:32 +02:00
55a0e1a64e Update download-ggml-model.sh
follow curl redirect to new hosting site
2022-11-16 18:59:44 +02:00
864a78a8d0 models : change default hosting to Hugging Face
My Linode is running out of monthly bandwidth due to the big interest in
the project
2022-11-15 19:47:06 +02:00
46a68fb9b5 minor : remove one more redundant line 2022-11-11 18:02:58 +02:00
ccd56a9c5b minor : fix double float32 conversion in python script 2022-11-11 17:58:51 +02:00
b5dde365e9 extra : compute SHA of all models files 2022-11-02 18:31:55 +02:00
b26345cc7b Added for Windows implemenated script download-ggml-model.cmd 2022-10-31 19:38:20 +02:00