whisper : reduce memory usage during inference (#431)

* ggml : add "scratch" buffer support

* ggml : support for scratch ring-buffer

* ggml : bug fix in ggml_repeat()

* ggml : error on scratch buffer overflow

* whisper : use scratch buffers during inference (base model only)

* whisper : update memory usage for all models

* whisper : fix encoder memory usage

* whisper : use whisper_context functions instead of macros

* whisper : fix FF + remove it from README

* ggml : reuse ggml_new_i32

* ggml : refactor the scratch buffer storage

* whisper : reorder scratch buffers in the decoder

* main : add option to disable temp fallback

* Update README.md
Georgi Gerganov
2023-02-04 09:45:52 +02:00
committed by GitHub
parent c306a7fd89
commit f3ee4a9673
7 changed files with 702 additions and 472 deletions