Georgi Gerganov 7094ea5e75
whisper : use flash attention (#2152)
* whisper : use flash attention in the encoder

* whisper : add kv_pad

* whisper : remove extra backend instance (huh?)

* whisper : use FA for cross-attention

* whisper : use FA for self-attention

* whisper : simplify encoder FA

* whisper : add flash_attn runtime parameter

* scripts : add bench log

* scripts : add M1 Pro bench log
2024-05-15 09:38:19 +03:00