Georgi Gerganov
7094ea5e75
whisper : use flash attention (#2152)
* whisper : use flash attention in the encoder
* whisper : add kv_pad
* whisper : remove extra backend instance (huh?)
* whisper : use FA for cross-attention
* whisper : use FA for self-attention
* whisper : simplify encoder FA
* whisper : add flash_attn runtime parameter
* scripts : add bench log
* scripts : add M1 Pro bench log
2024-05-15 09:38:19 +03:00