whisper.cpp

extern/whisper.cpp

Mirror of https://github.com/ggerganov/whisper.cpp.git, synced 2025-06-03 00:15:40 +02:00

Commit Graph

Author	SHA1	Message	Date
Jeff Bolz	a221288dc6	vulkan: workaround FA compile failures on macos (llama/13517)	2025-05-19 14:58:39 +03:00
Jeff Bolz	a04b329ad1	vulkan: scalar flash attention implementation (llama/13324) * vulkan: scalar flash attention implementation * vulkan: always use fp32 for scalar flash attention * vulkan: use vector loads in scalar flash attention shader * vulkan: remove PV matrix, helps with register usage * vulkan: reduce register usage in scalar FA, but perf may be slightly worse * vulkan: load each Q value once. optimize O reduction. more tuning * vulkan: support q4_0/q8_0 KV in scalar FA * CI: increase timeout to accommodate newly-supported tests * vulkan: for scalar FA, select between 1 and 8 rows * vulkan: avoid using Float16 capability in scalar FA	2025-05-13 13:59:21 +03:00

2 Commits