whisper.cpp

Mirror of https://github.com/ggerganov/whisper.cpp.git, synced 2025-08-02 03:23:48 +02:00

Files

Jeff Bolz a04b329ad1 vulkan: scalar flash attention implementation (llama/13324)

* vulkan: scalar flash attention implementation

* vulkan: always use fp32 for scalar flash attention

* vulkan: use vector loads in scalar flash attention shader

* vulkan: remove PV matrix, helps with register usage

* vulkan: reduce register usage in scalar FA, but perf may be slightly worse

* vulkan: load each Q value once. optimize O reduction. more tuning

* vulkan: support q4_0/q8_0 KV in scalar FA

* CI: increase timeout to accommodate newly-supported tests

* vulkan: for scalar FA, select between 1 and 8 rows

* vulkan: avoid using Float16 capability in scalar FA

2025-05-13 13:59:21 +03:00

cmake

ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0)

2025-03-27 11:06:03 +02:00

include

CUDA: fix bad asserts for partial offload (llama/13337)

2025-05-07 21:00:32 +03:00

src

vulkan: scalar flash attention implementation (llama/13324)

2025-05-13 13:59:21 +03:00

.gitignore

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

CMakeLists.txt

whisper: remove MSVC warnings pragmas (#3090)

2025-05-05 13:09:35 +02:00