whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-08-03 22:21:15 +02:00

Files

Jeff Bolz 45f1f9144f vulkan: Optimize soft_max (llama/10301)

* vulkan: Optimize soft_max

Large soft_max could already saturate memory, but small/medium sizes were
pretty slow. The bulk of the gains for them comes from using a smaller
workgroup size, and making the workgroup size match the subgroup size also
makes the barriers much cheaper.

Cache some values in locals to avoid refetching/recomputing. And stamp
out a few "template instantiations" so smaller cases will fully unroll.

Add a missing early return for OOB rows. This happens when there are more
than 512 rows and the dispatch is 512 x H.

* vulkan: Further soft_max optimizations

Restore the workgroup size of 512 case, use it for >1024.

Use unrollable loops for more iteration counts.
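
The out-of-bounds row case described in the commit message can be illustrated with a small CPU-side sketch. This is not the shader code from the commit, which is a Vulkan compute shader. It assumes one workgroup per row and a grid of 512 x H workgroups once there are more than 512 rows, as the message states. The names `dispatch_dims`, `run_soft_max`, and `softmax_row`, and the `y * 512 + x` row indexing, are hypothetical and chosen only for illustration.

```python
import math

MAX_GROUPS_X = 512  # from the commit message: the dispatch is 512 x H


def dispatch_dims(nrows):
    """Return (x, y) workgroup counts, assuming one workgroup per row."""
    if nrows <= MAX_GROUPS_X:
        return nrows, 1
    # More than 512 rows: spill into a second grid dimension.
    return MAX_GROUPS_X, math.ceil(nrows / MAX_GROUPS_X)


def softmax_row(row):
    """Numerically stable softmax of one row (subtract the max first)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def run_soft_max(rows):
    """Simulate the dispatch on the CPU; returns (output, skipped_groups)."""
    nrows = len(rows)
    gx, gy = dispatch_dims(nrows)
    out = [None] * nrows
    skipped = 0
    for y in range(gy):
        for x in range(gx):
            row_idx = y * gx + x  # hypothetical row indexing
            if row_idx >= nrows:
                # Early return for out-of-bounds rows: the grid is rounded
                # up to 512 x H, so trailing workgroups have no row to process.
                skipped += 1
                continue
            out[row_idx] = softmax_row(rows[row_idx])
    return out, skipped
```

With 600 rows, for example, the grid in this sketch is 512 x 2, so 1024 workgroups are launched for 600 rows. Without the bounds check, the trailing workgroups would index past the end of the data.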

2024-11-20 21:00:08 +02:00

include

ggml: new optimization interface (ggml/988)

2024-11-20 21:00:08 +02:00

src

vulkan: Optimize soft_max (llama/10301)

2024-11-20 21:00:08 +02:00

.gitignore

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

CMakeLists.txt

CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318)

2024-11-20 21:00:08 +02:00

ggml_vk_generate_shaders.py

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00