whisper.cpp/ggml
Jeff Bolz 1bebb1a116 vulkan: Optimize some mat-vec mul quant shaders (llama/10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.
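
As a rough illustration of the row-pairing idea, here is a minimal CPU-side sketch in C++. It is not the Vulkan shader from this commit: the function name `matvec_two_rows`, the dense row-major float layout, and the absence of quantization are all assumptions made for the example. Each element of the vector (the "B" operand) is loaded once and used for two output rows.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU analogue (not the actual shader): y = A * x, where A is a
// dense row-major float matrix of size rows x cols. Two output rows are
// accumulated together so that each x[c] is loaded once and reused for both.
std::vector<float> matvec_two_rows(const std::vector<float>& a,
                                   const std::vector<float>& x,
                                   std::size_t rows, std::size_t cols) {
    std::vector<float> y(rows, 0.0f);
    std::size_t r = 0;
    // Manually unrolled outer loop: handle rows r and r + 1 in one pass.
    for (; r + 1 < rows; r += 2) {
        const float* row0 = a.data() + r * cols;
        const float* row1 = row0 + cols;  // address derived from row0
        float acc0 = 0.0f;
        float acc1 = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            const float b = x[c];  // single load shared by both rows
            acc0 += row0[c] * b;
            acc1 += row1[c] * b;
        }
        y[r] = acc0;
        y[r + 1] = acc1;
    }
    // Leftover row when the row count is odd.
    if (r < rows) {
        const float* row0 = a.data() + r * cols;
        float acc0 = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            acc0 += row0[c] * x[c];
        }
        y[r] = acc0;
    }
    return y;
}
```

For a 3 x 2 matrix `{1, 2, 3, 4, 5, 6}` and `x = {1, 1}`, the first two rows are computed together and the third falls through to the single-row path.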

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.
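
One common way to confine bounds checks to the last iteration is sketched below in C++. This is an assumption about the general pattern, not a transcription of the shader: `dot_chunked` and `kChunk` are hypothetical names, and the real code works on quantized blocks rather than plain floats.

```cpp
#include <cstddef>

// Hypothetical chunk width; the real shader's iteration size may differ.
constexpr std::size_t kChunk = 4;

// Dot product processed in fixed-size chunks. Full chunks run without any
// per-element checks; only the final, possibly partial chunk is guarded.
float dot_chunked(const float* a, const float* b, std::size_t n) {
    float acc = 0.0f;
    const std::size_t full = n / kChunk;
    for (std::size_t i = 0; i < full; ++i) {
        for (std::size_t j = 0; j < kChunk; ++j) {
            acc += a[i * kChunk + j] * b[i * kChunk + j];
        }
    }
    // Last iteration: check every index against n before touching memory.
    const std::size_t base = full * kChunk;
    for (std::size_t j = 0; j < kChunk; ++j) {
        const std::size_t idx = base + j;
        if (idx < n) {
            acc += a[idx] * b[idx];
        }
    }
    return acc;
}
```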

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit-twiddling instructions.
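
To show why wider loads can cut down on bit manipulation, the C++ sketch below unpacks eight 4-bit values from one 32-bit load with two masks instead of handling each byte separately. It is a generic illustration only: `split_nibbles`, `Nibbles`, and `byte_at` are hypothetical names, and the byte layout here is not the actual Q4_K block format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Low and high nibbles of four packed bytes, one nibble per output byte.
struct Nibbles {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Load four packed bytes as one 32-bit word, then split all eight nibbles
// with one mask for the low halves and one shift plus mask for the high
// halves, rather than masking and shifting each byte individually.
Nibbles split_nibbles(const std::uint8_t* packed) {
    std::uint32_t w;
    std::memcpy(&w, packed, sizeof w);  // one wide load
    return Nibbles{w & 0x0F0F0F0Fu, (w >> 4) & 0x0F0F0F0Fu};
}

// Helper to read byte i of a word in memory order (endianness-independent).
std::uint8_t byte_at(std::uint32_t word, std::size_t i) {
    std::uint8_t bytes[4];
    std::memcpy(bytes, &word, sizeof word);
    return bytes[i];
}
```

Because both masks act on every byte identically, the per-byte result does not depend on the host's endianness when read back with `byte_at`.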
2024-11-20 21:00:08 +02:00
Name                         Last commit                                                                       Date
include                      ggml: new optimization interface (ggml/988)                                       2024-11-20 21:00:08 +02:00
src                          vulkan: Optimize some mat-vec mul quant shaders (llama/10296)                     2024-11-20 21:00:08 +02:00
.gitignore                   whisper : reorganize source code + improve CMake (#2256)                          2024-06-26 19:34:09 +03:00
CMakeLists.txt               backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921)      2024-11-20 21:00:08 +02:00
ggml_vk_generate_shaders.py  whisper : reorganize source code + improve CMake (#2256)                          2024-06-26 19:34:09 +03:00