whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-01 07:25:49 +02:00

Jeff Bolz 1bebb1a116 vulkan: Optimize some mat-vec mul quant shaders (llama/10296)

Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.

2024-11-20 21:00:08 +02:00
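
The two-rows-per-workgroup idea in the commit message above can be sketched on the CPU in plain Python. This is only an illustration of the loop structure, not the Vulkan shader itself: it uses unquantized floats instead of Q4/Q5 blocks, and the function name `matvec_two_rows` is hypothetical. It shows the outer loop manually unrolled by two, each element of B loaded once and reused for both rows, and a bounds check that protects the final step when the row count is odd.

```python
def matvec_two_rows(a, b):
    """Compute y = A @ b, producing two output rows per outer step.

    Hypothetical helper for illustration; `a` is a list of rows and
    `b` is a list of floats with len(b) == len(a[0]).
    """
    num_rows = len(a)
    num_cols = len(b)
    y = [0.0] * num_rows

    # Outer loop manually unrolled by two: each step handles rows r and r + 1.
    for r in range(0, num_rows, 2):
        # Bounds check: on the last step the second row may not exist.
        has_second = r + 1 < num_rows
        acc0 = 0.0
        acc1 = 0.0
        for c in range(num_cols):
            bv = b[c]  # loaded once, reused for both rows
            acc0 += a[r][c] * bv
            if has_second:
                acc1 += a[r + 1][c] * bv
        y[r] = acc0
        if has_second:
            y[r + 1] = acc1
    return y


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # odd row count exercises the bounds check
    b = [1.0, 1.0]
    print(matvec_two_rows(a, b))
```

In the real shader the saving comes from sharing the B loads and the addressing arithmetic between the two rows; the sketch only mirrors that sharing through the single `bv` read per column.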

include

ggml: new optimization interface (ggml/988)

2024-11-20 21:00:08 +02:00

src

vulkan: Optimize some mat-vec mul quant shaders (llama/10296)

2024-11-20 21:00:08 +02:00

.gitignore

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00

CMakeLists.txt

backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921)

2024-11-20 21:00:08 +02:00

ggml_vk_generate_shaders.py

whisper : reorganize source code + improve CMake (#2256)

2024-06-26 19:34:09 +03:00