whisper.cpp/ggml
Eve 3216efef2e AVX BF16 and single scale quant optimizations (llama/10212)
* use 128-bit loads (I've tried 256->128 to death and it's slower)

* double accumulator

* AVX BF16 vec dot (see the first sketch below)

* +3% q4_0 inference

* +7% tg (token generation), +5% pp (prompt processing) compared to master

* slower F16C version, kept for reference

* 256-bit version, also slow. I tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16-bit add for q4_0 only (see the second sketch below)

* merge
2024-11-20 21:00:08 +02:00
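
Since BF16 is just the upper 16 bits of an IEEE-754 float32, the conversion inside the vec dot can be a zero-extend plus a 16-bit left shift, which pairs naturally with the 128-bit loads mentioned in the log. Below is a minimal AVX2+FMA sketch of that idea, including the double accumulator (two independent FMA chains to hide instruction latency). `bf16_dot` is a hypothetical helper, not the ggml API, and `n` is assumed to be a multiple of 16.

```c
#include <immintrin.h>
#include <stdint.h>

// Hypothetical sketch, not the ggml implementation.
// BF16 is the top half of an f32, so zero-extending each 16-bit lane to
// 32 bits and shifting left by 16 reconstructs the float exactly.
static float bf16_dot(const uint16_t *a, const uint16_t *b, int n) {
    __m256 acc0 = _mm256_setzero_ps();
    __m256 acc1 = _mm256_setzero_ps(); // second accumulator hides FMA latency
    for (int i = 0; i < n; i += 16) {
        // 128-bit loads: 8 bf16 values each
        const __m128i a0 = _mm_loadu_si128((const __m128i *)(a + i));
        const __m128i b0 = _mm_loadu_si128((const __m128i *)(b + i));
        const __m128i a1 = _mm_loadu_si128((const __m128i *)(a + i + 8));
        const __m128i b1 = _mm_loadu_si128((const __m128i *)(b + i + 8));
        // zero-extend to 32-bit lanes, shift into f32 position, reinterpret
        const __m256 fa0 = _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(a0), 16));
        const __m256 fb0 = _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(b0), 16));
        const __m256 fa1 = _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(a1), 16));
        const __m256 fb1 = _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(b1), 16));
        acc0 = _mm256_fmadd_ps(fa0, fb0, acc0);
        acc1 = _mm256_fmadd_ps(fa1, fb1, acc1);
    }
    // combine both accumulators, then reduce the 8 lanes horizontally
    const __m256 acc = _mm256_add_ps(acc0, acc1);
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc), _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
```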
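For the single-scale quant types, the usual AVX2 building block is `_mm256_maddubs_epi16`, which multiplies unsigned bytes by signed bytes and adds adjacent pairs into 16-bit lanes ("faster with madd"). Here is a hedged sketch of one Q4_0-style block (32 four-bit weights with an implicit -8 offset, one scale) dotted against 32 int8 activations; `q4_block_dot` is a hypothetical name, and the final multiply by the two block scales is left to the caller. Because Q4_0 weights span only -8..7, the 16-bit pair sums stay far from INT16_MAX, which is presumably why accumulating in 16 bits across blocks is safe for q4_0 only, and why the overflow fix cost some performance elsewhere.

```c
#include <immintrin.h>
#include <stdint.h>

// Hypothetical sketch, not the ggml implementation: dot one 32-weight
// 4-bit block (16 packed bytes, values 0..15 with a -8 offset) with int8.
static int32_t q4_block_dot(const uint8_t *q4, const int8_t *q8) {
    const __m128i packed  = _mm_loadu_si128((const __m128i *)q4);
    const __m128i low_nib = _mm_set1_epi8(0x0F);
    // unpack: low nibbles are weights 0..15, high nibbles weights 16..31
    __m256i x = _mm256_set_m128i(_mm_and_si128(_mm_srli_epi16(packed, 4), low_nib),
                                 _mm_and_si128(packed, low_nib));
    x = _mm256_sub_epi8(x, _mm256_set1_epi8(8)); // remove the Q4_0 offset
    const __m256i y = _mm256_loadu_si256((const __m256i *)q8);
    // maddubs wants an unsigned first operand: use |x| and move x's sign onto y
    const __m256i ax = _mm256_sign_epi8(x, x);
    const __m256i sy = _mm256_sign_epi8(y, x);
    // adjacent-pair multiply-add into 16-bit lanes; magnitude is at most
    // 2 * 8 * 128 = 2048, so a single block cannot overflow int16
    const __m256i dot16 = _mm256_maddubs_epi16(ax, sy);
    // widen to 32-bit lanes and reduce horizontally
    const __m256i dot32 = _mm256_madd_epi16(dot16, _mm256_set1_epi16(1));
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(dot32),
                              _mm256_extracti128_si256(dot32, 1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0x4E)); // swap 64-bit halves
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0xB1)); // swap 32-bit pairs
    return _mm_cvtsi128_si32(s);
}
```

Keeping several blocks' `dot16` lanes in 16-bit form before a single widening `_mm256_madd_epi16` would save one madd per block; that shortcut only works when per-block magnitudes are as small as Q4_0's, which appears to be the "16-bit add for q4_0 only" trade-off in the log.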
cmake whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
include backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921) 2024-11-20 21:00:08 +02:00
src AVX BF16 and single scale quant optimizations (llama/10212) 2024-11-20 21:00:08 +02:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921) 2024-11-20 21:00:08 +02:00
ggml_vk_generate_shaders.py whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00