whisper.cpp/ggml
Eve 3216efef2e AVX BF16 and single scale quant optimizations (llama/10212)
* use 128-bit loads (I've tried 256->128 to death and it's slower)

* double accumulator

* AVX BF16 vec dot (see the first sketch below)

* +3% q4_0 inference

* +7% tg (token generation), +5% pp (prompt processing) compared to master

* slower F16C version, kept for reference

* 256-bit version, also slow. I tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16-bit add for q4_0 only (see the second sketch below)

* merge
2024-11-20 21:00:08 +02:00
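
Since BF16 is just the upper 16 bits of an IEEE-754 float32, the conversion inside the vec dot can be a zero-extend plus a 16-bit left shift, which pairs naturally with the 128-bit loads mentioned in the log. Below is a minimal AVX2+FMA sketch of that idea, including the double accumulator (two independent FMA chains to hide instruction latency). `bf16_dot` is a hypothetical helper, not the ggml API, and `n` is assumed to be a multiple of 16.

```c
#include <immintrin.h>
#include <stdint.h>

// Hypothetical sketch, not the ggml implementation.
// BF16 is the top half of an f32, so zero-extending each 16-bit lane to
// 32 bits and shifting left by 16 reconstructs the float exactly.
static float bf16_dot(const uint16_t *a, const uint16_t *b, int n) {
    __m256 acc0 = _mm256_setzero_ps();
    __m256 acc1 = _mm256_setzero_ps(); // second accumulator hides FMA latency
    for (int i = 0; i < n; i += 16) {
        // 128-bit loads: 8 bf16 values each
        const __m128i a0 = _mm_loadu_si128((const __m128i *)(a + i));
        const __m128i b0 = _mm_loadu_si128((const __m128i *)(b + i));
        const __m128i a1 = _mm_loadu_si128((const __m128i *)(a + i + 8));
        const __m128i b1 = _mm_loadu_si128((const __m128i *)(b + i + 8));
        // zero-extend to 32-bit lanes, shift into f32 position, reinterpret
        const __m256 fa0 = _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(a0), 16));
        const __m256 fb0 = _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(b0), 16));
        const __m256 fa1 = _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(a1), 16));
        const __m256 fb1 = _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(b1), 16));
        acc0 = _mm256_fmadd_ps(fa0, fb0, acc0);
        acc1 = _mm256_fmadd_ps(fa1, fb1, acc1);
    }
    // combine both accumulators, then reduce the 8 lanes horizontally
    const __m256 acc = _mm256_add_ps(acc0, acc1);
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc), _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
```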
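For the single-scale quant types, the usual AVX2 building block is `_mm256_maddubs_epi16`, which multiplies unsigned bytes by signed bytes and adds adjacent pairs into 16-bit lanes ("faster with madd"). Here is a hedged sketch of one Q4_0-style block (32 four-bit weights with an implicit -8 offset, one scale) dotted against 32 int8 activations; `q4_block_dot` is a hypothetical name, and the final multiply by the two block scales is left to the caller. Because Q4_0 weights span only -8..7, the 16-bit pair sums stay far from INT16_MAX, which is presumably why accumulating in 16 bits across blocks is safe for q4_0 only, and why the overflow fix cost some performance elsewhere.

```c
#include <immintrin.h>
#include <stdint.h>

// Hypothetical sketch, not the ggml implementation: dot one 32-weight
// 4-bit block (16 packed bytes, values 0..15 with a -8 offset) with int8.
static int32_t q4_block_dot(const uint8_t *q4, const int8_t *q8) {
    const __m128i packed  = _mm_loadu_si128((const __m128i *)q4);
    const __m128i low_nib = _mm_set1_epi8(0x0F);
    // unpack: low nibbles are weights 0..15, high nibbles weights 16..31
    __m256i x = _mm256_set_m128i(_mm_and_si128(_mm_srli_epi16(packed, 4), low_nib),
                                 _mm_and_si128(packed, low_nib));
    x = _mm256_sub_epi8(x, _mm256_set1_epi8(8)); // remove the Q4_0 offset
    const __m256i y = _mm256_loadu_si256((const __m256i *)q8);
    // maddubs wants an unsigned first operand: use |x| and move x's sign onto y
    const __m256i ax = _mm256_sign_epi8(x, x);
    const __m256i sy = _mm256_sign_epi8(y, x);
    // adjacent-pair multiply-add into 16-bit lanes; magnitude is at most
    // 2 * 8 * 128 = 2048, so a single block cannot overflow int16
    const __m256i dot16 = _mm256_maddubs_epi16(ax, sy);
    // widen to 32-bit lanes and reduce horizontally
    const __m256i dot32 = _mm256_madd_epi16(dot16, _mm256_set1_epi16(1));
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(dot32),
                              _mm256_extracti128_si256(dot32, 1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0x4E)); // swap 64-bit halves
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0xB1)); // swap 32-bit pairs
    return _mm_cvtsi128_si32(s);
}
```

Keeping several blocks' `dot16` lanes in 16-bit form before a single widening `_mm256_madd_epi16` would save one madd per block; that shortcut only works when per-block magnitudes are as small as Q4_0's, which appears to be the "16-bit add for q4_0 only" trade-off in the log.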
cmake whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
include backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921) 2024-11-20 21:00:08 +02:00
src AVX BF16 and single scale quant optimizations (llama/10212) 2024-11-20 21:00:08 +02:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921) 2024-11-20 21:00:08 +02:00
ggml_vk_generate_shaders.py whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00