whisper.cpp/ggml
Latest commit: 15d71189e9 by Johannes Gäßler, 2024-08-08 22:48:46 +03:00
CUDA: optimize and refactor MMQ (llama/8416)

* CUDA: optimize and refactor MMQ
* explicit q8_1 memory layouts, add documentation
Name                          Last commit                                                          Last commit date
cmake                         whisper : reorganize source code + improve CMake (#2256)             2024-06-26 19:34:09 +03:00
include                       ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780)   2024-08-08 22:48:46 +03:00
src                           CUDA: optimize and refactor MMQ (llama/8416)                         2024-08-08 22:48:46 +03:00
.gitignore                    whisper : reorganize source code + improve CMake (#2256)             2024-06-26 19:34:09 +03:00
CMakeLists.txt                ggml : move sgemm sources to llamafile subfolder (llama/8394)        2024-08-08 22:48:46 +03:00
ggml_vk_generate_shaders.py   whisper : reorganize source code + improve CMake (#2256)             2024-06-26 19:34:09 +03:00
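
The latest commit above mentions "explicit q8_1 memory layouts" for the MMQ (matrix-multiplication with quantized operands) kernels. For orientation only, the following is a minimal C sketch of a q8_1 block under the usual ggml conventions (an fp16 scale, an fp16 precomputed sum, and 32 signed 8-bit quants); the field names, the uint16_t encoding of fp16, and the omission of ggml's union wrapper are simplifying assumptions, not the code of this commit (that lives under src/).

/*
 * Sketch of a q8_1 quantization block, assuming the conventional
 * ggml layout. Illustrative only; see src/ for the actual definitions.
 */
#include <stdint.h>

#define QK8_1 32                 /* number of weights per q8_1 block */

typedef struct {
    uint16_t d;                  /* fp16 scale (delta) for the block */
    uint16_t s;                  /* fp16 value of d * sum(qs[i]), cached so
                                    dot-product kernels can fold in the
                                    offset term without re-summing qs */
    int8_t   qs[QK8_1];          /* 32 signed 8-bit quantized values */
} block_q8_1;

Keeping d and s adjacent lets a kernel load both scales with a single 32-bit read before consuming the quants, which is the kind of layout detail the commit message is calling out as now being explicit and documented.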