whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-07-15 06:25:04 +02:00

Author	SHA1	Message	Date
Daniel Bevenius	09846f4e12	whisper: remove MSVC warnings pragmas (#3090) * ggml : remove MSVC warnings pragmas This commit removes the MSVC-specific pragmas as these are now handled in CMakeLists.txt. * whisper : remove MSVC warning pragmas This commit removes the MSVC-specific pragmas. These are now handled in the CMakeLists.txt file.	2025-05-05 13:09:35 +02:00
Srihari-mcw	ee0013865d	ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (llama/12829) * Add AVX512 implementation of GEMM - q4kx8 * Update changes to remove unnecessary whitespaces	2025-04-24 20:39:16 +03:00
SXX	915c14ef10	ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register (llama/12773) * ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register * simplifies the codebase by removing redundant functions	2025-04-24 20:39:16 +03:00
Georgi Gerganov	3c4d363872	ggml : fix MUL_MAT_ID repack with Q8_K (llama/12544) * ggml : fix MUL_MAT_ID repack with Q8_K ggml-ci * ggml : improve repack templates ggml-ci	2025-03-27 11:06:03 +02:00
Srihari-mcw	8058f19d0b	ggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture (llama/12332) * Add block interleaving support for Q4_K quantization * Remove whitespaces and fix CI/CD issues * Update pointer of bsums from int16_t to const int16_t * Add vector version of quantize_q8_K_4x8 function * Update code formatting based on review comments	2025-03-27 11:06:03 +02:00
William Tambellini	c98681e6d5	ggml : upgrade init_tensor API to return a ggml_status (llama/11854) * Upgrade init_tensor API to return a ggml_status To prepare for an 'abort-free' ggml (ggml not to abort on OOMs but return an OOM status), as agreed with Diego in the ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status. * misc fixes --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-03-08 15:13:01 +02:00
Diego Devesa	09fabffdf5	ggml-backend : only offload from host buffers (fix) (llama/11124)	2025-01-14 10:38:01 +02:00
Srihari-mcw	3fcba3e58b	ggml : fixes for AVXVNNI instruction set with MSVC and Clang (llama/11027) * Fixes for clang AVX VNNI * enable AVX VNNI and alder lake build for MSVC * Apply suggestions from code review --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-01-04 10:45:01 +02:00
Adrien Gallouët	6d502f33dc	ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (llama/10874) * ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() Signed-off-by: Adrien Gallouët <angt@huggingface.co> * ggml-cpu: format code Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-01-04 10:45:01 +02:00
Georgi Gerganov	d0a050b51f	ggml : disable iq4_nl interleave size 8 (llama/10709) ggml-ci	2024-12-18 12:52:16 +02:00
Djip007	e990d1b791	ggml : refactor online repacking (llama/10446) * rename ggml-cpu-aarch64.c to .cpp * reformat extra cpu backend. - clean Q4_0_N_M and IQ4_0_N_M - remove from "file" tensor type - allow only with dynamic repack - extract cpu extra bufts and convert to C++ - hbm - "aarch64" - more generic use of extra buffer - generalise extra_supports_op - new API for "cpu-accel": - amx - aarch64 * clang-format * Clean Q4_0_N_M ref Enable restrict on C++ * add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack * added/corrected control on tensor size for Q4 repacking. * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add debug logs on repacks. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-18 12:52:16 +02:00

11 Commits