Commit Graph

2952 Commits

SHA1 Message Date
95efcf011d CUDA: fix overflow in FA, tune performance (llama/14840) 2025-07-28 13:02:32 +03:00
8272aa9f14 CUDA: fix compilation with GGML_CUDA_F16 (llama/14837) 2025-07-28 13:02:32 +03:00
a65976fc3c CUDA: fix quantized KV cache + multiple sequences (llama/14822)
* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-07-28 13:02:32 +03:00
026d8a0c6e ggml: fix loongarch quantize_row_q8_1 error (llama/14827) 2025-07-28 13:02:32 +03:00
49d5540206 CANN: weight format to NZ for Ascend310P3 (llama/14407)
* weight format to nz for 310p

* remove quant weight format to nz

* clean code

* fix

* make the conditions for converting weights to NZ format consistent

* clean code
2025-07-28 13:02:32 +03:00
f8402d0a95 CUDA: add fused rms norm (llama/14800) 2025-07-28 13:02:32 +03:00
c91361379a vulkan: fix rms_norm_mul to handle broadcasting dim0 (llama/14817) 2025-07-28 13:02:32 +03:00
810018a63a cuda : implement bf16 cpy ops and enable bf16 cont (llama/14763)
* implement bf16 cpy ops and enable bf16 cont

* deduplicate copy functions

* deduplicate checks
2025-07-28 13:02:32 +03:00
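
The "deduplicate copy functions" item above amounts to templating the per-element copy over source and destination types instead of keeping one hand-written routine per type pair. A minimal sketch of that pattern, assuming a CUDA-style element copy (the name `cpy_elem` and the exact signature are illustrative, not the actual cpy.cu code):

```cpp
// Hypothetical sketch of deduplicated element copies: one template covers
// f32 <-> bf16 (and other pairs), converting through float.
#include <cuda_bf16.h>

template <typename src_t, typename dst_t>
__device__ __forceinline__ void cpy_elem(const char * cxi, char * cdsti) {
    const src_t * xi   = (const src_t *) cxi;
    dst_t       * dsti = (dst_t *) cdsti;
    *dsti = (dst_t) (float) *xi;   // convert via float
}

// e.g. cpy_elem<float, __nv_bfloat16> and cpy_elem<__nv_bfloat16, float>
// replace two near-identical hand-written functions.
```
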
de49384ab3 opencl: remove unreachable return (llama/14806) 2025-07-28 13:02:32 +03:00
9008410087 cuda: remove linking to cublasLt (llama/14790)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-28 13:02:32 +03:00
e81e17b048 opencl: fix im2col when KW!=KH (llama/14803) 2025-07-28 13:02:32 +03:00
a2a5612402 opencl: add conv2d kernel (llama/14403)
* add conv2d kernel

* fix trailing whitespace

* whitespace fix

* handle f16 input and f16 kernel, more opt

* resolve conflicts

* use enqueue_ndrange_kernel
2025-07-28 13:02:32 +03:00
52ad451c8a sycl: Fix im2col (llama/14797) 2025-07-28 13:02:32 +03:00
fc2ff438fd kleidiai: add support for get_rows (llama/14676)
* kleidiai: add support for get_rows

* apply fixes based on code review

* apply more fixes based on code review
2025-07-28 13:02:32 +03:00
e3f4162a06 vulkan/cuda: Fix im2col when KW!=KH (llama/14789)
The tid is decomposed into "ow + ky*OW + kx*OW*KH". Change "ksize" to match.
2025-07-28 13:02:32 +03:00
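
The decomposition quoted above can be written out explicitly. A small illustrative helper, assuming the layout described in the message (the struct and function names are made up; this is not the shader or kernel source):

```cpp
// Illustrative only: unpack a flat thread id laid out as
// tid = ow + ky*OW + kx*OW*KH, with ow in [0,OW), ky in [0,KH), kx in [0,KW).
#include <cstdint>

struct Im2colIdx { int64_t ow, ky, kx; };

inline Im2colIdx decompose_tid(int64_t tid, int64_t OW, int64_t KH) {
    return {
        tid % OW,             // ow
        (tid / OW) % KH,      // ky
        tid / (OW * KH),      // kx
    };
}

// The span covered this way is OW*KH*KW; deriving "ksize" with the wrong
// kernel dimension only goes unnoticed while KW == KH.
inline int64_t ksize(int64_t OW, int64_t KH, int64_t KW) {
    return OW * KH * KW;
}
```
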
92a9e85d8b ggml: adds CONV_2D op and direct GEMM Vulkan implementation (llama/14316)
* ggml/ggml-vulkan/test-backend-ops: adds CONV_2D for Vulkan

* ggml-vulkan: adds f32 scalar shader to compute 2D convolution directly
with GEMM (no need for im2col)

* test-backend-ops: adds test_case_ref to check the validity/performance of ops
against reference implementations having different graphs, adds tests

* Performance fixes: minimized branch divergence, uses collectives to
  eliminate redundant calculation, macros removed.

* Kernel shared memory size check

* Updates test-backend-ops to support graphs for performance
  measurement.

* Apple/Win32 compile errors fixed

* Subgroup size used to determine tile size -> fixes llvmpipe errors.

* Collectives disabled by default.

* Intel support is disabled as the performance is poor.

* Conv2d enabled for Intel with disabled collectives, disabled for Apple

* test-backend-ops modifications are reverted

* Trailing spaces and missing override fixed.

* Triggering pipeline relaunch.

* Code formatted with .clang-format.
2025-07-28 13:02:32 +03:00
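
The key idea of the commit above is to compute CONV_2D as a GEMM of A[OC x K] by an implicit B[K x OH*OW] with K = IC*KH*KW, gathering B's elements from the input on the fly instead of materializing an im2col matrix. A scalar C++ reference of that index mapping, assuming NCHW-style layouts (illustrative only, not the Vulkan shader):

```cpp
// Reference "direct GEMM" convolution: the inner loop over k plays the role
// of the GEMM reduction, and the B-matrix element for (k, o) is read
// straight from the padded input instead of an im2col buffer.
#include <vector>
#include <cstdint>

void conv2d_direct_gemm_f32(
        const std::vector<float> & src,    // [IC, IH, IW]
        const std::vector<float> & kernel, // [OC, IC, KH, KW]
        std::vector<float>       & dst,    // [OC, OH, OW]
        int IC, int IH, int IW, int OC, int KH, int KW,
        int stride, int pad) {
    const int OH = (IH + 2*pad - KH) / stride + 1;
    const int OW = (IW + 2*pad - KW) / stride + 1;
    const int K  = IC * KH * KW;
    dst.assign((size_t) OC * OH * OW, 0.0f);

    for (int oc = 0; oc < OC; ++oc) {              // GEMM row
        for (int o = 0; o < OH * OW; ++o) {        // GEMM column
            const int oy = o / OW, ox = o % OW;
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {          // GEMM reduction
                const int ic = k / (KH * KW);
                const int ky = (k / KW) % KH;
                const int kx = k % KW;
                const int iy = oy * stride + ky - pad;
                const int ix = ox * stride + kx - pad;
                if (iy < 0 || iy >= IH || ix < 0 || ix >= IW) continue;
                acc += kernel[((size_t) oc * IC + ic) * KH * KW + ky * KW + kx]
                     * src[((size_t) ic * IH + iy) * IW + ix];
            }
            dst[((size_t) oc * OH + oy) * OW + ox] = acc;
        }
    }
}
```
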
50f983a17e vulkan: Add logging for bf16 features to ggml_vk_print_gpu_info (#13274) (llama/14707) 2025-07-28 13:02:32 +03:00
b06f314667 Vulkan: Fix fprintf format-security warning (llama/14770) 2025-07-28 13:02:32 +03:00
5c3b794c51 cmake : fix usage issues (ggml/1257)
* CMake config: Create target only once

Fix error on repeated find_package(ggml).
For simplicity, check only for the top-level ggml::ggml.

* CMake config: Add CUDA link libs

* CMake config: Add OpenCL link libs

* CMake config: Use canonical find_dependency

Use set and append to control link lib variables.
Apply more $<LINK_ONLY...>.

* CMake config: Wire OpenMP dependency
2025-07-28 13:02:32 +03:00
e238dc1bdd ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
This commit removes the inclusion of `<cstdlib>`.

The motivation for this change is that this source file does not seem to
use any functions from this header and the comment about `qsort` is a
little misleading/confusing.
2025-07-28 13:02:32 +03:00
e7bf0294ec Support static xcframework packaging in build-xcframework.sh (#3322)
* This commit allows for the building of a static xcframework by adding a
BUILD_STATIC_XCFRAMEWORK option. When enabled, the build-xcframework.sh
script builds a self-contained static whisper.xcframework.

The motivation for this change is so that command line binaries can link
whisper.cpp without forcing users to install the whisper.xcframework
separately.

* Update build-xcframework.sh

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* Address reviewer feedback: remove extra indentation around static xcframework creation.

* squash! Address reviewer feedback: remove extra indentation around static xcframework creation.

Fix whitespaces.

---------

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2025-07-26 12:25:44 +02:00
7de8dd783f examples : add note about WHISPER_WASM_SINGLE_FILE [no ci] (#3332)
This commit adds a note to the README files of the WASM examples
about the `WHISPER_WASM_SINGLE_FILE` option.

The motivation for this is that currently this option is not documented
and might be surprising to users who expect a separate .wasm file to be
generated.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3290
2025-07-24 16:06:48 +02:00
85e474fd55 ci : add paths to build.yml (#3333)
This commit adds specific paths to the GitHub Actions workflow file
`.github/workflows/build.yml`.

The motivation for this is to avoid unnecessary builds when unrelated files
are changed, which can save resources and time during the CI process.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3285
2025-07-24 16:04:21 +02:00
210bbbe4d5 musa: upgrade musa sdk to rc4.2.0 (#3324)
* musa: upgrade musa sdk to 4.2.0

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: restore rc in docker image tag

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-24 13:19:57 +03:00
1f5cf0b288 server : hide language probabilities option behind flag (#3328)
* examples/server: hide language probabilities option behind flag

* code review

* fix
2025-07-21 13:03:54 +02:00
2e6be2f380 go: fix Mac OS X builds (#3310)
This commit fixes the Go bindings build for Mac OS X (15.1), which is currently failing.

Co-authored-by: Chaitanya Bayapuneni <bvk@mini.cinnamon-interval.ts.net>
2025-07-21 08:47:35 +02:00
c0dc391349 sync : ggml
ggml-ci
2025-07-20 00:23:50 +03:00
0ed687c6f1 metal : fuse add, mul + add tests (llama/14596)
ggml-ci
2025-07-20 00:23:50 +03:00
d4a7ea1634 cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (llama/14741)
* Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs

Gemma3n uses matrix-matrix addition as part of its input processing,
wrongly triggering CUDA_GRAPH disablement on NVGPUs even when a batch
size of 1 is used.

* Exclude `project_per_layer_input` by matching node names

This ensures that all other graphs which don't exhibit this pattern do
not have their behavior changed.

* Revert unnecessary formatting changes
2025-07-20 00:23:50 +03:00
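
A simplified sketch of the check described above, assuming the public ggml graph accessors; the function name and exact heuristic are illustrative and this is not the actual ggml-cuda code:

```cpp
// Illustration: CUDA graph use is normally abandoned when a matrix-matrix
// ADD suggests a batch size > 1; Gemma3n's per-layer input projection ADD
// is excluded by node name so it no longer forces the fallback at batch
// size 1.
#include <cstring>
#include "ggml.h"

static bool looks_like_batched_decode(struct ggml_cgraph * cgraph) {
    for (int i = 0; i < ggml_graph_n_nodes(cgraph); ++i) {
        const struct ggml_tensor * node = ggml_graph_node(cgraph, i);
        if (node->op == GGML_OP_ADD &&
            node->src[1] != NULL && node->src[1]->ne[1] > 1) {
            // the exclusion added by this commit: match the node name
            if (strstr(node->name, "project_per_layer_input") != NULL) {
                continue;
            }
            return true; // disable CUDA graphs, fall back to normal launches
        }
    }
    return false;
}
```
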
9a07cb064a CUDA: set_rows + cpy.cu refactor (llama/14712) 2025-07-20 00:23:50 +03:00
fed20b0682 use max work group size for device to replace the magic number (llama/14732) 2025-07-20 00:23:50 +03:00
17c5411195 ggml: Add initial WebGPU backend (llama/14521)
* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults

* Initialize webgpu device

* Making progress on setting up the backend

* Finish more boilerplate/utility functions

* Organize file and work on alloc buffer

* Add webgpu_context to prepare for actually running some shaders

* Work on memset and add shader loading

* Work on memset polyfill

* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it

* Implement get_tensor and buffer_clear

* Finish rest of setup

* Start work on compute graph

* Basic mat mul working

* Work on emscripten build

* Basic WebGPU backend instructions

* Use EMSCRIPTEN flag

* Work on passing ci, implement 4d tensor multiplication

* Pass thread safety test

* Implement permuting for mul_mat and cpy

* minor cleanups

* Address feedback

* Remove division by type size in cpy op

* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends

* Fix name

* Fix macos dawn prefix path
2025-07-20 00:23:50 +03:00
ae1bb2c8ea llama : add high-throughput mode (llama/14363)
* kv-cache : prepare K/V buffers for separation

ggml-ci

* batched-bench : fix oob write

ggml-ci

* llama : add "virtual sequences"

ggml-ci

* llama : use "stream" vs "virtual sequence"

ggml-ci

* graph : fix stream splitting when KV cache is not used

ggml-ci

* kv-cache : add multi-stream save/load support

ggml-ci

* llama : add "--attn-streams" flag

ggml-ci

* kv-cache : fix handling when find_slot fails

ggml-ci

* kv-cache : restore find_slot impl

ggml-ci

* kv-cache : add comments

* kv-cache : add bounds checks for sequence id

ggml-ci

* cont : add n_seq_max to batch allocr

ggml-ci

* kv-cache : perform stream copies lazily after llama_synchronize

ggml-ci

* kv-cache : avoid throwing exceptions across the C boundary

ggml-ci

* CUDA: 4D FlashAttention support (llama/14628)

* CUDA: 4D FlashAttention support

* CUDA: fix WMMA FA kernel

* llama : rename attn_streams -> kv_unified

ggml-ci

* common : rename kv_split -> kv_unified

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-07-20 00:23:50 +03:00
9cc645fec0 ggml : add asserts (llama/14720)
* ggml : add asserts

ggml-ci

* cont : fix constant type

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-07-20 00:23:50 +03:00
8d1a0485f1 vulkan: fix noncontig check for mat_mul_id splitting (llama/14683)
* vulkan: fix noncontig check for mat_mul_id splitting

Remove supports_op check for > 4096 (splitting fixes this)

* vulkan: fix batched matmul dequant for Q*_K
2025-07-20 00:23:50 +03:00
b33841c453 vulkan: add RTE variants for glu/add/sub/mul/div (llama/14653) 2025-07-20 00:23:50 +03:00
ab79c6c118 cuda: fix build warnings in set-rows.cu (unused variable) (llama/14687)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-20 00:23:50 +03:00
a6b9271c2c sycl: Hotfix for non dnnl codepath (llama/14677) 2025-07-20 00:23:50 +03:00
ded2e3cf6d ggml : refactor llamafile_sgemm PPC code (llama/14673)
Remove unnecessary templates from the class definition and packing functions
Reduce deeply nested conditionals and if-else switching in the mnpack function
Replace repetitive code with inline functions in the packing functions

2 ~ 7% improvement in Q8 Model
15 ~ 50% improvement in Q4 Model

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
2025-07-20 00:23:50 +03:00
ebb0e9d0ed SYCL: use 1D kernel for set_rows (llama/14618)
* SYCL: Use 1D kernel for set_rows

* Remove dangling comment

* Refactor and use ceil_div
2025-07-20 00:23:50 +03:00
24803d62c6 sycl: Batched mulmat rework for oneDNN dispatch (llama/14617) 2025-07-20 00:23:50 +03:00
0611387d17 cuda : add set rows for bf16 (llama/14664) 2025-07-20 00:23:50 +03:00
fe33572b22 cuda : add ELU support (llama/14657) 2025-07-20 00:23:50 +03:00
21308b4e6e ggml : add build-time message to remind about ggml_set_rows (llama/14661)
ggml-ci
2025-07-20 00:23:50 +03:00
3cad26d807 metal : Add missing unary ops Metal support (llama/14660) 2025-07-20 00:23:50 +03:00
66b3a39bdc CUDA: add set rows for f32 and f16 (llama/14551)
* CUDA: add set rows for f32 and f16

* Review: change kernel params, use strides from host

* Use 1-d kernel

* Review: use int64_t for blockDim.x, rename nb->s for clarity
2025-07-20 00:23:50 +03:00
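
A stripped-down sketch of what those review points describe: a 1-D launch where each thread copies one element, the destination row comes from an index tensor, and row strides are computed on the host. Kernel and helper names are hypothetical; this is not the ggml-cuda kernel:

```cpp
// Hypothetical sketch: f32 source rows scattered into an f16 destination,
// rows selected by an i64 index tensor, strides (in elements) passed from
// the host, blockDim.x widened to int64_t before the multiply.
#include <cuda_fp16.h>
#include <cstdint>

__global__ void set_rows_f32_f16(const float * src, const int64_t * rows,
                                 half * dst, int64_t ne0, int64_t nrows,
                                 int64_t s_src, int64_t s_dst) {
    const int64_t i = (int64_t) blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= ne0 * nrows) {
        return;
    }
    const int64_t r = i / ne0;          // source row
    const int64_t c = i % ne0;          // column within the row
    dst[rows[r] * s_dst + c] = __float2half(src[r * s_src + c]);
}

static void set_rows_cuda(const float * src, const int64_t * rows, half * dst,
                          int64_t ne0, int64_t nrows, int64_t s_src,
                          int64_t s_dst, cudaStream_t stream) {
    const int64_t n      = ne0 * nrows;
    const int     block  = 256;
    const int64_t blocks = (n + block - 1) / block;   // ceil_div
    set_rows_f32_f16<<<(unsigned) blocks, block, 0, stream>>>(
            src, rows, dst, ne0, nrows, s_src, s_dst);
}
```
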
032697b9a8 whisper: validate get_rows support for cpu extra buffer (#3323) 2025-07-14 15:13:44 +03:00
a16da91365 examples : update links in wasm examples (#3318)
* fix 404 link

* update link in whisper.wasm example

* update example in command.wasm

* update link in bench.wasm example

* update link in stream.wasm example
2025-07-12 23:22:35 +02:00
3775c503d5 sync : resolve conflicts (#0)
ggml-ci
2025-07-12 19:23:56 +03:00
6ddff4d96a talk-llama : sync llama.cpp
ggml-ci
2025-07-12 19:23:56 +03:00