Commit Graph

2934 Commits

5c3b794c51 cmake : fix usage issues (ggml/1257)
* CMake config: Create target only once

Fix error on repeated find_package(ggml).
For simplicity, check only for the top-level ggml::ggml.

* CMake config: Add CUDA link libs

* CMake config: Add OpenCL link libs

* CMake config: Use canonical find_dependency

Use set and append to control link lib variables.
Apply more $<LINK_ONLY...>.

* CMake config: Wire OpenMP dependency
2025-07-28 13:02:32 +03:00
e238dc1bdd ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
This commit removes the inclusion of `<cstdlib>`.

The motivation for this change is that this source file does not seem to
use any functions from this header and the comment about `qsort` is a
little misleading/confusing.
2025-07-28 13:02:32 +03:00
e7bf0294ec Support static xcframework packaging in build-xcframework.sh (#3322)
* This commit allows building a static xcframework by adding a
BUILD_STATIC_XCFRAMEWORK option. When enabled, the build-xcframework.sh
script builds a self-contained static whisper.xcframework.

The motivation for this change is to let command-line binaries link
whisper.cpp without forcing users to install the whisper.xcframework
separately.

* Update build-xcframework.sh

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* Address reviewer feedback: remove extra indentation around static xcframework creation.

* squash! Address reviewer feedback: remove extra indentation around static xcframework creation.

Fix whitespaces.

---------

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2025-07-26 12:25:44 +02:00
7de8dd783f examples : add note about WHISPER_WASM_SINGLE_FILE [no ci] (#3332)
This commit adds a note to the README files of the WASM examples
about the `WHISPER_WASM_SINGLE_FILE` option.

The motivation for this is that currently this option is not documented
and might be surprising to users who expect a separate .wasm file to be
generated.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3290
2025-07-24 16:06:48 +02:00
85e474fd55 ci : add paths to build.yml (#3333)
This commit adds specific paths to the GitHub Actions workflow file
`.github/workflows/build.yml`.

The motivation for this is to avoid unnecessary builds when unrelated files
are changed, which saves resources and time during the CI process.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3285
2025-07-24 16:04:21 +02:00
210bbbe4d5 musa: upgrade musa sdk to rc4.2.0 (#3324)
* musa: upgrade musa sdk to 4.2.0

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: restore rc in docker image tag

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-24 13:19:57 +03:00
1f5cf0b288 server : hide language probabilities option behind flag (#3328)
* examples/server: hide language probabilities option behind flag

* code review

* fix
2025-07-21 13:03:54 +02:00
2e6be2f380 go: fix Mac OS X builds (#3310)
This commit fixes the Go bindings build for Mac OS X (15.1), which is currently failing.

Co-authored-by: Chaitanya Bayapuneni <bvk@mini.cinnamon-interval.ts.net>
2025-07-21 08:47:35 +02:00
c0dc391349 sync : ggml
ggml-ci
2025-07-20 00:23:50 +03:00
0ed687c6f1 metal : fuse add, mul + add tests (llama/14596)
ggml-ci
2025-07-20 00:23:50 +03:00
d4a7ea1634 cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (llama/14741)
* Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs

Gemma3n uses matrix-matrix addition as part of its input processing,
wrongly triggering CUDA_GRAPH disablement on NVGPUs even when a batch size
of 1 is used.

* Exclude `project_per_layer_input` by matching node names

This ensures that all other graphs which don't exhibit this pattern do
not have their behavior changed.

* Revert unnecessary formatting changes
2025-07-20 00:23:50 +03:00
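A minimal C++ sketch of the name-based exclusion described in the commit above: a batched matrix-matrix ADD would normally disable CUDA graph capture, but the known Gemma3n node is recognized by its name and exempted. The struct, field, and function names here are illustrative only and do not reflect the actual ggml-cuda sources.

```cpp
#include <cstring>

// Illustrative stand-in for a graph node; the real ggml_tensor has more fields.
struct node_t {
    const char * name; // e.g. "project_per_layer_input" for the Gemma3n add node
    int          op;   // operation id
    long         ne1;  // second dimension (batch-like axis)
};

constexpr int OP_ADD = 1; // hypothetical op id for the add operation

// Sketch of the heuristic: a matrix-matrix ADD with ne1 > 1 normally suggests
// batched decoding and disables CUDA graph capture, but the known Gemma3n
// per-layer-input node is excluded by matching its name.
bool disables_cuda_graph(const node_t & node) {
    const bool batched_add = node.op == OP_ADD && node.ne1 > 1;
    const bool is_gemma3n_per_layer_input =
        std::strstr(node.name, "project_per_layer_input") != nullptr;
    return batched_add && !is_gemma3n_per_layer_input;
}
```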
9a07cb064a CUDA: set_rows + cpy.cu refactor (llama/14712) 2025-07-20 00:23:50 +03:00
fed20b0682 use max work group size for device to replace the magic number (llama/14732) 2025-07-20 00:23:50 +03:00
17c5411195 ggml: Add initial WebGPU backend (llama/14521)
* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults

* Initialize webgpu device

* Making progress on setting up the backend

* Finish more boilerplate/utility functions

* Organize file and work on alloc buffer

* Add webgpu_context to prepare for actually running some shaders

* Work on memset and add shader loading

* Work on memset polyfill

* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it

* Implement get_tensor and buffer_clear

* Finish rest of setup

* Start work on compute graph

* Basic mat mul working

* Work on emscripten build

* Basic WebGPU backend instructions

* Use EMSCRIPTEN flag

* Work on passing ci, implement 4d tensor multiplication

* Pass thread safety test

* Implement permuting for mul_mat and cpy

* minor cleanups

* Address feedback

* Remove division by type size in cpy op

* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends

* Fix name

* Fix macos dawn prefix path
2025-07-20 00:23:50 +03:00
ae1bb2c8ea llama : add high-throughput mode (llama/14363)
* kv-cache : prepare K/V buffers for separation

ggml-ci

* batched-bench : fix oob write

ggml-ci

* llama : add "virtual sequences"

ggml-ci

* llama : use "stream" vs "virtual sequence"

ggml-ci

* graph : fix stream splitting when KV cache is not used

ggml-ci

* kv-cache : add multi-stream save/load support

ggml-ci

* llama : add "--attn-streams" flag

ggml-ci

* kv-cache : fix handling when find_slot fails

ggml-ci

* kv-cache : restore find_slot impl

ggml-ci

* kv-cache : add comments

* kv-cache : add bounds checks for sequence id

ggml-ci

* cont : add n_seq_max to batch allocr

ggml-ci

* kv-cache : perform stream copies lazily after llama_synchronize

ggml-ci

* kv-cache : avoid throwing exceptions across the C boundary

ggml-ci

* CUDA: 4D FlashAttention support (llama/14628)

* CUDA: 4D FlashAttention support

* CUDA: fix WMMA FA kernel

* llama : rename attn_streams -> kv_unified

ggml-ci

* common : rename kv_split -> kv_unified

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-07-20 00:23:50 +03:00
9cc645fec0 ggml : add asserts (llama/14720)
* ggml : add asserts

ggml-ci

* cont : fix constant type

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-07-20 00:23:50 +03:00
8d1a0485f1 vulkan: fix noncontig check for mat_mul_id splitting (llama/14683)
* vulkan: fix noncontig check for mat_mul_id splitting

Remove supports_op check for > 4096 (splitting fixes this)

* vulkan: fix batched matmul dequant for Q*_K
2025-07-20 00:23:50 +03:00
b33841c453 vulkan: add RTE variants for glu/add/sub/mul/div (llama/14653) 2025-07-20 00:23:50 +03:00
ab79c6c118 cuda: fix build warnings in set-rows.cu (unused variable) (llama/14687)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-20 00:23:50 +03:00
a6b9271c2c sycl: Hotfix for non dnnl codepath (llama/14677) 2025-07-20 00:23:50 +03:00
ded2e3cf6d ggml : refactor llamafile_sgemm PPC code (llama/14673)
Remove unnecessary templates from the class definition and packing functions
Reduce deeply nested conditionals and if-else switching in the mnpack function
Replace repetitive code with inline functions in the packing functions

2-7% improvement in Q8 models
15-50% improvement in Q4 models

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
2025-07-20 00:23:50 +03:00
ebb0e9d0ed SYCL: use 1D kernel for set_rows (llama/14618)
* SYCL: Use 1D kernel for set_rows

* Remove dangling comment

* Refactor and use ceil_div
2025-07-20 00:23:50 +03:00
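A small C++ sketch of the sizing logic behind a 1D set_rows launch, assuming a `ceil_div` helper like the one mentioned in the commit above; the names and the example work-group size are illustrative, not the actual SYCL code.

```cpp
#include <cstddef>

// Round-up integer division, as used to size a 1D kernel launch.
constexpr size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

// Sketch: flatten an (n_rows x n_cols) copy into one 1D range whose size is a
// multiple of the work-group size, so each work-item handles one element.
size_t global_range_1d(size_t n_rows, size_t n_cols, size_t wg_size /* e.g. 256 */) {
    const size_t n_elems = n_rows * n_cols;
    return ceil_div(n_elems, wg_size) * wg_size; // padded to whole work-groups
}
```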
24803d62c6 sycl: Batched mulmat rework for oneDNN dispatch (llama/14617) 2025-07-20 00:23:50 +03:00
0611387d17 cuda : add set rows for bf16 (llama/14664) 2025-07-20 00:23:50 +03:00
fe33572b22 cuda : add ELU support (llama/14657) 2025-07-20 00:23:50 +03:00
21308b4e6e ggml : add build-time message to remind about ggml_set_rows (llama/14661)
ggml-ci
2025-07-20 00:23:50 +03:00
3cad26d807 metal : Add missing unary ops Metal support (llama/14660) 2025-07-20 00:23:50 +03:00
66b3a39bdc CUDA: add set rows for f32 and f16 (llama/14551)
* CUDA: add set rows for f32 and f16

* Review: change kernel params, use strides from host

* Use 1-d kernel

* Review: use int64_t for blockDim.x, rename nb->s for clarity
2025-07-20 00:23:50 +03:00
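For reference, set_rows scatters source rows into destination rows selected by an index tensor. Below is a CPU sketch of that semantics with host-provided row strides, mirroring the "use strides from host" review note; the names and layout are illustrative, not the actual CUDA kernel.

```cpp
#include <cstdint>
#include <cstring>

// Reference (CPU) sketch of set_rows: for each source row i, copy it into
// destination row row_ids[i]. The CUDA kernel performs the same scatter on the
// GPU, with the strides computed on the host and passed as kernel parameters.
void set_rows_f32(const float * src, float * dst,
                  const int64_t * row_ids, int64_t n_rows, int64_t n_cols,
                  size_t src_row_stride, size_t dst_row_stride) { // strides in elements
    for (int64_t i = 0; i < n_rows; ++i) {
        const float * s = src + i * src_row_stride;
        float       * d = dst + row_ids[i] * dst_row_stride;
        std::memcpy(d, s, n_cols * sizeof(float));
    }
}
```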
032697b9a8 whisper: validate get_rows support for cpu extra buffer (#3323) 2025-07-14 15:13:44 +03:00
a16da91365 examples : update links in wasm examples (#3318)
* fix 404 link

* update link in whisper.wasm example

* update example in command.wasm

* update link in bench.wasm example

* update link in stream.wasm example
2025-07-12 23:22:35 +02:00
3775c503d5 sync : resolve conflicts (#0)
ggml-ci
2025-07-12 19:23:56 +03:00
6ddff4d96a talk-llama : sync llama.cpp
ggml-ci
2025-07-12 19:23:56 +03:00
6d64e4abf3 sync : ggml 2025-07-12 19:23:56 +03:00
85dcc74b88 sync : resolve conflicts (ggml/0)
ggml-ci
2025-07-12 19:23:56 +03:00
915fc153a5 vulkan: support SET_ROWS (llama/14587)
* vulkan: support SET_ROWS

Add variants of the copy_to_quant shader that do the SET_ROWS operation.
Change these shaders to spread the work across the workgroup.
The memory access pattern is probably not great (one thread per quant block),
but should be fine for now.

* vulkan: optimize set_rows

Larger workgroups for non-quant types.
Set "norepeat" (there is manual repeat logic).
Use fastmod.
2025-07-12 19:23:56 +03:00
8670a3fd5d vulkan: optimizations for deepseek prompt processing (llama/14555)
* vulkan: allow unclamped loads in coopmat2 mul_mat_id shader

* vulkan: increase coopmat2 mul_mat_id tile size

* vulkan: optimize mat_mul_id row_ids search to batch loads, and port to coopmat1 path

* vulkan: use smaller FA row size when head size is large. Applies to both scalar and CM2 paths (CM1 isn't used due to shared memory limits)
2025-07-12 19:23:56 +03:00
74f6d47904 model : support LiquidAI LFM2 hybrid family (llama/14620)
**Important**
LFM2 was [merged](https://github.com/huggingface/transformers/pull/39340) into transformers, but has not yet been released.
To convert to GGUF, install transformers from source:
```shell
pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
```
2025-07-12 19:23:56 +03:00
a4ff4ec9cb HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (llama/14634) 2025-07-12 19:23:56 +03:00
b0754136be opencl: add tiled mul_mat_f16_f32 (llama/14535)
* add tiled mul_mat_f16_f32

* fix trailing whitespace

* add insightful comments
2025-07-12 19:23:56 +03:00
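A scalar C++ sketch of the tiling idea behind the new kernel: the output is processed in tiles so the corresponding A/B sub-blocks are reused while they are hot. The real OpenCL kernel works on f16 A and f32 B with work-group-sized tiles held in local memory; everything below is illustrative.

```cpp
#include <algorithm>
#include <cstddef>

// Scalar reference of a tiled matrix multiply C = A * B (row-major).
// Working on one TILE x TILE block at a time keeps the A/B sub-blocks hot in
// cache (on the GPU: in local/shared memory).
void mul_mat_tiled(const float * A, const float * B, float * C,
                   int M, int N, int K, int TILE = 32) {
    std::fill(C, C + (size_t)M * N, 0.0f);
    for (int i0 = 0; i0 < M; i0 += TILE) {
        for (int j0 = 0; j0 < N; j0 += TILE) {
            for (int k0 = 0; k0 < K; k0 += TILE) {
                for (int i = i0; i < std::min(i0 + TILE, M); ++i) {
                    for (int k = k0; k < std::min(k0 + TILE, K); ++k) {
                        const float a = A[(size_t)i * K + k];
                        for (int j = j0; j < std::min(j0 + TILE, N); ++j) {
                            C[(size_t)i * N + j] += a * B[(size_t)k * N + j];
                        }
                    }
                }
            }
        }
    }
}
```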
6f113cbcaa opencl: add set_rows for f16 and f32 (llama/14547)
* opencl: add `set_rows` for `f16` and `f32`

* opencl: choose a better workgroup size for `set_rows`
2025-07-12 19:23:56 +03:00
3c21cde540 SYCL: Initial set_rows kernel implementation (llama/14562)
* SYCL: Initial set_rows kernel implementation

* Revert max_threads to 256

* Refactor set_rows and address review comments

* Deduplicate conversion function

* Remove guard before kernel launch and refactor

* Fix and add back SFINAE
2025-07-12 19:23:56 +03:00
fb885fa48b cuda : support Falcon-H1 state size for SSM_SCAN (llama/14602) 2025-07-12 19:23:56 +03:00
2021870fb8 ggml : add ggml_scale_bias (llama/14417)
* ggml : add ggml_scale_bias

* ggml_vec_mad1_f32

* add more simd

* add CUDA

* sycl

* vulkan

* cann (placeholder)

* opencl

* will this fix cpu?

* fix cuda

* suggestions from coderabbit

* fix cann compile error

* vDSP_vsmsa

* rm __ARM_FEATURE_SVE

* use memcpy for op params

* make code look more consistent

* use scalar for __ARM_FEATURE_SVE

* add x param to ggml_vec_mad1_f32
2025-07-12 19:23:56 +03:00
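A scalar reference of what the new op computes, assuming `ggml_scale_bias` applies `y = x*s + b` elementwise for scalar `s` and `b`, as the `ggml_vec_mad1_f32` helper and the vDSP_vsmsa mapping suggest; the exact ggml signatures may differ from this sketch.

```cpp
#include <cstddef>

// Scalar sketch of the scale-bias op: y[i] = x[i]*s + b for scalar s, b.
// The ggml version adds SIMD paths (and vDSP_vsmsa on Apple) plus
// CUDA/SYCL/Vulkan/OpenCL/CANN kernels for the same computation.
static void vec_mad1_f32_ref(size_t n, float * y, const float * x, float s, float b) {
    for (size_t i = 0; i < n; ++i) {
        y[i] = x[i] * s + b;
    }
}
```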
48b18f9eb8 ggml : prevent integer overflow in gguf tensor size calculation (llama/14595) 2025-07-12 19:23:56 +03:00
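One common way to guard such a size computation, shown as a hedged C++ sketch: check the multiplication before performing it so a corrupt or malicious GGUF header cannot wrap the computed tensor size around to a small number. The actual ggml code may structure the check differently (and, being C, would report an error rather than throw).

```cpp
#include <cstdint>
#include <stdexcept>

// Multiply two size components with an overflow check, so the product cannot
// silently wrap around in uint64_t arithmetic.
static uint64_t checked_mul_u64(uint64_t a, uint64_t b) {
    if (a != 0 && b > UINT64_MAX / a) {
        throw std::overflow_error("tensor size calculation overflows uint64_t");
    }
    return a * b;
}
```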
fadb3233b6 vulkan: optimize flash attention split_k_reduce (llama/14554)
* vulkan: allow FA split_k with smaller KV values

* vulkan: spread split_k_reduce work across more threads

k_num can get rather large. Use the whole workgroup to reduce the M/L values.

Launch a thread for each element in the HSV dimension of the output. Helps a
lot for large HSV (like deepseek).
2025-07-12 19:23:56 +03:00
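For context, split_k flash attention divides the KV range into k_num chunks; each chunk produces an unnormalized partial output plus a running max (M) and sum (L), and the split_k_reduce step merges them. Below is a scalar C++ sketch of that merge for one output row, with the workgroup-level parallelization from the commit omitted and all names illustrative.

```cpp
#include <cmath>
#include <cstddef>

// Combine k_num partial flash-attention results for one output row.
// Each split k holds: O_k (unnormalized accumulated V rows, length hsv),
// M_k (running max of the attention scores) and L_k (sum of exp(score - M_k)).
void fa_split_k_reduce(const float * O,   // k_num x hsv partial outputs
                       const float * M,   // k_num running maxima
                       const float * L,   // k_num running sums
                       float * out, size_t k_num, size_t hsv) {
    float m = -INFINITY;
    for (size_t k = 0; k < k_num; ++k) m = std::fmax(m, M[k]);

    float l = 0.0f;
    for (size_t j = 0; j < hsv; ++j) out[j] = 0.0f;

    for (size_t k = 0; k < k_num; ++k) {
        const float scale = std::exp(M[k] - m); // rescale each split to the global max
        l += scale * L[k];
        for (size_t j = 0; j < hsv; ++j) out[j] += scale * O[k * hsv + j];
    }
    for (size_t j = 0; j < hsv; ++j) out[j] /= l; // final softmax normalization
}
```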
9750e4c988 vulkan : fix rope with partial rotation and non-cont src (llama/14582) 2025-07-12 19:23:56 +03:00
c3942b3db6 cuda : fix rope with partial rotation and non-cont src (llama/14580)
* cuda : fix rope non-cont

ggml-ci

* cont : fix multi-rope + add test

ggml-ci

* sycl : try fix

ggml-ci

* cont : fix sycl + clean-up cuda

ggml-ci
2025-07-12 19:23:56 +03:00
98e7beac6c CUDA: add bilinear interpolation for upscale (llama/14563) 2025-07-12 19:23:56 +03:00
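A plain C++ reference of bilinear upscaling for one channel: each output pixel blends the four nearest source pixels, and the CUDA kernel computes the same per-pixel arithmetic with one thread per output element. The coordinate convention and names here are illustrative and may differ from the ggml upscale op.

```cpp
#include <algorithm>
#include <cmath>

// Bilinear upscale of a single-channel W x H plane to W2 x H2 (row-major).
void upscale_bilinear(const float * src, int W, int H,
                      float * dst, int W2, int H2) {
    const float sx = (float)W / W2;
    const float sy = (float)H / H2;
    for (int y = 0; y < H2; ++y) {
        for (int x = 0; x < W2; ++x) {
            const float fx = (x + 0.5f) * sx - 0.5f;   // map to source coordinates
            const float fy = (y + 0.5f) * sy - 0.5f;
            const int x0 = std::clamp((int)std::floor(fx), 0, W - 1);
            const int y0 = std::clamp((int)std::floor(fy), 0, H - 1);
            const int x1 = std::min(x0 + 1, W - 1);
            const int y1 = std::min(y0 + 1, H - 1);
            const float dx = std::clamp(fx - x0, 0.0f, 1.0f);
            const float dy = std::clamp(fy - y0, 0.0f, 1.0f);
            const float top = src[y0 * W + x0] * (1 - dx) + src[y0 * W + x1] * dx;
            const float bot = src[y1 * W + x0] * (1 - dx) + src[y1 * W + x1] * dx;
            dst[y * W2 + x] = top * (1 - dy) + bot * dy;
        }
    }
}
```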
7e9c6bbab2 musa: fix build warnings (unused variable) (llama/14561)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-12 19:23:56 +03:00
8e545f466c CUDA: add bf16 and i32 to getrows (llama/14529) 2025-07-12 19:23:56 +03:00