whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-07-12 02:35:19 +02:00

Author	SHA1	Message	Date
Masato Nakasaka	f0f5a9f7fb	vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (llama/14001) * allowing B580 and U9-288V * experimenting code to detect Xe2 * allowing coopmat only for Xe2 GPUs * fixed comment wording * fixed comment wording * removed unnecessary driver check	2025-06-10 12:40:33 +03:00
Diego Devesa	13a03c5d33	llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (llama/14013)	2025-06-10 12:40:33 +03:00
Jeff Bolz	6dd91d4f7e	vulkan: automatically deduce size of push constants (llama/13936)	2025-06-10 12:40:33 +03:00
Ervin Áron Tasnádi	5171b24f70	ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (llama/13813) * * ggml-vulkan: adds op CONV_TRANSPOSE_1D * test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D * Missing barrier added to shader. Number of additional tests reduced to 108. * * Fixes typo in variable name. * Removes extra whitespaces. * Adds int64->int32 casts to prevent possible warnings. * Problem size reduced in tests to pass tests with llvmpipe. * supports_op condition moved from unintended position	2025-06-10 12:40:33 +03:00
Diego Devesa	23e2fe0682	releases : use dl backend for linux release, remove arm64 linux release (llama/13996)	2025-06-10 12:40:33 +03:00
Johannes Gäßler	7f4d110f53	CUDA: fix FTZ in FA for Gemma 3 (llama/13991)	2025-06-10 12:40:33 +03:00
Jeff Bolz	ee0ef39fee	vulkan: fix warnings in perf logger querypool code (llama/13937)	2025-06-10 12:40:33 +03:00
lhez	62791ba2e6	opencl: add `backend_synchronize` (llama/13939) * This is not needed by the normal use where the result is read using `tensor_get`, but it allows perf mode of `test-backend-ops` to properly measure performance.	2025-06-10 12:40:33 +03:00
rmatif	e16ef08884	OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (llama/13840) * add concat, pad, repeat, tsembd, tanh, upscale * small fixes	2025-06-10 12:40:33 +03:00
Georgi Gerganov	c72d3ce935	metal : use F32 accumulators in FA kernels (llama/13975) ggml-ci	2025-06-10 12:40:33 +03:00
shalinib-ibm	126aeb4a49	cmake : Handle mixed-case 'Power' strings in POWER CPU detection (llama/13966) Some systems report the CPU implementation as "Power11" instead of "POWER11". The existing CMake logic uses a case-sensitive regular expression to extract the CPU generation, which fails when the casing doesn't exactly match "POWER". This patch provides a fix by first converting the string to uppercase before applying the regex. Signed-off-by: root <root@rheldb2v.pperf.tadn.ibm.com> Co-authored-by: root <root@rheldb2v.pperf.tadn.ibm.com>	2025-06-10 12:40:33 +03:00
Atharva Dubey	ef2a79d2b8	sycl: quantize and reorder the input to q8_1 when reorder is enabled (llama/13826) * [WIP]: fuse q8 quantization and reorder * wip2: fuse q8 quantization and reorder * working q8 reorder commit * restored common.hpp * remove debug prints * remove unnecessary headers and remove trailing whitespace * Update ggml/src/ggml-sycl/ggml-sycl.cpp Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com> --------- Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>	2025-06-10 12:40:33 +03:00
Johannes Gäßler	9589645e72	gguf: fix failure on version == 0 (llama/13956)	2025-06-10 12:40:33 +03:00
Aaron Teo	20f913d119	ggml: check if non-native endian model is being loaded (llama/13943) * gguf: prevent non-native endian models from being loaded Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * gguf: update error message Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * gguf: make the non-native endian check more verbose Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: move ggml_assert location Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: reword the endianness check error message Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-10 12:40:33 +03:00
Kai Pastor	b933d17c30	Add in-build ggml::ggml ALIAS library (ggml/1260) Enable uniform linking with subproject and with find_package.	2025-06-10 12:40:33 +03:00
KITAITI Makoto	fbead67549	ruby : output format (#3237 ) * Fix a typo * Don't allocate output string unless needed * Add methods to output SRT and WebVTT * Add tests for output methods * Make constants for output private * Add signatures for output methods * Add document on output methods * Fix method name: Segment#speaker_next_turn? -> #speacker_turn_next? * Add Whisper::Segment#descotruct_keys * Add test for Whisper::Context#descotruct_keys * Add signature of Whisper::Segment#deconstruct_keys * Use parentheses to suppress warning * Update date	2025-06-10 06:10:17 +02:00
藍+85CD	d78f081423	ci : build and publish main-intel image (#3231)	2025-06-09 06:42:53 +02:00
藍+85CD	b175baa665	docker : add main-intel dockerfile (#3229)	2025-06-06 05:30:02 +02:00
KITAITI Makoto	799eacdde4	ruby : Add parallel transcription support (#3222 ) * Fix indentation of code sample in document comment * Make Whisper::Context#transcribe able to run non-parallel * Add test for Whisper::Context#transcribe with parallel option * Follow signature API change of Context#transcribe * Remove useless variable assignment * Move simple usage up in README * Add need help section in README * Add document on Context#transcribe's parallel option in README * Update date * Fix signature of Context.new * Make Context#subscribe accept n_processors option * Make test follow #transcribe's change * Make RBS follow #transcribe's change * Add document for #transcribe's n_processors option * Rename test directory so that Rake tasks' default setting is used	2025-06-04 14:50:18 +09:00
Daniel Bevenius	82f461eaa4	ci : add mirror for ports.ubuntu.com (ARM packages) (#3221 ) This commit updates the build workflow to replace `ports.ubuntu.com` with `mirror.kumi.systems` in the apt sources list for ARM64 builds. The motivation for this change is intended to improve package download reliability and speed by using a more stable mirror for ARM64 packages.	2025-06-03 07:56:58 +02:00
Joas Dev	269dad68a2	bindings.java : apply whisperParams in fullTranscribeWithTime instead of ignoring them (#3201 ) This pull request fixes a bug in the fullTranscribeWithTime method, where the whisperParams argument was declared but never used. As a result, the model did not apply the configuration defined in whisperParams.	2025-06-03 06:15:21 +02:00
R0CKSTAR	121d27a495	musa: correct MUSA SDK rc4.0.1 download URL (#3217 ) * musa: correct MUSA SDK rc4.0.1 download URL Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Fix typo Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-06-03 06:02:12 +02:00
Daniel Bevenius	e05af2457b	ci : use mirrors.kernel.org for Ubuntu packages (#3220 ) This commit updates the ubuntu jobs to use mirrors sites instead of archive.ubuntu.com. The motivation of this is an attempt to make the CI build more stable and avoid errors like: https://github.com/ggml-org/whisper.cpp/actions/runs/15384056535/job/43291948394?pr=3217	2025-06-02 16:46:40 +02:00
Daniel Bevenius	b505539670	node : add language detection support (#3190 ) This commit add support for language detection in the Whisper Node.js addon example. It also updates the node addon to return an object instead of an array as the results. The motivation for this change is to enable the inclusion of the detected language in the result, in addition to the transcription segments. For example, when using the `detect_language` option, the result will now be: ```console { language: 'en' } ``` And if the `language` option is set to "auto", it will also return: ```console { language: 'en', transcription: [ [ '00:00:00.000', '00:00:07.600', ' And so my fellow Americans, ask not what your country can do for you,' ], [ '00:00:07.600', '00:00:10.600', ' ask what you can do for your country.' ] ] } ```	2025-06-02 14:58:05 +02:00
Georgi Gerganov	7fd6fa8097	talk-llama : sync llama.cpp ggml-ci	2025-06-01 15:14:44 +03:00
Georgi Gerganov	3f46282cbe	sync : ggml ggml-ci	2025-06-01 15:14:44 +03:00
Max Krasnyansky	1e16340f4b	threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (llama/12995) * threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling We talked about adding LOW priority for GGML threads in the original threadpool PR. It might be useful for some cases to avoid contention. Latest Windows ARM64 releases started parking (offlining) the CPU cores more aggressively which results in suboptimal performance with n_threads > 4. To deal with that we now disable Power Throttling for our threads for the NORMAL and higher priorities. Co-authored-by: Diego Devesa <slarengh@gmail.com> * threading: disable SetThreadInfo() calls for older Windows versions * Update tools/llama-bench/llama-bench.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-06-01 15:14:44 +03:00
Shawn yang	4a50254998	CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856 ) (llama/13895) * 1. add "integrated" in ggml_cuda_device_info for distinguish whether it is Intergrate_gpu or discrete_gpu 2. Adjust the func:"ggml_backend_cuda_device_supports_buft" for this new feature * Update ggml/src/ggml-cuda/ggml-cuda.cu Adjusted code indentation Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/ggml-cuda.cu Fixed incorrect setting of variable types Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/ggml-cuda.cu Adjusted the judgment logic Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * add a host_buft assert in case of integrated_cuda_device with func:'evaluate_and_capture_cuda_graph()' * Update ggml/src/ggml-cuda/ggml-cuda.cu Add a defensive security assert Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/ggml-cuda.cu Adjusted the support judgment logic. Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * revoke the suggest commit changes due to it's not applicable in jetson_device * Update ggml/src/ggml-cuda/ggml-cuda.cu Add parentheses to enforce operator precedence Co-authored-by: Diego Devesa <slarengh@gmail.com> * Update ggml/src/ggml-cuda/ggml-cuda.cu Fix ci bug: add a spaces Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: yangxiao <yang_xl@tju.edu.cn> Co-authored-by: Johannes Gäßler <johannesg@5d6.de> Co-authored-by: yangxiao <yangxl_zz@qq.com> Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-06-01 15:14:44 +03:00
Johannes Gäßler	a5aff28198	CUDA: fix typo in FlashAttention code (llama/13926)	2025-06-01 15:14:44 +03:00
Diego Devesa	6c0472ab8f	sched : avoid changing cur_copy when a graph is already allocated (llama/13922)	2025-06-01 15:14:44 +03:00
Diego Devesa	b14cee184a	cuda : prevent using split buffers with 3d/4d matrices (llama/13919)	2025-06-01 15:14:44 +03:00
Akarshan Biswas	f7f92d0aab	SYCL: Add mrope kernel (llama/13755) * SYCL: Add mrope kernel * feat: Optimize rope operations with vectorization Uses `sycl::vec` to load and store two elements at a time, significantly improving performance in `rope_norm`, `rope_neox`, and `rope_multi`. This reduces the number of memory accesses and leverages SIMD instructions for faster execution. * Use ceil_div	2025-06-01 15:14:44 +03:00
Christian Kastner	1893359cfd	cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (llama/13890)	2025-06-01 15:14:44 +03:00
Yibo Cai	ea643c6ae3	arm64: optimize q4_k_q8_k kernel with i8mm (llama/13886) This PR improves q4_k_q8_k gemm kernel with arm64 i8mm instruction. Tested on neoverse-n2 with llama3 8b q4_k_m quantization model. - 34% ~ 50% S_PP uplift for all batch sizes - 12% ~ 37% S_TG uplift for batch size 4 and above Perplexity doesn't change with this PR. ``` // tested on neoverse-n2 $ llama-batched-bench \ -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf \ --no-mmap -fa \ -c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \ -npl 1,2,4,8,16,32 \ -t 64 --------------------------------------------------------------------- \| PP \| TG \| B \| S_PP t/s \| S_TG t/s \| \| \| \| \| original \| this pr \| original \| this pr \| \|-------\|--------\|------\|----------\|----------\|----------\|----------\| \| 128 \| 128 \| 1 \| 110.12 \| 147.83 \| 24.36 \| 24.28 \| \| 128 \| 128 \| 2 \| 121.16 \| 172.42 \| 46.36 \| 47.93 \| \| 128 \| 128 \| 4 \| 120.15 \| 169.75 \| 74.68 \| 84.00 \| \| 128 \| 128 \| 8 \| 130.97 \| 196.81 \| 91.04 \| 114.74 \| \| 128 \| 128 \| 16 \| 131.01 \| 196.88 \| 101.43 \| 135.79 \| \| 128 \| 128 \| 32 \| 130.85 \| 196.51 \| 106.97 \| 147.29 \| --------------------------------------------------------------------- ```	2025-06-01 15:14:44 +03:00
Christian Kastner	1d7b3c79f4	cmake: Factor out CPU architecture detection (llama/13883) * cmake: Define function for querying architecture The tests and results match exactly those of src/CMakeLists.txt * Switch arch detection over to new function	2025-06-01 15:14:44 +03:00
Vineel Abhinav	ccfaac2bb0	ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (llama/13882) * F32-Mamba-Seq_Scan-SVE * Fix formatting * ggml : missing space --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-06-01 15:14:44 +03:00
Vineel Abhinav	1230d37bca	ggml: aarch64: Implement SVE F32 kernels for vector functions (llama/13843) * F32-Mamba-SVE * F32-Mamba-SVE * Resolve test errors-1 * Resolve test errors-2 * F32-vec-SVE * F32-vec-SVE * F32-vec-SVE	2025-06-01 15:14:44 +03:00
Johannes Gäßler	9a500394ad	CUDA: fix FA tg at long context for CC >= 8.9 (llama/13852)	2025-06-01 15:14:44 +03:00
leo-pony	0035b8527c	CANN: Add SOC TYPE printing in cmake configuration (llama/13837)	2025-06-01 15:14:44 +03:00
lhez	3623186312	opencl: add new ops - `argsort`, `div`, `sub`, `addrows`, `sigmoid`, `group_norm` (llama/13787) * opencl: add `argsort` * opencl: add `div` * opencl: add `add_rows` * opencl: add `sub` * opencl: add `sigmoid`, both `f16` and `f32` * opencl: add `group_norm`	2025-06-01 15:14:44 +03:00
lhez	67beac47f3	opencl: mark `mul_mat` `f32f32` as supporting non-contiguous tensors (llama/13790)	2025-06-01 15:14:44 +03:00
Jeff Bolz	47a19bae25	vulkan: use timestamp queries for GGML_VULKAN_PERF (llama/13817) Also change it to be controlled by an env var rather than cmake flag	2025-06-01 15:14:44 +03:00
Akarshan Biswas	3d5c7ca4bc	SYCL: add gelu_erf kernel (llama/13749) * SYCL: add gelu_erf kernel * refactor code Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com> * Use scope_op_debug_print --------- Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com>	2025-06-01 15:14:44 +03:00
Xuan-Son Nguyen	4dfb2c2215	ggml : add ggml_repeat_4d (llama/13824)	2025-06-01 15:14:44 +03:00
Kai Pastor	ad433403ce	vulkan : Remove unexpected ; (ggml/1253)	2025-06-01 15:14:44 +03:00
Kai Pastor	4064dd6484	cmake : Fix broken CMake error messages (ggml/1252)	2025-06-01 15:14:44 +03:00
Radoslav Gerganov	fd75c4995b	ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247) The implementation is already deleted with commit 9d0762e. closes: #1235	2025-06-01 15:14:44 +03:00
KITAITI Makoto	0251445005	ruby : add Core ML support (#3214) * Prevent overflow * Fix memsize of Whisper::Context * Rename xxx_initialize to more Ruby-esque name: xxx_s_new * Define Whisper::Model::ZipURI * Define Whisper::Model.coreml_compiled_models * Make Options' @cmake_options Hash * Use --{enable,disable}-whisper-coreml option for -I/opt/homebrew/opt/llvm/include * Prepare Core ML model if enabled * Add test for ZipURI * Add signatures for ZipURI * Add Whisper.system_info_str * Add test for Whisper.system_info_str * Add signature for Model.coreml_compiled_models * Add signature for Whisper.system_info_str * Add test for Core ML * Update date * Maintain .gitignore	2025-06-01 18:16:02 +09:00
Daniel Bevenius	98dfe8dc26	vad : revisit timestamp alignment/mapping (#3173) * vad : revisit timestamp alignment/mapping This commit improving the timestamp alignment by introducing a mapping table, adding intermediate reference points for longer segments, and binary search for lookups. The motivation for this changes is to address issues with the currently solution where zero-length segments are possible, and also to improve the precision of the VAD timestamps. Refs: https://github.com/ggml-org/whisper.cpp/issues/3162 * vad : use uint64_t for time mapping This commit changes the type of the `processed_time` and `original_time` fields in the `vad_time_mapping` struct from `double` to `uint64_t`. The motivation for this change is made to improve precision and avoid floating-point inaccuracies and also be consistent with other part of the code base that use `uint64_t` for time representation. This is a part of a refactoring where I'm also going to change the vad_segment_info struct to use `uint64_t` for the start and end times. This is the reason for the not so pleasant conversion and casts in the code at the moment. * vad : change vad_segment_info and whisper_vad_segment to use uint64_t * vad : use int64_t instead of uint64_t for timestamps To be consistent with other timestamps in the codebase. * vad : add centisecond conversion functions * vad : extract vad processing from whisper_full_with_state This commit extracts the VAD processing from the `whisper_full_with_state` function into the `whisper_full` and `whisper_full_parallel` functions. The motivation for this is that I did not take into account that when `whisper_full_parallel` is called with `n_processors > 1`, then the vad processing would not be applied correctly. Instead the VAD processing should be done prior to processing in the case of `whisper_full_parallel`. * vad : remove filtered_n_samples from whisper_vad The commit removes the parameter `filtered_n_samples` from the `whisper_vad` function signature and its usage, as it is no longer needed since filtered samples is now a vector (previously it was a float) The motivation for this is to simplify the usage of this function. vad : remove vad_mapping_table_initialized flag * vad : fix leaning (none) of pointer/references	2025-05-30 06:28:46 +02:00
KITAITI Makoto	e5e900dd00	ruby : handle build options on installation (#3206 ) * Don't pass empty string to cmake command * Refactor Dependencies * Use found cmake path for options * Maintain extsources.rb * List dependent files by directory separator agnostic way * Prepend whitespace before '=' * Handle build options on install * Remove useless test * Retrieve gem file name and version from spec file * Bump version to 1.3.3 * Update date * Add install option examples * [skip ci]Remove unused module	2025-05-30 01:32:49 +09:00


2725 Commits