2691 Commits

Author SHA1 Message Date
Christian Kastner
1d7b3c79f4 cmake: Factor out CPU architecture detection (llama/13883)
* cmake: Define function for querying architecture

The tests and results match exactly those of src/CMakeLists.txt

* Switch arch detection over to new function
2025-06-01 15:14:44 +03:00
Vineel Abhinav
ccfaac2bb0 ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (llama/13882)
* F32-Mamba-Seq_Scan-SVE

* Fix formatting

* ggml : missing space

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-01 15:14:44 +03:00
Vineel Abhinav
1230d37bca ggml: aarch64: Implement SVE F32 kernels for vector functions (llama/13843)
* F32-Mamba-SVE

* F32-Mamba-SVE

* Resolve test errors-1

* Resolve test errors-2

* F32-vec-SVE

* F32-vec-SVE

* F32-vec-SVE
2025-06-01 15:14:44 +03:00
Johannes Gäßler
9a500394ad CUDA: fix FA tg at long context for CC >= 8.9 (llama/13852) 2025-06-01 15:14:44 +03:00
leo-pony
0035b8527c CANN: Add SOC TYPE printing in cmake configuration (llama/13837) 2025-06-01 15:14:44 +03:00
lhez
3623186312 opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm (llama/13787)
* opencl: add `argsort`

* opencl: add `div`

* opencl: add `add_rows`

* opencl: add `sub`

* opencl: add `sigmoid`, both `f16` and `f32`

* opencl: add `group_norm`
2025-06-01 15:14:44 +03:00
lhez
67beac47f3 opencl: mark mul_mat f32f32 as supporting non-contiguous tensors (llama/13790) 2025-06-01 15:14:44 +03:00
Jeff Bolz
47a19bae25 vulkan: use timestamp queries for GGML_VULKAN_PERF (llama/13817)
Also change it to be controlled by an env var rather than a CMake flag
2025-06-01 15:14:44 +03:00
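The env-var toggle mentioned above might look roughly like this; the variable name `GGML_VULKAN_PERF` is taken from the commit title, while the function name and the parsing are assumptions for illustration:

```cpp
#include <cstdlib>

// Hypothetical sketch: enable the perf logger at runtime via an
// environment variable instead of a compile-time CMake flag.
// Treats an unset, empty, or "0" value as disabled (an assumed convention).
static bool vk_perf_logger_enabled() {
    const char * v = std::getenv("GGML_VULKAN_PERF");
    return v != nullptr && v[0] != '\0' && v[0] != '0';
}
```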
Akarshan Biswas
3d5c7ca4bc SYCL: add gelu_erf kernel (llama/13749)
* SYCL: add gelu_erf kernel

* refactor code

Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com>

* Use scope_op_debug_print

---------

Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com>
2025-06-01 15:14:44 +03:00
Xuan-Son Nguyen
4dfb2c2215 ggml : add ggml_repeat_4d (llama/13824) 2025-06-01 15:14:44 +03:00
Kai Pastor
ad433403ce vulkan : Remove unexpected ; (ggml/1253) 2025-06-01 15:14:44 +03:00
Kai Pastor
4064dd6484 cmake : Fix broken CMake error messages (ggml/1252) 2025-06-01 15:14:44 +03:00
Radoslav Gerganov
fd75c4995b ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247)
The implementation was already deleted in commit 9d0762e.

closes: #1235
2025-06-01 15:14:44 +03:00
KITAITI Makoto
0251445005
ruby : add Core ML support (#3214)
* Prevent overflow

* Fix memsize of Whisper::Context

* Rename xxx_initialize to more Ruby-esque name: xxx_s_new

* Define Whisper::Model::ZipURI

* Define Whisper::Model.coreml_compiled_models

* Make Options' @cmake_options Hash

* Use --{enable,disable}-whisper-coreml option for -I/opt/homebrew/opt/llvm/include

* Prepare Core ML model if enabled

* Add test for ZipURI

* Add signatures for ZipURI

* Add Whisper.system_info_str

* Add test for Whisper.system_info_str

* Add signature for Model.coreml_compiled_models

* Add signature for Whisper.system_info_str

* Add test for Core ML

* Update date

* Maintain .gitignore
2025-06-01 18:16:02 +09:00
Daniel Bevenius
98dfe8dc26
vad : revisit timestamp alignment/mapping (#3173)
* vad : revisit timestamp alignment/mapping

This commit improves the timestamp alignment by introducing a mapping
table, adding intermediate reference points for longer segments, and
using binary search for lookups.

The motivation for these changes is to address issues with the current
solution, where zero-length segments are possible, and also to improve
the precision of the VAD timestamps.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3162

* vad : use uint64_t for time mapping

This commit changes the type of the `processed_time` and `original_time`
fields in the `vad_time_mapping` struct from `double` to `uint64_t`.

The motivation for this change is to improve precision, avoid
floating-point inaccuracies, and be consistent with other parts of
the code base that use `uint64_t` for time representation.

This is a part of a refactoring where I'm also going to change the
vad_segment_info struct to use `uint64_t` for the start and end times.
This is the reason for the not-so-pleasant conversions and casts in the
code at the moment.

* vad : change vad_segment_info and whisper_vad_segment to use uint64_t

* vad : use int64_t instead of uint64_t for timestamps

To be consistent with other timestamps in the codebase.

* vad : add centisecond conversion functions

* vad : extract vad processing from whisper_full_with_state

This commit extracts the VAD processing from the
`whisper_full_with_state` function into the `whisper_full` and
`whisper_full_parallel` functions.

The motivation for this is that I did not take into account that when
`whisper_full_parallel` is called with `n_processors > 1`, the VAD
processing would not be applied correctly. Instead, the VAD processing
should be done prior to the parallel processing in the case of
`whisper_full_parallel`.

* vad : remove filtered_n_samples from whisper_vad

The commit removes the parameter `filtered_n_samples` from the
`whisper_vad` function signature and its usage, as it is no longer
needed since the filtered samples are now stored in a vector
(previously a `float *`).

The motivation for this is to simplify the usage of this function.

* vad : remove vad_mapping_table_initialized flag

* vad : fix leaning (none) of pointer/references
2025-05-30 06:28:46 +02:00
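The mapping table and binary-search lookup described in the entry above can be sketched as follows; this is a hypothetical illustration using the `vad_time_mapping` struct and field names from the commit message, with linear interpolation between reference points as an assumed detail:

```cpp
#include <cstdint>
#include <vector>
#include <algorithm>

// Each entry pairs a time in the VAD-filtered (processed) audio with the
// corresponding time in the original audio; the table is sorted by
// processed_time, so lookups can use binary search.
struct vad_time_mapping {
    int64_t processed_time; // time in the filtered audio
    int64_t original_time;  // time in the original audio
};

// Map a processed-audio timestamp back to original-audio time by locating
// the surrounding reference points and interpolating linearly between them.
int64_t map_processed_to_original(const std::vector<vad_time_mapping> & table, int64_t t) {
    if (table.empty()) return t;
    auto it = std::lower_bound(table.begin(), table.end(), t,
        [](const vad_time_mapping & m, int64_t v) { return m.processed_time < v; });
    if (it == table.begin()) return table.front().original_time;
    if (it == table.end())   return table.back().original_time;
    const vad_time_mapping & hi = *it;
    const vad_time_mapping & lo = *(it - 1);
    const int64_t span = hi.processed_time - lo.processed_time;
    if (span == 0) return lo.original_time;
    return lo.original_time +
           (hi.original_time - lo.original_time) * (t - lo.processed_time) / span;
}
```

Using integer (`int64_t`) times throughout, as the later commits in the entry do, avoids the floating-point inaccuracies mentioned above.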
KITAITI Makoto
e5e900dd00
ruby : handle build options on installation (#3206)
* Don't pass empty string to cmake command

* Refactor Dependencies

* Use found cmake path for options

* Maintain extsources.rb

* List dependent files in a directory-separator-agnostic way

* Prepend whitespace before '='

* Handle build options on install

* Remove useless test

* Retrieve gem file name and version from spec file

* Bump version to 1.3.3

* Update date

* Add install option examples

* [skip ci] Remove unused module
2025-05-30 01:32:49 +09:00
Daniel Tang
4d18e52f55
ggml : Fix backtrace breaking Windows build (#3203) 2025-05-29 13:26:58 +03:00
Georgi Gerganov
ca890f566f sync : ggml
ggml-ci
2025-05-29 09:56:26 +03:00
Radoslav Gerganov
48dddbbac1 ggml : install dynamic backends (ggml/1240) 2025-05-29 09:56:26 +03:00
Daniel Tang
5ea2c37a4c ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)
The goal is to have what users call "full logs" contain the backtrace.

This is registered upon ggml_init. Also fixes a minor fd leak on Linux.
2025-05-29 09:56:26 +03:00
Daniel Bevenius
73a8c5fb94
whisper : remove whisper_load_backends function (#3196)
* whisper : remove whisper_load_backends function

This commit removes the `whisper_load_backends` function, which was used
to load all GGML backends.

The motivation for this change is to push the responsibility of loading
backends to user applications, giving them more control over which
backends to load and when. See the references below for more context.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3182
Refs: https://github.com/ggml-org/whisper.cpp/pull/3042#issuecomment-2801778733
Refs: https://github.com/ggml-org/whisper.cpp/pull/3042#issuecomment-2801928990

* ruby : add check for rwc being NULL

This commit adds a check to ensure that the `rwc` pointer is not NULL
before attempting to mark its members in the garbage collector.

The motivation for this is an attempt to see if this fixes the CI build,
as I'm not able to reproduce the issue locally.

Refs: https://github.com/ggml-org/whisper.cpp/actions/runs/15299612277/job/43036694928?pr=3196
2025-05-29 08:03:17 +02:00
KITAITI Makoto
1f5fdbecb4
ruby : add VAD support, migration to Ruby's newer API (#3197)
* Add VAD models

* Extract function to normalize model path from ruby_whisper_initialize()

* Define ruby_whisper_vad_params struct

* Add VAD-related features to Whisper::Params

* Add tests for VAD-related features

* Define Whisper::VADParams

* Add Whisper::VAD::Params attributes

* Add test suite for VAD::Params

* Make older test follow namespace change

* Add test for transcription with VAD

* Add assertion for test_vad_params

* Add signatures for VAD-related methods

* Define VAD::Params#==

* Add test for VAD::Params#==

* Fix Params#vad_params

* Add test for Params#vad_params

* Fix signature of Params#vad_params

* Use macro to define VAD::Params params

* Define VAD::Params#initialize

* Add tests for VAD::Params#initialize

* Add signature for VAD::Params.new

* Add documentation on VAD in README

* Wrap register_callback in prepare_transcription for clarity

* Set whisper_params.vad_params just before transcription

* Don't touch NULL

* Define ruby_whisper_params_type

* Use TypedData_XXX for ruby_whisper_params instead of Data_XXX

* Remove unused functions

* Define rb_whisper_model_data_type

* Use TypedData_XXX for ruby_whisper_model instead of Data_XXX

* Define ruby_whisper_segment_type

* Use TypedData_XXX for ruby_whisper_segment instead of Data_XXX

* Define ruby_whisper_type

* Use TypedData_XXX for ruby_whisper instead of Data_XXX

* Qualify with const
2025-05-28 20:05:12 +09:00
Simon Booth
5720426d97
whisper : install shared libs when using GGML_BACKEND_DL (#3195) 2025-05-28 10:15:04 +02:00
Fujimoto Seiji
b9d27b1358
tests : add a new benchmark test for long-form audio (#3185)
* tests : add a new benchmark test for long-form audio

Based on "Earnings-21" corpus by Del Rio et al.

    Earnings-21: A Practical Benchmark for ASR in the Wild (2021)
    https://arxiv.org/abs/2104.11348

This dataset contains 39 hours of long-form speech, sourced from public
earnings calls. Each recording contains roughly 50 minutes of English
dialogue between multiple speakers (2-20 persons).

This benchmark suite should allow us to evaluate the performance of
whisper.cpp on long-form audio data.

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>

* tests : apply PR feedback to 'earnings21/README.md'

Based on feedback from Daniel Bevenius.

 - Simplify how to download & prepare a Silero VAD model.
 - Fix typo: inferece -> inference

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>

* tests : avoid crashing on non-UTF-8 characters

Based on feedback from Daniel Bevenius.

Add 'errors' parameter to open() in order to avoid unhandled
exception on invalid UTF-8 bytes.

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>

* tests : try to interpret the hypothesis as Windows-1252

Based on the discussion in PR#3185.

Evidently whisper.cpp can represent a quotation mark as '0x93', which
implies Windows-1252 (Microsoft's extension of ASCII) and cannot be
decoded as UTF-8.

Add an explicit decoding loop to address the issue.

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>

---------

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
2025-05-28 07:08:44 +02:00
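The commit's explicit decoding loop lives in the Python test harness; as a minimal C++ sketch of the same fallback idea, one might re-encode only the Windows-1252 curly-quote bytes (0x91-0x94, which includes the 0x93 mentioned above) as UTF-8. All names here are invented for illustration and the byte coverage is deliberately partial:

```cpp
#include <string>
#include <map>

// Partial, hypothetical sketch: bytes that are invalid as UTF-8 but
// meaningful in Windows-1252 (here only the curly quotes) are replaced
// with their UTF-8 encodings; all other bytes pass through unchanged.
std::string cp1252_fallback(const std::string & in) {
    static const std::map<unsigned char, std::string> table = {
        {0x91, "\u2018"}, {0x92, "\u2019"}, // single curly quotes
        {0x93, "\u201C"}, {0x94, "\u201D"}, // double curly quotes
    };
    std::string out;
    for (unsigned char c : in) {
        auto it = table.find(c);
        if (it != table.end()) {
            out += it->second;
        } else {
            out += (char) c;
        }
    }
    return out;
}
```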
Daniel Bevenius
0ed00d9d30
ci : update windows-blas uploads action (#3192)
This commit modifies windows-blas, which was updated previously to use
the zip functionality provided by `actions/upload-artifact`. This turned
out to be incorrect and I should not have done that. The reason for
zipping the archives first is that otherwise the artifacts, when
downloaded, will be unzipped into plain directories. In our case the
release task depends on the artifacts having a .zip extension so that
those archives are included in the release.
2025-05-27 18:01:31 +02:00
Georgi Gerganov
527fe6aaeb sync : fix builds - musa, ruby 2025-05-27 18:03:00 +03:00
Georgi Gerganov
26eb48cb08 talk-llama : sync llama.cpp
ggml-ci
2025-05-27 18:03:00 +03:00
Georgi Gerganov
546928c33f sync : ggml
ggml-ci
2025-05-27 18:03:00 +03:00
xctan
15ae9dc2a4 ggml : riscv: add xtheadvector support (llama/13720)
* ggml : riscv: add xtheadvector support

* ggml : clean up some macro usage
2025-05-27 18:03:00 +03:00
Christian Kastner
2e7a1e3e43 ggml-cpu: x86 feature detection is specific to x86 (llama/13811) 2025-05-27 18:03:00 +03:00
Diego Devesa
b75babebb2 ggml : allow CUDA graphs when using pipeline parallelism (llama/13814) 2025-05-27 18:03:00 +03:00
Georgi Gerganov
cc7a0105ef cuda : avoid cuGetErrorString (llama/13791)
ggml-ci
2025-05-27 18:03:00 +03:00
Akarshan Biswas
195fde8804 SYCL: Add non contiguous support in RMS_NORM and NORM kernels (llama/13611)
* SYCL: Add non contiguous input support to norm kernel

* refactor and add RMS_NORM non contiguous input support

ggml-ci

* restore subgroup reduction for multi-subgroup thread blocks in norm kernels

* Swap grid dims of nsamples and nrows

ggml-ci

* Revert "Swap grid dims of nsamples and nrows"

This reverts commit 43be2d657fec7f7fba54e2cd154106bc0fc45adf.

* restore not required changes
ggml-ci

* address review comments: change it to be more like SYCL

* Use a common function to calculate offset

* remove wrap around logic for handling broadcasts

* remove static from calculate_offset fn and use ceil_div
2025-05-27 18:03:00 +03:00
Romain Biessy
25e27904ca sycl: Add more debug prints (llama/13640) 2025-05-27 18:03:00 +03:00
Jeff Bolz
474f7be8b6 vulkan: mark IM2COL as supporting non-contig (llama/13783) 2025-05-27 18:03:00 +03:00
Bizhao Shi
e35fecc2a1 CANN: Add the basic supports of Flash Attention kernel (llama/13627)
* cann: add the basic FA support

* cann: update the readme

* cann: update the FlashAttention with PSEShift

* cann: update the input parameters in FA

* cann: update the alibi with max_bias

* cann: add the constraints of softcap

* cann: update the docs CANN.md

* cann: update the docs CANN.md

* cann: fix typo of CANN.md

* cann: add some comments and update the CANN.md

* cann: update the CANN.md

* cann: update the inner precise for fusedInferAttention

* cann: update the constraints of flash_attn_ext on ggml-cann.cpp

* cann: clean the whitespace

* cann: clean the whitespace

* cann: add a new endline
2025-05-27 18:03:00 +03:00
Akarshan Biswas
1cd7028428 SYCL: revert "sycl: simplify bin_bcast_kernel (ggml/13383)" (llama/13752)
Temporarily reverted due to failing fp16 DIV operation

This reverts commit 02cdd2d8b092b5a4bb18e013c6887ce49ba20ac5.

ggml-ci
2025-05-27 18:03:00 +03:00
Diego Devesa
99596d6031 ggml-cpu : set openmp wait time if not set (llama/13758) 2025-05-27 18:03:00 +03:00
Xuan-Son Nguyen
2d6c6862f7 ggml : add ggml_gelu_erf() CUDA kernel (llama/13719)
* ggml : add ggml_gelu_erf() CUDA kernel

* missing semicolon
2025-05-27 18:03:00 +03:00
Johannes Gäßler
f1576b2659 CUDA: fix race condition in FA vector kernels (llama/13742) 2025-05-27 18:03:00 +03:00
Chenguang Li
994b4f86ab CANN: Support MUL_MAT_ID for q8_0 and q4_0 (llama/13705)
* [CANN] Support MUL_MAT_ID Q8 && Q4

Signed-off-by: noemotiovon <757486878@qq.com>

* codestyle adjustment

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
2025-05-27 18:03:00 +03:00
Xuan-Son Nguyen
3e7eaccf55 ggml : fix the order of ggml_unary_op (llama/13718) 2025-05-27 18:03:00 +03:00
Jeff Bolz
191f040414 vulkan: support CPY from any type to itself (llama/13695)
Reuse the f16/f32 copy shaders, and just scale the number of elements
according to the type size.
2025-05-27 18:03:00 +03:00
Jeff Bolz
2d49d4a9b5 vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (llama/13696) 2025-05-27 18:03:00 +03:00
Judd
000d65befb use LOG_WARN to replace std::cerr (llama/13657) 2025-05-27 18:03:00 +03:00
Nicolò Scipione
f0803e6646 sycl : Remove waits from function calls (llama/13702)
* removes the waits in async memcpy functions
2025-05-27 18:03:00 +03:00
Ewan Crawford
730a00be8a SYCL: Avoid using SYCL-Graph for unsupported nodes (llama/13587)
Currently, on a CUDA backend to SYCL, when running
`GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` there
are two operations that throw an exception from the blocking
waits during queue recording.

* `-o CONCAT` : Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187
* `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074

We've noticed that `ggml-cuda.cu` has the
[check_node_graph_compatibility_and_refresh_copy_ops](39e73ae0d6/ggml/src/ggml-cuda/ggml-cuda.cu (L2458-L2458))
method for checking if a graph can be used, even if enabled. I've taken a
similar approach in this PR by adding a method to `ggml-sycl.cpp` for checking
if a graph can be used for the operations even if a user has asked for it to be
enabled.
2025-05-27 18:03:00 +03:00
Henry Linjamäki
316600e8ee opencl: Add support for multiple devices (llama/12622)
* opencl: Add support for multiple devices

... but limited to one platform. A platform with a GPU will be preferred.

Additionally:

* Filter out devices that lack capabilities needed by the backend
  implementation (half support, OpenCL 2.0+, etc).

* Make ggml_backend_opencl_reg() thread-safe.

* fixup: fix an error in sync_with_other_backends

... when there is only one OpenCL device available.
2025-05-27 18:03:00 +03:00
Henry Linjamäki
42f2b3bb65 opencl: fix a couple of crashes (llama/12795)
* opencl: fix a couple of crashes

* fix kernel launches failing on devices which do not support
  non-uniform work-groups. When non-uniform work-groups are not
  supported, set `local_work_size` to NULL (= let the driver choose the
  work-group sizes). This patch does not cover everything - just the
  cases tested by test-backend-ops.

* fix sub-buffer creation failing due to `cl_buffer_region::origin` not
  being aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN`.

* OpenCL: query non-uniform WG sizes only on OpenCL 3.0+
2025-05-27 18:03:00 +03:00
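The non-uniform work-group fallback described in the first bullet of the entry above can be illustrated with a small helper; everything here (names, signature) is a hypothetical sketch, not code from the patch. In OpenCL, passing NULL as `local_work_size` to `clEnqueueNDRangeKernel` lets the driver pick a work-group size that evenly divides the global range:

```cpp
#include <cstddef>

// Hypothetical sketch: when the device does not support non-uniform
// work-groups and the preferred local size does not evenly divide the
// global size, return nullptr so the driver chooses the work-group size.
// The returned pointer would be passed as local_work_size to
// clEnqueueNDRangeKernel.
const size_t * pick_local_work_size(bool supports_nonuniform_wg,
                                    size_t global_size,
                                    const size_t * preferred) {
    if (!supports_nonuniform_wg && preferred != nullptr &&
        global_size % *preferred != 0) {
        return nullptr; // let the driver choose the work-group sizes
    }
    return preferred;
}
```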
Xuan-Son Nguyen
dd6ef64060 ggml : add ggml_gelu_erf() (llama/13667)
* ggml : add ggml_gelu_na (not approximated)

* fix naming order

* rename na --> erf

* apply review suggestions

* revert naming order
2025-05-27 18:03:00 +03:00