whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-08-02 04:43:35 +02:00

Author	SHA1	Message	Date
Daniel Bevenius	98dfe8dc26	vad : revisit timestamp alignment/mapping (#3173 ) * vad : revisit timestamp alignment/mapping This commit improves the timestamp alignment by introducing a mapping table, adding intermediate reference points for longer segments, and using binary search for lookups. The motivation for this change is to address issues with the current solution, where zero-length segments are possible, and also to improve the precision of the VAD timestamps. Refs: https://github.com/ggml-org/whisper.cpp/issues/3162 * vad : use uint64_t for time mapping This commit changes the type of the `processed_time` and `original_time` fields in the `vad_time_mapping` struct from `double` to `uint64_t`. The motivation for this change is to improve precision, avoid floating-point inaccuracies, and be consistent with other parts of the code base that use `uint64_t` for time representation. This is part of a refactoring where I'm also going to change the vad_segment_info struct to use `uint64_t` for the start and end times. This is the reason for the not so pleasant conversions and casts in the code at the moment. * vad : change vad_segment_info and whisper_vad_segment to use uint64_t * vad : use int64_t instead of uint64_t for timestamps To be consistent with other timestamps in the codebase. * vad : add centisecond conversion functions * vad : extract vad processing from whisper_full_with_state This commit extracts the VAD processing from the `whisper_full_with_state` function into the `whisper_full` and `whisper_full_parallel` functions. The motivation for this is that I did not take into account that when `whisper_full_parallel` is called with `n_processors > 1`, the VAD processing would not be applied correctly. Instead, the VAD processing should be done prior to processing in the case of `whisper_full_parallel`. * vad : remove filtered_n_samples from whisper_vad This commit removes the parameter `filtered_n_samples` from the `whisper_vad` function signature and its usage, as it is no longer needed since filtered samples is now a vector (previously it was a float). The motivation for this is to simplify the usage of this function. * vad : remove vad_mapping_table_initialized flag * vad : fix leaning (none) of pointer/references	2025-05-30 06:28:46 +02:00
Daniel Bevenius	73a8c5fb94	whisper : remove whisper_load_backends function (#3196 ) * whisper : remove whisper_load_backends function This commit removes the `whisper_load_backends` function, which was used to load all GGML backends. The motivation for this change is to push the responsibility of loading backends to user applications, giving them more control over which backends to load and when. See the references below for more context. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3182 Refs: https://github.com/ggml-org/whisper.cpp/pull/3042#issuecomment-2801778733 Refs: https://github.com/ggml-org/whisper.cpp/pull/3042#issuecomment-2801928990 * ruby : add check for rwc is NULL This commit adds a check to ensure that the `rwc` pointer is not NULL before attempting to mark its members in the garbage collector. The motivation for this is an attempt to see if this fixes the CI build, as I'm not able to reproduce the issue locally. Refs: https://github.com/ggml-org/whisper.cpp/actions/runs/15299612277/job/43036694928?pr=3196	2025-05-29 08:03:17 +02:00
Daniel Bevenius	bd1cb0c8e3	whisper : remove redundant assignments (#3178 ) This commit removes some redundant assignments in the function `whisper_exp_compute_token_level_timestamps`. The motivation for this is that tokens[j] and token are references to the same object, which can be a little confusing when reading the code.	2025-05-21 13:23:20 +02:00
Daniel Bevenius	d1f114da61	vad : return early if no vad segments are detected (#3158 ) This commit adds a check to `whisper_full_with_state` and if no VAD segments are detected, the function will return early. The motivation for this is that if no VAD segments are detected, the function will not have any samples to process which can happen if an audio sample does not contain any speech. I did not test this previously and only discovered this when updating the stream example.	2025-05-16 08:50:53 +02:00
Daniel Bevenius	bae5d074c7	vad : store VAD context in whisper_state (#3156 ) * vad : store VAD context in whisper_state This commit stores the VAD context in the whisper_state structure, allowing for better management and reuse of the VAD context across multiple calls to the whisper_vad function. The motivation for this change is that when updating the stream example I noticed that the VAD context was being re-initialized every time the whisper_vad function was called. This involved loading the VAD model, which is expensive and unnecessary if the context can be reused. Storing this in the whisper_state seems to follow a pattern similar to how whisper_coreml_context and whisper_openvino_context are stored. * vad : free vad_context in whisper_free_state	2025-05-16 07:53:26 +02:00
Georgi Gerganov	a14c89aefa	whisper : update to ggml-backend changes (#0 ) ggml-ci	2025-05-13 13:59:21 +03:00
Daniel Bevenius	e41bc5c61a	vad : add initial Voice Activity Detection (VAD) support (#3065 ) * vad : add initial Voice Activity Detection (VAD) support This commit adds support for Voice Activity Detection (VAD). When enabled, this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-12 16:10:11 +02:00
Daniel Bevenius	e39ba750cd	whisper : remove dummy commit comment [no ci] (#3143 ) This commit removes a dummy comment that was added by commit `589b408` ("ci : dummy commit to trigger CI").	2025-05-12 14:40:17 +02:00
Daniel Bevenius	09846f4e12	whisper: remove MSVC warnings pragmas (#3090 ) * ggml : remove MSVC warnings pragmas This commit removes the MSVC-specific pragmas as these are now handled in CMakeLists.txt. * whisper : remove MSVC warning pragmas This commit removes the MSVC-specific pragmas. These are now handled in the CMakeLists.txt file.	2025-05-05 13:09:35 +02:00
Daniel Bevenius	2e30e6df59	whisper : fix grammar advance stack warning (#3087 ) This commit addresses a warning that is present for Release builds: ```console [ 30%] Building CXX object src/CMakeFiles/whisper.dir/whisper.cpp.o In file included from /usr/include/c++/13/bits/stl_tree.h:63, from /usr/include/c++/13/map:62, from /home/danbev/work/ai/whisper.cpp/src/whisper-arch.h:5, from /home/danbev/work/ai/whisper.cpp/src/whisper.cpp:2: In static member function ‘static void std::__copy_move<false, false, std::random_access_iterator_tag>::__assign_one(_Tp, _Up) [with _Tp = const whisper_grammar_element; _Up = const whisper_grammar_element const]’, inlined from ‘static _Up* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(_Tp, _Tp, _Up) [with _Tp = const whisper_grammar_element const; _Up = const whisper_grammar_element; bool _IsMove = false]’ at /usr/include/c++/13/bits/stl_algobase.h:440:20, inlined from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = const whisper_grammar_element const; _OI = const whisper_grammar_element]’ at /usr/include/c++/13/bits/stl_algobase.h:506:30, inlined from ‘_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = const whisper_grammar_element const; _OI = const whisper_grammar_element*]’ at /usr/include/c++/13/bits/stl_algobase.h:533:42, ... ``` This warning is caused by the fact that the `stack` vector is empty when it is passed to `new_stacks.push_back(stack);`. The suggested fix is to use `new_stacks.emplace_back();` instead of `new_stacks.push_back(stack);`.	2025-04-28 19:11:38 +02:00
Georgi Gerganov	549db9376f	whisper : reduce delta_min from 1000ms to 100ms (#3028 ) ggml-ci	2025-04-11 06:23:02 +02:00
Fujimoto Seiji	e6234cd435	whisper : fix "bench-all outputs an invalid result on larger models" (#3002 ) The benchmark script 'scripts/bench-all.sh' assumes that the 11th field of the output line is a timestamp. This assumption does not hold when the target model takes a bit longer to process. Fix this issue by introducing an explicit whitespace to the output lines of `whisper_print_timings()`. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-04 18:36:19 +03:00
Georgi Gerganov	2b6d0d2200	rename : ggerganov -> ggml-org (#3005 )	2025-04-04 16:11:52 +03:00
Daniel Bevenius	11688b262f	coreml: fix Whisper to CoreML conversion by disabling SDPA [no ci] (#2979 ) * coreml: fix Whisper to CoreML conversion by disabling SDPA This commit disables the use of PyTorch's `scaled_dot_product_attention` in the Whisper model to avoid compatibility issues during CoreML conversion. The issue occurs because coremltools requires PyTorch 2.5.0, but the Whisper implementation may expect behavior from newer PyTorch versions. By setting `MultiHeadAttention.use_sdpa = False`, we force Whisper to use its fallback manual attention implementation, which works correctly with PyTorch 2.5.0 during the tracing process. Refs: https://github.com/ggerganov/whisper.cpp/issues/2783 * coreml: fix audio shape in whisper decoder conversion This commit fixes the audio shape in the whisper decoder conversion script. The motivation for this is that the audio shape was incorrect and was causing the conversion to fail. * coreml : set -e in generate-coreml-interface.sh The commit sets the -e flag in the generate-coreml-interface.sh script to make sure the script fails if any command fails. * coreml : update generated encoder/decoder interfaces This commit updates the generated encoder/decoder interfaces for the whisper model which is the result of running the generate-coreml-interface.sh script.	2025-04-01 18:01:23 +02:00
Daniel Bevenius	f92bd59951	whisper : remove unnecessary GGML_UNUSED macro (#2960 )	2025-03-30 05:56:10 +02:00
Dan Johansson	21d890d534	whisper : add support for backends with multiple ggml_backend_buffer_type (#2863 ) * whisper : add support for ggml_backend_buffer_type Signed-off-by: Dan Johansson <dan.johansson@arm.com> * fix compile error when building on Ubuntu Signed-off-by: Dan Johansson <dan.johansson@arm.com> * remove copyright header from include file Signed-off-by: Dan Johansson <dan.johansson@arm.com> --------- Signed-off-by: Dan Johansson <dan.johansson@arm.com>	2025-03-26 16:54:02 +02:00
Daniel Bevenius	cf5ddb8c21	whisper : initialize decoder's rng with unique seed (#2932 ) This change initializes each decoder's random number generator with a unique seed. The motivation for this is that currently all decoders are initialized with the same seed value, 0. The result of this is that for the same state (logits, probs, and logprobs) they will produce the same output.	2025-03-24 09:36:07 +01:00
Daniel Bevenius	be9de81171	whisper : add check for CPU backend initialization (#2918 ) This commit adds a check for the CPU backend initialization in the whisper library. If the initialization fails, an exception is thrown. The motivation for this change is to make the library more robust and handle the case when the CPU backend initialization fails. Resolves: https://github.com/ggerganov/whisper.cpp/issues/2917	2025-03-21 09:53:26 +01:00
Daniel Bevenius	215990abde	whisper : fix compiler warnings in whisper.cpp (#2895 ) This commit fixes compiler warnings in whisper.cpp by changing the type of the loop index variable from int64_t to size_t. Currently the following warnings are generated by the compiler: ```console /whisper.cpp/src/whisper.cpp:209:27: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare] 209 | for (int64_t i = 0; i < nels; ++i) { | ~ ^ ~~~~ /whisper.cpp/src/whisper.cpp:219:27: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare] 219 | for (int64_t i = 0; i < nels; ++i) { | ~ ^ ~~~~ ```	2025-03-18 13:38:41 +01:00
Daniel Bevenius	740bf7f6a1	whisper : enable compiler warnings for src (#2891 ) * whisper : enable compiler warnings for src This commit enables compiler warnings for the src directory. Currently, when the WHISPER_ALL_WARNINGS flag is set to ON, it only enables warnings in ggml, by setting GGML_ALL_WARNINGS to ON. This commit adds the same compiler flags for whisper's src directory. The motivation for this is to catch potential bugs and issues early on in the development process. * squash! whisper : enable compiler warnings for src Remove GF_C_FLAGS and GF_CXX_FLAGS from add_compile_options.	2025-03-18 05:19:18 +01:00
Diego Devesa	339a1cba5d	whisper : support GGML_BACKEND_DL (#2843 ) * whisper : support GGML_BACKEND_DL * fix DTW crash * whisper.objc : fix build - add ggml-cpp.h --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-27 13:35:07 +01:00
Thomas Fitzsimmons	47e14c0529	whisper : restore big endian support (#2816 ) * whisper : fix BYTESWAP whitespace * whisper : make byteswap useable with C++17 * cmake : define WHISPER_BIG_ENDIAN for big-endian targets * ci : fix (again) arm64 build fails * docker : attempt fixing arm64 build on ci * qemu v7.0.0-28 [imported from https://github.com/ggml-org/llama.cpp/commit/818a340ea8be55b3706e1772527cb8738e90a8c7 (#11895)] --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-02-25 11:38:13 +02:00
Georgi Gerganov	589b40810a	ci : dummy commit to trigger CI	2025-02-03 16:32:48 +02:00
Georgi Gerganov	eb68324c86	whisper : fix gpu device selection (#2728 )	2025-01-13 13:11:37 +02:00
Sandro Hanea	2ab2eb5110	whisper : add whisper_full_get_segment_no_speech_prob_from_state (#2716 )	2025-01-09 16:21:07 +02:00
Sacha Arbonel	4183517076	server : add no-speech threshold parameter and functionality (#2654 )	2024-12-21 17:00:08 +02:00
Georgi Gerganov	f4668169a0	whisper : rename suppress_non_speech_tokens to suppress_nst (#2653 )	2024-12-21 12:54:35 +02:00
Karthick	f897eb7670	whisper : support no_speech_thold (#2625 ) * Implement no_speech_thold no_speech_thold functionality is on par with OpenAI's whisper * Addressed review comments	2024-12-17 19:15:47 +02:00
Karthick	2f2841bfce	whisper : add single-timestamp logic (#2629 ) * Fix hallucinations during silence When the predicted tokens end with a single timestamp, the entire 30-second segment should be considered done, to avoid hallucinations for the remaining part of the segment. This behaviour is on par with OpenAI's whisper. Refer to the logic related to `single_timestamp_ending` in https://github.com/openai/whisper/blob/main/whisper/transcribe.py * Accept review comments related to formatting. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-17 19:07:08 +02:00
Georgi Gerganov	37c88027e1	whisper : use backend registry (#0 )	2024-11-20 21:00:08 +02:00
Georgi Gerganov	7fd8d9c220	whisper : adapt to new ggml (wip)	2024-11-20 21:00:08 +02:00
Georgi Gerganov	5089ab2d6a	whisper : fix build (#0 )	2024-11-15 15:21:04 +02:00
Jhen-Jie Hong	5f8a086e22	whisper.swiftui : add model download list & bench methods (#2546 ) * swift : fix resources & exclude build * whisper : impl whisper_timings struct & api * whisper.swiftui : model list & bench methods * whisper : return ptr for whisper_get_timings * revert unnecessary change * whisper : avoid designated initializer * whisper.swiftui: code style changes * whisper.swiftui : get device name / os from UIDevice * whisper.swiftui : fix UIDevice usage * whisper.swiftui : add memcpy and ggml_mul_mat (commented)	2024-11-13 21:51:34 +02:00
thewh1teagle	5ccca19f0c	ggml : vulkan logs (#2547 )	2024-11-13 21:47:15 +02:00
Vin Misra	31aea563a8	whisper : fix extra memory usage (#2534 ) * passing samples_padded by ref to the threads. * passing samples_padded by ref to the threads. --------- Co-authored-by: Vinith Misra <physicsdemon@gmail.com>	2024-11-06 23:02:11 +02:00
Georgi Gerganov	0377596b77	whisper : backend registry init before model load	2024-11-01 10:19:05 +02:00
Georgi Gerganov	aa037a60f3	ggml : alloc ggml_contexts on the heap (#2525 ) * whisper : reduce ggml_context usage * ggml : allocate contexts on the heap (v2) * ggml : aligned malloc -> malloc	2024-10-31 22:00:09 +02:00
Georgi Gerganov	3f020fac9d	whisper : minor compile warning	2024-10-29 19:30:26 +02:00
jettoblack	1626b73b03	whisper : move new-segment callback after DTW step (#2515 )	2024-10-29 08:47:21 +02:00
Josscii	0fbaac9c89	whisper : fix index overflow in token-level timestamp logic (#2505 )	2024-10-23 15:14:03 +03:00
Rotem Dan	b6049060dd	whisper : add dtw preset for large-v3-turbo (#2481 )	2024-10-15 21:00:21 +03:00
Sandro Hanea	fdbfb460ed	whisper : add OpenVINO init with state (#2464 ) * Fixed OpenVino init on state * Removed an empty line * Fixed typo * Replaced tabs with spaces --------- Co-authored-by: Sandro Hanea <sandrohanea@users.noreply.github.com>	2024-10-08 20:08:00 +03:00
Georgi Gerganov	847f94fdeb	whisper : zero-out the KV cache upon clear (#2445 )	2024-10-05 15:23:51 +03:00
Georgi Gerganov	396089f3cf	whisper : revert mel-related changes (#0 ) too much extra logic and complexity for small benefit	2024-10-05 15:23:51 +03:00
Georgi Gerganov	941912467d	whisper : adapt to latest ggml (skip) (#0 )	2024-10-05 15:23:51 +03:00
Georgi Gerganov	f62a546e03	whisper : fix excessive memory usage (#2443 ) * whisper : fix KV cache allocation * whisper : reduce memory overhead from unused input tensors	2024-10-05 12:36:40 +03:00
Georgi Gerganov	ccc2547210	talk-llama : sync llama.cpp	2024-10-03 12:22:17 +03:00
Georgi Gerganov	fe18c29ab8	talk-llama : sync llama.cpp	2024-09-24 19:45:08 +03:00
Georgi Gerganov	34291099fb	ggml : refactoring (llama/#0) - d6a04f87 - 23e0d70b	2024-09-24 19:45:08 +03:00
Georgi Gerganov	9d754a56cf	whisper : update FA call	2024-08-28 13:22:20 +03:00
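
Commit 98dfe8dc26 above describes mapping VAD-filtered timestamps back to the original audio through a sorted mapping table with binary search. The following is only a minimal sketch of that idea, not the code from the commit: the struct name and its two fields come from the commit message, while the function `map_processed_to_original`, the clamping at the table ends, and the linear interpolation between neighbouring entries are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Struct name and fields as mentioned in the commit message; the layout
// and the unit (any consistent integer time unit) are assumptions.
struct vad_time_mapping {
    int64_t processed_time; // position in the VAD-filtered audio
    int64_t original_time;  // matching position in the original audio
};

// Hypothetical helper: map a timestamp in the filtered audio back to the
// original audio. `table` must be non-empty and sorted by processed_time.
int64_t map_processed_to_original(const std::vector<vad_time_mapping> & table, int64_t t) {
    assert(!table.empty());

    // Clamp to the first/last reference point.
    if (t <= table.front().processed_time) return table.front().original_time;
    if (t >= table.back().processed_time)  return table.back().original_time;

    // Binary search: first entry whose processed_time is strictly greater than t.
    auto hi = std::upper_bound(table.begin(), table.end(), t,
        [](int64_t value, const vad_time_mapping & m) { return value < m.processed_time; });
    auto lo = hi - 1; // last entry with processed_time <= t

    // Linear interpolation between the two reference points (integer math).
    const int64_t dp = hi->processed_time - lo->processed_time; // > 0 here
    const int64_t dx = hi->original_time  - lo->original_time;
    return lo->original_time + (t - lo->processed_time) * dx / dp;
}
```

In this sketch, two entries may share the same `processed_time` (the end of one speech segment and the start of the next, with the removed silence in between); `std::upper_bound` then resolves a lookup at that boundary to the later entry, so the result jumps to the start of the next segment.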

64 Commits
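
Commit cf5ddb8c21 in the list above notes that decoders seeded with the same value produce the same output for the same state. The sketch below illustrates one way to give each decoder its own seed; it is an assumption-laden illustration rather than the change made in that commit. The `decoder_sketch` type, the `make_decoders` helper, and the "base seed plus index" scheme are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a decoder; only the RNG matters for this sketch.
struct decoder_sketch {
    std::mt19937 rng;
};

// Hypothetical helper: create n decoders, seeding decoder i with base_seed + i
// so that no two decoders share a seed (assumed scheme, not taken from the commit).
std::vector<decoder_sketch> make_decoders(std::size_t n, uint32_t base_seed) {
    std::vector<decoder_sketch> decoders(n);
    for (std::size_t i = 0; i < n; ++i) {
        decoders[i].rng = std::mt19937(base_seed + static_cast<uint32_t>(i));
    }
    return decoders;
}
```

Each decoder remains reproducible for a given base seed, while decoders at different indices draw different random sequences.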