2668 Commits

Author SHA1 Message Date
Fujimoto Seiji
b9d27b1358
tests : add a new benchmark test for long-form audio (#3185)
* tests : add a new benchmark test for long-form audio

Based on "Earnings-21" corpus by Del Rio et al.

    Earnings-21: A Practical Benchmark for ASR in the Wild (2021)
    https://arxiv.org/abs/2104.11348

This dataset contains 39 hours of long-form speech sourced from public
earnings calls. Each recording contains roughly 50 minutes of English
dialogue between multiple speakers (2-20 people).

This benchmark suite should allow us to evaluate the performance of
whisper.cpp on long-form audio data.

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
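
A corpus like this is normally scored with word error rate (WER). As a
reference for what the benchmark measures, here is a minimal WER sketch
(illustrative only; the scoring script added in this PR may compute it
differently):

    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate: word-level Levenshtein distance / reference length."""
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between ref[:i] and hyp[:j]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return d[len(ref)][len(hyp)] / len(ref)

    print(wer("the quick brown fox", "the quick brown box"))  # 0.25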

* tests : apply PR feedback to 'earnings21/README.md'

Based on feedback from Daniel Bevenius.

 - Simplify how to download & prepare a Silero VAD model.
 - Fix typo: inferece -> inference

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>

* tests : avoid crashing on non-UTF-8 characters

Based on feedback from Daniel Bevenius.

Add an 'errors' parameter to open() in order to avoid an unhandled
exception on invalid UTF-8 bytes.

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>

* tests : try to interpret the hypothesis as Windows-1252

Based on the discussion in PR#3185.

Evidently Whisper.cpp can represent a quotation mark as '0x93', which
implies Windows-1252 (Microsoft's extension of ASCII) and cannot be
decoded as UTF-8.

Add an explicit decoding loop to address the issue.

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
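
A minimal sketch of the decoding strategy described in the last two
sub-commits (the actual test code may differ): try UTF-8 first and fall
back to Windows-1252, where 0x93/0x94 are curly quotation marks. The
`errors` parameter mentioned earlier (e.g. `errors="replace"`) would
instead substitute invalid bytes with U+FFFD rather than reinterpret
them.

    def decode_hypothesis(raw: bytes) -> str:
        # whisper.cpp output is usually valid UTF-8 ...
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # ... but bytes such as 0x93 (a curly quote in Windows-1252)
            # are not valid UTF-8, so reinterpret as Windows-1252.
            return raw.decode("windows-1252")

    print(decode_hypothesis(b"\x93quoted\x94"))  # “quoted”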

---------

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
2025-05-28 07:08:44 +02:00
Daniel Bevenius
0ed00d9d30
ci : update windows-blas uploads action (#3192)
This commit modifies the windows-blas job, which was previously updated
to use the zip functionality provided by `actions/upload-artifact`. That
turned out to be incorrect and I should not have done it. The reason for
zipping the archives first is that otherwise the artifacts, when
downloaded, will be unzipped and end up as plain directories. In our
case the release task depends on the artifacts having a .zip extension
so that those archives are included in the release.
2025-05-27 18:01:31 +02:00
Georgi Gerganov
527fe6aaeb sync : fix builds - musa, ruby 2025-05-27 18:03:00 +03:00
Georgi Gerganov
26eb48cb08 talk-llama : sync llama.cpp
ggml-ci
2025-05-27 18:03:00 +03:00
Georgi Gerganov
546928c33f sync : ggml
ggml-ci
2025-05-27 18:03:00 +03:00
xctan
15ae9dc2a4 ggml : riscv: add xtheadvector support (llama/13720)
* ggml : riscv: add xtheadvector support

* ggml : clean up some macro usage
2025-05-27 18:03:00 +03:00
Christian Kastner
2e7a1e3e43 ggml-cpu: x86 feature detection is specific to x86 (llama/13811) 2025-05-27 18:03:00 +03:00
Diego Devesa
b75babebb2 ggml : allow CUDA graphs when using pipeline parallelism (llama/13814) 2025-05-27 18:03:00 +03:00
Georgi Gerganov
cc7a0105ef cuda : avoid cuGetErrorString (llama/13791)
ggml-ci
2025-05-27 18:03:00 +03:00
Akarshan Biswas
195fde8804 SYCL: Add non contiguous support in RMS_NORM and NORM kernels (llama/13611)
* SYCL: Add non contiguous input support to norm kernel

* refactor and add RMS_NORM non contiguous input support

ggml-ci

* restore subgroup reduction for multi-subgroup thread blocks in norm kernels

* Swap grid dims of nsamples and nrows

ggml-ci

* Revert "Swap grid dims of nsamples and nrows"

This reverts commit 43be2d657fec7f7fba54e2cd154106bc0fc45adf.

* restore not required changes
ggml-ci

* address review comments: change it to be more like SYCL

* Use a common function to calculate offset

* remove wrap around logic for handling broadcasts

* remove static from calculate_offset fn and use ceil_div
2025-05-27 18:03:00 +03:00
Romain Biessy
25e27904ca sycl: Add more debug prints (llama/13640) 2025-05-27 18:03:00 +03:00
Jeff Bolz
474f7be8b6 vulkan: mark IM2COL as supporting non-contig (llama/13783) 2025-05-27 18:03:00 +03:00
Bizhao Shi
e35fecc2a1 CANN: Add the basic supports of Flash Attention kernel (llama/13627)
* cann: add the basic FA support

* cann: update the readme

* cann: update the FlashAttention with PSEShift

* cann: update the input parameters in FA

* cann: update the alibi with max_bias

* cann: add the constraints of softcap

* cann: update the docs CANN.md

* cann: update the docs CANN.md

* cann: fix typo of CANN.md

* cann: add some comments and update the CANN.md

* cann: update the CANN.md

* cann: update the inner precise for fusedInferAttention

* cann: update the constraints of flash_attn_ext on ggml-cann.cpp

* cann: clean the whitespace

* cann: clean the whitespace

* cann: add a new endline
2025-05-27 18:03:00 +03:00
Akarshan Biswas
1cd7028428 SYCL: revert "sycl: simplify bin_bcast_kernel (ggml/13383)" (llama/13752)
Temporarily reverted due to failing fp16 DIV operation

This reverts commit 02cdd2d8b092b5a4bb18e013c6887ce49ba20ac5.

ggml-ci
2025-05-27 18:03:00 +03:00
Diego Devesa
99596d6031 ggml-cpu : set openmp wait time if not set (llama/13758) 2025-05-27 18:03:00 +03:00
Xuan-Son Nguyen
2d6c6862f7 ggml : add ggml_gelu_erf() CUDA kernel (llama/13719)
* ggml : add ggml_gelu_erf() CUDA kernel

* missing semicolon
2025-05-27 18:03:00 +03:00
Johannes Gäßler
f1576b2659 CUDA: fix race condition in FA vector kernels (llama/13742) 2025-05-27 18:03:00 +03:00
Chenguang Li
994b4f86ab CANN: Support MUL_MAT_ID for q8_0 and q4_0 (llama/13705)
* [CANN]Support MUL_MAT_ID Q8 && Q4

Signed-off-by: noemotiovon <757486878@qq.com>

* codestyle adjustment

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
2025-05-27 18:03:00 +03:00
Xuan-Son Nguyen
3e7eaccf55 ggml : fix the order of ggml_unary_op (llama/13718) 2025-05-27 18:03:00 +03:00
Jeff Bolz
191f040414 vulkan: support CPY from any type to itself (llama/13695)
Reuse the f16/f32 copy shaders, and just scale the number of elements
according to the type size.
2025-05-27 18:03:00 +03:00
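
To illustrate the scaling described in this commit (assumed names; not
the actual shader-dispatch code): copying N elements of an arbitrary
type T to the same type T can reuse the f16 copy shader by dispatching
over the raw bytes in f16-sized units.

    F16_SIZE = 2  # bytes

    def f16_copy_elements(n_elements: int, type_size: int) -> int:
        total_bytes = n_elements * type_size
        assert total_bytes % F16_SIZE == 0
        return total_bytes // F16_SIZE

    # Copying 1024 elements of a 4-byte type dispatches the f16 copy
    # shader over 2048 "elements".
    print(f16_copy_elements(1024, 4))  # 2048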
Jeff Bolz
2d49d4a9b5 vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (llama/13696) 2025-05-27 18:03:00 +03:00
Judd
000d65befb use LOG_WARN to replace std::cerr (llama/13657) 2025-05-27 18:03:00 +03:00
Nicolò Scipione
f0803e6646 sycl : Remove waits from function calls (llama/13702)
* removes the waits in async memcpy functions
2025-05-27 18:03:00 +03:00
Ewan Crawford
730a00be8a SYCL: Avoid using with SYCL-Graph for unsupported nodes (llama/13587)
Currently, when running
`GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` with the
SYCL backend on CUDA hardware, there are two operations that throw an
exception from the blocking waits during queue recording.

* `-o CONCAT` : Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187
* `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074

We've noticed that `ggml-cuda.cu` has the
[check_node_graph_compatibility_and_refresh_copy_ops](39e73ae0d6/ggml/src/ggml-cuda/ggml-cuda.cu (L2458-L2458))
method for checking whether a graph can be used even when graphs are
enabled. I've taken a similar approach in this PR by adding a method to
`ggml-sycl.cpp` that checks whether a graph can be used for the
operations even if a user has asked for it to be enabled.
2025-05-27 18:03:00 +03:00
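
A schematic sketch of the compatibility check described above (names
are assumptions, not the actual ggml-sycl.cpp code): scan the graph
once and fall back to eager submission if any node uses an op that
would block during queue recording.

    UNSUPPORTED_GRAPH_OPS = {"CONCAT", "MUL_MAT_ID"}

    def graph_compatible(node_ops: list[str]) -> bool:
        return all(op not in UNSUPPORTED_GRAPH_OPS for op in node_ops)

    def should_use_graph(user_enabled: bool, node_ops: list[str]) -> bool:
        # Even if the user enabled SYCL-Graph, skip it when the graph
        # contains an unsupported node.
        return user_enabled and graph_compatible(node_ops)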
Henry Linjamäki
316600e8ee opencl: Add support for multiple devices (llama/12622)
* opencl: Add support for multiple devices

... but limited to one platform. A platform with a GPU will be preferred.

Additionally:

* Filter out devices that lack capabilities needed by the backend
  implementation (half support, OpenCL 2.0+, etc).

* Make ggml_backend_opencl_reg() thread-safe.

* fixup: fix an error in sync_with_other_backends

... when there is only one OpenCL device available.
2025-05-27 18:03:00 +03:00
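
A minimal sketch of the selection policy described above (assumed
names/types, not the actual backend code): stay on one platform, prefer
a platform that has a GPU, and filter out devices lacking the required
capabilities.

    from dataclasses import dataclass

    @dataclass
    class Device:
        name: str
        is_gpu: bool
        supports_fp16: bool
        opencl_version: tuple  # e.g. (3, 0)

    def usable(d: Device) -> bool:
        # Capabilities required by the backend (half support, OpenCL 2.0+).
        return d.supports_fp16 and d.opencl_version >= (2, 0)

    def pick_devices(platforms: list[list[Device]]) -> list[Device]:
        # Prefer the first platform with at least one usable GPU ...
        for devices in platforms:
            candidates = [d for d in devices if usable(d)]
            if any(d.is_gpu for d in candidates):
                return candidates
        # ... otherwise fall back to any platform with usable devices.
        for devices in platforms:
            candidates = [d for d in devices if usable(d)]
            if candidates:
                return candidates
        return []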
Henry Linjamäki
42f2b3bb65 opencl: fix couple crashes (llama/12795)
* opencl: fix couple crashes

* fix kernel launches failing on devices which do not support
  non-uniform work-groups. When non-uniform work-groups are not
  supported, set `local_work_size` to NULL (= let the driver choose the
  work-group sizes). This patch does not cover everything - just the
  cases tested by test-backend-ops.

* fix sub-buffer creation failing due to `cl_buffer_region::origin` not
  being aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN`.

* OpenCL: query non-uniform WG sizes only on OpenCL 3.0+
2025-05-27 18:03:00 +03:00
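
Two illustrative sketches of the fixes above, in plain Python
arithmetic (assumed names, not actual OpenCL API calls):

    # 1) Without non-uniform work-group support, pass NULL (None here)
    #    as local_work_size so the driver chooses the work-group sizes.
    def choose_local_size(non_uniform_supported: bool, preferred: int):
        return preferred if non_uniform_supported else None

    # 2) A sub-buffer origin must be a multiple of
    #    CL_DEVICE_MEM_BASE_ADDR_ALIGN (reported in bits).
    def aligned_origin(requested: int, base_addr_align_bits: int) -> int:
        align_bytes = base_addr_align_bits // 8
        return (requested // align_bytes) * align_bytes

    # With 1024-bit (128-byte) alignment, a requested origin of 300
    # bytes is rounded down to 256; the remaining 44 bytes become an
    # offset applied inside the sub-buffer.
    print(aligned_origin(300, 1024))  # 256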
Xuan-Son Nguyen
dd6ef64060 ggml : add ggml_gelu_erf() (llama/13667)
* ggml : add ggml_gelu_na (not approximated)

* fix naming order

* rename na --> erf

* apply review suggestions

* revert naming order
2025-05-27 18:03:00 +03:00
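
For reference, ggml_gelu_erf() is the exact, erf-based GELU, in
contrast to the tanh approximation; the standard definitions (not
quoted from the commit) are:

    \operatorname{GELU}(x) = x\,\Phi(x)
        = \tfrac{x}{2}\left(1 + \operatorname{erf}\!\left(x/\sqrt{2}\right)\right)

    \operatorname{GELU}_{\tanh}(x)
        \approx \tfrac{x}{2}\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\left(x + 0.044715\,x^{3}\right)\right)\right)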
R0CKSTAR
131ee546ca musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (llama/13647)
* musa: fix build warning (unused parameter)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: upgrade MUSA SDK version to rc4.0.1

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Update ggml/src/ggml-cuda/cpy.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-05-27 18:03:00 +03:00
Eve
4712f7b663 vulkan: fix warnings (llama/13626)
* small fixes

* remove ifdef
2025-05-27 18:03:00 +03:00
Johannes Gäßler
926fe234e9 CUDA: skip fully masked-out KV in FA vec kernel (llama/13584)
* CUDA: skip fully masked-out KV in FA vec kernel
2025-05-27 18:03:00 +03:00
Svetlozar Georgiev
f44b53480f sycl: disable reorder for sycl mulmat (llama/13536) 2025-05-27 18:03:00 +03:00
Georgi Gerganov
e04e8f1c79 metal : fix typo in FA kernel comments (llama/13651) 2025-05-27 18:03:00 +03:00
Nicolò Scipione
ee3f177cba sycl : Overcoming workaround for mmap() allocation on Windows (llama/13482)
* Remove mmap workaround on windows

After some testing I found that mmap is supported on Windows and for
many GPUs on Linux. Therefore I removed the workaround for Windows
since it is not necessary.

* Update llama-bench README

The SYCL backend introduced a workaround that allows llama-bench to run
without specifying the `--mmap 0` flag
2025-05-27 18:03:00 +03:00
0cc4m
0b69f74e15 Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (llama/13607) 2025-05-27 18:03:00 +03:00
Georgi Gerganov
e415db0ed7 sync : ggml 2025-05-27 18:03:00 +03:00
Daniel Bevenius
2bb7694edb
docs : convert README_sycl.md to utf8 format [no ci] (#3191)
This commit updates the README_sycl.md file to use UTF-8 encoding.

The motivation for this is that while this file displays correctly on
GitHub, it will fail to render with tools that expect UTF-8 encoding.
For example, this is the case when using `grip` to view the file
locally.
2025-05-27 10:53:50 +02:00
Daniel Bevenius
450de0787e
node : enable no_prints to suppress all output (#3189)
This commit enables the node addon to suppress all output, including
the result of the transcription, if the no_prints parameter is set to
true.

The motivation for this is that the node addon has a fulfillment
handler/success callback to process the transcription result, and it
can be useful to disable printing of the transcription result to the
console so that the user can handle the result in their own way.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3176
2025-05-27 05:51:47 +02:00
matteng1
ea9f206f18
talk-llama : fix for swedish umlauts + expose model inference settings in talk-llama.cpp (#3187)
Quick fix so that Swedish umlauts are no longer removed.

* Update talk-llama.cpp

Expose model inference settings to the user instead of hard-coding them. The defaults are unchanged.

* Update examples/talk-llama/talk-llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-26 07:57:39 +02:00
KITAITI Makoto
13d92d08ae
docs : fix VAD section heading levels (#3186) 2025-05-23 10:38:26 +02:00
Daniel Bevenius
aab6976465
ci : use dynamic libopenblas.dll for window-blas (#3177)
* ci : use dynamic libopenblas.dll for window-blas

This commit updates the windows-blas job to use the dynamic
libopenblas.dll (which can load different kernels depending on the CPU
arch) instead of the "static" openblas.dll that gets installed by
vcpkg.

The motivation for this change is that there have been reports of
performance drops in later versions, specifically related to BLAS.
Please see the links below for more details.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3166
Refs: https://github.com/ggml-org/whisper.cpp/issues/2666#issuecomment-2885978811
2025-05-23 05:48:08 +02:00
Sacha Arbonel
78b31ca782
server : Add k6 Load Testing Script (#3175)
* add load testing script and update README for k6 integration
2025-05-22 10:03:04 +02:00
Daniel Bevenius
cbe557f9b1
docs : add VAD model download instructions [no ci] (#3180) 2025-05-22 07:49:29 +02:00
Alpaim
273af4aab9
docs : replace typo "]"with ")" in README (#3179) 2025-05-22 05:49:44 +02:00
Daniel Bevenius
bd1cb0c8e3
whisper : remove redundant assignments (#3178)
This commit removes some redundant assignments in the function
`whisper_exp_compute_token_level_timestamps`.

The motivation for this is that `tokens[j]` and `token` are references
to the same object, which can be a little confusing when reading the
code.
2025-05-21 13:23:20 +02:00
Jugal Haresh Sheth
62dc8f7d7b
whisper : update CMakeLists.txt to handle deprecated gpu Warnings (#3163)
* Fix CMakeLists.txt to handle deprecated gpu Warnings

* Conditionally apply -Wno-deprecated-gpu-targets only when GGML_CUDA is enabled

* Conditionally apply -Wno-deprecated-gpu-targets only when GGML_CUDA is enabled and not MSVC

---------

Co-authored-by: Jugal Sheth <jugal.sheth@marineai.co.uk>
2025-05-20 11:58:25 +02:00
Daniel Bevenius
2c4b904596
ruby : add GGML_SYCL_DNN option to ruby bindings (#3172)
This commit adds the `GGML_SYCL_DNN` option to the Ruby bindings for
the GGML library. This option was added to ggml in commit
5e7e07758a5f3172380500e173ca71f679bbef1e ("sycl: use oneDNN for
matrices multiplication").

The motivation for this change is to enable the CI build to pass.
2025-05-19 17:59:43 +02:00
Georgi Gerganov
6b6cf19c65 talk-llama : sync llama.cpp
ggml-ci
2025-05-19 14:58:39 +03:00
Georgi Gerganov
05501c218d sync : ggml
ggml-ci
2025-05-19 14:58:39 +03:00
Chenguang Li
9da3fc27be CANN: Support MOE Model MUL_MAT_ID (llama/13042)
Signed-off-by: noemotiovon <757486878@qq.com>
2025-05-19 14:58:39 +03:00
Gilad S.
2c13651e08 cmake: use the current build config for vulkan-shaders-gen (llama/13595)
* fix: use the current build config for `vulkan-shaders-gen`

* fix: only pass a valid build type to `--config`
2025-05-19 14:58:39 +03:00