2748 Commits

Author SHA1 Message Date
w1redch4d
2a4d6db7d9
examples : update usage/help in yt-wsp.sh (#3251)
This commit updates the usage/help message to be more readable and include the environment variables available to set options.
2025-06-16 12:21:16 +02:00
Sacha Arbonel
107c303e69
server : graceful shutdown, atomic server state, and health endpoint Improvements (#3243)
* feat(server): implement graceful shutdown and server state management

* refactor(server): use lambda capture by reference in server.cpp
2025-06-16 10:14:26 +02:00
Daniel Bevenius
705db0f728
whisper : fix VAD processing for skipped audio segments (#3230)
This commit addresses an issue with token timestamps when audio segments
are skipped, in `whisper_exp_compute_token_level_timestamps` related to
the VAD processing and the energy levels.

The motivation for this is that the token timestamps exceed the energy
array bounds due to segment timing misalignment:
```console
                  (skipped introduction)
                    ↓
Audio segment:     [2600ms → 5600ms]  (3 seconds of actual audio)
Energy array:      [0 → 480652]       (samples for 3 seconds)
Token timestamps:  [3266ms → 3408ms]  (absolute timestamps)
```
So both `s0` and `t1` get clamped to the maximum sample index (480652)
which causes the start/end timestamps to be the same for all the tokens
after a certain point.

This is addressed by using segment-relative timestamps in the
`timestamp_to_sample` and `sample_to_timestamp`.
2025-06-13 17:35:52 +02:00
Daniel Bevenius
0a4d85cf8a
server : add Voice Activity Detection (VAD) support (#3246)
* server : add Voice Activity Detection (VAD) support

This commit adds support for Voice Activity Detection (VAD) in the
server example.

The motivation for this is to enable VAD processing when using
whisper-server.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3089

* server : add VAD parameters to usage in README.md [no ci]

This commit also adds a few missing parameters.

* server : fix conflicting short options [no ci]
2025-06-13 13:24:03 +02:00
Daniel Bevenius
9df8d54bcb
cli : fix short name conflict for vad options [no ci] (#3247)
This commit fixes a short name conflict whisper-cli for
`--vad-min-speech-duration-ms` and `--vad-min-silence-duration-ms` which
currently have the same short name `-vsd`.

Refs: https://github.com/ggml-org/whisper.cpp/pull/3246#pullrequestreview-2923800114
2025-06-13 10:25:25 +02:00
Daniel Bevenius
20d203aacf
ruby : add .gitignore entries for ext directory (#3245)
This commit adds entries to `.gitignore` for directories in the
`ext` directory.

The motivation for this is that currently after building locally these
following files are reported by git as untracked:
```console
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	ext/examples/
	ext/ggml/
	ext/include/
	ext/scripts/
	ext/src/
```
2025-06-13 10:04:20 +02:00
Daniel Bevenius
ebbc874e85
ci : update windows runner to windows-2022 (#3242)
* ci : update windows runner to windows-2022

This commit changes the windows-2019 runner to windows-2022.

The motiation for this is that the windows-2019 runner is scheduled for
deprection and will be removed 2025-06-30. There are currently "burnout"
periods that started 2025-06-01 and during these times jobs with
windows-2019 will fail which has happened lately on our CI.

Refs: https://github.com/actions/runner-images/issues/12045
2025-06-11 13:53:16 +02:00
Daniel Bevenius
2679bec6e0
ruby : add cleaning of library names in dependencies (#3241)
* ruby : add cleaning of library names in dependencies

This commit adds a cleaning step to the library names in the
`Dependencies` class of the Ruby bindings.

The motivation for this is that with the introduction of a library name
alias for ggml in Commit (b933d17c306e800b6d919e3ee895219c3f64d5cd
"Add in-build ggml::ggml ALIAS library (ggml/1260)) causes the Makefile
generation to break:
```console
$ sed -n '165,170p' ext/Makefile
CLEANOBJS     = $(OBJS) *.bak
TARGET_SO_DIR_TIMESTAMP = $(TIMESTAMP_DIR)/.sitearchdir.time
$(TARGET_SO): libcommon.a libwhisper.a libggml\n(ggml::ggml).a libggml-cpu.a libggml-base.a
libcommon.a libwhisper.a libggml\n(ggml::ggml).a libggml-cpu.a libggml-base.a: cmake-targets
cmake-targets:
	/usr/bin/cmake -S sources -B build -D BUILD_SHARED_LIBS=OFF -D CMAKE_ARCHIVE_OUTPUT_DIRECTORY=/home/danbev/work/ai/whisper.cpp/bindings/ruby/ext -D CMAKE_POSITION_INDEPENDENT_CODE=ON
```

* squash! ruby : add cleaning of library names in dependencies

Apply PR review feedback.
2025-06-10 15:06:40 +02:00
Georgi Gerganov
93d543905e ggml : fix weak alias win32 (#0)
ggml-ci
2025-06-10 12:40:33 +03:00
Georgi Gerganov
962361bd79 android : fix builds (#0)
ggml-ci
2025-06-10 12:40:33 +03:00
Georgi Gerganov
dbe81c1042 sync : ggml
ggml-ci
2025-06-10 12:40:33 +03:00
Georgi Gerganov
175e7e4f1a files : remove old sources (part 2) 2025-06-10 12:40:33 +03:00
Georgi Gerganov
56475d01dc sync : ggml
ggml-ci
2025-06-10 12:40:33 +03:00
Georgi Gerganov
38347a7dda files : remove old sources 2025-06-10 12:40:33 +03:00
Georgi Gerganov
db264d6220 talk-llama : sync llama.cpp
ggml-ci
2025-06-10 12:40:33 +03:00
Georgi Gerganov
96eaf46ec6 sync : ggml
ggml-ci
2025-06-10 12:40:33 +03:00
Georgi Gerganov
7a675807a2 metal : use less stack memory in FA kernel (llama/14088)
* metal : use less stack memory in FA kernel

ggml-ci

* cont : fix BF16 variant
2025-06-10 12:40:33 +03:00
xctan
8cbc889f85 ggml-cpu : split arch-specific implementations (llama/13892)
* move ggml-cpu-aarch64 to repack

* split quantize_row_q8_0/1

* split helper functions

* split ggml_vec_dot_q4_0_q8_0

* split ggml_vec_dot_q4_1_q8_1

* split ggml_vec_dot_q5_0_q8_0

* split ggml_vec_dot_q5_1_q8_1

* split ggml_vec_dot_q8_0_q8_0

* split ggml_vec_dot_tq1_0_q8_K

* split ggml_vec_dot_tq2_0_q8_K

* split ggml_vec_dot_q2_K_q8_K

* split ggml_vec_dot_q3_K_q8_K

* split ggml_vec_dot_q4_K_q8_K

* split ggml_vec_dot_q5_K_q8_K

* split ggml_vec_dot_q6_K_q8_K

* split ggml_vec_dot_iq2_xxs_q8_K

* split ggml_vec_dot_iq2_xs_q8_K

* split ggml_vec_dot_iq2_s_q8_K

* split ggml_vec_dot_iq3_xxs_q8_K

* split ggml_vec_dot_iq3_s_q8_K

* split ggml_vec_dot_iq1_s_q8_K

* split ggml_vec_dot_iq1_m_q8_K

* split ggml_vec_dot_iq4_nl_q8_0

* split ggml_vec_dot_iq4_xs_q8_K

* fix typos

* fix missing prototypes

* rename ggml-cpu-quants.c

* rename ggml-cpu-traits

* rename arm folder

* move cpu-feats-x86.cpp

* rename ggml-cpu-hbm

* update arm detection macro in quants.c

* move iq quant tables

* split ggml_quantize_mat_q8_0/K

* split ggml_gemv_*

* split ggml_gemm_*

* rename namespace aarch64 to repack

* use weak aliases to replace test macros

* rename GGML_CPU_AARCH64 to GGML_CPU_REPACK

* rename more aarch64 to repack

* clean up rebase leftover

* fix compilation errors

* remove trailing spaces

* try to fix clang compilation errors

* try to fix clang compilation errors again

* try to fix clang compilation errors, 3rd attempt

* try to fix clang compilation errors, 4th attempt

* try to fix clang compilation errors, 5th attempt

* try to fix clang compilation errors, 6th attempt

* try to fix clang compilation errors, 7th attempt

* try to fix clang compilation errors, 8th attempt

* try to fix clang compilation errors, 9th attempt

* more cleanup

* fix compilation errors

* fix apple targets

* fix a typo in arm version of ggml_vec_dot_q4_K_q8_K

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-10 12:40:33 +03:00
Diego Devesa
e16a84cd95 cuda : fix device sync on buffer clear (llama/14033) 2025-06-10 12:40:33 +03:00
Xinpeng Dou
26282282fa CANN: Simplify the environment variable setting(#13104)
* Simplify the environment variable setting to specify the memory pool type.

* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.

* update

* fix CI

* update

* delete whitespace

* fix according to review

* update CANN.md

* update CANN.md
2025-06-10 12:40:33 +03:00
Nicolò Scipione
4737a8c780 sycl: Add reorder to Q6_K mmvq implementation (llama/13885)
* Add Reorder to Q6_K mmvq implementation

* Address PR comments: clean up comments

* Remove unused parameter after refactoring q4_k

* Adding inline to function and removing unnecessary reference to int

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-06-10 12:40:33 +03:00
Diego Devesa
8a70f4d18b cuda : fix buffer type check with integrated GPUs (llama/14069) 2025-06-10 12:40:33 +03:00
Akarshan Biswas
489dc158a6 SYCL: Implement few same quantized type copy kernels (llama/13739)
* SYCL: Implement few same quantized type copy kernels

* Use memcpy for copying contiguous tensors

ggml-ci

* feat(sycl): add contiguous tensor copy support and device checks

Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.

* refactor: replace specific block copy functions with template

The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.

* Exclude BF16 support for COPY tensors for now
ggml-ci

* perf: adjust SYCL copy kernel block sizes for efficiency

Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
2025-06-10 12:40:33 +03:00
Masato Nakasaka
f0f5a9f7fb vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (llama/14001)
* allowing B580 and U9-288V

* experimenting code to detect Xe2

* allowing coopmat only for Xe2 GPUs

* fixed comment wording

* fixed comment wording

* removed unnecessary driver check
2025-06-10 12:40:33 +03:00
Diego Devesa
13a03c5d33 llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (llama/14013) 2025-06-10 12:40:33 +03:00
Jeff Bolz
6dd91d4f7e vulkan: automatically deduce size of push constants (llama/13936) 2025-06-10 12:40:33 +03:00
Ervin Áron Tasnádi
5171b24f70 ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (llama/13813)
* * ggml-vulkan: adds op CONV_TRANSPOSE_1D

* test-backend-ops: adds more spohisticated tests for CONV_TRANSPOSE_1D

* Missing barrier added to shader.
Number of additional tests reduced to 108.

* * Fixes typo in variable name.

* Removes extra whitespaces.

* Adds int64->int32 casts to prevent possible warnings.

* Problem size reduced in tests to pass tests with llvmpipe.

* supports_op condition moved from unintended position
2025-06-10 12:40:33 +03:00
Diego Devesa
23e2fe0682 releases : use dl backend for linux release, remove arm64 linux release (llama/13996) 2025-06-10 12:40:33 +03:00
Johannes Gäßler
7f4d110f53 CUDA: fix FTZ in FA for Gemma 3 (llama/13991) 2025-06-10 12:40:33 +03:00
Jeff Bolz
ee0ef39fee vulkan: fix warnings in perf logger querypool code (llama/13937) 2025-06-10 12:40:33 +03:00
lhez
62791ba2e6 opencl: add backend_synchronize (llama/13939)
* This is not needed by the normal use where the result is read
  using `tensor_get`, but it allows perf mode of `test-backend-ops`
  to properly measure performance.
2025-06-10 12:40:33 +03:00
rmatif
e16ef08884 OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (llama/13840)
* add concat, pad, repeat, tsembd, tanh, upscale

* small fixes
2025-06-10 12:40:33 +03:00
Georgi Gerganov
c72d3ce935 metal : use F32 accumulators in FA kernels (llama/13975)
ggml-ci
2025-06-10 12:40:33 +03:00
shalinib-ibm
126aeb4a49 cmake : Handle mixed-case 'Power' strings in POWER CPU detection (llama/13966)
Some systems report the CPU implementation as "Power11" instead of "POWER11".
The existing CMake logic uses a case-sensitive regular expression to extract
the CPU generation, which fails when the casing doesn't exactly match "POWER".

This patch provides a fix by first converting the string to uppercase before applying the regex.

Signed-off-by: root <root@rheldb2v.pperf.tadn.ibm.com>
Co-authored-by: root <root@rheldb2v.pperf.tadn.ibm.com>
2025-06-10 12:40:33 +03:00
Atharva Dubey
ef2a79d2b8 sycl: quantize and reorder the input to q8_1 when reorder is enabled (llama/13826)
* [WIP]: fuse q8 quantization and reorder

* wip2: fuse q8 quantization and reorder

* working q8 reorder commit

* restored common.hpp

* remove debug prints

* remove unnecessary headers and remove trailing whitespace

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>

---------

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>
2025-06-10 12:40:33 +03:00
Johannes Gäßler
9589645e72 gguf: fix failure on version == 0 (llama/13956) 2025-06-10 12:40:33 +03:00
Aaron Teo
20f913d119 ggml: check if non-native endian model is being loaded (llama/13943)
* gguf: prevent non-native endian models from being loaded

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* gguf: update error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* gguf: make the non-native endian check more verbose

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: move ggml_assert location

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml: reword the endianness check error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-10 12:40:33 +03:00
Kai Pastor
b933d17c30 Add in-build ggml::ggml ALIAS library (ggml/1260)
Enable uniform linking with subproject and with find_package.
2025-06-10 12:40:33 +03:00
KITAITI Makoto
fbead67549
ruby : output format (#3237)
* Fix a typo

* Don't allocate output string unless needed

* Add methods to output SRT and WebVTT

* Add tests for output methods

* Make constants for output private

* Add signatures for output methods

* Add document on output methods

* Fix method name: Segment#speaker_next_turn? -> #speacker_turn_next?

* Add Whisper::Segment#descotruct_keys

* Add test for Whisper::Context#descotruct_keys

* Add signature of Whisper::Segment#deconstruct_keys

* Use parentheses to suppress warning

* Update date
2025-06-10 06:10:17 +02:00
藍+85CD
d78f081423
ci : build and publish main-intel image (#3231) 2025-06-09 06:42:53 +02:00
藍+85CD
b175baa665
docker : add main-intel dockerfile (#3229) 2025-06-06 05:30:02 +02:00
KITAITI Makoto
799eacdde4
ruby : Add parallel transcription support (#3222)
* Fix indentation of code sample in document comment

* Make Whisper::Context#transcribe able to run non-parallel

* Add test for Whisper::Context#transcribe with parallel option

* Follow signature API change of Context#transcribe

* Remove useless variable assignment

* Move simple usage up in README

* Add need help section in README

* Add document on Context#transcribe's parallel option in README

* Update date

* Fix signature of Context.new

* Make Context#subscribe accept n_processors option

* Make test follow #transcribe's change

* Make RBS follow #transcribe's change

* Add document for #transcribe's n_processors option

* Rename test directory so that Rake tasks' default setting is used
2025-06-04 14:50:18 +09:00
Daniel Bevenius
82f461eaa4
ci : add mirror for ports.ubuntu.com (ARM packages) (#3221)
This commit updates the build workflow to replace `ports.ubuntu.com`
with `mirror.kumi.systems` in the apt sources list for ARM64 builds.

The motivation for this change is intended to improve package download
reliability and speed by using a more stable mirror for ARM64 packages.
2025-06-03 07:56:58 +02:00
Joas Dev
269dad68a2
bindings.java : apply whisperParams in fullTranscribeWithTime instead of ignoring them (#3201)
This pull request fixes a bug in the fullTranscribeWithTime method, where the whisperParams argument was declared but never used. As a result, the model did not apply the configuration defined in whisperParams.
2025-06-03 06:15:21 +02:00
R0CKSTAR
121d27a495
musa: correct MUSA SDK rc4.0.1 download URL (#3217)
* musa: correct MUSA SDK rc4.0.1 download URL

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Fix typo

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-03 06:02:12 +02:00
Daniel Bevenius
e05af2457b
ci : use mirrors.kernel.org for Ubuntu packages (#3220)
This commit updates the ubuntu jobs to use mirrors sites instead of archive.ubuntu.com.

The motivation of this is an attempt to make the CI build more stable and avoid errors like:
https://github.com/ggml-org/whisper.cpp/actions/runs/15384056535/job/43291948394?pr=3217
2025-06-02 16:46:40 +02:00
Daniel Bevenius
b505539670
node : add language detection support (#3190)
This commit add support for language detection in the Whisper Node.js
addon example. It also updates the node addon to return an object
instead of an array as the results.

The motivation for this change is to enable the inclusion of the
detected language in the result, in addition to the transcription
segments.

For example, when using the `detect_language` option, the result will
now be:
```console
{ language: 'en' }
```

And if the `language` option is set to "auto", it will also return:
```console
{
  language: 'en',
  transcription: [
    [
      '00:00:00.000',
      '00:00:07.600',
      ' And so my fellow Americans, ask not what your country can do for you,'
    ],
    [
      '00:00:07.600',
      '00:00:10.600',
      ' ask what you can do for your country.'
    ]
  ]
}
```
2025-06-02 14:58:05 +02:00
Georgi Gerganov
7fd6fa8097 talk-llama : sync llama.cpp
ggml-ci
2025-06-01 15:14:44 +03:00
Georgi Gerganov
3f46282cbe sync : ggml
ggml-ci
2025-06-01 15:14:44 +03:00
Max Krasnyansky
1e16340f4b threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (llama/12995)
* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling

We talked about adding LOW priority for GGML threads in the original threadpool PR.
It might be useful for some cases to avoid contention.

Latest Windows ARM64 releases started parking (offlining) the CPU cores
more aggresively which results in suboptimal performance with n_threads > 4.
To deal with that we now disable Power Throttling for our threads for the NORMAL
and higher priorities.

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* threading: disable SetThreadInfo() calls for older Windows versions

* Update tools/llama-bench/llama-bench.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-06-01 15:14:44 +03:00