2609 Commits

Author SHA1 Message Date
Johannes Gäßler
0dda27bc0b CUDA: fix crash on large batch size for quant. MoE (llama/13537) 2025-05-19 14:58:39 +03:00
Johannes Gäßler
ffa4720f25 CUDA: faster Deepseek FA, add Turing support (llama/13435) 2025-05-19 14:58:39 +03:00
bandoti
9b8eea28b5 cmake: simplify vulkan shader test logic (llama/13263) 2025-05-19 14:58:39 +03:00
Jeff Bolz
162bbe8220 vulkan: KHR_coopmat flash attention (llama/13506)
This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more
difficult for various reasons, so I haven't done it. Performance for this
shader is around 2.5x better than for the scalar shader when doing prompt
processing. Some of the benefit may come from other optimizations like staging
through shared memory or splitting by rows.
2025-05-19 14:58:39 +03:00
Jeff Bolz
a221288dc6 vulkan: workaround FA compile failures on macos (llama/13517) 2025-05-19 14:58:39 +03:00
Georgi Gerganov
08436716ae metal : use FA-vec kernel up to batch size 20 (llama/13496)
* batched-bench : fix pp batch contents

* metal : optimize multi-sequence FA vec kernel

ggml-ci

* metal : use FA-vec kernel up to batch size 20

ggml-ci
2025-05-19 14:58:39 +03:00
Georgi Gerganov
e11fc21e6c metal : optimize multi-sequence FA vec kernel (llama/13493)
* batched-bench : fix pp batch contents

* metal : optimize multi-sequence FA vec kernel

ggml-ci
2025-05-19 14:58:39 +03:00
Dan Johansson
a77a924b20 ggml-cpu: Update KleidiAI to v1.6 and fix include directives (llama/13509)
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
2025-05-19 14:58:39 +03:00
Johannes Gäßler
405b9c77ad mnist: fix segmentation fault (ggml/1227) 2025-05-19 14:58:39 +03:00
Diego Devesa
9c3bfc1499 ggml : fix apple OS check in ggml_print_backtrace (ggml/1229) 2025-05-19 14:58:39 +03:00
Daniel Tang
5b7797f674 ggml : Fix missing backtrace on Linux (ggml/1228)
* Modern Linux defaults /proc/sys/kernel/yama/ptrace_scope to 1
* Fixed lldb attach
* Simplify by having the child do ggml_print_backtrace_symbols
2025-05-19 14:58:39 +03:00
Daniel Bevenius
82ad275800
examples : add vad-speech-segments to win warns [no ci] (#3170)
The commit adds vad-speech-segments to the list of examples for which MSVC
warnings are disabled.
2025-05-19 12:17:18 +02:00
Daniel Bevenius
d1f114da61
vad : return early if no vad segments are detected (#3158)
This commit adds a check to `whisper_full_with_state`: if no VAD segments
are detected, the function returns early.

The motivation for this is that if no VAD segments are detected, the
function has no samples to process, which can happen if an audio sample
does not contain any speech. I did not test this case previously and only
discovered it when updating the stream example.
2025-05-16 08:50:53 +02:00
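A minimal sketch of the early-return behaviour described in the commit above; the types and helper below are hypothetical, not the actual whisper.cpp code:

```cpp
// Illustrative sketch only (hypothetical names, not the actual whisper.cpp code):
// if VAD produced no speech segments, there is nothing left to transcribe.
#include <cstdio>
#include <vector>

struct vad_segment { float t0, t1; }; // start/end of a detected speech span, in seconds

static int full_with_vad(const std::vector<vad_segment> & segments) {
    if (segments.empty()) {
        fprintf(stderr, "no speech detected - skipping transcription\n");
        return 0; // early return, mirroring the behaviour described above
    }
    // ... otherwise run the usual encode/decode over the detected speech ...
    return 0;
}

int main() {
    return full_with_vad({}); // audio with no speech -> returns early
}
```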
Daniel Bevenius
bae5d074c7
vad : store VAD context in whisper_state (#3156)
* vad : store VAD context in whisper_state

This commit stores the VAD context in the whisper_state structure,
allowing for better management and reuse of the VAD context across
multiple calls to the whisper_vad function.

The motivation for this change is that when updating the stream example
I noticed that the VAD context was being re-initialized every time the
whisper_vad function was called. This involved loading the VAD model
which is expensive and unnecessary if the context can be reused.

Storing this in the whisper_state seems to follow a pattern similar to
how whisper_coreml_context and whisper_openvino_context are stored.

* vad : free vad_context in whisper_free_state
2025-05-16 07:53:26 +02:00
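A sketch of the reuse pattern this commit describes, with hypothetical names rather than the real whisper.cpp API: the VAD context is owned by the state object, created lazily on first use, and reused afterwards.

```cpp
// Hypothetical sketch of lazy init and reuse of a VAD context stored in the state.
#include <memory>

struct vad_context { /* loaded VAD model, scratch buffers, ... */ };

struct whisper_state_sketch {
    std::unique_ptr<vad_context> ctx_vad; // freed together with the state
};

static vad_context * get_vad_context(whisper_state_sketch & state) {
    if (!state.ctx_vad) {
        state.ctx_vad = std::make_unique<vad_context>(); // expensive model load happens once
    }
    return state.ctx_vad.get(); // later calls reuse the same context
}

int main() {
    whisper_state_sketch state;
    return get_vad_context(state) == get_vad_context(state) ? 0 : 1; // same context both times
}
```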
Daniel Bevenius
20a20decd9
whisper : add build_*/ to .gitignore [no ci] (#3157)
This commit adds `build_*/` to `.gitignore` to ignore all build
directories that start with `build_`.

The motivation for this is that the Go bindings create a directory
named build_go, which is not ignored by the current .gitignore. I was
not sure if changing this to build-go could affect existing users, so I
opted to update .gitignore instead.
2025-05-15 14:28:10 +02:00
Daniel Bevenius
f389d7e3e5
examples : add --print-confidence option to cli (#3150)
* examples : add --print-confidence option to cli

This commit adds a new command-line option `--print-confidence` to the
whisper-cli. When enabled, this option prints the confidence level of each
token in the transcribed text using ANSI formatting codes.

The confidence levels are represented using different styles:
```console
main: confidence: highlighted (low confidence), underlined (medium), dim (high confidence)
```

Refs: https://github.com/ggml-org/whisper.cpp/issues/3135
2025-05-14 19:21:48 +02:00
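A rough illustration of the confidence styling described above; the thresholds and escape codes here are assumptions, not the exact whisper-cli implementation:

```cpp
// Map a per-token probability to one of the three ANSI styles described in the commit.
#include <cstdio>

static const char * confidence_style(float p) {
    if (p < 0.33f) return "\033[7m"; // low confidence    -> highlighted (reverse video)
    if (p < 0.66f) return "\033[4m"; // medium confidence -> underlined
    return "\033[2m";                // high confidence   -> dim
}

int main() {
    const struct { const char * text; float p; } tokens[] = {
        {"And", 0.95f}, {" so", 0.50f}, {" my", 0.20f},
    };
    for (const auto & t : tokens) {
        printf("%s%s\033[0m", confidence_style(t.p), t.text); // reset after each token
    }
    printf("\n");
    return 0;
}
```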
Daniel Bevenius
96d791ae61
vad : add download-vad-model scripts (#3149)
* vad : add download-vad-model scripts

This commit adds a script to download VAD models.

* vad : add vad model download script for windows [no ci]

Refs: https://github.com/ggml-org/whisper.cpp/issues/3146
2025-05-14 16:47:18 +02:00
Daniel Bevenius
3882a099e1
server : add --flash-attn usage output (#3152)
This commit adds the `--flash-attn` option to the usage output of the
server example.

The motivation for this change is that while it is possible to set this
option, it is not printed in the usage output.
2025-05-14 15:22:05 +02:00
Georgi Gerganov
f890560575 talk-llama : sync llama.cpp
ggml-ci
2025-05-13 13:59:21 +03:00
Georgi Gerganov
a14c89aefa whisper : update to ggml-backend changes (#0)
ggml-ci
2025-05-13 13:59:21 +03:00
Georgi Gerganov
a6a956b36d sync : ggml
ggml-ci
2025-05-13 13:59:21 +03:00
Xuan-Son Nguyen
75e9a840c5 ggml : add mrope kernel for metal (llama/13457) 2025-05-13 13:59:21 +03:00
Georgi Gerganov
41ed62bdbc metal : optimize MoE for large batches (llama/13388) 2025-05-13 13:59:21 +03:00
lhez
029c8837f8 opencl: remove unnecessary assert for add (llama/13257) 2025-05-13 13:59:21 +03:00
Johannes Gäßler
5d8b068249 llama/ggml: add LLM training support (llama/10544)
* llama/ggml: add LLM training support

more compact progress bar

llama_save_model_to_file

llama_opt_param_filter

ggml_graph_dup force_grads

refactor ggml_opt, fix test-opt

* remove logits_all

* refactor CUDA implementation for ACC

* reset graph at beginning of opt period
2025-05-13 13:59:21 +03:00
Dan Johansson
93ef22657e ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (llama/13053)
* ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel

Signed-off-by: Dan Johansson <dan.johansson@arm.com>

* code review fixes

Signed-off-by: Dan Johansson <dan.johansson@arm.com>

* adds a comment that clarifies barrier usage

Signed-off-by: Dan Johansson <dan.johansson@arm.com>

---------

Signed-off-by: Dan Johansson <dan.johansson@arm.com>
Co-authored-by: Charles Xu <charles.xu@arm.com>
2025-05-13 13:59:21 +03:00
Johannes Gäßler
866f685bbc CUDA: fix misaligned synchronization in FA (llama/13469) 2025-05-13 13:59:21 +03:00
Atharva Dubey
250bcc041a enable dpcpp nightly builds with libraries (llama/13406) 2025-05-13 13:59:21 +03:00
Johannes Gäßler
90b17a99bf CUDA: fix crash with partial offloading of MoE (llama/13439) 2025-05-13 13:59:21 +03:00
David Huang
e1b2ace0f8 Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (llama/13386) 2025-05-13 13:59:21 +03:00
Johannes Gäßler
6db0e01db6 CUDA: fix race conditions FlashAttention kernels (llama/13438) 2025-05-13 13:59:21 +03:00
Johannes Gäßler
16f3546f38 CUDA: fix FlashAttention on Turing (llama/13415) 2025-05-13 13:59:21 +03:00
Jeff Bolz
a04b329ad1 vulkan: scalar flash attention implementation (llama/13324)
* vulkan: scalar flash attention implementation

* vulkan: always use fp32 for scalar flash attention

* vulkan: use vector loads in scalar flash attention shader

* vulkan: remove PV matrix, helps with register usage

* vulkan: reduce register usage in scalar FA, but perf may be slightly worse

* vulkan: load each Q value once. optimize O reduction. more tuning

* vulkan: support q4_0/q8_0 KV in scalar FA

* CI: increase timeout to accommodate newly-supported tests

* vulkan: for scalar FA, select between 1 and 8 rows

* vulkan: avoid using Float16 capability in scalar FA
2025-05-13 13:59:21 +03:00
Alberto Cabrera Pérez
45d8b2352e sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (llama/12858)
* sycl : Implemented reorder Q4_0 mmvq

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* sycl : Fixed mmvq being called when reorder is disabled

* sycl : Improved comments in the quants header

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* Use static_assert

* safe_div -> ceil_div

* Clarify qi comment

* change the reorder tensor from init to execute OP

* dbg

* Undo changes to test-backend-ops

* Refactor changes on top of q4_0 reorder fix

* Missing Reverts

* Refactored opt_for_reorder logic to simplify code path

* Explicit inlining and unroll

* Renamed mul_mat_algo enum for consistency

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
Co-authored-by: romain.biessy <romain.biessy@codeplay.com>
2025-05-13 13:59:21 +03:00
Johannes Gäßler
2d436bfbfb CUDA: FA support for Deepseek (Ampere or newer) (llama/13306)
* CUDA: FA support for Deepseek (Ampere or newer)

* do loop unrolling via C++ template
2025-05-13 13:59:21 +03:00
Johannes Gäßler
4b7cbb62ef CUDA: fix crash on large batch size for MoE models (llama/13384) 2025-05-13 13:59:21 +03:00
Radoslav Gerganov
e27c91f6d6 rpc : add rpc_msg_set_tensor_hash_req (llama/13353)
* rpc : add rpc_msg_set_tensor_hash_req

Use a dedicated struct for the request of RPC_CMD_SET_TENSOR_HASH, which
makes the code cleaner.

* fix
2025-05-13 13:59:21 +03:00
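A hypothetical sketch of what such a dedicated request struct could look like; the actual field layout in ggml's RPC backend may differ.

```cpp
#include <cstdint>

struct rpc_tensor_sketch { uint64_t id; }; // placeholder for the serialized tensor descriptor

struct rpc_msg_set_tensor_hash_req_sketch {
    rpc_tensor_sketch tensor; // destination tensor on the server
    uint64_t          offset; // write offset into the tensor's data
    uint64_t          hash;   // hash of payload data already cached on the server
};

int main() {
    rpc_msg_set_tensor_hash_req_sketch req{ {42}, 0, 0xdeadbeefULL };
    return req.tensor.id == 42 ? 0 : 1;
}
```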
Jeff Bolz
e46df4850f vulkan: Allow up to 4096 elements for mul_mat_id row_ids (llama/13326)
This assert fired when running Qwen_Qwen3-30B-A3B-Q2_K.gguf:

GGML_ASSERT(nei0 * nei1 <= 3072);

The tensor is 8 x 512. Increase this array size to accommodate.
2025-05-13 13:59:21 +03:00
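The numbers behind the fix, as an illustrative check only: 8 x 512 = 4096 entries, which exceeds the previous 3072 limit but fits the new 4096-element bound named in the commit title.

```cpp
#include <cassert>

int main() {
    const int nei0 = 8, nei1 = 512; // shape of the expert-id tensor from the report
    assert(nei0 * nei1 > 3072);     // this is why the old assert fired
    assert(nei0 * nei1 <= 4096);    // the relaxed bound accommodates it
    return 0;
}
```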
Alberto Cabrera Pérez
e8a7f1b7bb sycl: addressing non-contiguous src1 mul_mats (nc and batched) (llama/13343)
* sycl: fixed non-contiguous src1 mul_mats (nc and batched)

* Fixed wrong static_cast inside kernel
2025-05-13 13:59:21 +03:00
Daniel Bevenius
fbad8058c4
examples : add VAD speech segments example (#3147)
This commit adds an example that demonstrates how to use a VAD (Voice
Activity Detection) model to segment an audio file into speech segments.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3144
2025-05-13 12:31:00 +02:00
Daniel Bevenius
b2513a6208
vad : remove shortform for --vad option in cli.cpp (#3145)
This commit removes the shortform for the --vad option in cli.cpp.

The motivation for this is that `-v` is often used for verbose or
version in many tools, and this might cause confusion.

Refs: https://github.com/ggml-org/whisper.cpp/pull/3065#issuecomment-2873243334
2025-05-13 06:04:05 +02:00
Tomer Schlesinger
587ea01f55
docs : update README.md for whisper.objc app (#2569) 2025-05-13 06:03:50 +02:00
Daniel Bevenius
e41bc5c61a
vad : add initial Voice Activity Detection (VAD) support (#3065)
* vad : add initial Voice Activity Detection (VAD) support

This commit adds support for Voice Activity Detection (VAD). When enabled,
this feature processes the audio input and detects speech segments.
This information is then used to reduce the number of samples that need
to be processed by whisper_full.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-12 16:10:11 +02:00
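A conceptual sketch of the flow this commit describes, using hypothetical helper names rather than the actual whisper.cpp API: detect speech segments first, then hand only those samples on to full transcription.

```cpp
#include <vector>

struct speech_segment { size_t start, end; }; // sample offsets of a detected speech span

// Placeholder for the VAD model inference step.
static std::vector<speech_segment> detect_speech(const std::vector<float> & samples) {
    (void) samples;
    return { {16000, 48000} }; // pretend speech was found between 1 s and 3 s @ 16 kHz
}

int main() {
    std::vector<float> samples(16000 * 10, 0.0f); // 10 s of audio
    std::vector<float> filtered;

    for (const auto & seg : detect_speech(samples)) {
        filtered.insert(filtered.end(),
                        samples.begin() + seg.start,
                        samples.begin() + seg.end);
    }

    // Only `filtered` (2 s instead of 10 s here) would be handed to whisper_full,
    // which is how VAD reduces the amount of audio that has to be transcribed.
    return filtered.size() == 32000 ? 0 : 1;
}
```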
Daniel Bevenius
e39ba750cd
whisper : remove dummy commit comment [no ci] (#3143)
This commit removes a dummy comment that was added by
commit 589b408 ("ci : dummy commit to trigger CI").
2025-05-12 14:40:17 +02:00
Daniel Bevenius
db0fc9edc6
docs : fix -owts flag typo karaoke section [no ci] (#3142) 2025-05-12 10:56:39 +02:00
Daniel Bevenius
186855e38b
cli : print color scheme info for --print-colors (#3141)
This commit adds a description of the color scheme used in the CLI
when the --print-colors option is enabled.

The motivation for this is that it is not immediately clear what the
color scheme is when using the CLI with the --print-colors option.

Example output:
```console
$ ./build/bin/whisper-cli -f samples/jfk.wav --print-colors
...

main: color scheme: red (low confidence), yellow (medium), green (high confidence)

[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
```
The description will not be displayed if the `--no-prints` option is
set.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3135
2025-05-12 10:43:04 +02:00
Simon Booth
a513146102
docs : update Readme to recommend same Openvino as Python tools (#3138) 2025-05-12 09:06:51 +02:00
Daniel Bevenius
4730950492
examples : update link to Paul Tol's color scheme [no ci] (#3140)
This commit updates the link to Paul Tol's color scheme in the
`examples/common.h` file. The previous link was outdated and
pointed to a non-existent page.
2025-05-12 09:02:06 +02:00
KITAITI Makoto
9dd9685c79
ruby : test extra build options only when env var specified (#3136)
* Test Ruby bindings' extra options only when commanded

* ruby : test extra build options only when env var specified

* Fix extra_options

* Update gem date
2025-05-12 06:49:46 +02:00
Daniel Bevenius
2e310b841e
ruby : omit test_build_options locally (#3132)
This commit omits the test for `test_build_options` when run locally, as
it currently fails on Linux and macOS platforms.

The motivation for this change is that, when running the tests locally on
a non-macOS platform, the test currently fails with the following error:
```console
.F
========================================================================
Failure: test_build_options(TestPackage):
  <["ACCELERATE_FRAMEWORK",
   "CMAKE_OSX_ARCHITECTURES",
   "CMAKE_OSX_SYSROOT",
   "FOUNDATION_LIBRARY",
   "METALKIT_FRAMEWORK",
   "METAL_FRAMEWORK"]> was expected to be empty.
/home/danbev/work/ai/whisper.cpp/bindings/ruby/tests/test_package.rb:43:in `test_build_options'
     40:     options = BuildOptions::Options.new
     41:     assert_empty options.missing_options
     42:     unless ENV["CI"]
  => 43:       assert_empty options.extra_options
     44:     end
     45:   end
     46: end
========================================================================
```
2025-05-10 08:18:08 +02:00