
whisper.cpp/examples/stream

This is a naive example of performing real-time inference on audio from your microphone. The whisper-stream tool samples the audio every half a second and runs the transcription continuously. More info is available in issue #10.

./build/bin/whisper-stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000

https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4
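To make the --step and --length values in the command above more concrete, here is a minimal sketch that converts them to sample counts. It assumes 16 kHz mono audio (the rate Whisper models expect) and reads --step as the interval at which new audio is taken and --length as the amount of audio considered per transcription. The names `kSampleRate`, `ms_to_samples` and `describe_window` are hypothetical and are not taken from the tool's source.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: 16 kHz mono audio, the sample rate Whisper models expect.
constexpr int kSampleRate = 16000;

// Convert a duration in milliseconds to a number of audio samples.
constexpr int ms_to_samples(int ms) {
    return static_cast<int>(static_cast<long long>(ms) * kSampleRate / 1000);
}

// Hypothetical helper: print how much audio one iteration covers.
// Only meaningful when step_ms > 0 (i.e. not the sliding window mode).
void describe_window(int step_ms, int length_ms) {
    assert(step_ms > 0 && length_ms >= step_ms);
    std::printf("new audio per step   : %d samples\n", ms_to_samples(step_ms));
    std::printf("audio per inference  : %d samples\n", ms_to_samples(length_ms));
    std::printf("steps per full window: %d\n", length_ms / step_ms);
}
```

With --step 500 and --length 5000, this works out to 8000 new samples per step and up to 80000 samples per transcription, so consecutive transcriptions overlap heavily. That overlap is one reason the example is described as naive.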

Sliding window mode with VAD

Setting the --step argument to 0 enables the sliding window mode:

./build/bin/whisper-stream -m ./models/ggml-base.en.bin -t 6 --step 0 --length 30000 -vth 0.6

In this mode, the tool transcribes only after some speech activity is detected. A very basic voice activity detector (VAD) is used, but in theory a more sophisticated approach can be added. The -vth argument sets the VAD threshold: higher values make it detect silence more often. It's best to tune it to the specific use case, but a value around 0.6 should be OK in general. When silence is detected, the tool transcribes the last --length milliseconds of audio and outputs a transcription block that is suitable for parsing.
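The thresholding idea can be illustrated with a simple energy-based check. The sketch below is an assumption about how such a basic detector might work, not a copy of the tool's implementation: it compares the mean absolute amplitude of the most recent audio with that of the whole buffer and reports silence when the recent part is quiet by comparison. The function names and the `tail_ms` parameter are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute amplitude of the samples in [begin, end).
static float mean_abs(const std::vector<float> & pcm, std::size_t begin, std::size_t end) {
    if (end <= begin) return 0.0f;
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) sum += std::fabs(pcm[i]);
    return static_cast<float>(sum / static_cast<double>(end - begin));
}

// Hypothetical energy-based check: returns true when the last `tail_ms`
// milliseconds of the buffer are quiet relative to the whole buffer.
// A higher `threshold` makes the condition easier to satisfy, so silence
// is reported more often.
bool tail_is_silent(const std::vector<float> & pcm, int sample_rate, int tail_ms, float threshold) {
    assert(sample_rate > 0 && tail_ms > 0);
    const std::size_t n      = pcm.size();
    const std::size_t n_tail = static_cast<std::size_t>(sample_rate) * static_cast<std::size_t>(tail_ms) / 1000;
    if (n_tail == 0 || n_tail >= n) return false; // not enough audio to compare

    const float energy_all  = mean_abs(pcm, 0, n);
    const float energy_tail = mean_abs(pcm, n - n_tail, n);
    return energy_tail <= threshold * energy_all;
}
```

In a loop like the one described above, a caller would keep appending microphone samples to the buffer and trigger a transcription once a check of this kind reports silence. Because the comparison is relative to the buffer's own average level, it adapts somewhat to microphone gain, but steady background noise can still mask pauses, which is why the threshold is worth tuning.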

Building

The whisper-stream tool depends on the SDL2 library to capture audio from the microphone. You can build it like this:

# Install SDL2
# On Debian-based Linux distributions:
sudo apt-get install libsdl2-dev

# On Fedora Linux:
sudo dnf install SDL2 SDL2-devel

# On macOS:
brew install sdl2

# Configure and build with SDL2 support enabled
cmake -B build -DWHISPER_SDL2=ON
cmake --build build --config Release

# Run the tool
./build/bin/whisper-stream

Web version

This tool can also run in the browser: examples/stream.wasm