whisper.cpp/examples/stream
This is a naive example of performing real-time inference on audio from your microphone.
The whisper-stream tool samples the audio every half a second and runs the transcription continuously.
More info is available in issue #10.
./build/bin/whisper-stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4
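To make the --step and --length values in the command above concrete, the following sketch converts them to sample counts. It assumes 16 kHz mono input (the sample rate Whisper models work with); ms_to_samples is a hypothetical helper written for this illustration and is not part of the tool.

```python
# Illustrative arithmetic only; not code from whisper-stream.
SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz mono audio


def ms_to_samples(ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Convert a duration in milliseconds to a number of audio samples."""
    return ms * sample_rate_hz // 1000


step_ms = 500     # --step 500: new audio is picked up every half second
length_ms = 5000  # --length 5000: each transcription covers 5 seconds

n_step = ms_to_samples(step_ms)      # samples added per step
n_length = ms_to_samples(length_ms)  # samples in the transcription window
steps_per_window = length_ms // step_ms

print(n_step, n_length, steps_per_window)  # 8000 80000 10
```

With these settings, each half-second step contributes 8000 new samples, and the 5-second window spans ten such steps.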
Sliding window mode with VAD
Setting the --step argument to 0 enables the sliding window mode:
./build/bin/whisper-stream -m ./models/ggml-base.en.bin -t 6 --step 0 --length 30000 -vth 0.6
In this mode, the tool will transcribe only after some speech activity is detected. A very basic voice activity detector (VAD) is used, but in theory a more sophisticated approach can be added. The -vth argument determines the VAD threshold: higher values will make it detect silence more often. It's best to tune it to the specific use case, but a value around 0.6 should be OK in general.
When silence is detected, it will transcribe the last --length milliseconds of audio and output a transcription block that is suitable for parsing.
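As a rough illustration of how a threshold-based detector of this kind can behave, here is a minimal energy-based sketch. It is an assumption-laden example, not the tool's actual implementation: the function name is_silence, the last_ms parameter, and the choice to compare the mean absolute amplitude of the most recent audio against that of the whole buffer are all introduced here for illustration.

```python
# Hypothetical energy-based VAD sketch; names and logic are illustrative.
from typing import Sequence


def is_silence(samples: Sequence[float], sample_rate_hz: int,
               last_ms: int, vad_threshold: float) -> bool:
    """Return True if the last `last_ms` of audio is quiet relative to
    the whole buffer, scaled by `vad_threshold`."""
    n_last = sample_rate_hz * last_ms // 1000
    if n_last <= 0 or n_last >= len(samples):
        return False  # not enough audio to make a decision

    # Mean absolute amplitude over the whole buffer and over its tail.
    energy_all = sum(abs(s) for s in samples) / len(samples)
    energy_last = sum(abs(s) for s in samples[-n_last:]) / n_last

    # A higher threshold makes this condition easier to satisfy,
    # so silence is reported more often.
    return energy_last <= vad_threshold * energy_all


rate = 16_000  # assumption: 16 kHz mono audio
loud_then_quiet = [1.0] * rate + [0.01] * rate
loud_then_medium = [1.0] * rate + [0.5] * rate

print(is_silence(loud_then_quiet, rate, 1000, 0.6))
print(is_silence(loud_then_medium, rate, 1000, 0.6))
print(is_silence(loud_then_medium, rate, 1000, 0.8))
```

In this sketch, raising the threshold from 0.6 to 0.8 is enough to turn the second buffer from "still speaking" into "silence", which matches the tuning behavior described above.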
Building
The whisper-stream tool depends on the SDL2 library to capture audio from the microphone. You can build it like this:
# Install SDL2
# On Debian-based Linux distributions:
sudo apt-get install libsdl2-dev
# On Fedora Linux:
sudo dnf install SDL2 SDL2-devel
# On macOS:
brew install sdl2
cmake -B build -DWHISPER_SDL2=ON
cmake --build build --config Release
./build/bin/whisper-stream
Web version
This tool can also run in the browser: examples/stream.wasm