whisper.cpp/examples/vad-speech-segments
This example demonstrates how to use a VAD (Voice Activity Detection) model to split an audio file into speech segments.
Building the example
The example can be built using the following command:
cmake -S . -B build
cmake --build build -j8 --target vad-speech-segments
Running the example
The example can be run using the following command, which uses a model that we use internally for testing:
./build/bin/vad-speech-segments \
--vad-model models/for-tests-silero-v5.1.2-ggml.bin \
--file samples/jfk.wav \
--no-prints
Detected 5 speech segments:
Speech segment 0: start = 0.29, end = 2.21
Speech segment 1: start = 3.30, end = 3.77
Speech segment 2: start = 4.00, end = 4.35
Speech segment 3: start = 5.38, end = 7.65
Speech segment 4: start = 8.16, end = 10.59
To see more output from whisper.cpp, remove the --no-prints argument.
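The segment lines shown above are easy to post-process. The following is a minimal sketch, not part of the example itself, that computes the duration of each segment with awk. It assumes the output format shown above and that the start and end values are seconds; the variable names `segments` and `durations` are illustrative only. The sample text is copied from the run above so the snippet works without the binary; in practice the tool's stdout would be piped into awk instead.

```shell
# Sample output copied from the run above. In practice, replace this with
# the stdout of ./build/bin/vad-speech-segments ... --no-prints
segments='Detected 5 speech segments:
Speech segment 0: start = 0.29, end = 2.21
Speech segment 1: start = 3.30, end = 3.77
Speech segment 2: start = 4.00, end = 4.35
Speech segment 3: start = 5.38, end = 7.65
Speech segment 4: start = 8.16, end = 10.59'

# For each "Speech segment" line, print: index start end duration.
# Field 3 is the index followed by ":", field 6 is the start value
# followed by ",", and field 9 is the end value.
durations=$(printf '%s\n' "$segments" | awk '/^Speech segment/ {
    idx = $3;    sub(":", "", idx)
    start = $6;  sub(",", "", start)
    end_t = $9
    printf "%s %.2f %.2f %.2f\n", idx, start, end_t, end_t - start
}')

printf '%s\n' "$durations"
```

For the sample above, the first line printed is `0 0.29 2.21 1.92`, that is, segment 0 lasts 1.92 under the assumption that the values are seconds.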
Command line options
./build/bin/vad-speech-segments --help
usage: ./build/bin/vad-speech-segments [options] file
supported audio formats: flac, mp3, ogg, wav
options:
-h, --help [default] show this help message and exit
-f FNAME, --file FNAME [ ] input audio file path
-t N, --threads N [4 ] number of threads to use during computation
-ug, --use-gpu [true ] use GPU
-vm FNAME, --vad-model FNAME [ ] VAD model path
-vt N, --vad-threshold N [0.50 ] VAD threshold for speech recognition
-vspd N, --vad-min-speech-duration-ms N [250 ] VAD min speech duration (0.0-1.0)
-vsd N, --vad-min-silence-duration-ms N [100 ] VAD min silence duration (to split segments)
-vmsd N, --vad-max-speech-duration-s N [FLT_MAX] VAD max speech duration (auto-split longer)
-vp N, --vad-speech-pad-ms N [30 ] VAD speech padding (extend segments)
-vo N, --vad-samples-overlap N [0.10 ] VAD samples overlap (seconds between segments)
-np, --no-prints [false ] do not print anything other than the results
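As an illustration of how the options above combine, here is a hedged sketch that builds a command line with a few VAD parameters changed from their defaults. The flags are taken from the help output above; the values 0.60, 200, and 50 are arbitrary choices for illustration, not recommendations, and the variable names are hypothetical. The command is printed first and only executed if the binary has been built.

```shell
bin=./build/bin/vad-speech-segments

# Illustrative values only; the defaults listed above are 0.50, 100 and 30.
threshold=0.60        # --vad-threshold
min_silence_ms=200    # --vad-min-silence-duration-ms
pad_ms=50             # --vad-speech-pad-ms

cmd="$bin --vad-model models/for-tests-silero-v5.1.2-ggml.bin --file samples/jfk.wav --vad-threshold $threshold --vad-min-silence-duration-ms $min_silence_ms --vad-speech-pad-ms $pad_ms --no-prints"

# Show the command, then run it only when the example has been built.
echo "$cmd"
if [ -x "$bin" ]; then
    $cmd
fi
```

Going by the option descriptions in the help text, a longer minimum silence duration should split the audio into fewer segments and a larger padding should extend each segment, but the actual effect depends on the audio and is best checked by comparing the printed segments.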