# whisper.cpp/examples/vad-speech-segments

This example demonstrates how to use a VAD (Voice Activity Detection) model to
split an audio file into speech segments.

### Building the example
The example can be built using the following commands:
```console
cmake -S . -B build
cmake --build build -j8 --target vad-speech-segments
```

### Running the example
The example can be run using the following command, which uses a model
that we use internally for testing:
```console
./build/bin/vad-speech-segments \
    --vad-model models/for-tests-silero-v5.1.2-ggml.bin \
    --file samples/jfk.wav \
    --no-prints

Detected 5 speech segments:
Speech segment 0: start = 0.29, end = 2.21
Speech segment 1: start = 3.30, end = 3.77
Speech segment 2: start = 4.00, end = 4.35
Speech segment 3: start = 5.38, end = 7.65
Speech segment 4: start = 8.16, end = 10.59
```
To see more output from whisper.cpp, remove the `--no-prints` argument.
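
The segment lines are easy to post-process with standard tools. The following is a minimal sketch, assuming the results are written to standard output in exactly the `Speech segment N: start = S, end = E` format shown above. The `segment_durations` function is a hypothetical helper and is not part of whisper.cpp:

```shell
# Hypothetical helper (not part of whisper.cpp): reads the example's output on
# stdin and prints "index start end duration" for each speech segment line.
# Lines that do not start with "Speech segment N:" are ignored.
segment_durations() {
    awk '/^Speech segment [0-9]+:/ {
        idx = $3; sub(/:$/, "", idx)   # "0:"    -> "0"
        t0  = $6; sub(/,$/, "", t0)    # "0.29," -> "0.29"
        t1  = $9                       # "2.21"
        printf "%d %.2f %.2f %.2f\n", idx, t0, t1, t1 - t0
    }'
}
```

Piping the command above into the helper (`... --no-prints | segment_durations`) would print one line per segment. For the first segment in the sample output, that line is `0 0.29 2.21 1.92`.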


### Command line options
```console
./build/bin/vad-speech-segments --help

usage: ./build/bin/vad-speech-segments [options] file
supported audio formats: flac, mp3, ogg, wav

options:
  -h,        --help                          [default] show this help message and exit
  -f FNAME,  --file FNAME                    [       ] input audio file path
  -t N,      --threads N                     [4      ] number of threads to use during computation
  -ug,       --use-gpu                       [true   ] use GPU
  -vm FNAME, --vad-model FNAME               [       ] VAD model path
  -vt N,     --vad-threshold N               [0.50   ] VAD threshold for speech recognition
  -vspd N,   --vad-min-speech-duration-ms  N [250    ] VAD min speech duration (0.0-1.0)
  -vsd N,    --vad-min-silence-duration-ms N [100    ] VAD min silence duration (to split segments)
  -vmsd N,   --vad-max-speech-duration-s   N [FLT_MAX] VAD max speech duration (auto-split longer)
  -vp N,     --vad-speech-pad-ms           N [30     ] VAD speech padding (extend segments)
  -vo N,     --vad-samples-overlap         N [0.10   ] VAD samples overlap (seconds between segments)
  -np,       --no-prints                     [false  ] do not print anything other than the results
```
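
The VAD options can be combined to tune segmentation. The sketch below wraps one such combination in a shell function. It uses only flags listed in the help output above, but the specific values (`0.6`, `300`, `50`) are illustrative assumptions rather than recommended settings, and `VAD_BIN`, `VAD_MODEL`, and `run_vad_strict` are hypothetical names:

```shell
# Hypothetical wrapper; VAD_BIN, VAD_MODEL and run_vad_strict are illustrative
# names and are not part of whisper.cpp.
VAD_BIN="${VAD_BIN:-./build/bin/vad-speech-segments}"
VAD_MODEL="${VAD_MODEL:-models/for-tests-silero-v5.1.2-ggml.bin}"

run_vad_strict() {
    # $1: input audio file
    # Uses a higher threshold, a longer minimum silence before a segment is
    # split, and more padding around each detected segment than the defaults.
    "$VAD_BIN" \
        --vad-model "$VAD_MODEL" \
        --file "$1" \
        --vad-threshold 0.6 \
        --vad-min-silence-duration-ms 300 \
        --vad-speech-pad-ms 50 \
        --no-prints
}
```

Calling `run_vad_strict samples/jfk.wav` runs the example with these settings. Compared with the defaults, this configuration is expected to produce fewer, longer segments, though the actual effect depends on the audio.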