stream
This is a naive example of performing real-time inference on audio from your microphone.
The stream tool samples the audio every half a second and runs the transcription continuously.
More info is available in issue #10.
./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4
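To make the relationship between --step and --length concrete, here is a minimal sketch of the sample arithmetic for one iteration. It assumes 16 kHz mono input and assumes that each iteration combines the newly captured --step milliseconds with enough older audio to fill a --length millisecond window. The names ms_to_samples, WindowPlan and plan_window are hypothetical and are not taken from stream.cpp, which may handle overlap differently.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: audio is captured as 16 kHz mono samples.
constexpr int kSampleRate = 16000;

// Hypothetical helper: convert a duration in milliseconds to a sample count.
constexpr std::size_t ms_to_samples(int ms) {
    return static_cast<std::size_t>(ms) * kSampleRate / 1000;
}

// How many samples one iteration takes from new and from older audio.
struct WindowPlan {
    std::size_t n_new;   // samples captured since the previous iteration
    std::size_t n_keep;  // older samples reused to fill the window
};

// Plan one iteration: step_ms of new audio, padded with older audio
// so that the whole window covers length_ms.
WindowPlan plan_window(int step_ms, int length_ms) {
    assert(step_ms > 0 && length_ms >= step_ms);
    const std::size_t n_new = ms_to_samples(step_ms);
    const std::size_t n_len = ms_to_samples(length_ms);
    return WindowPlan{n_new, n_len - n_new};
}
```

Under these assumptions, plan_window(500, 5000) describes the command above: 8000 new samples per iteration, combined with 72000 older samples to cover the 5-second window.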
Sliding window mode with VAD
Setting the --step argument to 0 enables the sliding window mode:
./stream -m ./models/ggml-small.en.bin -t 6 --step 0 --length 30000 -vth 0.6
In this mode, the tool transcribes only after some speech activity is detected. A very basic VAD is used, but in theory a more sophisticated approach can be added. The -vth argument sets the VAD threshold: higher values make it detect silence more often. It's best to tune it to the specific use case, but a value around 0.6 should be OK in general.
When silence is detected, it will transcribe the last --length milliseconds of audio and output a transcription block that is suitable for parsing.
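As an illustration of how a threshold like -vth can behave, here is a minimal energy-ratio sketch. It is one plausible way to build a basic VAD and is not necessarily what the tool does. It treats the tail of the buffer as silence when its mean absolute amplitude falls to or below the threshold times that of the whole buffer. The names mean_abs and is_silence and the last_ms parameter are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: mean absolute amplitude of samples[begin, end).
static float mean_abs(const std::vector<float>& samples,
                      std::size_t begin, std::size_t end) {
    if (end <= begin) return 0.0f;
    float sum = 0.0f;
    for (std::size_t i = begin; i < end; ++i) sum += std::fabs(samples[i]);
    return sum / static_cast<float>(end - begin);
}

// Returns true when the trailing last_ms of audio is quiet relative to
// the whole buffer. A higher threshold makes this return true more often.
bool is_silence(const std::vector<float>& samples, int sample_rate,
                int last_ms, float threshold) {
    assert(sample_rate > 0 && last_ms > 0);
    const std::size_t n = samples.size();
    const std::size_t n_last =
        static_cast<std::size_t>(sample_rate) * last_ms / 1000;
    if (n_last >= n) return false;  // not enough audio to compare against

    const float energy_all  = mean_abs(samples, 0, n);
    const float energy_last = mean_abs(samples, n - n_last, n);
    return energy_last <= threshold * energy_all;
}
```

Because the comparison is relative to the buffer's own average level, this sketch adapts to the microphone gain, but it can be fooled by steady background noise, which is one reason a more sophisticated detector might be preferred.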
Building
The stream tool depends on the SDL2 library to capture audio from the microphone. You can build it like this:
# Install SDL2 on Linux
sudo apt-get install libsdl2-dev
# Install SDL2 on Mac OS
brew install sdl2
make stream
Web version
This tool can also run in the browser: examples/stream.wasm