forked from extern/whisper.cpp
83c742f1a7
Use a phase vocoder to speed up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. Transcription accuracy is degraded, but for slow to normal speech it still seems to be very good. I think this could be useful for real-time transcription, i.e. the "stream" example.
stream
This is a naive example of performing real-time inference on audio from your microphone.
The stream tool samples the audio every half a second and runs the transcription continuously.
More info is available in issue #10.
./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4
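To illustrate how the `--step 500` and `--length 5000` values in the command above relate to each other, here is a minimal sketch of a rolling audio window. It is not the code in stream.cpp: the names `ms_to_samples`, `RollingWindow`, and `simulate` are hypothetical, and the silent chunks stand in for microphone audio. The only fact it relies on is that Whisper models take 16 kHz audio, so a 500 ms step is 8000 samples and a 5000 ms window is 80000 samples.

```cpp
#include <cstddef>
#include <vector>

// Whisper models expect 16 kHz mono audio.
constexpr int kSampleRate = 16000;

// Hypothetical helper: convert a duration in milliseconds to a sample count.
constexpr std::size_t ms_to_samples(int ms) {
    return static_cast<std::size_t>(ms) * kSampleRate / 1000;
}

// Hypothetical rolling window that keeps at most `length_ms` of recent audio.
class RollingWindow {
public:
    explicit RollingWindow(int length_ms)
        : max_samples_(ms_to_samples(length_ms)) {}

    // Append a newly captured chunk, then drop the oldest samples
    // so the buffer never exceeds the configured length.
    void push(const std::vector<float>& chunk) {
        buf_.insert(buf_.end(), chunk.begin(), chunk.end());
        if (buf_.size() > max_samples_) {
            const auto excess =
                static_cast<std::ptrdiff_t>(buf_.size() - max_samples_);
            buf_.erase(buf_.begin(), buf_.begin() + excess);
        }
    }

    const std::vector<float>& samples() const { return buf_; }

private:
    std::size_t max_samples_;
    std::vector<float> buf_;
};

// Simulate `n_steps` capture steps and return the final window size in samples.
// A real tool would run the transcription on window.samples() after each push.
std::size_t simulate(int step_ms, int length_ms, int n_steps) {
    RollingWindow window(length_ms);
    for (int i = 0; i < n_steps; ++i) {
        // Silent chunk as a stand-in for audio captured from the microphone.
        std::vector<float> chunk(ms_to_samples(step_ms), 0.0f);
        window.push(chunk);
    }
    return window.samples().size();
}
```

With a 500 ms step and a 5000 ms length, the window in this sketch grows by 8000 samples per step until it reaches 80000 samples after ten steps, and stays at that size afterwards.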
The stream tool depends on the SDL2 library to capture audio from the microphone. You can build it like this:
# Install SDL2 on Linux
sudo apt-get install libsdl2-dev
# Install SDL2 on Mac OS
brew install sdl2
make stream