Files

Georgi Gerganov 83c742f1a7 whisper : add option to speed up the audio tempo by x2

Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.

I think this can find application for real-time transcription - i.e. the
"stream" example.

2022-11-13 16:25:43 +02:00

CMakeLists.txt

refactoring : move main + stream in examples + other stuff

2022-10-25 20:53:48 +03:00

README.md

Update README.md

2022-10-25 20:53:48 +03:00

stream.cpp

whisper : add option to speed up the audio tempo by x2

2022-11-13 16:25:43 +02:00

README.md

stream

This is a naive example of performing real-time inference on audio from your microphone. The stream tool samples the audio every half a second and runs the transcription continously. More info is available in issue #10.

./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000

https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4

The stream tool depends on SDL2 library to capture audio from the microphone. You can build it like this:

# Install SDL2 on Linux
sudo apt-get install libsdl2-dev

# Install SDL2 on Mac OS
brew install sdl2

make stream