mirror of
https://github.com/ggerganov/whisper.cpp.git
synced 2024-11-07 08:34:37 +01:00
f18738f247
talk
Talk with an Artificial Intelligence in your terminal
Web version: examples/talk.wasm
Building
The talk tool depends on the SDL2 library to capture audio from the microphone. You can build it like this:
# Install SDL2 on Linux
sudo apt-get install libsdl2-dev
# Install SDL2 on macOS
brew install sdl2
# Build the "talk" executable
make talk
# Run it
./talk -p Santa
GPT-2
To run this, you will need a ggml GPT-2 model: instructions
Alternatively, you can simply download the smallest ggml GPT-2 117M model (240 MB) like this:
wget --quiet --show-progress -O models/ggml-gpt-2-117M.bin https://huggingface.co/ggerganov/ggml/resolve/main/ggml-model-gpt-2-117M.bin
TTS
For the best experience, this example needs a TTS tool to convert the generated text responses to voice.
You can use any TTS engine you like: simply edit the speak script to suit your needs.
By default, it is configured to use macOS's say, espeak, or the Windows SpeechSynthesizer.
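As a rough illustration of what a customized speak script could look like, here is a minimal sketch. It is not the script shipped with this example. The function names pick_tts and speak_file are hypothetical, and it assumes the text to speak arrives in a file and that the backend can read a file via its -f option (as say and espeak do).

```shell
#!/bin/sh
# Hypothetical helper: print the name of the first TTS backend found on PATH.
# "say" ships with macOS; "espeak" is a common package on Linux.
pick_tts() {
    for cmd in say espeak; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: speak the contents of a text file.
# Reading from a file avoids shell-quoting problems with generated text.
speak_file() {
    textfile=$1
    backend=$(pick_tts) || { echo "no TTS backend found" >&2; return 1; }
    case "$backend" in
        say)    say -f "$textfile" ;;
        espeak) espeak -f "$textfile" ;;
    esac
}
```

With these functions defined, calling speak_file with the path of a text file would read it aloud with whichever backend is installed. To plug in a different engine, add its command name to the loop in pick_tts and a matching branch in speak_file.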