mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-07-26 17:24:51 +02:00

command

This is a basic Voice Assistant example that accepts voice commands from the microphone. More info is available in issue #171.

# Run with default arguments and small model
./command -m ./models/ggml-small.en.bin -t 8

# On Raspberry Pi, use tiny or base models + "-ac 768" for better performance
./command -m ./models/ggml-tiny.en.bin -ac 768 -t 3 -c 0

https://user-images.githubusercontent.com/1991296/204038393-2f846eae-c255-4099-a76d-5735c25c49da.mp4

Web version: examples/command.wasm

Guided mode

"Guided mode" lets you specify a list of commands (i.e. strings); the transcription is then guided to classify what you say as one of the commands in the list. This is useful when a device listens only for a small set of commands.

Initial tests suggest that this approach can be very efficient, since it combines well with the "partial Encoder" idea from #137.

# Run in guided mode, the list of allowed commands is in commands.txt
./command -m ./models/ggml-base.en.bin -cmd ./examples/command/commands.txt

# On Raspberry Pi, in guided mode you can use "-ac 128" for extra performance
./command -m ./models/ggml-tiny.en.bin -cmd ./examples/command/commands.txt -ac 128 -t 3 -c 0
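
To try guided mode with your own vocabulary, you can point -cmd at a different file. The following is a minimal sketch, assuming the file lists one command per line; the file name my-commands.txt and the three example words are hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical command list: one allowed command per line (assumed format).
cat > my-commands.txt <<'EOF'
lights
music
stop
EOF

# Run guided mode only if the binary and the model are both present.
if [ -x ./command ] && [ -f ./models/ggml-base.en.bin ]; then
    ./command -m ./models/ggml-base.en.bin -cmd ./my-commands.txt
else
    echo "command binary or model not found; build the example and download a model first"
fi
```

Keeping the list short matches the use case described above, where the device listens for only a small set of commands.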

https://user-images.githubusercontent.com/1991296/207435352-8fc4ed3f-bde5-4555-9b8b-aeeb76bee969.mp4

Building

The command tool depends on the SDL2 library to capture audio from the microphone. You can build it like this:

# Install SDL2
# On Debian-based Linux distributions:
sudo apt-get install libsdl2-dev

# On Fedora Linux:
sudo dnf install SDL2 SDL2-devel

# On macOS:
brew install sdl2

make command
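
If the build fails with missing SDL headers or libraries, it can help to confirm that SDL2 is visible before running make. The sketch below assumes the SDL2 development package installed the sdl2-config helper on your PATH; whether the build itself relies on that helper is an assumption, not something stated here.

```shell
#!/bin/sh
# Note: "command -v" is the shell builtin for locating a program;
# it is unrelated to the ./command example binary.
if command -v sdl2-config >/dev/null 2>&1; then
    sdl2_status="found SDL2 $(sdl2-config --version)"
else
    sdl2_status="SDL2 not found; install it with your package manager first"
fi
echo "$sdl2_status"
```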