mirror of
https://github.com/ggerganov/whisper.cpp.git
synced 2025-08-15 08:32:29 +02:00
.devops
.github
bindings
cmake
examples
addon.node
bench
bench.wasm
command
command.wasm
lsp
main
python
quantize
server
CMakeLists.txt
README.md
httplib.h
server.cpp
stream
stream.wasm
sycl
talk
talk-llama
talk.wasm
wchess
whisper.android
whisper.android.java
whisper.nvim
whisper.objc
whisper.swiftui
whisper.wasm
CMakeLists.txt
common-ggml.cpp
common-ggml.h
common-sdl.cpp
common-sdl.h
common.cpp
common.h
dr_wav.h
ffmpeg-transcode.cpp
generate-karaoke.sh
grammar-parser.cpp
grammar-parser.h
helpers.js
json.hpp
livestream.sh
twitch.sh
yt-wsp.sh
ggml
grammars
include
models
samples
scripts
spm-headers
src
tests
.gitignore
.gitmodules
AUTHORS
CMakeLists.txt
LICENSE
Makefile
Package.swift
README.md
README_sycl.md
whisper.cpp http server
Simple http server. WAV Files are passed to the inference model via http requests.
https://github.com/ggerganov/whisper.cpp/assets/1991296/e983ee53-8741-4eb5-9048-afe5e4594b8f
Usage
./server -h
usage: ./bin/server [options]
options:
-h, --help [default] show this help message and exit
-t N, --threads N [4 ] number of threads to use during computation
-p N, --processors N [1 ] number of processors to use during computation
-ot N, --offset-t N [0 ] time offset in milliseconds
-on N, --offset-n N [0 ] segment index offset
-d N, --duration N [0 ] duration of audio to process in milliseconds
-mc N, --max-context N [-1 ] maximum number of text context tokens to store
-ml N, --max-len N [0 ] maximum segment length in characters
-sow, --split-on-word [false ] split on word rather than on token
-bo N, --best-of N [2 ] number of best candidates to keep
-bs N, --beam-size N [-1 ] beam size for beam search
-wt N, --word-thold N [0.01 ] word timestamp probability threshold
-et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
-lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
-debug, --debug-mode [false ] enable debug mode (eg. dump log_mel)
-tr, --translate [false ] translate from source language to english
-di, --diarize [false ] stereo audio diarization
-tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
-nf, --no-fallback [false ] do not use temperature fallback while decoding
-ps, --print-special [false ] print special tokens
-pc, --print-colors [false ] print colors
-pr, --print-realtime [false ] print output in realtime
-pp, --print-progress [false ] print progress
-nt, --no-timestamps [false ] do not print timestamps
-l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
-dl, --detect-language [false ] exit after automatically detecting language
--prompt PROMPT [ ] initial prompt
-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
-oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
--host HOST, [127.0.0.1] Hostname/ip-adress for the server
--port PORT, [8080 ] Port number for the server
--convert, [false ] Convert audio to WAV, requires ffmpeg on the server
Warning
Do not run the server example with administrative privileges and ensure it's operated in a sandbox environment, especially since it involves risky operations like accepting user file uploads and using ffmpeg for format conversions. Always validate and sanitize inputs to guard against potential security threats.
request examples
/inference
curl 127.0.0.1:8080/inference \
-H "Content-Type: multipart/form-data" \
-F file="@<file-path>" \
-F temperature="0.0" \
-F temperature_inc="0.2" \
-F response_format="json"
/load
curl 127.0.0.1:8080/load \
-H "Content-Type: multipart/form-data" \
-F model="<path-to-model-file>"