mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-07-09 09:07:08 +02:00

Files

Lin Xiaodong d9999d54c8 feat: support vad for addon.node (#3301 )

Co-authored-by: linxiaodong <calm.lin@wukongsch.com>

2025-07-02 13:14:29 +03:00

__test__

feat: support vad for addon.node (#3301 )

2025-07-02 13:14:29 +03:00

.gitignore

addon.node : using whisper as a Node.js addon (#443 )

2023-02-04 09:10:25 +02:00

addon.cpp

feat: support vad for addon.node (#3301 )

2025-07-02 13:14:29 +03:00

CMakeLists.txt

examples : fix node compilation (#2115 )

2024-05-02 22:52:55 +01:00

index.js

node : add language detection support (#3190 )

2025-06-02 14:58:05 +02:00

package.json

examples : fix node compilation (#2115 )

2024-05-02 22:52:55 +01:00

README.md

feat: support vad for addon.node (#3301 )

2025-07-02 13:14:29 +03:00

vad-example.js

feat: support vad for addon.node (#3301 )

2025-07-02 13:14:29 +03:00

README.md

whisper.cpp Node.js addon

This is an addon demo that can perform whisper model reasoning in node and electron environments, based on cmake-js. It can be used as a reference for using the whisper.cpp project in other node projects.

This addon now supports Voice Activity Detection (VAD) for improved transcription performance.

Install

npm install

Compile

Make sure it is in the project root directory and compiled with make-js.

npx cmake-js compile -T addon.node -B Release

For Electron addon and cmake-js options, you can see cmake-js and make very few configuration changes.

Such as appointing special cmake path:
npx cmake-js compile -c 'xxx/cmake' -T addon.node -B Release

Run

Basic Usage

cd examples/addon.node

node index.js --language='language' --model='model-path' --fname_inp='file-path'

VAD (Voice Activity Detection) Usage

Run the VAD example with performance comparison:

node vad-example.js

Voice Activity Detection (VAD) Support

VAD can significantly improve transcription performance by only processing speech segments, which is especially beneficial for audio files with long periods of silence.

VAD Model Setup

Before using VAD, download a VAD model:

# From the whisper.cpp root directory
./models/download-vad-model.sh silero-v5.1.2

VAD Parameters

All VAD parameters are optional and have sensible defaults:

vad: Enable VAD (default: false)
vad_model: Path to VAD model file (required when VAD enabled)
vad_threshold: Speech detection threshold 0.0-1.0 (default: 0.5)
vad_min_speech_duration_ms: Min speech duration in ms (default: 250)
vad_min_silence_duration_ms: Min silence duration in ms (default: 100)
vad_max_speech_duration_s: Max speech duration in seconds (default: FLT_MAX)
vad_speech_pad_ms: Speech padding in ms (default: 30)
vad_samples_overlap: Sample overlap 0.0-1.0 (default: 0.1)

JavaScript API Example

const path = require("path");
const { whisper } = require(path.join(__dirname, "../../build/Release/addon.node"));
const { promisify } = require("util");

const whisperAsync = promisify(whisper);

// With VAD enabled
const vadParams = {
  language: "en",
  model: path.join(__dirname, "../../models/ggml-base.en.bin"),
  fname_inp: path.join(__dirname, "../../samples/jfk.wav"),
  vad: true,
  vad_model: path.join(__dirname, "../../models/ggml-silero-v5.1.2.bin"),
  vad_threshold: 0.5,
  progress_callback: (progress) => console.log(`Progress: ${progress}%`)
};

whisperAsync(vadParams).then(result => console.log(result));

Supported Parameters

Both traditional whisper.cpp parameters and new VAD parameters are supported:

language: Language code (e.g., "en", "es", "fr")
model: Path to whisper model file
fname_inp: Path to input audio file
use_gpu: Enable GPU acceleration (default: true)
flash_attn: Enable flash attention (default: false)
no_prints: Disable console output (default: false)
no_timestamps: Disable timestamps (default: false)
detect_language: Auto-detect language (default: false)
audio_ctx: Audio context size (default: 0)
max_len: Maximum segment length (default: 0)
max_context: Maximum context size (default: -1)
prompt: Initial prompt for decoder
comma_in_time: Use comma in timestamps (default: true)
print_progress: Print progress info (default: false)
progress_callback: Progress callback function
VAD parameters (see above section)