* whisper : add version function
This commit adds a version function to the whisper API.
The motivation for this is that it might be convenient to have a way to
programmatically check the version.
Example usage:
```c++
printf("Using whisper version: %s\n", whisper_version());
```
Will output:
```console
Using whisper version: 1.7.6
```
* examples : add version to android example CMakeLists.txt
* stream : add nullptr check of whisper_context
This commit adds a check to ensure that the `whisper_context` is not
null after initialization.
The motivation for this is that currently, if the initialization fails,
the program continues to run leading to a segmentation fault. This sort
of check is performed by others examples like whisper-cli.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3280#issuecomment-3003778035
* examples : add nullptr check for whisper_context
* android : update CMakeLists.txt to use FetchContent for ggml
This commit updates the CMakeLists.txt file for the Android Whisper
example to use FetchContent for managing the ggml library.
The motivation for this change is avoid having to make manual changes to
the CMakeLists.txt file after syncing the ggml library.
I've built and run the example locally to verify that it works as
expected.
Refs: https://github.com/ggml-org/whisper.cpp/pull/3265#issuecomment-2986715717
* android.java : update cmake to use FetchContent for ggml
This commit updates the CMake configuration for the Android Java example
to use `FetchContent` for including the `ggml` library. Do be able to
use FetchContent we also update the `compileSdkVersion` and
`targetSdkVersion` to 31, and the `buildToolsVersion` to '30.0.3'.
This also required a an update to the Gradle plugin version to 7.4.0.
The motivation for this change is avoid having to make manual changes to
the CMakeLists.txt file after syncing the ggml library.
This commit adds a conversion from stereo to mono in the
`read_audio_data` function of `common-whisper.cpp`.
The motivation for this change is prior to Commit
7d3da68f79 ("examples : use miniaudio for
direct decoding flac, mp3, ogg and wav (#2759)", there was a step that
read stereo int16 data -> pcm16 (448512 samples), and then converted to
mono (224256 samples), and then also convert to stereo in `pcmf32s.
The middle step here seems to have been missed when rewriting the code to
use Miniaudio and caused issues then transcribing stereo audio files.
For example, currently using the audio sample in the linked issue the
output is:
```console
[00:00:00.000 --> 00:00:03.000] (speaker 1) Sous-titres réalisés para la communauté d'Amara.org
```
And with the change in this commit the output is:
```
[00:00:00.000 --> 00:00:01.500] (speaker 1) *sonnerie de téléphone*
[00:00:01.500 --> 00:00:07.000] (speaker 1) Salut jeune homme !
[00:00:07.000 --> 00:00:08.500] (speaker 0) C'est vrai que je te dérange ?
[00:00:08.500 --> 00:00:10.500] (speaker 1) Ah pas du tout, pas du tout, pas du tout !
[00:00:10.500 --> 00:00:12.500] (speaker 1) J'étais en train de...
[00:00:12.500 --> 00:00:14.500] (speaker 1) de préparer un courrier
```
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3092
* server : add Voice Activity Detection (VAD) support
This commit adds support for Voice Activity Detection (VAD) in the
server example.
The motivation for this is to enable VAD processing when using
whisper-server.
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3089
* server : add VAD parameters to usage in README.md [no ci]
This commit also adds a few missing parameters.
* server : fix conflicting short options [no ci]
This commit add support for language detection in the Whisper Node.js
addon example. It also updates the node addon to return an object
instead of an array as the results.
The motivation for this change is to enable the inclusion of the
detected language in the result, in addition to the transcription
segments.
For example, when using the `detect_language` option, the result will
now be:
```console
{ language: 'en' }
```
And if the `language` option is set to "auto", it will also return:
```console
{
language: 'en',
transcription: [
[
'00:00:00.000',
'00:00:07.600',
' And so my fellow Americans, ask not what your country can do for you,'
],
[
'00:00:07.600',
'00:00:10.600',
' ask what you can do for your country.'
]
]
}
```
This commit enable the node addon to suppress all output, even the
result of the transcription if the no_prints parameter is set to true.
The motivation for this is that for the node addon there is a
fullfilment handler/success callback to process the transcription
result. And it might be useful to be able to disable the printing of
the transcription result to the console, so that the user can handle
the result in their own way.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3176
Quick fix for not removing swedish umlauts.
* Update talk-llama.cpp
Expose model inference settings to user instead of hard coding them. Same defaults as previous defaults.
* Update examples/talk-llama/talk-llama.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* examples : add --print-confidence option to cli
This commit adds a new command-line option `--print-confidence` to the
whisper-cli. When enabled, this option prints the confidence level of each
token in the transcribed text using ANSI formatting codes.
The confidence levels are represented using different styles:
```console
main: confidence: highlighted (low confidence), underlined (medium), dim (high confidence)
```
Refs: https://github.com/ggml-org/whisper.cpp/issues/3135
This commit adds the `--flash-attn` option to the usage output of the
server example.
The motivation for this change is that while it is possible to set this
option it is not printed in the usage output.
This commit adds an example that demonstrates how to use a VAD (Voice
Activity Detection) model to segment an audio file into speech segments.
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3144
* vad : add initial Voice Activity Detection (VAD) support
This commit add support for Voice Activity Detection (VAD). When enabled
this feature will process the audio input and detect speech segments.
This information is then used to reduce the number of samples that need
to be processed by whisper_full.
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This commit adds a description of the color scheme used in the CLI
when the --print-colors option is enabled.
The motivation for this is that it is not immediately clear what the
color scheme is when using the CLI with the --print-colors option.
Example output:
```console
$ ./build/bin/whisper-cli -f samples/jfk.wav --print-colors
...
main: color scheme: red (low confidence), yellow (medium), green (high confidence)
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
```
The description will not be dispayed if the `--no-prints` options is
set.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3135
This commit updates the link to Paul Tol's color scheme in the
`examples/common.h` file. The previous link was outdated and
pointed to a non-existent page.
This commit adds HEAPU8 to the list of exported methods.
The motivation for this commit is that currently this is causing an error on Window systems where HEAPU8 in undefined, which results in the following error message in the web console:
main.js:1 Uncaught TypeError:
Cannot read properties of undefined (reading 'buffer') at __emval_get_property
(main.js:1:1363125) at 003a453a:0xc4a47 at 003a453a:0xc51cd at
Object.full_default (eval at craftInvokerFunction (main.js:1:1347011),
<anonymous>:9:10) at whisper.cpp/:647:42
danbev originally fixed this for whisper.wasm, stream.wasm, and command.stream, but the issue still exists on the other examples which I patch in this code.
Resolves: #3059
This commit updates the documentation for the WASM examples to include a
note about the generation of the `worker.js` file. As of Emscripten
3.1.58 (April 2024), separate worker.js files are no longer generated
and the worker is embedded in the main JS file.
The motivation for this change is to inform users about the new behavior
of Emscripten and why the `worker.js` file may not be present.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3123
* stream.wasm : add HEAPU8 to exported runtime methods
This commit adds HEAPU8 to the list of exported methods for stream.wasm.
The motivation for this is that without it HEAPUD8 will be undefined
and when its 'buffer' attribute is accessed this will cause error as
reported in the referenced issue.
Note that to test this make sure that the web browsers caches is cleared
first.
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3123
* command.wasm : add HEAPU8 to exported runtime methods
* ggml : remove MSVC warnings pragmas
This commit removes the MSVC-specific pragmas as these are now handled
in CMakeLists.txt.
* whisper : remove MSVC warning pragmas
This commit removes the MSVC-specific pragmas. These are now handled in
the CMakeLists.txt file.
This changes examples/cli/cli.cpp to be like
examples/common-whisper.cpp. "-of -" can be specified (or this can be
inferred from "-" as the input file) to output to stdout. This is useful
for piping to other applications.
Log fname_out consistently when not stdout
- Terminals have stdout=stderr, so remove the message before
successful output to ease copying
- Don't affect actual error messages
- Move opening the ofstream into the factory, fixing missing
open and/or error messages in output_score/output_wts
- Fix struct naming convention
Closes#3048
* docs : Update cli documentation
This updates the documentation of cli based on the actual output
In the longterm this should ideally be auto generated to prevent mismatch
* docs : Update cli documentation
This updates the documentation of cli based on the actual output
In the longterm this should ideally be auto generated to prevent mismatch
This commit adds the the command line option `--no-gpu` to the server
examples print usage function.
The motivation for this is that this options is available and can be set
but it is not displayed in the usage message.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3095
This commit adds `HEAPU8` to the list of exported methods.
The motivation for this commit is that currently this is causing an
error on Window systems where HEAPU8 in undefined, which results in the
following error message in the web console:
```console
main.js:1 Uncaught TypeError:
Cannot read properties of undefined (reading 'buffer') at __emval_get_property
(main.js:1:1363125) at 003a453a:0xc4a47 at 003a453a:0xc51cd at
Object.full_default (eval at craftInvokerFunction (main.js:1:1347011),
<anonymous>:9:10) at whisper.cpp/:647:42
```
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3059
FFmpeg introduced a new channel layout API that uses `AVChannelLayout`
interface in v6.0. It subsequently dropped the old bitmask-based API
in v7.0.
This updates decode_audio() to support the new channel layout API,
so that we can compile `whisper-cli` and `whisper-server` with FFmpeg
v7.0 or later.
Tested on on Ubuntu 24.10 with FFmpeg v7.0.2.
Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>