Mirror of https://github.com/ggerganov/whisper.cpp.git, synced 2025-08-14 02:19:06 +02:00
Compare commits: gg/whisper ... sync-ggml- (3 commits)
Commits (SHA1): e400aeb770, cb9a21b957, dacb7caed6
@@ -1,6 +1,6 @@
 cmake_minimum_required(VERSION 3.5) # for add_link_options and implicit target directories.
 project("whisper.cpp" C CXX)
-project("whisper.cpp" VERSION 1.7.5)
+project("whisper.cpp" VERSION 1.7.4)
 include(CheckIncludeFileCXX)

 set(SOVERSION 1)
94
README.md
@ -2,12 +2,15 @@
|
||||
|
||||

|
||||
|
||||
[](https://github.com/ggml-org/whisper.cpp/actions)
|
||||
[](https://github.com/ggerganov/whisper.cpp/actions)
|
||||
[](https://opensource.org/licenses/MIT)
|
||||
[](https://conan.io/center/whisper-cpp)
|
||||
[](https://www.npmjs.com/package/whisper.cpp/)
|
||||
|
||||
Stable: [v1.7.5](https://github.com/ggml-org/whisper.cpp/releases/tag/v1.7.5) / [Roadmap](https://github.com/orgs/ggml-org/projects/4/)
|
||||
> [!NOTE]
|
||||
> New maintenance roadmap: https://github.com/ggerganov/whisper.cpp/discussions/2788
|
||||
|
||||
Stable: [v1.7.4](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.7.4) / [Roadmap | F.A.Q.](https://github.com/ggerganov/whisper.cpp/discussions/126)
|
||||
|
||||
High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper) automatic speech recognition (ASR) model:
|
||||
|
||||
@ -23,7 +26,7 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
|
||||
- [Efficient GPU support for NVIDIA](#nvidia-gpu-support)
|
||||
- [OpenVINO Support](#openvino-support)
|
||||
- [Ascend NPU Support](#ascend-npu-support)
|
||||
- [C-style API](https://github.com/ggml-org/whisper.cpp/blob/master/include/whisper.h)
|
||||
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/include/whisper.h)
|
||||
|
||||
Supported platforms:
|
||||
|
||||
@ -31,14 +34,14 @@ Supported platforms:
|
||||
- [x] [iOS](examples/whisper.objc)
|
||||
- [x] [Android](examples/whisper.android)
|
||||
- [x] [Java](bindings/java/README.md)
|
||||
- [x] Linux / [FreeBSD](https://github.com/ggml-org/whisper.cpp/issues/56#issuecomment-1350920264)
|
||||
- [x] Linux / [FreeBSD](https://github.com/ggerganov/whisper.cpp/issues/56#issuecomment-1350920264)
|
||||
- [x] [WebAssembly](examples/whisper.wasm)
|
||||
- [x] Windows ([MSVC](https://github.com/ggml-org/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggml-org/whisper.cpp/issues/168))
|
||||
- [x] [Raspberry Pi](https://github.com/ggml-org/whisper.cpp/discussions/166)
|
||||
- [x] [Docker](https://github.com/ggml-org/whisper.cpp/pkgs/container/whisper.cpp)
|
||||
- [x] Windows ([MSVC](https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggerganov/whisper.cpp/issues/168))
|
||||
- [x] [Raspberry Pi](https://github.com/ggerganov/whisper.cpp/discussions/166)
|
||||
- [x] [Docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)
|
||||
|
||||
The entire high-level implementation of the model is contained in [whisper.h](include/whisper.h) and [whisper.cpp](src/whisper.cpp).
|
||||
The rest of the code is part of the [`ggml`](https://github.com/ggml-org/ggml) machine learning library.
|
||||
The rest of the code is part of the [`ggml`](https://github.com/ggerganov/ggml) machine learning library.
|
||||
|
||||
Such a lightweight implementation of the model makes it easy to integrate into different platforms and applications.
|
||||
As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: [whisper.objc](examples/whisper.objc)
|
||||
@ -51,14 +54,14 @@ https://user-images.githubusercontent.com/1991296/204038393-2f846eae-c255-4099-a
|
||||
|
||||
On Apple Silicon, the inference runs fully on the GPU via Metal:
|
||||
|
||||
https://github.com/ggml-org/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225
|
||||
https://github.com/ggerganov/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225
|
||||
|
||||
## Quick start
|
||||
|
||||
First clone the repository:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/ggml-org/whisper.cpp.git
|
||||
git clone https://github.com/ggerganov/whisper.cpp.git
|
||||
```
|
||||
|
||||
Navigate into the directory:
|
||||
@ -149,7 +152,6 @@ standard cmake setup with:
|
||||
cmake -B build -DGGML_BLAS=1
|
||||
cmake --build build --config Release
|
||||
./build/bin/whisper-cli [ .. etc .. ]
|
||||
```
|
||||
|
||||
## Quantization
|
||||
|
||||
@ -223,7 +225,7 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
|
||||
The first run on a device is slow, since the ANE service compiles the Core ML model to some device-specific format.
|
||||
Subsequent runs are faster.
|
||||
|
||||
For more information about the Core ML implementation please refer to PR [#566](https://github.com/ggml-org/whisper.cpp/pull/566).
|
||||
For more information about the Core ML implementation please refer to PR [#566](https://github.com/ggerganov/whisper.cpp/pull/566).
|
||||
|
||||
## OpenVINO support
|
||||
|
||||
@ -308,7 +310,7 @@ This can result in significant speedup in encoder performance. Here are the inst
|
||||
The first run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob will get
|
||||
cached for the next run.
|
||||
|
||||
For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggml-org/whisper.cpp/pull/1037).
|
||||
For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
|
||||
|
||||
## NVIDIA GPU support
|
||||
|
||||
@ -386,8 +388,8 @@ Run the inference examples as usual, for example:
|
||||
|
||||
We have two Docker images available for this project:
|
||||
|
||||
1. `ghcr.io/ggml-org/whisper.cpp:main`: This image includes the main executable file as well as `curl` and `ffmpeg`. (platforms: `linux/amd64`, `linux/arm64`)
|
||||
2. `ghcr.io/ggml-org/whisper.cpp:main-cuda`: Same as `main` but compiled with CUDA support. (platforms: `linux/amd64`)
|
||||
1. `ghcr.io/ggerganov/whisper.cpp:main`: This image includes the main executable file as well as `curl` and `ffmpeg`. (platforms: `linux/amd64`, `linux/arm64`)
|
||||
2. `ghcr.io/ggerganov/whisper.cpp:main-cuda`: Same as `main` but compiled with CUDA support. (platforms: `linux/amd64`)
|
||||
|
||||
### Usage
|
||||
|
||||
@ -425,8 +427,8 @@ For detailed instructions on how to use Conan, please refer to the [Conan docume
|
||||
|
||||
This is a naive example of performing real-time inference on audio from your microphone.
|
||||
The [stream](examples/stream) tool samples the audio every half a second and runs the transcription continuously.
|
||||
More info is available in [issue #10](https://github.com/ggml-org/whisper.cpp/issues/10).
|
||||
You will need to have [sdl2](https://wiki.libsdl.org/SDL2/Installation) installed for it to work properly.
|
||||
More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
|
||||
You will need to have [sdl2](https://wiki.libsdl.org/SDL2/Installation) installed for it to work properly.
|
||||
|
||||
```bash
|
||||
cmake -B build -DWHISPER_SDL2=ON
|
||||
@ -514,7 +516,7 @@ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 pr
|
||||
|
||||
## Speaker segmentation via tinydiarize (experimental)
|
||||
|
||||
More information about this approach is available here: https://github.com/ggml-org/whisper.cpp/pull/1058
|
||||
More information about this approach is available here: https://github.com/ggerganov/whisper.cpp/pull/1058
|
||||
|
||||
Sample usage:
|
||||
|
||||
@ -578,7 +580,7 @@ https://user-images.githubusercontent.com/1991296/199337538-b7b0c7a3-2753-4a88-a
|
||||
|
||||
## Video comparison of different models
|
||||
|
||||
Use the [scripts/bench-wts.sh](https://github.com/ggml-org/whisper.cpp/blob/master/scripts/bench-wts.sh) script to generate a video in the following format:
|
||||
Use the [scripts/bench-wts.sh](https://github.com/ggerganov/whisper.cpp/blob/master/scripts/bench-wts.sh) script to generate a video in the following format:
|
||||
|
||||
```bash
|
||||
./scripts/bench-wts.sh samples/jfk.wav
|
||||
@ -595,7 +597,7 @@ In order to have an objective comparison of the performance of the inference acr
|
||||
use the [whisper-bench](examples/bench) tool. The tool simply runs the Encoder part of the model and prints how much time it
|
||||
took to execute it. The results are summarized in the following Github issue:
|
||||
|
||||
[Benchmark results](https://github.com/ggml-org/whisper.cpp/issues/89)
|
||||
[Benchmark results](https://github.com/ggerganov/whisper.cpp/issues/89)
|
||||
|
||||
Additionally, a script to run whisper.cpp with different models and audio files is provided: [bench.py](scripts/bench.py).
|
||||
|
||||
@ -622,24 +624,25 @@ You can download the converted models using the [models/download-ggml-model.sh](
|
||||
or manually from here:
|
||||
|
||||
- https://huggingface.co/ggerganov/whisper.cpp
|
||||
- https://ggml.ggerganov.com
|
||||
|
||||
For more details, see the conversion script [models/convert-pt-to-ggml.py](models/convert-pt-to-ggml.py) or [models/README.md](models/README.md).
|
||||
|
||||
## [Bindings](https://github.com/ggml-org/whisper.cpp/discussions/categories/bindings)
|
||||
## [Bindings](https://github.com/ggerganov/whisper.cpp/discussions/categories/bindings)
|
||||
|
||||
- [x] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggml-org/whisper.cpp/discussions/310)
|
||||
- [x] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggml-org/whisper.cpp/discussions/309)
|
||||
- [x] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggerganov/whisper.cpp/discussions/310)
|
||||
- [x] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggerganov/whisper.cpp/discussions/309)
|
||||
- React Native (iOS / Android): [whisper.rn](https://github.com/mybigday/whisper.rn)
|
||||
- [x] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggml-org/whisper.cpp/discussions/312)
|
||||
- [x] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggerganov/whisper.cpp/discussions/312)
|
||||
- [x] Java:
|
||||
- [GiviMAD/whisper-jni](https://github.com/GiviMAD/whisper-jni)
|
||||
- [x] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggml-org/whisper.cpp/discussions/507)
|
||||
- [x] Objective-C / Swift: [ggml-org/whisper.spm](https://github.com/ggml-org/whisper.spm) | [#313](https://github.com/ggml-org/whisper.cpp/discussions/313)
|
||||
- [x] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggerganov/whisper.cpp/discussions/507)
|
||||
- [x] Objective-C / Swift: [ggerganov/whisper.spm](https://github.com/ggerganov/whisper.spm) | [#313](https://github.com/ggerganov/whisper.cpp/discussions/313)
|
||||
- [exPHAT/SwiftWhisper](https://github.com/exPHAT/SwiftWhisper)
|
||||
- [x] .NET: | [#422](https://github.com/ggml-org/whisper.cpp/discussions/422)
|
||||
- [x] .NET: | [#422](https://github.com/ggerganov/whisper.cpp/discussions/422)
|
||||
- [sandrohanea/whisper.net](https://github.com/sandrohanea/whisper.net)
|
||||
- [NickDarvey/whisper](https://github.com/NickDarvey/whisper)
|
||||
- [x] Python: | [#9](https://github.com/ggml-org/whisper.cpp/issues/9)
|
||||
- [x] Python: | [#9](https://github.com/ggerganov/whisper.cpp/issues/9)
|
||||
- [stlukey/whispercpp.py](https://github.com/stlukey/whispercpp.py) (Cython)
|
||||
- [AIWintermuteAI/whispercpp](https://github.com/AIWintermuteAI/whispercpp) (Updated fork of aarnphm/whispercpp)
|
||||
- [aarnphm/whispercpp](https://github.com/aarnphm/whispercpp) (Pybind11)
|
||||
@ -647,33 +650,6 @@ For more details, see the conversion script [models/convert-pt-to-ggml.py](model
|
||||
- [x] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
|
||||
- [x] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)
|
||||
|
||||
## XCFramework
|
||||
The XCFramework is a precompiled version of the library for iOS, visionOS, tvOS,
|
||||
and macOS. It can be used in Swift projects without the need to compile the
|
||||
library from source. For example:
|
||||
```swift
|
||||
// swift-tools-version: 5.10
|
||||
// The swift-tools-version declares the minimum version of Swift required to build this package.
|
||||
|
||||
import PackageDescription
|
||||
|
||||
let package = Package(
|
||||
name: "Whisper",
|
||||
targets: [
|
||||
.executableTarget(
|
||||
name: "Whisper",
|
||||
dependencies: [
|
||||
"WhisperFramework"
|
||||
]),
|
||||
.binaryTarget(
|
||||
name: "WhisperFramework",
|
||||
url: "https://github.com/ggml-org/whisper.cpp/releases/download/v1.7.5/whisper-v1.7.5-xcframework.zip",
|
||||
checksum: "c7faeb328620d6012e130f3d705c51a6ea6c995605f2df50f6e1ad68c59c6c4a"
|
||||
)
|
||||
]
|
||||
)
|
||||
```
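The `checksum` field in the `binaryTarget` above pins the downloaded archive. A minimal Python sketch for checking a downloaded zip locally, assuming the checksum is the SHA-256 digest of the archive file (which is what `swift package compute-checksum` prints); the helper name `sha256_of_file` is hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned string with the `checksum` value in the package manifest would show whether the local file matches the pinned release asset.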
|
||||
|
||||
## Examples
|
||||
|
||||
There are various examples of using the library for different projects in the [examples](examples) folder.
|
||||
@ -692,13 +668,13 @@ Some of the examples are even ported to run in the browser using WebAssembly. Ch
|
||||
| [whisper.android](examples/whisper.android) | | Android mobile application using whisper.cpp |
|
||||
| [whisper.nvim](examples/whisper.nvim) | | Speech-to-text plugin for Neovim |
|
||||
| [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |
|
||||
| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggml-org/whisper.cpp/issues/185) |
|
||||
| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggerganov/whisper.cpp/issues/185) |
|
||||
| [yt-wsp.sh](examples/yt-wsp.sh) | | Download + transcribe and/or translate any VOD [(original)](https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818) |
|
||||
| [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |
|
||||
|
||||
## [Discussions](https://github.com/ggml-org/whisper.cpp/discussions)
|
||||
## [Discussions](https://github.com/ggerganov/whisper.cpp/discussions)
|
||||
|
||||
If you have any kind of feedback about this project feel free to use the Discussions section and open a new topic.
|
||||
You can use the [Show and tell](https://github.com/ggml-org/whisper.cpp/discussions/categories/show-and-tell) category
|
||||
You can use the [Show and tell](https://github.com/ggerganov/whisper.cpp/discussions/categories/show-and-tell) category
|
||||
to share your own projects that use `whisper.cpp`. If you have a question, make sure to check the
|
||||
[Frequently asked questions (#126)](https://github.com/ggml-org/whisper.cpp/discussions/126) discussion.
|
||||
[Frequently asked questions (#126)](https://github.com/ggerganov/whisper.cpp/discussions/126) discussion.
|
||||
|
@ -51,7 +51,7 @@ func main() {
|
||||
In order to build, you need to have the Go compiler installed. You can get it from [here](https://golang.org/dl/). Run the tests with:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/ggml-org/whisper.cpp.git
|
||||
git clone https://github.com/ggerganov/whisper.cpp.git
|
||||
cd whisper.cpp/bindings/go
|
||||
make test
|
||||
```
|
||||
@ -98,7 +98,7 @@ The API Documentation:
|
||||
|
||||
Getting help:
|
||||
|
||||
* Follow the discussion for the go bindings [here](https://github.com/ggml-org/whisper.cpp/discussions/312)
|
||||
* Follow the discussion for the go bindings [here](https://github.com/ggerganov/whisper.cpp/discussions/312)
|
||||
|
||||
## License
|
||||
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
github.com/ggml-org/whisper.cpp/bindings/go
|
||||
github.com/ggerganov/whisper.cpp/bindings/go
|
||||
provides a speech-to-text service bindings for the Go programming language.
|
||||
*/
|
||||
package whisper
|
||||
|
@ -31,10 +31,10 @@ public class Example {
|
||||
var whisperParams = whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_GREEDY);
|
||||
// custom configuration if required
|
||||
whisperParams.temperature_inc = 0f;
|
||||
|
||||
|
||||
var samples = readAudio(); // divide each value by 32767.0f
|
||||
whisper.fullTranscribe(whisperParams, samples);
|
||||
|
||||
|
||||
int segmentCount = whisper.getTextSegmentCount(context);
|
||||
for (int i = 0; i < segmentCount; i++) {
|
||||
String text = whisper.getTextSegment(context, i);
|
||||
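The `// divide each value by 32767.0f` comment in the hunk above refers to scaling signed 16-bit PCM samples into floats before they are passed to `fullTranscribe`. A minimal Python sketch of that scaling, assuming the input is raw 16-bit little-endian mono PCM; the helper name `pcm16_to_float` is hypothetical and not part of the Java bindings:

```python
import array
import sys


def pcm16_to_float(raw: bytes) -> list[float]:
    """Scale signed 16-bit little-endian PCM samples to floats in about [-1.0, 1.0]."""
    samples = array.array("h")  # "h" = signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the input is assumed to be little-endian
    # Same divisor as in the Java comment above.
    return [s / 32767.0 for s in samples]
```

With this divisor the most negative sample (-32768) maps to slightly below -1.0, which is why the range is approximate.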
@ -52,7 +52,7 @@ public class Example {
|
||||
In order to build, you need to have the JDK 8 or higher installed. Run the tests with:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/ggml-org/whisper.cpp.git
|
||||
git clone https://github.com/ggerganov/whisper.cpp.git
|
||||
cd whisper.cpp/bindings/java
|
||||
|
||||
./gradlew build
|
||||
|
@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "whisper.cpp",
|
||||
"version": "1.7.5",
|
||||
"version": "1.7.4",
|
||||
"description": "Whisper speech recognition",
|
||||
"main": "whisper.js",
|
||||
"scripts": {
|
||||
|
@ -228,7 +228,7 @@ The second argument `samples` may be an array, an object with `length` and `each
|
||||
Development
|
||||
-----------
|
||||
|
||||
% git clone https://github.com/ggml-org/whisper.cpp.git
|
||||
% git clone https://github.com/ggerganov/whisper.cpp.git
|
||||
% cd whisper.cpp/bindings/ruby
|
||||
% rake test
|
||||
|
||||
@ -241,5 +241,5 @@ License
|
||||
|
||||
The same as [whisper.cpp][].
|
||||
|
||||
[whisper.cpp]: https://github.com/ggml-org/whisper.cpp
|
||||
[models]: https://github.com/ggml-org/whisper.cpp/tree/master/models
|
||||
[whisper.cpp]: https://github.com/ggerganov/whisper.cpp
|
||||
[models]: https://github.com/ggerganov/whisper.cpp/tree/master/models
|
||||
|
@ -918,7 +918,7 @@ ruby_whisper_params_initialize(int argc, VALUE *argv, VALUE self)
|
||||
return self;
|
||||
}
|
||||
|
||||
rb_get_kwargs(kw_hash, param_names, 0, RUBY_WHISPER_PARAMS_PARAM_NAMES_COUNT, values);
|
||||
rb_get_kwargs(kw_hash, &param_names, 0, RUBY_WHISPER_PARAMS_PARAM_NAMES_COUNT, &values);
|
||||
Data_Get_Struct(self, ruby_whisper_params, rwp);
|
||||
|
||||
for (i = 0; i < RUBY_WHISPER_PARAMS_PARAM_NAMES_COUNT; i++) {
|
||||
|
@ -34,7 +34,7 @@ module Whisper
|
||||
when /darwin/
|
||||
Pathname(Dir.home)/"Library/Caches"
|
||||
else
|
||||
ENV.key?("XDG_CACHE_HOME") ? Pathname(ENV["XDG_CACHE_HOME"]) : Pathname(Dir.home)/".cache"
|
||||
ENV.key?("XDG_CACHE_HOME") ? ENV["XDG_CACHE_HOME"] : Pathname(Dir.home)/".cache"
|
||||
end
|
||||
base/"whisper.cpp"
|
||||
end
|
||||
|
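The two `else` variants above differ only in whether `ENV["XDG_CACHE_HOME"]` is wrapped in `Pathname` before `base/"whisper.cpp"` joins onto it. Ruby's `String` does not define `/`, so the join works only when `base` is a `Pathname`. A rough Python analogue of the visible branches, using `pathlib`; the function name and its parameters are illustrative assumptions, and any branches outside the hunk are not modelled:

```python
from pathlib import Path


def cache_base(platform: str, env: dict[str, str], home: Path) -> Path:
    """Pick a cache directory the way the darwin/else branches above do."""
    if platform == "darwin":
        base = home / "Library" / "Caches"
    elif "XDG_CACHE_HOME" in env:
        # Wrap the raw string in Path so the "/" join below is defined.
        base = Path(env["XDG_CACHE_HOME"])
    else:
        base = home / ".cache"
    return base / "whisper.cpp"
```

For example, `cache_base("linux", {}, Path("/home/u"))` falls through to the `.cache` default under the home directory.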
@ -26,7 +26,7 @@ Gem::Specification.new do |s|
|
||||
s.required_ruby_version = '>= 3.1.0'
|
||||
|
||||
#### Documentation and testing.
|
||||
s.homepage = 'https://github.com/ggml-org/whisper.cpp'
|
||||
s.homepage = 'https://github.com/ggerganov/whisper.cpp'
|
||||
s.rdoc_options = ['--main', 'README.md']
|
||||
|
||||
|
||||
|
@ -41,11 +41,6 @@ COMMON_CMAKE_ARGS=(
|
||||
-DGGML_OPENMP=${GGML_OPENMP}
|
||||
)
|
||||
|
||||
XCODE_VERSION=$(xcodebuild -version 2>/dev/null | head -n1 | awk '{ print $2 }')
|
||||
MAJOR_VERSION=$(echo $XCODE_VERSION | cut -d. -f1)
|
||||
MINOR_VERSION=$(echo $XCODE_VERSION | cut -d. -f2)
|
||||
echo "Detected Xcode version: $XCODE_VERSION"
|
||||
|
||||
check_required_tool() {
|
||||
local tool=$1
|
||||
local install_message=$2
|
||||
@ -340,28 +335,21 @@ combine_static_libraries() {
|
||||
|
||||
# Platform-specific post-processing for device builds
|
||||
if [[ "$is_simulator" == "false" ]]; then
|
||||
if command -v xcrun vtool &>/dev/null; then
|
||||
if command -v vtool &>/dev/null; then
|
||||
case "$platform" in
|
||||
"ios")
|
||||
echo "Marking binary as a framework binary for iOS..."
|
||||
xcrun vtool -set-build-version ios ${IOS_MIN_OS_VERSION} ${IOS_MIN_OS_VERSION} -replace \
|
||||
vtool -set-build-version ios ${IOS_MIN_OS_VERSION} ${IOS_MIN_OS_VERSION} -replace \
|
||||
-output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
|
||||
;;
|
||||
"visionos")
|
||||
echo "Marking binary as a framework binary for visionOS..."
|
||||
if [[ "$MAJOR_VERSION" -gt 16 ]] || [[ "$MAJOR_VERSION" -eq 16 && "$MINOR_VERSION" -gt 2 ]]; then
|
||||
echo "Xcode version greater than 16.2, using visionOS."
|
||||
VISION_OS_BUILD_VERSION="visionos"
|
||||
else
|
||||
echo "Xcode version less than or equal to 16.2, using xros."
|
||||
VISION_OS_BUILD_VERSION="xros"
|
||||
fi
|
||||
xcrun vtool -set-build-version ${VISION_OS_BUILD_VERSION} ${VISIONOS_MIN_OS_VERSION} ${VISIONOS_MIN_OS_VERSION} -replace \
|
||||
vtool -set-build-version xros ${VISIONOS_MIN_OS_VERSION} ${VISIONOS_MIN_OS_VERSION} -replace \
|
||||
-output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
|
||||
;;
|
||||
"tvos")
|
||||
echo "Marking binary as a framework binary for tvOS..."
|
||||
xcrun vtool -set-build-version tvos ${TVOS_MIN_OS_VERSION} ${TVOS_MIN_OS_VERSION} -replace \
|
||||
vtool -set-build-version tvos ${TVOS_MIN_OS_VERSION} ${TVOS_MIN_OS_VERSION} -replace \
|
||||
-output "${base_dir}/${output_lib}" "${base_dir}/${output_lib}"
|
||||
;;
|
||||
esac
|
||||
|
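The `visionos` branch above selects the `vtool` platform name from the detected Xcode version: `visionos` for versions greater than 16.2, otherwise `xros`. A minimal Python sketch of the same comparison, assuming version strings of the form `major.minor` (optionally followed by further components); the function name is hypothetical:

```python
def vtool_platform_name(xcode_version: str) -> str:
    """Return "visionos" for Xcode versions above 16.2, otherwise "xros"."""
    parts = xcode_version.split(".")
    major = int(parts[0])
    # Treat a missing minor component as 0 (an assumption of this sketch).
    minor = int(parts[1]) if len(parts) > 1 else 0
    # Tuple comparison mirrors: major > 16, or major == 16 and minor > 2.
    return "visionos" if (major, minor) > (16, 2) else "xros"
```

Comparing `(major, minor)` as a tuple keeps the two shell conditions in one expression and avoids comparing version strings lexically.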
@ -4,7 +4,7 @@ A very basic tool for benchmarking the inference performance on your device. The
|
||||
the transformer on some random audio data and records the execution time. This way we can have an objective comparison
|
||||
of the performance of the model for various setups.
|
||||
|
||||
Benchmark results are tracked in the following Github issue: https://github.com/ggml-org/whisper.cpp/issues/89
|
||||
Benchmark results are tracked in the following Github issue: https://github.com/ggerganov/whisper.cpp/issues/89
|
||||
|
||||
```bash
|
||||
# run the bench tool on the small.en model using 4 threads
|
||||
@ -40,7 +40,7 @@ system_info: n_threads = 4 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 1 | WA
|
||||
|
||||
If you wish, you can submit these results here:
|
||||
|
||||
https://github.com/ggml-org/whisper.cpp/issues/89
|
||||
https://github.com/ggerganov/whisper.cpp/issues/89
|
||||
|
||||
Please include the following information:
|
||||
|
||||
|
@ -3,7 +3,7 @@
|
||||
// Speak short text commands to the microphone.
|
||||
// This program will detect your voice commands and convert them to text.
|
||||
//
|
||||
// ref: https://github.com/ggml-org/whisper.cpp/issues/171
|
||||
// ref: https://github.com/ggerganov/whisper.cpp/issues/171
|
||||
//
|
||||
|
||||
#include "common-sdl.h"
|
||||
|
@ -2,7 +2,7 @@
|
||||
#
|
||||
# Transcribe audio livestream by feeding ffmpeg output to whisper.cpp at regular intervals
|
||||
# Idea by @semiformal-net
|
||||
# ref: https://github.com/ggml-org/whisper.cpp/issues/185
|
||||
# ref: https://github.com/ggerganov/whisper.cpp/issues/185
|
||||
#
|
||||
|
||||
set -eo pipefail
|
||||
|
@@ -1,115 +1,39 @@
 import http.server
 import socketserver
 import os
-import sys
 from pathlib import Path
-import urllib.parse

 SCRIPT_DIR = Path(__file__).parent.absolute()
 DIRECTORY = os.path.join(SCRIPT_DIR, "../build-em/bin")
 DIRECTORY = os.path.abspath(DIRECTORY)

-# The context root we want for all applications
-CONTEXT_ROOT = "/whisper.cpp"
-
 class CustomHTTPRequestHandler(http.server.SimpleHTTPRequestHandler):
     def __init__(self, *args, **kwargs):
         super().__init__(*args, directory=DIRECTORY, **kwargs)

     def do_GET(self):
-        # Redirect root to the context root
-        if self.path == '/':
-            self.send_response(302)
-            self.send_header('Location', CONTEXT_ROOT + '/')
-            self.end_headers()
-            return
-
-        # Handle requests under the context root
-        if self.path.startswith(CONTEXT_ROOT):
-            # Remove the context root prefix to get the actual path
-            actual_path = self.path[len(CONTEXT_ROOT):]
-
-            if not actual_path:
-                self.send_response(302)
-                self.send_header('Location', CONTEXT_ROOT + '/')
-                self.end_headers()
-                return
-
-            if '.worker.js' in actual_path:
-                worker_file = os.path.basename(actual_path)
-                worker_path = os.path.join(DIRECTORY, worker_file)
-
-                if os.path.exists(worker_path):
-                    print(f"Found worker file: {worker_path}")
-                    self.path = '/' + worker_file
-                else:
-                    print(f"Worker file not found: {worker_path}")
-
-            elif actual_path == '/':
-                self.path = '/whisper.wasm/index.html'
-            elif actual_path.startswith('/bench.wasm/') or actual_path.startswith('/command.wasm/') or actual_path.startswith('/stream.wasm/'):
-                # Keep the path as is, just remove the context root
-                self.path = actual_path
-            # For all other paths under the context root
-            else:
-                # Check if this is a request to a file in whisper.wasm
-                potential_file = os.path.join(DIRECTORY, 'whisper.wasm', actual_path.lstrip('/'))
-                if os.path.exists(potential_file) and not os.path.isdir(potential_file):
-                    self.path = '/whisper.wasm' + actual_path
-                else:
-                    # Try to resolve the file from the base directory
-                    potential_file = os.path.join(DIRECTORY, actual_path.lstrip('/'))
-                    if os.path.exists(potential_file):
-                        self.path = actual_path
-
-        # For direct requests to worker files (without context root as these
-        # are in the build-em/bin directory
-        elif '.worker.js' in self.path:
+        # If requesting a worker file from any subdirectory
+        if '.worker.js' in self.path:
             worker_file = os.path.basename(self.path)
             worker_path = os.path.join(DIRECTORY, worker_file)

             if os.path.exists(worker_path):
                 self.path = '/' + worker_file

-        # Handle coi-serviceworker.js separately
-        if 'coi-serviceworker.js' in self.path:
-            worker_file = "coi-serviceworker.js"
-            worker_path = os.path.join(SCRIPT_DIR, worker_file)
-            if os.path.exists(worker_path):
-                self.send_response(200)
-                self.send_header('Content-type', 'application/javascript')
-                self.end_headers()
-                with open(worker_path, 'rb') as file:
-                    self.wfile.write(file.read())
-                return
-            else:
-                print(f"Warning: Could not find {worker_path}")
-
         return super().do_GET()

     def end_headers(self):
         # Add required headers for SharedArrayBuffer
         self.send_header("Cross-Origin-Opener-Policy", "same-origin")
         self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
-        self.send_header("Access-Control-Allow-Origin", "*")
+        self.send_header("Access-Control-Allow-Origin", "*");
         super().end_headers()

 PORT = 8000

-# Enable address reuse
-class CustomServer(socketserver.TCPServer):
-    allow_reuse_address = True
-
-try:
-    with CustomServer(("", PORT), CustomHTTPRequestHandler) as httpd:
-        print(f"Serving directory '{DIRECTORY}' at http://localhost:{PORT}")
-        print(f"Application context root: http://localhost:{PORT}{CONTEXT_ROOT}/")
-        try:
-            httpd.serve_forever()
-        except KeyboardInterrupt:
-            print("\nServer stopped.")
-            # Force complete exit
-            sys.exit(0)
-except OSError as e:
-    print(f"Error: {e}")
-    sys.exit(1)
+with socketserver.TCPServer(("", PORT), CustomHTTPRequestHandler) as httpd:
+    print(f"Serving directory '{DIRECTORY}' at http://localhost:{PORT}")
+    try:
+        httpd.serve_forever()
+    except KeyboardInterrupt:
+        print("\nServer stopped.")
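The longer `do_GET` variant in this diff strips a context root (`/whisper.cpp`) from the request path before serving files. The core of that routing can be sketched as a pure function. This is a simplified illustration only: it skips the filesystem checks and the worker-file special cases, and the `route` function and its return shape are hypothetical rather than taken from the script.

```python
CONTEXT_ROOT = "/whisper.cpp"


def route(path: str, context_root: str = CONTEXT_ROOT) -> tuple[str, str]:
    """Return ("redirect", location) or ("serve", rewritten_path)."""
    if path == "/":
        # The bare root redirects to the context root.
        return ("redirect", context_root + "/")
    if path.startswith(context_root):
        actual = path[len(context_root):]
        if not actual:
            # "/whisper.cpp" without a trailing slash also redirects.
            return ("redirect", context_root + "/")
        if actual == "/":
            return ("serve", "/whisper.wasm/index.html")
        # Everything else is served with the prefix removed.
        return ("serve", actual)
    # Paths outside the context root are served unchanged.
    return ("serve", path)
```

Keeping the rewrite logic free of I/O like this would make it straightforward to unit-test separately from the request handler.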
@ -2,7 +2,7 @@
|
||||
#
|
||||
# Transcribe twitch.tv livestream by feeding audio input to whisper.cpp at regular intervals
|
||||
# Thanks to @keyehzy
|
||||
# ref: https://github.com/ggml-org/whisper.cpp/issues/209
|
||||
# ref: https://github.com/ggerganov/whisper.cpp/issues/209
|
||||
#
|
||||
# The script currently depends on the third-party tool "streamlink"
|
||||
# On Mac OS, you can install it via "brew install streamlink"
|
||||
|
@ -5,7 +5,7 @@
|
||||
# This simple script is called by Neovim to capture audio from the microphone and transcribe it with Whisper.
|
||||
# In order for this to work, you need to clone the whisper.cpp repo and build the 'stream' tool
|
||||
#
|
||||
# git clone https://github.com/ggml-org/whisper.cpp
|
||||
# git clone https://github.com/ggerganov/whisper.cpp
|
||||
# cd whisper.cpp
|
||||
# make stream
|
||||
#
|
||||
@ -31,7 +31,7 @@
|
||||
model="base.en"
|
||||
|
||||
# export the path to the whisper.cpp repo in the WHISPER_CPP_HOME env variable
|
||||
# https://github.com/ggml-org/whisper.cpp
|
||||
# https://github.com/ggerganov/whisper.cpp
|
||||
cd "${WHISPER_CPP_HOME}"
|
||||
|
||||
if [ ! -f ./stream ] ; then
|
||||
|
@ -30,7 +30,7 @@ Link: https://ggerganov.github.io/whisper.cpp/
|
||||
|
||||
```bash (v3.1.2)
|
||||
# build using Emscripten
|
||||
git clone https://github.com/ggml-org/whisper.cpp
|
||||
git clone https://github.com/ggerganov/whisper.cpp
|
||||
cd whisper.cpp
|
||||
mkdir build-em && cd build-em
|
||||
emcmake cmake ..
|
||||
|
@ -65,14 +65,13 @@ EMSCRIPTEN_BINDINGS(whisper) {
|
||||
}
|
||||
|
||||
struct whisper_full_params params = whisper_full_default_params(whisper_sampling_strategy::WHISPER_SAMPLING_GREEDY);
|
||||
bool is_multilingual = whisper_is_multilingual(g_contexts[index]);
|
||||
|
||||
params.print_realtime = true;
|
||||
params.print_progress = false;
|
||||
params.print_timestamps = true;
|
||||
params.print_special = false;
|
||||
params.translate = translate;
|
||||
params.language = is_multilingual ? strdup(lang.c_str()) : "en";
|
||||
params.language = whisper_is_multilingual(g_contexts[index]) ? lang.c_str() : "en";
|
||||
params.n_threads = std::min(nthreads, std::min(16, mpow2(std::thread::hardware_concurrency())));
|
||||
params.offset_ms = 0;
|
||||
|
||||
@ -103,13 +102,10 @@ EMSCRIPTEN_BINDINGS(whisper) {
|
||||
|
||||
// run the worker
|
||||
{
|
||||
g_worker = std::thread([index, params, pcmf32 = std::move(pcmf32), is_multilingual]() {
|
||||
g_worker = std::thread([index, params, pcmf32 = std::move(pcmf32)]() {
|
||||
whisper_reset_timings(g_contexts[index]);
|
||||
whisper_full(g_contexts[index], params, pcmf32.data(), pcmf32.size());
|
||||
whisper_print_timings(g_contexts[index]);
|
||||
if (is_multilingual) {
|
||||
free((void*)params.language);
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
|
@ -25,12 +25,12 @@
|
||||
# SOFTWARE.
|
||||
|
||||
# Small shell script to more easily automatically download and transcribe live stream VODs.
|
||||
# This uses YT-DLP, ffmpeg and the CPP version of Whisper: https://github.com/ggml-org/whisper.cpp
|
||||
# This uses YT-DLP, ffmpeg and the CPP version of Whisper: https://github.com/ggerganov/whisper.cpp
|
||||
# Use `./examples/yt-wsp.sh help` to print help info.
|
||||
#
|
||||
# Sample usage:
|
||||
#
|
||||
# git clone https://github.com/ggml-org/whisper.cpp
|
||||
# git clone https://github.com/ggerganov/whisper.cpp
|
||||
# cd whisper.cpp
|
||||
# make
|
||||
# ./examples/yt-wsp.sh https://www.youtube.com/watch?v=1234567890
|
||||
@ -44,7 +44,7 @@ SCRIPT_DIR="${SCRIPT_PATH%/*}"
|
||||
|
||||
################################################################################
|
||||
# Documentation on downloading models can be found in the whisper.cpp repo:
|
||||
# https://github.com/ggml-org/whisper.cpp/#usage
|
||||
# https://github.com/ggerganov/whisper.cpp/#usage
|
||||
#
|
||||
# note: unless a multilingual model is specified, WHISPER_LANG will be ignored
|
||||
# and the video will be transcribed as if the audio were in the English language
|
||||
@ -103,10 +103,10 @@ check_requirements() {
fi;

if ! command -v "${WHISPER_EXECUTABLE}" &>/dev/null; then
echo "The C++ implementation of Whisper is required: https://github.com/ggml-org/whisper.cpp"
echo "The C++ implementation of Whisper is required: https://github.com/ggerganov/whisper.cpp"
echo "Sample usage:";
echo "";
echo " git clone https://github.com/ggml-org/whisper.cpp";
echo " git clone https://github.com/ggerganov/whisper.cpp";
echo " cd whisper.cpp";
echo " make";
echo " ./examples/yt-wsp.sh https://www.youtube.com/watch?v=1234567890";

@ -25,6 +25,7 @@ You can now use it like this:
`ggml` models are available from the following locations:

- https://huggingface.co/ggerganov/whisper.cpp/tree/main
- https://ggml.ggerganov.com

### 3. Convert with [convert-pt-to-ggml.py](convert-pt-to-ggml.py)

@ -77,7 +78,7 @@ OpenAI format. To read the HF models you can use the [convert-h5-to-ggml.py](con

```bash
git clone https://github.com/openai/whisper
git clone https://github.com/ggml-org/whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp

# clone HF fine-tuned model (this is just an example)
git clone https://huggingface.co/openai/whisper-medium
@ -95,7 +96,7 @@ Currently, the chunk-based transcription strategy is not implemented, so there c
```bash
# clone OpenAI whisper and whisper.cpp
git clone https://github.com/openai/whisper
git clone https://github.com/ggml-org/whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp

# get the models
cd whisper.cpp/models

@ -3,7 +3,7 @@
# Usage:
#
# git clone https://github.com/openai/whisper
# git clone https://github.com/ggml-org/whisper.cpp
# git clone https://github.com/ggerganov/whisper.cpp
# git clone https://huggingface.co/openai/whisper-medium
#
# python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium/ ./whisper .
@ -12,7 +12,7 @@
#
# For more info:
#
# https://github.com/ggml-org/whisper.cpp/issues/157
# https://github.com/ggerganov/whisper.cpp/issues/157
#

import io

@ -1,4 +1,4 @@
## M1 Pro (old 22c96b4)
## M1 Pro

make -j && ./scripts/bench-all.sh 8

@ -67,184 +67,202 @@ make -j && ./scripts/bench-all.sh 8

Running memcpy benchmark

memcpy: 48.01 GB/s (heat-up)
memcpy: 56.00 GB/s ( 1 thread)
memcpy: 56.20 GB/s ( 1 thread)
memcpy: 102.69 GB/s ( 2 thread)
memcpy: 140.32 GB/s ( 3 thread)
memcpy: 179.04 GB/s ( 4 thread)
memcpy: 159.61 GB/s ( 5 thread)
memcpy: 159.02 GB/s ( 6 thread)
memcpy: 180.29 GB/s ( 7 thread)
memcpy: 198.10 GB/s ( 8 thread)
sum: -5119999345.000000
memcpy: 46.58 GB/s (heat-up)
memcpy: 54.16 GB/s ( 1 thread)
memcpy: 54.23 GB/s ( 1 thread)
memcpy: 99.63 GB/s ( 2 thread)
memcpy: 140.59 GB/s ( 3 thread)
memcpy: 176.52 GB/s ( 4 thread)
memcpy: 158.90 GB/s ( 5 thread)
memcpy: 163.00 GB/s ( 6 thread)
memcpy: 189.69 GB/s ( 7 thread)
memcpy: 197.15 GB/s ( 8 thread)
sum: -5120002007.000000

make -j && ./scripts/bench-all.sh 1

Running ggml_mul_mat benchmark with 1 threads

64 x 64: Q4_0 37.7 GFLOPS (128 runs) | Q4_1 36.0 GFLOPS (128 runs)
64 x 64: Q5_0 20.1 GFLOPS (128 runs) | Q5_1 19.8 GFLOPS (128 runs) | Q8_0 39.5 GFLOPS (128 runs)
64 x 64: F16 29.9 GFLOPS (128 runs) | F32 22.6 GFLOPS (128 runs)
128 x 128: Q4_0 71.0 GFLOPS (128 runs) | Q4_1 62.2 GFLOPS (128 runs)
128 x 128: Q5_0 33.4 GFLOPS (128 runs) | Q5_1 31.6 GFLOPS (128 runs) | Q8_0 79.8 GFLOPS (128 runs)
128 x 128: F16 52.4 GFLOPS (128 runs) | F32 32.7 GFLOPS (128 runs)
256 x 256: Q4_0 88.6 GFLOPS (128 runs) | Q4_1 77.2 GFLOPS (128 runs)
256 x 256: Q5_0 40.3 GFLOPS (128 runs) | Q5_1 36.8 GFLOPS (128 runs) | Q8_0 102.5 GFLOPS (128 runs)
256 x 256: F16 64.6 GFLOPS (128 runs) | F32 36.4 GFLOPS (128 runs)
512 x 512: Q4_0 94.7 GFLOPS (128 runs) | Q4_1 83.6 GFLOPS (128 runs)
512 x 512: Q5_0 45.9 GFLOPS (128 runs) | Q5_1 41.3 GFLOPS (128 runs) | Q8_0 112.8 GFLOPS (128 runs)
512 x 512: F16 72.3 GFLOPS (128 runs) | F32 37.7 GFLOPS (128 runs)
1024 x 1024: Q4_0 98.9 GFLOPS ( 47 runs) | Q4_1 88.2 GFLOPS ( 42 runs)
1024 x 1024: Q5_0 49.0 GFLOPS ( 23 runs) | Q5_1 43.9 GFLOPS ( 21 runs) | Q8_0 121.0 GFLOPS ( 57 runs)
1024 x 1024: F16 72.6 GFLOPS ( 34 runs) | F32 36.0 GFLOPS ( 17 runs)
2048 x 2048: Q4_0 101.3 GFLOPS ( 6 runs) | Q4_1 90.0 GFLOPS ( 6 runs)
2048 x 2048: Q5_0 50.8 GFLOPS ( 3 runs) | Q5_1 45.3 GFLOPS ( 3 runs) | Q8_0 124.1 GFLOPS ( 8 runs)
2048 x 2048: F16 70.7 GFLOPS ( 5 runs) | F32 30.4 GFLOPS ( 3 runs)
4096 x 4096: Q4_0 101.7 GFLOPS ( 3 runs) | Q4_1 90.3 GFLOPS ( 3 runs)
4096 x 4096: Q5_0 52.2 GFLOPS ( 3 runs) | Q5_1 45.7 GFLOPS ( 3 runs) | Q8_0 123.0 GFLOPS ( 3 runs)
4096 x 4096: F16 60.3 GFLOPS ( 3 runs) | F32 29.8 GFLOPS ( 3 runs)
64 x 64: Q4_0 245.8 GFLOPS (128 runs) | Q4_1 168.6 GFLOPS (128 runs)
64 x 64: Q5_0 115.7 GFLOPS (128 runs) | Q5_1 125.9 GFLOPS (128 runs) | Q8_0 215.8 GFLOPS (128 runs)
64 x 64: F16 139.5 GFLOPS (128 runs) | F32 337.2 GFLOPS (128 runs)
128 x 128: Q4_0 494.8 GFLOPS (128 runs) | Q4_1 350.4 GFLOPS (128 runs)
128 x 128: Q5_0 257.1 GFLOPS (128 runs) | Q5_1 261.4 GFLOPS (128 runs) | Q8_0 509.4 GFLOPS (128 runs)
128 x 128: F16 302.3 GFLOPS (128 runs) | F32 672.8 GFLOPS (128 runs)
256 x 256: Q4_0 795.7 GFLOPS (128 runs) | Q4_1 663.7 GFLOPS (128 runs)
256 x 256: Q5_0 737.8 GFLOPS (128 runs) | Q5_1 757.6 GFLOPS (128 runs) | Q8_0 827.7 GFLOPS (128 runs)
256 x 256: F16 872.6 GFLOPS (128 runs) | F32 956.3 GFLOPS (128 runs)
512 x 512: Q4_0 1188.0 GFLOPS (128 runs) | Q4_1 1085.0 GFLOPS (128 runs)
512 x 512: Q5_0 1421.1 GFLOPS (128 runs) | Q5_1 1454.9 GFLOPS (128 runs) | Q8_0 1191.4 GFLOPS (128 runs)
512 x 512: F16 1577.4 GFLOPS (128 runs) | F32 1982.0 GFLOPS (128 runs)
1024 x 1024: Q4_0 2342.6 GFLOPS (128 runs) | Q4_1 1955.8 GFLOPS (128 runs)
1024 x 1024: Q5_0 2306.7 GFLOPS (128 runs) | Q5_1 2217.0 GFLOPS (128 runs) | Q8_0 2230.7 GFLOPS (128 runs)
1024 x 1024: F16 2593.8 GFLOPS (128 runs) | F32 3269.0 GFLOPS (128 runs)
2048 x 2048: Q4_0 3735.7 GFLOPS (128 runs) | Q4_1 3205.3 GFLOPS (128 runs)
2048 x 2048: Q5_0 3584.5 GFLOPS (128 runs) | Q5_1 3621.7 GFLOPS (128 runs) | Q8_0 3622.3 GFLOPS (128 runs)
2048 x 2048: F16 3763.6 GFLOPS (128 runs) | F32 4153.3 GFLOPS (128 runs)
4096 x 4096: Q4_0 3891.1 GFLOPS ( 29 runs) | Q4_1 3554.0 GFLOPS ( 26 runs)
4096 x 4096: Q5_0 3753.1 GFLOPS ( 28 runs) | Q5_1 3750.1 GFLOPS ( 28 runs) | Q8_0 3768.5 GFLOPS ( 28 runs)
4096 x 4096: F16 3864.2 GFLOPS ( 29 runs) | F32 3970.5 GFLOPS ( 29 runs)

make -j && ./scripts/bench-all.sh 1 1 0

| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 ULTRA | METAL | tiny | 1 | 0 | 8.74 | 1.20 | 0.36 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 10.30 | 1.15 | 0.38 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 10.71 | 1.13 | 0.38 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q8_0 | 1 | 0 | 9.97 | 1.12 | 0.37 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | base | 1 | 0 | 16.77 | 1.71 | 0.44 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 16.92 | 1.63 | 0.44 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 16.84 | 1.63 | 0.44 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q8_0 | 1 | 0 | 16.12 | 1.63 | 0.44 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | small | 1 | 0 | 45.29 | 3.44 | 0.92 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 50.43 | 3.34 | 0.94 | 0.06 | ad4e350 |
| M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 50.49 | 3.35 | 0.93 | 0.06 | ad4e350 |
| M2 ULTRA | METAL | small-q8_0 | 1 | 0 | 47.37 | 3.20 | 0.91 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | medium | 1 | 0 | 122.81 | 7.39 | 1.99 | 0.12 | ad4e350 |
| M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 140.62 | 6.73 | 2.03 | 0.14 | ad4e350 |
| M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 140.44 | 6.74 | 2.04 | 0.14 | ad4e350 |
| M2 ULTRA | METAL | medium-q8_0 | 1 | 0 | 131.05 | 6.54 | 1.95 | 0.13 | ad4e350 |
| M2 ULTRA | METAL | medium-dis | 1 | 0 | 110.95 | 0.99 | 0.24 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | large-v2 | 1 | 0 | 222.19 | 10.93 | 3.01 | 0.21 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 258.47 | 9.75 | 3.01 | 0.25 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 258.40 | 9.85 | 3.01 | 0.24 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q8_0 | 1 | 0 | 236.68 | 9.61 | 2.85 | 0.23 | ad4e350 |
| M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 199.28 | 1.12 | 0.27 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo | 1 | 0 | 201.49 | 1.76 | 0.45 | 0.03 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 0 | 233.70 | 1.55 | 0.46 | 0.04 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 0 | 214.20 | 1.51 | 0.44 | 0.04 | ad4e350 |
| M2 ULTRA | METAL | tiny | 1 | 0 | 12.32 | 1.35 | 0.49 | 0.01 | 22c96b4 |
| M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 11.65 | 1.30 | 0.51 | 0.01 | 22c96b4 |
| M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 12.08 | 1.30 | 0.51 | 0.01 | 22c96b4 |
| M2 ULTRA | METAL | base | 1 | 0 | 17.58 | 1.90 | 0.76 | 0.02 | 22c96b4 |
| M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 18.89 | 1.86 | 0.79 | 0.02 | 22c96b4 |
| M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 20.69 | 1.88 | 0.79 | 0.02 | 22c96b4 |
| M2 ULTRA | METAL | small | 1 | 0 | 49.32 | 3.85 | 1.71 | 0.05 | 22c96b4 |
| M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 54.91 | 3.81 | 1.82 | 0.06 | 22c96b4 |
| M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 54.92 | 3.81 | 1.79 | 0.06 | 22c96b4 |
| M2 ULTRA | METAL | medium | 1 | 0 | 134.34 | 8.04 | 3.82 | 0.13 | 22c96b4 |
| M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 151.68 | 7.59 | 4.07 | 0.14 | 22c96b4 |
| M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 151.58 | 7.67 | 4.07 | 0.14 | 22c96b4 |
| M2 ULTRA | METAL | medium-dis | 1 | 0 | 120.82 | 1.07 | 0.41 | 0.02 | 22c96b4 |
| M2 ULTRA | METAL | large-v2 | 1 | 0 | 235.63 | 12.27 | 5.85 | 0.22 | 22c96b4 |
| M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 273.38 | 11.17 | 6.40 | 0.26 | 22c96b4 |
| M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 272.44 | 11.32 | 6.29 | 0.26 | 22c96b4 |
| M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 212.51 | 1.20 | 0.47 | 0.02 | 22c96b4 |

make -j && ./scripts/bench-all.sh 1 1 1

| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 ULTRA | METAL | tiny | 1 | 1 | 7.82 | 1.31 | 0.35 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 8.32 | 1.28 | 0.37 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.21 | 1.28 | 0.37 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q8_0 | 1 | 1 | 7.97 | 1.23 | 0.36 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | base | 1 | 1 | 13.96 | 1.80 | 0.42 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 15.19 | 1.75 | 0.42 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 15.09 | 1.75 | 0.42 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q8_0 | 1 | 1 | 14.45 | 1.70 | 0.41 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | small | 1 | 1 | 40.08 | 3.54 | 0.86 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 45.07 | 3.51 | 0.88 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 45.05 | 3.52 | 0.88 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | small-q8_0 | 1 | 1 | 42.04 | 3.34 | 0.85 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | medium | 1 | 1 | 107.20 | 7.28 | 1.79 | 0.11 | ad4e350 |
| M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 125.02 | 6.67 | 1.83 | 0.12 | ad4e350 |
| M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 124.83 | 6.70 | 1.84 | 0.12 | ad4e350 |
| M2 ULTRA | METAL | medium-q8_0 | 1 | 1 | 114.56 | 6.53 | 1.79 | 0.11 | ad4e350 |
| M2 ULTRA | METAL | medium-dis | 1 | 1 | 95.96 | 1.01 | 0.23 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | large-v2 | 1 | 1 | 194.29 | 10.57 | 2.67 | 0.20 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 230.74 | 9.57 | 2.73 | 0.23 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 229.97 | 9.69 | 2.74 | 0.23 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q8_0 | 1 | 1 | 208.11 | 9.37 | 2.60 | 0.21 | ad4e350 |
| M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 172.72 | 1.12 | 0.26 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo | 1 | 1 | 174.46 | 1.74 | 0.42 | 0.03 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 1 | 205.78 | 1.54 | 0.42 | 0.04 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 1 | 186.33 | 1.50 | 0.40 | 0.03 | ad4e350 |
| M2 ULTRA | METAL | tiny | 1 | 1 | 9.07 | 1.33 | 0.45 | 0.01 | 22c96b4 |
| M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 9.74 | 1.33 | 0.47 | 0.01 | 22c96b4 |
| M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.93 | 1.31 | 0.46 | 0.01 | 22c96b4 |
| M2 ULTRA | METAL | base | 1 | 1 | 15.75 | 1.87 | 0.71 | 0.02 | 22c96b4 |
| M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 17.04 | 1.83 | 0.74 | 0.02 | 22c96b4 |
| M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 17.17 | 1.83 | 0.74 | 0.02 | 22c96b4 |
| M2 ULTRA | METAL | small | 1 | 1 | 42.33 | 3.64 | 1.60 | 0.05 | 22c96b4 |
| M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 47.61 | 3.63 | 1.70 | 0.05 | 22c96b4 |
| M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 47.70 | 3.66 | 1.68 | 0.05 | 22c96b4 |
| M2 ULTRA | METAL | medium | 1 | 1 | 114.42 | 7.53 | 3.55 | 0.11 | 22c96b4 |
| M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 132.63 | 7.02 | 3.77 | 0.13 | 22c96b4 |
| M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 132.28 | 7.10 | 3.76 | 0.13 | 22c96b4 |
| M2 ULTRA | METAL | medium-dis | 1 | 1 | 102.34 | 1.01 | 0.42 | 0.01 | 22c96b4 |
| M2 ULTRA | METAL | large-v2 | 1 | 1 | 203.01 | 11.03 | 5.45 | 0.20 | 22c96b4 |
| M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 240.05 | 10.18 | 5.98 | 0.23 | 22c96b4 |
| M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 239.22 | 10.23 | 5.87 | 0.23 | 22c96b4 |
| M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 181.14 | 1.14 | 0.48 | 0.02 | 22c96b4 |

## M4 Max

make -j && ./scripts/bench-all.sh 8
## Ryzen 9 5950X + RTX 2060

make -j && ./scripts/bench-all.sh 8 0 0

Running memcpy benchmark

memcpy: 57.23 GB/s (heat-up)
memcpy: 68.85 GB/s ( 1 thread)
memcpy: 70.00 GB/s ( 1 thread)
memcpy: 104.83 GB/s ( 2 thread)
memcpy: 124.54 GB/s ( 3 thread)
memcpy: 144.30 GB/s ( 4 thread)
memcpy: 141.24 GB/s ( 5 thread)
memcpy: 147.03 GB/s ( 6 thread)
memcpy: 147.18 GB/s ( 7 thread)
memcpy: 149.83 GB/s ( 8 thread)
sum: -5120001475.000000
memcpy: 12.36 GB/s (heat-up)
memcpy: 12.33 GB/s ( 1 thread)
memcpy: 12.38 GB/s ( 1 thread)
memcpy: 14.48 GB/s ( 2 thread)
memcpy: 15.00 GB/s ( 3 thread)
memcpy: 14.77 GB/s ( 4 thread)
memcpy: 14.60 GB/s ( 5 thread)
memcpy: 14.57 GB/s ( 6 thread)
memcpy: 14.34 GB/s ( 7 thread)
memcpy: 14.40 GB/s ( 8 thread)
sum: -5119998076.000000

Running ggml_mul_mat benchmark with 8 threads

64 x 64: Q4_0 3.1 GFLOPS (128 runs) | Q4_1 3.1 GFLOPS (128 runs)
64 x 64: Q5_0 3.0 GFLOPS (128 runs) | Q5_1 2.9 GFLOPS (128 runs) | Q8_0 3.1 GFLOPS (128 runs)
64 x 64: F16 3.0 GFLOPS (128 runs) | F32 3.0 GFLOPS (128 runs)
128 x 128: Q4_0 21.1 GFLOPS (128 runs) | Q4_1 20.3 GFLOPS (128 runs)
128 x 128: Q5_0 20.6 GFLOPS (128 runs) | Q5_1 20.4 GFLOPS (128 runs) | Q8_0 22.1 GFLOPS (128 runs)
128 x 128: F16 21.7 GFLOPS (128 runs) | F32 21.7 GFLOPS (128 runs)
256 x 256: Q4_0 105.7 GFLOPS (128 runs) | Q4_1 94.4 GFLOPS (128 runs)
256 x 256: Q5_0 94.8 GFLOPS (128 runs) | Q5_1 87.5 GFLOPS (128 runs) | Q8_0 107.2 GFLOPS (128 runs)
256 x 256: F16 95.1 GFLOPS (128 runs) | F32 94.3 GFLOPS (128 runs)
512 x 512: Q4_0 214.7 GFLOPS (128 runs) | Q4_1 189.8 GFLOPS (128 runs)
512 x 512: Q5_0 187.7 GFLOPS (128 runs) | Q5_1 176.2 GFLOPS (128 runs) | Q8_0 252.2 GFLOPS (128 runs)
512 x 512: F16 220.8 GFLOPS (128 runs) | F32 218.3 GFLOPS (128 runs)
1024 x 1024: Q4_0 333.7 GFLOPS (128 runs) | Q4_1 305.8 GFLOPS (128 runs)
1024 x 1024: Q5_0 283.2 GFLOPS (128 runs) | Q5_1 268.2 GFLOPS (125 runs) | Q8_0 394.1 GFLOPS (128 runs)
1024 x 1024: F16 355.0 GFLOPS (128 runs) | F32 313.0 GFLOPS (128 runs)
2048 x 2048: Q4_0 395.0 GFLOPS ( 23 runs) | Q4_1 380.6 GFLOPS ( 23 runs)
2048 x 2048: Q5_0 336.6 GFLOPS ( 20 runs) | Q5_1 318.4 GFLOPS ( 19 runs) | Q8_0 482.6 GFLOPS ( 29 runs)
2048 x 2048: F16 424.5 GFLOPS ( 25 runs) | F32 337.7 GFLOPS ( 20 runs)
4096 x 4096: Q4_0 412.8 GFLOPS ( 4 runs) | Q4_1 405.1 GFLOPS ( 3 runs)
4096 x 4096: Q5_0 346.0 GFLOPS ( 3 runs) | Q5_1 334.6 GFLOPS ( 3 runs) | Q8_0 502.6 GFLOPS ( 4 runs)
4096 x 4096: F16 412.5 GFLOPS ( 4 runs) | F32 274.0 GFLOPS ( 3 runs)

| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ryzen 9 5950X | AVX2 | tiny | 8 | 0 | 195.29 | 1.57 | 0.51 | 0.26 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | tiny-q5_0 | 8 | 0 | 213.33 | 1.10 | 0.50 | 0.30 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | tiny-q5_1 | 8 | 0 | 219.38 | 1.18 | 0.53 | 0.32 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | base | 8 | 0 | 424.85 | 3.71 | 1.03 | 0.46 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | base-q5_0 | 8 | 0 | 473.61 | 1.81 | 0.82 | 0.52 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | base-q5_1 | 8 | 0 | 484.14 | 1.92 | 0.85 | 0.56 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | small | 8 | 0 | 1458.32 | 12.66 | 3.09 | 1.26 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | small-q5_0 | 8 | 0 | 1673.22 | 6.42 | 2.18 | 1.45 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | small-q5_1 | 8 | 0 | 1724.78 | 6.72 | 2.32 | 1.52 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | medium | 8 | 0 | 4333.87 | 36.80 | 8.56 | 3.37 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | medium-q5_0 | 8 | 0 | 5194.09 | 19.21 | 5.71 | 3.97 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | medium-q5_1 | 8 | 0 | 5450.39 | 20.01 | 5.99 | 4.17 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | medium-dis | 8 | 0 | 3995.19 | 5.08 | 1.21 | 0.55 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | large-v2 | 8 | 0 | 8056.16 | 69.74 | 16.11 | 6.13 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | large-v2-q5_0 | 8 | 0 | 9799.58 | 35.16 | 10.49 | 7.28 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | large-v2-q5_1 | 8 | 0 | ms | 36.74 | 11.02 | 7.65 | 22c96b4 |
| Ryzen 9 5950X | AVX2 | large-v2-dis | 8 | 0 | 7490.03 | 7.40 | 1.70 | 0.72 | 22c96b4 |

make -j && ./scripts/bench-all.sh 1
WHISPER_CUDA=1 make -j && ./scripts/bench-all.sh 8 1 0

Running ggml_mul_mat benchmark with 1 threads

64 x 64: Q4_0 49.6 GFLOPS (128 runs) | Q4_1 46.8 GFLOPS (128 runs)
64 x 64: Q5_0 28.1 GFLOPS (128 runs) | Q5_1 26.8 GFLOPS (128 runs) | Q8_0 52.3 GFLOPS (128 runs)
64 x 64: F16 38.1 GFLOPS (128 runs) | F32 26.0 GFLOPS (128 runs)
128 x 128: Q4_0 87.6 GFLOPS (128 runs) | Q4_1 79.9 GFLOPS (128 runs)
128 x 128: Q5_0 44.7 GFLOPS (128 runs) | Q5_1 41.6 GFLOPS (128 runs) | Q8_0 98.9 GFLOPS (128 runs)
128 x 128: F16 64.1 GFLOPS (128 runs) | F32 35.4 GFLOPS (128 runs)
256 x 256: Q4_0 104.2 GFLOPS (128 runs) | Q4_1 92.3 GFLOPS (128 runs)
256 x 256: Q5_0 57.3 GFLOPS (128 runs) | Q5_1 51.5 GFLOPS (128 runs) | Q8_0 127.7 GFLOPS (128 runs)
256 x 256: F16 71.4 GFLOPS (128 runs) | F32 40.6 GFLOPS (128 runs)
512 x 512: Q4_0 109.5 GFLOPS (128 runs) | Q4_1 98.0 GFLOPS (128 runs)
512 x 512: Q5_0 62.4 GFLOPS (128 runs) | Q5_1 54.6 GFLOPS (128 runs) | Q8_0 135.0 GFLOPS (128 runs)
512 x 512: F16 82.6 GFLOPS (128 runs) | F32 44.6 GFLOPS (128 runs)
1024 x 1024: Q4_0 112.1 GFLOPS ( 53 runs) | Q4_1 100.9 GFLOPS ( 47 runs)
1024 x 1024: Q5_0 65.4 GFLOPS ( 31 runs) | Q5_1 56.7 GFLOPS ( 27 runs) | Q8_0 140.9 GFLOPS ( 66 runs)
1024 x 1024: F16 88.0 GFLOPS ( 41 runs) | F32 43.4 GFLOPS ( 21 runs)
2048 x 2048: Q4_0 113.4 GFLOPS ( 7 runs) | Q4_1 102.0 GFLOPS ( 6 runs)
2048 x 2048: Q5_0 67.1 GFLOPS ( 4 runs) | Q5_1 57.7 GFLOPS ( 4 runs) | Q8_0 142.7 GFLOPS ( 9 runs)
2048 x 2048: F16 84.6 GFLOPS ( 5 runs) | F32 37.5 GFLOPS ( 3 runs)
4096 x 4096: Q4_0 113.8 GFLOPS ( 3 runs) | Q4_1 102.0 GFLOPS ( 3 runs)
4096 x 4096: Q5_0 67.7 GFLOPS ( 3 runs) | Q5_1 58.0 GFLOPS ( 3 runs) | Q8_0 142.9 GFLOPS ( 3 runs)
4096 x 4096: F16 73.7 GFLOPS ( 3 runs) | F32 36.1 GFLOPS ( 3 runs)

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RTX 2060 | AVX2 CUDA | tiny | 8 | 0 | 12.54 | 0.93 | 0.29 | 0.02 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | tiny-q5_0 | 8 | 0 | 12.73 | 0.98 | 0.24 | 0.02 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | tiny-q5_1 | 8 | 0 | 12.72 | 0.99 | 0.24 | 0.02 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | base | 8 | 0 | 24.14 | 1.28 | 0.41 | 0.03 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | base-q5_0 | 8 | 0 | 24.58 | 1.38 | 0.35 | 0.03 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | base-q5_1 | 8 | 0 | 24.58 | 1.37 | 0.35 | 0.03 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | small | 8 | 0 | 74.70 | 2.91 | 0.84 | 0.07 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | small-q5_0 | 8 | 0 | 76.12 | 2.84 | 0.77 | 0.08 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | small-q5_1 | 8 | 0 | 76.14 | 2.84 | 0.76 | 0.08 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | medium | 8 | 0 | 200.69 | 6.46 | 1.83 | 0.17 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | medium-q5_0 | 8 | 0 | 204.80 | 5.90 | 1.65 | 0.19 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | medium-q5_1 | 8 | 0 | 205.61 | 5.85 | 1.61 | 0.19 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | medium-dis | 8 | 0 | 186.17 | 0.86 | 0.24 | 0.02 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | large-v2 | 8 | 0 | 347.22 | 10.36 | 2.82 | 0.29 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 8 | 0 | 357.06 | 8.81 | 2.58 | 0.34 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 8 | 0 | 356.97 | 8.62 | 2.49 | 0.33 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | large-v2-dis | 8 | 0 | 318.05 | 1.03 | 0.34 | 0.04 | 22c96b4 |

make -j && ./scripts/bench-all.sh 1 1 0
WHISPER_CUDA=1 make -j && ./scripts/bench-all.sh 8 1 1

| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M4 Max | METAL | tiny | 1 | 0 | 13.12 | 0.87 | 0.29 | 0.01 | ad4e3509 |
| M4 Max | METAL | tiny-q8_0 | 1 | 0 | 15.90 | 0.88 | 0.31 | 0.01 | ad4e3509 |
| M4 Max | METAL | base | 1 | 0 | 23.10 | 1.42 | 0.34 | 0.02 | ad4e3509 |
| M4 Max | METAL | base-q8_0 | 1 | 0 | 27.25 | 1.31 | 0.34 | 0.02 | ad4e3509 |
| M4 Max | METAL | small | 1 | 0 | 71.76 | 3.02 | 0.70 | 0.06 | ad4e3509 |
| M4 Max | METAL | small-q8_0 | 1 | 0 | 73.88 | 2.60 | 0.71 | 0.06 | ad4e3509 |
| M4 Max | METAL | medium | 1 | 0 | 208.22 | 6.94 | 1.55 | 0.16 | ad4e3509 |
| M4 Max | METAL | medium-q8_0 | 1 | 0 | 214.65 | 5.90 | 1.57 | 0.17 | ad4e3509 |
| M4 Max | METAL | large-v2 | 1 | 0 | 381.72 | 11.28 | 2.51 | 0.29 | ad4e3509 |
| M4 Max | METAL | large-v2-q8_0 | 1 | 0 | 394.97 | 8.90 | 2.45 | 0.30 | ad4e3509 |

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RTX 2060 | AVX2 CUDA | tiny | 8 | 1 | 7.21 | 0.76 | 0.29 | 0.02 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | tiny-q5_0 | 8 | 1 | 7.42 | 0.82 | 0.18 | 0.02 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | tiny-q5_1 | 8 | 1 | 7.38 | 0.82 | 0.18 | 0.02 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | base | 8 | 1 | 13.49 | 1.04 | 0.36 | 0.02 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | base-q5_0 | 8 | 1 | 13.94 | 1.13 | 0.26 | 0.03 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | base-q5_1 | 8 | 1 | 13.94 | 1.14 | 0.26 | 0.03 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | small | 8 | 1 | 42.81 | 2.33 | 0.69 | 0.05 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | small-q5_0 | 8 | 1 | 44.43 | 2.25 | 0.59 | 0.06 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | small-q5_1 | 8 | 1 | 44.11 | 2.24 | 0.58 | 0.06 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | medium | 8 | 1 | 115.47 | 5.17 | 1.45 | 0.11 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | medium-q5_0 | 8 | 1 | 120.37 | 4.63 | 1.25 | 0.13 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | medium-q5_1 | 8 | 1 | 120.28 | 4.55 | 1.21 | 0.13 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | medium-dis | 8 | 1 | 101.69 | 0.75 | 0.20 | 0.02 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | large-v2 | 8 | 1 | 205.67 | 8.49 | 2.19 | 0.18 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 8 | 1 | 214.07 | 6.88 | 1.94 | 0.22 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 8 | 1 | 213.98 | 6.70 | 1.86 | 0.22 | 22c96b4 |
| RTX 2060 | AVX2 CUDA | large-v2-dis | 8 | 1 | 176.71 | 0.91 | 0.31 | 0.03 | 22c96b4 |

make -j && ./scripts/bench-all.sh 1 1 1

| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M4 Max | METAL | tiny | 1 | 1 | 15.22 | 0.89 | 0.26 | 0.01 | ad4e3509 |
| M4 Max | METAL | tiny-q8_0 | 1 | 1 | 14.70 | 0.86 | 0.26 | 0.01 | ad4e3509 |
| M4 Max | METAL | base | 1 | 1 | 25.33 | 1.36 | 0.30 | 0.02 | ad4e3509 |
| M4 Max | METAL | base-q8_0 | 1 | 1 | 21.27 | 1.31 | 0.30 | 0.02 | ad4e3509 |
| M4 Max | METAL | small | 1 | 1 | 58.43 | 2.78 | 0.60 | 0.05 | ad4e3509 |
| M4 Max | METAL | small-q8_0 | 1 | 1 | 60.26 | 2.39 | 0.60 | 0.05 | ad4e3509 |
| M4 Max | METAL | medium | 1 | 1 | 169.73 | 6.03 | 1.31 | 0.14 | ad4e3509 |
| M4 Max | METAL | medium-q8_0 | 1 | 1 | 176.61 | 4.99 | 1.31 | 0.14 | ad4e3509 |
| M4 Max | METAL | large-v2 | 1 | 1 | 316.18 | 9.60 | 2.08 | 0.24 | ad4e3509 |
| M4 Max | METAL | large-v2-q8_0 | 1 | 1 | 329.59 | 7.55 | 2.08 | 0.25 | ad4e3509 |

# V100
@ -253,33 +271,28 @@ WHISPER_CUDA=1 make -j && ./scripts/bench-all.sh 8 1 0

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | AVX2 CUDA | tiny | 8 | 0 | 6.15 | 1.02 | 0.30 | 0.01 | ad4e3509 |
| V100 | AVX2 CUDA | tiny-q5_1 | 8 | 0 | 5.92 | 0.96 | 0.25 | 0.01 | ad4e3509 |
| V100 | AVX2 CUDA | base | 8 | 0 | 10.60 | 1.43 | 0.43 | 0.02 | ad4e3509 |
| V100 | AVX2 CUDA | base-q5_1 | 8 | 0 | 10.80 | 1.37 | 0.36 | 0.02 | ad4e3509 |
| V100 | AVX2 CUDA | small | 8 | 0 | 31.83 | 2.82 | 0.87 | 0.04 | ad4e3509 |
| V100 | AVX2 CUDA | small-q5_1 | 8 | 0 | 31.88 | 2.68 | 0.72 | 0.04 | ad4e3509 |
| V100 | AVX2 CUDA | medium | 8 | 0 | 81.30 | 6.02 | 1.81 | 0.09 | ad4e3509 |
| V100 | AVX2 CUDA | medium-q5_0 | 8 | 0 | 83.21 | 5.44 | 1.41 | 0.10 | ad4e3509 |
| V100 | AVX2 CUDA | large-v2 | 8 | 0 | 134.81 | 8.64 | 2.69 | 0.14 | ad4e3509 |
| V100 | AVX2 CUDA | large-v2-q5_0 | 8 | 0 | 138.95 | 7.57 | 2.04 | 0.15 | ad4e3509 |
| V100 | AVX2 CUDA | large-v3-turbo | 8 | 0 | 124.42 | 1.37 | 0.43 | 0.02 | ad4e3509 |
| V100 | AVX2 CUDA | large-v3-turbo-q5_0 | 8 | 0 | 127.81 | 1.13 | 0.32 | 0.03 | ad4e3509 |
| V100 | AVX2 CUDA | tiny | 1 | 0 | 6.21 | 1.11 | 0.30 | 0.02 | 22c96b4 |
| V100 | AVX2 CUDA | tiny-q5_1 | 1 | 0 | 5.97 | 1.10 | 0.26 | 0.02 | 22c96b4 |
| V100 | AVX2 CUDA | base | 1 | 0 | 10.95 | 1.47 | 0.42 | 0.03 | 22c96b4 |
| V100 | AVX2 CUDA | base-q5_1 | 1 | 0 | 11.13 | 1.53 | 0.36 | 0.03 | 22c96b4 |
| V100 | AVX2 CUDA | small | 1 | 0 | 31.57 | 2.96 | 0.84 | 0.05 | 22c96b4 |
| V100 | AVX2 CUDA | small-q5_1 | 1 | 0 | 32.19 | 3.14 | 0.75 | 0.05 | 22c96b4 |
| V100 | AVX2 CUDA | medium | 1 | 0 | 85.88 | 6.49 | 1.80 | 0.10 | 22c96b4 |
| V100 | AVX2 CUDA | medium-q5_0 | 1 | 0 | 87.53 | 5.82 | 1.37 | 0.10 | 22c96b4 |
| V100 | AVX2 CUDA | large-v2 | 1 | 0 | 142.23 | 8.92 | 2.62 | 0.15 | 22c96b4 |

WHISPER_CUDA=1 make -j && ./scripts/bench-all.sh 8 1 1

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | AVX2 CUDA | tiny | 8 | 1 | 4.01 | 0.90 | 0.25 | 0.01 | ad4e3509 |
| V100 | AVX2 CUDA | tiny-q5_1 | 8 | 1 | 4.12 | 0.88 | 0.18 | 0.01 | ad4e3509 |
| V100 | AVX2 CUDA | base | 8 | 1 | 7.00 | 1.30 | 0.35 | 0.01 | ad4e3509 |
| V100 | AVX2 CUDA | base-q5_1 | 8 | 1 | 7.22 | 1.21 | 0.26 | 0.02 | ad4e3509 |
| V100 | AVX2 CUDA | small | 8 | 1 | 18.68 | 2.39 | 0.69 | 0.03 | ad4e3509 |
| V100 | AVX2 CUDA | small-q5_1 | 8 | 1 | 19.38 | 2.32 | 0.51 | 0.03 | ad4e3509 |
| V100 | AVX2 CUDA | medium | 8 | 1 | 53.17 | 5.15 | 1.45 | 0.06 | ad4e3509 |
| V100 | AVX2 CUDA | medium-q5_0 | 8 | 1 | 55.09 | 4.64 | 1.05 | 0.07 | ad4e3509 |
| V100 | AVX2 CUDA | large-v2 | 8 | 1 | 85.77 | 7.57 | 2.19 | 0.10 | ad4e3509 |
| V100 | AVX2 CUDA | large-v2-q5_0 | 8 | 1 | 89.24 | 6.48 | 1.48 | 0.11 | ad4e3509 |
| V100 | AVX2 CUDA | large-v3-turbo | 8 | 1 | 75.56 | 1.25 | 0.37 | 0.02 | ad4e3509 |
| V100 | AVX2 CUDA | large-v3-turbo-q5_0 | 8 | 1 | 78.48 | 1.01 | 0.24 | 0.02 | ad4e3509 |
| V100 | AVX2 CUDA | tiny | 1 | 1 | 3.96 | 0.82 | 0.24 | 0.02 | 22c96b4 |
| V100 | AVX2 CUDA | tiny-q5_1 | 1 | 1 | 4.05 | 0.85 | 0.18 | 0.02 | 22c96b4 |
| V100 | AVX2 CUDA | base | 1 | 1 | 7.21 | 1.16 | 0.36 | 0.02 | 22c96b4 |
| V100 | AVX2 CUDA | base-q5_1 | 1 | 1 | 7.39 | 1.21 | 0.26 | 0.02 | 22c96b4 |
| V100 | AVX2 CUDA | small | 1 | 1 | 19.81 | 2.41 | 0.71 | 0.04 | 22c96b4 |
| V100 | AVX2 CUDA | small-q5_1 | 1 | 1 | 20.50 | 2.31 | 0.51 | 0.04 | 22c96b4 |
| V100 | AVX2 CUDA | medium | 1 | 1 | 56.02 | 4.89 | 1.44 | 0.07 | 22c96b4 |
| V100 | AVX2 CUDA | medium-q5_0 | 1 | 1 | 57.85 | 4.73 | 1.09 | 0.08 | 22c96b4 |
| V100 | AVX2 CUDA | large-v2 | 1 | 1 | 92.73 | 7.18 | 2.14 | 0.10 | 22c96b4 |

@ -4276,11 +4276,11 @@ void whisper_print_timings(struct whisper_context * ctx) {

WHISPER_LOG_INFO("%s: fallbacks = %3d p / %3d h\n", __func__, ctx->state->n_fail_p, ctx->state->n_fail_h);
WHISPER_LOG_INFO("%s: mel time = %8.2f ms\n", __func__, ctx->state->t_mel_us / 1000.0f);
WHISPER_LOG_INFO("%s: sample time = %8.2f ms / %5d runs ( %8.2f ms per run)\n", __func__, 1e-3f * ctx->state->t_sample_us, n_sample, 1e-3f * ctx->state->t_sample_us / n_sample);
WHISPER_LOG_INFO("%s: encode time = %8.2f ms / %5d runs ( %8.2f ms per run)\n", __func__, 1e-3f * ctx->state->t_encode_us, n_encode, 1e-3f * ctx->state->t_encode_us / n_encode);
WHISPER_LOG_INFO("%s: decode time = %8.2f ms / %5d runs ( %8.2f ms per run)\n", __func__, 1e-3f * ctx->state->t_decode_us, n_decode, 1e-3f * ctx->state->t_decode_us / n_decode);
WHISPER_LOG_INFO("%s: batchd time = %8.2f ms / %5d runs ( %8.2f ms per run)\n", __func__, 1e-3f * ctx->state->t_batchd_us, n_batchd, 1e-3f * ctx->state->t_batchd_us / n_batchd);
WHISPER_LOG_INFO("%s: prompt time = %8.2f ms / %5d runs ( %8.2f ms per run)\n", __func__, 1e-3f * ctx->state->t_prompt_us, n_prompt, 1e-3f * ctx->state->t_prompt_us / n_prompt);
WHISPER_LOG_INFO("%s: sample time = %8.2f ms / %5d runs (%8.2f ms per run)\n", __func__, 1e-3f * ctx->state->t_sample_us, n_sample, 1e-3f * ctx->state->t_sample_us / n_sample);
WHISPER_LOG_INFO("%s: encode time = %8.2f ms / %5d runs (%8.2f ms per run)\n", __func__, 1e-3f * ctx->state->t_encode_us, n_encode, 1e-3f * ctx->state->t_encode_us / n_encode);
WHISPER_LOG_INFO("%s: decode time = %8.2f ms / %5d runs (%8.2f ms per run)\n", __func__, 1e-3f * ctx->state->t_decode_us, n_decode, 1e-3f * ctx->state->t_decode_us / n_decode);
WHISPER_LOG_INFO("%s: batchd time = %8.2f ms / %5d runs (%8.2f ms per run)\n", __func__, 1e-3f * ctx->state->t_batchd_us, n_batchd, 1e-3f * ctx->state->t_batchd_us / n_batchd);
WHISPER_LOG_INFO("%s: prompt time = %8.2f ms / %5d runs (%8.2f ms per run)\n", __func__, 1e-3f * ctx->state->t_prompt_us, n_prompt, 1e-3f * ctx->state->t_prompt_us / n_prompt);
}
WHISPER_LOG_INFO("%s: total time = %8.2f ms\n", __func__, (t_end_us - ctx->t_start_us)/1000.0f);
}
@ -5527,13 +5527,11 @@ int whisper_full_with_state(
|
||||
const int seek_start = params.offset_ms/10;
|
||||
const int seek_end = params.duration_ms == 0 ? whisper_n_len_from_state(state) : seek_start + params.duration_ms/10;
|
||||
|
||||
// if length of spectrogram is less than 100ms (10 frames), then return
|
||||
// basically don't process anything that is less than 100ms
|
||||
// ref: https://github.com/ggml-org/whisper.cpp/issues/2065
|
||||
const int delta_min = 10;
|
||||
|
||||
if (seek_end < seek_start + delta_min) {
|
||||
WHISPER_LOG_WARN("%s: input is too short - %d ms < 100 ms. consider padding the input audio with silence\n", __func__, (seek_end - seek_start)*10);
|
||||
// if length of spectrogram is less than 1.0s (100 frames), then return
|
||||
// basically don't process anything that is less than 1.0s
|
||||
// see issue #39: https://github.com/ggerganov/whisper.cpp/issues/39
|
||||
if (seek_end < seek_start + 100) {
|
||||
WHISPER_LOG_WARN("%s: input is too short - %d ms < 1000 ms. consider padding the input audio with silence\n", __func__, (seek_end - seek_start)*10);
|
||||
return 0;
|
||||
}
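The short-input guard in this hunk works in spectrogram frames of 10 ms each, which is why `offset_ms` and `duration_ms` are divided by 10. The sketch below restates that arithmetic in Python so the two thresholds that appear in the hunk (10 frames and 100 frames) can be compared side by side. The function name `is_too_short` and the constant `FRAME_MS` are made up for illustration and are not part of whisper.cpp; `n_len` stands in for the value returned by `whisper_n_len_from_state`.

```python
FRAME_MS = 10  # assumption taken from the hunk: one seek unit covers 10 ms


def is_too_short(offset_ms: int, duration_ms: int, n_len: int, delta_min: int = 10) -> bool:
    """Return True when the selected audio span is shorter than delta_min frames.

    Hypothetical helper mirroring the check in the hunk above. Inputs are
    assumed non-negative, so Python's floor division matches C's truncation.
    """
    seek_start = offset_ms // FRAME_MS
    # duration_ms == 0 means "process until the end of the spectrogram"
    seek_end = n_len if duration_ms == 0 else seek_start + duration_ms // FRAME_MS
    return seek_end < seek_start + delta_min


# A 500 ms clip passes the 10-frame (100 ms) threshold
# but is rejected by the 100-frame (1000 ms) threshold.
print(is_too_short(0, 500, 3000, delta_min=10))
print(is_too_short(0, 500, 3000, delta_min=100))
```

With the smaller threshold, only inputs under 100 ms are skipped; with the larger one, anything under one second returns early without being transcribed.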
|
||||
|
||||
@ -5677,8 +5675,8 @@ int whisper_full_with_state(
|
||||
ctx, state, progress_cur, params.progress_callback_user_data);
|
||||
}
|
||||
|
||||
// if only 100ms left, then stop
|
||||
if (seek + delta_min >= seek_end) {
|
||||
// if only 1 second left, then stop
|
||||
if (seek + 100 >= seek_end) {
|
||||
break;
|
||||
}
|
||||
|
||||
@ -6025,10 +6023,10 @@ int whisper_full_with_state(
|
||||
// end of segment
|
||||
if (token.id == whisper_token_eot(ctx) || // end of text token
|
||||
(params.max_tokens > 0 && i >= params.max_tokens) || // max tokens per segment reached
|
||||
(has_ts && seek + seek_delta + delta_min >= seek_end) // end of audio reached (100ms)
|
||||
(has_ts && seek + seek_delta + 100 >= seek_end) // end of audio reached
|
||||
) {
|
||||
if (result_len == 0 && !params.no_timestamps) {
|
||||
if (seek + seek_delta + delta_min >= seek_end) {
|
||||
if (seek + seek_delta + 100 >= seek_end) {
|
||||
result_len = i + 1;
|
||||
} else {
|
||||
WHISPER_LOG_DEBUG("%s: decoder %d failed (result_len = 0)\n", __func__, j);
|
||||
@ -6377,7 +6375,7 @@ int whisper_full_with_state(
|
||||
}
|
||||
}
|
||||
|
||||
// ref: https://github.com/ggml-org/whisper.cpp/pull/2629
|
||||
// ref: https://github.com/ggerganov/whisper.cpp/pull/2629
|
||||
const bool single_timestamp_ending = tokens_cur.size() > 1 &&
|
||||
tokens_cur[tokens_cur.size() - 2].id < whisper_token_beg(ctx) &&
|
||||
tokens_cur[tokens_cur.size() - 1].id > whisper_token_beg(ctx);
|
||||
|
6
tests/librispeech/.gitignore
vendored
@ -1,6 +0,0 @@
|
||||
__pycache__
|
||||
*.tar.gz
|
||||
*.txt
|
||||
eval.conf
|
||||
venv
|
||||
LibriSpeech
|
@ -1,15 +0,0 @@
|
||||
TAR_URL = https://www.openslr.org/resources/12/test-clean.tar.gz
|
||||
|
||||
all: eval
|
||||
|
||||
eval:
|
||||
$(MAKE) -f eval.mk
|
||||
|
||||
clean:
|
||||
$(MAKE) -f eval.mk clean
|
||||
|
||||
get-audio:
|
||||
wget -c $(TAR_URL)
|
||||
tar -xf test-clean.tar.gz
|
||||
|
||||
.PHONY: all eval clean setup-venv clean-venv get-audio
|
@ -1,60 +0,0 @@
|
||||
# whisper.cpp/tests/librispeech
|
||||
|
||||
[LibriSpeech](https://www.openslr.org/12) is a standard dataset for
|
||||
training and evaluating automatic speech recognition systems.
|
||||
|
||||
This directory contains a set of tools to evaluate the recognition
|
||||
performance of whisper.cpp on the LibriSpeech corpus.
|
||||
|
||||
## Quick Start
|
||||
|
||||
1. (Prerequisite) Compile `whisper-cli` and prepare the Whisper
|
||||
model in `ggml` format.
|
||||
|
||||
```
|
||||
$ # Execute the commands below in the project root dir.
|
||||
$ cmake -B build
|
||||
$ cmake --build build --config Release
|
||||
$ ./models/download-ggml-model.sh tiny
|
||||
```
|
||||
|
||||
Consult [whisper.cpp/README.md](../../README.md) for more details.
|
||||
|
||||
2. Download the audio files from the LibriSpeech project.
|
||||
|
||||
```
|
||||
$ make get-audio
|
||||
```
|
||||
|
||||
3. Set up the environment to compute the WER score.
|
||||
|
||||
```
|
||||
$ pip install -r requirements.txt
|
||||
```
|
||||
|
||||
For example, if you use `virtualenv`, you can set it up as follows:
|
||||
|
||||
```
|
||||
$ python3 -m venv venv
|
||||
$ . venv/bin/activate
|
||||
$ pip install -r requirements.txt
|
||||
```
|
||||
|
||||
4. Run the benchmark test.
|
||||
|
||||
```
|
||||
$ make
|
||||
```
|
||||
|
||||
## How-to guides
|
||||
|
||||
### How to change the inference parameters
|
||||
|
||||
Create `eval.conf` and override variables.
|
||||
|
||||
```
|
||||
WHISPER_MODEL = large-v3-turbo
|
||||
WHISPER_FLAGS = --no-prints --threads 8 --language en --output-txt
|
||||
```
|
||||
|
||||
Check out `eval.mk` for more details.
|
@ -1,39 +0,0 @@
|
||||
PYTHON = python
|
||||
|
||||
WHISPER_PREFIX = ../../
|
||||
WHISPER_MODEL = tiny
|
||||
|
||||
WHISPER_CLI = $(WHISPER_PREFIX)build/bin/whisper-cli
|
||||
WHISPER_FLAGS = --no-prints --language en --output-txt
|
||||
|
||||
# You can create eval.conf to override the WHISPER_* variables
|
||||
# defined above.
|
||||
-include eval.conf
|
||||
|
||||
# This follows the file structure of the LibriSpeech project.
|
||||
AUDIO_SRCS = $(sort $(wildcard LibriSpeech/*/*/*/*.flac))
|
||||
TRANS_TXTS = $(addsuffix .txt, $(AUDIO_SRCS))
|
||||
|
||||
# We output the evaluation result to this file.
|
||||
DONE = $(WHISPER_MODEL).txt
|
||||
|
||||
all: $(DONE)
|
||||
|
||||
$(DONE): $(TRANS_TXTS)
|
||||
$(PYTHON) eval.py > $@.tmp
|
||||
mv $@.tmp $@
|
||||
|
||||
# Note: This task writes to a temporary file first to
|
||||
# create the target file atomically.
|
||||
%.flac.txt: %.flac
|
||||
$(WHISPER_CLI) $(WHISPER_FLAGS) --model $(WHISPER_PREFIX)models/ggml-$(WHISPER_MODEL).bin --file $^ --output-file $^.tmp
|
||||
mv $^.tmp.txt $^.txt
|
||||
|
||||
archive:
|
||||
tar -czf $(WHISPER_MODEL).tar.gz --exclude="*.flac" LibriSpeech $(DONE)
|
||||
|
||||
clean:
|
||||
@rm -f $(TRANS_TXTS)
|
||||
@rm -f $(DONE)
|
||||
|
||||
.PHONY: all clean
|
@ -1,47 +0,0 @@
|
||||
import os
|
||||
import glob
|
||||
import jiwer
|
||||
from normalizers import EnglishTextNormalizer
|
||||
|
||||
def get_reference():
|
||||
ref = {}
|
||||
for path in glob.glob('LibriSpeech/*/*/*/*.trans.txt'):
|
||||
with open(path) as fp:
|
||||
for line in fp:
|
||||
code, text = line.strip().split(" ", maxsplit=1)
|
||||
ref[code] = text
|
||||
return ref
|
||||
|
||||
def get_hypothesis():
|
||||
hyp = {}
|
||||
for path in glob.glob('LibriSpeech/*/*/*/*.flac.txt'):
|
||||
with open(path) as fp:
|
||||
text = fp.read().strip()
|
||||
code = os.path.basename(path).replace('.flac.txt', '')
|
||||
hyp[code] = text
|
||||
return hyp
|
||||
|
||||
def get_codes():
|
||||
codes = []
|
||||
for path in glob.glob('LibriSpeech/*/*/*/*.flac'):
|
||||
codes.append(os.path.basename(path).replace('.flac', ''))
|
||||
return sorted(codes)
|
||||
|
||||
def main():
|
||||
normalizer = EnglishTextNormalizer()
|
||||
|
||||
ref_orig = get_reference()
|
||||
hyp_orig = get_hypothesis()
|
||||
|
||||
ref_clean = []
|
||||
hyp_clean = []
|
||||
|
||||
for code in get_codes():
|
||||
ref_clean.append(normalizer(ref_orig[code]))
|
||||
hyp_clean.append(normalizer(hyp_orig[code]))
|
||||
|
||||
wer = jiwer.wer(ref_clean, hyp_clean)
|
||||
print(f"WER: {wer * 100:.2f}%")
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
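The script above delegates the metric to `jiwer.wer`. For readers unfamiliar with it, word error rate is the number of word-level substitutions, deletions, and insertions divided by the number of reference words. The following is a minimal pure-Python sketch of that definition, assuming errors and reference lengths are pooled over the whole corpus. It is an illustration only and not jiwer's implementation; the name `word_error_rate` is hypothetical.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edit distance / total reference words."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        # Dynamic-programming edit distance over words, one row at a time.
        prev = list(range(len(h) + 1))
        for i, ref_word in enumerate(r, start=1):
            cur = [i]
            for j, hyp_word in enumerate(h, start=1):
                cost = 0 if ref_word == hyp_word else 1
                cur.append(min(prev[j] + 1,          # deletion
                               cur[j - 1] + 1,       # insertion
                               prev[j - 1] + cost))  # match or substitution
            prev = cur
        errors += prev[-1]
        total += len(r)
    if total == 0:
        raise ValueError("reference corpus contains no words")
    return errors / total


# One substituted word out of four reference words.
print(f"WER: {word_error_rate(['a b c d'], ['a x c d']) * 100:.2f}%")
```

Because both sides are passed through the same text normalizer before scoring, differences in casing, punctuation, or number formatting should not count as errors.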
|
@ -1,25 +0,0 @@
|
||||
Code in this directory is adapted from OpenAI Whisper project
|
||||
(https://github.com/openai/whisper) and carries the following
|
||||
copyright and license.
|
||||
|
||||
MIT License
|
||||
|
||||
Copyright (c) 2022 OpenAI
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
@ -1,2 +0,0 @@
|
||||
from .basic import BasicTextNormalizer as BasicTextNormalizer
|
||||
from .english import EnglishTextNormalizer as EnglishTextNormalizer
|
@ -1,80 +0,0 @@
|
||||
import re
|
||||
import unicodedata
|
||||
|
||||
import regex
|
||||
|
||||
# non-ASCII letters that are not separated by "NFKD" normalization
|
||||
ADDITIONAL_DIACRITICS = {
|
||||
"œ": "oe",
|
||||
"Œ": "OE",
|
||||
"ø": "o",
|
||||
"Ø": "O",
|
||||
"æ": "ae",
|
||||
"Æ": "AE",
|
||||
"ß": "ss",
|
||||
"ẞ": "SS",
|
||||
"đ": "d",
|
||||
"Đ": "D",
|
||||
"ð": "d",
|
||||
"Ð": "D",
|
||||
"þ": "th",
|
||||
"Þ": "th",
|
||||
"ł": "l",
|
||||
"Ł": "L",
|
||||
}
|
||||
|
||||
|
||||
def remove_symbols_and_diacritics(s: str, keep=""):
|
||||
"""
|
||||
Replace any other markers, symbols, and punctuations with a space,
|
||||
and drop any diacritics (category 'Mn' and some manual mappings)
|
||||
"""
|
||||
return "".join(
|
||||
(
|
||||
c
|
||||
if c in keep
|
||||
else (
|
||||
ADDITIONAL_DIACRITICS[c]
|
||||
if c in ADDITIONAL_DIACRITICS
|
||||
else (
|
||||
""
|
||||
if unicodedata.category(c) == "Mn"
|
||||
else " " if unicodedata.category(c)[0] in "MSP" else c
|
||||
)
|
||||
)
|
||||
)
|
||||
for c in unicodedata.normalize("NFKD", s)
|
||||
)
|
||||
|
||||
|
||||
def remove_symbols(s: str):
|
||||
"""
|
||||
Replace any other markers, symbols, punctuations with a space, keeping diacritics
|
||||
"""
|
||||
return "".join(
|
||||
" " if unicodedata.category(c)[0] in "MSP" else c
|
||||
for c in unicodedata.normalize("NFKC", s)
|
||||
)
|
||||
|
||||
|
||||
class BasicTextNormalizer:
|
||||
def __init__(self, remove_diacritics: bool = False, split_letters: bool = False):
|
||||
self.clean = (
|
||||
remove_symbols_and_diacritics if remove_diacritics else remove_symbols
|
||||
)
|
||||
self.split_letters = split_letters
|
||||
|
||||
def __call__(self, s: str):
|
||||
s = s.lower()
|
||||
s = re.sub(r"[<\[][^>\]]*[>\]]", "", s) # remove words between brackets
|
||||
s = re.sub(r"\(([^)]+?)\)", "", s) # remove words between parentheses
|
||||
s = self.clean(s).lower()
|
||||
|
||||
if self.split_letters:
|
||||
s = " ".join(regex.findall(r"\X", s, regex.U))
|
||||
|
||||
s = re.sub(
|
||||
r"\s+", " ", s
|
||||
) # replace any successive whitespace characters with a space
|
||||
|
||||
return s
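To make the behavior of `remove_symbols_and_diacritics` easier to follow, here is a simplified restatement using an explicit loop instead of the nested conditional expression. It assumes the same rules as the function in this file: decompose with NFKD, keep characters listed in `keep`, apply a manual mapping, drop combining marks (category `Mn`), and turn other marks, symbols, and punctuation into spaces. The names `strip_symbols_and_diacritics` and `EXTRA` are hypothetical, and `EXTRA` holds only a small subset of the full table.

```python
import unicodedata

# Small illustrative subset of ADDITIONAL_DIACRITICS from the file above.
EXTRA = {"ø": "o", "ß": "ss", "æ": "ae"}


def strip_symbols_and_diacritics(s: str, keep: str = "") -> str:
    out = []
    for c in unicodedata.normalize("NFKD", s):
        category = unicodedata.category(c)
        if c in keep:
            out.append(c)              # explicitly preserved symbol
        elif c in EXTRA:
            out.append(EXTRA[c])       # letters NFKD does not decompose
        elif category == "Mn":
            continue                   # combining mark: drop it
        elif category[0] in "MSP":
            out.append(" ")            # other marks, symbols, punctuation
        else:
            out.append(c)
    return "".join(out)


print(strip_symbols_and_diacritics("café"))
print(strip_symbols_and_diacritics("straße"))
print(strip_symbols_and_diacritics("$5", keep="$"))
```

Note that punctuation becomes a space rather than being deleted, so a later whitespace-collapsing step is still needed.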
|
File diff suppressed because it is too large
@ -1,550 +0,0 @@
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
from fractions import Fraction
|
||||
from typing import Iterator, List, Match, Optional, Union
|
||||
|
||||
from more_itertools import windowed
|
||||
|
||||
from .basic import remove_symbols_and_diacritics
|
||||
|
||||
|
||||
class EnglishNumberNormalizer:
|
||||
"""
|
||||
Convert any spelled-out numbers into arabic numbers, while handling:
|
||||
|
||||
- remove any commas
|
||||
- keep the suffixes such as: `1960s`, `274th`, `32nd`, etc.
|
||||
- spell out currency symbols after the number. e.g. `$20 million` -> `20000000 dollars`
|
||||
- spell out `one` and `ones`
|
||||
- interpret successive single-digit numbers as nominal: `one oh one` -> `101`
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
|
||||
self.zeros = {"o", "oh", "zero"}
|
||||
self.ones = {
|
||||
name: i
|
||||
for i, name in enumerate(
|
||||
[
|
||||
"one",
|
||||
"two",
|
||||
"three",
|
||||
"four",
|
||||
"five",
|
||||
"six",
|
||||
"seven",
|
||||
"eight",
|
||||
"nine",
|
||||
"ten",
|
||||
"eleven",
|
||||
"twelve",
|
||||
"thirteen",
|
||||
"fourteen",
|
||||
"fifteen",
|
||||
"sixteen",
|
||||
"seventeen",
|
||||
"eighteen",
|
||||
"nineteen",
|
||||
],
|
||||
start=1,
|
||||
)
|
||||
}
|
||||
self.ones_plural = {
|
||||
"sixes" if name == "six" else name + "s": (value, "s")
|
||||
for name, value in self.ones.items()
|
||||
}
|
||||
self.ones_ordinal = {
|
||||
"zeroth": (0, "th"),
|
||||
"first": (1, "st"),
|
||||
"second": (2, "nd"),
|
||||
"third": (3, "rd"),
|
||||
"fifth": (5, "th"),
|
||||
"twelfth": (12, "th"),
|
||||
**{
|
||||
name + ("h" if name.endswith("t") else "th"): (value, "th")
|
||||
for name, value in self.ones.items()
|
||||
if value > 3 and value != 5 and value != 12
|
||||
},
|
||||
}
|
||||
self.ones_suffixed = {**self.ones_plural, **self.ones_ordinal}
|
||||
|
||||
self.tens = {
|
||||
"twenty": 20,
|
||||
"thirty": 30,
|
||||
"forty": 40,
|
||||
"fifty": 50,
|
||||
"sixty": 60,
|
||||
"seventy": 70,
|
||||
"eighty": 80,
|
||||
"ninety": 90,
|
||||
}
|
||||
self.tens_plural = {
|
||||
name.replace("y", "ies"): (value, "s") for name, value in self.tens.items()
|
||||
}
|
||||
self.tens_ordinal = {
|
||||
name.replace("y", "ieth"): (value, "th")
|
||||
for name, value in self.tens.items()
|
||||
}
|
||||
self.tens_suffixed = {**self.tens_plural, **self.tens_ordinal}
|
||||
|
||||
self.multipliers = {
|
||||
"hundred": 100,
|
||||
"thousand": 1_000,
|
||||
"million": 1_000_000,
|
||||
"billion": 1_000_000_000,
|
||||
"trillion": 1_000_000_000_000,
|
||||
"quadrillion": 1_000_000_000_000_000,
|
||||
"quintillion": 1_000_000_000_000_000_000,
|
||||
"sextillion": 1_000_000_000_000_000_000_000,
|
||||
"septillion": 1_000_000_000_000_000_000_000_000,
|
||||
"octillion": 1_000_000_000_000_000_000_000_000_000,
|
||||
"nonillion": 1_000_000_000_000_000_000_000_000_000_000,
|
||||
"decillion": 1_000_000_000_000_000_000_000_000_000_000_000,
|
||||
}
|
||||
self.multipliers_plural = {
|
||||
name + "s": (value, "s") for name, value in self.multipliers.items()
|
||||
}
|
||||
self.multipliers_ordinal = {
|
||||
name + "th": (value, "th") for name, value in self.multipliers.items()
|
||||
}
|
||||
self.multipliers_suffixed = {
|
||||
**self.multipliers_plural,
|
||||
**self.multipliers_ordinal,
|
||||
}
|
||||
self.decimals = {*self.ones, *self.tens, *self.zeros}
|
||||
|
||||
self.preceding_prefixers = {
|
||||
"minus": "-",
|
||||
"negative": "-",
|
||||
"plus": "+",
|
||||
"positive": "+",
|
||||
}
|
||||
self.following_prefixers = {
|
||||
"pound": "£",
|
||||
"pounds": "£",
|
||||
"euro": "€",
|
||||
"euros": "€",
|
||||
"dollar": "$",
|
||||
"dollars": "$",
|
||||
"cent": "¢",
|
||||
"cents": "¢",
|
||||
}
|
||||
self.prefixes = set(
|
||||
list(self.preceding_prefixers.values())
|
||||
+ list(self.following_prefixers.values())
|
||||
)
|
||||
self.suffixers = {
|
||||
"per": {"cent": "%"},
|
||||
"percent": "%",
|
||||
}
|
||||
self.specials = {"and", "double", "triple", "point"}
|
||||
|
||||
self.words = set(
|
||||
[
|
||||
key
|
||||
for mapping in [
|
||||
self.zeros,
|
||||
self.ones,
|
||||
self.ones_suffixed,
|
||||
self.tens,
|
||||
self.tens_suffixed,
|
||||
self.multipliers,
|
||||
self.multipliers_suffixed,
|
||||
self.preceding_prefixers,
|
||||
self.following_prefixers,
|
||||
self.suffixers,
|
||||
self.specials,
|
||||
]
|
||||
for key in mapping
|
||||
]
|
||||
)
|
||||
self.literal_words = {"one", "ones"}
|
||||
|
||||
def process_words(self, words: List[str]) -> Iterator[str]:
|
||||
prefix: Optional[str] = None
|
||||
value: Optional[Union[str, int]] = None
|
||||
skip = False
|
||||
|
||||
def to_fraction(s: str):
|
||||
try:
|
||||
return Fraction(s)
|
||||
except ValueError:
|
||||
return None
|
||||
|
||||
def output(result: Union[str, int]):
|
||||
nonlocal prefix, value
|
||||
result = str(result)
|
||||
if prefix is not None:
|
||||
result = prefix + result
|
||||
value = None
|
||||
prefix = None
|
||||
return result
|
||||
|
||||
if len(words) == 0:
|
||||
return
|
||||
|
||||
for prev, current, next in windowed([None] + words + [None], 3):
|
||||
if skip:
|
||||
skip = False
|
||||
continue
|
||||
|
||||
next_is_numeric = next is not None and re.match(r"^\d+(\.\d+)?$", next)
|
||||
has_prefix = current[0] in self.prefixes
|
||||
current_without_prefix = current[1:] if has_prefix else current
|
||||
if re.match(r"^\d+(\.\d+)?$", current_without_prefix):
|
||||
# arabic numbers (potentially with signs and fractions)
|
||||
f = to_fraction(current_without_prefix)
|
||||
assert f is not None
|
||||
if value is not None:
|
||||
if isinstance(value, str) and value.endswith("."):
|
||||
# concatenate decimals / ip address components
|
||||
value = str(value) + str(current)
|
||||
continue
|
||||
else:
|
||||
yield output(value)
|
||||
|
||||
prefix = current[0] if has_prefix else prefix
|
||||
if f.denominator == 1:
|
||||
value = f.numerator # store integers as int
|
||||
else:
|
||||
value = current_without_prefix
|
||||
elif current not in self.words:
|
||||
# non-numeric words
|
||||
if value is not None:
|
||||
yield output(value)
|
||||
yield output(current)
|
||||
elif current in self.zeros:
|
||||
value = str(value or "") + "0"
|
||||
elif current in self.ones:
|
||||
ones = self.ones[current]
|
||||
|
||||
if value is None:
|
||||
value = ones
|
||||
elif isinstance(value, str) or prev in self.ones:
|
||||
if (
|
||||
prev in self.tens and ones < 10
|
||||
): # replace the last zero with the digit
|
||||
assert value[-1] == "0"
|
||||
value = value[:-1] + str(ones)
|
||||
else:
|
||||
value = str(value) + str(ones)
|
||||
elif ones < 10:
|
||||
if value % 10 == 0:
|
||||
value += ones
|
||||
else:
|
||||
value = str(value) + str(ones)
|
||||
else: # eleven to nineteen
|
||||
if value % 100 == 0:
|
||||
value += ones
|
||||
else:
|
||||
value = str(value) + str(ones)
|
||||
elif current in self.ones_suffixed:
|
||||
# ordinal or cardinal; yield the number right away
|
||||
ones, suffix = self.ones_suffixed[current]
|
||||
if value is None:
|
||||
yield output(str(ones) + suffix)
|
||||
elif isinstance(value, str) or prev in self.ones:
|
||||
if prev in self.tens and ones < 10:
|
||||
assert value[-1] == "0"
|
||||
yield output(value[:-1] + str(ones) + suffix)
|
||||
else:
|
||||
yield output(str(value) + str(ones) + suffix)
|
||||
elif ones < 10:
|
||||
if value % 10 == 0:
|
||||
yield output(str(value + ones) + suffix)
|
||||
else:
|
||||
yield output(str(value) + str(ones) + suffix)
|
||||
else: # eleven to nineteen
|
||||
if value % 100 == 0:
|
||||
yield output(str(value + ones) + suffix)
|
||||
else:
|
||||
yield output(str(value) + str(ones) + suffix)
|
||||
value = None
|
||||
elif current in self.tens:
|
||||
tens = self.tens[current]
|
||||
if value is None:
|
||||
value = tens
|
||||
elif isinstance(value, str):
|
||||
value = str(value) + str(tens)
|
||||
else:
|
||||
if value % 100 == 0:
|
||||
value += tens
|
||||
else:
|
||||
value = str(value) + str(tens)
|
||||
elif current in self.tens_suffixed:
|
||||
# ordinal or cardinal; yield the number right away
|
||||
tens, suffix = self.tens_suffixed[current]
|
||||
if value is None:
|
||||
yield output(str(tens) + suffix)
|
||||
elif isinstance(value, str):
|
||||
yield output(str(value) + str(tens) + suffix)
|
||||
else:
|
||||
if value % 100 == 0:
|
||||
yield output(str(value + tens) + suffix)
|
||||
else:
|
||||
yield output(str(value) + str(tens) + suffix)
|
||||
elif current in self.multipliers:
|
||||
multiplier = self.multipliers[current]
|
||||
if value is None:
|
||||
value = multiplier
|
||||
elif isinstance(value, str) or value == 0:
|
||||
f = to_fraction(value)
|
||||
p = f * multiplier if f is not None else None
|
||||
if f is not None and p.denominator == 1:
|
||||
value = p.numerator
|
||||
else:
|
||||
yield output(value)
|
||||
value = multiplier
|
||||
else:
|
||||
before = value // 1000 * 1000
|
||||
residual = value % 1000
|
||||
value = before + residual * multiplier
|
||||
elif current in self.multipliers_suffixed:
|
||||
multiplier, suffix = self.multipliers_suffixed[current]
|
||||
if value is None:
|
||||
yield output(str(multiplier) + suffix)
|
||||
elif isinstance(value, str):
|
||||
f = to_fraction(value)
|
||||
p = f * multiplier if f is not None else None
|
||||
if f is not None and p.denominator == 1:
|
||||
yield output(str(p.numerator) + suffix)
|
||||
else:
|
||||
yield output(value)
|
||||
yield output(str(multiplier) + suffix)
|
||||
else: # int
|
||||
before = value // 1000 * 1000
|
||||
residual = value % 1000
|
||||
value = before + residual * multiplier
|
||||
yield output(str(value) + suffix)
|
||||
value = None
|
||||
elif current in self.preceding_prefixers:
|
||||
# apply prefix (positive, minus, etc.) if it precedes a number
|
||||
if value is not None:
|
||||
yield output(value)
|
||||
|
||||
if next in self.words or next_is_numeric:
|
||||
prefix = self.preceding_prefixers[current]
|
||||
else:
|
||||
yield output(current)
|
||||
elif current in self.following_prefixers:
|
||||
# apply prefix (dollars, cents, etc.) only after a number
|
||||
if value is not None:
|
||||
prefix = self.following_prefixers[current]
|
||||
yield output(value)
|
||||
else:
|
||||
yield output(current)
|
||||
elif current in self.suffixers:
|
||||
# apply suffix symbols (percent -> '%')
|
||||
if value is not None:
|
||||
suffix = self.suffixers[current]
|
||||
if isinstance(suffix, dict):
|
||||
if next in suffix:
|
||||
yield output(str(value) + suffix[next])
|
||||
skip = True
|
||||
else:
|
||||
yield output(value)
|
||||
yield output(current)
|
||||
else:
|
||||
yield output(str(value) + suffix)
|
||||
else:
|
||||
yield output(current)
|
||||
elif current in self.specials:
|
||||
if next not in self.words and not next_is_numeric:
|
||||
# apply special handling only if the next word can be numeric
|
||||
if value is not None:
|
||||
yield output(value)
|
||||
yield output(current)
|
||||
elif current == "and":
|
||||
# ignore "and" after hundreds, thousands, etc.
|
||||
if prev not in self.multipliers:
|
||||
if value is not None:
|
||||
yield output(value)
|
||||
yield output(current)
|
||||
elif current == "double" or current == "triple":
|
||||
if next in self.ones or next in self.zeros:
|
||||
repeats = 2 if current == "double" else 3
|
||||
ones = self.ones.get(next, 0)
|
||||
value = str(value or "") + str(ones) * repeats
|
||||
skip = True
|
||||
else:
|
||||
if value is not None:
|
||||
yield output(value)
|
||||
yield output(current)
|
||||
elif current == "point":
|
||||
if next in self.decimals or next_is_numeric:
|
||||
value = str(value or "") + "."
|
||||
else:
|
||||
# should all have been covered at this point
|
||||
raise ValueError(f"Unexpected token: {current}")
|
||||
else:
|
||||
# all should have been covered at this point
|
||||
raise ValueError(f"Unexpected token: {current}")
|
||||
|
||||
if value is not None:
|
||||
yield output(value)
|
||||
|
||||
def preprocess(self, s: str):
|
||||
# replace "<number> and a half" with "<number> point five"
|
||||
results = []
|
||||
|
||||
segments = re.split(r"\band\s+a\s+half\b", s)
|
||||
for i, segment in enumerate(segments):
|
||||
if len(segment.strip()) == 0:
|
||||
continue
|
||||
if i == len(segments) - 1:
|
||||
results.append(segment)
|
||||
else:
|
||||
results.append(segment)
|
||||
last_word = segment.rsplit(maxsplit=2)[-1]
|
||||
if last_word in self.decimals or last_word in self.multipliers:
|
||||
results.append("point five")
|
||||
else:
|
||||
results.append("and a half")
|
||||
|
||||
s = " ".join(results)
|
||||
|
||||
# put a space at number/letter boundary
|
||||
s = re.sub(r"([a-z])([0-9])", r"\1 \2", s)
|
||||
s = re.sub(r"([0-9])([a-z])", r"\1 \2", s)
|
||||
|
||||
# but remove spaces which could be a suffix
|
||||
s = re.sub(r"([0-9])\s+(st|nd|rd|th|s)\b", r"\1\2", s)
|
||||
|
||||
return s
|
||||
|
||||
def postprocess(self, s: str):
|
||||
def combine_cents(m: Match):
|
||||
try:
|
||||
currency = m.group(1)
|
||||
integer = m.group(2)
|
||||
cents = int(m.group(3))
|
||||
return f"{currency}{integer}.{cents:02d}"
|
||||
except ValueError:
|
||||
return m.string
|
||||
|
||||
def extract_cents(m: Match):
|
||||
try:
|
||||
return f"¢{int(m.group(1))}"
|
||||
except ValueError:
|
||||
return m.string
|
||||
|
||||
# apply currency postprocessing; "$2 and ¢7" -> "$2.07"
|
||||
s = re.sub(r"([€£$])([0-9]+) (?:and )?¢([0-9]{1,2})\b", combine_cents, s)
|
||||
s = re.sub(r"[€£$]0.([0-9]{1,2})\b", extract_cents, s)
|
||||
|
||||
# write "one(s)" instead of "1(s)", just for the readability
|
||||
s = re.sub(r"\b1(s?)\b", r"one\1", s)
|
||||
|
||||
return s
|
||||
|
||||
def __call__(self, s: str):
|
||||
s = self.preprocess(s)
|
||||
s = " ".join(word for word in self.process_words(s.split()) if word is not None)
|
||||
s = self.postprocess(s)
|
||||
|
||||
return s
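The currency step in `postprocess` is easiest to understand in isolation. The sketch below lifts the first substitution out of the class and applies it to a couple of strings. The regular expression is copied from the code above; the wrapper name `merge_currency` is hypothetical, and the `try`/`except` from the original is omitted because the pattern only ever captures digits for the cents group.

```python
import re

# Pattern copied from postprocess(): currency symbol, integer part,
# an optional "and", then a cent amount of one or two digits.
PATTERN = r"([€£$])([0-9]+) (?:and )?¢([0-9]{1,2})\b"


def combine_cents(m: re.Match) -> str:
    currency = m.group(1)
    integer = m.group(2)
    cents = int(m.group(3))
    return f"{currency}{integer}.{cents:02d}"  # zero-pad single-digit cents


def merge_currency(s: str) -> str:
    return re.sub(PATTERN, combine_cents, s)


print(merge_currency("$2 and ¢7"))
print(merge_currency("£10 ¢50"))
```

The zero-padding is what turns seven cents into `.07` rather than `.7`, which would otherwise read as seventy cents.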
|
||||
|
||||
|
||||
class EnglishSpellingNormalizer:
|
||||
"""
|
||||
Applies British-American spelling mappings as listed in [1].
|
||||
|
||||
[1] https://www.tysto.com/uk-us-spelling-list.html
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
mapping_path = os.path.join(os.path.dirname(__file__), "english.json")
|
||||
self.mapping = json.load(open(mapping_path))
|
||||
|
||||
def __call__(self, s: str):
|
||||
return " ".join(self.mapping.get(word, word) for word in s.split())
|
||||
|
||||
|
||||
class EnglishTextNormalizer:
|
||||
def __init__(self):
|
||||
self.ignore_patterns = r"\b(hmm|mm|mhm|mmm|uh|um)\b"
|
||||
self.replacers = {
|
||||
# common contractions
|
||||
r"\bwon't\b": "will not",
|
||||
r"\bcan't\b": "can not",
|
||||
r"\blet's\b": "let us",
|
||||
r"\bain't\b": "aint",
|
||||
r"\by'all\b": "you all",
|
||||
r"\bwanna\b": "want to",
|
||||
r"\bgotta\b": "got to",
|
||||
r"\bgonna\b": "going to",
|
||||
r"\bi'ma\b": "i am going to",
|
||||
r"\bimma\b": "i am going to",
|
||||
r"\bwoulda\b": "would have",
|
||||
r"\bcoulda\b": "could have",
|
||||
r"\bshoulda\b": "should have",
|
||||
r"\bma'am\b": "madam",
|
||||
# contractions in titles/prefixes
|
||||
r"\bmr\b": "mister ",
|
||||
r"\bmrs\b": "missus ",
|
||||
r"\bst\b": "saint ",
|
||||
r"\bdr\b": "doctor ",
|
||||
r"\bprof\b": "professor ",
|
||||
r"\bcapt\b": "captain ",
|
||||
r"\bgov\b": "governor ",
|
||||
r"\bald\b": "alderman ",
|
||||
r"\bgen\b": "general ",
|
||||
r"\bsen\b": "senator ",
|
||||
r"\brep\b": "representative ",
|
||||
r"\bpres\b": "president ",
|
||||
r"\brev\b": "reverend ",
|
||||
r"\bhon\b": "honorable ",
|
||||
r"\basst\b": "assistant ",
|
||||
r"\bassoc\b": "associate ",
|
||||
r"\blt\b": "lieutenant ",
|
||||
r"\bcol\b": "colonel ",
|
||||
r"\bjr\b": "junior ",
|
||||
r"\bsr\b": "senior ",
|
||||
r"\besq\b": "esquire ",
|
||||
# perfect tenses; ideally this would cover any past participle, but that is harder
|
||||
r"'d been\b": " had been",
|
||||
r"'s been\b": " has been",
|
||||
r"'d gone\b": " had gone",
|
||||
r"'s gone\b": " has gone",
|
||||
r"'d done\b": " had done", # "'s done" is ambiguous
|
||||
r"'s got\b": " has got",
|
||||
# general contractions
|
||||
r"n't\b": " not",
|
||||
r"'re\b": " are",
|
||||
r"'s\b": " is",
|
||||
r"'d\b": " would",
|
||||
r"'ll\b": " will",
|
||||
r"'t\b": " not",
|
||||
r"'ve\b": " have",
|
||||
r"'m\b": " am",
|
||||
}
|
||||
self.standardize_numbers = EnglishNumberNormalizer()
|
||||
self.standardize_spellings = EnglishSpellingNormalizer()
|
||||
|
||||
def __call__(self, s: str):
|
||||
s = s.lower()
|
||||
|
||||
s = re.sub(r"[<\[][^>\]]*[>\]]", "", s) # remove words between brackets
|
||||
s = re.sub(r"\(([^)]+?)\)", "", s) # remove words between parentheses
|
||||
s = re.sub(self.ignore_patterns, "", s)
|
||||
s = re.sub(r"\s+'", "'", s) # when there's a space before an apostrophe
|
||||
|
||||
for pattern, replacement in self.replacers.items():
|
||||
s = re.sub(pattern, replacement, s)
|
||||
|
||||
s = re.sub(r"(\d),(\d)", r"\1\2", s) # remove commas between digits
|
||||
s = re.sub(r"\.([^0-9]|$)", r" \1", s) # remove periods not followed by numbers
|
||||
s = remove_symbols_and_diacritics(s, keep=".%$¢€£") # keep numeric symbols
|
||||
|
||||
s = self.standardize_numbers(s)
|
||||
s = self.standardize_spellings(s)
|
||||
|
||||
# now remove prefix/suffix symbols that are not preceded/followed by numbers
|
||||
s = re.sub(r"[.$¢€£]([^0-9])", r" \1", s)
|
||||
s = re.sub(r"([^0-9])%", r"\1 ", s)
|
||||
|
||||
s = re.sub(r"\s+", " ", s) # replace any successive whitespaces with a space
|
||||
|
||||
return s
|
@ -1,6 +0,0 @@
|
||||
# This is the minimal set of dependencies we need to compute
|
||||
# WER score. Read Section 3.2. of the original paper
|
||||
# (https://arxiv.org/abs/2212.04356) for more context.
|
||||
jiwer
|
||||
regex
|
||||
more-itertools
|