This commit adds support for language detection in the Whisper Node.js
addon example. It also updates the Node addon to return an object
instead of an array as the result.
The motivation for this change is to enable the inclusion of the
detected language in the result, in addition to the transcription
segments.
For example, when using the `detect_language` option, the result will
now be:
```console
{ language: 'en' }
```
And if the `language` option is set to "auto", the result will also
include the transcription:
```console
{
  language: 'en',
  transcription: [
    [
      '00:00:00.000',
      '00:00:07.600',
      ' And so my fellow Americans, ask not what your country can do for you,'
    ],
    [
      '00:00:07.600',
      '00:00:10.600',
      ' ask what you can do for your country.'
    ]
  ]
}
```
* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling
We talked about adding a LOW priority for GGML threads in the original
threadpool PR. It can be useful in some cases to avoid contention.
Recent Windows ARM64 releases started parking (offlining) CPU cores
more aggressively, which results in suboptimal performance with n_threads > 4.
To deal with that, we now disable Power Throttling for our threads at the
NORMAL and higher priorities.
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* threading: disable SetThreadInfo() calls for older Windows versions
* Update tools/llama-bench/llama-bench.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com>
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* 1. Add an `integrated` field to `ggml_cuda_device_info` to distinguish integrated GPUs from discrete GPUs
2. Adjust `ggml_backend_cuda_device_supports_buft` for this new feature
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Adjusted code indentation
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Fixed incorrect setting of variable types
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Adjusted the judgment logic
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Add a host_buft assert for the integrated CUDA device case in `evaluate_and_capture_cuda_graph()`
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Add a defensive assert
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Adjusted the support judgment logic.
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Revert the suggested changes since they are not applicable on Jetson devices
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Add parentheses to enforce operator precedence
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Fix CI failure: add a missing space
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: yangxiao <yang_xl@tju.edu.cn>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: yangxiao <yangxl_zz@qq.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* SYCL: Add mrope kernel
* feat: Optimize rope operations with vectorization
Uses `sycl::vec` to load and store two elements at a time,
significantly improving performance in `rope_norm`,
`rope_neox`, and `rope_multi`. This reduces the number of memory
accesses and leverages SIMD instructions for faster execution.
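The pairing maps naturally onto RoPE, since each rotation consumes two adjacent values. A minimal plain-C++ sketch of the access pattern (illustrative only; the real kernels use `sycl::vec<float, 2>` loads and stores on the device, and the names here are hypothetical):

```cpp
#include <cmath>
#include <cstddef>

// Rotate adjacent pairs (x0, x1) by theta, touching two floats per
// iteration. Reading and writing the pair together stands in for the
// single sycl::vec<float, 2> load/store used in the actual kernels,
// halving the number of memory accesses for the same arithmetic.
void rope_pairs(float * x, size_t n, float theta) {
    const float c = std::cos(theta);
    const float s = std::sin(theta);
    for (size_t i = 0; i + 1 < n; i += 2) {
        const float x0 = x[i + 0]; // one "vector" load of two elements
        const float x1 = x[i + 1];
        x[i + 0] = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c; // one "vector" store of two elements
    }
}
```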
* Use ceil_div
* cmake: Define function for querying architecture
The tests and results match exactly those of src/CMakeLists.txt
* Switch arch detection over to new function
* Prevent overflow
* Fix memsize of Whisper::Context
* Rename xxx_initialize to more Ruby-esque name: xxx_s_new
* Define Whisper::Model::ZipURI
* Define Whisper::Model.coreml_compiled_models
* Make Options' @cmake_options Hash
* Use --{enable,disable}-whisper-coreml option for -I/opt/homebrew/opt/llvm/include
* Prepare Core ML model if enabled
* Add test for ZipURI
* Add signatures for ZipURI
* Add Whisper.system_info_str
* Add test for Whisper.system_info_str
* Add signature for Model.coreml_compiled_models
* Add signature for Whisper.system_info_str
* Add test for Core ML
* Update date
* Maintain .gitignore
* vad : revisit timestamp alignment/mapping
This commit improves the timestamp alignment by introducing a mapping
table, adding intermediate reference points for longer segments, and
using binary search for lookups.
The motivation for this change is to address issues with the current
solution, where zero-length segments are possible, and also to improve
the precision of the VAD timestamps.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3162
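The idea can be sketched as follows (hypothetical names and a simplified struct, not the actual whisper.cpp implementation): each table entry pairs a timestamp in the VAD-filtered audio with the corresponding timestamp in the original audio, and a lookup binary-searches the table and interpolates between the two surrounding reference points:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the mapping table entry: a time in the
// VAD-filtered audio paired with the corresponding original-audio time.
struct time_mapping {
    int64_t processed_time; // ms in the filtered audio
    int64_t original_time;  // ms in the original audio
};

// Map a filtered-audio timestamp back to original-audio time by binary
// searching the (sorted) table and linearly interpolating between the
// two surrounding reference points.
int64_t map_processed_to_original(const std::vector<time_mapping> & table, int64_t t) {
    if (table.empty()) return t;
    auto it = std::lower_bound(table.begin(), table.end(), t,
        [](const time_mapping & m, int64_t v) { return m.processed_time < v; });
    if (it == table.begin()) return table.front().original_time;
    if (it == table.end())   return table.back().original_time;
    const time_mapping & lo = *(it - 1);
    const time_mapping & hi = *it;
    const int64_t span = hi.processed_time - lo.processed_time;
    if (span == 0) return lo.original_time;
    return lo.original_time +
        (hi.original_time - lo.original_time) * (t - lo.processed_time) / span;
}
```

The intermediate reference points mentioned above keep `span` short, so the linear interpolation inside each interval stays accurate even for long segments.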
* vad : use uint64_t for time mapping
This commit changes the type of the `processed_time` and `original_time`
fields in the `vad_time_mapping` struct from `double` to `uint64_t`.
The motivation for this change is to improve precision, avoid
floating-point inaccuracies, and be consistent with other parts of
the code base that use `uint64_t` for time representation.
This is part of a refactoring where I'm also going to change the
`vad_segment_info` struct to use `uint64_t` for the start and end times.
This is the reason for the somewhat unpleasant conversions and casts in
the code at the moment.
* vad : change vad_segment_info and whisper_vad_segment to use uint64_t
* vad : use int64_t instead of uint64_t for timestamps
To be consistent with other timestamps in the codebase.
* vad : add centisecond conversion functions
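As a sketch of what such conversions look like (hypothetical names; whisper.cpp timestamps are expressed in centiseconds and audio is sampled at 16 kHz):

```cpp
#include <cstdint>

// Hypothetical helpers converting between sample counts at a 16 kHz
// sample rate and centiseconds (1 cs = 10 ms), the unit used for
// whisper.cpp timestamps: 16000 samples == 1 s == 100 cs.
constexpr int64_t SAMPLE_RATE = 16000;

int64_t samples_to_cs(int64_t n_samples) {
    return (n_samples * 100) / SAMPLE_RATE;
}

int64_t cs_to_samples(int64_t cs) {
    return (cs * SAMPLE_RATE) / 100;
}
```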
* vad : extract vad processing from whisper_full_with_state
This commit extracts the VAD processing from the
`whisper_full_with_state` function into the `whisper_full` and
`whisper_full_parallel` functions.
The motivation for this is that I did not take into account that when
`whisper_full_parallel` is called with `n_processors > 1`, the VAD
processing would not be applied correctly. Instead, the VAD processing
should be done prior to processing in the case of
`whisper_full_parallel`.
* vad : remove filtered_n_samples from whisper_vad
The commit removes the parameter `filtered_n_samples` from the
`whisper_vad` function signature and its usage, as it is no longer
needed now that the filtered samples are a vector (previously a
float*).
The motivation for this is to simplify the usage of this function.
* vad : remove vad_mapping_table_initialized flag
* vad : fix leaning (none) of pointer/references
* Don't pass empty string to cmake command
* Refactor Dependencies
* Use found cmake path for options
* Maintain extsources.rb
* List dependent files by directory separator agnostic way
* Prepend whitespace before '='
* Handle build options on install
* Remove useless test
* Retrieve gem file name and version from spec file
* Bump version to 1.3.3
* Update date
* Add install option examples
* [skip ci] Remove unused module
* Add VAD models
* Extract function to normalize model path from ruby_whisper_initialize()
* Define ruby_whisper_vad_params struct
* Add VAD-related features to Whisper::Params
* Add tests for VAD-related features
* Define Whisper::VADParams
* Add Whisper::VAD::Params attributes
* Add test suite for VAD::Params
* Make older test to follow namespace change
* Add test for transcription with VAD
* Add assertion for test_vad_params
* Add signatures for VAD-related methods
* Define VAD::Params#==
* Add test for VAD::Params#==
* Fix Params#vad_params
* Add test for Params#vad_params
* Fix signature of Params#vad_params
* Use macro to define VAD::Params params
* Define VAD::Params#initialize
* Add tests for VAD::Params#initialize
* Add signature for VAD::Params.new
* Add documentation on VAD in README
* Wrap register_callback in prepare_transcription to make the intent clear
* Set whisper_params.vad_params just before transcription
* Don't touch NULL
* Define ruby_whisper_params_type
* Use TypedData_XXX for ruby_whisper_params instead of Data_XXX
* Remove unused functions
* Define rb_whisper_model_data_type
* Use TypedData_XXX for ruby_whisper_model instead of Data_XXX
* Define ruby_whisper_segment_type
* Use TypedData_XXX for ruby_whisper_segment instead of Data_XXX
* Define ruby_whisper_type
* Use TypedData_XXX for ruby_whisper instead of Data_XXX
* Qualify with const
* tests : add a new benchmark test for long-form audio
Based on "Earnings-21" corpus by Del Rio et al.
Earnings-21: A Practical Benchmark for ASR in the Wild (2021)
https://arxiv.org/abs/2104.11348
This dataset contains 39 hours of long-form speech, sourced from public
earnings calls. Each recording contains roughly 50 minutes of English
dialogue between multiple speakers (2 to 20 persons).
This benchmark suite should allow us to evaluate the performance of
whisper.cpp on long-form audio data.
Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
* tests : apply PR feedback to 'earnings21/README.md'
Based on feedback from Daniel Bevenius.
- Simplify how to download & prepare a Silero VAD model.
- Fix typo: inferece -> inference
Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
* tests : avoid crashing on non-UTF-8 characters
Based on feedback from Daniel Bevenius.
Add 'errors' parameter to open() in order to avoid unhandled
exception on invalid UTF-8 bytes.
Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
* tests : try to interpret the hypothesis as Windows-1252
Based on the discussion in PR#3185.
Evidently Whisper.cpp can represent a quotation mark as '0x93', which
implies Windows-1252 (Microsoft's extension of ASCII) and cannot be
decoded by UTF-8.
Add an explicit decoding loop to address the issue.
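The benchmark scripts themselves are Python, but the fallback idea can be sketched as a byte-level remap (here in C++, with a hypothetical helper covering only the smart-quote bytes; a real implementation would attempt UTF-8 decoding first and fall back per byte on failure):

```cpp
#include <string>

// Naive byte-level remap: translate the Windows-1252 smart-quote bytes
// (0x91-0x94), which are invalid as UTF-8 lead bytes, to their Unicode
// equivalents, passing everything else through unchanged. Covers only
// these four bytes for brevity; a full decoder would handle the whole
// 0x80-0x9F range and try UTF-8 first to avoid touching valid sequences.
std::string decode_cp1252_quotes(const std::string & in) {
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case 0x91: out += "\u2018"; break; // left single quote
            case 0x92: out += "\u2019"; break; // right single quote
            case 0x93: out += "\u201C"; break; // left double quote
            case 0x94: out += "\u201D"; break; // right double quote
            default:   out += (char) c; break;
        }
    }
    return out;
}
```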
Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
---------
Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
This commit modifies windows-blas, which was previously updated to use
the zip functionality provided by `actions/upload-artifact`. That turned
out to be incorrect and I should not have done it. The reason for
zipping the archives first is that otherwise the artifacts, when
downloaded, will be unzipped and just be plain directories. In our case
the release task depends on the artifacts having a .zip extension so
that those archives are included in the release.
* SYCL: Add non-contiguous input support to norm kernel
* refactor and add RMS_NORM non-contiguous input support
ggml-ci
* restore subgroup reduction for multi-subgroup thread blocks in norm kernels
* Swap grid dims of nsamples and nrows
ggml-ci
* Revert "Swap grid dims of nsamples and nrows"
This reverts commit 43be2d657fec7f7fba54e2cd154106bc0fc45adf.
* restore not required changes
ggml-ci
* address review comments: make it more SYCL-like
* Use a common function to calculate offset
* remove wrap around logic for handling broadcasts
* remove static from calculate_offset fn and use ceil_div
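A sketch of the two helpers referenced above (hypothetical signatures): `ceil_div` is the usual integer ceiling division used for sizing kernel grids, and a shared offset helper lets contiguous and non-contiguous tensors go through the same indexing path:

```cpp
#include <cstddef>
#include <cstdint>

// Integer ceiling division, as commonly used to size kernel grids:
// ceil_div(n, block) blocks of `block` threads cover n elements.
constexpr int64_t ceil_div(int64_t a, int64_t b) {
    return (a + b - 1) / b;
}

// Hypothetical shared offset helper: compute a flat element offset from
// 4D indices and per-dimension strides (in elements). Because the
// strides are explicit, contiguous and non-contiguous (and broadcast)
// layouts all use the same code path.
size_t calculate_offset(const int64_t strides[4], const int64_t idx[4]) {
    size_t off = 0;
    for (int d = 0; d < 4; ++d) {
        off += (size_t)(idx[d] * strides[d]);
    }
    return off;
}
```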
* cann: add the basic FA support
* cann: update the readme
* cann: update the FlashAttention with PSEShift
* cann: update the input parameters in FA
* cann: update the alibi with max_bias
* cann: add the constraints of softcap
* cann: update the docs CANN.md
* cann: update the docs CANN.md
* cann: fix typo of CANN.md
* cann: add some comments and update the CANN.md
* cann: update the CANN.md
* cann: update the inner precise for fusedInferAttention
* cann: update the constraints of flash_attn_ext on ggml-cann.cpp
* cann: clean the whitespace
* cann: clean the whitespace
* cann: add a new endline