* * ggml-vulkan: adds op CONV_TRANSPOSE_1D
* test-backend-ops: adds more spohisticated tests for CONV_TRANSPOSE_1D
* Missing barrier added to shader.
Number of additional tests reduced to 108.
* * Fixes typo in variable name.
* Removes extra whitespaces.
* Adds int64->int32 casts to prevent possible warnings.
* Problem size reduced in tests to pass tests with llvmpipe.
* supports_op condition moved from unintended position
* This is not needed by the normal use where the result is read
using `tensor_get`, but it allows perf mode of `test-backend-ops`
to properly measure performance.
Some systems report the CPU implementation as "Power11" instead of "POWER11".
The existing CMake logic uses a case-sensitive regular expression to extract
the CPU generation, which fails when the casing doesn't exactly match "POWER".
This patch provides a fix by first converting the string to uppercase before applying the regex.
Signed-off-by: root <root@rheldb2v.pperf.tadn.ibm.com>
Co-authored-by: root <root@rheldb2v.pperf.tadn.ibm.com>
* Fix indentation of code sample in document comment
* Make Whisper::Context#transcribe able to run non-parallel
* Add test for Whisper::Context#transcribe with parallel option
* Follow signature API change of Context#transcribe
* Remove useless variable assignment
* Move simple usage up in README
* Add need help section in README
* Add document on Context#transcribe's parallel option in README
* Update date
* Fix signature of Context.new
* Make Context#subscribe accept n_processors option
* Make test follow #transcribe's change
* Make RBS follow #transcribe's change
* Add document for #transcribe's n_processors option
* Rename test directory so that Rake tasks' default setting is used
This commit updates the build workflow to replace `ports.ubuntu.com`
with `mirror.kumi.systems` in the apt sources list for ARM64 builds.
The motivation for this change is intended to improve package download
reliability and speed by using a more stable mirror for ARM64 packages.
This pull request fixes a bug in the fullTranscribeWithTime method, where the whisperParams argument was declared but never used. As a result, the model did not apply the configuration defined in whisperParams.
This commit add support for language detection in the Whisper Node.js
addon example. It also updates the node addon to return an object
instead of an array as the results.
The motivation for this change is to enable the inclusion of the
detected language in the result, in addition to the transcription
segments.
For example, when using the `detect_language` option, the result will
now be:
```console
{ language: 'en' }
```
And if the `language` option is set to "auto", it will also return:
```console
{
language: 'en',
transcription: [
[
'00:00:00.000',
'00:00:07.600',
' And so my fellow Americans, ask not what your country can do for you,'
],
[
'00:00:07.600',
'00:00:10.600',
' ask what you can do for your country.'
]
]
}
```
* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling
We talked about adding LOW priority for GGML threads in the original threadpool PR.
It might be useful for some cases to avoid contention.
Latest Windows ARM64 releases started parking (offlining) the CPU cores
more aggresively which results in suboptimal performance with n_threads > 4.
To deal with that we now disable Power Throttling for our threads for the NORMAL
and higher priorities.
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* threading: disable SetThreadInfo() calls for older Windows versions
* Update tools/llama-bench/llama-bench.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com>
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* 1. add "integrated" in ggml_cuda_device_info for distinguish whether it is Intergrate_gpu or discrete_gpu
2. Adjust the func:"ggml_backend_cuda_device_supports_buft" for this new feature
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Adjusted code indentation
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Fixed incorrect setting of variable types
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Adjusted the judgment logic
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* add a host_buft assert in case of integrated_cuda_device with func:'evaluate_and_capture_cuda_graph()'
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Add a defensive security assert
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Adjusted the support judgment logic.
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* revoke the suggest commit changes due to it's not applicable in jetson_device
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Add parentheses to enforce operator precedence
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Fix ci bug: add a spaces
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: yangxiao <yang_xl@tju.edu.cn>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: yangxiao <yangxl_zz@qq.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* SYCL: Add mrope kernel
* feat: Optimize rope operations with vectorization
Uses `sycl::vec` to load and store two elements at a time,
significantly improving performance in `rope_norm`,
`rope_neox`, and `rope_multi`. This reduces the number of memory
accesses and leverages SIMD instructions for faster execution.
* Use ceil_div
* cmake: Define function for querying architecture
The tests and results match exactly those of src/CMakeLists.txt
* Switch arch detection over to new function
* Prevent overflow
* Fix memsize of Whisper::Context
* Rename xxx_initialize to more Ruby-esque name: xxx_s_new
* Define Whisper::Model::ZipURI
* Define Whisper::Model.coreml_compiled_models
* Make Options' @cmake_options Hash
* Use --{enable,disable}-whisper-coreml option for -I/opt/homebrew/opt/llvm/include
* Prepare Core ML model if enabled
* Add test for ZipURI
* Add signatures for ZipURI
* Add Whisper.system_info_str
* Add test for Whisper.system_info_str
* Add signagure for Model.coreml_compiled_models
* Add signature for Whisper.system_info_str
* Add test for Core ML
* Update date
* Maintain .gitignore
* vad : revisit timestamp alignment/mapping
This commit improving the timestamp alignment by introducing a mapping
table, adding intermediate reference points for longer segments, and
binary search for lookups.
The motivation for this changes is to address issues with the currently
solution where zero-length segments are possible, and also to improve
the precision of the VAD timestamps.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3162
* vad : use uint64_t for time mapping
This commit changes the type of the `processed_time` and `original_time`
fields in the `vad_time_mapping` struct from `double` to `uint64_t`.
The motivation for this change is made to improve precision and avoid
floating-point inaccuracies and also be consistent with other part of
the code base that use `uint64_t` for time representation.
This is a part of a refactoring where I'm also going to change the
vad_segment_info struct to use `uint64_t` for the start and end times.
This is the reason for the not so pleasant conversion and casts in the
code at the moment.
* vad : change vad_segment_info and whisper_vad_segment to use uint64_t
* vad : use int64_t instead of uint64_t for timestamps
To be consistent with other timestamps in the codebase.
* vad : add centisecond conversion functions
* vad : extract vad processing from whisper_full_with_state
This commit extracts the VAD processing from the
`whisper_full_with_state` function into the `whisper_full` and
`whisper_full_parallel` functions.
The motivation for this is that I did not take into account that when
`whisper_full_parallel` is called with `n_processors > 1`, then the
vad processing would not be applied correctly. Instead the VAD
processing should be done prior to processing in the case of
`whisper_full_parallel`.
* vad : remove filtered_n_samples from whisper_vad
The commit removes the parameter `filtered_n_samples` from the
`whisper_vad` function signature and its usage, as it is no longer
needed since filtered samples is now a vector (previously it was a
float*)
The motivation for this is to simplify the usage of this function.
* vad : remove vad_mapping_table_initialized flag
* vad : fix leaning (none) of pointer/references
* Don't pass empty string to cmake command
* Refactor Dependencies
* Use found cmake path for options
* Maintain extsources.rb
* List dependent files by directory separator agnostic way
* Prepend whitespace before '='
* Handle build options on install
* Remove useless test
* Retrieve gem file name and version from spec file
* Bump version to 1.3.3
* Update date
* Add install option examples
* [skip ci]Remove unused module