* vad : add initial Voice Activity Detection (VAD) support
This commit adds support for Voice Activity Detection (VAD). When enabled,
this feature processes the audio input and detects speech segments.
This information is then used to reduce the number of samples that need
to be processed by whisper_full.
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003
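For reference, a minimal usage sketch, assuming the CLI flags and Silero model filename introduced alongside this feature (treat `--vad`, `--vad-model` and the model path as assumptions rather than confirmed names):
```console
# Assumed flags: enable VAD and point whisper-cli at a VAD model
$ ./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav \
    --vad --vad-model models/ggml-silero-v5.1.2.bin
```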
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update PATH for main/main-cuda container
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add Dockerfile for musa, .dockerignore and update CI
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add Moore Threads GPU Support in README.md and replace ./main with whisper-cli
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Forward GGML_CUDA/GGML_MUSA to cmake in Makefile
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Minor updates for PATH ENV in Dockerfiles
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Address comments
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
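As an illustration of the Makefile forwarding, a sketch assuming the variables are passed straight through to the cmake configuration step:
```console
# Assumed: GGML_MUSA/GGML_CUDA are forwarded to cmake by the Makefile wrapper
$ GGML_MUSA=1 make -j
$ GGML_CUDA=1 make -j
```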
FFmpeg integration was introduced in 1b51fdf by William Tambellini,
but not mentioned in the main documentation.
Add a short guide on how to enable the feature. Confirmed to work
on both Ubuntu 24.04 and Fedora 39.
Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
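A sketch of what enabling the feature looks like, assuming the `WHISPER_FFMPEG` cmake option used by that integration:
```console
# Assumed cmake option for FFmpeg support (Linux only)
$ cmake -B build -D WHISPER_FFMPEG=yes
$ cmake --build build --config Release
```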
This adds a section to the README.md file that describes how to use the
XCFramework.
The motivation for this is that it is not obvious how to use the
XCFramework, and an example will help.
One thing to note is that the example uses the latest release, including
its checksum. We are thinking about how we might automate this in the
future, but for now this is a good start.
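On the checksum point, one way to obtain it for a downloaded release archive is `swift package compute-checksum`; the archive name below is hypothetical:
```console
# Hypothetical archive name; prints the checksum to reference from Package.swift
$ swift package compute-checksum whisper-xcframework.zip
```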
This commit updates the recommended version of Python to 3.11 for Core
ML conversion support. It also adds the `-e` flag to the
`generate-coreml-model.sh` script to ensure that the script exits on the
first error.
The motivation for this is that when following the installation
instructions using Python 3.10, I get the following error:
```console
(venv) $ ./models/generate-coreml-model.sh base.en
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.3 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
File "/whisper-work/models/convert-whisper-to-coreml.py", line 2, in <module>
import torch
File "/whisper-work/venv/lib/python3.10/site-packages/torch/__init__.py", line 870, in <module>
from . import _masked
File "/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py", line 420, in <module>
def sum(input: Tensor,
File "/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py", line 223, in _apply_docstring_templates
example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at /Users/distiller/project/pytorch/torch/csrc/utils/tensor_numpy.cpp:68.)
example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
Minimum required torch version for importing coremltools.optimize.torch is 2.1.0. Got torch version 1.11.0.
Traceback (most recent call last):
File "/whisper-work/models/convert-whisper-to-coreml.py", line 4, in <module>
import coremltools as ct
File "/whisper-work/venv/lib/python3.10/site-packages/coremltools/__init__.py", line 120, in <module>
from . import converters, models, optimize, proto
File "/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/__init__.py", line 7, in <module>
from . import libsvm, sklearn, xgboost
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/__init__.py", line 6, in <module>
from ._tree import convert
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/_tree.py", line 9, in <module>
from ._tree_ensemble import convert_tree_ensemble as _convert_tree_ensemble
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/_tree_ensemble.py", line 11, in <module>
from ...models.tree_ensemble import TreeEnsembleClassifier
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/__init__.py", line 6, in <module>
from . import (
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/ml_program/__init__.py", line 6, in <module>
from . import compression_utils
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/ml_program/compression_utils.py", line 8, in <module>
from coremltools.converters.mil.mil import Operation as _Operation
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/__init__.py", line 7, in <module>
from .frontend.tensorflow.tf_op_registry import register_tf_op
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/__init__.py", line 6, in <module>
from . import tensorflow, tensorflow2, torch
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/__init__.py", line 11, in <module>
from . import ops, quantization_ops
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 36, in <module>
from .internal_graph import InternalTorchIRGraph, InternalTorchIRNode
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/internal_graph.py", line 15, in <module>
from .exir_utils import extract_io_from_exir_program
File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/exir_utils.py", line 99, in <module>
) -> Dict[str, torch.fx.Node]:
AttributeError: module 'torch' has no attribute 'fx'
```
Using Python 3.11, the conversion script runs without any errors.
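For reference, the setup that works for me is a Python 3.11 virtual environment; the pip package set below is assumed to match the Core ML instructions in the README:
```console
$ python3.11 -m venv venv
$ source venv/bin/activate
(venv) $ pip install ane_transformers openai-whisper coremltools
(venv) $ ./models/generate-coreml-model.sh base.en
```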
I found the docker instructions in the README.md to be useful, as well as the description of the differences between the docker variants, such as ffmpeg and cuda support. However, this section was removed in v1.7.4, and I would vote to bring it back.
This is a pull request to add that section back.
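As a reminder of what that section covered, a usage sketch (the image name, tag, and invocation below are assumptions, not a statement of the published images):
```console
# Assumed image name/tag; mounts a local models directory and transcribes a sample
$ docker run -it --rm -v $(pwd)/models:/models \
    ghcr.io/ggml-org/whisper.cpp:main "whisper-cli -m /models/ggml-base.en.bin -f ./samples/jfk.wav"
```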
The script itself has a hashbang indicating that it is a shell script,
but the README indicates that it must be executed with `bash`.
I checked the script itself, and it seems to be valid POSIX shell. I can
confirm that it works with busybox sh.
Clarify the reference in the README, so it is clear that bash is not
actually a dependency of this script.
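For example (the script name here is an assumption; the point is that any POSIX-compliant sh works):
```console
# Assumed script; runs under busybox sh, dash, or any other POSIX shell
$ sh ./models/download-ggml-model.sh base.en
```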
* Update README.md
Fix broken C-style API link
* Update whisper_processor.py
Update examples/python/whisper_processor.py to remove the nonexistent flag "-np" from the subprocess.Popen call.
* Add pywhispercpp to the Pybind11 Python wrapper list
abdeladim-s/pywhispercpp wasn't added to the list, or was removed at some point (?).
It was referenced in issue #9, so I feel it is worth adding, as it is one of the first, if not the first, Python wrappers for whisper.cpp.