whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-08-19 03:36:49 +02:00

Author	SHA1	Message	Date
Daniel Bevenius	96d791ae61	vad : add download-vad-model scripts (#3149 ) * vad : add download-vad-model scripts This commit adds a script to download VAD models. * vad : add vad model download script for windows [no ci] Refs: https://github.com/ggml-org/whisper.cpp/issues/3146	2025-05-14 16:47:18 +02:00
Daniel Bevenius	e41bc5c61a	vad : add initial Voice Activity Detection (VAD) support (#3065 ) * vad : add initial Voice Activity Detection (VAD) support This commit adds support for Voice Activity Detection (VAD). When enabled, this feature processes the audio input and detects speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-12 16:10:11 +02:00
Daniel Bevenius	8b92060a10	coreml : set convert_to="mlprogram" in convert * coreml : skip model load in convert-whisper-to-coreml.py This commit updates the conversion process for Whisper models to use the "mlprogram" format instead of "neuralnetwork". The motivation for this change is that when using the "neuralnetwork" format the underlying model produced is based on protobuf and my understanding is that there are limitations to this format, such as sizes of strings and the complexity of the model. Currently when trying to convert larger models such as large-v3 the conversion fails but succeeds for smaller models. The "mlprogram" format is a more recent addition to CoreML and is designed to be more flexible and powerful, allowing for more complex models and larger data types. This seems to work for larger and smaller models alike and unless there are considerations that I'm not aware of I think this is what we should be using moving forward. The error that is generated for large models is the following: ```console Running MIL backend_neuralnetwork pipeline: 100%|█████████| 9/9 [00:00<00:00, 35.44 passes/s] Translating MIL ==> NeuralNetwork Ops: 100%|███████████| 5641/5641 [03:31<00:00, 26.65 ops/s] Traceback (most recent call last): File "/Users/danbev/work/ai/whisper-work/models/convert-whisper-to-coreml.py", line 322, in <module> encoder = convert_encoder(hparams, encoder, quantize=args.quantize) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/models/convert-whisper-to-coreml.py", line 255, in convert_encoder model = ct.convert( ^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/converters/_converters_entry.py", line 635, in convert mlmodel = mil_convert( ^^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 186, in mil_convert return _mil_convert( ^^^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 245, in _mil_convert return modelClass( ^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/models/model.py", line 489, in __init__ self.__proxy__, self._spec, self._framework_error = self._get_proxy_and_spec( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/models/model.py", line 550, in _get_proxy_and_spec _MLModelProxy( ValueError: basic_string ``` Refs: https://github.com/ggml-org/whisper.cpp/issues/3012	2025-04-23 08:24:38 +02:00
Greg Sadetsky	ada745f4a5	models : fix dead link to models in readme (#3006 )	2025-04-06 08:29:41 +03:00
Georgi Gerganov	2b6d0d2200	rename : ggerganov -> ggml-org (#3005 )	2025-04-04 16:11:52 +03:00
Daniel Bevenius	11688b262f	coreml: fix Whisper to CoreML conversion by disabling SDPA [no ci] (#2979 ) * coreml: fix Whisper to CoreML conversion by disabling SDPA This commit disables the use of PyTorch's `scaled_dot_product_attention` in the Whisper model to avoid compatibility issues during CoreML conversion. The issue occurs because coremltools requires PyTorch 2.5.0, but the Whisper implementation may expect behavior from newer PyTorch versions. By setting `MultiHeadAttention.use_sdpa = False`, we force Whisper to use its fallback manual attention implementation, which works correctly with PyTorch 2.5.0 during the tracing process. Refs: https://github.com/ggerganov/whisper.cpp/issues/2783 * coreml: fix audio shape in whisper decoder conversion This commit fixes the audio shape in the whisper decoder conversion script. The motivation for this is that the audio shape was incorrect and was causing the conversion to fail. * coreml : set -e in generate-coreml-interface.sh The commit sets the -e flag in the generate-coreml-interface.sh script to make sure the script fails if any command fails. * coreml : update generated encoder/decoder interfaces This commit updates the generated encoder/decoder interfaces for the whisper model which is the result of running the generate-coreml-interface.sh script.	2025-04-01 18:01:23 +02:00
Peter	edf1ee1ef8	whisper : enhance model download scripts functionality and resolve compiler warning (#2925 ) * whisper : improve whisper-cli executable path detection in model download shell scripts If whisper-cli is found on the path, do not suggest invoking from build directory. This improves flexibility and usability for distribution and packaging scenarios. * whisper : enhance Windows model download batch script to have comparable functionality and behaviour as shell scripts * Download models to the current directory if the script is executed from the \bin\ directory (for future distribution scenarios where the script is in the \bin\ subdirectory of a Windows build) * Add model_path command line argument * If whisper-cli is found on the path, do not suggest invoking from build directory * whisper : resolve compiler warning by removing duplicate definition of NOMINMAX in whisper-cli code	2025-03-24 10:39:50 +02:00
Peter	9bc0dc7235	whisper : update default model download directory behavior to use current working directory when script is in /bin/ directory (#2924 ) This change ensures that when the script is packaged and distributed, models are downloaded to the current directory instead of the script's location, preventing conflicts with system directories. This improves flexibility and usability for distribution and packaging scenarios.	2025-03-22 16:27:57 +02:00
Daniel Bevenius	663cafc1e8	readme : update Python version to 3.11 for Core ML support [no -ci] (#2919 ) This commit updates the recommended version of Python to 3.11 for Core ML conversion support. It also adds the `-e` flag to the `generate-coreml-model.sh` script to ensure that the script exits on the first error. The motivation for this is that when following the installation instructions using Python 3.10 I get the following error: ```console (venv) $ ./models/generate-coreml-model.sh base.en A module that was compiled using NumPy 1.x cannot be run in NumPy 2.1.3 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'. If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2. Traceback (most recent call last): File "/whisper-work/models/convert-whisper-to-coreml.py", line 2, in <module> import torch File "/whisper-work/venv/lib/python3.10/site-packages/torch/__init__.py", line 870, in <module> from . import _masked File "/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py", line 420, in <module> def sum(input: Tensor, File "/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py", line 223, in _apply_docstring_templates example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]]) /whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at /Users/distiller/project/pytorch/torch/csrc/utils/tensor_numpy.cpp:68.) example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]]) Minimum required torch version for importing coremltools.optimize.torch is 2.1.0. Got torch version 1.11.0. Traceback (most recent call last): File "/whisper-work/models/convert-whisper-to-coreml.py", line 4, in <module> import coremltools as ct File "/whisper-work/venv/lib/python3.10/site-packages/coremltools/__init__.py", line 120, in <module> from . import converters, models, optimize, proto File "/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/__init__.py", line 7, in <module> from . import libsvm, sklearn, xgboost File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/__init__.py", line 6, in <module> from ._tree import convert File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/_tree.py", line 9, in <module> from ._tree_ensemble import convert_tree_ensemble as _convert_tree_ensemble File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/_tree_ensemble.py", line 11, in <module> from ...models.tree_ensemble import TreeEnsembleClassifier File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/__init__.py", line 6, in <module> from . import ( File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/ml_program/__init__.py", line 6, in <module> from . import compression_utils File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/ml_program/compression_utils.py", line 8, in <module> from coremltools.converters.mil.mil import Operation as _Operation File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/__init__.py", line 7, in <module> from .frontend.tensorflow.tf_op_registry import register_tf_op File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/__init__.py", line 6, in <module> from . import tensorflow, tensorflow2, torch File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/__init__.py", line 11, in <module> from . import ops, quantization_ops File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 36, in <module> from .internal_graph import InternalTorchIRGraph, InternalTorchIRNode File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/internal_graph.py", line 15, in <module> from .exir_utils import extract_io_from_exir_program File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/exir_utils.py", line 99, in <module> ) -> Dict[str, torch.fx.Node]: AttributeError: module 'torch' has no attribute 'fx' ``` Using Python3.11 the conversion script runs without any errors.	2025-03-21 10:31:55 +01:00
Anders Bjarby	4854789751	convert : update convert-h5-to-ggml.py (#2840 ) improved handling of missing max_length	2025-03-17 09:41:05 +02:00
Ryan Johnson	c774eec709	go : improve model download (#2756 ) * Updated models download URL * Updated list of models available All of the high efficiency quantized models are rejected when trying to download. They exist on the server. Let's allow them. * added path prefix for whisper-cli in message to user. The message is misleading if this script is called from another script in a different folder. So the message has to be fixed. * undid download URL change I made earlier. Fixed filepath.Join(urlPath, model) bug. * Undid download URL change I made earlier. Seems that the old URL works but only when provided a model to download. Still doesn't explain why there's a different download URL that also works. Please elucidate in docs. * Fixed URLForModel Function's bug filepath.Join is designed for filesystem paths, and it uses backslashes (\) on Windows. URLs, however, require forward slashes (/), so the use of filepath.Join is inappropriate for constructing URLs. The fmt.Sprintf function ensures that forward slashes are used. * Fixed URL trailing / double slash bug Ensure no double slash by trimming trailing '/' from srcUrl if present * Fixed bad download URL, missing ggml prefix Not sure if that was a bug I introduced but it was trying to download without the prefix. * Added question before downloading all models. Added download size estimate HEAD Requests: Efficiently fetches file sizes without downloading the content. Interactive Workflow: Allows the user to make informed decisions about downloading all models. Safe Defaults: Aborts if the user does not explicitly confirm. * Fixed Unbuffered channel warning. warning in context.go : misuse of unbuffered os.Signal channel as argument to signal. The warning indicates that the unbuffered channel used in signal.Notify in context.go may be misused. In Go, unbuffered channels can cause potential deadlocks if signals are sent faster than they are received. * Fixed download size calculation, download URL prefix bug, added link to models URL for user. The URL formatter was prepending the model name to the formatted model name in the URL * Added logs and exes to gitignore * Delete bindings/go/examples/go-model-download/go-model-download.exe * Delete whisper_build.log	2025-03-07 10:03:51 +02:00
mgrachten	cff8868b5f	coreml : always convert to "neuralnetwork" (#2770 )	2025-02-03 22:36:32 +02:00
Adam Jones	885e31368d	docs: Fix main -> whisper-cli in download scripts (#2707 )	2025-01-06 15:17:57 +02:00
Michael Rienstra	6aa1d7b892	models : fix typo in download-ggml-model.sh (#2623 ) Introduced in #2589	2024-12-12 18:02:00 +02:00
Michael Rienstra	a9d06ce151	models : add `q8_0` models to `download-ggml-model.sh` (#2589 )	2024-11-28 10:31:54 +02:00
CrispStrobe	06a1da9daf	convert : handle max_target_positions (#2477 ) as needed eg for https://huggingface.co/primeline/whisper-large-v3-turbo-german/blob/main/config.json	2024-10-14 10:46:33 +03:00
Georgi Gerganov	2ef717b293	whisper : add large-v3-turbo (#2440 )	2024-10-01 15:57:06 +03:00
Brad Murray	d2986f8b07	models : add support for wget2 for fedora (#2387 )	2024-08-28 11:46:01 +03:00
Georgi Gerganov	e30c679928	whisper : reorganize source code + improve CMake (#2256 ) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci]	2024-06-26 19:34:09 +03:00
Georgi Gerganov	858452d58d	models : disable old script (#2079 )	2024-04-24 14:56:30 +03:00
st-gr	eb23f4ef16	openvino : fix convert-whisper-to-openvino.py (#1890 ) Fix issue: Conversion from Whisper to OpenVino failed #1870 convert-whisper-to-openvino.py stopped working with OpenVINO version 2023.0.0-10926-b4452d56304-releases/2023/0 . Error was: TypeError: load(): incompatible function arguments. The following argument types are supported: 1. (self: openvino._pyopenvino.FrontEnd, path: object) -> ov::frontend::InputModel Tested successfully with a large-v3 conversion. Co-authored-by: Stefan Grundmann <grundmanns@sandiego.gov>	2024-02-22 15:11:35 +02:00
Georgi Gerganov	3d42463845	models : add update py requirements	2024-02-13 11:51:32 +02:00
Michael Rienstra	4bbb60efce	docs : make model options / model install methods clearer (#1806 ) * Make models more "discoverable" * Clean up code block language identifiers * make 3 options clearer * undo Prettier formatter change * docs: `$` shell prompt, consistently * docs: minor changes	2024-01-26 17:39:54 +02:00
Sơn Phan Trung	d05b7ee90e	models : make all scripts to be POSIX Compliant (#1725 ) * download-coreml-model: make it POSIX-compliant * download-ggml-model: posix compliant (2nd) * minor edit * forgot to add newline * generate-coreml-interface: far more straightforward * generate-coreml-model: done with the posix thingy * typo * Update download-ggml-model.sh * fix * fix typo * another fix * Update download-coreml-model.sh * Update download-ggml-model.sh * Update download-coreml-model.sh	2024-01-12 14:11:04 +02:00
Yajing Tang	ba5bcde874	coreml : fix ANE optimized encoder (#1716 )	2024-01-04 16:28:30 +02:00
Dimo	a5cc3dc8a2	download : fix large q5 model name (#1695 ) fixed typo in large-v3-q5-0 model name to match HF link	2023-12-29 11:14:32 +02:00
Chaoqun	d2ee117a0a	docker : Dockerize whisper.cpp (#1674 ) * build: add dockerfile for ci * ci: add action to build/push docker image * fix: lowercase repository to fix ci * ci: update cuBLAS flag * build: install curl and ffmpeg in image * docs: add docker section * fix: improve args check when download model	2023-12-22 11:16:02 +00:00
Georgi Gerganov	c7606b47df	models : add info about distilled models	2023-11-15 21:10:13 +02:00
Georgi Gerganov	bfbaa4dce5	whisper : make large version explicit + fix data size units (#1493 )	2023-11-15 19:42:25 +02:00
bobqianic	953419c69a	openvino : update convert-whisper-to-openvino.py to support v3 (#1459 )	2023-11-09 12:42:39 +02:00
Xiao-Yong Jin	0de8582f65	coreml : use the correct `n_mel` value (#1458 )	2023-11-08 20:01:41 +00:00
Georgi Gerganov	2cdfc4e025	whisper : add support for large v3 (#1444 ) * whisper : add support for large v3 * bench : fix build + fix go bindings * bench : fix n_mels * models : update readme	2023-11-07 15:30:18 +02:00
bobqianic	8a2bee6717	models : use absolute paths for the converted model (#1356 )	2023-11-03 10:44:27 +02:00
WhiteOlivierus	45c87b5481	models : Faster download for models on windows using BitTransfer (#1404 )	2023-10-30 19:18:12 +00:00
Xiang (Kevin) Li	91c0b23384	models : add conversion scripts from HuggingFace models to CoreML (#1304 )	2023-10-04 12:00:25 +03:00
Neil Chudleigh	aed5d40607	models : add quantum models to download-ggml-model.sh (#1235 ) * Add quantized models to download-ggml-model.sh * Update names in download-ggml-model script to normalized	2023-09-07 12:16:58 +03:00
Ryan Metcalfe	62b81276e0	whisper : add OpenVINO support (#1037 ) * openvino: use OpenVINO encoder inference * openvino: add python script for OpenVINO model generation * whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * whisper: Fix compilation error * whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures * cmake: Add openvino-encoder as separate object target * whisper : minor style fixes * minor : indentation fixes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-04 15:56:11 +03:00
Akash Mahajan	c8d0f5fe98	whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058 ) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-04 09:45:00 +03:00
Simon Moisselin	6c68218e3c	models : add ggml_to_pt script (#1042 ) * adding ggml_to_pt * typo sys too many args * fixing swap errors dimensions --------- Co-authored-by: simonMoisselin <simon.moisselin@gmail.com>	2023-06-25 15:29:54 +03:00
Roddur Dasgupta	f11f33f1c0	models : cd statements are quoted to allow spaces in path (#1041 )	2023-06-25 15:27:28 +03:00
Georgi Gerganov	8ac23c9f77	models : handle paths with spaces in download script (close #1038 )	2023-06-25 15:23:23 +03:00
Akash Mahajan	3ec7bfffe0	py : make convert-pt-to-ggml.py backwards compatible with older vocab.json tokenizer files (#1001 ) * patch checkpoint convert script to keep compatibility with older hf_transformers whisper tokenizer * typo fix	2023-06-25 13:50:14 +03:00
genevera (she/her)	9b926844e3	models : fix README.md (#964 ) Fixes typo on line 76 of models/README.md	2023-05-27 10:40:28 +03:00
Ahmad Bilal	95b02d76b0	coreml : add support of large-v1 model (#926 )	2023-05-15 18:36:06 +03:00
Clifford Heath	9931d66400	readme : add instructions on converting to GGML + "--no-config" to wget (#874 )	2023-05-08 20:58:36 +03:00
AsukaMinato	94aa56f19e	minor : improve C++ and Python style (#768 ) * use some STL functions * use self.field than setattr, use pathlib.Path * recover some format * const some iter * Keep the original * 2 space	2023-04-29 10:06:25 +03:00
Georgi Gerganov	5e47e223bd	whisper : add Core ML support (#566 ) * coreml : use Core ML encoder inference * coreml : simplify whisper_encode + log messages * whisper : resolve rebase conflicts * coreml : add scripts for CoreML model generation * bench-all : recognize COREML flag	2023-04-15 13:21:27 +03:00
Ivan Gorin	62b51c3070	models : change convert-pt-to-ggml to use .tiktoken tokenizer files (#725 )	2023-04-14 19:50:39 +03:00
be-next	18e6fb0287	models : handle spaces and special characters in shell script paths (#677 ) This commit modifies the `get_script_path` function to correctly handle spaces and special characters in directory paths. The fix involves adding double quotes around variables and commands where needed to ensure proper parsing of paths with spaces and special characters.	2023-03-29 23:38:33 +03:00
Kamilake	992aa2cd1b	models : change default encoding to utf8 (#605 )	2023-03-22 21:17:24 +02:00
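
Note on commit c774eec709 (go : improve model download): the URL bug it describes, where a filesystem join produces backslashes on Windows and a trailing slash on the base produces a double slash, is not specific to Go. The following is a minimal Python sketch of the same pitfall and the same style of fix, not the Go code from that commit. The base URL, the file name, and the helper name `url_for_model` are hypothetical and chosen only for illustration; `ntpath` is the standard-library module that applies Windows path rules on any platform.

```python
import ntpath  # Windows path rules, importable on any platform


def url_for_model(src_url: str, model: str) -> str:
    """Join a base URL and a file name with exactly one forward slash.

    Hypothetical helper: it mirrors the idea in the commit message
    (trim a trailing '/' and format with '/'), not the actual Go function.
    """
    return f"{src_url.rstrip('/')}/{model}"


# Hypothetical base URL and model file name, for illustration only.
BASE = "https://example.com/models"
MODEL = "ggml-base.en.bin"

# A filesystem join under Windows rules inserts a backslash,
# which is not a valid separator in a URL path.
broken = ntpath.join(BASE, MODEL)

# Plain string formatting keeps forward slashes, whether or not
# the base already ends with a slash.
fixed = url_for_model(BASE, MODEL)
fixed_trailing = url_for_model(BASE + "/", MODEL)

print(broken)
print(fixed)
print(fixed_trailing)
```

The design point is the one the commit message makes: path-joining helpers follow the conventions of the host filesystem, so URLs are safer to build with explicit string formatting or a URL-aware library.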


78 Commits