whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-08-02 14:43:06 +02:00

Author	SHA1	Message	Date
Georgi Gerganov	1e9c2f87f1	ggml : simplify Arm fp16 CPU logic (ggml/1177) * ggml : simplify Arm fp16 CPU logic ggml-ci * cont : bring back CUDA/MUSA checks ggml-ci	2025-04-24 20:39:16 +03:00
Sigbjørn Skjæret	06ce8f83e6	CUDA: don't convert BF16 weights to FP32 (ggml/1174) * add bf16 support * use convert_from_bf16_cuda instead of convert_unary_cuda for f32 * revert 7ec5085 * move functionality into convert_unary with constexpr	2025-04-24 20:39:16 +03:00
Daniel Bevenius	8b92060a10	coreml : set convert_to="mlprogram" in convert * coreml : skip model load in convert-whisper-to-coreml.py This commit updates the conversion process for Whisper models to use the "mlprogram" format instead of "neuralnetwork". The motivation for this change is that when using the "neuralnetwork" format the underlying model produced is based on protobuf and my understanding is that there are limitations to this format, such as sizes of strings and the complexity of the model. Currently when trying to convert larger models such as large-v3 the conversion fails but succeeds for smaller models. The "mlprogram" format is a more recent addition to CoreML and is designed to be more flexible and powerful, allowing for more complex models and larger data types. This seems to work for larger and smaller models alike and unless I'm there are considerations that I'm not aware of I think this is what we should be using moving forward. The error that is generated for large models is the following: ```console Running MIL backend_neuralnetwork pipeline: 100%\|█████████\| 9/9 [00:00<00:00, 35.44 passes/s] Translating MIL ==> NeuralNetwork Ops: 100%\|███████████\| 5641/5641 [03:31<00:00, 26.65 ops/s] Traceback (most recent call last): File "/Users/danbev/work/ai/whisper-work/models/convert-whisper-to-coreml.py", line 322, in <module> encoder = convert_encoder(hparams, encoder, quantize=args.quantize) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/models/convert-whisper-to-coreml.py", line 255, in convert_encoder model = ct.convert( ^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/converters/_converters_entry.py", line 635, in convert mlmodel = mil_convert( ^^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 186, in mil_convert return _mil_convert( ^^^^^^^^^^^^^ File 
"/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 245, in _mil_convert return modelClass( ^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/models/model.py", line 489, in __init__ self.__proxy__, self._spec, self._framework_error = self._get_proxy_and_spec( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/models/model.py", line 550, in _get_proxy_and_spec _MLModelProxy( ValueError: basic_string ``` Refs: https://github.com/ggml-org/whisper.cpp/issues/3012	2025-04-23 08:24:38 +02:00
Daniel Bevenius	7858eddd10	ci : disable freeBSD job in build.yml (#3064 ) This commit disables the FreeBSD job in build.yml of the GitHub Actions workflow. The motivation for this is that this job seems to stall and timeout from time to time, taking up to 6 hours to complete/cancel.	2025-04-22 11:07:54 +02:00
Daniel Bevenius	3a88f1e504	examples : add HEAPU8 to exported runtime methods (#3062 ) This commit adds `HEAPU8` to the list of exported methods. The motivation for this commit is that currently this is causing an error on Windows systems where HEAPU8 is undefined, which results in the following error message in the web console: ```console main.js:1 Uncaught TypeError: Cannot read properties of undefined (reading 'buffer') at __emval_get_property (main.js:1:1363125) at 003a453a:0xc4a47 at 003a453a:0xc51cd at Object.full_default (eval at craftInvokerFunction (main.js:1:1347011), <anonymous>:9:10) at whisper.cpp/:647:42 ``` Resolves: https://github.com/ggml-org/whisper.cpp/issues/3059	2025-04-20 19:40:25 +02:00
KITAITI Makoto	f0d2bfbfb7	ruby : make Ruby bindings installed with build options (#3056 ) * Fix signature of URI.new's return value * Use path instead of string \| _ToPath * Add document comment to RBS * Remove unnecessary build flags * Remove unnecessary line * Remove files have become unnecessary * Make gem install accept build options for whisper.cpp * Add instruction for build options in README * Add methods for check to Options * Test build options * Rename: configs -> options * Add assert_installed assertion * Use assert_installed * Remove unused attribute * Extract dependency check logic as Dependencies class * Update README * Add WHISPER_FFMPEG option * Test extra build options only on local test * Bump version to 1.3.2 [skip ci]	2025-04-17 18:49:58 +09:00
Sacha Arbonel	170b2faf75	whisper : add no_context parameter to whisper_params (#3045 )	2025-04-16 06:24:38 +02:00
Fujimoto Seiji	f8a3509b6d	examples : add FFmpeg v7.0 support to ffmpeg-transcode.cpp (#3038 ) FFmpeg introduced a new channel layout API that uses `AVChannelLayout` interface in v6.0. It subsequently dropped the old bitmask-based API in v7.0. This updates decode_audio() to support the new channel layout API, so that we can compile `whisper-cli` and `whisper-server` with FFmpeg v7.0 or later. Tested on Ubuntu 24.10 with FFmpeg v7.0.2. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-15 06:09:00 +02:00
KITAITI Makoto	2a2d21c75d	ruby: use CMake in build process (#3043 ) * Use CMake to build shared object * Make Rakefile follow change of build process * Add test for packaging * Run CI for Ruby bindings almost always because each CMakeLists.txt might affect Ruby bindings * Enable PIC * Bump Ruby version to 3.2 on CI * Check libgomp * Check dependency of whisper.cpp accurately	2025-04-14 18:18:27 +09:00
Jeff Klassen	9cfcd6cc45	docs : update README.md to note newer nvidia gpus (#3031 ) Resolves: https://github.com/ggml-org/whisper.cpp/issues/3030	2025-04-11 08:54:51 +02:00
Lin Xiaodong	e853620270	addon.node : support max_context api for addon.node (#3025 ) * feat: support max content * feat: show api in test file --------- Co-authored-by: linxiaodong <calm.lin@wukongsch.com>	2025-04-11 06:36:38 +02:00
Georgi Gerganov	549db9376f	whisper : reduce delta_min from 1000ms to 100ms (#3028 ) ggml-ci	2025-04-11 06:23:02 +02:00
Fujimoto Seiji	33a25e4dda	docs : document how to use 'WHISPER_FFMPEG' build option (#3029 ) FFmpeg integration was introduced in `1b51fdf` by William Tambellini, but not mentioned in the main documentation. Add a short guide on how to enable the feature. Confirmed to work on both Ubuntu 24.04 and Fedora 39. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-10 18:21:38 +02:00
Ekaitz Zárraga	43f5030aeb	docs : fix README.md (#3024 )	2025-04-09 19:49:37 +02:00
Daniel Bevenius	cf794133de	xcf : use check for visionos build version (#3021 ) This commit adds a check for the visionos build version used with vtool in build-xcframework.sh. The script now checks the Xcode version and determines whether to use "xros" or "visionos" for the build version. This commit also uses xcrun for the vtool so that the version of vtool in xcode command line tools is used instead of the one in the system path. Refs: https://github.com/ggml-org/whisper.cpp/pull/2994#issuecomment-2773292223	2025-04-09 16:34:58 +02:00
Olli	ef6cf357e7	ruby : fix types of arguments for rb_get_kwargs in ruby_whisper_params.c (#3022 ) Change param_names and values not to be references for rb_get_kwargs - so it can be compiled on ruby 3.3.6 and 3.4.1	2025-04-09 20:49:25 +09:00
Olli	b1f5c11b32	ruby : Update uri.rb (#3016 ) Bugfix ... without this Pathname the "/" operator wouldn't work and will throw an error	2025-04-08 22:27:40 +09:00
Greg Sadetsky	ada745f4a5	models : fix dead link to models in readme (#3006 )	2025-04-06 08:29:41 +03:00
KITAITI Makoto	01985c22c0	ruby : change homepage URI in Ruby gemspec (#3007 )	2025-04-05 07:55:09 +03:00
Fujimoto Seiji	448f3d3b93	tests : add script to benchmark whisper.cpp on LibriSpeech corpus (#2999 ) * tests : add script to benchmark whisper.cpp on LibriSpeech corpus LibriSpeech is a widely-used benchmark dataset for training and testing speech recognition models. This adds a set of scripts to measure the recognition accuracy of whisper.cpp models, following the common benchmark standards. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * Document how to prepare `whisper-cli` and model files Feedback from Daniel Bevenius. This adds a short code example how to prepare the `whisper-cli` command, to make the initial setup step a little bit clearer. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * tests : Simplify how to set up Python environment Based on a feedback from Georgi Gerganov. Instead of setting up a virtual environment in Makefile, let users set up the Python environment. This is better since users may have their own preferred workflow/toolkit. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> --------- Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-04 19:51:26 +03:00
Fujimoto Seiji	e6234cd435	whisper : fix "bench-all outputs an invalid result on larger models" (#3002 ) The benchmark script 'scripts/bench-all.sh' assumes that the 11th field of the output line is a timestamp. This assumption does not hold when the target model takes a bit longer to process. Fix this issue by introducing an explicit whitespace to the output lines of `whisper_print_timings()`. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-04 18:36:19 +03:00
Georgi Gerganov	2b6d0d2200	rename : ggerganov -> ggml-org (#3005 )	2025-04-04 16:11:52 +03:00
Daniel Bevenius	0b17d4507e	examples : update server.py to match github pages app [no ci] (#3004 ) This commit updates examples/server.py which is used to serve the wasm examples locally. The changes include: - Added a redirect from the root URL to /whisper.cpp. So now accessing http://localhost:8000/ will redirect to http://localhost:8000/whisper.cpp/ which matches the url for the app deployed to github pages. - Custom handling for coi-serviceworker.js to serve it to avoid an error in the console. This file is not strictly necessary for the local server to work as the headers are provided already but it is nice to not have an error in the console. - Fixed the shutdown of the server to ensure it exits cleanly on Ctrl+C. Previously it would continue to hang onto the port even after the process had exited.	2025-04-04 10:23:53 +02:00
Daniel Bevenius	77e0c86ab6	whisper.wasm : fix unknown language issue (#3000 ) * whisper.wasm : fix unknown language issue This commit addresses an issue with whisper.wasm where the following error was being displayed when running the application in github pages: ``` whisper_lang_id: unknown language 'д=␙c' ``` This turned out to be a memory corruption issue and further details can be found in the reference issue below. Refs: https://github.com/ggerganov/whisper.cpp/issues/2998	2025-04-03 19:50:47 +02:00
Georgi Gerganov	eac1bc9c47	examples : add new sources ggml-ci	2025-04-03 10:30:16 +03:00
Georgi Gerganov	cbde66d913	sync : ggml	2025-04-03 10:30:16 +03:00
cmdr2	513ecf8dc0	cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) * cpu: refactor SIMD mappings and vectorized op functions into separate files * Fix warning for ggml_float to float * Fix warnings * cpu: move all the operations (except mul_mat) to a separate c++ file * fix whitespace * Update ggml/src/ggml-cpu/vec.h Co-authored-by: Diego Devesa <slarengh@gmail.com> * Fix PR comments - use GGML_UNUSED, use cassert in ops.cpp * Reverse the order of import for ops.h and vec.h, to match what was present in ggml-cpu.c previously --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-04-03 10:30:16 +03:00
Daniel Bevenius	cce5daf17b	docs : add xcframework section to README.md [no ci] (#2997 ) This adds a section to the README.md file that describes how to use the XCFramework. The motivation for this is that it is not obvious how to use the XCFramework and an example will help. One thing to note is that the example is using the latest release including the checksum. We are thinking about how we might automate this in the future but for now this is a good start.	2025-04-03 09:06:53 +02:00
Georgi Gerganov	2c502b3c00	readme : update roadmap link	2025-04-02 17:38:35 +03:00
Georgi Gerganov	51c6961c7b	release : v1.7.5 v1.7.5	2025-04-02 16:39:48 +03:00
Georgi Gerganov	503a786c9a	bench : update numbers [no ci] (#2993 )	2025-04-02 16:27:36 +03:00
Georgi Gerganov	ad4e350933	sync : ggml ggml-ci	2025-04-02 15:51:57 +03:00
Chenguang Li	d7a9346ab1	get_rows and dup optimization (llama/12671) * [CANN]get_rows and dup optimization. Co-authored-by: hipudding <huafengchun@gmail.com> Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]GET_ROWS and CPY/DUP optimization Co-authored-by: hipudding <huafengchun@gmail.com> Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> --------- Signed-off-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: hipudding <huafengchun@gmail.com>	2025-04-02 15:51:57 +03:00
Junil Kim	b63d23f728	opencl : fix memory allocation size (llama/12649) issue: https://github.com/CodeLinaro/llama.cpp/pull/17#issuecomment-2760611283 This patch fixes the memory allocation size not exceeding the maximum size of the OpenCL device.	2025-04-02 15:51:57 +03:00
Georgi Gerganov	f6ce10e4a1	metal : use F32 prec in FA kernels (llama/12688) * metal : use F32 prec in FA kernels ggml-ci * cont : fix FA vec kernel ggml-ci	2025-04-02 15:51:57 +03:00
R0CKSTAR	6cb2b86581	Fix clang warning in gguf_check_reserved_keys (llama/12686) * Fix clang warning in gguf_check_reserved_keys Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Fix typo Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-04-02 15:51:57 +03:00
Wagner Bruna	801d6bd809	vulkan: fix build when glslc doesn't support coopmat (llama/12683)	2025-04-02 15:51:57 +03:00
Romain Biessy	ddf7e6a15d	SYCL: Rename oneMKL to oneMath (llama/12192) * Rename oneMKL Interface to oneMath * Use oneMath for Intel vendor * Rename occurrences to mkl * clang-format * Silence verbose warnings * Set oneMath HIP_TARGETS * Fix silence warnings * Remove step to build oneMath from build instructions * Use fixed oneMath version * Remove INTEL_CPU * Fold CMake oneDNN conditions * Use Intel oneMKL for Intel devices * Improve CMake message * Link against MKL::MKL_SYCL::BLAS only * Move oneMath documentation to Nvidia and AMD sections	2025-04-02 15:51:57 +03:00
Akarshan Biswas	0d42097fd3	SYCL: switch to SYCL namespace (llama/12674)	2025-04-02 15:51:57 +03:00
a3sh	842b9c984c	ggml : faster ssm scan (llama/10558) * faster ssm_scan * delete unused comment * clang format * add space * modify unnecessary calculations * faster ssm conv implementation * modify file name with dash	2025-04-02 15:51:57 +03:00
0cc4m	0810f02547	Vulkan: Add DP4A MMQ and Q8_1 quantization shader (llama/12135) * Vulkan: Add DP4A MMQ and Q8_1 quantization shader * Add q4_0 x q8_1 matrix matrix multiplication support * Vulkan: Add int8 coopmat MMQ support * Vulkan: Add q4_1, q5_0 and q5_1 quants, improve integer dot code * Add GL_EXT_integer_dot_product check * Remove ggml changes, fix mmq pipeline picker * Remove ggml changes, restore Intel coopmat behaviour * Fix glsl compile attempt when integer vec dot is not supported * Remove redundant code, use non-saturating integer dot, enable all matmul sizes for mmq * Remove redundant comment * Fix integer dot check * Fix compile issue with unsupported int dot glslc * Update Windows build Vulkan SDK version	2025-04-02 15:51:57 +03:00
Georgi Gerganov	8c13c78f9d	cmake : fix whitespace (llama/0)	2025-04-02 15:51:57 +03:00
Daniel Bevenius	f31b404fcb	tests : remove gh label test-whisper-cli-tiny-en (#2988 ) This commit removes test-whisper-cli-tiny-en from the gh label. The motivation for this change is that until recently the tests were disabled. But now that they are enabled some of the tests, specifically the ci jobs that use sanitizers (e.g. thread-sanitizer) take a long time to run as they are instrumented. Some of these jobs also have matrices which means that multiple jobs are created that all run these tests. The suggestion here is to limit the number of tests that are run in the ci jobs to cut down the CI build time.	2025-04-02 10:50:31 +02:00
Daniel Bevenius	854c0518bc	examples : clarify Core ML encoder model usage [no ci] (#2987 ) This commit clarifies the usage of the Core ML encoder model in the whisper.obj and whisper.swiftui examples. Refs: https://github.com/ggerganov/whisper.cpp/issues/2783	2025-04-02 08:32:14 +02:00
Daniel Bevenius	c8e3968edd	ci : remove intermediate build on push to master (#2986 ) This commit removes the builds that happen on each push to master. Refs: https://github.com/ggerganov/whisper.cpp/discussions/2983#discussioncomment-12691424	2025-04-02 08:29:28 +02:00
Daniel Bevenius	b358de2458	whisper.objc : fix typo in README.md [no ci] (#2985 ) This commit fixes a typo in the README.md file of the whisper.objc example. Resolves: https://github.com/ggerganov/whisper.cpp/issues/2984	2025-04-02 08:26:57 +02:00
Daniel Bevenius	11688b262f	coreml: fix Whisper to CoreML conversion by disabling SDPA [no ci] (#2979 ) * coreml: fix Whisper to CoreML conversion by disabling SDPA This commit disables the use of PyTorch's `scaled_dot_product_attention` in the Whisper model to avoid compatibility issues during CoreML conversion. The issue occurs because coremltools requires PyTorch 2.5.0, but the Whisper implementation may expect behavior from newer PyTorch versions. By setting `MultiHeadAttention.use_sdpa = False`, we force Whisper to use its fallback manual attention implementation, which works correctly with PyTorch 2.5.0 during the tracing process. Refs: https://github.com/ggerganov/whisper.cpp/issues/2783 * coreml: fix audio shape in whisper decoder conversion This commit fixes the audio shape in the whisper decoder conversion script. The motivation for this is that the audio shape was incorrect and was causing the conversion to fail. * coreml : set -e in generate-coreml-interface.sh The commit sets the -e flag in the generate-coreml-interface.sh script to make sure the script fails if any command fails. * coreml : update generated encoder/decoder interfaces This commit updates the generated encoder/decoder interfaces for the whisper model which is the result of running the generate-coreml-interface.sh script.	2025-04-01 18:01:23 +02:00
Daniel Bevenius	04b9508fb3	ci : add coreml job that converts base.en to coreml [no ci] (#2981 ) * ci : add coreml job that converts base.en to coreml [no ci] This commit adds a new job to the CI pipeline that downloads the base.en model and converts it to CoreML format. The CoreML model is then packed into a zip file and uploaded as an artifact. This will only be done for pushes to master, releases, or pre-releases. Refs: https://github.com/ggerganov/whisper.cpp/issues/2783 * coreml : remove publishing of coreml model * ci : add GGML_OPENMP=OFF to ubuntu-22-gcc-sanitized	2025-04-01 17:04:32 +02:00
Daniel Bevenius	4200430e75	tests : re-enable tests [no ci] (#2977 ) This commit re-enables the tests in the build process which are currently commented out. It is possible to build the tests using `-DWHISPER_BUILD_TESTS=ON` and then run a single test using: ```console $ ctest -R test-whisper-cli-tiny.en --test-dir build Internal ctest changing into directory: /home/danbev/work/ai/whisper-work/build Test project /home/danbev/work/ai/whisper-work/build Start 2: test-whisper-cli-tiny.en 1/1 Test #2: test-whisper-cli-tiny.en ......... Passed 4.44 sec 100% tests passed, 0 tests failed out of 1 Label Time Summary: en = 4.44 sec*proc (1 test) gh = 4.44 sec*proc (1 test) tiny = 4.44 sec*proc (1 test) Total Test time (real) = 4.44 sec ``` Some of the tests take a long time to run so it might not be a good idea to enable them in CI, or perhaps we could only run a subset of the tests in CI.	2025-03-31 17:04:37 +02:00
Daniel Bevenius	e153b8eaa2	android.java : re-add ggml source updates (#2975 ) This commit updates the ggml source to include the new unary and binary operations. I merged https://github.com/ggerganov/whisper.cpp/pull/2958 which seems to have overwritten the changes to the ggml source which were added in https://github.com/ggerganov/whisper.cpp/pull/2972. Sorry about this. b2365	2025-03-31 16:14:33 +02:00
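
The field-shift problem described in commit e6234cd435 (#3002) can be illustrated with a small sketch. The format strings below are assumptions modeled on the commit description, not the exact strings in `whisper_print_timings()`, and `field` and `per_run` are hypothetical helpers. The sketch shows how a per-run time that fills its whole width fuses with the preceding `(`, so the 11th whitespace-separated field stops being the number, and how an explicit space after `(` keeps the field position stable.

```python
# Hypothetical format strings; the real ones in whisper_print_timings()
# are not shown in the log, so these are assumptions for illustration.
OLD_FMT = "whisper_print_timings: %8s time = %8.2f ms / %5d runs (%8.2f ms per run)"
NEW_FMT = "whisper_print_timings: %8s time = %8.2f ms / %5d runs ( %8.2f ms per run)"


def field(line: str, n: int) -> str:
    """Return the n-th whitespace-separated field (1-indexed), like awk's $n."""
    return line.split()[n - 1]


def per_run(fmt: str, total_ms: float, runs: int) -> str:
    """Format one timing line and extract its 11th field."""
    line = fmt % ("encode", total_ms, runs, total_ms / runs)
    return field(line, 11)


if __name__ == "__main__":
    for label, fmt in (("old", OLD_FMT), ("new", NEW_FMT)):
        for total_ms in (123.45, 12345.67):
            print(label, total_ms, "->", per_run(fmt, total_ms, 1))
```

With the old format, padding separates `(` from a short value, but a value that is already eight characters wide leaves no padding, so a field-based parser picks up the wrong token. The explicit space removes that dependence on the value's width.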


2414 Commits