whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-08-15 13:52:33 +02:00

Author	SHA1	Message	Date
Jeff Bolz	3c26dd3353	vulkan: fix NaN issue in flash attention shader (llama/12776) Use -FLT_MAX/2 rather than -inf as the initial value for computing the maximum.	2025-04-24 20:39:16 +03:00
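A minimal sketch of one way such a NaN can arise in a streaming (flash-attention-style) softmax, and why a large finite initial maximum avoids it. The function name `online_softmax_sum` and the constant `FLT_MAX` are illustrative assumptions written in Python; this is not the Vulkan shader code from the commit.

```python
import math

FLT_MAX = 3.4028234663852886e38  # largest finite float32 value (assumed stand-in for the shader constant)

def online_softmax_sum(scores, init_max):
    """Streaming softmax denominator: keep a running max and a rescaled running sum."""
    running_max = init_max
    running_sum = 0.0
    for s in scores:
        new_max = max(running_max, s)
        # Rescale the accumulated sum to the new max, then add the new term.
        # If both running_max and new_max are -inf, the difference is NaN.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(s - new_max)
        running_max = new_max
    return running_max, running_sum

masked = [float("-inf"), float("-inf")]  # a fully masked row

# Starting from -inf: (-inf) - (-inf) is NaN and poisons the sum.
_, bad = online_softmax_sum(masked, float("-inf"))

# Starting from -FLT_MAX/2: every difference stays well defined and the sum is finite.
_, good = online_softmax_sum(masked, -FLT_MAX / 2)
```

Here `bad` is NaN while `good` is a finite `0.0`. Unmasked scores still replace the initial maximum as soon as they appear.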
Jeff Bolz	d792d2a2dc	vulkan: Use unclamped loads for flash attention mask (llama/12720) nem1 must be a multiple of GGML_KQ_MASK_PAD, and GGML_KQ_MASK_PAD is a multiple of the number of rows in the matrix. The KV dim is a multiple of the number of columns for the aligned shader.	2025-04-24 20:39:16 +03:00
0cc4m	8add58aa5e	Vulkan: Tune Vulkan mmq int dot shader for performance (llama/12767)	2025-04-24 20:39:16 +03:00
Nicolò Scipione	8f8ede1b12	sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution (llama/12625)	2025-04-24 20:39:16 +03:00
Ronny Brendel	3a6fe8d767	cmake: fix ggml-shaders-gen compiler paths containing spaces (llama/12747) fixes error for compiler paths with spaces	2025-04-24 20:39:16 +03:00
Jeff Bolz	76231bda56	vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (llama/12630) There seems to be a bubble waking up from waitForFences, which costs a few percent performance and also increased variance in performance. This change inserts an "almost_ready" fence when the graph is about 80% complete and we waitForFences for the almost_ready fence and then spin (with _mm_pauses) waiting for the final fence to be signaled.	2025-04-24 20:39:16 +03:00
Jeff Bolz	785437c253	vulkan: set cmake minimum and project name in vulkan-shaders (llama/12744)	2025-04-24 20:39:16 +03:00
Gaurav Garg	2f0612cb1c	CUDA: Prefer vector flash decoding kernel for Gemma models (llama/12738) * Prefer vector flash decoding kernel for Gemma models Vector flash decoding kernel was not being picked for models with head dimension 256. Gemma models are in this category. Removing this limit improves e2e performance by up to 12% in gen phase throughput for Gemma models. * Update ggml/src/ggml-cuda/fattn.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-04-24 20:39:16 +03:00
Jeff Bolz	e944065d5b	vulkan: Fix missing cmake logic for dot product extension (llama/12721)	2025-04-24 20:39:16 +03:00
a3sh	ccc7b5df0b	fix MUSA compiler warning (llama/12704) * fix MUSA compiler warning * replace (void) with GGML_UNUSED	2025-04-24 20:39:16 +03:00
Chenguang Li	fbed36851e	CANN: Support operator SIN COS ARGMAX (llama/12709) * [CANN]support sin cos argmax Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]codestyle adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]Remove redundant code Signed-off-by: noemotiovon <noemotiovon@gmail.com> --------- Signed-off-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2025-04-24 20:39:16 +03:00
Alan Gray	d1d847f184	Simplify and improve CUDA graphs through use of indirect copy pointers (llama/9017) * CUDA: Simplify and improve CUDA graphs through use of indirect copy pointers Previously there was complexity in the CUDA graphs implementation due to frequently changing parameters to copy kernels associated with K and V cache pointers. This patch simplifies by using indirection to avoid such parameters frequently changing, avoiding the need for frequent graph updates. Fixes #12152 * Addressed comments * fix HIP builds * properly sync to stream * removed ggml_cuda_cpy_fn_ptrs * move stream sync before free * guard to only use indirection with graphs * style fixes * check for errors --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-04-24 20:39:16 +03:00
hipudding	337f91d4a6	CANN: Fix failed test cases (llama/12708) * CANN: Fix memory waste in aclnn_tensor * CANN: fix backend ops fail * CANN: fix acl_tensor memory alloc. * CANN: format * CANN: remove trailing whitespace	2025-04-24 20:39:16 +03:00
lhez	317a0031f9	opencl: use `max_alloc_size` in backend ctx instead of querying again (llama/12705)	2025-04-24 20:39:16 +03:00
Jeff Bolz	b243416918	vulkan: Implement split_k for coopmat2 flash attention. (llama/12627) When using group query attention, we have one workgroup per KV batch and this can be very few workgroups (e.g. just 8 in some models). Enable split_k to spread the work across SMs. This helps a lot when the KV cache is large.	2025-04-24 20:39:16 +03:00
bandoti	6e532c7187	cmake: remove caching from vulkan coopmat checks (llama/12719)	2025-04-24 20:39:16 +03:00
Jeff Bolz	2105b110d3	vulkan: Implement grouped query attention in the coopmat2 FA shader (llama/12559) When adjacent batches of Q share the same batches of K/V, batch them into the same workgroup. For example, when: dst(128,32,1,1) = FA(q(128,1,32,1), k(128,16640,8,1), v(128,16640,8,1)) previously we would run 32 workgroups computing 1 result each, now we will run 8 workgroups computing 4 results each. This doesn't directly translate to better performance (at least when you have >=32 SMs), but in a subsequent change I'll enable split_k which will scale much better with 4x fewer workgroups.	2025-04-24 20:39:16 +03:00
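The workgroup arithmetic in the example above can be sketched as follows. The function name `fa_workgroup_layout` and its parameters are hypothetical and written in Python for illustration only; the shader's dispatch logic is not shown in the commit message.

```python
def fa_workgroup_layout(n_q_heads, n_kv_heads, grouped):
    """Return (workgroups, results_per_workgroup) for one flash-attention dispatch.

    Assumes each Q head yields one result and that adjacent Q heads
    share a K/V head, as in grouped query attention.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("Q heads must be a multiple of K/V heads")
    if not grouped:
        # One workgroup per Q head, each computing a single result.
        return n_q_heads, 1
    # One workgroup per K/V head, computing all Q heads that share it.
    return n_kv_heads, n_q_heads // n_kv_heads

# Shapes from the commit message: q has 32 heads, k and v have 8.
before = fa_workgroup_layout(32, 8, grouped=False)  # (32, 1)
after = fa_workgroup_layout(32, 8, grouped=True)    # (8, 4)
```

With only 8 workgroups, a GPU with 32 or more SMs is left mostly idle, which is the gap the later split_k change is meant to close.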
0cc4m	f82622180f	Vulkan: Fix mmq int dot float cache size (llama/12722)	2025-04-24 20:39:16 +03:00
Diego Devesa	a71c64512a	llama : add option to override model tensor buffers (llama/11397) * llama : add option to override tensor buffers * ggml : fix possible underflow in ggml_nbytes	2025-04-24 20:39:16 +03:00
Georgi Gerganov	1e9c2f87f1	ggml : simplify Arm fp16 CPU logic (ggml/1177) * ggml : simplify Arm fp16 CPU logic ggml-ci * cont : bring back CUDA/MUSA checks ggml-ci	2025-04-24 20:39:16 +03:00
Sigbjørn Skjæret	06ce8f83e6	CUDA: don't convert BF16 weights to FP32 (ggml/1174) * add bf16 support * use convert_from_bf16_cuda instead of convert_unary_cuda for f32 * revert 7ec5085 * move functionality into convert_unary with constexpr	2025-04-24 20:39:16 +03:00
Daniel Bevenius	8b92060a10	coreml : set convert_to="mlprogram" in convert * coreml : skip model load in convert-whisper-to-coreml.py This commit updates the conversion process for Whisper models to use the "mlprogram" format instead of "neuralnetwork". The motivation for this change is that when using the "neuralnetwork" format the underlying model produced is based on protobuf and my understanding is that there are limitations to this format, such as sizes of strings and the complexity of the model. Currently when trying to convert larger models such as large-v3 the conversion fails but succeeds for smaller models. The "mlprogram" format is a more recent addition to CoreML and is designed to be more flexible and powerful, allowing for more complex models and larger data types. This seems to work for larger and smaller models alike and unless there are considerations that I'm not aware of I think this is what we should be using moving forward. The error that is generated for large models is the following: ```console Running MIL backend_neuralnetwork pipeline: 100%\|█████████\| 9/9 [00:00<00:00, 35.44 passes/s] Translating MIL ==> NeuralNetwork Ops: 100%\|███████████\| 5641/5641 [03:31<00:00, 26.65 ops/s] Traceback (most recent call last): File "/Users/danbev/work/ai/whisper-work/models/convert-whisper-to-coreml.py", line 322, in <module> encoder = convert_encoder(hparams, encoder, quantize=args.quantize) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/models/convert-whisper-to-coreml.py", line 255, in convert_encoder model = ct.convert( ^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/converters/_converters_entry.py", line 635, in convert mlmodel = mil_convert( ^^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 186, in mil_convert return _mil_convert( ^^^^^^^^^^^^^ File 
"/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 245, in _mil_convert return modelClass( ^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/models/model.py", line 489, in __init__ self.__proxy__, self._spec, self._framework_error = self._get_proxy_and_spec( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.11/site-packages/coremltools/models/model.py", line 550, in _get_proxy_and_spec _MLModelProxy( ValueError: basic_string ``` Refs: https://github.com/ggml-org/whisper.cpp/issues/3012	2025-04-23 08:24:38 +02:00
Daniel Bevenius	7858eddd10	ci : disable freeBSD job in build.yml (#3064 ) This commit disables the FreeBSD job in build.yml of the GitHub Actions workflow. The motivation for this is that this job seems to stall and timeout from time to time, taking up to 6 hours to complete/cancel.	2025-04-22 11:07:54 +02:00
Daniel Bevenius	3a88f1e504	examples : add HEAPU8 to exported runtime methods (#3062 ) This commit adds `HEAPU8` to the list of exported methods. The motivation for this commit is that currently this is causing an error on Windows systems where HEAPU8 is undefined, which results in the following error message in the web console: ```console main.js:1 Uncaught TypeError: Cannot read properties of undefined (reading 'buffer') at __emval_get_property (main.js:1:1363125) at 003a453a:0xc4a47 at 003a453a:0xc51cd at Object.full_default (eval at craftInvokerFunction (main.js:1:1347011), <anonymous>:9:10) at whisper.cpp/:647:42 ``` Resolves: https://github.com/ggml-org/whisper.cpp/issues/3059	2025-04-20 19:40:25 +02:00
KITAITI Makoto	f0d2bfbfb7	ruby : make Ruby bindings installed with build options (#3056 ) * Fix signature of URI.new's return value * Use path instead of string \| _ToPath * Add document comment to RBS * Remove unnecessary build flags * Remove unnecessary line * Remove files that have become unnecessary * Make gem install accept build options for whisper.cpp * Add instruction for build options in README * Add methods for check to Options * Test build options * Rename: configs -> options * Add assert_installed assertion * Use assert_installed * Remove unused attribute * Extract dependency check logic as Dependencies class * Update README * Add WHISPER_FFMPEG option * Test extra build options only on local test * Bump version to 1.3.2 [skip ci]	2025-04-17 18:49:58 +09:00
Sacha Arbonel	170b2faf75	whisper : add no_context parameter to whisper_params (#3045 )	2025-04-16 06:24:38 +02:00
Fujimoto Seiji	f8a3509b6d	examples : add FFmpeg v7.0 support to ffmpeg-transcode.cpp (#3038 ) FFmpeg introduced a new channel layout API that uses `AVChannelLayout` interface in v6.0. It subsequently dropped the old bitmask-based API in v7.0. This updates decode_audio() to support the new channel layout API, so that we can compile `whisper-cli` and `whisper-server` with FFmpeg v7.0 or later. Tested on Ubuntu 24.10 with FFmpeg v7.0.2. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-15 06:09:00 +02:00
KITAITI Makoto	2a2d21c75d	ruby: use CMake in build process (#3043 ) * Use CMake to build shared object * Make Rakefile follow change of build process * Add test for packaging * Run CI for Ruby bindings almost always because each CMakeLists.txt might affect Ruby bindings * Enable PIC * Bump Ruby version to 3.2 on CI * Check libgomp * Check dependency of whisper.cpp accurately	2025-04-14 18:18:27 +09:00
Jeff Klassen	9cfcd6cc45	docs : update README.md to note newer nvidia gpus (#3031 ) Resolves: https://github.com/ggml-org/whisper.cpp/issues/3030	2025-04-11 08:54:51 +02:00
Lin Xiaodong	e853620270	addon.node : support max_context api for addon.node (#3025 ) * feat: support max content * feat: show api in test file --------- Co-authored-by: linxiaodong <calm.lin@wukongsch.com>	2025-04-11 06:36:38 +02:00
Georgi Gerganov	549db9376f	whisper : reduce delta_min from 1000ms to 100ms (#3028 ) ggml-ci	2025-04-11 06:23:02 +02:00
Fujimoto Seiji	33a25e4dda	docs : document how to use 'WHISPER_FFMPEG' build option (#3029 ) FFmpeg integration was introduced in `1b51fdf` by William Tambellini, but not mentioned in the main documentation. Add a short guide on how to enable the feature. Confirmed to work on both Ubuntu 24.04 and Fedora 39. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-10 18:21:38 +02:00
Ekaitz Zárraga	43f5030aeb	docs : fix README.md (#3024 )	2025-04-09 19:49:37 +02:00
Daniel Bevenius	cf794133de	xcf : use check for visionos build version (#3021 ) This commit adds a check for the visionos build version used with vtool in build-xcframework.sh. The script now checks the Xcode version and determines whether to use "xros" or "visionos" for the build version. This commit also uses xcrun for the vtool so that the version of vtool in xcode command line tools is used instead of the one in the system path. Refs: https://github.com/ggml-org/whisper.cpp/pull/2994#issuecomment-2773292223	2025-04-09 16:34:58 +02:00
Olli	ef6cf357e7	ruby : fix types of arguments for rb_get_kwargs in ruby_whisper_params.c (#3022 ) Change param_names and values not to be references for rb_get_kwargs - so it can be compiled on ruby 3.3.6 and 3.4.1	2025-04-09 20:49:25 +09:00
Olli	b1f5c11b32	ruby : Update uri.rb (#3016 ) Bugfix ... without this Pathname the "/" operator wouldn't work and will throw an error	2025-04-08 22:27:40 +09:00
Greg Sadetsky	ada745f4a5	models : fix dead link to models in readme (#3006 )	2025-04-06 08:29:41 +03:00
KITAITI Makoto	01985c22c0	ruby : change homepage URI in Ruby gemspec (#3007 )	2025-04-05 07:55:09 +03:00
Fujimoto Seiji	448f3d3b93	tests : add script to benchmark whisper.cpp on LibriSpeech corpus (#2999 ) * tests : add script to benchmark whisper.cpp on LibriSpeech corpus LibriSpeech is a widely-used benchmark dataset for training and testing speech recognition models. This adds a set of scripts to measure the recognition accuracy of whisper.cpp models, following the common benchmark standards. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * Document how to prepare `whisper-cli` and model files Feedback from Daniel Bevenius. This adds a short code example of how to prepare the `whisper-cli` command, to make the initial setup step a little bit clearer. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> * tests : Simplify how to set up Python environment Based on feedback from Georgi Gerganov. Instead of setting up a virtual environment in Makefile, let users set up the Python environment. This is better since users may have their own preferred workflow/toolkit. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net> --------- Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-04 19:51:26 +03:00
Fujimoto Seiji	e6234cd435	whisper : fix "bench-all outputs an invalid result on larger models" (#3002 ) The benchmark script 'scripts/bench-all.sh' assumes that the 11th field of the output line is a timestamp. This assumption does not hold when the target model takes a bit longer to process. Fix this issue by introducing an explicit whitespace to the output lines of `whisper_print_timings()`. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-04 18:36:19 +03:00
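A sketch of how a fixed-width number can shift whitespace-separated field positions, which is the kind of failure described above. The format string below is an assumption modeled on the commit description, not copied from `whisper_print_timings()`, and the helpers `timing_line` and `field` are hypothetical.

```python
def timing_line(total_ms, runs, per_run_ms, pad):
    """Build a timing line; `pad` adds an explicit space after the opening parenthesis."""
    # Assumed layout. "%8.2f" pads short numbers with spaces, but a number that
    # already fills all 8 characters gets no padding and fuses with the "(".
    per_run = "(%s%8.2f ms per run)" % (" " if pad else "", per_run_ms)
    return "whisper_print_timings:   encode time = %8.2f ms / %5d runs %s" % (
        total_ms, runs, per_run)

def field(line, n):
    """Return the n-th whitespace-separated field (1-indexed), as awk's $n would."""
    return line.split()[n - 1]

fast = timing_line(1234.56, 10, 123.46, pad=False)
slow = timing_line(123456.70, 10, 12345.67, pad=False)
fixed = timing_line(123456.70, 10, 12345.67, pad=True)

field(fast, 11)   # "123.46": the padding happens to separate "(" from the number
field(slow, 11)   # "ms": "(12345.67" is now a single field, so everything shifts
field(fixed, 11)  # "12345.67": the explicit space keeps the field position stable
```

The explicit separator makes the field count independent of how wide the number is, so a positional parser keeps working for slower models.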
Georgi Gerganov	2b6d0d2200	rename : ggerganov -> ggml-org (#3005 )	2025-04-04 16:11:52 +03:00
Daniel Bevenius	0b17d4507e	examples : update server.py to match github pages app [no ci] (#3004 ) This commit updates examples/server.py which is used to serve the wasm examples locally. The changes include: - Added a redirect from the root URL to /whisper.cpp. So now accessing http://localhost:8000/ will redirect to http://localhost:8000/whisper.cpp/ which matches the url for the app deployed to github pages. - Custom handling for coi-serviceworker.js to serve it to avoid an error in the console. This file is not strictly necessary for the local server to work as the headers are provided already but it is nice to not have an error in the console. - Fixed the shutdown of the server to ensure it exits cleanly on Ctrl+C. Previously it would continue to hang onto the port even after the process had exited.	2025-04-04 10:23:53 +02:00
Daniel Bevenius	77e0c86ab6	whisper.wasm : fix unknown language issue (#3000 ) * whisper.wasm : fix unknown language issue This commit addresses an issue with whisper.wasm where the following error was being displayed when running the application in github pages: ``` whisper_lang_id: unknown language 'д=␙c' ``` This turned out to be a memory corruption issue and further details can be found in the reference issue below. Refs: https://github.com/ggerganov/whisper.cpp/issues/2998	2025-04-03 19:50:47 +02:00
Georgi Gerganov	eac1bc9c47	examples : add new sources ggml-ci	2025-04-03 10:30:16 +03:00
Georgi Gerganov	cbde66d913	sync : ggml	2025-04-03 10:30:16 +03:00
cmdr2	513ecf8dc0	cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) * cpu: refactor SIMD mappings and vectorized op functions into separate files * Fix warning for ggml_float to float * Fix warnings * cpu: move all the operations (except mul_mat) to a separate c++ file * fix whitespace * Update ggml/src/ggml-cpu/vec.h Co-authored-by: Diego Devesa <slarengh@gmail.com> * Fix PR comments - use GGML_UNUSED, use cassert in ops.cpp * Reverse the order of import for ops.h and vec.h, to match what was present in ggml-cpu.c previously --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-04-03 10:30:16 +03:00
Daniel Bevenius	cce5daf17b	docs : add xcframework section to README.md [no ci] (#2997 ) This adds a section to the README.md file that describes how to use the XCFramework. The motivation for this is that it is not obvious how to use the XCFramework and an example will help. One thing to note is that the example is using the latest release including the checksum. We are thinking about how we might automate this in the future but for now this is a good start.	2025-04-03 09:06:53 +02:00
Georgi Gerganov	2c502b3c00	readme : update roadmap link	2025-04-02 17:38:35 +03:00
Georgi Gerganov	51c6961c7b	release : v1.7.5	2025-04-02 16:39:48 +03:00
Georgi Gerganov	503a786c9a	bench : update numbers [no ci] (#2993 )	2025-04-02 16:27:36 +03:00

2433 Commits