whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-07-31 07:42:40 +02:00

Author	SHA1	Message	Date
zhouwg	2e04b81f3e	opencl : fix possible buffer overflow in dump_tensor (llama/14490)	2025-07-12 19:23:56 +03:00
Eric Zhang	cd87a2f7e0	opencl : skip empty nodes on cgraph compute (llama/14491)	2025-07-12 19:23:56 +03:00
lhez	e43c38f9f1	opencl : update upscale to support align corners (llama/14488)	2025-07-12 19:23:56 +03:00
Björn Ganster	ab850d4680	ggml : Callback before abort (llama/14481) * Add a callback that will be called just before abort. This allows apps without a console to display a message to the user and save data if needed. * Return previous callback to allow callback chaining * style fixes --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-07-12 19:23:56 +03:00
Georgi Gerganov	cdf5e72163	ci : disable fast-math for Metal GHA CI (llama/14478) * ci : disable fast-math for Metal GHA CI ggml-ci * cont : remove -g flag ggml-ci	2025-07-12 19:23:56 +03:00
Chenguang Li	32d7c10766	CANN: update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (llama/14411) * [CANN]update to aclnnGroupedMatmulV2 Signed-off-by: noemotiovon <757486878@qq.com> * Support MUL_MAT_ID on 310p Signed-off-by: noemotiovon <757486878@qq.com> * fix editorconfig Signed-off-by: noemotiovon <757486878@qq.com> --------- Signed-off-by: noemotiovon <757486878@qq.com>	2025-07-12 19:23:56 +03:00
Jeff Bolz	3c7939cfe5	vulkan: Split large mul_mat_id to fit in shared memory (llama/14451)	2025-07-12 19:23:56 +03:00
Sigbjørn Skjæret	6fc80e8456	add GELU_ERF (llama/14455)	2025-07-12 19:23:56 +03:00
Acly	19b9aaf044	vulkan : implement bilinear interpolation for ggml_upscale/ggml_interpolate (ggml/1291) * supports GGML_SCALE_MODE_BILINEAR and GGML_SCALE_FLAG_ALIGN_CORNERS	2025-07-12 19:23:56 +03:00
Acly	f98cb6607b	vulkan : implement ggml_roll (ggml/1290) * vulkan : implement ggml_roll * vulkan : refactor vk_op_unary_push_constants initialization	2025-07-12 19:23:56 +03:00
Daniel Bevenius	5ea5c58768	ggml : add version function to get lib version (ggml/1286) * ggml : add version function to get lib version This commit adds a function `ggml_version()` to the ggml library that returns the version of the library as a string. The motivation for this is that it can be useful to be able to programmatically check the version of the ggml library being used. Usage: ```c printf("GGML version: %s\n", ggml_version()); ``` Output: ```console GGML version: 0.0.2219 ``` * ggml : add ggml_commit() --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-07-12 19:23:56 +03:00
accessiblepixel	869335f2d5	server : add dtw.params for v3-large-turbo (#3307) * Add DTW model large-v3-turbo parameters to server.cpp example DTW support is available in whispercpp and the large-v3-turbo model has already been added to the sources, but the large-v3-turbo model hasn't been added to the server.cpp file to make use of it. This commit hopefully corrects that issue. * match original linebreak of original server.cpp file after adding large.v3.turbo dtw	2025-07-07 12:51:15 +03:00
Lin Xiaodong	d9999d54c8	feat: support vad for addon.node (#3301) Co-authored-by: linxiaodong <calm.lin@wukongsch.com>	2025-07-02 13:14:29 +03:00
Georgi Gerganov	bca021c974	sync : ggml ggml-ci	2025-07-01 17:54:53 +03:00
Georgi Gerganov	1f816de7da	talk-llama : sync llama.cpp	2025-07-01 17:54:53 +03:00
Georgi Gerganov	c4ea72be9a	ggml : remove trailing whitespace (llama/0)	2025-07-01 17:54:53 +03:00
lhez	1e930ab1b8	opencl : add GEGLU, REGLU, SWIGLU (llama/14456)	2025-07-01 17:54:53 +03:00
Aman Gupta	b5b237d49a	Add Conv2d for CPU (llama/14388) * Conv2D: Add CPU version * Half decent * Tiled approach for F32 * remove file * Fix tests * Support F16 operations * add assert about size * Review: further formatting fixes, add assert and use CPU version of fp32->fp16	2025-07-01 17:54:53 +03:00
Georgi Gerganov	679f31a9d1	metal : disable fast-math for some cpy kernels (llama/14460) * metal : disable fast-math for some cpy kernels ggml-ci * cont : disable for q4_1 ggml-ci * cont : disable for iq4_nl ggml-ci	2025-07-01 17:54:53 +03:00
Romain Biessy	e29e36aee7	ggml-cpu: sycl: Re-enable exp f16 (llama/14462)	2025-07-01 17:54:53 +03:00
xiaobing318	6bb1234a56	cmake : Remove redundant include path in CMakeLists.txt (llama/14452) * Update docker.yml Modify docker.yml so that the workflow no longer runs periodically; it can still be started manually if needed * Remove redundant include path in CMakeLists.txt The parent directory '..' was removed from the include directories for the ggml-cpu-feats target, to avoid unnecessary include paths. * Enable scheduled Docker image builds Uncomments the workflow schedule to trigger daily Docker image rebuilds at 04:12 UTC, improving automation and keeping images up to date.	2025-07-01 17:54:53 +03:00
Vedran Miletić	3239359bd1	scripts : make the shell scripts cross-platform (llama/14341)	2025-07-01 17:54:53 +03:00
Akarshan Biswas	e81be92931	SYCL: disable faulty fp16 exp kernel (llama/14395) * SYCL: disable faulty fp16 CPU exponent for now * Revert "SYCL: disable faulty fp16 CPU exponent for now" This reverts commit ed0aab1ec31b4eb4b0f275dd7acd41d96a375202. * SYCL: disable faulty fp16 CPU exponent for now * Fix logic of disabling exponent kernel	2025-07-01 17:54:53 +03:00
Sigbjørn Skjæret	130044f228	ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (llama/14443)	2025-07-01 17:54:53 +03:00
Sigbjørn Skjæret	8bc638ee56	ggml : implement REGLU/GEGLU/SWIGLU ops (llama/14158) * implement unary REGLU/GEGLU/SWIGLU cpu ops * relax constraints * duplicate shape of source * fix ggml_vec_geglu_f16 * special case gated ops * implement unary REGLU/GEGLU/SWIGLU cuda ops * tighten constraints again * refactor into GGML_GLU_OP * metal : add glu kernels ggml-ci * add CUDA_GLU_BLOCK_SIZE [no ci] * more constraints and use 64bit ints ggml-ci * 64bit multiplication [no ci] * implement swapped variants (cpu/cuda) * update comment [no ci] ggml-ci * Vulkan: Add GLU ops and shaders * SYCL: Implement fused kernel GEGLU, SWIGLU and REGLU for single up+gate * ggml : implement GLU for split up/gate (llama/14181) * implement GLU for split up/gate * add tests for ggml_glu_split * Vulkan: Implement glu_split logic and shader support * add split to logging [no ci] * SYCL: refactor element_size ops and add split up and gate support to gated kernels * SYCL: switch GEGLU to use tanh approximation --------- Co-authored-by: 0cc4m <picard12@live.de> Co-authored-by: Akarshan <akarshan@menlo.ai> * GGML: increase OP count in assertion * Refactor: Optimize SYCL element-wise operations with unary function inlining This commit refactors the SYCL element-wise operations to improve performance by: - Inlining unary operations (sgn, abs, elu, gelu, silu, etc.) to reduce kernel launch overhead. - Introducing helper functions `op_xxx` for each unary operation to encapsulate the logic. - Replacing direct kernel calls with calls to these inlined functions. - Using `__dpct_inline__` to encourage compiler inlining. - Minor code cleanup and consistency improvements. The changes aim to reduce kernel launch overhead and improve the overall efficiency of element-wise operations on SYCL devices. 
* vulkan: Increase workgroup size for GLU, for performance (llama/14345) * vulkan: Increase workgroup size for GLU, for performance * vulkan: change GLU shaders to do one element per invocation rather than one row per workgroup * merge fix * metal : add support for split and swap ggml-ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: 0cc4m <picard12@live.de> Co-authored-by: Akarshan <akarshan@menlo.ai> Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-07-01 17:54:53 +03:00
Jeff Bolz	00b36237ba	vulkan: Add fusion support for RMS_NORM+MUL (llama/14366) * vulkan: Add fusion support for RMS_NORM+MUL - Add a use_count to ggml_tensor, so we can detect if an output is used more than once. - Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor. - Add detection logic and basic fusion logic in ggml-vulkan. - Add some testing support for fusion. Rather than computing one node at a time, allow for computing the whole graph and just testing one node's results. Add rms_norm_mul tests and enable a llama test. * extract some common fusion logic * fix -Winconsistent-missing-override * move ggml_can_fuse to a common function * build fix * C and C++ versions of can_fuse * move use count to the graph to avoid data races and double increments when used in multiple threads * use hash table lookup to find node index * change use_counts to be indexed by hash table slot * minimize hash lookups style fixes * last node doesn't need single use. fix type. handle mul operands being swapped. * remove redundant parameter --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-07-01 17:54:53 +03:00
Aman Gupta	b900ee424c	CUDA: add bf16 and f32 support to cublas_mul_mat_batched (llama/14361) * CUDA: add bf16 and f32 support to cublas_mul_mat_batched * Review: add type traits and make function more generic * Review: make check more explicit, add back comments, and fix formatting * Review: fix formatting, remove useless type conversion, fix naming for bools	2025-07-01 17:54:53 +03:00
Jeff Bolz	f641a4c410	vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipeline (llama/14378)	2025-07-01 17:54:53 +03:00
Jeff Bolz	9e48afba2f	vulkan: lock accesses of pinned_memory vector (llama/14333)	2025-07-01 17:54:53 +03:00
Xinpeng Dou	f31ed384f4	fix async_mode bug (llama/14432)	2025-07-01 17:54:53 +03:00
Jeff Bolz	0b09f5bbad	vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (llama/14427) This setting needs to be passed through to vulkan-shaders-gen	2025-07-01 17:54:53 +03:00
Radoslav Gerganov	48fb51f314	ggml : add ggml_set_rows (llama/14274) * ggml : add ggml_set_rows Add ggml_set_rows(a, b, c) which copies rows from 'b' into 'a' using indices from 'c'. ref: #8366 * use I64 for indices * ggml : add repeat impl for i64 * ggml : add ggml_is_contiguous_rows * ggml : ggml_set_rows support broadcast * ggml : ggml_set_rows support quantized dst ggml-ci * ggml : support GGML_TYPE_F32 ".from_float" trait * ggml : ggml_set_rows update comment + better index name * tests : add ggml_set_rows * metal : add ggml_set_rows implementation ggml-ci * ggml : simplify forward_dup_f32 * ggml : fix supports_op * tests : add comment to set_rows * ggml : leave the repeat_i64 for a separate PR ggml-ci * ggml : set_rows use std::min instead of MIN * ggml : better error message for set_rows unsupported type * metal : perform op->type check only once * tests : more consistent implementation + more tests ggml-ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-07-01 17:54:53 +03:00
bandoti	566462a5c0	cmake: regen vulkan shaders when shaders-gen sources change (llama/14398) * Add shaders-gen sources as target deps	2025-07-01 17:54:53 +03:00
Georgi Gerganov	c300f1e32d	metal : add special-case mat-vec mul for ne00 == 4 (llama/14385) ggml-ci	2025-07-01 17:54:53 +03:00
Georgi Gerganov	c848b9fbef	metal : batch rows copy in a single threadgroup (llama/14384) * metal : batch rows copy in a single threadgroup ggml-ci * metal : handle some edge cases when threadgroup size is not a power of 2 ggml-ci	2025-07-01 17:54:53 +03:00
R0CKSTAR	a5e6a3c953	musa: enable fp16 mma (all) and cublas on qy2 (llama/13842) * musa: enable fp16 mma (all) and cublas on qy2 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Update ggml/src/ggml-cuda/ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: disable MUL_MAT_ID (q2_k × f32) due to precision issues Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-07-01 17:54:53 +03:00
Aaron Teo	16aa7d151d	ggml-cpu: enable IBM NNPA Vector Intrinsics (llama/14317) * ggml-cpu: add nnpa compile flag Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 4a9f60c201573128f73a65999b3e5cc497fae5c1) * ggml-cpu: add fp16->fp32 nnpa first Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 8d4a7987f9c1887f716be96250f2caeee0253929) * ggml-cpu: add fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 0ff0d6516247a41d2ade42b42cf0d676a4dd1627) * ggml-cpu: better variable names Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 2f58bbcbb89c183340e252362b2a40651f573f1f) * docs: update s390x docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 01b929491b50071a5d0572235dcf5a449da70aa7) * ggml-cpu: add debugging prints to see if dlf16 is correct Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix print vs printf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix float placeholder Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: ensure fp16 and fp32 load and stores are called Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fp16 load ensured to hit Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove sigint from fp16 store for some reason, the function is not getting a hit when debugged with gdb. 
we will need to investigate further Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: nnpa switch to vec_xst test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to vec_xst for 4 element loops also Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: rework noop Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove noop, general code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: clarify variable naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add breakpoint for debugging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: test fix for conversion failure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: disable fp32->fp16 nnpa conversions for now there are some conversion failures in nnpa that requires the eyes of an ibm stsm. will create a separate pr to introduce the fp32->fp16 change. 
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to elif macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: reattempt fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix typo Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: reattempt fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix compiler types Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: change to typedef vector types Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add 4 element loops for fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: clarified vector naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back fp32->fp16 store nnpa Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add nnpa macro check in ggml-impl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add missing __func__ Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: diagnose why __NNPA__ macro is not being defined Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: import vecintrin.h to fix compiler errors Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: update macro tests Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move s390x typedef to own header file Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: move s390x typedef to own header file" This reverts commit 157f856c34589566151630e294563a420702db39. 
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to importing ggml-cpu-impl instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix macro declaration Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: test more macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add debug prints Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bruteforce macro definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move macro definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add ggml-impl.h to cmakelists Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to private macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move s390x typedef to own header file Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 157f856c34589566151630e294563a420702db39) * ggml-cpu: move things around Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back compile macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to quotes for import Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add compiler error macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add s390x detection in ggml-src Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back compile definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: undo cmakelists work Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: move s390x typedef to own header file" This reverts commit 18d79e1a30b39d9aaa0bd58400c5cf2c32135c9a. 
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove typedefs.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove typedef from cmakelists Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add ggml-impl.h future notes Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add todo comment for future reference Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: clarify naming of dlf16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove unnecessary target compile definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: update broken huggingface link for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix duplicate func names during compile Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: fix duplicate func names during compile" This reverts commit fbb733451f27677063b914d4f6c9a9841d45b38d. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu" This reverts commit bd288e8fa52b5244f65cee21cb61062f1a9e0ca5. 
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: refactor fp16<->fp32 simd to ggml-cpu Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix missing simd-mappings.h import in quants.c Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix missing simd-mappings.h within repack Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix amx mmq missing simd-mappings.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: attempt at fixing loongarch failing build Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move nnpa together with other fp16<->fp32 simd Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix wrong refactor of ggml-base ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: remove dependency on ggml-cpu from ggml-base Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove mistaken fallback macro fallback logic was already implemented but i was too sleepy to realise Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: move ggml_table_f32_f16 to ggml-cpu ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures" This reverts commit 32a3533564bdb7902cefb9c89b1c9e956a81ce29. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml: move ggml_table_f32_f16 to ggml-cpu" This reverts commit 9e40d984ad27d7b60392fb2b7548885201864fe4. 
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: move ggml_table_f32_f16 to ggml-cpu ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 9e40d984ad27d7b60392fb2b7548885201864fe4) * ggml: move ggml_table_f32_f16 to ggml-cpu.c Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: extern c ggml_table_f32_f16 + chore docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h we rely on the variable declaration in ggml-cpu.c instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h" This reverts commit f71b21d2f74f5e03ec0c2b4fefd3cbf395aecf16. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back ggml_table_f32_f16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: bring back ggml_table_f32_f16" This reverts commit 2dce119178bed5ef5c8398c4230ddd14fef80e49. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * fix ggml time initialization * fix f32_f16 table init * remove extra line --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: slaren <slarengh@gmail.com>	2025-07-01 17:54:53 +03:00
Sigbjørn Skjæret	99764f5767	ggml : do not output unprintable characters on GGUF load failure (llama/14381)	2025-07-01 17:54:53 +03:00
Anton Mitkov	fc28594112	sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (llama/13973)	2025-07-01 17:54:53 +03:00
lhez	acfbf2921b	opencl: ref count `ggml_backend_opencl_context` and refactor profiling (llama/14254) * Move profiling info into `ggml_backend_opencl_context` * Add `enqueue_ndrange_kernel` to launch kernel	2025-07-01 17:54:53 +03:00
uvos	6a1d12a8ea	CUDA/HIP: optimize mmv paths taken for HIP devices (llama/14324) Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-07-01 17:54:53 +03:00
Johannes Gäßler	06b01ba87b	CUDA: mul_mat_v support for batch sizes > 1 (llama/14262) * CUDA: mul_mat_v support for batch sizes > 1 * use 64 bit math for initial offset calculation	2025-07-01 17:54:53 +03:00
uvos	791201a974	HIP: enable vec fattn on RDNA4 (llama/14323)	2025-07-01 17:54:53 +03:00
Aman Gupta	abb650c0ec	CUDA: add mean operation (llama/14313) * CUDA: add mean operation * add back sum_rows_f32_cuda * Review: early exit if col!=0	2025-07-01 17:54:53 +03:00
Markus Tavenrath	e036676795	Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (llama/13792) * Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled. * remove #ifdef for debug utils and add queue marker.	2025-07-01 17:54:53 +03:00
Georgi Gerganov	c1418b9906	metal : fix thread-safety (llama/14300) ggml-ci	2025-07-01 17:54:53 +03:00
Acly	9d7cb80f04	ggml-cpu : "align corners" for bilinear upscale/downscale (ggml/1285) * add "align corners" mode for bilinear upscale, and allow downscaling * add ggml_interpolate, deprecate ggml_upscale_ext, pass in align-corners as bit-flag * test-backend-ops: replace ggml_upscale_ext with ggml_interpolate, add test cases for downscale and align-corners	2025-07-01 17:54:53 +03:00
Daniel Bevenius	515df20351	ggml-quants : rename best_mad to best_error (ggml/1283) This commit renames the variable `best_mad` to `best_error` in the `make_qkx2_quants` function. The motivation for this is that the name `best_mad` can be somewhat confusing if mean absolute deviation (MAD) is not in use.	2025-07-01 17:54:53 +03:00
Daniel Bevenius	c88ffbf9ba	ci : use selective copy for musa image (#3296) This commit modified the musa docker file to selectively copy directories needed for the container image. This commit also added a step to the docker workflow to free up disk space in attempt to make enough room for the large musa build containers. The motivation for this change is to reduce the size of the container image and try to avoid disk usage issues in CI.	2025-06-27 15:43:56 +02:00
Daniel Bevenius	7069394447	ci: set fail-fast to false in docker.yml (#3294) * ci: set fail-fast to false in docker.yml This commit modifies the GitHub Actions workflow for Docker builds to disable the fail-fast behavior. The motivation for this is that currently if one of the strategy jobs fails any other job that is in progress will be cancelled. There is no need for this as the jobs are independent. * ci : update docker.yml to use a single build This commit updates the docker job to only build the image once instead of twice (only happens when pushing to the master branch). Instead this will tag the image with the commit SHA when pushing to master. The motivation for this change is to reduce the time it takes to run this job and also it might help with the disk space issues we are experiencing for this job when it runs on pushes to master.	2025-06-27 09:55:56 +02:00
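
Commit 48fb51f314 in the log above describes `ggml_set_rows(a, b, c)` as copying rows from 'b' into 'a' using indices from 'c', with I64 indices. The sketch below only illustrates that row-scatter semantics on plain row-major float arrays. The helper name `set_rows_f32` and its parameters are assumptions made for illustration; it is not the ggml API and ignores tensors, broadcasting, and quantized destinations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of ggml: copies row i of src into row idx[i]
 * of dst. Both matrices are row-major with n_cols floats per row. */
static void set_rows_f32(float *dst, int64_t n_rows_dst,
                         const float *src, int64_t n_rows_src,
                         const int64_t *idx, int64_t n_cols) {
    for (int64_t i = 0; i < n_rows_src; ++i) {
        const int64_t r = idx[i];
        /* Every index must address a valid destination row. */
        assert(r >= 0 && r < n_rows_dst);
        memcpy(dst + r * n_cols, src + i * n_cols,
               (size_t) n_cols * sizeof(float));
    }
}
```

With a 3x2 destination, a 2x2 source, and indices {2, 0}, the first source row lands in destination row 2 and the second in destination row 0; destination row 1 is left untouched.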

2865 Commits
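
Commit ab850d4680 in the log above adds a callback that is invoked just before abort and notes that the setter returns the previous callback to allow chaining. The following is a minimal sketch of that chaining pattern only. All names here (`abort_callback_t`, `set_abort_callback`, `report_fatal`, `app_on_abort`) are assumptions for illustration and are not taken from ggml.h; `report_fatal` stands in for the library's abort path and deliberately does not call `abort()`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names for illustration; not copied from ggml.h. */
typedef void (*abort_callback_t)(const char *message);

static abort_callback_t g_abort_callback = NULL;

/* Installs a new callback and returns the previous one so callers can chain. */
static abort_callback_t set_abort_callback(abort_callback_t cb) {
    abort_callback_t prev = g_abort_callback;
    g_abort_callback = cb;
    return prev;
}

/* Stand-in for the library's abort path: notify the callback if one is set.
 * A real library would call abort() afterwards. */
static void report_fatal(const char *message) {
    if (g_abort_callback != NULL) {
        g_abort_callback(message);
    }
}

static abort_callback_t g_prev_callback = NULL; /* saved for chaining */
static int g_calls = 0;                         /* counts invocations */

/* Application callback: show the message, then forward to the earlier one. */
static void app_on_abort(const char *message) {
    ++g_calls;
    fprintf(stderr, "fatal: %s\n", message);
    if (g_prev_callback != NULL) {
        g_prev_callback(message);
    }
}
```

An application would store the return value of `set_abort_callback(app_on_abort)` in `g_prev_callback`, so that any callback installed earlier still runs after its own handler.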