whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-08-19 06:06:44 +02:00

Author	SHA1	Message	Date
Akarshan Biswas	fd369871f7	SYCL: remove XMX info from print devices (llama/11712)	2025-02-27 08:55:36 +02:00
Jinyang He	bbd8364f5e	ggml : optimize and build warning fix for LoongArch (llama/11709) * ggml : optimize convert f32<->f16 for loongarch_asx * ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16 * ggml : Fix warnings when run cpu CI locally on LoongArch	2025-02-27 08:55:36 +02:00
Akarshan Biswas	e4102440ef	SYCL: Adjust support condition for norm operators (llama/11674) SYCL does not support non contiguous tensors for norm operations	2025-02-27 08:55:36 +02:00
junchao-zhao	f8242ec483	ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)	2025-02-27 08:55:36 +02:00
Jeff Bolz	ef51b4cba4	vulkan: optimize coopmat2 iq2/iq3 callbacks (llama/11521) * vulkan: optimize coopmat2 iq2/iq3 callbacks * build: trigger CI on GLSL compute shader changes	2025-02-27 08:55:36 +02:00
Rémy O	6f08b24146	vulkan: initial support for IQ4_XS quantization (llama/11501)	2025-02-27 08:55:36 +02:00
Jeff Bolz	7c165d7fa8	vulkan: use smaller combined allocations to avoid fragmentation (llama/11551)	2025-02-27 08:55:36 +02:00
Charles Duffy	2f0cf44915	metal : avoid breaking build when metal API predates TARGET_OS_VISION (llama/11690) Avoids breakage in nix flake build introduced by b0569130c5e9c671152c913d82803b7c2f014ff9	2025-02-27 08:55:36 +02:00
Georgi Gerganov	b9c972fd0d	metal : adjust support conditions for norm operators (llama/11671) cont #11659 ggml-ci	2025-02-27 08:55:36 +02:00
Johannes Gäßler	01c9aafbfd	CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)	2025-02-27 08:55:36 +02:00
Johannes Gäßler	bae6bbf487	CUDA: non-contiguous (RMS) norm support (llama/11659) * CUDA: non-contiguous (RMS) norm support --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-27 08:55:36 +02:00
fxzjshm	c310272fa0	HIP: force max threads per block to be 1024 (llama/11621) Some old or vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM. Signed-off-by: fxzjshm <fxzjshm@163.com>	2025-02-27 08:55:36 +02:00
Jhen-Jie Hong	bd0b55dbe0	metal : use residency set for other platforms (llama/11648)	2025-02-27 08:55:36 +02:00
Patrick Peng	ba4645db2c	rpc: fix known RCE in rpc-server (ggml/1103) Add bounds checking in `rpc_server::copy_tensor` to prevent out-of-bounds writes + Check if `(uint8_t *)dst->data + ggml_nbytes(src)` remains within the destination buffer’s allocated region.	2025-02-27 08:55:36 +02:00
masahji	dfc6ca62f3	stream : add beam size parameter (#2836) * feat: Add beam size parameter to stream.cpp for beam search configuration * feat: Add beam size parameter to whisper full params in stream example * fix: Remove duplicate beam search size assignment in server.cpp	2025-02-25 11:39:33 +02:00
Thomas Fitzsimmons	47e14c0529	whisper : restore big endian support (#2816) * whisper : fix BYTESWAP whitespace * whisper : make byteswap useable with C++17 * cmake : define WHISPER_BIG_ENDIAN for big-endian targets * ci : fix (again) arm64 build fails * docker : attempt fixing arm64 build on ci * qemu v7.0.0-28 [imported from https://github.com/ggml-org/llama.cpp/commit/818a340ea8be55b3706e1772527cb8738e90a8c7 (#11895)] --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-02-25 11:38:13 +02:00
Judd	d682e15090	Fixes for Windows (#2790) Fixes for Windows: * MSVC default to utf-8 without BOM. * Console output code page changed to utf-8. --------- Co-authored-by: Judd <foldl@boxvest.com>	2025-02-06 15:37:21 +08:00
midnight	46d07b9c85	cmake : fix compile assumptions for power9/etc (#2777) * Add small comment re: VSX to readme Co-authored-by: midnight <midnight@example.com>	2025-02-05 14:41:10 +02:00
Georgi Gerganov	33ea03f131	authors : update	2025-02-04 13:03:40 +02:00
Georgi Gerganov	dbcc669e1a	sync : ggml	2025-02-04 13:03:09 +02:00
Christian Kastner	16245b35e4	cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096) This makes git as a dependency optional, and is useful in the case where ggml is built not from git, but from a tarball, or a distribution source package. This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be using it, though, so there doesn't seem to be much value in factoring it out, or even requiring it.	2025-02-04 13:03:03 +02:00
Georgi Gerganov	898c0cb9d1	readme : add maintenance roadmap	2025-02-04 10:50:10 +02:00
Georgi Gerganov	eb9e5032c4	ci : add stalebot	2025-02-04 09:30:20 +02:00
billyct	cadfc50eab	node : add max_len params in node addon (#2760)	2025-02-03 22:49:06 +02:00
Georgi Gerganov	3f91832352	talk-llama : sync llama.cpp	2025-02-03 22:42:26 +02:00
mgrachten	cff8868b5f	coreml : always convert to "neuralnetwork" (#2770)	2025-02-03 22:36:32 +02:00
Georgi Gerganov	90e3c5fc40	ci : more git	2025-02-03 22:00:57 +02:00
Georgi Gerganov	e0f4cef867	ci : install git	2025-02-03 22:00:57 +02:00
Georgi Gerganov	234460987e	ci : use ubuntu-22.04 instead of ubuntu-latest	2025-02-03 22:00:57 +02:00
Georgi Gerganov	b8ab126343	cmake : sync cmake scripts	2025-02-03 22:00:57 +02:00
Georgi Gerganov	edc5d9267c	sync : ggml	2025-02-03 22:00:57 +02:00
Georgi Gerganov	344b98a44f	scripts : fix sync paths	2025-02-03 22:00:57 +02:00
Johannes Gäßler	dbeb7916b8	CUDA: fix Volta FlashAttention logic (llama/11615)	2025-02-03 22:00:57 +02:00
Johannes Gäßler	fad2806352	HIP: fix flash_attn_stream_k_fixup warning (llama/11604)	2025-02-03 22:00:57 +02:00
uvos	9906792ec3	CUDA/HIP: add support for selectable warp size to mmv (llama/11519)	2025-02-03 22:00:57 +02:00
uvos	c49ee07ff4	HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (llama/11601) This fixes a bug where RDNA1 GPUs other than gfx1010 were not handled correctly	2025-02-03 22:00:57 +02:00
Johannes Gäßler	f8a831779e	CUDA: use mma PTX instructions for FlashAttention (llama/11583) * CUDA: use mma PTX instructions for FlashAttention * __shfl_sync workaround for movmatrix * add __shfl_sync to HIP Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-02-03 22:00:57 +02:00
Olivier Chafik	85451e3612	`ci`: use sccache on windows instead of ccache (llama/11545) * Use sccache on ci for windows * Detect sccache in cmake	2025-02-03 22:00:57 +02:00
uvos	43c744ce8b	HIP: require at least HIP 5.5	2025-02-03 22:00:57 +02:00
uvos	fc2e44490d	HIP: Prepare reduction operators for wave 64	2025-02-03 22:00:57 +02:00
uvos	f41fdad200	CUDA/HIP: add warp_size to cuda_device_info	2025-02-03 22:00:57 +02:00
Rémy Oudompheng	80fa576254	vulkan: implement initial support for IQ2 and IQ3 quantizations (llama/11360) * vulkan: initial support for IQ3_S * vulkan: initial support for IQ3_XXS * vulkan: initial support for IQ2_XXS * vulkan: initial support for IQ2_XS * vulkan: optimize Q3_K by removing branches * vulkan: implement dequantize variants for coopmat2 * vulkan: initial support for IQ2_S * vulkan: vertically realign code * port failing dequant callbacks from mul_mm * Fix array length mismatches * vulkan: avoid using workgroup size before it is referenced * tests: increase timeout for Vulkan llvmpipe backend --------- Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-02-03 22:00:57 +02:00
Jeff Bolz	75e7d0585e	vulkan: Catch pipeline creation failure and print an error message (llama/11436) * vulkan: Catch pipeline creation failure and print an error message Also, fix some warnings from my on-demand compile change. * vulkan: fix pipeline creation logging	2025-02-03 22:00:57 +02:00
uvos	682a6f5f87	HIP: Suppress transformation warning in softmax.cu Loops with bounds not known at compile time cannot be unrolled. When ncols_template == 0, the bounds of the loop are not constexpr, thus LLVM can't unroll the loops here.	2025-02-03 22:00:57 +02:00
Nikita Sarychev	115716d109	HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (llama/11080) This disables the workaround on rocblas fixed versions (>=4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all tensile objects.	2025-02-03 22:00:57 +02:00
someone13574	b2cfef655b	cmake : don't fail on `GGML_CPU=OFF` (llama/11457)	2025-02-03 22:00:57 +02:00
Akarshan Biswas	22e3df0afa	SYCL : SOFTMAX F16 mask support and other fixes (llama/11261) Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added during #5021. To do this, it had to be decoupled from ggml_sycl_op_flatten, which always considered src1 to be of fp32 type (many OP functions depend on it). * SYCL: SOFTMAX F16 mask support and other fixes * test-backend-ops: Add F16 mask test cases	2025-02-03 22:00:57 +02:00
Haus1	028511d349	AMD: parse the architecture as supplied by gcnArchName (llama/11244) The value provided by minor doesn't include stepping for AMD, parse the value returned by gcnArchName instead to retrieve an accurate ID.	2025-02-03 22:00:57 +02:00
Ihar Hrachyshka	70c4038842	metal: Handle null returned from MTLCreateSystemDefaultDevice() (llama/11441) This fixes a segmentation fault when running tests with no Metal devices available (for example, when not linked with the Core Graphics framework or otherwise).	2025-02-03 22:00:57 +02:00
Georgi Gerganov	8639c003a9	metal : use residency sets (llama/11427) * metal : use residency sets ggml-ci * metal : restore commandBufferWithUnretainedReferences calls [no ci] * metal : release descriptors ggml-ci * metal : check env GGML_METAL_NO_RESIDENCY ggml-ci * metal : fix build + clean-up ggml-ci	2025-02-03 22:00:57 +02:00

2397 Commits