whisper.cpp

Mirror of https://github.com/ggerganov/whisper.cpp.git, synced 2025-08-15 19:48:02 +02:00

Author	SHA1	Message	Date
hydai	262d0abc87	fix: add missing msg in static_assert (llama/11143) Signed-off-by: hydai <z54981220@gmail.com>	2025-01-14 10:38:01 +02:00
amritahs-ibm	124eec1664	llamafile : ppc64le MMA INT8 implementation (llama/10912) This change upstreams llamafile's CPU matrix multiplication kernels for ppc64le using MMA builtins for the quantised int8 datatype. This change results in a 10% - 70% improvement in total speed (i.e. all tokens/total time) across various batch sizes. The patch is tested with Meta-Llama-3-8B, Mistral-7B, Llama-2-7B-chat-hf models on an IBM POWER10 machine. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>	2025-01-14 10:38:01 +02:00
Mathieu Baudier	b08c3a88c8	Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (llama/11117) * Disable GL_KHR_cooperative_matrix Vulkan extension if not available. * Perform Vulkan extensions checks in a more sensible order * Remove unnecessary #ifdef directive	2025-01-14 10:38:01 +02:00
ag2s20150909	0afce25a69	fix: Vulkan shader gen binary path when cross-compiling (llama/11096) * fix: Vulkan shader gen binary path when cross-compiling	2025-01-14 10:38:01 +02:00
Johannes Gäßler	acdbe58631	GGUF: C++ refactor, backend support, misc fixes (llama/11030) * GGUF: C++ refactor, backend support, misc fixes * remove ggml_tensor.backend * update CODEOWNERS [no ci] * remove gguf_get_data from API * revise GGUF API data types	2025-01-14 10:38:01 +02:00
Diego Devesa	09fabffdf5	ggml-backend : only offload from host buffers (fix) (llama/11124)	2025-01-14 10:38:01 +02:00
Diego Devesa	3988d6396b	ggml-backend : only offload from host buffers (llama/11120)	2025-01-14 10:38:01 +02:00
Radoslav Gerganov	c8c63eeec0	rpc : code cleanup (llama/11107) Remove duplicated macros, use GGML_LOG_ERROR for errors	2025-01-14 10:38:01 +02:00
Akarshan Biswas	abf7f24410	SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (llama/11087) * SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 * Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6" This reverts commit f62dc45f318e48d375e7734b34cbddee81deed52. * Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6	2025-01-14 10:38:01 +02:00
Johannes Gäßler	341f5c28e6	CUDA: add BF16 support (llama/11093) * CUDA: add BF16 support	2025-01-14 10:38:01 +02:00
0cc4m	5377099524	Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (llama/11074) * Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver * Add (TM) to AMD name check	2025-01-14 10:38:01 +02:00
matt23654	dcbb375779	Support for models with non-512-aligned tensors over RPC. (llama/11047) * Added init tensor calling code * Added get_alloc_size forwarding * Cleaned up and improved type/error handling. * fix: remove trailing whitespaces. * Cleanup and use GGML error logging functions. * Handle potentially dangerous edge cases. * Apply suggestions from code review Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-14 10:38:01 +02:00
Gilad S.	4334c71aed	fix: Vulkan shader gen binary path (llama/11037)	2025-01-14 10:38:01 +02:00
Radoslav Gerganov	e875a82473	ggml : allow loading backend with env variable (ggml/1059) ref: #1058	2025-01-14 10:38:01 +02:00
Georgi Gerganov	507e230f1e	scripts : sync opencl, gguf	2025-01-14 09:42:16 +02:00
Georgi Gerganov	eb68324c86	whisper : fix gpu device selection (#2728)	2025-01-13 13:11:37 +02:00
Georgi Gerganov	e940fbf283	server : fix build (#2718)	2025-01-13 08:57:33 +02:00
Georgi Gerganov	35d0e02c72	talk-llama : sync llama.cpp (#2709)	2025-01-13 08:55:48 +02:00
NETZkultur GmbH	45d3faf961	server : generate unique tmp filenames (#2718) Summary: This Merge Request adds a mechanism to generate unique filenames for FFmpeg conversions in whisper_server.cpp. Previously, a single fixed filename was used (e.g., whisper-server-tmp.wav), which could result in unexpected file overwrites under certain circumstances. Generating a unique filename per request eliminates the risk of overwriting temporary files. Background / Motivation: • Problem: Relying on a static filename for temporary audio files may lead to overwrites if multiple operations occur simultaneously or if the same file name is reused. • Goal: Dynamically generate unique filenames, ensuring each request or operation uses an isolated temporary file.	2025-01-13 08:55:21 +02:00
Sandro Hanea	2ab2eb5110	whisper : add whisper_full_get_segment_no_speech_prob_from_state (#2716)	2025-01-09 16:21:07 +02:00
Jayant	b82d305282	readme : add docker instructions (#2711) I found the Docker instructions in README.md useful, including the explanation of the differences between Docker variants such as ffmpeg and CUDA support. However, this section was removed in v1.7.4 and I would vote to bring it back. This pull request adds that section back.	2025-01-07 13:20:51 +02:00
Adam Jones	885e31368d	docs: Fix main -> whisper-cli in download scripts (#2707)	2025-01-06 15:17:57 +02:00
Georgi Gerganov	8a9ad7844d	release : v1.7.4 v1.7.4	2025-01-06 15:13:48 +02:00
Georgi Gerganov	eb874b3a3c	ci : cont	2025-01-06 10:46:10 +02:00
Georgi Gerganov	eb78e3a3f1	ci : fix ubuntu runner names	2025-01-06 09:29:10 +02:00
Yusuf Redžić	ece3ff88f6	cli : fix segfault on missing argument (#2700)	2025-01-04 10:47:41 +02:00
Georgi Gerganov	9366544991	ci : fix arm builds	2025-01-04 10:45:01 +02:00
Georgi Gerganov	95583942ed	sync : ggml ggml-ci	2025-01-04 10:45:01 +02:00
Georgi Gerganov	2e93cb6a2f	ggml : do not install metal source when embed library (ggml/1054)	2025-01-04 10:45:01 +02:00
Georgi Gerganov	de5cd60d1c	metal : avoid uint (llama/11019)	2025-01-04 10:45:01 +02:00
Srihari-mcw	3fcba3e58b	ggml : fixes for AVXVNNI instruction set with MSVC and Clang (llama/11027) * Fixes for clang AVX VNNI * enable AVX VNNI and alder lake build for MSVC * Apply suggestions from code review --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-01-04 10:45:01 +02:00
Jeff Bolz	cea5f1c52f	vulkan: optimize mul_mat for small values of N (llama/10991) Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better. Share some code for reducing the result values to memory in mul_mat_vec_base.	2025-01-04 10:45:01 +02:00
Jeff Bolz	2112462db4	vulkan: im2col and matmul optimizations for stable diffusion (llama/10942) * tests: Add im2col perf tests * vulkan: optimize im2col, more elements per thread * vulkan: increase small tile size for NV_coopmat2 * vulkan: change im2col to 512 elements per workgroup	2025-01-04 10:45:01 +02:00
Jeff Bolz	fc84ecd445	vulkan: Use push constant offset to handle misaligned descriptors (llama/10987)	2025-01-04 10:45:01 +02:00
Eve	8de1e99907	vulkan: multi-row k quants (llama/10846) * multi row k quant shaders! * better row selection * more row choices * readjust row selection * rm_kq=2 by default	2025-01-04 10:45:01 +02:00
Peter	499af9294a	examples, ggml : fix GCC compiler warnings (llama/10983) Warning types fixed (observed under MSYS2 GCC 14.2.0): * format '%ld' expects argument of type 'long int', but argument has type 'size_t' * llama.cpp/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for all struct fields except the first)	2025-01-04 10:45:01 +02:00
Djip007	bcf937c216	ggml : more performance with llamafile tinyblas on x86_64 (llama/10714) * more performance with llamafile tinyblas on x86_64. - add bf16 support - change dispatch strategy (thanks: https://github.com/ikawrakow/ik_llama.cpp/pull/71 ) - reduce memory bandwidth: simpler tinyblas dispatch and more cache-friendly * tinyblas dynamic dispatching * sgemm: add M blocks. * - git 2.47 uses short IDs of length 9. - show-progress is not part of GNU Wget2 * remove unstable test	2025-01-04 10:45:01 +02:00
Diego Devesa	b8d90953d7	ggml : use wstring for backend search paths (llama/10960) ggml-ci	2025-01-04 10:45:01 +02:00
Diego Devesa	60a422147b	ggml : fix arm enabled features check (llama/10961)	2025-01-04 10:45:01 +02:00
Diego Devesa	3387415bad	ggml : fix const usage in SSE path (llama/10962)	2025-01-04 10:45:01 +02:00
yuri@FreeBSD	536ca3ec89	ggml : fix run-time on FreeBSD in get_executable_path() (llama/10948)	2025-01-04 10:45:01 +02:00
Jeff Bolz	a4bb983190	vulkan: build fixes for 32b (llama/10927) * vulkan: build fixes for 32b Should fix #10923 * vulkan: initialize some buffer/offset variables	2025-01-04 10:45:01 +02:00
Jeff Bolz	39c205f555	vulkan: optimize coopmat2 dequant functions (llama/10855) Change the code to do 16b loads when possible and extract the appropriate component late, so the code is effectively decoding a pair of elements and then selecting one. This can allow more commoning to happen in the compiler when neighboring elements are loaded.	2025-01-04 10:45:01 +02:00
Adrien Gallouët	6d502f33dc	ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (llama/10874) * ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() Signed-off-by: Adrien Gallouët <angt@huggingface.co> * ggml-cpu: format code Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-01-04 10:45:01 +02:00
Akarshan Biswas	5ea27d089d	SYCL: Migrate away from deprecated ggml_tensor->backend (llama/10840) * Migrate to tensor->buffer for checking backend buffer type: 1 * SYCL: common.cpp try to migrate away from tensor->backend * SYCL: fix assertions and add proper comments * SYCL: remove extra space * SYCL: Add back static to ggml_backend_buffer_is_sycl_split function * SYCL: Add pragma directive to suppress warning spam * SYCL: Integrate debug logs with GGML_LOG and other fixes * Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes" This reverts commit 2607b7de0f0d2f4f1f690226f86fa861aa39cb97. Let's keep the current SYCL specific logging mechanism for now * SYCL: Use GGML_SYCL_DEBUG after reverting * SYCL: reg_get_proc_address func, update to the current func signature * SYCL: Refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d	2025-01-04 10:45:01 +02:00
Diego Devesa	1462d92588	ggml : add test for SVE and disable when it fails (llama/10906)	2025-01-04 10:45:01 +02:00
Adrien Gallouët	7ba1a41f47	ggml: fix arm build with gcc (llama/10895) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-01-04 10:45:01 +02:00
Diego Devesa	5ea088636f	ggml : fix arm build (llama/10890) * ggml: GGML_NATIVE uses -mcpu=native on ARM Signed-off-by: Adrien Gallouët <angt@huggingface.co> * ggml: Show detected features with GGML_NATIVE Signed-off-by: Adrien Gallouët <angt@huggingface.co> * remove msvc support, add GGML_CPU_ARM_ARCH option * disable llamafile in android example * march -> mcpu, skip adding feature macros ggml-ci --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co> Co-authored-by: Adrien Gallouët <angt@huggingface.co>	2025-01-04 10:45:01 +02:00
Georgi Gerganov	f32ddb3b1c	tts : add OuteTTS support (llama/10784) * server : add "tokens" output ggml-ci * server : output embeddings for all tokens when pooling = none ggml-ci * server : be explicit about the pooling type in the tests ggml-ci * server : do not normalize embeddings when there is no pooling ggml-ci * llama : add OuteTTS support (wip) * wip * extract features * first conv * group norm * resnet conv * resnet * attn * pos net * layer norm * convnext * head * hann window * fix n_embd + remove llama.cpp hacks * compute hann window * fft * spectrum processing * clean-up * tts : receive input text and generate codes * clip : fix new conv name * tts : minor fix * tts : add header + minor fixes ggml-ci * tts : add mathematical constant ggml-ci * tts : fix sampling + cut initial noise * tts : fixes * tts : update default samplers ggml-ci * tts : text pre-processing * tts : outetts-voc -> wavtokenizer-dec * tts : remove hardcoded constants ggml-ci * tts : fix tensor shapes * llama : refactor wavtokenizer tensors ggml-ci * cont ggml-ci * cont [no ci] * llama : update WavTokenizer to non-causal attn * llama : handle no-vocab detokenization * tts : add Python example for OuteTTS (wip) * tts : extend python example to generate spectrogram ggml-ci * server : fix rebase artifacts * tts : enable "return_tokens" in Python example ggml-ci * tts : minor fixes * common : support HF download for vocoder	2025-01-04 10:45:01 +02:00
Johannes Gäßler	79b75ece03	tests: add tests for GGUF (llama/10830)	2025-01-04 10:45:01 +02:00

2056 Commits