whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-08-04 17:49:49 +02:00

Author	SHA1	Message	Date
Daniele	73703a144f	CUDA: revert part of the RDNA1 optimizations (llama/8309) The change on the launch_bounds was causing a small performance drop in perplexity of 25 t/s	2024-07-08 14:53:55 +03:00
Johannes Gäßler	e89fdceec2	CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311)	2024-07-08 14:53:55 +03:00
luoyu-intel	29a2739d27	Fix WARP_SIZE=16 bug of Intel GPU (llama/8266) * fix group_norm ut * split softmax * fix softmax * add concat support condition * revert debug code * move QK_WARP_SIZE to presets.hpp	2024-07-08 14:53:55 +03:00
Neo Zhang Jianyu	ee6d17f6b4	rm get_work_group_size() by local cache for performance (llama/8286) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-07-08 14:53:55 +03:00
Daniele	95e90823d9	Define and optimize RDNA1 (llama/8085)	2024-07-08 14:53:55 +03:00
Judd	005cc45df3	fix typo (llama/8267) Co-authored-by: Judd <foldl@boxvest.com>	2024-07-08 14:53:55 +03:00
Clint Herron	c2c60dc9ba	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (llama/8258)	2024-07-08 14:53:55 +03:00
slaren	4af3194b7c	cuda : update supports_op for matrix multiplication (llama/8245)	2024-07-08 14:53:55 +03:00
luoyu-intel	4a2ba1a065	Fix win build conflict of math library (llama/8230) * fix win build conflict of math library * fix the condition: !(win32 & SYCL) * revert warp_size=16	2024-07-08 14:53:55 +03:00
luoyu-intel	f096cc6807	Fix the sub group size of Intel (llama/8106) * use warp_size macro for all sycl kernels * fix mask of permute_sub_group_by_xor * fix rms_norm with correct warp number * fix rms_norm_f32/group_norm_f32 * move norm to norm.cpp file * fix quantize bug * fix mmvq's batch size	2024-07-08 14:53:55 +03:00
Johannes Gäßler	e4bc83ab47	CUDA: refactor and optimize IQ MMVQ (llama/8215) * CUDA: refactor and optimize IQ MMVQ * uint -> uint32_t * __dp4a -> ggml_cuda_dp4a * remove MIN_CC_DP4A checks * change default * try CI fix	2024-07-08 14:53:55 +03:00
zhentaoyu	db7e0dbe6e	Update SYCL-Rope op and Refactor (llama/8157) * align with rope.cu and move sycl-op to a single file	2024-07-08 14:53:55 +03:00
Johannes Gäßler	bf88c94da9	CUDA: fix MMQ stream-k for --split-mode row (llama/8167)	2024-07-08 14:53:55 +03:00
John Balis	3eea171cab	feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) * conv transpose 1d passing test for 1d input and kernel * working for different input and output channel counts, added test for variable stride * initial draft appears to work with stride other than 1 * working with all old and new conv1d tests * added a test for large tensors * removed use cuda hardcoding * restored test-conv-transpose.c * removed unused arugments, and fixed bug where test failure would cause subsequent tests to fail * fixed accumulator bug * added test to test-backend-ops * fixed mistake * addressed review * fixed includes * removed blank lines * style and warning fixes * return failure when test fails * fix supports_op --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-07-08 14:53:55 +03:00
Georgi Gerganov	64a56ebf13	ci : disable java build	2024-07-08 14:26:59 +03:00
Emmanuel Schmidbauer	bec9836849	server : add inference path to make OAI API compatible (#2270)	2024-07-08 14:24:58 +03:00
Georgi Gerganov	c118733a29	sync : ggml + fix sync script	2024-06-26 23:20:19 +03:00
Georgi Gerganov	bb3dd45524	make : disable CUDA graphs	2024-06-26 23:20:13 +03:00
slaren	04e7fa6f4f	ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140)	2024-06-26 23:18:11 +03:00
Georgi Gerganov	9f7f36d4c9	make : disable CUDA mel build	2024-06-26 22:25:25 +03:00
Georgi Gerganov	4a62efbb95	cmake : minor fixes	2024-06-26 21:42:39 +03:00
Georgi Gerganov	0a55a70b9b	make : fix missing -O3 same as https://github.com/ggerganov/llama.cpp/pull/8143	2024-06-26 21:21:12 +03:00
Georgi Gerganov	dc8cc2dd6f	whisper : disable CUDA mel + fix FFMPEG	2024-06-26 20:11:38 +03:00
Georgi Gerganov	3efedb9511	sync : ggml	2024-06-26 19:40:23 +03:00
Georgi Gerganov	e30c679928	whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci]	2024-06-26 19:34:09 +03:00
mky_coder	bf4cb4abad	whisper : optimize fft() function (#2242) Co-authored-by: Mike Fan <60965742+mike-fzy@users.noreply.github.com>	2024-06-18 18:10:33 +03:00
Georgi Gerganov	e293f17d34	talk-llama : sync llama.cpp	2024-06-18 09:45:37 +03:00
Georgi Gerganov	5d950c4b8d	whisper : use ggml_backend_sched (#2239) * whisper : use ggml_backend_sched (wip) * use sched in whisper_allocr * whisper : single backend in whisper_context * whisper : remove whisper_state->backends_used * whisper : remove whisper_context->backend * whisper : reset scheduler after init * whisper : fix external encoder (e.g. CoreML) * whisper : cleanup * whisper : handle null GPU buffer types + fix sycl --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-18 09:39:40 +03:00
Georgi Gerganov	820446e230	fix : remove extra files	2024-06-18 09:39:40 +03:00
Georgi Gerganov	54d5823ebe	scripts : sync ggml-blas	2024-06-18 09:39:40 +03:00
Georgi Gerganov	5181494e9f	build : update make / cmake	2024-06-18 09:39:40 +03:00
Georgi Gerganov	4a6e6e8b30	sync : ggml	2024-06-18 09:39:40 +03:00
slaren	de29b193f6	move BLAS to a separate backend (cont) (llama/6210) ggml-ci	2024-06-18 09:39:40 +03:00
0cc4m	922971041b	Vulkan Shader Refactor, Memory Debugging Option (llama/7947) * Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into vulkan-shaders directory * Improve debug log code * Add memory debug output option * Fix flake8 * Fix unnecessary high llama-3 VRAM use	2024-06-18 09:39:40 +03:00
Georgi Gerganov	63a767a134	scripts : stop sync whisper example from ggml	2024-06-18 09:39:40 +03:00
Georgi Gerganov	30841fa786	cmake : fix sycl build (#0)	2024-06-16 18:19:48 +03:00
Georgi Gerganov	3b1ac03828	ggml : remove OpenCL (#0)	2024-06-16 18:19:48 +03:00
Georgi Gerganov	990de617b5	sycl : sync (#0)	2024-06-16 18:19:48 +03:00
Georgi Gerganov	6975600b4b	cuda : enable CUDA graphs (#0)	2024-06-16 18:19:48 +03:00
Georgi Gerganov	061eeb9f61	talk-llama : sync llama.cpp	2024-06-16 18:19:48 +03:00
Georgi Gerganov	4942b1b428	cmake : fix CUDA build (#0)	2024-06-16 18:19:48 +03:00
Georgi Gerganov	3c7cc5c437	sync : ggml ggml-ci	2024-06-16 18:19:48 +03:00
Hong Bo PENG	5cd42ee2cc	ggml : fix and optimize ppc64le (ggml/849) * fix compile issues introduced by loongarch_asx * restore quant changes to merge * fix compile issues introduced by loongarch_asx * further optimize by using vec_msum & vec_sum4s on ppc64le	2024-06-16 18:19:48 +03:00
Daniel Bevenius	ee718f3da6	ggml : remove duplicate include of ggml-common.h (ggml/853) Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-06-16 18:19:48 +03:00
Meng, Hengyu	63eac1f608	remove global variables (llama/7710) * separate DPCT helpers outside * replace global variables with context * remove useless extra * update mul_mat condition * remove duplicate buft initialization * remove duplicate extra and global work group size * remove useless backend check * remove duplicated extras * use macro for group_size and remove cuda-related	2024-06-16 18:19:48 +03:00
Johannes Gäßler	b17ba2815b	CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921) * CUDA: faster q2_K, q3_K MMQ + int8 tensor cores * try CI fix * try CI fix * try CI fix * fix data race * rever q2_K precision related changes	2024-06-16 18:19:48 +03:00
Georgi Gerganov	7a489af2f3	metal : utilize max shared memory for mul_mat_id (llama/7935)	2024-06-16 18:19:48 +03:00
Radoslav Gerganov	4a4ea13d6d	rpc : fix ggml_backend_rpc_supports_buft() (llama/7918)	2024-06-16 18:19:48 +03:00
slaren	174a461fc6	move BLAS to a separate backend (llama/6210) * move BLAS to a separate backend * rename GGML_USE_OPENBLAS to GGML_USE_BLAS * alloc : reuse same buffer when the same buffer type if used multiple times * set number of threads automatically for openblas and blis * sched : print assignments when GGML_SCHED_DEBUG env variable is set * sched : allow ops with weights on an incompatible buffer type This will cause the weight to be copied to a backend that supports the op, which is very costly. The weight should have been stored in a buffer of a backend that can run the op, but llama.cpp cannot do this automatically at the moment. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-16 18:19:48 +03:00
Johannes Gäßler	d8b7a24bc9	CUDA: fix broken oob check for FA vec f32 kernel (llama/7904)	2024-06-16 18:19:48 +03:00

1445 Commits