whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-24 19:51:26 +02:00

Author	SHA1	Message	Date
slaren	58a36d2e3b	metal : gemma2 flash attention support (llama/9159)	2024-08-28 13:22:20 +03:00
Johannes Gäßler	24d8534bd8	CPU/CUDA: Gemma 2 FlashAttention support (llama/8542) * CPU/CUDA: Gemma 2 FlashAttention support * apply logit_softcap to scale in kernel * disable logit softcapping tests on Metal * remove metal check	2024-08-28 13:22:20 +03:00
Akarshan Biswas	9b16ddd3a5	Add a space to suppress a cmake warning (llama/9133)	2024-08-28 13:22:20 +03:00
luoyu-intel	32f88af17b	Add oneDNN primitive support (llama/9091) * add onednn * add sycl_f16 * add dnnl stream * add engine map * use dnnl for intel only * use fp16fp16fp16 * update doc	2024-08-28 13:22:20 +03:00
compilade	9bf7250bf9	llama : simplify Mamba with advanced batch splits (llama/8526) * llama : advanced batch splits This includes equal-sequence-length batch splits which are useful to simplify recurrent model operators. * llama : always make recurrent state slots contiguous * ggml : simplify mamba operators * llama : fix integer signedness mixing * llama : logits_all has priority over batch->logits Otherwise, the server embeddings tests failed. This was likely an existing problem but was only detected here because of an additional assertion. * llama : apply suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * llama : fix t5 segfault * llama : fix Mamba session save and restore * llama : minor cosmetic changes * llama : rename llama_reorder_outputs to llama_output_reorder Also move it closer to llama_output_reserve. * llama : fix pooled embeddings when using batches with equal_seqs * minor : add struct members for clarity ggml-ci * llama : fix T5 segfault again * llama : fix Mamba pooled embeddings with multiple sequences Until the pooled embeddings are refactored to allow splitting across ubatches for causal embeddings, recurrent models can only process a single sequence per ubatch when calculating pooled embeddings. * llama : add llama_model_is_recurrent to simplify figuring that out This will make it easier to more cleanly support RWKV-v6 and Mamba-2. * llama : fix simple splits when the batch contains embeddings --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-08-28 13:22:20 +03:00
Meng, Hengyu	17e49d3ab2	fallback mmvq (llama/9088) * fallback mmvq to mul_mat * mmvq in cuda path * Update ggml/src/ggml-sycl.cpp Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com> --------- Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>	2024-08-28 13:22:20 +03:00
zhentaoyu	58b725282a	Fix SYCL `im2col` and `convert` Overflow with Large Dims (llama/9052) * sycl: fix im2col overflow and sync with cuda Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: fix convert overflow Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: fix convert and dequantize Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: fix ib in dmmv Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl:refine convert Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: move downsample global_range into common Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * test: add im2col and convert test cases Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * test: make new cases only in sycl Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * test: comment new test_cases for only local testing Signed-off-by: zhentaoyu <zhentao.yu@intel.com> --------- Signed-off-by: zhentaoyu <zhentao.yu@intel.com>	2024-08-28 13:22:20 +03:00
Radoslav Gerganov	7e59afa1e0	rpc : print error message when failed to connect endpoint (llama/9042)	2024-08-28 13:22:20 +03:00
Radoslav Gerganov	5ac022140e	rpc : prevent crashes on invalid input (llama/9040) Add more checks which prevent RPC server from crashing if invalid input is received from client	2024-08-28 13:22:20 +03:00
Nico Bosshard	0eaa67280c	ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047) * ggml : Dynamic ggml_sched_max_splits based on graph_size * Fixed and readded debug code for causes	2024-08-28 13:22:20 +03:00
Georgi Gerganov	5a62fdb735	cmake : remove unused option GGML_CURL (llama/9011)	2024-08-28 13:22:20 +03:00
Daniel Bevenius	60098d6204	ggml : move rope type enum to ggml.h (llama/8949) * ggml : move rope type enum to ggml.h This commit moves the `llama_rope_type` enum from `llama.h` to `ggml.h` and changes its name to `ggml_rope_type`. The motivation for this change is to address the TODO in `llama.h` and use the enum in ggml. Note: This commit does not change the `mode` parameter to be of type `enum ggml_rope_type`. The name `mode` and its usage suggest that it might be more generic and possibly used as a bit field for multiple flags. Further investigation/discussion may be needed to determine if `mode` should be restricted to RoPE types. * squash! ggml : move rope type enum to ggml.h This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from ggml.h, and back the llama_rope_type enum. I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is safe to remove it yet. * squash! ggml : move rope type enum to ggml.h This commit removes the enum ggml_rope_type from ggml.h and replaces it with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has been updated to reflect this change. * squash! ggml : move rope type enum to ggml.h This commit contains a suggestion enable the GGML_ROPE_TYPE_NEOX macro/define to be passed to the shader compiler. * squash! ggml : move rope type enum to ggml.h This commit fixes the editorconfig-checker warnings. * squash! ggml : move rope type enum to ggml.h Update comment for ggml_rope function. * Revert "squash! ggml : move rope type enum to ggml.h" This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6. * squash! ggml : move rope type enum to ggml.h Add GGML_ROPE_TYPE_NEOX to rope_common.comp. * remove extra line --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-08-28 13:22:20 +03:00
DavidKorczynski	317293e6a7	ggml: fix div-by-zero (llama/9003) Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724 In order to access the above bug you need to login using one of the emails in https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5 Signed-off-by: David Korczynski <david@adalogics.com>	2024-08-28 13:22:20 +03:00
Markus Tavenrath	488a966c07	Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943) * Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. - Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove. - ggml_vk_sync_buffer introduces a full pipeline sync which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader read/writes and transfers seems to be sufficient looking at the code which either launches compute kernels or copies tensors. * Fix small typo --------- Co-authored-by: 0cc4m <picard12@live.de>	2024-08-28 13:22:20 +03:00
Johannes Gäßler	8954769aa2	feat: ref. cross entropy, add CUDA, fix grad test (ggml/929)	2024-08-28 13:22:20 +03:00
Johannes Gäßler	df06468d9e	ggml: remove bad assert (ggml/928)	2024-08-28 13:22:20 +03:00
Johannes Gäßler	1fbd828a5d	examples: add MNIST training + missing ops	2024-08-28 13:22:20 +03:00
Brad Murray	d2986f8b07	models : add support for wget2 for fedora (#2387)	2024-08-28 11:46:01 +03:00
Peng	8bfa8574e2	readme : update the path to bench.py (#2386)	2024-08-28 11:45:05 +03:00
Ivo von Putzer Reibegg	376567bf4f	readme : fix typo (#2383)	2024-08-28 11:42:18 +03:00
stormofice	c0fd64a9c0	readme : fix broken links in implementation details section (#2382)	2024-08-28 11:41:51 +03:00
Georgi Gerganov	6e9596f6de	whisper : fix compile warning for unused params	2024-08-28 11:40:11 +03:00
Georgi Gerganov	9e3c5345cd	sync : ggml vulkan (ggml/0) ggml-ci	2024-08-21 11:07:13 +03:00
Radoslav Gerganov	b6c05ce82f	yolo : add backend support (ggml/924) * yolo : add backend support * metal : add sub and sqrt kernels --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-08-21 11:07:13 +03:00
Daniel Bevenius	52c80cac00	ggml : fix typo in ggml-quants.c comment (ggml/922)	2024-08-21 11:07:13 +03:00
Ronsor	3643120690	feat: add new `sin` and `cos` operators (ggml/919) * ggml : add sin/cos operators * ggml-cuda : add sin/cos operators * ggml : add corresponding tests for sin/cos * ggml : add backward computation for sin/cos operators * ggml-vulkan : add sin/cos operators * ggml-vulkan : add sin/cos shader source * metal : add sin, cos --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-08-21 11:07:13 +03:00
Eric Curtin	d65786ea54	readme : fix broken links (#2358) For whisper.cpp and whisper.h files	2024-08-20 10:57:45 +03:00
Justine Tunney	7f78675008	examples : use colorblind friendly TTY color scheme (#2360) This change updates the -pc flag, so that a new xterm256 color scheme is used. This color scheme is believed to be better for three reasons: 1. It should be friendlier to the colorblind. The scheme was designed by Paul Tol (see: https://personal.sron.nl/~pault/). TensorBoard uses it since 2017, so it's already popular in the machine learning community. 2. It should appear to be the same colors as before to people who aren't, i.e. it's still a red-green spectrum like before but lightly modified. 3. It is readable in both white and black background terminals. The neon colors before were probably a bit too intense for white backgrounds.	2024-08-20 10:49:10 +03:00
Georgi Gerganov	22fcd5fd11	sync : ggml	2024-08-12 11:59:15 +03:00
Salvatore Mesoraca	993f0df419	ggml : support forward pass broadcasting in ggml_sub (ggml/914) * ggml: support forward pass broadcasting in ggml_sub Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32 The check is already performed in ggml_sub_impl Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> --------- Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-08-12 11:58:49 +03:00
slaren	9b1788483c	metal : fix uninitialized abort_callback (llama/8968)	2024-08-12 11:58:49 +03:00
Georgi Gerganov	ad37d26983	rpc : sanitize tensor data + warnings (llama/0) Co-authored-by: slaren <slarengh@gmail.com>	2024-08-12 11:58:46 +03:00
Mengqing Cao	81c999fe0a	cann : add Ascend NPU support (#2336) * enable Ascend NPU in src/whisper.cpp * sync test-backend-ops with llama.cpp	2024-08-09 15:21:56 +03:00
Georgi Gerganov	4b7de08bfd	whisper : fix compile warning (#0)	2024-08-09 09:58:16 +03:00
Georgi Gerganov	4b9c4de1ad	sync : ggml	2024-08-09 09:58:16 +03:00
hipudding	be88ee1d75	ggml : add CANN backend (llama/0) ggml-ci	2024-08-09 09:58:16 +03:00
Georgi Gerganov	3ab19c744e	scripts : sync cann	2024-08-09 09:58:16 +03:00
Georgi Gerganov	6eac06759b	ci : disable ruby workflow (#0)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	2e9a5bd2c4	ci : try to fix FreeBSD (#0)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	58323bf8ed	build : fix aarch64 (#0)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	22058f2dbc	talk-llama : sync llama.cpp	2024-08-08 22:48:46 +03:00
Georgi Gerganov	5b7979a1e6	sync : ggml	2024-08-08 22:48:46 +03:00
slaren	ee14c02365	ggml-backend : fix async copy from CPU (llama/8897) * ggml-backend : fix async copy from CPU * cuda : more reliable async copy, fix stream used when the devices are the same	2024-08-08 22:48:46 +03:00
Ouadie EL FAROUKI	ab39dd34e1	Updated SYCL device filtering (llama/8901) * Updated device filter to depend on default_selector (fixes non-intel device issues) * Small related update to example/sycl Readme	2024-08-08 22:48:46 +03:00
Johannes Gäßler	b1348d3530	CUDA/HIP: fix tests/test-backend-ops (llama/8896)	2024-08-08 22:48:46 +03:00
Johannes Gäßler	90641b5cf4	CUDA: fix padding logic for FP16/FP32 (llama/8884)	2024-08-08 22:48:46 +03:00
Molly Sophia	4160b930f1	ggml : add epsilon as a parameter for group_norm (llama/8818) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-08 22:48:46 +03:00
Justine Tunney	7a96e661e4	ggml : fix overflows in elu function (llama/8866) It's helpful to use expm1f(x), because expf(x)-1 will result in overflow for 25% of single-precision floating point numbers.	2024-08-08 22:48:46 +03:00
jdomke	a902fb4ab2	ggml : reading the runtime sve config of the cpu (llama/8709) * ggml : reading the runtime sve config of the cpu * change to one time init to prevent performance drop * prefix variable to avoid possible conflicts * revert xxhash fix and add brackets --------- Co-authored-by: domke <673751-domke@users.noreply.gitlab.com>	2024-08-08 22:48:46 +03:00
Sigbjørn Skjæret	6cb38c3673	Fix conversion of unnormalized BF16->BF16 weights (llama/7843) * add truncate_bf16 * truncate intermediate fp32 if converting bf16 to bf16 * fix masking in __compute_fp32_to_bf16 * np.int16 no longer used * missing cast and additional numpy 2.x fix * ggml-impl : do not flush bf16 subnormals to zero * ggml : add reference fp32 to bf16 conversion The fast version is no longer equivalent for all platforms because of the handling of subnormal values. * gguf-py : remove flush to zero for bf16 subnormals * gguf-py : remove float32 truncation to bf16 Rounding achieves the same thing in the cases where this was used. * missed prototype update in merge * merge cleanup --------- Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2024-08-08 22:48:46 +03:00

1570 Commits