Commit Graph

  • 1f5fdbecb4 ruby : add VAD support, migration to Ruby's newer API (#3197) master KITAITI Makoto 2025-05-28 20:05:12 +09:00
  • 5720426d97 whisper : install shared libs when using GGML_BACKEND_DL (#3195) Simon Booth 2025-05-28 09:15:04 +01:00
  • b9d27b1358 tests : add a new benchmark test for long-form audio (#3185) Fujimoto Seiji 2025-05-28 14:08:44 +09:00
  • 0ed00d9d30 ci : update windows-blas uploads action (#3192) Daniel Bevenius 2025-05-27 18:01:31 +02:00
  • 527fe6aaeb sync : fix builds - musa, ruby Georgi Gerganov 2025-05-27 18:02:37 +03:00
  • 26eb48cb08 talk-llama : sync llama.cpp Georgi Gerganov 2025-05-27 17:08:24 +03:00
  • 546928c33f sync : ggml Georgi Gerganov 2025-05-27 17:07:06 +03:00
  • 15ae9dc2a4 ggml : riscv: add xtheadvector support (llama/13720) xctan 2025-05-27 21:21:36 +08:00
  • 2e7a1e3e43 ggml-cpu: x86 feature detection is specific to x86 (llama/13811) Christian Kastner 2025-05-27 13:18:39 +02:00
  • b75babebb2 ggml : allow CUDA graphs when using pipeline parallelism (llama/13814) Diego Devesa 2025-05-27 04:05:18 -07:00
  • cc7a0105ef cuda : avoid cuGetErrorString (llama/13791) Georgi Gerganov 2025-05-26 22:14:52 +03:00
  • 195fde8804 SYCL: Add non contiguous support in RMS_NORM and NORM kernels (llama/13611) Akarshan Biswas 2025-05-26 21:10:36 +05:30
  • 25e27904ca sycl: Add more debug prints (llama/13640) Romain Biessy 2025-05-26 10:28:53 +02:00
  • 474f7be8b6 vulkan: mark IM2COL as supporting non-contig (llama/13783) Jeff Bolz 2025-05-25 23:02:07 -05:00
  • e35fecc2a1 CANN: Add the basic supports of Flash Attention kernel (llama/13627) Bizhao Shi 2025-05-26 10:20:18 +08:00
  • 1cd7028428 SYCL: revert "sycl: simplify bin_bcast_kernel (ggml/13383)" (llama/13752) Akarshan Biswas 2025-05-25 12:38:37 +05:30
  • 99596d6031 ggml-cpu : set openmp wait time if not set (llama/13758) Diego Devesa 2025-05-24 13:26:47 -07:00
  • 2d6c6862f7 ggml : add ggml_gelu_erf() CUDA kernel (llama/13719) Xuan-Son Nguyen 2025-05-24 13:06:47 +02:00
  • f1576b2659 CUDA: fix race condition in FA vector kernels (llama/13742) Johannes Gäßler 2025-05-24 11:46:19 +02:00
  • 994b4f86ab CANN: Support MUL_MAT_ID for q8_0 and q4_0 (llama/13705) Chenguang Li 2025-05-23 16:47:53 +08:00
  • 3e7eaccf55 ggml : fix the order of ggml_unary_op (llama/13718) Xuan-Son Nguyen 2025-05-23 08:12:48 +02:00
  • 191f040414 vulkan: support CPY from any type to itself (llama/13695) Jeff Bolz 2025-05-23 00:45:02 -04:00
  • 2d49d4a9b5 vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (llama/13696) Jeff Bolz 2025-05-23 00:33:45 -04:00
  • 000d65befb use LOG_WARN to replace std::cerr (llama/13657) Judd 2025-05-23 12:33:08 +08:00
  • f0803e6646 sycl : Remove waits from function calls (llama/13702) Nicolò Scipione 2025-05-22 13:54:43 +02:00
  • 730a00be8a SYCL: Avoid using with SYCL-Graph for unsupported nodes (llama/13587) Ewan Crawford 2025-05-22 09:24:09 +01:00
  • 316600e8ee opencl: Add support for multiple devices (llama/12622) Henry Linjamäki 2025-05-22 02:21:45 +03:00
  • 42f2b3bb65 opencl: fix couple crashes (llama/12795) Henry Linjamäki 2025-05-21 23:21:17 +03:00
  • dd6ef64060 ggml : add ggml_gelu_erf() (llama/13667) Xuan-Son Nguyen 2025-05-21 16:26:33 +02:00
  • 131ee546ca musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (llama/13647) R0CKSTAR 2025-05-21 09:58:49 +08:00
  • 4712f7b663 vulkan: fix warnings (llama/13626) Eve 2025-05-20 21:35:16 +00:00
  • 926fe234e9 CUDA: skip fully masked-out KV in FA vec kernel (llama/13584) Johannes Gäßler 2025-05-20 14:45:07 +02:00
  • f44b53480f sycl: disable reorder for sycl mulmat (llama/13536) Svetlozar Georgiev 2025-05-20 10:34:15 +01:00
  • e04e8f1c79 metal : fix typo in FA kernel comments (llama/13651) Georgi Gerganov 2025-05-20 10:41:40 +03:00
  • ee3f177cba sycl : Overcoming workaround for mmap() allocation on Windows (llama/13482) Nicolò Scipione 2025-05-20 02:54:43 +02:00
  • 0b69f74e15 Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (llama/13607) 0cc4m 2025-05-19 17:54:08 +02:00
  • e415db0ed7 sync : ggml Georgi Gerganov 2025-05-27 17:06:49 +03:00
  • 2bb7694edb docs : convert README_sycl.md to utf8 format [no ci] (#3191) Daniel Bevenius 2025-05-27 10:53:50 +02:00
  • 450de0787e node : enable no_prints to suppress all output (#3189) Daniel Bevenius 2025-05-27 05:51:47 +02:00
  • ea9f206f18 talk-llama : fix for swedish umlauts + expose model inference settings in talk-llama.cpp (#3187) matteng1 2025-05-26 07:57:39 +02:00
  • 13d92d08ae docs : fix VAD section heading levels (#3186) KITAITI Makoto 2025-05-23 17:38:26 +09:00
  • aab6976465 ci : use dynamic libopenblas.dll for window-blas (#3177) Daniel Bevenius 2025-05-23 05:48:08 +02:00
  • 78b31ca782 server : Add k6 Load Testing Script (#3175) Sacha Arbonel 2025-05-22 10:03:04 +02:00
  • cbe557f9b1 docs : add VAD model download instructions [no ci] (#3180) Daniel Bevenius 2025-05-22 07:49:29 +02:00
  • 273af4aab9 docs : replace typo "]" with ")" in README (#3179) Alpaim 2025-05-22 06:49:44 +03:00
  • bd1cb0c8e3 whisper : remove redundant assignments (#3178) Daniel Bevenius 2025-05-21 13:23:20 +02:00
  • 62dc8f7d7b whisper : update CMakeLists.txt to handle deprecated gpu Warnings (#3163) Jugal Haresh Sheth 2025-05-20 10:58:25 +01:00
  • 2c4b904596 ruby : add GGML_SYCL_DNN option to ruby bindings (#3172) Daniel Bevenius 2025-05-19 17:59:43 +02:00
  • 6b6cf19c65 talk-llama : sync llama.cpp Georgi Gerganov 2025-05-19 13:39:12 +03:00
  • 05501c218d sync : ggml Georgi Gerganov 2025-05-19 13:38:44 +03:00
  • 9da3fc27be CANN: Support MOE Model MUL_MAT_ID (llama/13042) Chenguang Li 2025-05-19 14:21:17 +08:00
  • 2c13651e08 cmake: use the current build config for vulkan-shaders-gen (llama/13595) Gilad S. 2025-05-17 21:26:43 +03:00
  • 13dca86c56 vulkan: move common FA code to flash_attn_base.comp (llama/13556) Jeff Bolz 2025-05-17 16:14:55 +09:00
  • 6d61a09bc4 vulkan: use scalar FA rather than coopmat2 when N==1 (llama/13554) Jeff Bolz 2025-05-17 15:35:47 +09:00
  • 4fedad988b metal : add FA-vec kernel for head size 64 (llama/13583) Georgi Gerganov 2025-05-16 20:32:58 +03:00
  • a8e17a244d sycl : fixed compilation warnings (llama/13582) Łukasz Ślusarczyk 2025-05-16 12:15:29 +02:00
  • 0c76acd08a gguf : use ggml log system (llama/13571) Diego Devesa 2025-05-15 10:13:11 -07:00
  • 27964db1be sycl: simplify bin_bcast_kernel (llama/13383) Atharva Dubey 2025-05-15 16:39:52 +01:00
  • 8081e7a23d sycl: reordered Q4_K MMVQ (llama/13109) Svetlozar Georgiev 2025-05-15 16:35:44 +01:00
  • d807c497a4 sycl: use oneDNN for matrices multiplication (llama/12972) Łukasz Ślusarczyk 2025-05-15 16:53:41 +02:00
  • 8e9bf548f4 arm64: optimize q6_k_q8_k kernel with i8mm (llama/13519) Yibo Cai 2025-05-15 03:53:52 +08:00
  • 0dda27bc0b CUDA: fix crash on large batch size for quant. MoE (llama/13537) Johannes Gäßler 2025-05-14 16:41:02 +02:00
  • ffa4720f25 CUDA: faster Deepseek FA, add Turing support (llama/13435) Johannes Gäßler 2025-05-14 16:08:20 +02:00
  • 9b8eea28b5 cmake: simplify vulkan shader test logic (llama/13263) bandoti 2025-05-14 07:53:57 -03:00
  • 162bbe8220 vulkan: KHR_coopmat flash attention (llama/13506) Jeff Bolz 2025-05-14 18:55:26 +09:00
  • a221288dc6 vulkan: workaround FA compile failures on macos (llama/13517) Jeff Bolz 2025-05-14 13:15:50 +09:00
  • 08436716ae metal : use FA-vec kernel up to batch size 20 (llama/13496) Georgi Gerganov 2025-05-13 18:04:39 +03:00
  • e11fc21e6c metal : optimize multi-sequence FA vec kernel (llama/13493) Georgi Gerganov 2025-05-13 18:04:00 +03:00
  • a77a924b20 ggml-cpu: Update KleidiAI to v1.6 and fix include directives (llama/13509) Dan Johansson 2025-05-13 17:02:28 +02:00
  • 405b9c77ad mnist: fix segmentation fault (ggml/1227) Johannes Gäßler 2025-05-19 09:33:35 +02:00
  • 9c3bfc1499 ggml : fix apple OS check in ggml_print_backtrace (ggml/1229) Diego Devesa 2025-05-18 18:30:13 -07:00
  • 5b7797f674 ggml : Fix missing backtrace on Linux (ggml/1228) Daniel Tang 2025-05-17 19:06:26 -04:00
  • 82ad275800 examples : add vad-speech-segments to win warns [no ci] (#3170) Daniel Bevenius 2025-05-19 12:17:18 +02:00
  • d1f114da61 vad : return early if no vad segments are detected (#3158) Daniel Bevenius 2025-05-16 08:50:53 +02:00
  • bae5d074c7 vad : store VAD context in whisper_state (#3156) Daniel Bevenius 2025-05-16 07:53:26 +02:00
  • 20a20decd9 whisper : add build_*/ to .gitignore [no ci] (#3157) Daniel Bevenius 2025-05-15 14:28:10 +02:00
  • f389d7e3e5 examples : add --print-confidence option to cli (#3150) Daniel Bevenius 2025-05-14 19:21:48 +02:00
  • 96d791ae61 vad : add download-vad-model scripts (#3149) Daniel Bevenius 2025-05-14 16:47:18 +02:00
  • 3882a099e1 server : add --flash-attn usage output (#3152) Daniel Bevenius 2025-05-14 15:22:05 +02:00
  • f890560575 talk-llama : sync llama.cpp Georgi Gerganov 2025-05-13 13:20:19 +03:00
  • a14c89aefa whisper : update to ggml-backend changes (#0) Georgi Gerganov 2025-05-13 13:11:24 +03:00
  • a6a956b36d sync : ggml Georgi Gerganov 2025-05-13 13:10:17 +03:00
  • 75e9a840c5 ggml : add mrope kernel for metal (llama/13457) Xuan-Son Nguyen 2025-05-13 13:10:08 +03:00
  • 41ed62bdbc metal : optimize MoE for large batches (llama/13388) Georgi Gerganov 2025-05-13 13:09:20 +03:00
  • 029c8837f8 opencl: remove unnecessary assert for add (llama/13257) lhez 2025-05-12 13:13:49 -07:00
  • 5d8b068249 llama/ggml: add LLM training support (llama/10544) Johannes Gäßler 2025-05-12 14:44:49 +02:00
  • 93ef22657e ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (llama/13053) Dan Johansson 2025-05-12 13:06:19 +02:00
  • 866f685bbc CUDA: fix misaligned synchronization in FA (llama/13469) Johannes Gäßler 2025-05-12 10:51:21 +02:00
  • 250bcc041a enable dpcpp nightly builds with libraries (llama/13406) Atharva Dubey 2025-05-12 06:15:32 +01:00
  • 90b17a99bf CUDA: fix crash with partial offloading of MoE (llama/13439) Johannes Gäßler 2025-05-11 16:09:33 +02:00
  • e1b2ace0f8 Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (llama/13386) David Huang 2025-05-11 20:18:39 +08:00
  • 6db0e01db6 CUDA: fix race conditions FlashAttention kernels (llama/13438) Johannes Gäßler 2025-05-10 22:22:48 +02:00
  • 16f3546f38 CUDA: fix FlashAttention on Turing (llama/13415) Johannes Gäßler 2025-05-10 09:16:52 +02:00
  • a04b329ad1 vulkan: scalar flash attention implementation (llama/13324) Jeff Bolz 2025-05-09 23:07:07 -07:00
  • 45d8b2352e sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (llama/12858) Alberto Cabrera Pérez 2025-05-09 16:34:08 +01:00
  • 2d436bfbfb CUDA: FA support for Deepseek (Ampere or newer) (llama/13306) Johannes Gäßler 2025-05-09 13:34:58 +02:00
  • 4b7cbb62ef CUDA: fix crash on large batch size for MoE models (llama/13384) Johannes Gäßler 2025-05-09 12:14:04 +02:00
  • e27c91f6d6 rpc : add rpc_msg_set_tensor_hash_req (llama/13353) Radoslav Gerganov 2025-05-09 10:31:07 +03:00
  • e46df4850f vulkan: Allow up to 4096 elements for mul_mat_id row_ids (llama/13326) Jeff Bolz 2025-05-09 02:23:41 -05:00
  • e8a7f1b7bb sycl: addressing non-contiguous src1 mul_mats (nc and batched) (llama/13343) Alberto Cabrera Pérez 2025-05-08 10:08:01 +01:00