Commit Graph

  • e27c91f6d6 rpc : add rpc_msg_set_tensor_hash_req (llama/13353) Radoslav Gerganov 2025-05-09 10:31:07 +03:00
  • e46df4850f vulkan: Allow up to 4096 elements for mul_mat_id row_ids (llama/13326) Jeff Bolz 2025-05-09 02:23:41 -05:00
  • e8a7f1b7bb sycl: addressing non-contiguous src1 mul_mats (nc and batched) (llama/13343) Alberto Cabrera Pérez 2025-05-08 10:08:01 +01:00
  • fbad8058c4 examples : add VAD speech segments example (#3147) Daniel Bevenius 2025-05-13 12:31:00 +02:00
  • bff8dc248a talk-llama : sync llama.cpp [sync-ggml-25-05-13] Georgi Gerganov 2025-05-13 13:20:19 +03:00
  • 69753804ed whisper : update to ggml-backend changes (#0) Georgi Gerganov 2025-05-13 13:11:24 +03:00
  • 89970b9aaa sync : ggml Georgi Gerganov 2025-05-13 13:10:17 +03:00
  • 79fb43e252 ggml : add mrope kernel for metal (llama/13457) Xuan-Son Nguyen 2025-05-13 13:10:08 +03:00
  • 926e06dbfd metal : optimize MoE for large batches (llama/13388) Georgi Gerganov 2025-05-13 13:09:20 +03:00
  • 43a59eccf6 opencl: remove unnecessary assert for add (llama/13257) lhez 2025-05-12 13:13:49 -07:00
  • fe0d52b9a2 llama/ggml: add LLM training support (llama/10544) Johannes Gäßler 2025-05-12 14:44:49 +02:00
  • cb90cb0992 ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (llama/13053) Dan Johansson 2025-05-12 13:06:19 +02:00
  • 8264872b5d CUDA: fix misaligned synchronization in FA (llama/13469) Johannes Gäßler 2025-05-12 10:51:21 +02:00
  • 882d975729 enable dpcpp nightly builds with libraries (llama/13406) Atharva Dubey 2025-05-12 06:15:32 +01:00
  • c426829771 CUDA: fix crash with partial offloading of MoE (llama/13439) Johannes Gäßler 2025-05-11 16:09:33 +02:00
  • 0b1962a181 Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (llama/13386) David Huang 2025-05-11 20:18:39 +08:00
  • 86dece9c7c CUDA: fix race conditions FlashAttention kernels (llama/13438) Johannes Gäßler 2025-05-10 22:22:48 +02:00
  • 04445664b4 CUDA: fix FlashAttention on Turing (llama/13415) Johannes Gäßler 2025-05-10 09:16:52 +02:00
  • 22f4997dd8 vulkan: scalar flash attention implementation (llama/13324) Jeff Bolz 2025-05-09 23:07:07 -07:00
  • b493e03b90 sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (llama/12858) Alberto Cabrera Pérez 2025-05-09 16:34:08 +01:00
  • aef59f4851 CUDA: FA support for Deepseek (Ampere or newer) (llama/13306) Johannes Gäßler 2025-05-09 13:34:58 +02:00
  • f8c75dc43e CUDA: fix crash on large batch size for MoE models (llama/13384) Johannes Gäßler 2025-05-09 12:14:04 +02:00
  • 00c8056715 rpc : add rpc_msg_set_tensor_hash_req (llama/13353) Radoslav Gerganov 2025-05-09 10:31:07 +03:00
  • 19d8d9a928 vulkan: Allow up to 4096 elements for mul_mat_id row_ids (llama/13326) Jeff Bolz 2025-05-09 02:23:41 -05:00
  • 0c4a229154 sycl: addressing non-contiguous src1 mul_mats (nc and batched) (llama/13343) Alberto Cabrera Pérez 2025-05-08 10:08:01 +01:00
  • b2513a6208 vad : remove shortform for --vad option in cli.cpp (#3145) Daniel Bevenius 2025-05-13 06:04:05 +02:00
  • 587ea01f55 docs : update README.md for whisper.objc app (#2569) Tomer Schlesinger 2025-05-13 06:03:50 +02:00
  • e41bc5c61a vad : add initial Voice Activity Detection (VAD) support (#3065) Daniel Bevenius 2025-05-12 16:10:11 +02:00
  • e39ba750cd whisper : remove dummy commit comment [no ci] (#3143) Daniel Bevenius 2025-05-12 14:40:17 +02:00
  • db0fc9edc6 docs : fix -owts flag typo karaoke section [no ci] (#3142) Daniel Bevenius 2025-05-12 10:56:39 +02:00
  • 186855e38b cli : print color scheme info for --print-colors (#3141) Daniel Bevenius 2025-05-12 10:43:04 +02:00
  • a513146102 docs : update Readme to recommend same Openvino as Python tools (#3138) Simon Booth 2025-05-12 08:06:51 +01:00
  • 4730950492 examples : update link to Paul Tol's color scheme [no ci] (#3140) Daniel Bevenius 2025-05-12 09:02:06 +02:00
  • 9dd9685c79 ruby : test extra build options only when env var specified (#3136) KITAITI Makoto 2025-05-12 13:49:46 +09:00
  • 2e310b841e ruby : omit test_build_options locally (#3132) Daniel Bevenius 2025-05-10 08:18:08 +02:00
  • 5d4390d281 examples : add HEAPU8 to all of the exported runtime methods (#3134) Enes Grahovac 2025-05-10 00:44:13 -04:00
  • 9791647653 wasm : add note about worker.js file generation [no ci] (#3133) Daniel Bevenius 2025-05-09 15:42:45 +02:00
  • 288304ee64 whisper : deprecate WHISPER_CCACHE CMake option (#3131) Daniel Bevenius 2025-05-09 14:13:41 +02:00
  • b6f3fa4059 stream.wasm : add HEAPU8 to exported runtime methods (#3130) Daniel Bevenius 2025-05-08 16:58:34 +02:00
  • cb2bd11ee8 sync : ggml Georgi Gerganov 2025-05-07 17:45:14 +03:00
  • 09e6b66025 cuda : remove nrows_x in mul_mat_q_process_tile (llama/13325) R0CKSTAR 2025-05-07 15:48:23 +08:00
  • d41cf26a0f CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (llama/13135) Johannes Gäßler 2025-05-06 23:35:51 +02:00
  • 3c67195be9 SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (llama/13254) Akarshan Biswas 2025-05-06 20:27:06 +05:30
  • f9f78a773f CUDA: fix bad asserts for partial offload (llama/13337) Johannes Gäßler 2025-05-06 13:58:51 +02:00
  • be55e25cac CUDA: fix --split-mode row for MMQ (llama/13323) Johannes Gäßler 2025-05-06 08:36:46 +02:00
  • 2ffdda99e8 CUDA: fix logic for clearing padding with -ngl 0 (llama/13320) Johannes Gäßler 2025-05-05 22:32:13 +02:00
  • 9bbedc51cc SYCL: Disable mul_mat kernels for noncontiguous tensor b (llama/13308) Akarshan Biswas 2025-05-05 13:39:10 +05:30
  • 1e1fa27add rpc : use backend registry, support dl backends (llama/13304) Diego Devesa 2025-05-04 21:25:43 +02:00
  • e1bdd148c5 ggml : activate s390x simd for Q3_K (llama/13301) Aaron Teo 2025-05-05 01:49:12 +08:00
  • 7fa8bb303f CUDA: fix race condition in MMQ stream-k fixup (llama/13299) Johannes Gäßler 2025-05-04 14:16:39 +02:00
  • 7564f5e6f1 CUDA: fix race condition in MMQ ids_dst (llama/13294) Johannes Gäßler 2025-05-04 13:58:38 +02:00
  • 22ba2e27ce vulkan: Additional type support for unary, binary, and copy (llama/13266) Jeff Bolz 2025-05-04 00:17:16 -05:00
  • 0676b2dab2 ci : add bindings-java jar artifact to release (#3126) Daniel Bevenius 2025-05-07 16:26:54 +02:00
  • 4a512cb153 cli : avoid std::exchange Georgi Gerganov 2025-05-07 13:22:47 +03:00
  • 76171ce199 sync : ggml Georgi Gerganov 2025-05-07 13:17:48 +03:00
  • 5eac2a3fbb vulkan : fix lint (llama/0) Georgi Gerganov 2025-05-02 20:57:07 +03:00
  • 42938398f9 ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148) shalinib-ibm 2025-05-02 22:23:12 +05:30
  • a8fe90ae15 rpc : avoid uninitialized memory in serialize_tensor (llama/13210) Justin Santa Barbara 2025-05-01 17:32:11 -04:00
  • c5a5a2da5b ggml: Don't assert fail when tensor data changes (llama/13222) Jesse Gross 2025-05-01 13:46:10 -07:00
  • 8316bfd82b build : fix build info on windows (llama/13239) Diego Devesa 2025-05-01 21:48:08 +02:00
  • fd1cb9fc12 vulkan: Add bfloat16 support (llama/12554) Jeff Bolz 2025-05-01 13:49:39 -05:00
  • 17f6b8225e vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (llama/13191) Jeff Bolz 2025-05-01 13:19:31 -05:00
  • 6374ea32ca vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204) Acly 2025-05-02 18:02:34 +02:00
  • 3a66f9f248 ci : zip windows artifacts for release uploading (#3124) Daniel Bevenius 2025-05-07 13:12:08 +02:00
  • 0055356fbc cli : avoid std::exchange [sync-ggml-25-05-07] Georgi Gerganov 2025-05-07 13:22:47 +03:00
  • eeaa1cd035 sync : ggml Georgi Gerganov 2025-05-07 13:17:48 +03:00
  • a652c8bf72 vulkan : fix lint (llama/0) Georgi Gerganov 2025-05-02 20:57:07 +03:00
  • 0630539c8a ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148) shalinib-ibm 2025-05-02 22:23:12 +05:30
  • a7988d76db rpc : avoid uninitialized memory in serialize_tensor (llama/13210) Justin Santa Barbara 2025-05-01 17:32:11 -04:00
  • 37ac0264ef ggml: Don't assert fail when tensor data changes (llama/13222) Jesse Gross 2025-05-01 13:46:10 -07:00
  • 5a9ccde7da build : fix build info on windows (llama/13239) Diego Devesa 2025-05-01 21:48:08 +02:00
  • cde0e50536 vulkan: Add bfloat16 support (llama/12554) Jeff Bolz 2025-05-01 13:49:39 -05:00
  • df458380d6 vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (llama/13191) Jeff Bolz 2025-05-01 13:19:31 -05:00
  • 87b88ed01c vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204) Acly 2025-05-02 18:02:34 +02:00
  • 9b584b0cc0 ci : add zip extension to xcframework artifact name (#3120) Daniel Bevenius 2025-05-07 12:02:29 +02:00
  • 09846f4e12 whisper: remove MSVC warnings pragmas (#3090) Daniel Bevenius 2025-05-05 13:09:35 +02:00
  • bcf1ed0163 server: update abort mechanism to handle HTTP connection closure (#3112) Sacha Arbonel 2025-05-05 07:16:54 +02:00
  • 934d4b3083 cli : support "-" for stdout like stdin (#3050) Daniel Tang 2025-05-05 01:15:39 -04:00
  • 988dcd4b5b docs : Update cli documentation (#3102) Arpit Jain 2025-05-02 20:18:33 +08:00
  • 9f540ad8cb cmake : removed stdc++fs (#3097) Jared Tweed 2025-05-02 02:41:35 -07:00
  • 1fa17bc752 server : update httplib.h to version 0.20.0 (#3101) Sacha Arbonel 2025-05-02 06:09:41 +02:00
  • 366082d072 ruby : refine HTTP cache feature (#3109) KITAITI Makoto 2025-05-01 23:04:53 +09:00
  • 0778b6ff5f talk-llama : sync llama.cpp Georgi Gerganov 2025-05-01 10:43:30 +03:00
  • 5cd59c9396 sync : ggml Georgi Gerganov 2025-05-01 10:42:48 +03:00
  • d052e64d42 CUDA: batched+noncont MMQ, refactor bs>1 MoE code (llama/13199) Johannes Gäßler 2025-04-30 23:12:59 +02:00
  • 780750a108 vulkan: use uint array index to avoid glslang bug (llama/13193) Jeff Bolz 2025-04-30 07:38:37 -05:00
  • 919c78e618 ggml : fix ppc64le build (llama/13176) shalinib-ibm 2025-04-30 16:47:08 +05:30
  • dc288f84cd feat(ggml-cpu): enable z17 compile (llama/13182) Aaron Teo 2025-04-30 17:47:35 +08:00
  • 1543a3600c CUDA: fix non-cont. inputs for batched mat mul (llama/13155) Johannes Gäßler 2025-04-29 16:00:27 +02:00
  • 4872355f6e fix(rpc): Improve input validation and error handling (llama/13069) Ville Vesilehto 2025-04-28 21:00:20 +03:00
  • 1a76e97c28 SYCL: Add all missing unary kernels (llama/13074) Akarshan Biswas 2025-04-28 15:03:25 +05:30
  • 7017c1d37d musa: fix typo in cc control (llama/13144) R0CKSTAR 2025-04-28 15:33:28 +08:00
  • 670bf02662 CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (llama/13137) Johannes Gäßler 2025-04-28 09:29:26 +02:00
  • 9fff2f751c musa: fix build warning (llama/13129) R0CKSTAR 2025-04-27 19:22:49 +08:00
  • 46392f733f ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (llama/13107) SXX 2025-04-26 22:05:31 +08:00
  • eeb259909e change the reorder tensor from init to execute OP (llama/13003) Neo Zhang Jianyu 2025-04-25 17:37:51 +08:00
  • fe21ddf0dc rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (llama/12943) Radoslav Gerganov 2025-04-25 10:08:08 +03:00
  • 33bdbfbb33 ggml : fix ggml_gallocr_ptr type (ggml/1205) Diego Devesa 2025-04-30 15:20:40 +02:00
  • 0f49edf0f3 whisper : add check that target name exists (#3103) Daniel Bevenius 2025-05-01 10:05:24 +02:00
  • 25efcfe3ed server : add --no-gpu option to print usage output (#3098) Daniel Bevenius 2025-05-01 08:15:12 +02:00