Commit Graph

  • 8f48565b56 talk-llama : sync llama.cpp sync-ggml-25-07-28 Georgi Gerganov 2025-07-28 10:09:47 +03:00
  • c189a3c6fc sync : ggml Georgi Gerganov 2025-07-28 08:43:53 +03:00
  • 6ef17cd8e6 vulkan : add fp16 support for the conv_2d kernel (llama/14872) Erik Scholz 2025-07-27 12:04:33 +02:00
  • 429731295f vulkan: skip empty set_rows to avoid invalid API usage (llama/14860) Jeff Bolz 2025-07-27 04:05:34 -05:00
  • 0d5bf5ee87 HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (llama/14624) deepsek 2025-07-26 18:28:14 -04:00
  • 5962f89983 CANN: Implement GLU ops (llama/14884) hipudding 2025-07-26 17:56:18 +08:00
  • ef7a7f9bcb musa: fix build warnings (unused variable) (llama/14869) R0CKSTAR 2025-07-26 10:36:02 +08:00
  • 7beedb1dee ggml-cpu : disable GGML_NNPA by default due to instability (llama/14880) Aaron Teo 2025-07-26 01:09:03 +08:00
  • cb0a47be57 metal: SSM_SCAN performance (llama/14743) Gabe Goodhart 2025-07-25 10:47:39 -06:00
  • 5fdfe3bf6b opencl: add fused rms_norm_mul (llama/14841) lhez 2025-07-25 08:12:13 -07:00
  • 58942e76e1 ggml : remove invalid portPos specifiers from dot files (llama/14838) Oliver Simons 2025-07-25 13:29:57 +02:00
  • 0e5d0eedf3 rpc : check for null buffers in get/set/copy tensor endpoints (llama/14868) Chris Rohlf 2025-07-25 06:17:02 -04:00
  • 2feb28a287 sched : fix multiple evaluations of the same graph with pipeline parallelism (llama/14855) Diego Devesa 2025-07-25 01:07:26 -07:00
  • e577bb1e44 musa: upgrade musa sdk to rc4.2.0 (llama/14498) R0CKSTAR 2025-07-25 03:05:37 +08:00
  • f8122d2411 cmake : Indent ggml-config.cmake (ggml/1310) Kai Pastor 2025-07-24 19:58:02 +02:00
  • 805c890dc5 sycl: fixed semantics of block offset calculation (llama/14814) Alberto Cabrera Pérez 2025-07-24 11:09:57 +01:00
  • 3a814b91a3 metal : fix fusion across different encoders (llama/14849) Georgi Gerganov 2025-07-24 10:24:05 +03:00
  • 510b3aab2d sycl: fix undefined variable in work group size check (llama/14843) Donghyeon Jeong 2025-07-24 13:50:41 +09:00
  • 9119c3ce49 CUDA: fix overflow in FA, tune performance (llama/14840) Johannes Gäßler 2025-07-23 21:43:25 +02:00
  • 10d2a519d2 CUDA: fix compilation with GGML_CUDA_F16 (llama/14837) Johannes Gäßler 2025-07-23 18:22:30 +02:00
  • c137464233 CUDA: fix quantized KV cache + multiple sequences (llama/14822) Johannes Gäßler 2025-07-23 12:35:53 +02:00
  • 6ca9a0e490 ggml: fix loongarch quantize_row_q8_1 error (llama/14827) lixing-star 2025-07-23 14:39:51 +08:00
  • 88853c4436 CANN: weight format to NZ for Ascend310P3 (llama/14407) chen fan 2025-07-23 11:58:00 +08:00
  • c193044b72 CUDA: add fused rms norm (llama/14800) Aman Gupta 2025-07-23 09:25:42 +08:00
  • 7162f92a4b vulkan: fix rms_norm_mul to handle broadcasting dim0 (llama/14817) Jeff Bolz 2025-07-22 10:35:21 -05:00
  • 0e5770ec79 cuda : implement bf16 cpy ops and enable bf16 cont (llama/14763) Sigbjørn Skjæret 2025-07-22 12:33:10 +02:00
  • bbaaa9372b opencl: remove unreachable return (llama/14806) lhez 2025-07-21 23:53:30 -07:00
  • d7494d5783 cuda: remove linking to cublasLt (llama/14790) R0CKSTAR 2025-07-22 07:45:26 +08:00
  • 444a0fe79a opencl: fix im2col when KW!=KH (llama/14803) Sigbjørn Skjæret 2025-07-21 22:55:10 +02:00
  • 424218632b opencl: add conv2d kernel (llama/14403) rmatif 2025-07-21 19:03:19 +02:00
  • 06c74b3e3c sycl: Fix im2col (llama/14797) Romain Biessy 2025-07-21 18:39:29 +02:00
  • 1d1d640543 kleidiai: add support for get_rows (llama/14676) Charles Xu 2025-07-21 15:49:52 +02:00
  • cb0399121b vulkan/cuda: Fix im2col when KW!=KH (llama/14789) Jeff Bolz 2025-07-21 06:35:40 -05:00
  • 3accff3e13 ggml: adds CONV_2D op and direct GEMM Vulkan implementation (llama/14316) Ervin Áron Tasnádi 2025-07-19 21:59:08 +02:00
  • 0c949dbde3 vulkan: Add logging for bf16 features to ggml_vk_print_gpu_info (#13274) (llama/14707) Peter0x44 2025-07-19 16:58:03 +01:00
  • 682df28558 Vulkan: Fix fprintf format-security warning (llama/14770) 0cc4m 2025-07-19 17:47:53 +02:00
  • 0b441feeb2 cmake : fix usage issues (ggml/1257) Kai Pastor 2025-07-22 20:13:21 +02:00
  • 722a96306c ggml-cpu : remove stdlib include from repack.cpp (ggml/1276) Daniel Bevenius 2025-07-21 15:53:12 +02:00
  • e7bf0294ec Support static xcframework packaging in build-xcframework.sh (#3322) Rich Waters 2025-07-26 03:25:44 -07:00
  • 7de8dd783f examples : add note about WHISPER_WASM_SINGLE_FILE [no ci] (#3332) Daniel Bevenius 2025-07-24 16:06:48 +02:00
  • 85e474fd55 ci : add paths to build.yml (#3333) Daniel Bevenius 2025-07-24 16:04:21 +02:00
  • 210bbbe4d5 musa: upgrade musa sdk to rc4.2.0 (#3324) R0CKSTAR 2025-07-24 18:19:57 +08:00
  • 1f5cf0b288 server : hide language probabilities option behind flag (#3328) Sacha Arbonel 2025-07-21 13:03:54 +02:00
  • 2e6be2f380 go: fix Mac OS X builds (#3310) BVK Chaitanya 2025-07-21 01:47:35 -05:00
  • c0dc391349 sync : ggml Georgi Gerganov 2025-07-19 17:48:07 +03:00
  • 0ed687c6f1 metal : fuse add, mul + add tests (llama/14596) Georgi Gerganov 2025-07-18 20:37:26 +03:00
  • d4a7ea1634 cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (llama/14741) Oliver Simons 2025-07-18 13:35:32 +02:00
  • 9a07cb064a CUDA: set_rows + cpy.cu refactor (llama/14712) Aman Gupta 2025-07-18 14:54:18 +08:00
  • fed20b0682 use max work group size for device to replace the magic number (llama/14732) Neo Zhang Jianyu 2025-07-18 10:23:14 +08:00
  • 17c5411195 ggml: Add initial WebGPU backend (llama/14521) Reese Levine 2025-07-16 08:18:51 -07:00
  • ae1bb2c8ea llama : add high-throughput mode (llama/14363) Georgi Gerganov 2025-07-16 16:35:42 +03:00
  • 9cc645fec0 ggml : add asserts (llama/14720) Georgi Gerganov 2025-07-16 14:43:32 +03:00
  • 8d1a0485f1 vulkan: fix noncontig check for mat_mul_id splitting (llama/14683) Jeff Bolz 2025-07-15 14:51:09 -05:00
  • b33841c453 vulkan: add RTE variants for glu/add/sub/mul/div (llama/14653) Jeff Bolz 2025-07-15 14:32:11 -05:00
  • ab79c6c118 cuda: fix build warnings in set-rows.cu (unused variable) (llama/14687) R0CKSTAR 2025-07-15 15:28:53 +08:00
  • a6b9271c2c sycl: Hotfix for non dnnl codepath (llama/14677) Anton Mitkov 2025-07-14 18:12:42 +01:00
  • ded2e3cf6d ggml : refactor llamafile_sgemm PPC code (llama/14673) shalinib-ibm 2025-07-14 18:46:42 +05:30
  • ebb0e9d0ed SYCL: use 1D kernel for set_rows (llama/14618) Akarshan Biswas 2025-07-14 15:07:55 +05:30
  • 24803d62c6 sycl: Batched mulmat rework for oneDNN dispatch (llama/14617) Anton Mitkov 2025-07-14 10:37:35 +01:00
  • 0611387d17 cuda : add set rows for bf16 (llama/14664) Sigbjørn Skjæret 2025-07-13 15:01:24 +02:00
  • fe33572b22 cuda : add ELU support (llama/14657) Yavor Ivanov 2025-07-13 02:33:16 -07:00
  • 21308b4e6e ggml : add build-time message to remind about ggml_set_rows (llama/14661) Georgi Gerganov 2025-07-13 10:36:33 +03:00
  • 3cad26d807 metal : Add missing unary ops Metal support (llama/14660) Yavor Ivanov 2025-07-12 22:38:13 -07:00
  • 66b3a39bdc CUDA: add set rows for f32 and f16 (llama/14551) Aman Gupta 2025-07-12 21:31:38 +08:00
  • 032697b9a8 whisper: validate get_rows support for cpu extra buffer (#3323) Charles Xu 2025-07-14 14:13:44 +02:00
  • a16da91365 examples : update links in wasm examples (#3318) Greg Sadetsky 2025-07-12 17:22:35 -04:00
  • 3775c503d5 sync : resolve conflicts (#0) Georgi Gerganov 2025-07-12 16:26:44 +03:00
  • 6ddff4d96a talk-llama : sync llama.cpp Georgi Gerganov 2025-07-12 16:26:16 +03:00
  • 6d64e4abf3 sync : ggml Georgi Gerganov 2025-07-12 16:22:40 +03:00
  • 85dcc74b88 sync : resolve conflicts (ggml/0) Georgi Gerganov 2025-07-12 14:39:52 +03:00
  • 915fc153a5 vulkan: support SET_ROWS (llama/14587) Jeff Bolz 2025-07-12 05:12:26 -05:00
  • 8670a3fd5d vulkan: optimizations for deepseek prompt processing (llama/14555) Jeff Bolz 2025-07-12 04:51:58 -05:00
  • 74f6d47904 model : support LiquidAI LFM2 hybrid family (llama/14620) Tarek Dakhran 2025-07-11 20:27:01 +02:00
  • a4ff4ec9cb HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (llama/14634) Slobodan Josic 2025-07-11 18:55:00 +02:00
  • b0754136be opencl: add tiled mul_mat_f16_f32 (llama/14535) rmatif 2025-07-10 23:58:12 +02:00
  • 6f113cbcaa opencl: add set_rows for f16 and f32 (llama/14547) lhez 2025-07-10 11:48:52 -07:00
  • 3c21cde540 SYCL: Initial set_rows kernel implementation (llama/14562) Akarshan Biswas 2025-07-10 13:59:38 +05:30
  • fb885fa48b cuda : support Falcon-H1 state size for SSM_SCAN (llama/14602) compilade 2025-07-09 23:54:38 -04:00
  • 2021870fb8 ggml : add ggml_scale_bias (llama/14417) Xuan-Son Nguyen 2025-07-09 18:16:12 +02:00
  • 48b18f9eb8 ggml : prevent integer overflow in gguf tensor size calculation (llama/14595) Miaoqian Lin 2025-07-09 20:33:53 +08:00
  • fadb3233b6 vulkan: optimize flash attention split_k_reduce (llama/14554) Jeff Bolz 2025-07-08 13:11:42 -05:00
  • 9750e4c988 vulkan : fix rope with partial rotation and non-cont src (llama/14582) Jeff Bolz 2025-07-08 08:21:21 -05:00
  • c3942b3db6 cuda : fix rope with partial rotation and non-cont src (llama/14580) Georgi Gerganov 2025-07-08 10:15:21 +03:00
  • 98e7beac6c CUDA: add bilinear interpolation for upscale (llama/14563) Aman Gupta 2025-07-08 10:11:18 +08:00
  • 7e9c6bbab2 musa: fix build warnings (unused variable) (llama/14561) R0CKSTAR 2025-07-08 07:58:30 +08:00
  • 8e545f466c CUDA: add bf16 and i32 to getrows (llama/14529) Aman Gupta 2025-07-07 21:45:43 +08:00
  • e753b9a952 vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (llama/14485) Eve 2025-07-06 10:29:36 +00:00
  • 9d0c408260 vulkan: fix rms_norm+mul fusion (llama/14545) Jeff Bolz 2025-07-06 03:08:16 -05:00
  • 3aebb8d5d3 vulkan: Handle updated FA dim2/3 definition (llama/14518) Jeff Bolz 2025-07-05 02:26:04 -05:00
  • df5af1dc75 opencl: add GELU_ERF (llama/14476) Sigbjørn Skjæret 2025-07-05 08:24:56 +02:00
  • 10d0d28f7c metal : disable fast math in all quantize kernels (llama/14528) Georgi Gerganov 2025-07-04 19:19:09 +03:00
  • af304ef080 CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (llama/14002) luyhcsu 2025-07-04 11:50:07 +08:00
  • e8138c51d2 ggml : implement GEGLU_ERF and GEGLU_QUICK ops (llama/14445) Sigbjørn Skjæret 2025-07-03 23:07:22 +02:00
  • 7cec4cc83a opencl : broadcast for soft_max (llama/14510) lhez 2025-07-03 11:22:24 -07:00
  • a432929d58 vulkan: support mixed/deepseekR1 FA head sizes (llama/14509) Jeff Bolz 2025-07-03 13:21:14 -05:00
  • 4aaf8114e7 ggml: backward pass for split swiglu (llama/14483) Johannes Gäßler 2025-07-03 17:05:18 +02:00
  • 0ca760433c Fix conditional enabling following arch checks for ggml-sycl (llama/14504) Nicolò Scipione 2025-07-03 11:00:03 +02:00
  • ed639c7f22 kv-cache : use ggml_set_rows (llama/14285) Georgi Gerganov 2025-07-03 10:53:35 +03:00
  • 0abd0660e1 ggml : fix FA mask dim 2 and 3 (llama/14505) Georgi Gerganov 2025-07-03 10:46:57 +03:00
  • 9cde908c0a CUDA: add dynamic shared mem to softmax, refactor general usage (llama/14497) Aman Gupta 2025-07-03 07:45:11 +08:00