Commit Graph

  • fc6d4bcfeb cont : another try gg/android-fix-build Georgi Gerganov 2025-06-18 16:13:28 +03:00
  • 8a928ffd0d cmake : fix android build Georgi Gerganov 2025-06-18 15:38:37 +03:00
  • 2f60ebc3c2 talk-llama : sync llama.cpp master Georgi Gerganov 2025-06-18 10:22:47 +03:00
  • 69061e356f sync : ggml Georgi Gerganov 2025-06-18 10:22:11 +03:00
  • 0e068779c7 cmake: remove shader-gen step-targets from ggml-vulkan (llama/14226) bandoti 2025-06-17 17:33:25 -03:00
  • ac8a303c9a ggml-cpu : remove the weak alias trick (llama/14221) xctan 2025-06-17 17:58:32 +08:00
  • 2a84593960 musa: fix build warning (unused variable) (llama/14231) R0CKSTAR 2025-06-17 17:48:08 +08:00
  • 44871c8a3e llama : add thread safety test (llama/14035) Diego Devesa 2025-06-16 08:11:43 -07:00
  • ad6cd94a3a cmake: clean up external project logic for vulkan-shaders-gen (llama/14179) bandoti 2025-06-16 10:32:13 -03:00
  • dbad9d8fba HIP: disable rocwmma on gfx12 by default until rocm 7.0 (llama/14202) uvos 2025-06-16 13:47:38 +02:00
  • 518835ee56 ggml: Add Android support for GGML_CPU_ALL_VARIANTS (llama/14206) Charles Xu 2025-06-16 11:47:57 +02:00
  • a3d1c55c66 vulkan: mutex around vkQueueSubmit (llama/14127) Jeff Bolz 2025-06-16 00:21:08 -06:00
  • 0c25129d30 ggml-cpu : rework weak alias on apple targets (llama/14146) xctan 2025-06-16 13:54:15 +08:00
  • a433680a2f CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (llama/14196) uvos 2025-06-15 17:30:13 +02:00
  • aeaed9806f HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (llama/14183) uvos 2025-06-15 15:45:27 +02:00
  • 4ea599afdf sycl: Adding additional cpy dbg print output (llama/14034) Anton Mitkov 2025-06-13 08:51:39 +01:00
  • 783cf0309f SYCL: Bump oneMath commit (llama/14152) Ewan Crawford 2025-06-13 08:45:37 +01:00
  • 0097eaf839 sycl: Remove not needed copy f16->f32 for dnnl mul mat (llama/14125) Anton Mitkov 2025-06-12 14:15:11 +01:00
  • a96a880f7b cmake : handle whitespaces in path during metal build (llama/14126) Georgi Gerganov 2025-06-12 10:14:24 +03:00
  • 26c16ad6bd Implement GGML_CPU_ALL_VARIANTS for ARM (llama/14080) Christian Kastner 2025-06-11 19:07:44 +00:00
  • 40d0d47cf1 vulkan: Better thread-safety for command pools/buffers (llama/14116) Jeff Bolz 2025-06-11 09:48:52 -05:00
  • 40c6525517 vulkan: Track descriptor pools/sets per-context (llama/14109) Jeff Bolz 2025-06-11 00:19:25 -05:00
  • 74c68067dc opencl: add mul_mv_id_q4_0_f32_8x_flat (llama/14003) lhez 2025-06-10 16:55:58 -07:00
  • 794bf23994 Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (llama/14099) 0cc4m 2025-06-10 14:01:33 +02:00
  • 26dcc196c7 rpc : nicer error messages for RPC server crash (llama/14076) Isaac McFadyen 2025-06-10 02:41:01 -04:00
  • ffe5400d1b ggml : disable warnings for tests when using MSVC (ggml/1273) Daniel Bevenius 2025-06-13 15:06:42 +02:00
  • 1b01c0cc4e ggml : remove unused ggml_context_container (ggml/1272) Daniel Bevenius 2025-06-13 09:05:44 +02:00
  • db30f46761 examples : include examples in msvc disable warn (ggml/1270) Daniel Bevenius 2025-06-12 12:27:09 +02:00
  • 1591558ccc whisper : clear result_all if vad_samples is empty (#3262) Daniel Bevenius 2025-06-18 11:30:29 +02:00
  • f3ff80ea8d examples : set the C++ standard to C++17 for server (#3261) Daniel Bevenius 2025-06-17 11:29:48 +02:00
  • 2a4d6db7d9 examples : update usage/help in yt-wsp.sh (#3251) w1redch4d 2025-06-16 15:51:16 +05:30
  • 107c303e69 server : graceful shutdown, atomic server state, and health endpoint improvements (#3243) Sacha Arbonel 2025-06-16 10:14:26 +02:00
  • 705db0f728 whisper : fix VAD processing for skipped audio segments (#3230) Daniel Bevenius 2025-06-13 17:35:52 +02:00
  • 0a4d85cf8a server : add Voice Activity Detection (VAD) support (#3246) Daniel Bevenius 2025-06-13 13:24:03 +02:00
  • 9df8d54bcb cli : fix short name conflict for vad options [no ci] (#3247) Daniel Bevenius 2025-06-13 10:25:25 +02:00
  • 20d203aacf ruby : add .gitignore entries for ext directory (#3245) Daniel Bevenius 2025-06-13 10:04:20 +02:00
  • ebbc874e85 ci : update windows runner to windows-2022 (#3242) Daniel Bevenius 2025-06-11 13:53:16 +02:00
  • 2679bec6e0 ruby : add cleaning of library names in dependencies (#3241) Daniel Bevenius 2025-06-10 15:06:40 +02:00
  • 93d543905e ggml : fix weak alias win32 (#0) Georgi Gerganov 2025-06-10 11:34:10 +03:00
  • 962361bd79 android : fix builds (#0) Georgi Gerganov 2025-06-10 11:09:18 +03:00
  • dbe81c1042 sync : ggml Georgi Gerganov 2025-06-10 11:06:03 +03:00
  • 175e7e4f1a files : remove old sources (part 2) Georgi Gerganov 2025-06-10 11:05:54 +03:00
  • 56475d01dc sync : ggml Georgi Gerganov 2025-06-10 10:58:38 +03:00
  • 38347a7dda files : remove old sources Georgi Gerganov 2025-06-10 10:58:30 +03:00
  • db264d6220 talk-llama : sync llama.cpp Georgi Gerganov 2025-06-10 10:12:44 +03:00
  • 96eaf46ec6 sync : ggml Georgi Gerganov 2025-06-10 10:11:23 +03:00
  • 7a675807a2 metal : use less stack memory in FA kernel (llama/14088) Georgi Gerganov 2025-06-09 23:05:02 +03:00
  • 8cbc889f85 ggml-cpu : split arch-specific implementations (llama/13892) xctan 2025-06-09 22:47:13 +08:00
  • e16a84cd95 cuda : fix device sync on buffer clear (llama/14033) Diego Devesa 2025-06-09 07:36:26 -07:00
  • 26282282fa CANN: Simplify the environment variable setting (#13104) Xinpeng Dou 2025-06-09 19:47:39 +08:00
  • 4737a8c780 sycl: Add reorder to Q6_K mmvq implementation (llama/13885) Nicolò Scipione 2025-06-09 11:47:07 +02:00
  • 8a70f4d18b cuda : fix buffer type check with integrated GPUs (llama/14069) Diego Devesa 2025-06-08 11:39:56 -07:00
  • 489dc158a6 SYCL: Implement few same quantized type copy kernels (llama/13739) Akarshan Biswas 2025-06-07 18:58:20 +05:30
  • f0f5a9f7fb vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (llama/14001) Masato Nakasaka 2025-06-05 23:00:29 +09:00
  • 13a03c5d33 llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (llama/14013) Diego Devesa 2025-06-05 02:57:42 -07:00
  • 6dd91d4f7e vulkan: automatically deduce size of push constants (llama/13936) Jeff Bolz 2025-06-05 00:17:58 -05:00
  • 5171b24f70 ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (llama/13813) Ervin Áron Tasnádi 2025-06-04 22:02:00 +02:00
  • 23e2fe0682 releases : use dl backend for linux release, remove arm64 linux release (llama/13996) Diego Devesa 2025-06-04 04:15:54 -07:00
  • 7f4d110f53 CUDA: fix FTZ in FA for Gemma 3 (llama/13991) Johannes Gäßler 2025-06-04 08:57:05 +02:00
  • ee0ef39fee vulkan: fix warnings in perf logger querypool code (llama/13937) Jeff Bolz 2025-06-03 13:30:22 -05:00
  • 62791ba2e6 opencl: add backend_synchronize (llama/13939) lhez 2025-06-02 16:54:58 -07:00
  • e16ef08884 OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (llama/13840) rmatif 2025-06-02 23:53:36 +00:00
  • c72d3ce935 metal : use F32 accumulators in FA kernels (llama/13975) Georgi Gerganov 2025-06-02 21:33:40 +03:00
  • 126aeb4a49 cmake : Handle mixed-case 'Power' strings in POWER CPU detection (llama/13966) shalinib-ibm 2025-06-02 17:48:36 +05:30
  • ef2a79d2b8 sycl: quantize and reorder the input to q8_1 when reorder is enabled (llama/13826) Atharva Dubey 2025-06-02 10:12:20 +01:00
  • 9589645e72 gguf: fix failure on version == 0 (llama/13956) Johannes Gäßler 2025-06-01 18:08:05 +02:00
  • 20f913d119 ggml: check if non-native endian model is being loaded (llama/13943) Aaron Teo 2025-06-01 22:53:57 +08:00
  • b933d17c30 Add in-build ggml::ggml ALIAS library (ggml/1260) Kai Pastor 2025-06-03 12:33:28 +02:00
  • fbead67549 ruby : output format (#3237) KITAITI Makoto 2025-06-10 13:10:17 +09:00
  • d78f081423 ci : build and publish main-intel image (#3231) 藍+85CD 2025-06-09 12:42:53 +08:00
  • b175baa665 docker : add main-intel dockerfile (#3229) 藍+85CD 2025-06-06 11:30:02 +08:00
  • 799eacdde4 ruby : Add parallel transcription support (#3222) KITAITI Makoto 2025-06-04 14:50:18 +09:00
  • 82f461eaa4 ci : add mirror for ports.ubuntu.com (ARM packages) (#3221) Daniel Bevenius 2025-06-03 07:56:58 +02:00
  • 269dad68a2 bindings.java : apply whisperParams in fullTranscribeWithTime instead of ignoring them (#3201) Joas Dev 2025-06-02 23:15:21 -05:00
  • 121d27a495 musa: correct MUSA SDK rc4.0.1 download URL (#3217) R0CKSTAR 2025-06-03 12:02:12 +08:00
  • e05af2457b ci : use mirrors.kernel.org for Ubuntu packages (#3220) Daniel Bevenius 2025-06-02 16:46:40 +02:00
  • b505539670 node : add language detection support (#3190) Daniel Bevenius 2025-06-02 14:58:05 +02:00
  • 7fd6fa8097 talk-llama : sync llama.cpp Georgi Gerganov 2025-06-01 14:07:36 +03:00
  • 3f46282cbe sync : ggml Georgi Gerganov 2025-06-01 14:03:21 +03:00
  • 1e16340f4b threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (llama/12995) Max Krasnyansky 2025-05-31 15:39:19 -07:00
  • 4a50254998 CUDA: add a prop in ggml_cuda_device_info for distinguish iGPU or dGPU in cuda (#13856) (llama/13895) Shawn yang 2025-05-31 14:48:04 +08:00
  • a5aff28198 CUDA: fix typo in FlashAttention code (llama/13926) Johannes Gäßler 2025-05-30 21:22:03 +02:00
  • 6c0472ab8f sched : avoid changing cur_copy when a graph is already allocated (llama/13922) Diego Devesa 2025-05-30 09:56:19 -07:00
  • b14cee184a cuda : prevent using split buffers with 3d/4d matrices (llama/13919) Diego Devesa 2025-05-30 07:37:18 -07:00
  • f7f92d0aab SYCL: Add mrope kernel (llama/13755) Akarshan Biswas 2025-05-30 19:40:57 +05:30
  • 1893359cfd cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (llama/13890) Christian Kastner 2025-05-30 01:28:54 +02:00
  • ea643c6ae3 arm64: optimize q4_k_q8_k kernel with i8mm (llama/13886) Yibo Cai 2025-05-29 19:39:20 +08:00
  • 1d7b3c79f4 cmake: Factor out CPU architecture detection (llama/13883) Christian Kastner 2025-05-29 12:50:25 +02:00
  • ccfaac2bb0 ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (llama/13882) Vineel Abhinav 2025-05-29 14:48:43 +05:30
  • 1230d37bca ggml: aarch64: Implement SVE F32 kernels for vector functions (llama/13843) Vineel Abhinav 2025-05-29 11:31:33 +05:30
  • 9a500394ad CUDA: fix FA tg at long context for CC >= 8.9 (llama/13852) Johannes Gäßler 2025-05-28 13:33:37 +02:00
  • 0035b8527c CANN: Add SOC TYPE printing in cmake configuration (llama/13837) leo-pony 2025-05-28 11:54:20 +08:00
  • 3623186312 opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm (llama/13787) lhez 2025-05-27 12:56:08 -07:00
  • 67beac47f3 opencl: mark mul_mat f32f32 as supporting non-contiguous tensors (llama/13790) lhez 2025-05-27 12:53:14 -07:00
  • 47a19bae25 vulkan: use timestamp queries for GGML_VULKAN_PERF (llama/13817) Jeff Bolz 2025-05-27 11:39:07 -05:00
  • 3d5c7ca4bc SYCL: add gelu_erf kernel (llama/13749) Akarshan Biswas 2025-05-27 20:52:59 +05:30
  • 4dfb2c2215 ggml : add ggml_repeat_4d (llama/13824) Xuan-Son Nguyen 2025-05-27 15:53:55 +02:00
  • ad433403ce vulkan : Remove unexpected ; (ggml/1253) Kai Pastor 2025-05-31 12:49:55 +02:00
  • 4064dd6484 cmake : Fix broken CMake error messages (ggml/1252) Kai Pastor 2025-05-31 12:39:19 +02:00
  • fd75c4995b ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247) Radoslav Gerganov 2025-05-30 09:11:09 +03:00