Commit Graph

968 Commits

067a581e60 CUDA: mul_mat_v support for batch sizes > 1 (llama/14262)
* CUDA: mul_mat_v support for batch sizes > 1

* use 64 bit math for initial offset calculation
2025-07-01 12:20:20 +03:00
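The 64-bit fix matters because flattened element offsets can exceed the 32-bit range once batch sizes grow. A minimal C++ sketch of the pattern, with hypothetical names and sizes rather than the upstream kernel code:

```cpp
#include <cstdint>
#include <cstdio>

// Widen to 64-bit *before* multiplying: once batch*rows*stride products
// exceed 2^31 elements, a 32-bit multiply would overflow.
int64_t initial_offset(int batch, int row, int rows_per_batch, int row_stride) {
    return ((int64_t) batch * rows_per_batch + row) * (int64_t) row_stride;
}

int main() {
    // sizes chosen so the product no longer fits in 32 bits
    printf("%lld\n", (long long) initial_offset(8, 50000, 60000, 4096));
    return 0;
}
```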
dda08b6705 HIP: enable vec fattn on RDNA4 (llama/14323) 2025-07-01 12:20:20 +03:00
f8d644741a CUDA: add mean operation (llama/14313)
* CUDA: add mean operation

* add back sum_rows_f32_cuda

* Review: early exit if col!=0
2025-07-01 12:20:20 +03:00
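For reference, a scalar C++ sketch of the mean operation's logic (the CUDA version parallelizes this; only one thread per row writes the result, hence the "early exit if col != 0" review note):

```cpp
#include <cstdio>
#include <vector>

// Mean over each row: reuse a row-sum and divide by the column count.
std::vector<float> mean_rows(const std::vector<float> & x, int nrows, int ncols) {
    std::vector<float> out(nrows, 0.0f);
    for (int r = 0; r < nrows; ++r) {
        float sum = 0.0f;
        for (int c = 0; c < ncols; ++c) {
            sum += x[(size_t) r * ncols + c];
        }
        out[r] = sum / ncols;
    }
    return out;
}

int main() {
    std::vector<float> x = {1, 2, 3, 4, 5, 6}; // 2 rows x 3 cols
    for (float v : mean_rows(x, 2, 3)) printf("%g\n", v); // prints 2 and 5
}
```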
f92d62c5c8 Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (llama/13792)
* Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1, compute pipelines are labeled.

* remove #ifdef for debug utils and add queue marker.
2025-07-01 12:20:20 +03:00
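For context, labeling works through the standard `vkSetDebugUtilsObjectNameEXT` entry point. A hedged sketch, assuming an instance, device, and pipeline created elsewhere with the extension enabled (not the commit's exact code):

```cpp
#include <vulkan/vulkan.h>

// Attach a human-readable label to a compute pipeline; tools such as
// RenderDoc then show the name instead of a raw handle.
void label_pipeline(VkInstance instance, VkDevice device,
                    VkPipeline pipeline, const char * name) {
    auto set_name = (PFN_vkSetDebugUtilsObjectNameEXT)
        vkGetInstanceProcAddr(instance, "vkSetDebugUtilsObjectNameEXT");
    if (set_name == nullptr) {
        return; // extension not available
    }
    VkDebugUtilsObjectNameInfoEXT info = {};
    info.sType        = VK_STRUCTURE_TYPE_DEBUG_UTILS_OBJECT_NAME_INFO_EXT;
    info.objectType   = VK_OBJECT_TYPE_PIPELINE;
    info.objectHandle = (uint64_t) pipeline;
    info.pObjectName  = name;
    set_name(device, &info);
}
```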
06d8b89beb metal : fix thread-safety (llama/14300)
ggml-ci
2025-07-01 12:20:20 +03:00
ddcedc77a6 ggml-cpu : "align corners" for bilinear upscale/downscale (ggml/1285)
* add "align corners" mode for bilinear upscale, and allow downscaling
* add ggml_interpolate, deprecate ggml_upscale_ext, pass in align-corners as bit-flag
* test-backend-ops: replace ggml_upscale_ext with ggml_interpolate, add test cases for downscale and align-corners
2025-07-01 12:20:19 +03:00
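A small sketch of the two source-coordinate mappings involved, using the usual interpolation conventions (not necessarily ggml's exact code). With align-corners, the first and last output samples land exactly on the first and last input samples, which also makes downscaling well-defined:

```cpp
#include <cstdio>

float src_coord(int dst_i, int src_n, int dst_n, bool align_corners) {
    if (align_corners) {
        // endpoints map to endpoints
        return dst_n > 1 ? (float) dst_i * (src_n - 1) / (dst_n - 1) : 0.0f;
    }
    // default half-pixel convention
    return ((float) dst_i + 0.5f) * src_n / dst_n - 0.5f;
}

int main() {
    for (int i = 0; i < 4; ++i) { // downscale 8 -> 4
        printf("dst %d -> src %.3f (default) / %.3f (align corners)\n",
               i, src_coord(i, 8, 4, false), src_coord(i, 8, 4, true));
    }
}
```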
ad45376925 ggml-quants : rename best_mad to best_error (ggml/1283)
This commit renames the variable `best_mad` to `best_error` in the
`make_qkx2_quants` function.

The motivation for this is that the name `best_mad` can be somewhat
confusing if mean absolute deviation (MAD) is not in use.
2025-07-01 12:20:18 +03:00
b68222f92c CUDA: add conv_2d_transpose (llama/14287)
* CUDA: add conv_2d_transpose

* remove direct include of cuda_fp16

* Review: add brackets for readability, remove ggml_set_param and add asserts
2025-06-21 07:34:17 +03:00
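To illustrate the operation itself, a scalar C++ sketch of a single-channel transposed 2D convolution in its scatter form (the CUDA kernel instead parallelizes over output elements; names are illustrative):

```cpp
#include <vector>

// Each input pixel scatters a kernel-sized patch into the output,
// stride s, no padding. Output size: (in - 1) * s + kernel.
std::vector<float> conv_2d_transpose(const std::vector<float> & in, int ih, int iw,
                                     const std::vector<float> & k, int kh, int kw, int s) {
    const int oh = (ih - 1) * s + kh;
    const int ow = (iw - 1) * s + kw;
    std::vector<float> out((size_t) oh * ow, 0.0f);
    for (int y = 0; y < ih; ++y)
        for (int x = 0; x < iw; ++x)
            for (int ky = 0; ky < kh; ++ky)
                for (int kx = 0; kx < kw; ++kx)
                    out[(size_t)(y * s + ky) * ow + (x * s + kx)] +=
                        in[(size_t) y * iw + x] * k[(size_t) ky * kw + kx];
    return out;
}
```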
a455dcb04c sycl: add usage of enqueue_functions extension (llama/14244)
* Add header and namespace to use enqueue_functions extension

* Convert submit and parallel_for to use new extension in convert.cpp

* Convert submit and parallel_for to use extension in ggml-sycl.cpp

* Convert submit and parallel_for to use extension in gla.cpp

* Convert submit and parallel_for in mmq.cpp

* Convert submit and parallel_for in mmvq.cpp

* Convert submit and parallel_for in remaining files

* Convert all simple parallel_for to nd_launch from enqueue_functions
extension

* Wrapping extension in general function

Create a general function that uses the enqueue_functions extension if it is
enabled in the compiler, and otherwise calls the general SYCL function to
launch kernels.

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-06-21 07:34:17 +03:00
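A hedged sketch of the wrapper the commit describes: dispatch to `nd_launch` from the enqueue_functions extension when the compiler provides it, otherwise fall back to the standard `parallel_for`. The feature-test macro name is an assumption about the DPC++ implementation:

```cpp
#include <sycl/sycl.hpp>

// Single launch point so call sites don't repeat the #ifdef.
template <typename Kernel>
void launch_kernel(sycl::queue & q, sycl::nd_range<3> range, Kernel kernel) {
#ifdef SYCL_EXT_ONEAPI_ENQUEUE_FUNCTIONS
    // extension path (assumed macro name)
    sycl::ext::oneapi::experimental::nd_launch(q, range, kernel);
#else
    // standard SYCL fallback
    q.parallel_for(range, kernel);
#endif
}
```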
af7168174c Implement GGML_CPU_ALL_VARIANTS for PowerPC (llama/14286)
* Add PowerPC feature detection and scoring

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC

* ggml-cpu: Delay some initializations until function is called

When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-06-21 07:34:17 +03:00
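The delayed-initialization point is a general pattern: with GGML_BACKEND_DL=ON every CPU variant is loaded as a shared object, so a global constructor compiled with, say, VSX could fault on an older CPU at load time. A minimal C++ sketch of the fix's idea (names are illustrative):

```cpp
// A function-local static runs on first call, after the loader has already
// verified the variant's features, instead of at library-load time.
struct LookupTable {
    float data[256];
    LookupTable() { /* may use arch-specific instructions */ }
};

const LookupTable & get_table() {
    static const LookupTable table; // initialized lazily, thread-safe since C++11
    return table;
}
```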
33d1f0a3e0 cuda : synchronize graph capture and cublas handle destruction (llama/14288)
Works around an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread.
2025-06-21 07:34:17 +03:00
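A hedged sketch of the synchronization pattern: serialize graph capture against handle destruction so a handle torn down on another thread cannot interrupt an in-progress capture. The mutex and structure here are illustrative, not the upstream code:

```cpp
#include <mutex>
#include <cublas_v2.h>
#include <cuda_runtime.h>

static std::mutex g_capture_mutex;

void capture_graph(cudaStream_t stream, cudaGraph_t * graph) {
    std::lock_guard<std::mutex> lock(g_capture_mutex);
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeThreadLocal);
    // ... enqueue work, including cuBLAS calls, onto `stream` ...
    cudaStreamEndCapture(stream, graph);
}

void destroy_handle(cublasHandle_t handle) {
    // cannot run while a capture holds the lock
    std::lock_guard<std::mutex> lock(g_capture_mutex);
    cublasDestroy(handle);
}
```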
018b2d340e ggml : fix repack work size for mul_mat_id (llama/14292)
ggml-ci
2025-06-21 07:34:17 +03:00
694f435d22 ggml: Update KleidiAI to v1.9.0 (llama/14277) 2025-06-21 07:34:17 +03:00
5efd43c956 CUDA: add conv_2d_dw (llama/14265)
* CUDA: add conv_2d_dw

* better naming

* simplify using template

* Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const
2025-06-21 07:34:17 +03:00
71adde9203 ggml-cpu : remove unnecessary arm feature detection (llama/14281)
Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old code, which was not very functional.
2025-06-21 07:34:17 +03:00
cef59c1e26 build : suppress gcc15 compile warnings (llama/14261)
* Change _contains_any() substrs to std::string_view and fix the find comparison logic.
2025-06-21 07:34:17 +03:00
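A hedged sketch of what a helper like the one described might look like after the change: `std::string_view` avoids temporary allocations, and `find()` results must be compared against `npos` (a common source of broken comparison logic). The exact upstream signature may differ:

```cpp
#include <initializer_list>
#include <string_view>

static bool contains_any(std::string_view haystack,
                         std::initializer_list<std::string_view> needles) {
    for (std::string_view needle : needles) {
        // found iff the result is not npos; comparing against 0 would be wrong
        if (haystack.find(needle) != std::string_view::npos) {
            return true;
        }
    }
    return false;
}
```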
a02a2d4240 sycl: Cleanup codepaths in Get Rows in sycl backend (llama/14215)
Addresses unused reorder path
2025-06-21 07:34:17 +03:00
be4ea0826b llamafile : support s390x SIMD instruction set (llama/14273) 2025-06-21 07:34:17 +03:00
1aca7b5c8a Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (llama/14249) 2025-06-21 07:34:17 +03:00
b251d739ad metal : add mean kernel (llama/14267)
* metal : add mean kernel

ggml-ci

* cont : dedup implementation

ggml-ci
2025-06-21 07:34:17 +03:00
203451bcba ggml-cpu: reduce asm calls for hsum (llama/14037)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 07:34:17 +03:00
34940abe53 ggml-cpu: fix uncaught underscore terminators (llama/14023)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 07:34:17 +03:00
4fc9c34126 ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (llama/14258) 2025-06-21 07:34:17 +03:00
471df139fa Add ggml_roll (ggml/1274)
* ggml : add ggml_roll

* use set/get_op_params & std::min
2025-06-21 07:34:17 +03:00
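For intuition, a 1-D C++ sketch of roll semantics (ggml_roll operates per tensor dimension; this is only the core idea, not the ggml implementation):

```cpp
#include <vector>

// Every element moves by `shift` positions and wraps around:
// out[i] = in[(i - shift) mod n], with a mod that handles negatives.
std::vector<float> roll_1d(const std::vector<float> & in, int shift) {
    const int n = (int) in.size();
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        out[i] = in[((i - shift) % n + n) % n];
    }
    return out;
}

// roll_1d({1,2,3,4}, 1) -> {4,1,2,3}
```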
0e068779c7 cmake: remove shader-gen step-targets from ggml-vulkan (llama/14226)
* Remove step-targets from vulkan-shaders-gen

* Unset DESTDIR when building vulkan-shaders-gen
2025-06-18 12:40:34 +03:00
ac8a303c9a ggml-cpu : remove the weak alias trick (llama/14221) 2025-06-18 12:40:34 +03:00
2a84593960 musa: fix build warning (unused variable) (llama/14231)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-18 12:40:34 +03:00
44871c8a3e llama : add thread safety test (llama/14035)
* llama : add thread safety test

* llamafile : remove global state

* llama : better LLAMA_SPLIT_MODE_NONE logic

when main_gpu < 0 GPU devices are not used

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-18 12:40:34 +03:00
ad6cd94a3a cmake: clean up external project logic for vulkan-shaders-gen (llama/14179)
* Remove install step for vulkan-shaders-gen

* Add install step to normalize msvc with make

* Regenerate modified shaders at build-time
2025-06-18 12:40:34 +03:00
dbad9d8fba HIP: disable rocwmma on gfx12 by default until rocm 7.0 (llama/14202) 2025-06-18 12:40:34 +03:00
518835ee56 ggml: Add Android support for GGML_CPU_ALL_VARIANTS (llama/14206) 2025-06-18 12:40:34 +03:00
a3d1c55c66 vulkan: mutex around vkQueueSubmit (llama/14127)
This fixes the remaining crash in test-thread-safety on my system.
2025-06-18 12:40:34 +03:00
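The underlying rule is that the Vulkan spec requires external synchronization on a VkQueue for vkQueueSubmit, so concurrent contexts must not submit to the same queue at once. A minimal sketch of the pattern with illustrative names:

```cpp
#include <mutex>
#include <vulkan/vulkan.h>

// Wrap the queue together with the mutex that guards it.
struct GuardedQueue {
    VkQueue    queue;
    std::mutex mutex;

    VkResult submit(uint32_t count, const VkSubmitInfo * infos, VkFence fence) {
        std::lock_guard<std::mutex> lock(mutex);
        return vkQueueSubmit(queue, count, infos, fence);
    }
};
```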
0c25129d30 ggml-cpu : rework weak alias on apple targets (llama/14146)
* ggml-cpu : rework weak alias on apple targets

* fix powerpc detection

* fix ppc detection

* fix powerpc detection on darwin
2025-06-18 12:40:34 +03:00
a433680a2f CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (llama/14196) 2025-06-18 12:40:34 +03:00
aeaed9806f HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (llama/14183) 2025-06-18 12:40:34 +03:00
4ea599afdf sycl: Adding additional cpy dbg print output (llama/14034) 2025-06-18 12:40:34 +03:00
783cf0309f SYCL: Bump oneMath commit (llama/14152)
Update oneMath commit to merged PR https://github.com/uxlfoundation/oneMath/pull/669
which adds SYCL-Graph support for recording CUDA BLAS commands.

With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph
enabled. Prior to this change, an error would be thrown.

```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2

UR CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```
2025-06-18 12:40:34 +03:00
0097eaf839 sycl: Remove not needed copy f16->f32 for dnnl mul mat (llama/14125) 2025-06-18 12:40:34 +03:00
a96a880f7b cmake : handle whitespaces in path during metal build (llama/14126)
* cmake : handle whitespaces in path during metal build

ggml-ci

* cont : proper fix

ggml-ci

---------

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2025-06-18 12:40:34 +03:00
26c16ad6bd Implement GGML_CPU_ALL_VARIANTS for ARM (llama/14080)
* ggml-cpu: Factor out feature detection build from x86

* ggml-cpu: Add ARM feature detection and scoring

This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on GGML_USE_<FEAT> which need to be set
in cmake, instead of GGML_<FEAT> that users would set for x86.

This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM

Like x86, however to pass around arch flags within cmake, we use
GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>.

Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.

* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now

The other platforms will need their own specific variants.

This also fixes a bug where the variant-building branch was always
executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to
an elseif-branch, which restores the previous behavior.
2025-06-18 12:40:34 +03:00
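A hedged sketch of the variant-scoring idea described above: each built backend variant advertises the features it was compiled with, the loader keeps only the variants the running CPU supports, and picks the best of those. The struct and scoring rule are illustrative, not ggml's exact code:

```cpp
#include <bit>
#include <string>
#include <vector>

struct CpuVariant {
    std::string name;     // e.g. "armv8.2_1"
    unsigned    features; // bitmask of features compiled in
};

const CpuVariant * pick_variant(const std::vector<CpuVariant> & variants,
                                unsigned cpu_features) {
    const CpuVariant * best = nullptr;
    for (const auto & v : variants) {
        if ((v.features & ~cpu_features) != 0) {
            continue; // requires a feature this CPU lacks
        }
        // simple score: more compiled-in features wins
        if (!best || std::popcount(v.features) > std::popcount(best->features)) {
            best = &v;
        }
    }
    return best;
}
```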
40d0d47cf1 vulkan: Better thread-safety for command pools/buffers (llama/14116)
This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.
2025-06-18 12:40:34 +03:00
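An illustrative sketch of the layout described: command buffers are tracked inside the pool that allocated them, and each context owns its own pools, so contexts never recycle each other's buffers. Field names are hypothetical:

```cpp
#include <vector>
#include <vulkan/vulkan.h>

struct vk_command_pool_sketch {
    VkCommandPool                pool = VK_NULL_HANDLE;
    std::vector<VkCommandBuffer> buffers; // allocated from `pool`
    size_t                       in_use = 0;
};

struct vk_context_sketch {
    vk_command_pool_sketch compute_pool;  // one for compute work
    vk_command_pool_sketch transfer_pool; // one for transfers
};
```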
40c6525517 vulkan: Track descriptor pools/sets per-context (llama/14109)
Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all the descriptor pool and set tracking to
the context - none of it is specific to pipelines anymore. The context has a
single vector of pools, a single vector of sets, one counter to track requests,
and one counter to track use.
2025-06-18 12:40:34 +03:00
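An illustrative sketch of that per-context descriptor state, with hypothetical field names:

```cpp
#include <vector>
#include <vulkan/vulkan.h>

// One shared set layout lives on the device; the context only tracks
// pools, sets, and how many sets were requested vs handed out.
struct vk_descriptor_state_sketch {
    std::vector<VkDescriptorPool> pools;
    std::vector<VkDescriptorSet>  sets;
    size_t sets_requested = 0;
    size_t sets_used      = 0;
};
```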
74c68067dc opencl: add mul_mv_id_q4_0_f32_8x_flat (llama/14003) 2025-06-18 12:40:34 +03:00
794bf23994 Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (llama/14099) 2025-06-18 12:40:34 +03:00
26dcc196c7 rpc : nicer error messages for RPC server crash (llama/14076) 2025-06-18 12:40:34 +03:00
ffe5400d1b ggml : disable warnings for tests when using MSVC (ggml/1273)
* ggml : disable warnings for tests when using MSVC

This commit disables warnings for tests on windows when using MSVC.

The motivation for this is that this brings the build output more
inline with what Linux/MacOS systems produce.

There is still one warning generated for the tests which is:
```console
  Building Custom Rule C:/ggml/tests/CMakeLists.txt
cl : command line  warning D9025: overriding '/DNDEBUG' with '/UNDEBUG'
[C:\ggml\build\tests\test-arange.vcxproj]
  test-arange.cpp
  test-arange.vcxproj -> C:\ggml\build\bin\Release\test-arange.exe
```

* ggml : fix typo in tests disable list
2025-06-18 12:40:34 +03:00
1b01c0cc4e ggml : remove unused ggml_context_container (ggml/1272)
This commit removes the unused `ggml_context_container` structure from
the ggml library. It looks like the usage of this struct was removed in
Commit 4757fe18d56ec11bf9c07feaca6e9d5b5357e7f4 ("ggml : alloc
ggml_contexts on the heap (whisper/2525)").

The motivation for this change is to improve code clarity/readability.
2025-06-18 12:40:34 +03:00
db30f46761 examples : include examples in msvc disable warn (ggml/1270)
This commit adds the examples to the "list" of targets that ignore MSVC
warnings.

The motivation for this is that currently the examples generate a number
of warnings that are ignored/disabled for the core ggml project. This
makes for a cleaner output when building.
2025-06-18 12:40:34 +03:00
93d543905e ggml : fix weak alias win32 (#0)
ggml-ci
2025-06-10 12:40:33 +03:00
175e7e4f1a files : remove old sources (part 2) 2025-06-10 12:40:33 +03:00