* fix part of mul_mat_id
* skip the bfloat 16 sycl ut
Signed-off-by: Chen Xi <xi2chen@intel.com>
---------
Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
Co-authored-by: Chen Xi <xi2chen@intel.com>
* cuda : suppress 'noreturn' warn in no_device_code
This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:
```console
/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
346 | }
| ^
```
The motivation for this is to reduce the number of warnings when
compilng with GGML_HIPBLAS=ON.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! cuda : suppress 'noreturn' warn in no_device_code
Update __trap macro instead of using a while loop to suppress the
warning.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files
* Arm AArch64: minor code refactoring for rebase
* Arm AArch64: minor code refactoring for resolving a build issue with cmake
* Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code change for resolving a build issue with server-windows
* retrigger checks
* Arm AArch64: minor code changes for rebase
* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits
* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig
* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code refactoring
* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat
* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat
* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure
* Arm AArch64: remove a redundant comment
* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off
* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels
* Arm AArch64: update docs/build.md README to include compile time flags for buiilding the Q4_0_4_4 quant type
Apply a loop tiling technique to the generic path, which provides
performance upside for ISAs with enough registers to take advantage
of it. Also helps the compiler optimize this path.
* Add support for float16 tensors in 1d pooling operations
* Add support for float16 input tensors in 2d pooling operations
* code cleanup
remove unnecessary casting during srow ptr initialization
---------
Co-authored-by: vanaka11 <vanaka1189@gmail.com>
This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.
Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
* ggml: use vulkan as gpu backend when available
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
* whisper: enable using vk as default buffer type
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
---------
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
This commit removes an UNUSED macro call that is not needed as the
variable n0 is used in the code and will not produce a warning.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* conv transpose 1d passing test for 1d input and kernel
* working for different input and output channel counts, added test for variable stride
* initial draft appears to work with stride other than 1
* working with all old and new conv1d tests
* added a test for large tensors
* removed use cuda hardcoding
* restored test-conv-transpose.c
* removed unused arugments, and fixed bug where test failure would cause subsequent tests to fail
* fixed accumulator bug
* added test to test-backend-ops
* fixed mistake
* addressed review
* fixed includes
* removed blank lines
* style and warning fixes
* return failure when test fails
* fix supports_op
---------
Co-authored-by: slaren <slarengh@gmail.com>