* Fixed OpenVino init on state
* Removed an empty line
* Fixed typo
* Replaced tabs with spaces
---------
Co-authored-by: Sandro Hanea <sandrohanea@users.noreply.github.com>
* vulkan : do not use tensor->extra
This patch allows using the Vulkan backend with the RPC backend as
tensor->extra is no longer used.
Ref: #8536
* Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (llama/2)
---------
Co-authored-by: 0cc4m <picard12@live.de>
When the device's warp size is less than 16,
it is possible for loadstride_a (mul_mm.comp:114)
and loadstride_b (mul_mm.comp:115) to be set to 0.
Because they are calculated as: the workgroup size,
multiplied by LOAD_VEC_* (which can be 1) and divided by 16.
And the workgroup size is set to be the same as the
warp/subgroup size.
The loadstride_* variables are used as increments in the
loops that populate the buffers used for the multiplication.
When they are 0 they cause an infinite loop.
But infinite loops without side-effects are UB and the
values of loadstride_* are known at compile time.
So, the compiler quietly optimizes all the loops away.
As a consequence, the buffers are not populated and
the multiplication result is just a matrix with all elements
set to 0.
We prevent the UB by making sure that the workgroup size
will never be less than 16, even if our device has a
smaller warp size (e.g. 8).
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
* ggml: Added run-time detection of neon, i8mm and sve
Adds run-time detection of the Arm instructions set features
neon, i8mm and sve for Linux and Apple build targets.
* ggml: Extend feature detection to include non aarch64 Arm arch
* ggml: Move definition of ggml_arm_arch_features to the global data section
* ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels
* added fallback mechanism when the offline re-quantized model is not
optimized for the underlying target.
* fix for build errors
* remove prints from the low-level code
* Rebase to the latest upstream
a return before a barrier (that happens only in some threads in
a workgroup) leads to UB.
While the old code actually works on some devices,
it fails on some others (i.e. "smaller" GPUs).
BTW, I think it would be better to set specialization constants
when the graph is built, in that way the local workgroup
could be sized appropriately.
But it would take a lot of work.
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
* Remove possible leftover ffmpeg temp file from a previous failed conversion
* Revert "Remove possible leftover ffmpeg temp file from a previous failed conversion"
This reverts commit 00797403bd.
* Flag to force ffmpeg to overwrite output file if it exists
The script itself has a hashbang indicating that it is a shell script,
but the README indicates that it must be executed with `bash`.
I checked the script itself, and it seems to be valid POSIX shell. I can
confirm that it works with busybox sh.
Clarify the reference on the README, so it is clear that bash is not
actually a dependency for this script.