* ci : reduce musa image size
This commit contains an attempt to reduce the size of the musa Docker
image by copying only the necessary files from the build stage.
The motivation for this is that the CI runs sometimes fail with
out-of-memory errors. These runs seem to pass for PRs, at least
sometimes, but fail upon push to the master branch.
* ci : remove build time files instead of selective copying
* ci : add apt-get clean to musa Dockerfile
This commit adds `apt-get clean` to the musa Dockerfile to reduce the
image size by removing cached package files after installation.
The motivation for this is to try to reduce the size of the Docker image
and see if this can avoid the "no space left on device" error during
the CI build process.
Refs: https://github.com/ggml-org/whisper.cpp/actions/runs/15815324254
* Add Apple frameworks to $LDFLAGS when needed
* Add utility method to Options
* Remove unnecessary property date from gemspec
* Add Apple frameworks for CoreML build
* Add Accelerate framework only for Apple platform
* Fix ZipURI#cache signature
* Download test fixtures if needed
* Add header and namespace to use enqueue_functions extension
* Convert submit and parallel_for to use new extension in convert.cpp
* Convert submit and parallel_for to use extension in ggml-sycl.cpp
* Convert submit and parallel_for to use extension in gla.cpp
* Convert submit and parallel_for in mmq.cpp
* Convert submit and parallel_for in mmvq.cpp
* Convert submit and parallel_for in remaining files
* Convert all simple parallel_for to nd_launch from enqueue_functions
extension
* Wrapping extension in general function
Create a general function that uses the enqueue_functions extension if
it is enabled in the compiler, and otherwise calls the general SYCL
function to launch kernels.
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
* Add PowerPC feature detection and scoring
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC
* ggml-cpu: Delay some initializations until function is called
When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.
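As a minimal sketch of the idea (not the actual ggml code), a
namespace-scope static is initialized when the shared library is loaded,
while a function-local static is initialized on the first call. The names
`table_t`, `build_table`, `get_table` and `g_build_count` are hypothetical
and exist only for this illustration:

```cpp
#include <array>
#include <cstddef>

// Hypothetical lookup table type, for illustration only (not a ggml symbol).
using table_t = std::array<float, 256>;

// Counts how often the table is built, so the deferral can be observed.
int g_build_count = 0;

// Stand-in for an initialization that might use CPU-specific instructions.
table_t build_table() {
    ++g_build_count;
    table_t t{};
    for (std::size_t i = 0; i < t.size(); ++i) {
        t[i] = static_cast<float>(i) * 0.5f;
    }
    return t;
}

// Eager form: this would run at library load time, before any caller
// could check whether the CPU supports the instructions used.
//   static const table_t g_table = build_table();

// Deferred form: the function-local static is built on the first call only.
const table_t & get_table() {
    static const table_t table = build_table();
    return table;
}
```

With the deferred form, loading the variant does not run `build_table()`;
the work happens only once a caller actually uses the table.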
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* android : update CMakeLists.txt to use FetchContent for ggml
This commit updates the CMakeLists.txt file for the Android Whisper
example to use FetchContent for managing the ggml library.
The motivation for this change is to avoid having to make manual changes
to the CMakeLists.txt file after syncing the ggml library.
I've built and run the example locally to verify that it works as
expected.
Refs: https://github.com/ggml-org/whisper.cpp/pull/3265#issuecomment-2986715717
* android.java : update cmake to use FetchContent for ggml
This commit updates the CMake configuration for the Android Java example
to use `FetchContent` for including the `ggml` library. To be able to
use FetchContent we also update the `compileSdkVersion` and
`targetSdkVersion` to 31, and the `buildToolsVersion` to '30.0.3'.
This also required an update to the Gradle plugin version to 7.4.0.
The motivation for this change is to avoid having to make manual changes
to the CMakeLists.txt file after syncing the ggml library.
This commit adds a conversion from stereo to mono in the
`read_audio_data` function of `common-whisper.cpp`.
The motivation for this change is that prior to commit
7d3da68f792018e81a758881e081154d1cbe6b6f ("examples : use miniaudio for
direct decoding flac, mp3, ogg and wav (#2759)"), there was a step that
read stereo int16 data into pcm16 (448512 samples), converted it to
mono (224256 samples), and then also converted it to stereo in `pcmf32s`.
The middle step seems to have been missed when rewriting the code to
use Miniaudio, which caused issues when transcribing stereo audio files.
For example, currently using the audio sample in the linked issue the
output is:
```console
[00:00:00.000 --> 00:00:03.000] (speaker 1) Sous-titres réalisés para la communauté d'Amara.org
```
And with the change in this commit the output is:
```
[00:00:00.000 --> 00:00:01.500] (speaker 1) *sonnerie de téléphone*
[00:00:01.500 --> 00:00:07.000] (speaker 1) Salut jeune homme !
[00:00:07.000 --> 00:00:08.500] (speaker 0) C'est vrai que je te dérange ?
[00:00:08.500 --> 00:00:10.500] (speaker 1) Ah pas du tout, pas du tout, pas du tout !
[00:00:10.500 --> 00:00:12.500] (speaker 1) J'étais en train de...
[00:00:12.500 --> 00:00:14.500] (speaker 1) de préparer un courrier
```
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3092
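For illustration, a stereo to mono downmix of interleaved samples can be
sketched as below. This is an assumption about the general technique
(averaging each left/right pair), not a copy of the code in
`read_audio_data`; the function name `stereo_to_mono` is hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Downmix interleaved stereo samples (L0, R0, L1, R1, ...) to mono by
// averaging each left/right pair. A trailing unpaired sample is ignored.
std::vector<float> stereo_to_mono(const std::vector<float> & interleaved) {
    const std::size_t n_frames = interleaved.size() / 2;
    std::vector<float> mono(n_frames);
    for (std::size_t i = 0; i < n_frames; ++i) {
        mono[i] = 0.5f * (interleaved[2*i] + interleaved[2*i + 1]);
    }
    return mono;
}
```

Applied to the sample counts mentioned above, 448512 interleaved stereo
samples yield 224256 mono samples.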
* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0 GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Update oneMath commit to merged PR https://github.com/uxlfoundation/oneMath/pull/669
which adds SYCL-Graph support for recording CUDA BLAS commands.
With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph
enabled. Prior to this change, an error would be thrown.
```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2
UR CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: operator()
Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154
Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```
* ggml-cpu: Factor out feature detection build from x86
* ggml-cpu: Add ARM feature detection and scoring
This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on GGML_USE_<FEAT>, which needs to be set
in cmake, instead of GGML_<FEAT> that users would set for x86.
This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM
Like x86, however to pass around arch flags within cmake, we use
GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>.
Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.
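A rough sketch of how such a scoring function could select among several
builds is shown below. It assumes a score of 0 means "cannot run on this
CPU" and that a build using more features scores higher. The feature bits,
the feature sets assigned to each variant, and the names
`backend_variant`, `score_variant` and `pick_variant` are hypothetical and
not taken from ggml:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits, for illustration only.
constexpr std::uint32_t FEAT_DOTPROD = 1u << 0;
constexpr std::uint32_t FEAT_FP16    = 1u << 1;
constexpr std::uint32_t FEAT_SVE     = 1u << 2;

struct backend_variant {
    std::string   name;     // e.g. "armv8.2_1"
    std::uint32_t required; // features this build was compiled with
};

// Returns 0 if the CPU lacks a required feature; otherwise 1 plus the
// number of required features, so richer builds win.
int score_variant(const backend_variant & v, std::uint32_t cpu_features) {
    if ((v.required & ~cpu_features) != 0) {
        return 0;
    }
    int score = 1;
    for (std::uint32_t bits = v.required; bits != 0; bits &= bits - 1) {
        ++score; // one increment per set bit
    }
    return score;
}

// Returns the highest-scoring usable variant, or nullptr if none can run.
const backend_variant * pick_variant(const std::vector<backend_variant> & variants,
                                     std::uint32_t cpu_features) {
    const backend_variant * best = nullptr;
    int best_score = 0;
    for (const auto & v : variants) {
        const int s = score_variant(v, cpu_features);
        if (s > best_score) {
            best_score = s;
            best = &v;
        }
    }
    return best;
}
```

Under these assumptions, a CPU that reports only the first two features
would skip a build that also requires the third and fall back to the best
build it can actually run.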
* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now
The other platforms will need their own specific variants.
This also fixes the bug that the variant-building branch was always
being executed as the else-branch of GGML_NATIVE=OFF. The branch is
moved to an elseif-branch, which restores the previous behavior.
This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.
Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all the descriptor pool and set tracking to
the context - none of it is specific to pipelines anymore. It has a single vector
of pools and vector of sets, and a single counter to track requests and a single
counter to track use.