whisper.cpp/ggml/include
Latest commit: CUDA: use mma PTX instructions for FlashAttention (llama/11583)
Johannes Gäßler · f8a831779e · 2025-02-03 22:00:57 +02:00

* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <slarengh@gmail.com>
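
The "__shfl_sync workaround for movmatrix" bullet refers to emulating the PTX movmatrix instruction, which transposes an 8x8 matrix fragment across a warp but is unavailable on HIP and on older NVIDIA architectures, using plain warp shuffles instead. Below is a minimal sketch of that idea, assuming the usual m8n8 fragment layout (lane l holds row l/4, columns 2*(l%4) and 2*(l%4)+1) and using float in place of packed 16-bit halves for clarity; the lane arithmetic is illustrative, not the exact mapping from the llama/11583 patch.

```cuda
// Minimal sketch: emulating a warp-level 8x8 transpose (the job of PTX
// movmatrix) with __shfl_sync. Fragment layout assumed: lane l holds
// row l/4, columns 2*(l%4) and 2*(l%4)+1. Float stands in for packed
// b16 halves for clarity; illustrative, not the llama/11583 code.
#include <cstdio>

__device__ void transpose_8x8(float v[2]) {
    const int lane = threadIdx.x % 32;

    // After the transpose, lane l must hold elements (2*(l%4), l/4) and
    // (2*(l%4)+1, l/4) of the original matrix. In the assumed layout,
    // element (r, c) lives in lane 4*r + c/2, slot c%2.
    const int src0 = 8*(lane % 4) + lane/8; // source lane for v[0]
    const int src1 = src0 + 4;              // source lane for v[1] (one row down)
    const int slot = (lane / 4) % 2;        // which of the source lane's two values

    const float a0 = __shfl_sync(0xFFFFFFFF, v[0], src0);
    const float b0 = __shfl_sync(0xFFFFFFFF, v[1], src0);
    const float a1 = __shfl_sync(0xFFFFFFFF, v[0], src1);
    const float b1 = __shfl_sync(0xFFFFFFFF, v[1], src1);

    v[0] = slot ? b0 : a0;
    v[1] = slot ? b1 : a1;
}

__global__ void transpose_demo(float * out) {
    const int lane = threadIdx.x % 32;
    const int row  = lane / 4;
    const int col  = 2*(lane % 4);

    // Fill the fragment with M[r][c] = 10*r + c so results are easy to check.
    float v[2] = {10.0f*row + col, 10.0f*row + col + 1.0f};
    transpose_8x8(v);

    // Each lane writes its two transposed elements back to its own slots,
    // so out[r][c] should now equal M[c][r] = 10*c + r.
    out[8*row + col]     = v[0];
    out[8*row + col + 1] = v[1];
}

int main() {
    float * d_out = nullptr;
    float h_out[64];
    cudaMalloc(&d_out, sizeof(h_out));
    transpose_demo<<<1, 32>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int r = 0; r < 8; r++) {
        for (int c = 0; c < 8; c++) {
            printf("%3.0f ", h_out[8*r + c]); // expect 10*c + r
        }
        printf("\n");
    }
    cudaFree(d_out);
    return 0;
}
```

The "add __shfl_sync to HIP" bullet presumably supplies a shim for this primitive on AMD GPUs, where HIP natively exposes only the unsynchronized __shfl variants; with that shim in place, the same shuffle-based fallback covers both vendors.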
File           | Last commit message | Date
ggml-alloc.h   | ggml : fix typo in example usage ggml_gallocr_new (ggml/984) | 2024-10-05 15:23:51 +03:00
ggml-backend.h | rpc : early register backend devices (llama/11262) | 2025-02-03 22:00:57 +02:00
ggml-blas.h    | ggml : build backends as libraries (llama/10256) | 2024-11-20 21:00:08 +02:00
ggml-cann.h    | ggml : build backends as libraries (llama/10256) | 2024-11-20 21:00:08 +02:00
ggml-cpp.h     | GGUF: C++ refactor, backend support, misc fixes (llama/11030) | 2025-01-14 10:38:01 +02:00
ggml-cpu.h     | ggml : refactor online repacking (llama/10446) | 2024-12-18 12:52:16 +02:00
ggml-cuda.h    | ggml : build backends as libraries (llama/10256) | 2024-11-20 21:00:08 +02:00
ggml-kompute.h | ggml : build backends as libraries (llama/10256) | 2024-11-20 21:00:08 +02:00
ggml-metal.h   | ggml : build backends as libraries (llama/10256) | 2024-11-20 21:00:08 +02:00
ggml-opencl.h  | Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (llama/10693) | 2024-12-18 12:52:16 +02:00
ggml-opt.h     | ggml: new optimization interface (ggml/988) | 2024-11-20 21:00:08 +02:00
ggml-rpc.h     | ggml : build backends as libraries (llama/10256) | 2024-11-20 21:00:08 +02:00
ggml-sycl.h    | ggml : build backends as libraries (llama/10256) | 2024-11-20 21:00:08 +02:00
ggml-vulkan.h  | ggml : build backends as libraries (llama/10256) | 2024-11-20 21:00:08 +02:00
ggml.h         | CUDA: use mma PTX instructions for FlashAttention (llama/11583) | 2025-02-03 22:00:57 +02:00
gguf.h         | GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030) | 2025-01-14 10:38:01 +02:00