whisper.cpp/fattn-vec-f16.cuh at gg/cuda-fix-mmvq


mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-11-07 08:34:37 +01:00

Johannes Gäßler e57e95eb0d

CUDA: add FP32 FlashAttention vector kernel (llama/7188)

* CUDA: add FP32 FlashAttention vector kernel

* fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel

2024-05-14 19:16:29 +03:00

6 lines

213 B

Plaintext


#include "common.cuh"
void ggml_cuda_flash_attn_ext_vec_f16(ggml_backend_cuda_context & ctx, ggml_tensor * dst);
void ggml_cuda_flash_attn_ext_vec_f16_no_mma(ggml_backend_cuda_context & ctx, ggml_tensor * dst);