whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-05-31 06:56:23 +02:00

Jeff Bolz 1d50c6ac22 vulkan: Use fp16 for the flash attention P*V multiplication (llama/12783)

This is consistent with the ggml-cuda behavior and the mul_mat fallback.

2025-04-24 20:39:16 +03:00

ggml-amx

ggml : adapt AMX to tensor->grad removal (llama/0)

2024-11-20 21:00:08 +02:00

ggml-blas

ggml : add support for dynamic loading of backends (llama/10469)

2024-12-08 20:14:35 +02:00

ggml-cann

ggml : add bilinear upscale support (ggml/1185)

2025-04-24 20:39:16 +03:00

ggml-cpu

llama : fix FA when KV cache is not used (i.e. embeddings) (llama/12825)

2025-04-24 20:39:16 +03:00

ggml-cuda

cuda : add f32 to bf16 copy op (llama/12806)

2025-04-24 20:39:16 +03:00

ggml-hip

HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)

2025-03-08 15:13:01 +02:00

ggml-kompute

llama : add Qwen2VL support + multimodal RoPE (llama/10361)

2024-12-18 12:52:16 +02:00

ggml-metal

llama : fix FA when KV cache is not used (i.e. embeddings) (llama/12825)

2025-04-24 20:39:16 +03:00

ggml-musa

cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)

2025-03-27 11:06:03 +02:00

ggml-opencl

opencl: better identify Adreno GPU (llama/12760)

2025-04-24 20:39:16 +03:00

ggml-rpc

rpc : send hash when tensor data is above some fixed threshold (llama/12496)

2025-03-28 21:47:42 +02:00

ggml-sycl

ggml : add bilinear upscale support (ggml/1185)

2025-04-24 20:39:16 +03:00

ggml-vulkan

vulkan: Use fp16 for the flash attention P*V multiplication (llama/12783)

2025-04-24 20:39:16 +03:00

CMakeLists.txt

cmake : fix ccache conflict (llama/12522)

2025-03-31 14:56:53 +03:00

ggml-alloc.c

ggml : upgrade init_tensor API to return a ggml_status (llama/11854)

2025-03-08 15:13:01 +02:00

ggml-backend-impl.h

ggml : upgrade init_tensor API to return a ggml_status (llama/11854)

2025-03-08 15:13:01 +02:00

ggml-backend-reg.cpp

ggml-backend : fix backend search path (llama/12330)

2025-03-27 11:06:03 +02:00

ggml-backend.cpp

ggml : portability fixes for VS 2017 (llama/12150)

2025-03-08 15:13:01 +02:00

ggml-common.h

musa: fix all warnings, re-enable -DLLAMA_FATAL_WARNINGS=ON in ci and update doc (llama/12611)

2025-03-31 14:56:53 +03:00

ggml-impl.h

ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187)

2025-04-24 20:39:16 +03:00

ggml-opt.cpp

ggml-opt: fix data corruption (ggml/1022)

2024-12-08 20:14:35 +02:00

ggml-quants.c

ggml : portability fixes for VS 2017 (llama/12150)

2025-03-08 15:13:01 +02:00

ggml-quants.h

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml-threading.cpp

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml-threading.h

remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (llama/10797)

2024-12-18 12:52:16 +02:00

ggml.c

ggml : add bilinear upscale support (ggml/1185)

2025-04-24 20:39:16 +03:00

gguf.cpp

Fix clang warning in gguf_check_reserved_keys (llama/12686)

2025-04-02 15:51:57 +03:00