whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-07-18 21:04:41 +02:00

Files

Georgi Gerganov 27533e7f63 metal : improve FA + improve MoE (llama/12612)

* ggml : FA with different K, V head sizes (CPU)

ggml-ci

* metal : add FA with HS=192

* metal : extend FA to support different K and V head sizes

ggml-ci

* metal : add FA vector kernels for heads K 192 and V 128

ggml-ci

* ggml : restrict op on other backends to equal head sizes

ggml-ci

* metal : optimize FA-vec kernel

ggml-ci

* metal : FA remove mq registers

* metal : improve MoE mul_mat_id condition

ggml-ci

* metal : fix comments + remove unnecessary addition

ggml-ci

* metal : avoid too much shared memory usage with mul_mat_id

ggml-ci

2025-03-28 21:47:42 +02:00

ggml-alloc.h

ggml : upgrade init_tensor API to return a ggml_status (llama/11854)

2025-03-08 15:13:01 +02:00

ggml-backend.h

ggml : upgrade init_tensor API to return a ggml_status (llama/11854)

2025-03-08 15:13:01 +02:00

ggml-blas.h

ggml : build backends as libraries (llama/10256)

2024-11-20 21:00:08 +02:00

ggml-cann.h

ggml : build backends as libraries (llama/10256)