whisper.cpp/ggml/include
Sigbjørn Skjæret 6cb38c3673 Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.
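A reference conversion along these lines can be sketched in Python (the function name and the exact NaN-quieting detail are illustrative, not the ggml code itself): round to nearest even by biasing the low 16 bits before truncating, and let subnormal inputs fall through without being flushed to zero.

```python
import struct

def fp32_to_bf16_ref(f: float) -> int:
    """Reference fp32 -> bf16: round-to-nearest-even, subnormals preserved."""
    u = struct.unpack("<I", struct.pack("<f", f))[0]
    if (u & 0x7FFFFFFF) > 0x7F800000:           # NaN: truncate, force quiet bit
        return ((u >> 16) | 0x0040) & 0xFFFF
    # Bias of 0x7FFF, plus 1 when the kept LSB is odd, implements
    # round-to-nearest-even; subnormal inputs are not flushed to zero.
    return ((u + (0x7FFF + ((u >> 16) & 1))) >> 16) & 0xFFFF
```

A fast SIMD path that flushes subnormals to zero would disagree with this reference exactly on those subnormal inputs, which is the platform-equivalence caveat noted above.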

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.
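The overlap between truncation and rounding can be illustrated with a small sketch (helper names are hypothetical, NaN handling omitted): truncation just drops the low 16 bits, while rounding adjusts them first, so a separate truncation step adds nothing once rounding is in place.

```python
import struct

def _bits(f: float) -> int:
    """Reinterpret a float as its fp32 bit pattern."""
    return struct.unpack("<I", struct.pack("<f", f))[0]

def bf16_truncate(f: float) -> int:
    """Drop the low 16 bits (the removed truncation approach)."""
    return _bits(f) >> 16

def bf16_round(f: float) -> int:
    """Round to nearest even before dropping the low 16 bits."""
    u = _bits(f)
    return ((u + (0x7FFF + ((u >> 16) & 1))) >> 16) & 0xFFFF

# 1 + 3*2**-9 has fp32 bits 0x3F80C000: truncation keeps 0x3F80,
# while rounding carries into the kept mantissa and yields 0x3F81.
```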

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-08-08 22:48:46 +03:00
ggml-alloc.h     whisper : reorganize source code + improve CMake (#2256)         2024-06-26 19:34:09 +03:00
ggml-backend.h   CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)     2024-08-08 22:48:46 +03:00
ggml-blas.h      whisper : reorganize source code + improve CMake (#2256)         2024-06-26 19:34:09 +03:00
ggml-cuda.h      feat: Support Moore Threads GPU (llama/8383)                     2024-08-08 22:48:46 +03:00
ggml-kompute.h   whisper : reorganize source code + improve CMake (#2256)         2024-06-26 19:34:09 +03:00
ggml-metal.h     metal : add abort callback (ggml/905)                            2024-08-08 22:48:46 +03:00
ggml-rpc.h       whisper : reorganize source code + improve CMake (#2256)         2024-06-26 19:34:09 +03:00
ggml-sycl.h      whisper : reorganize source code + improve CMake (#2256)         2024-06-26 19:34:09 +03:00
ggml-vulkan.h    whisper : reorganize source code + improve CMake (#2256)         2024-06-26 19:34:09 +03:00
ggml.h           Fix conversion of unnormalized BF16->BF16 weights (llama/7843)   2024-08-08 22:48:46 +03:00