sync : ggml (#2001)

* sync : update scripts

* sync : ggml

* talk-llama : sync llama.cpp

* make : WHISPER_CUBLAS -> WHISPER_CUDA

* ci : try to fix sycl build

* talk-llama : fix make build
Georgi Gerganov
2024-03-27 18:55:10 +02:00
committed by GitHub
parent 1558ec5a16
commit 2948c740a2
90 changed files with 15702 additions and 12476 deletions


@@ -414,11 +414,11 @@ For more information about the Core ML implementation please refer to PR [#1037]
 With NVIDIA cards the processing of the models is done efficiently on the GPU via cuBLAS and custom CUDA kernels.
 First, make sure you have installed `cuda`: https://developer.nvidia.com/cuda-downloads
-Now build `whisper.cpp` with cuBLAS support:
+Now build `whisper.cpp` with CUDA support:
 ```
 make clean
-WHISPER_CUBLAS=1 make -j
+WHISPER_CUDA=1 make -j
 ```
 ## OpenCL GPU support via CLBlast
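
This hunk covers only the Makefile path of the rename. For CMake-based builds, a minimal sketch of the equivalent invocation, assuming the CMake option was renamed from `WHISPER_CUBLAS` to `WHISPER_CUDA` in the same sync (those changes are not shown in this hunk):

```
# sketch only: assumes the CMake option was renamed to WHISPER_CUDA
# alongside the Makefile variable (not shown in this hunk)
cmake -B build -DWHISPER_CUDA=ON
cmake --build build -j --config Release
```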