* build: add dockerfile for ci
* ci: add action to build/push docker image
* fix: lowercase repository to fix ci
* ci: update cuBLAS flag
* build: install curl and ffmped in image
* docs: add docker section
* fix: improve args check when download model
* Create bench.py
* Various benchmark results
* Update benchmark script with hardware name, and file checks
* Remove old benchmark results
* Add git shorthash
* Round to 2 digits on calculated floats
* Fix the header reference when sorting results
* FIx order of models
* Parse file name
* Simplify filecheck
* Improve print run print statement
* Use simplified model name
* Update benchmark_results.csv
* Process single or lists of processors and threads
* Ignore benchmark results, dont check in
* Move bench.py to extra folder
* Readme section on how to use
* Move command to correct location
* Use separate list for models that exist
* Handle subprocess error in git short hash check
* Fix filtered models list initialization
* metal : init
* whisper : factor out graph builds
* whisper : allocate encoder and decoder using ggml-alloc
* whisper : ggml-alloc is now supported
* whisper : CoreML support ggml-alloc
* build : fix ggml-alloc
* ios : update submodule
* extra : update sync-ggml.sh script to also sync ggml-alloc
* ci : see if this is causing the crash
* whisper : refactor ggml-alloc init
* whisper.android : try to fix build
* whisper : initial Metal version
* ci : try to debug vmem issue
* metal : decoder works on GPU!
* metal : add multi-decoder support
* ggml : fix ggml_nbytes (probably temp solution)
* metal : run "cross" step on the GPU
* whisper : remove ggml_repeat in the encoder
* whisper : offload the Encoder to Metal
* ggml : use simpler ggml_bytes() implementation
* ggml-alloc : try to make CI happy by reducing vram to 128GB
* whisper : add whisper_allocr to wrap ggml_allocr
* whisper : factor out alloc init in a function
* cmake : update to support Metal build
* whisper : add <functional> header
* objc : fix build (no Metal yet)
* ios : add Metal support
* swiftui : fix build
* metal : speed-up KQ multiplication
* metal : sync latest llama.cpp kernels
* readme : add Metal info
* ios : update submodule
* coreml : add code to toggle Core ML config (CPU, ANE, GPU)
* bench : fix timings by running a pre-heat
* bench : start benching the decoder
* whisper : add ggml_mul_mat_pad
* bench : fix uninitialized vars
* whisper : add comment for disabling mul-mat padding
* whisper : add description of ggml_mul_mat_pad
* whisper : clean-up ggml_mul_mat_pad
* metal : remove the "concurrent" flag
* bench : variable n_past
* ios : update SPM package
* ggml : CLBlast support as in llama.cpp
Building with CLBlast speeds up whisper.cpp ~2x on low end / older AMD APUs (CPU with integrated GPU) such as the A9.
Usage:
WHISPER_CLBLAST=1 make
* CMake/Makefile : CLBlast support as in llama.cpp
Building with CLBlast speeds up whisper.cpp ~2x on low end / older AMD APUs (CPU with integrated GPU) such as the A9.
Usage:
```
Makefile:
cd whisper.cpp
WHISPER_CLBLAST=1 make
CMake:
cd whisper.cpp ; mkdir build ; cd build
cmake -DWHISPER_CLBLAST=ON ..
make
```
* Update README.md
Added OpenCL Build Instructions
* Instruction: Partial OpenCL GPU support via CLBlast
Added build instructions and examples for Make and CMake to support OpenCL enabled GPUs.