* whisper : migrate to ggml-backend
* whisper : fix logit reading
* whisper : fix tensor allocation during load
* whisper : fix beam-search with CUDA
* whisper : free backends + fix compile warning
* whisper : print when CUDA is enabled
* whisper : fix CoreML
* make : clean-up
* talk : fix compile warning
* whisper : support ggml_conv with CUDA and Metal (#1473)
* ggml : add CUDA support for ggml_conv
* whisper : remove ggml_repeat for conv bias + single backend
* cuda : fix im2col kernel
* metal : add im2col support + mul mat-vec f16 x f16
* bench-all : add q4 models
* whisper : clean-up
* quantize-all : fix
* ggml : im2col opts
* whisper : avoid whisper_model_data wrapper
* whisper : add note that ggml_mul_mat_pad does not work with CUDA
* whisper : factor out graph compute in common function
* whisper : fixes
* whisper : fix UB with measure buffers
* whisper : try to fix the parallel whisper_state functionality (#1479)
* whisper : try to fix the parallel whisper_state functionality
* whisper : fix multi-state Metal
* whisper : free backend instances in whisper_state
* whisper : check state->ctx_metal not null
* whisper : add whisper_context_params { use_gpu }
* whisper : new API with params & deprecate old API
* examples : use no-gpu param && whisper_init_from_file_with_params
* whisper.objc : enable metal & disable on simulator
* whisper.swiftui, metal : enable metal & support load default.metallib
* whisper.android : use new API
* bindings : use new API
* addon.node : fix build & test
* bindings : updata java binding
* bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java
* metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load
* metal : move bundle var into block
* metal : use SWIFT_PACKAGE instead of GGML_SWIFT
* style : minor updates
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.)
* metal : allow env metal variable to override resource path (#1415)
* Allow env variable to override resource path
* Update ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* sync : restore common / main from `master`
* sync : restore whisper from `master`
* talk-llama : update to latest llama.cpp
* ruby : fix build
* ggml : fix 32-bit ARM build
* ggml : fix MIN / MAX macro collisions + update ios bindings
* ggml : fix ifdefs and MIN / MAX again
* exampels : fix Obj-C and Swift examples
* ggml : fix 32-bit ARM compatibility
* ggml : one more attempt to fix 32-bit ARM compat
* whisper : fix support for larger graphs
---------
Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>
* Clarify doc about where to compile from
* Update examples/stream/README.md
* Update examples/stream/README.md
* Update README.md
---------
Co-authored-by: AI @ Home <>
Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>
* Allocate class on the stack instead of on the heap
* Add class wav_writer
* fix some minor issues
* fix some minor issues
* remove potential misleading API
* Create bench.py
* Various benchmark results
* Update benchmark script with hardware name, and file checks
* Remove old benchmark results
* Add git shorthash
* Round to 2 digits on calculated floats
* Fix the header reference when sorting results
* FIx order of models
* Parse file name
* Simplify filecheck
* Improve print run print statement
* Use simplified model name
* Update benchmark_results.csv
* Process single or lists of processors and threads
* Ignore benchmark results, dont check in
* Move bench.py to extra folder
* Readme section on how to use
* Move command to correct location
* Use separate list for models that exist
* Handle subprocess error in git short hash check
* Fix filtered models list initialization