whisper.cpp/extra
Latest commit b6c5f49b78 by Georgi Gerganov:
whisper : add batched decoding (#1486)
* whisper : add whisper_batch

* whisper : move kv_self to whisper_state

* whisper : full batched decoding support

* whisper : fix memory leak in whisper_batch

* whisper : fix mem leak again + remove obsolete function

* whisper : clear kv cache when using whisper_decode API

* whisper : speed-up sampling

* whisper : fix decoders initializer

* bench : add batch size 5 bench

* whisper : add comment about the KV cache size

* whisper : add check for max number of decoders

* whisper : avoid starting sampling threads with bs=1

* whisper : enable beam-search by default

* cuda : sync llama.cpp fixes
Committed: 2023-11-15 16:12:52 +02:00
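
The first note above, "add whisper_batch", refers to a structure internal to whisper.cpp rather than part of the public whisper.h API. Below is a minimal sketch of the idea, assuming a layout modeled on llama.cpp's llama_batch, which this change syncs with; all names in it are illustrative, not the actual definition. Each queued token carries its position and the id of the decoder (beam) it belongs to, so one decode call can serve several decoders against a shared KV cache:

```cpp
// Illustrative only: a simplified token batch in the spirit of the internal
// whisper_batch. Names and layout are assumptions based on llama.cpp's llama_batch.
#include <cstdint>
#include <cstdio>
#include <vector>

using token_id = int32_t;
using seq_id_t = int32_t;

struct token_batch {
    std::vector<token_id> token;  // one entry per queued token
    std::vector<int32_t>  pos;    // position of the token in its sequence
    std::vector<seq_id_t> seq;    // which decoder (beam) the token belongs to
    std::vector<int8_t>   logits; // whether logits are requested for this token
};

// queue one decoding step for several decoders at the same position `pos`
static token_batch make_step(const std::vector<token_id> & beam_tokens, int32_t pos) {
    token_batch batch;
    for (seq_id_t s = 0; s < (seq_id_t) beam_tokens.size(); ++s) {
        batch.token.push_back(beam_tokens[s]);
        batch.pos.push_back(pos);
        batch.seq.push_back(s);
        batch.logits.push_back(1); // every beam needs logits to keep sampling
    }
    return batch;
}

int main() {
    // 5 beams, each proposing its own next token for the same position
    const token_batch batch = make_step({101, 102, 103, 104, 105}, /*pos =*/ 7);
    for (size_t i = 0; i < batch.token.size(); ++i) {
        printf("seq %d: token %d at pos %d\n", batch.seq[i], batch.token[i], batch.pos[i]);
    }
    return 0;
}
```

Tagging tokens with sequence ids is what lets the batched KV cache (see "move kv_self to whisper_state" above) keep per-beam attention histories separate while evaluating them in a single pass.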
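
Since the commit enables beam-search by default and adds a batch-size-5 bench, here is a hedged usage sketch against the public whisper.h API of this revision; the model path is a placeholder and the audio is stand-in silence:

```cpp
// Hedged sketch: enabling the beam-search decoding path that this commit
// turns on by default. Model path and PCM buffer are placeholders.
#include "whisper.h"

#include <cstdio>
#include <vector>

int main() {
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
    if (ctx == nullptr) {
        return 1;
    }

    // beam-search runs several decoders per step; with this change they are
    // evaluated in one batched call instead of one decode per beam
    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);
    params.beam_search.beam_size = 5; // matches the new batch-size-5 bench

    // stand-in audio: 1 second of silence at the 16 kHz rate whisper expects
    std::vector<float> pcm(WHISPER_SAMPLE_RATE, 0.0f);

    if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) == 0) {
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            printf("%s\n", whisper_full_get_segment_text(ctx, i));
        }
    }

    whisper_free(ctx);
    return 0;
}
```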
File               Last commit                                                  Date
bench-all.sh       whisper : add batched decoding (#1486)                       2023-11-15 16:12:52 +02:00
bench-wts.sh       bench-wts.sh : rename script + add execute permission        2023-03-06 21:02:24 +02:00
bench.py           extra: Add benchmark script implemented in Python (#1298)    2023-09-25 23:45:15 +08:00
convert-all.sh     whisper : add support for large v3 (#1444)                   2023-11-07 15:30:18 +02:00
deploy-wasm.sh     Node.js package (#260)                                       2022-12-12 20:17:27 +02:00
quantize-all.sh    whisper : add full CUDA and Metal offloading (#1472)         2023-11-12 15:31:08 +02:00
sha-all.sh         extra : compute SHA of all models files                      2022-11-02 18:31:55 +02:00
sync-ggml.sh       cuda : fix HIPBLAS build                                     2023-11-05 19:41:15 +02:00