whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-02 16:05:35 +02:00

Author	SHA1	Message	Date
lhez	88c3cecd43	opencl: split ggml-opencl.cl into multiple files and cleanup (llama/12886) --------- Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>	2025-04-24 20:39:16 +03:00
kimminsu	cb7642b0f5	opencl: fix incorrect local_size index in profiling log (llama/12868)	2025-04-24 20:39:16 +03:00
lhez	fd1c725e65	opencl: better identify Adreno GPU (llama/12760)	2025-04-24 20:39:16 +03:00
lhez	317a0031f9	opencl: use `max_alloc_size` in backend ctx instead of querying again (llama/12705)	2025-04-24 20:39:16 +03:00
Junil Kim	b63d23f728	opencl : fix memory allocation size (llama/12649) issue: https://github.com/CodeLinaro/llama.cpp/pull/17#issuecomment-2760611283 This patch fixes the memory allocation size so that it does not exceed the maximum size of the OpenCL device.	2025-04-02 15:51:57 +03:00
lhez	6fc0ae2f5a	opencl: add multi and vision rope, `gelu_quick` and `im2col` (llama/12600) * opencl: add `im2col` * opencl: add `gelu_quick` * opencl: add mrope * opencl: add vision rope	2025-03-28 21:47:42 +02:00
lhez	ba6f584f30	opencl: simplify kernel embedding logic in cmakefile (llama/12503) Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>	2025-03-27 11:06:03 +02:00
lhez	03c364557d	opencl: improve profiling (llama/12442) * opencl: more profiling timing * opencl: generate trace for profiling * opencl: reduce profiling overhead * Populate profiling timing info at the end rather than after each kernel run * opencl: fix for chrome tracing	2025-03-27 11:06:03 +02:00
Henry Linjamäki	3d60219622	opencl: use OpenCL C standard supported by the device (llama/12221) This patch nudges the llama.cpp a bit to be supported on PoCL which doesn't support OpenCL C CL2.0. The issue is solved by querying the device for the supported OpenCL C versions and using the highest one available.	2025-03-27 11:06:03 +02:00
lhez	a34cb73dc2	opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (llama/12217) * opencl: support noncontiguous `norm` * opencl: support noncontiguous `rms_norm` * opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`	2025-03-08 15:13:01 +02:00
Henry Linjamäki	76385c8311	opencl : fix buffer alignment (llama/12197) Fix the following error: ``` ggml-alloc.c:99: not enough space in the buffer ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024) ``` which occurs when `ggml_backend_opencl_context::alignment` is larger than `cl_ptr_base` (hard-coded to `0x1000`). Also fix `ggml_backend_opencl_context::alignment` being set to `CL_DEVICE_MEM_BASE_ADDR_ALIGN`, which was treated as bytes although the value is reported in bits.	2025-03-08 15:13:01 +02:00
Henry Linjamäki	442cd1d2e7	opencl : fix `ulong` kernel args were set from `int` variables (llama/12174) ... which left garbage bits in the upper half of the kernel args. This caused segmentation faults when running PoCL.	2025-03-08 15:13:01 +02:00
simon886212	bc8cb97e02	opencl : fix profile-related errors (llama/12095) Co-authored-by: ubuntu <ubuntu@localhost.localdomain>	2025-03-08 15:13:01 +02:00
William Tambellini	c98681e6d5	ggml : upgrade init_tensor API to return a ggml_status (llama/11854) * Upgrade init_tensor API to return a ggml_status To prepare for an 'abort-free' ggml (ggml not to abort on OOMs but return an OOM status), as agreed with Diego in the ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status. * misc fixes --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-03-08 15:13:01 +02:00
lhez	b43b9d928c	opencl: fix for small models (llama/11950) * opencl: fix small shape gemv, remove unused extensions * opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size * opencl: fix for token length < 4 * opencl: use wave size of 64 for all Adreno GPUs --------- Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com> Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>	2025-02-27 08:55:36 +02:00
lhez	4b48fe449a	opencl: Fix rope and softmax (llama/11833) * opencl: fix `ROPE` * opencl: fix `SOFT_MAX` * Add fp16 variant * opencl: enforce subgroup size for `soft_max`	2025-02-27 08:55:36 +02:00
lhez	1deb41f0e7	ggml : add opencl backend (skip) (llama/10693) --------- Co-authored-by: Skyler Szot <quic_sszot@quicinc.com> Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com> Co-authored-by: Alexander Angus <quic_aangus@quicinc.com> Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com> Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>	2025-01-14 10:38:01 +02:00

17 Commits
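
The buffer-alignment fix in commit 76385c8311 hinges on a unit detail: `CL_DEVICE_MEM_BASE_ADDR_ALIGN` is reported in bits, while buffer sizes and offsets are counted in bytes. The following is a minimal sketch of that conversion, not the code from the commit. The helper names `align_bits_to_bytes` and `align_up` are hypothetical, and no OpenCL call is made; the caller is assumed to have already queried the device value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert an alignment reported in bits (as
 * CL_DEVICE_MEM_BASE_ADDR_ALIGN is) into bytes. */
static size_t align_bits_to_bytes(unsigned int align_bits) {
    return (size_t)align_bits / 8;
}

/* Hypothetical helper: round `size` up to the next multiple of
 * `alignment`, where `alignment` is in bytes and must be non-zero. */
static size_t align_up(size_t size, size_t alignment) {
    assert(alignment != 0);
    return (size + alignment - 1) / alignment * alignment;
}
```

For example, a device that reports 1024 bits needs 128-byte alignment, so `align_up(129, align_bits_to_bytes(1024))` yields 256. Treating the same 1024 as bytes would over-align every buffer by a factor of eight.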