Xuan-Son Nguyen
4dfb2c2215
ggml : add ggml_repeat_4d (llama/13824)
2025-06-01 15:14:44 +03:00
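For reference, the repeat semantics this op extends to four explicit dimensions, as a hedged sketch (ggml operates on strided tensors; the contiguous layout and names below are illustrative, not ggml's kernel):

```cpp
// tile `src` along all four dimensions until it fills `dst`; dne[i] is assumed
// to be a multiple of sne[i], matching repeat/broadcast semantics (i0 fastest)
static void repeat_4d_ref(const float * src, const int sne[4],
                          float       * dst, const int dne[4]) {
    for (int i3 = 0; i3 < dne[3]; i3++)
    for (int i2 = 0; i2 < dne[2]; i2++)
    for (int i1 = 0; i1 < dne[1]; i1++)
    for (int i0 = 0; i0 < dne[0]; i0++) {
        const int s = ((i3 % sne[3]) * sne[2] + i2 % sne[2]) * sne[1] * sne[0]
                    + (i1 % sne[1]) * sne[0] + i0 % sne[0];
        const int d = ((i3 * dne[2] + i2) * dne[1] + i1) * dne[0] + i0;
        dst[d] = src[s];
    }
}
```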
Daniel Tang
4d18e52f55
ggml : Fix backtrace breaking Windows build (#3203)
2025-05-29 13:26:58 +03:00
Daniel Tang
5ea2c37a4c
ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)
...
The goal is to have what users call "full logs" contain the backtrace.
The handler is registered upon ggml_init. Also fixes a minor fd leak on Linux.
2025-05-29 09:56:26 +03:00
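A minimal sketch of the mechanism described above: a std::terminate handler that prints the backtrace before aborting, installed once during initialization. print_backtrace here stands in for ggml's internal printer; only the std::set_terminate plumbing is the point.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// placeholder for ggml's internal backtrace printer
static void print_backtrace() {
    std::fprintf(stderr, "(backtrace would be printed here)\n");
}

static std::terminate_handler g_prev_terminate = nullptr;

static void on_uncaught_exception() {
    print_backtrace();              // make the "full logs" contain the trace
    if (g_prev_terminate) {
        g_prev_terminate();         // chain to any previously installed handler
    }
    std::abort();                   // a terminate handler must not return
}

static void install_terminate_handler() {   // the real code does this in ggml_init
    g_prev_terminate = std::set_terminate(on_uncaught_exception);
}
```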
Xuan-Son Nguyen
dd6ef64060
ggml : add ggml_gelu_erf() (llama/13667)
...
* ggml : add ggml_gelu_na (not approximated)
* fix naming order
* rename na --> erf
* apply review suggestions
* revert naming order
2025-05-27 18:03:00 +03:00
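The new op computes the exact GELU via the error function rather than the usual tanh approximation; a scalar reference sketch:

```cpp
#include <cmath>

// GELU(x) = x * Phi(x) = 0.5 * x * (1 + erf(x / sqrt(2))), computed exactly
static float gelu_erf_ref(float x) {
    const float SQRT2 = 1.41421356237309504880f;
    return 0.5f * x * (1.0f + std::erf(x / SQRT2));
}
```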
Diego Devesa
9c3bfc1499
ggml : fix apple OS check in ggml_print_backtrace (ggml/1229)
2025-05-19 14:58:39 +03:00
Daniel Tang
5b7797f674
ggml : Fix missing backtrace on Linux (ggml/1228)
...
* Modern Linux defaults /proc/sys/kernel/yama/ptrace_scope to 1
* Fixed lldb attach
* Simplify by having the child do ggml_print_backtrace_symbols
2025-05-19 14:58:39 +03:00
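A hedged sketch of the Yama interaction this commit works around: with ptrace_scope = 1, a forked child may not attach a debugger to its parent unless the parent opts in via prctl. Everything except prctl/PR_SET_PTRACER is illustrative; the commit's actual simplification is to have the child print the symbols itself.

```cpp
#include <sys/prctl.h>   // PR_SET_PTRACER (Linux/Yama)
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

// placeholder for the in-process symbol printer named in the commit above
static void ggml_print_backtrace_symbols() {
    std::fprintf(stderr, "(symbols would be printed here)\n");
}

static void print_backtrace_via_child() {
    const pid_t child = fork();
    if (child == 0) {
        // child: under ptrace_scope = 1 it could not gdb/lldb-attach to the
        // parent without the opt-in below; printing the symbols directly in
        // the child sidesteps most of that machinery
        ggml_print_backtrace_symbols();
        _exit(0);
    }
    // parent: explicitly allow this child to ptrace-attach despite Yama; a real
    // implementation synchronizes with the child (e.g. over a pipe) so any
    // attach happens only after this call
    prctl(PR_SET_PTRACER, child, 0, 0, 0);
    waitpid(child, nullptr, 0);
}
```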
Georgi Gerganov
41ed62bdbc
metal : optimize MoE for large batches (llama/13388)
2025-05-13 13:59:21 +03:00
Johannes Gäßler
5d8b068249
llama/ggml: add LLM training support (llama/10544)
...
* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period
2025-05-13 13:59:21 +03:00
Johannes Gäßler
f9f78a773f
CUDA: fix bad asserts for partial offload (llama/13337)
2025-05-07 21:00:32 +03:00
SXX
46392f733f
ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (llama/13107)
...
* ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion
* move fp converter to ggml-cpu
* Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32
2025-05-01 13:29:02 +03:00
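For context, the bf16 side of these conversions is a pure bit move, which is why hosting them in the CPU backend with runtime feature dispatch pays off; a scalar sketch:

```cpp
#include <cstdint>
#include <cstring>

// bf16 is the high 16 bits of an IEEE-754 fp32, so widening is just a shift
static float bf16_to_fp32_ref(uint16_t h) {
    const uint32_t bits = (uint32_t) h << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));   // type-pun without UB
    return f;
}

// truncating fp32 -> bf16 (round-to-nearest-even takes a little more work)
static uint16_t fp32_to_bf16_trunc_ref(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return (uint16_t) (bits >> 16);
}
```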
Georgi Gerganov
fe4acb33e3
ggml : fix trailing whitespaces (llama/0)
2025-04-24 20:39:16 +03:00
Acly
d87dfcf7c0
ggml : Depthwise 2D convolution (ggml/1152)
...
* ggml-cpu : kernels for faster depthwise 2D convolution
* fix compile: remove static after moving to ops.cpp
* add dilation for depthwise_conv_2d
* review: rename to ggml_conv_2d_dw_direct, remove redundant struct keywords, pass by ref, whitespace
* review: rename depthwise_conv_2d -> conv_2d_dw everywhere
2025-04-24 20:39:16 +03:00
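What makes the depthwise variant cheap is that each channel is convolved only with its own single kernel, with no cross-channel accumulation. A naive reference sketch (illustrative contiguous layout, valid padding, correlation orientation as is conventional in ML; not ggml's kernel):

```cpp
static void conv_2d_dw_ref(const float * src, const float * ker, float * dst,
                           int c, int ih, int iw, int kh, int kw, int dil) {
    const int oh = ih - dil * (kh - 1);   // output height, valid padding
    const int ow = iw - dil * (kw - 1);   // output width
    for (int ch = 0; ch < c; ch++) {      // each channel uses only its own kernel
        for (int y = 0; y < oh; y++) {
            for (int x = 0; x < ow; x++) {
                float sum = 0.0f;
                for (int ky = 0; ky < kh; ky++) {
                    for (int kx = 0; kx < kw; kx++) {
                        sum += src[(ch*ih + y + ky*dil)*iw + x + kx*dil] *
                               ker[(ch*kh + ky)*kw + kx];
                    }
                }
                dst[(ch*oh + y)*ow + x] = sum;
            }
        }
    }
}
```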
Diego Devesa
b9c71fae5a
ggml : add bilinear upscale support (ggml/1185)
2025-04-24 20:39:16 +03:00
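A hedged sketch of the interpolation behind a bilinear upscale (clamped edges; x, y are source-space coordinates already mapped from the destination pixel; not ggml's actual kernel):

```cpp
// blend the four nearest input pixels by the fractional source coordinates
static float sample_bilinear(const float * img, int w, int h, float x, float y) {
    const int x0 = (int) x, y0 = (int) y;
    const int x1 = x0 + 1 < w ? x0 + 1 : x0;   // clamp at the right/bottom edge
    const int y1 = y0 + 1 < h ? y0 + 1 : y0;
    const float fx = x - (float) x0, fy = y - (float) y0;
    const float top = img[y0*w + x0] * (1 - fx) + img[y0*w + x1] * fx;
    const float bot = img[y1*w + x0] * (1 - fx) + img[y1*w + x1] * fx;
    return top * (1 - fy) + bot * fy;
}
```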
Diego Devesa
6d67c6d93d
ggml : add more generic custom op, remove deprecated custom ops (ggml/1183)
...
* ggml : add more generic ggml_custom op
* ggml : remove deprecated custom ops
2025-04-24 20:39:16 +03:00
Diego Devesa
a71c64512a
llama : add option to override model tensor buffers (llama/11397)
...
* llama : add option to override tensor buffers
* ggml : fix possible underflow in ggml_nbytes
2025-04-24 20:39:16 +03:00
Georgi Gerganov
27533e7f63
metal : improve FA + improve MoE (llama/12612)
...
* ggml : FA with different K, V head sizes (CPU)
ggml-ci
* metal : add FA with HS=192
* metal : extend FA to support different K and V head sizes
ggml-ci
* metal : add FA vector kernels for heads K 192 and V 128
ggml-ci
* ggml : restrict op on other backends to equal head sizes
ggml-ci
* metal : optimize FA-vec kernel
ggml-ci
* metal : FA remove mq registers
* metal : improve MoE mul_mat_id condition
ggml-ci
* metal : fix comments + remove unnecessary addition
ggml-ci
* metal : avoid too much shared memory usage with mul_mat_id
ggml-ci
2025-03-28 21:47:42 +02:00
Molly Sophia
52c4c03b0a
llama: Add support for RWKV v7 architecture (llama/12412)
...
* ggml: Add op l2_norm
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* ggml: Add op rwkv_wkv7
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: Add support for RWKV7 and ARWKV7 models
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: fix inference with RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: add more (a)rwkv7 variants in size
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Apply code-format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* fix MUSA build
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: fix shape error with rwkv using llama-parallel
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-03-27 11:06:03 +02:00
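The new l2_norm op normalizes a vector to unit Euclidean length; a reference sketch (the epsilon guard is illustrative):

```cpp
#include <cmath>

// y = x / sqrt(sum(x^2) + eps)
static void l2_norm_ref(const float * x, float * y, int n, float eps = 1e-12f) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += x[i] * x[i];
    const float scale = 1.0f / std::sqrt(sum + eps);
    for (int i = 0; i < n; i++) y[i] = x[i] * scale;
}
```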
vmobilis
cc03608e78
ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118)
...
* ggml_compute_forward_concat() for arbitrary tensor type
* Check that tensors' type match
* ggml-cpu.c: check type of source tensors
* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()
* ggml.c: check concatenated tensor type
* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c
..., as it was moved to ggml.c.
2025-03-08 15:13:01 +02:00
mgroeber9110
96a92ecc4c
ggml : portability fixes for VS 2017 (llama/12150)
...
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
---------
Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
2025-03-08 15:13:01 +02:00
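A sketch of the macro described above, following the commit text: plain `restrict` is C99-only, and MSVC's C mode spells it `__restrict` unless C11/C17 standards support is enabled. The real definition in ggml's headers also has to cover C++ translation units, where `restrict` is not a keyword at all.

```cpp
// sketch for C translation units only, per the commit description above
#if defined(_MSC_VER) && (!defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L)
#    define GGML_RESTRICT __restrict
#else
#    define GGML_RESTRICT restrict
#endif
```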
Aaron Teo
82e04e7670
ggml-cpu: Support s390x SIMD Instruction Set (llama/12019)
...
* ggml: add s390x ARCH_FLAGS for compilation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add SIMD for s390x using vector intrinsics
SIMD is activated for:
* ggml_vec_dot_f32
* ggml_vec_dot_f16
* ggml_vec_mad_f32
* ggml_vec_mad_f16
* ggml_vec_mad_f32_unroll
* ggml_vec_scale_f32
* ggml_vec_scale_f16
SIMD is NOT activated for:
* ggml_vec_dot_f16_unroll (pending bugfix)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix missing escape character in GGML_F32x4_REDUCE
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix s390x GGML_F32x4_REDUCE
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: full SIMD activation for F32,F16 s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add option to disable s390x VXE/VXE2
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: change vecintrin.h include to ggml-cpu-impl
* add __VXE__ and __VXE2__ macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* cmake: add s390x target detection for VX/VXE/VXE2
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move s390x vector intrinsics to ggml-cpu-impl.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x Q8_0 SIMD
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: correct documentation for Q8_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x reduce code complexity Q8_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x bugfix typo Q8_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activated for Q4_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x inline vec_reve
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for Q4_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add VXE backend feature
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: remove test.py
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for quantize_row_q8_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for quantize_row_q8_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for iq4_xs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: bugfix iq4_xs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for iq4_nl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add float, double, and long vector data type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: clean up iq4_xs SIMD
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix improper use of restrict keyword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: update warning message for ggml_vec_tbl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: switch to restrict for iq4_nl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: slight dot product speed improvement for q4_1_q8_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for q6_K
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add missing `_t` to ggml_int8x16x4_t
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix missing `_t` for ggml_vec_xl_s8x4
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix more missing `_t`
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add unroll and prefetch to Q8_0
increase of 3.86% for prompt processing and 32.22% for token generation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: patch Q8_0 to use proper vector sizes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: optimise Q8_0 dot prod compute kernel further
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: add unroll and prefetch to Q4_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor Q6_K variable naming for readability
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix Q6_K typos
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for Q5_K
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix wrong char*x16_t naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: Q5_K y0 wrong signedness
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix Q5_K invalid uchar type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix Q5_K invalid uchar type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: s390x SIMD activation for Q4_K
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: fix Q4_K invalid vector intrinsics
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: simplify ggml_padd_s16 compute kernel
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: correct ggml-cpu vxe wording
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: change ggml_aligned_malloc alignment to 256
256 is the cache line size for s390x platforms
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: resolve pr merge via cherry-pick 225bbbf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)
* ggml: resolve pr merge via cherry-pick 4571953
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: cmake remove fork when determining s390x machine type
thank you @ericcurtin
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Jinyang He <hejinyang@loongson.cn>
Co-authored-by: junchao-zhao <68935141+junchao-loongson@users.noreply.github.com>
2025-02-27 08:55:36 +02:00
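On the allocator change noted above: 256-byte alignment matches the s390x cache line, and posix_memalign accepts it since it is a power of two and a multiple of sizeof(void *). A POSIX sketch:

```cpp
#include <cstdlib>

// allocate `size` bytes aligned to the 256-byte s390x cache line
static void * aligned_malloc_256(size_t size) {
    void * ptr = nullptr;
    if (posix_memalign(&ptr, 256, size) != 0) {
        return nullptr;   // allocation failed or alignment rejected
    }
    return ptr;           // release with free()
}
```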
Maxim Evtush
14d7c0368d
fix: typos in documentation files (llama/11791)
...
* Update ggml.c
* Update arg.cpp
* Update speculative.h
2025-02-27 08:55:36 +02:00
Johannes Gäßler
c262dc80e2
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (llama/11380)
2025-02-03 22:00:57 +02:00
Johannes Gäßler
de49024e49
CUDA: backwards pass for misc. ops, add tests (llama/11257)
...
* CUDA: backwards pass for misc. ops, add tests
* remove restrict from pointers
2025-02-03 22:00:57 +02:00
Johannes Gäßler
54a2ee648f
RoPE: fix back, CUDA support for back + noncont. (llama/11240)
...
* RoPE: fix back, CUDA support for back + noncont.
* fix comments reg. non-cont. RoPE support [no-ci]
2025-02-03 22:00:57 +02:00
William Tambellini
8e0143e205
ggml : add option to not print stack on abort (ggml/1081)
...
* Add option to not print stack on abort
Add option/envvar to disable stack printing on abort.
Also link some unittests with Threads to fix link errors on
ubuntu/g++11.
* Update ggml/src/ggml.c
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-03 22:00:57 +02:00
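A minimal sketch of such a gate; the commit does not name the environment variable, so the one below is hypothetical:

```cpp
#include <cstdlib>

// returns true unless the user opted out of stack printing on abort;
// GGML_NO_BACKTRACE is a hypothetical name, not taken from the commit
static bool backtrace_enabled() {
    const char * v = std::getenv("GGML_NO_BACKTRACE");
    return v == nullptr || v[0] == '\0';
}
```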
Molly Sophia
06209f6683
llama: add support for QRWKV6 model architecture (llama/11001)
...
* WIP: Add support for RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV: Some graph simplification
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Add support for RWKV6Qwen2 with cpu and cuda GLA
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix some typos
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* code format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix wkv test & add gla test
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix cuda warning
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update README.md
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update ggml/src/ggml-cuda/gla.cu
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix fused lerp weights loading with RWKV6
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* better sanity check skipping for QRWKV6 in llama-quant
thanks @compilade
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-14 10:38:01 +02:00
Johannes Gäßler
acdbe58631
GGUF: C++ refactor, backend support, misc fixes (llama/11030)
...
* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types
2025-01-14 10:38:01 +02:00
Georgi Gerganov
f32ddb3b1c
tts : add OuteTTS support (llama/10784)
...
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add mathematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder
2025-01-04 10:45:01 +02:00
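One of the vocoder pieces listed above is the Hann window; the standard periodic form, as a sketch (whether the implementation uses the periodic or symmetric variant is not stated here):

```cpp
#include <cmath>

// periodic Hann window: w[i] = 0.5 * (1 - cos(2*pi*i / n))
static void hann_window_ref(float * w, int n) {
    const float two_pi = 6.283185307179586f;
    for (int i = 0; i < n; i++) {
        w[i] = 0.5f * (1.0f - std::cos(two_pi * (float) i / (float) n));
    }
}
```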
Johannes Gäßler
79b75ece03
tests: add tests for GGUF (llama/10830)
2025-01-04 10:45:01 +02:00
HimariO
e22d38e4f2
llama : add Qwen2VL support + multimodal RoPE (llama/10361)
...
* Barebone Qwen2VL LLM convertor
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script bug fix
* correcting vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, outdated func args
* update `llama_hparams`
* update to keep up with upstream changes
* resolve linter, test errors
* add makefile entry, update special image padding token
* add mrope unit test, fix a few compiler warnings
* rename `mrope`-related functions and params
* minor updates on debug util, bug fixes
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix trailing whitespace
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remove old `rope_section` compare operator
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-18 12:52:16 +02:00
Daniel Bevenius
e0be0de1ee
ggml : add check for grad_accs (ggml/1046)
...
* ggml : add check for grad_accs
This commit adds a check for grad_accs in ggml_graph_get_grad and
ggml_graph_get_grad_acc functions. This is necessary to avoid segfaults
when grad_accs is not initialized.
The motivation for this change is that I find it nice to be able to
print out a computation graph using ggml_graph_print but this function
segfaults when grad_accs is not initialized:
```console
(gdb) p g1
$2 = (ggml_cgraph *) 0x7ffff66004b0
(gdb) p *g1
$3 = {size = 2048, n_nodes = 1, n_leafs = 2, nodes = 0x7ffff6600500,
grads = 0x0, grad_accs = 0x0, leafs = 0x7ffff6604500,
visited_hash_set = {size = 4099, used = 0x7ffff6610518,
keys = 0x7ffff6608500}, order = GGML_CGRAPH_EVAL_ORDER_LEFT_TO_RIGHT}
(gdb) p ggml_graph_print(g1)
=== GRAPH ===
n_nodes = 1
Program received signal SIGSEGV, Segmentation fault.
0x0000555555579775 in ggml_graph_get_grad
(cgraph=0x7ffff66004b0,node=0x7ffff6600340)
at /ggml/ggml/src/ggml.c:5990
5990 return igrad != GGML_HASHSET_FULL &&
ggml_bitset_get(cgraph->visited_hash_set.used, igrad) ?
cgraph->grads[igrad] : NULL;
```
* squash! ggml : add check for grad_accs
Fix the check in ggml_graph_get_grad. The check was incorrectly using
cgraph->grad_accs instead of cgraph->grads.
2024-12-18 12:52:16 +02:00
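Putting the squash note together, the guarded lookup reads roughly like this (a sketch against ggml's internal hash-set helpers as they appear in the gdb session above; assumed, not re-verified):

```cpp
#include "ggml-impl.h"   // assumed home of the internal hash-set helpers

// return NULL instead of dereferencing a NULL grads array when the graph was
// built without gradients
struct ggml_tensor * ggml_graph_get_grad(const struct ggml_cgraph * cgraph, const struct ggml_tensor * node) {
    const size_t igrad = ggml_hash_find(&cgraph->visited_hash_set, node);
    return igrad != GGML_HASHSET_FULL &&
           ggml_bitset_get(cgraph->visited_hash_set.used, igrad) &&
           cgraph->grads != NULL ? cgraph->grads[igrad] : NULL;
}
```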
Djip007
e990d1b791
ggml : refactor online repacking (llama/10446)
...
* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
- remove from "file" tensor type
- allow only with dynamic repack
- extract cpu extra bufts and convert to C++
- hbm
- "aarch64"
- more generic use of extra buffer
- generalise extra_supports_op
- new API for "cpu-accel":
- amx
- aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-18 12:52:16 +02:00
PAB
7895d39508
ggml : add GGML_PAD_REFLECT_1D operation (ggml/1034)
...
* ggml_pad_reflect_1d defined in header
* implemented on CPU
* called the forward pass
* impl Metal kernel
* added Metal kernel
* added OP_PAD_REFLECT_1D in test-backend-ops.cpp
* add test-pad-reflect-1d test case
* test case supports multiple backends
2024-12-08 20:14:35 +02:00
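Reflect padding mirrors around the border without repeating the edge sample; a 1-D reference sketch (assumes both pad sizes are smaller than the input length, so a single reflection suffices):

```cpp
// pad_reflect_1d_ref([a b c d], p0=2, p1=2) -> [c b | a b c d | c b]
static void pad_reflect_1d_ref(const float * src, float * dst, int n, int p0, int p1) {
    for (int i = -p0; i < n + p1; i++) {
        int j = i;
        if (j < 0)  j = -j;              // reflect across the left edge
        if (j >= n) j = 2*(n - 1) - j;   // reflect across the right edge
        dst[i + p0] = src[j];            // dst has n + p0 + p1 elements
    }
}
```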
Shupei Fan
330273901f
ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541)
...
* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
2024-12-08 20:14:35 +02:00
Diego Devesa
77e3e4a090
ggml : add support for dynamic loading of backends (llama/10469)
...
* ggml : add support for dynamic loading of backends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-08 20:14:35 +02:00
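A hedged usage sketch of loading a backend shared library at runtime via the entry point this commit introduces; the path is illustrative, and the exact signature should be checked against ggml-backend.h:

```cpp
#include "ggml-backend.h"
#include <cstdio>

int main() {
    // load a backend from a shared object at runtime (path is illustrative)
    ggml_backend_reg_t reg = ggml_backend_load("./libggml-cuda.so");
    if (!reg) {
        std::fprintf(stderr, "failed to load backend\n");
        return 1;
    }
    std::printf("loaded backend: %s\n", ggml_backend_reg_name(reg));
    return 0;
}
```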
Diego Devesa
2a4b5c9d7e
cuda : optimize argmax (llama/10441)
...
* cuda : optimize argmax
* remove unused parameter
ggml-ci
* fixup : use full warps
ggml-ci
* Apply suggestions from code review
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* fix ub
* ggml : check ne00 <= INT32_MAX in argmax and argsort
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-12-08 20:14:35 +02:00
Johannes Gäßler
98f9916c9f
ggml-opt: fix data corruption (ggml/1022)
2024-12-08 20:14:35 +02:00
Georgi Gerganov
75670ae673
ggml : fix compile warnings (llama/0)
...
ggml-ci
2024-11-20 21:00:08 +02:00
Johannes Gäßler
c9541741e6
ggml: new optimization interface (ggml/988)
...
* ggml: new optimization interface
remove test2.c, test3.c
store adamw params in tensor
move grads from tensor to graph
* avoid segfault upon API misuse
* add ggml-opt.h to public headers
* remove dependence of ggml-opt.cpp on ggml-cpu.h
2024-11-20 21:00:08 +02:00
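For reference, the AdamW step that "store adamw params in tensor" refers to, in its standard formulation (not transcribed from the commit); $g_t$ is the gradient and $\beta_1, \beta_2, \eta, \epsilon, \lambda$ are the stored hyperparameters:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2

\hat m_t = \frac{m_t}{1-\beta_1^t} \qquad
\hat v_t = \frac{v_t}{1-\beta_2^t}

\theta_t = \theta_{t-1} - \eta \left( \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} + \lambda\, \theta_{t-1} \right)
```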
slaren
7e86030d4d
ggml : fix some build issues
2024-11-20 21:00:08 +02:00
Diego Devesa
746bf2596f
ggml : build backends as libraries (llama/10256)
...
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-20 21:00:08 +02:00
Georgi Gerganov
d0b8335789
metal : optimize FA kernels (llama/10171)
...
* ggml : add ggml_flash_attn_ext_get_prec
* metal : use F16 precision in FA kernels
ggml-ci
* metal : minor clean-up
* metal : compile-guard bf16 FA kernels
ggml-ci
* build : remove obsolete compile flag [no ci]
* metal : prevent int overflows [no ci]
* cuda : disable BF16 FA
ggml-ci
* metal : fix BF16 requirement for FA kernels
ggml-ci
* make : clean-up [no ci]
2024-11-15 15:21:04 +02:00
Zhiyuan Li
42398f13b0
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration (llama/10133)
...
* rwkv6: rename to wkv6
* rwkv6: support avx2 avx512 armv8 armv9
* rwkv6: update cuda file name
* rwkv6: rename params
* wkv on sycl
* sycl: add some ops
* sycl: Enhance OP support judgment
* wkv6: drop armv9 and transfer to GGML style
ggml-ci
* sync : ggml
* update the function to use appropriate types
* fix define error
* Update ggml/src/ggml-cpu.c
* add appropriate asserts
* move element-wise functions outside
* put the declaration outside the loop
* rewrite to be more in line with the common pattern for distributing threads
* use recommended way GGML_TENSOR_LOCALS
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Plamen Minev <pacominev@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-11-15 15:21:04 +02:00
Georgi Gerganov
d111a0987e
ggml : adjust is_first_call init value (llama/10193)
...
ggml-ci
2024-11-15 15:21:04 +02:00
Diego Devesa
f69c8b6f1b
ggml : fix arch check in bf16_to_fp32 (llama/10164)
2024-11-15 15:21:04 +02:00
Diego Devesa
25da30bd60
ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (llama/10167)
2024-11-15 15:21:04 +02:00
Diego Devesa
9c817edb48
ggml : move CPU backend to a separate file (llama/10144)
2024-11-15 15:21:04 +02:00
Georgi Gerganov
0665168ef3
ggml : remove ggml_scratch (llama/10121)
...
ggml-ci
2024-11-15 15:21:04 +02:00
Diego Devesa
3e231ab9cc
llama : fix buffer checks for mamba and rwkv (llama/10111)
...
* llama : fix buffer checks for mamba and rwkv
* llama : fix missing worst case flag during reserve
* cuda : fix supports_op for norm
* disable sched SET_CAUSE
2024-11-15 15:21:04 +02:00
Diego Devesa
371bfaca8c
ggml : check tensor name lengths in gguf files (llama/10100)
2024-11-15 15:21:04 +02:00