Commit Graph

1543 Commits

Author SHA1 Message Date
Justine Tunney
7f78675008
examples : use colorblind friendly TTY color scheme (#2360)
This change updates the -pc flag, so that a new xterm256 color scheme is
used. This color scheme is believed to be better for three reasons:

1. It should be friendlier to the colorblind. The scheme was designed by
   Paul Tol (see: https://personal.sron.nl/~pault/). TensorBoard uses it
   since 2017, so it's already popular in the machine learning community

2. It should appear to be the same colors as before to people who aren't
   i.e. it's still a red-green spectrum like before but lightly modified

3. It is readable in both white and black background terminals. The neon
   colors before were probably a bit too intense for white backgrounds.
2024-08-20 10:49:10 +03:00
Georgi Gerganov
22fcd5fd11
sync : ggml 2024-08-12 11:59:15 +03:00
Salvatore Mesoraca
993f0df419
ggml : support forward pass broadcasting in ggml_sub (ggml/914)
* ggml: support forward pass broadcasting in ggml_sub

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32

The check is already performed in ggml_sub_impl

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

---------

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-08-12 11:58:49 +03:00
slaren
9b1788483c
metal : fix uninitialized abort_callback (llama/8968) 2024-08-12 11:58:49 +03:00
Georgi Gerganov
ad37d26983
rpc : sanitize tensor data + warnings (llama/0)
Co-authored-by: slaren <slarengh@gmail.com>
2024-08-12 11:58:46 +03:00
Mengqing Cao
81c999fe0a
cann : add Ascend NPU support (#2336)
* enable Ascend NPU in src/whisper.cpp
  * sync test-backend-ops with llama.cpp
2024-08-09 15:21:56 +03:00
Georgi Gerganov
4b7de08bfd whisper : fix compile warning (#0) 2024-08-09 09:58:16 +03:00
Georgi Gerganov
4b9c4de1ad sync : ggml 2024-08-09 09:58:16 +03:00
hipudding
be88ee1d75 ggml : add CANN backend (llama/0)
ggml-ci
2024-08-09 09:58:16 +03:00
Georgi Gerganov
3ab19c744e scripts : sync cann 2024-08-09 09:58:16 +03:00
Georgi Gerganov
6eac06759b ci : disable ruby workflow (#0) 2024-08-08 22:48:46 +03:00
Georgi Gerganov
2e9a5bd2c4 ci : try to fix FreeBSD (#0) 2024-08-08 22:48:46 +03:00
Georgi Gerganov
58323bf8ed build : fix aarch64 (#0) 2024-08-08 22:48:46 +03:00
Georgi Gerganov
22058f2dbc talk-llama : sync llama.cpp 2024-08-08 22:48:46 +03:00
Georgi Gerganov
5b7979a1e6 sync : ggml 2024-08-08 22:48:46 +03:00
slaren
ee14c02365 ggml-backend : fix async copy from CPU (llama/8897)
* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same
2024-08-08 22:48:46 +03:00
Ouadie EL FAROUKI
ab39dd34e1 Updated SYCL device filtering (llama/8901)
* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme
2024-08-08 22:48:46 +03:00
Johannes Gäßler
b1348d3530 CUDA/HIP: fix tests/test-backend-ops (llama/8896) 2024-08-08 22:48:46 +03:00
Johannes Gäßler
90641b5cf4 CUDA: fix padding logic for FP16/FP32 (llama/8884) 2024-08-08 22:48:46 +03:00
Molly Sophia
4160b930f1 ggml : add epsilon as a parameter for group_norm (llama/8818)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-08 22:48:46 +03:00
Justine Tunney
7a96e661e4 ggml : fix overflows in elu function (llama/8866)
It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.
2024-08-08 22:48:46 +03:00
jdomke
a902fb4ab2 ggml : reading the runtime sve config of the cpu (llama/8709)
* ggml : reading the runtime sve config of the cpu

* change to one time init to prevent performance drop

* prefix variable to avoid possible conflicts

* revert xxhash fix and add brackets

---------

Co-authored-by: domke <673751-domke@users.noreply.gitlab.com>
2024-08-08 22:48:46 +03:00
Sigbjørn Skjæret
6cb38c3673 Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-08-08 22:48:46 +03:00
Ouadie EL FAROUKI
9cf14ebcbc Fixing wrong VDR iq4nl value (llama/8812) 2024-08-08 22:48:46 +03:00
matteo
8e39ee171f ggml-cuda: Adding support for unified memory (llama/8035)
* Adding support for unified memory

* adding again the documentation about unified memory

* refactoring: Moved the unified memory code in the correct location.

* Fixed compilation error when using hipblas

* cleaning up the documentation

* Updating the documentation

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <matteo.serva@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-08-08 22:48:46 +03:00
Alex O'Connell
d26250f78c Build: Only include execinfo.h on linux systems that support it (llama/8783)
* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one
2024-08-08 22:48:46 +03:00
slaren
5218ea21b8 cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)
* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X

* update asserts

* only use dmmv for supported types

* add test
2024-08-08 22:48:46 +03:00
l3utterfly
e60be821ce added android implementation of ggml_print_backtrace_symbols (llama/8751)
* added android implementation of ggml_print_backtrace_symbols

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-08-08 22:48:46 +03:00
wangshuai09
19708df884 cann: update cmake (llama/8765) 2024-08-08 22:48:46 +03:00
zhentaoyu
3f190addda Add TIMESTEP_EMBEDDING OP (llama/8707)
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-08-08 22:48:46 +03:00
CarterLi999
b355ee7cfa ggml: bugfix: fix the inactive elements is agnostic for risc-v vector (llama/8748)
In these codes, we want to retain the value that they previously held
when mask[i] is false. So we should use undisturbed. With the default
agnostic policy of rvv intrinsic, these values can be held or be
written with 1s.

Co-authored-by: carter.li <carter.li@starfivetech.com>
2024-08-08 22:48:46 +03:00
R0CKSTAR
49ac8872b4 cuda : organize vendor-specific headers into vendors directory (llama/8746)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-08-08 22:48:46 +03:00
Meng, Hengyu
8ef98ae7e3 add conv support (llama/8688) 2024-08-08 22:48:46 +03:00
R0CKSTAR
e471adcfa5 feat: Support Moore Threads GPU (llama/8383)
* Update doc for MUSA

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* CUDA => MUSA

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Fix CI build failure

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-08-08 22:48:46 +03:00
Borislav Stanimirov
aa816c922c ggml : ignore more msvc warnings (ggml/906) 2024-08-08 22:48:46 +03:00
Georgi Gerganov
b3264eb266 metal : fix struct name (ggml/912)
ggml-ci
2024-08-08 22:48:46 +03:00
Conrad Kramer
eb2eb87a58 metal : add abort callback (ggml/905) 2024-08-08 22:48:46 +03:00
0cc4m
83fcb0e486 vulkan : implement Stable Diffusion operators (ggml/904)
* Fix Vulkan repeat op

* Implement Vulkan concat op

* Delete old Vulkan shader generator

* Implement Vulkan im2col op

* Implement Vulkan unary gelu_quick op

* Implement Vulkan group_norm op

* Implement Vulkan timestep_embedding op

* Implement Vulkan upscale op

* Fix Vulkan vk_context tensor extra index issue

* Fix Vulkan matmul shader parameter bug

* Properly fix Vulkan matmul shader parameter bug

* Add Vulkan ADD f16 + f32 -> f16 operator support

* Implement Vulkan tanh op

* Fix Vulkan group count too large Validation error on non-Nvidia GPUs

* Throw error when too much memory is requested

* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs

* Fix matmul MMQ condition

* Implement Vulkan pad op

* Fix Vulkan crash when tensor is used multiple times in a compute graph

* Add Vulkan CONCAT f16 + f16 -> f16 op

* Add Vulkan LEAKY_RELU op
2024-08-08 22:48:46 +03:00
Daniel Bevenius
f7bb412878 ggml : move c parameter comment to ggml_rope_ext (ggml/901)
This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-08-08 22:48:46 +03:00
Georgi Gerganov
ef6dcf0d0c ggml : resolve sync conflicst (ggml/0)
ggml-ci
2024-08-08 22:48:46 +03:00
Georgi Gerganov
c7ea4fd235 common : handle new quant types (ggml/0) 2024-08-08 22:48:46 +03:00
Dibakar Gope
525f190917 ggml : add ggml-aarch64 (ggml/0) 2024-08-08 22:48:46 +03:00
slaren
dd916a2852 ggml : reduce hash table reset cost (llama/8698)
* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string
2024-08-08 22:48:46 +03:00
DavidKorczynski
0620fe00ec ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_on_alloc`.

This fixes it by bailing out if no context is found.
2024-08-08 22:48:46 +03:00
Chen Xi
31d0a9a14f fix multi-gpu issue on sycl (llama/8554)
---------

Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-08-08 22:48:46 +03:00
Georgi Gerganov
c06970dd72 ggml : add and use ggml_cpu_has_llamafile() (llama/8664) 2024-08-08 22:48:46 +03:00
Joe Todd
7598acf525 Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667) 2024-08-08 22:48:46 +03:00
Joe Todd
43ddfce969 sycl : Add support for non-release DPC++ & oneMKL (llama/8644)
* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-08-08 22:48:46 +03:00
0cc4m
a7e6d2cd9c Vulkan IQ4_NL Support (llama/8613)
* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
2024-08-08 22:48:46 +03:00
Jeroen Mostert
86506b0c5c Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)
The check gating the use of `__builtin_amdgc_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
2024-08-08 22:48:46 +03:00