Georgi Gerganov
feac80dd3f
ggml : fix UB (int << 31)
2023-04-30 22:27:30 +03:00
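Note on the "int << 31" fix above: in C99/C11, left-shifting the signed constant 1 by 31 bits is undefined behaviour when int is 32 bits wide, because the result is not representable in int. The usual remedy is to shift an unsigned value instead. The sketch below illustrates that idea only; the helper name bit_mask_u32 is hypothetical and does not come from ggml, and the actual change in this commit may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of ggml): returns a 32-bit mask with only
 * bit `n` set, for 0 <= n <= 31. */
static inline uint32_t bit_mask_u32(unsigned n) {
    assert(n < 32);
    /* `1 << 31` would shift a signed int into its sign bit, which is
     * undefined behaviour for 32-bit int. Shifting a uint32_t is well defined. */
    return (uint32_t)1 << n;
}
```

With this form, bit_mask_u32(31) yields 0x80000000 without relying on how a particular compiler treats signed overflow.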
Georgi Gerganov
794b162a46
whisper : add integer quantization support (#540)
* whisper : add integer quantization support
* examples : add common-ggml + prepare to add "quantize" tool
* whisper : quantization tool ready
* whisper : fix F32 support
* whisper : try to fix shared lib linkage
* wasm : update quantized models to Q5
* bench.wasm : remove "medium" button
* bench.wasm : fix custom model button
* ggml : add Q5_0 and Q5_1 WASM SIMD
* wasm : add quantized models to all WASM examples
* wasm : bump DB version number to 2
* talk-llama : update example to latest llama.cpp
* node : increase test timeout to 10s
* readme : add information for model quantization
* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
Georgi Gerganov
0ccd6746c9
ggml : fix WASM build
2023-04-29 21:37:23 +03:00
Georgi Gerganov
d9b550c0a1
ggml : fix 32-bit ARM NEON (#836)
* ggml : add support for 32-bit ARM
* ggml : fix
* ggml : fix
2023-04-29 21:33:33 +03:00
Georgi Gerganov
e9b091c92a
ggml : use vzip instead of vuzp for consistency
2023-04-29 21:14:09 +03:00
Georgi Gerganov
1f30b99208
ggml : fix WASM build
2023-04-29 20:21:25 +03:00
Georgi Gerganov
05c3ea3bc8
ggml : sync with ggml repo (warning fixes + asserts)
2023-04-29 19:33:28 +03:00
Georgi Gerganov
acec73ab6e
ggml : sync latest ggml + llama.cpp updates (quantization)
2023-04-29 12:32:28 +03:00
Jhen-Jie Hong
ea1f8a50d4
ggml, ci : fix build on whisper.android (ARM_NEON) + add CI (#764)
* ggml : fix undefined symbol by remove inline handle
* ggml : make own ggml_aligned_malloc function
* ci: add ios/android build
2023-04-15 14:21:58 +03:00
Georgi Gerganov
677ad754a0
ggml : sync latest ggml
2023-04-14 19:20:39 +03:00
novag
463e46338c
ggml : fix q4_1 dot product types (#759)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-14 13:34:20 +03:00
Georgi Gerganov
2f889132c6
ggml : sync latest changes from ggml and llama.cpp
2023-04-13 18:53:44 +03:00
Georgi Gerganov
ebef1e8620
ggml : fix WASM build
2023-04-10 23:18:29 +03:00
Georgi Gerganov
69b8503935
ggml : backport llama.cpp updates (close #709)
- About x2 overall performance improvement on Apple Silicon
- Results should now be the same for different number of threads (not tested)
2023-04-10 22:28:54 +03:00
Georgi Gerganov
4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp (#664)
* talk-llama : talk with LLaMA AI
* talk.llama : disable EOS token
* talk-llama : add README instructions
* ggml : fix build in debug
2023-03-27 21:00:32 +03:00
Georgi Gerganov
f3ee4a9673
whisper : reduce memory usage during inference (#431)
* ggml : add "scratch" buffer support
* ggml : support for scratch ring-buffer
* ggml : bug fix in ggml_repeat()
* ggml : error on scratch buffer overflow
* whisper : use scratch buffers during inference (base model only)
* whisper : update memory usage for all models
* whisper : fix encoder memory usage
* whisper : use whisper_context functions instead of macros
* whisper : fix FF + remove it from README
* ggml : reuse ggml_new_i32
* ggml : refactor the scratch buffer storage
* whisper : reorder scratch buffers in the decoder
* main : add option to disable temp fallback
* Update README.md
2023-02-04 09:45:52 +02:00
fitzsim
ae16c21e9c
whisper : PPC64 big-endian support (#398)
* ggml : set cache line size to 128 on POWER9
* whisper : add PPC64 big endian support
2023-01-23 20:48:10 +02:00
Georgi Gerganov
1290fc6457
bench : add memcpy and ggml_mul_mat benchmarks
2023-01-18 20:31:46 +02:00
Georgi Gerganov
4ef3398e8f
ggml : remove obsolete zeroing + comment fixes (#390)
2023-01-08 20:21:03 +02:00
Abitofevrything
8d7b29cedd
ggml : correct behaviour of ggml_vec_sum_f32 (#390)
2023-01-08 20:06:09 +02:00
Georgi Gerganov
52a3e0c92a
ggml : improve vec_dot_f16 unrolling in flash_attn_f16
2023-01-08 11:41:18 +02:00
Georgi Gerganov
f30b5d322c
ggml : fix bug in new soft max computation
2023-01-07 21:00:07 +02:00
Georgi Gerganov
d347a59a5f
ggml : when using BLAS start only 1 CPU thread
2023-01-07 19:48:56 +02:00
Georgi Gerganov
6394c906af
ggml : fix running tasks with variable number of threads
2023-01-07 19:20:18 +02:00
Georgi Gerganov
74ffa14e1d
ggml : unroll ggml_vec_dot_f16 in ggml_compute_forward_flash_attn_f16
2023-01-07 19:19:40 +02:00
Georgi Gerganov
65fdcbbbbb
whisper : revert accidental MB change
2023-01-07 16:18:21 +02:00
Georgi Gerganov
d61d55cd4b
ggml : speed-up soft max via Accelerate + unroll
2023-01-07 16:16:42 +02:00
Georgi Gerganov
d51fc3ee0a
ggml : use vDSP_sve and vDSP_maxv from Accelerate
2023-01-07 16:10:16 +02:00
Georgi Gerganov
f82a7dd019
ggml : make gcc happy (minor)
2023-01-07 09:34:39 +02:00
Abitofevrything
a62170c656
ggml : add SSE3 and fp16 conversion lookup table (#368)
* Improves WASM performance:
On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome
* Add support for SSE3 SIMD
* Add SSE3 to system information
* Add Imath support for fp16-fp32 conversions
* Add Imath to system information
* Wrap Imath calls to avoid static function warnings
* Drop Imath; Add lookup table for f16 -> f32 conversions
* Remove TODO comments
* Update SSE3 to new macro arguments
* Correct updated macro definitions
* Prefer static inline where possible
* ggml : static inlines + add public f16 <-> f32 conversions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-06 18:45:59 +02:00
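Note on the f16 -> f32 lookup table mentioned above: a half-precision value has only 65536 possible bit patterns, so every conversion can be precomputed once and later served by a single array read. The following is a minimal sketch of that technique, not the code from this commit; all names (f16_table, f16_table_init, f16_to_f32, f16_to_f32_slow) are hypothetical, and it assumes float is IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; assumes `float` is IEEE 754 binary32. */
static float f16_table[1 << 16];   /* one entry per half-precision bit pattern */
static int   f16_table_ready = 0;

/* Bit-level conversion of an IEEE 754 binary16 pattern to float. */
static float f16_to_f32_slow(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                          /* signed zero */
        } else {
            /* Subnormal half: shift the mantissa up until its implicit
             * leading bit appears, adjusting the exponent accordingly. */
            int e = -1;
            do { man <<= 1; e++; } while ((man & 0x400u) == 0);
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);  /* infinity or NaN */
    } else {
        bits = sign | ((exp + (127 - 15)) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);                  /* type-pun without aliasing UB */
    return f;
}

/* Fill the table once, e.g. at program start. */
static void f16_table_init(void) {
    for (uint32_t i = 0; i < (1u << 16); i++) {
        f16_table[i] = f16_to_f32_slow((uint16_t)i);
    }
    f16_table_ready = 1;
}

/* Fast path: a single array read per conversion. */
static inline float f16_to_f32(uint16_t h) {
    assert(f16_table_ready);
    return f16_table[h];
}
```

The table costs 256 KiB of memory (65536 entries of 4 bytes), which is the trade-off for avoiding the bit manipulation on every conversion on targets without native fp16 support.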
Thomas Fitzsimmons
1944e7c33e
whisper : document POWER VSX support
2023-01-05 23:53:00 +02:00
Thomas Fitzsimmons
49a8dd6732
ggml : reorganize POWER9 ppc64le SIMD code
2023-01-05 23:53:00 +02:00
Thomas Fitzsimmons
8c7f642286
ggml : change f16 load and store macro arguments
2023-01-05 23:53:00 +02:00
Georgi Gerganov
0a0cfa7985
ggml : add void to argument-less functions
2023-01-05 21:40:38 +02:00
Georgi Gerganov
d51c5eb906
ggml : define MIN / MAX only if not defined (minor)
2023-01-05 21:16:52 +02:00
Thomas Fitzsimmons
424c410c42
ggml : improve f16 acceleration for POWER9 ppc64le
2022-12-31 10:02:19 +02:00
Georgi Gerganov
4e0b2069e7
ggml : barrier refactor + static functions
2022-12-28 19:00:53 +02:00
Georgi Gerganov
ac521a566e
ggml : simplify the SIMD code (#324)
* ggml : simplify the SIMD code
* ggml : generic reduce for all register sizes + comments
2022-12-24 10:22:28 +02:00
Georgi Gerganov
7282e2109e
ggml : use vaddvq_f32 for slightly more efficient reduce
2022-12-23 13:48:19 +02:00
Thomas Fitzsimmons
466ceebb78
ggml : add f16 acceleration for POWER9 ppc64le
2022-12-23 13:23:58 +02:00
Andy Maloney
493d94130d
ggml : make consts static (#317)
These shouldn't be able to be referenced outside the compilation unit.
2022-12-23 11:05:27 +02:00
Andy Maloney
fa463313ad
minor : small code cleanups (#302)
* Small code cleanups
- fix indentation
- remove extra semicolons
- remove extra break after returns in case statements
- remove unnecessary call to .data() on string
- use empty() instead of checking size()
- no need to check for nullptr before free
- remove unnecessary initialization of string to ""
* minor : switch case always break
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-22 17:06:19 +02:00
Kevin Brothaler
e1432dd91a
Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a.
Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
2022-12-22 16:47:54 +02:00
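Note on the __ARM_NEON / __ARM_FEATURE_FMA check above: the idea is to take the fused multiply-add NEON path only when the compiler reports both features, and to fall back to portable code otherwise (for example on armeabi-v7a without -mfpu=neon-fp-armv8). The sketch below shows one way such a guard could look. The macro HAVE_NEON_FMA and the function dot_f32 are hypothetical and are not taken from ggml.

```c
#include <assert.h>
#include <stddef.h>

/* Use the FMA NEON path only when BOTH feature macros are defined. */
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FMA)
#include <arm_neon.h>
#define HAVE_NEON_FMA 1   /* hypothetical macro name */
#else
#define HAVE_NEON_FMA 0
#endif

/* Hypothetical example function: dot product of two float arrays. */
static float dot_f32(const float *x, const float *y, size_t n) {
    assert(x != NULL && y != NULL);
    float  sum = 0.0f;
    size_t i   = 0;

#if HAVE_NEON_FMA
    /* Process 4 floats per iteration with fused multiply-add. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        acc = vfmaq_f32(acc, vld1q_f32(x + i), vld1q_f32(y + i));
    }
    /* Lane-wise reduction that is available on both armv7 and armv8. */
    sum = vgetq_lane_f32(acc, 0) + vgetq_lane_f32(acc, 1)
        + vgetq_lane_f32(acc, 2) + vgetq_lane_f32(acc, 3);
#endif

    /* Scalar tail, and the complete fallback when the FMA path is compiled out. */
    for (; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Because the decision is made at compile time, no runtime CPU detection is needed; the cost is that an armv7a build without the FMA flag always takes the scalar path in this sketch.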
katsu560
419b8a6402
Add AVX,AVX2 support for ggml_vec_scale_f32
2022-12-17 19:40:10 +02:00
Georgi Gerganov
a7047b2a28
ggml : implement ggml_compute_forward_dup_f16() special cases
2022-12-16 21:50:41 +02:00
Georgi Gerganov
0f11759406
ggml : make more compatible with c99 (#262)
2022-12-16 18:00:12 +02:00
Georgi Gerganov
f66ac6dc4f
ggml : fix indentation
2022-12-13 23:09:21 +02:00
Georgi Gerganov
9955fa4ed7
ggml : make compatible with c99 (#262)
2022-12-13 23:07:49 +02:00
Roland Rabien
e70d47baab
Remove C++20 requirement (#257)
* Remove C++20 requirement
* Roll back C features not supported in VS2017
2022-12-11 20:03:07 +02:00
Georgi Gerganov
3b1aacbe6d
talk : talk with AI in the terminal
2022-12-10 16:51:58 +02:00