Georgi Gerganov
933c5bef97
whisper : support ggml_conv with CUDA and Metal (#1473)
* ggml : add CUDA support for ggml_conv
* whisper : remove ggml_repeat for conv bias + single backend
* cuda : fix im2col kernel
* metal : add im2col support + mul mat-vec f16 x f16
* bench-all : add q4 models
2023-11-10 22:26:50 +02:00
Georgi Gerganov
f96e1c5b78
sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) (#1422)
* sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.)
* metal : allow env metal variable to override resource path (#1415)
* Allow env variable to override resource path
* Update ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* sync : restore common / main from `master`
* sync : restore whisper from `master`
* talk-llama : update to latest llama.cpp
* ruby : fix build
* ggml : fix 32-bit ARM build
* ggml : fix MIN / MAX macro collisions + update ios bindings
* ggml : fix ifdefs and MIN / MAX again
* examples : fix Obj-C and Swift examples
* ggml : fix 32-bit ARM compatibility
* ggml : one more attempt to fix 32-bit ARM compat
* whisper : fix support for larger graphs
---------
Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>
2023-11-03 21:35:05 +02:00
Georgi Gerganov
80c1512fd5
sync : ggml (const correctness)
2023-09-15 14:49:56 +03:00
Georgi Gerganov
b8432f28f4
metal : add F32 support + update bench output
2023-09-15 13:56:08 +03:00
Georgi Gerganov
c3f319d7c2
ggml : sync latest llama.cpp (view_src + alloc improvements) (#1247)
* ggml : sync latest llama.cpp (view_src + alloc improvements)
* ggml : fix build
2023-09-05 20:57:27 +03:00
Georgi Gerganov
59a3d0cb57
ggml : sync (ggml-alloc, GPU, eps, etc.) (#1220)
* ggml : sync (ggml-alloc, GPU, eps, etc.)
* ggml : fix build
* wasm : fix build
2023-09-05 13:54:40 +03:00
Przemysław Pawełczyk
601c2d2181
ggml : detect SSSE3 (#1211)
* ggml : add ggml_cpu_has_ssse3
* whisper : show SSSE3 in system info
* make : detect SSSE3 via cpuinfo
2023-08-27 21:36:41 +03:00
Georgi Gerganov
d6509bf78d
ggml : sync latest repo (mostly refactoring changes)
2023-07-02 21:46:09 +03:00
Georgi Gerganov
5feb0dffba
ggml : sync latest ggml lib
2023-06-25 14:30:44 +03:00
Georgi Gerganov
e410cfc3ce
ggml : sync latest ggml repo
- new Q4 and Q8 quantization
- updated CUDA
2023-05-20 18:56:30 +03:00
Georgi Gerganov
e693074aa6
ggml : sync latest ggml
- New Q4 and Q5 formats
- Various improvements
2023-05-14 18:04:23 +03:00
Georgi Gerganov
0bcb64b184
ggml : sync ggml (clBLAST + tensor names)
2023-05-02 21:24:18 +03:00
Georgi Gerganov
794b162a46
whisper : add integer quantization support (#540)
* whisper : add integer quantization support
* examples : add common-ggml + prepare to add "quantize" tool
* whisper : quantization tool ready
* whisper : fix F32 support
* whisper : try to fix shared lib linkage
* wasm : update quantized models to Q5
* bench.wasm : remove "medium" button
* bench.wasm : fix custom model button
* ggml : add Q5_0 and Q5_1 WASM SIMD
* wasm : add quantized models to all WASM examples
* wasm : bump DB version number to 2
* talk-llama : update example to latest llama.cpp
* node : increase test timeout to 10s
* readme : add information for model quantization
* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
Georgi Gerganov
05c3ea3bc8
ggml : sync with ggml repo (warning fixes + asserts)
2023-04-29 19:33:28 +03:00
Georgi Gerganov
acec73ab6e
ggml : sync latest ggml + llama.cpp updates (quantization)
2023-04-29 12:32:28 +03:00
Georgi Gerganov
677ad754a0
ggml : sync latest ggml
2023-04-14 19:20:39 +03:00
Georgi Gerganov
2f889132c6
ggml : sync latest changes from ggml and llama.cpp
2023-04-13 18:53:44 +03:00
Georgi Gerganov
69b8503935
ggml : backport llama.cpp updates (close #709)
- About 2x overall performance improvement on Apple Silicon
- Results should now be the same for different numbers of threads (not tested)
2023-04-10 22:28:54 +03:00
Georgi Gerganov
4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp (#664)
* talk-llama : talk with LLaMA AI
* talk-llama : disable EOS token
* talk-llama : add README instructions
* ggml : fix build in debug
2023-03-27 21:00:32 +03:00
Georgi Gerganov
f3ee4a9673
whisper : reduce memory usage during inference (#431)
* ggml : add "scratch" buffer support
* ggml : support for scratch ring-buffer
* ggml : bug fix in ggml_repeat()
* ggml : error on scratch buffer overflow
* whisper : use scratch buffers during inference (base model only)
* whisper : update memory usage for all models
* whisper : fix encoder memory usage
* whisper : use whisper_context functions instead of macros
* whisper : fix FF + remove it from README
* ggml : reuse ggml_new_i32
* ggml : refactor the scratch buffer storage
* whisper : reorder scratch buffers in the decoder
* main : add option to disable temp fallback
* Update README.md
2023-02-04 09:45:52 +02:00
Abitofevrything
a62170c656
ggml : add SSE3 and fp16 conversion lookup table (#368)
* Improves WASM performance: on a MacBook M1 Pro, I observe 25% faster in Firefox and 35% faster in Chrome
* Add support for SSE3 SIMD
* Add SSE3 to system information
* Add Imath support for fp16-fp32 conversions
* Add Imath to system information
* Wrap Imath calls to avoid static function warnings
* Drop Imath; Add lookup table for f16 -> f32 conversions
* Remove TODO comments
* Update SSE3 to new macro arguments
* Correct updated macro definitions
* Prefer static inline where possible
* ggml : static inlines + add public f16 <-> f32 conversions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-06 18:45:59 +02:00
Thomas Fitzsimmons
1944e7c33e
whisper : document POWER VSX support
2023-01-05 23:53:00 +02:00
Georgi Gerganov
ac521a566e
ggml : simplify the SIMD code (#324)
* ggml : simplify the SIMD code
* ggml : generic reduce for all register sizes + comments
2022-12-24 10:22:28 +02:00
Kevin Brothaler
e1432dd91a
Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a.
Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
2022-12-22 16:47:54 +02:00
Georgi Gerganov
0f11759406
ggml : make more compatible with c99 (#262)
2022-12-16 18:00:12 +02:00
Georgi Gerganov
f8ec718b76
ggml : add F16C CPU flag check
2022-12-06 21:56:56 +02:00
katsu560
83456076f0
add AVX support
2022-11-23 22:16:33 +02:00
Georgi Gerganov
3500ce8727
ref #40 : start working on the documentation
2022-11-09 21:41:40 +02:00
Georgi Gerganov
0b2dc3c82c
parallel : working
2022-10-29 19:37:19 +03:00
Georgi Gerganov
34bb3ab0cf
ggml : add system info functions
2022-10-25 20:53:48 +03:00
Borislav Stanimirov
0b45d25151
Building with MSVC
2022-10-11 21:40:46 +03:00
Georgi Gerganov
167324584b
wip : rpi4 support
2022-10-05 23:03:46 +03:00
Georgi Gerganov
f888c2373d
Flash + language support (ref #2)
- Achieved big performance improvement + memory usage reduction
- Can now translate / transcribe different languages
2022-09-28 21:07:32 +03:00
Georgi Gerganov
b0a11594ae
Initial release
2022-09-25 22:13:49 +03:00