Georgi Gerganov
3b8c2dff57
talk-llama : sync latest llama.cpp
2024-01-06 17:22:57 +02:00
Georgi Gerganov
0b9af32a8b
release : v1.5.4
2024-01-05 17:11:27 +02:00
Erik Scholz
11b1b63b14
fix : cuda order of synchronization when setting a buffer (ggml/679)
...
* fix : cuda order of synchronization when setting a buffer
* also sync before memcpy
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-01-05 17:01:59 +02:00
Georgi Gerganov
0e26a6c92e
metal : switch back to default.metallib (ggml/681)
...
ggml-ci
2024-01-05 16:31:30 +02:00
Georgi Gerganov
66d8f0b7f1
ggml : fix q2_k bpw in comments (ggml/680)
2024-01-05 16:31:20 +02:00
Yajing Tang
ba5bcde874
coreml : fix ANE optimized encoder ( #1716 )
2024-01-04 16:28:30 +02:00
Georgi Gerganov
ab0a8593c5
whisper.swiftui : add .gitignore
2024-01-04 15:00:27 +02:00
Georgi Gerganov
668ffc9b23
whisper : reset the "batched" timings ( #1721 )
2024-01-04 13:38:39 +02:00
Georgi Gerganov
9962371f71
release : v1.5.3
2024-01-03 19:36:33 +02:00
Ashraful Islam
993acb5d41
swift : update Package.swift to use ggml as package dependency ( #1701 )
...
* updates Package.swift to use ggml as dependency
* cleans up the Package.swift file by removing redundant source files
* updates ggml url src to ggerganov
2024-01-03 19:30:26 +02:00
Finn Voorhees
a3d0aa73d1
ggml : add error handling to graph_compute ( #1714 )
2024-01-03 15:39:43 +02:00
Georgi Gerganov
14c57952f7
cuda : simplify expression
...
Co-authored-by: slaren <slarengh@gmail.com>
2024-01-03 14:43:51 +02:00
Georgi Gerganov
6c369d6788
cuda : mark I16 and I32 ops as unsupported
...
ggml-ci
2024-01-03 14:43:51 +02:00
Georgi Gerganov
4cdd9aad9b
metal : add kernel_get_rows_i32
...
ggml-ci
2024-01-03 14:43:51 +02:00
Georgi Gerganov
f38c057503
metal : optimize ggml_mul_mat_id (faster Mixtral PP) (llama/4725)
...
* ggml : disable fast-math for Metal (cmake build only)
ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (llama/4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
* metal : optimizing ggml_mul_mat_id (wip)
* metal : minor fix
* metal : opt mul_mm_id
2024-01-03 14:43:51 +02:00
Georgi Gerganov
1e5544b39b
metal : enable shader debugging (cmake option) (llama/4705)
...
* ggml : disable fast-math for Metal (cmake build only)
ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (llama/4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
ggml-ci
2024-01-03 14:43:51 +02:00
Georgi Gerganov
d5673af79f
ggml : add ggml_vdotq_s32 alias (llama/4715)
...
ggml-ci
2024-01-03 14:43:51 +02:00
Johannes Gäßler
a28dacec65
CUDA: fixed tensor cores not being used on RDNA3 (llama/4697)
2024-01-03 14:43:51 +02:00
automaticcat
dbe29d4e33
ggml : add ggml_cpu_has_avx_vnni() (llama/4589)
...
* feat: add avx_vnni based on intel documents
* ggml: add avx vnni based on intel document
* llama: add avx vnni information display
* docs: add more details about using oneMKL and oneAPI for intel processors
* Update ggml.c
Fix indentation update
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03 14:43:51 +02:00
Johannes Gäßler
fe3a67c546
CUDA: fix tensor core logic for Pascal and HIP (llama/4682)
2024-01-03 14:43:51 +02:00
hydai
b138ff2be3
cuda: fix vmm oom issue on NVIDIA AGX Orin (llama/4687)
...
Signed-off-by: hydai <hydai@secondstate.io>
2024-01-03 14:43:51 +02:00
Guillaume Wenzek
cf6f1e4181
ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639)
...
* add more int ops
* ggml_compute_forward_dup_bytes
* add tests
* PR comments
* tests : minor indentations
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03 14:43:51 +02:00
Georgi Gerganov
620a223814
scripts : fix sync order + metal sed
2024-01-03 14:43:51 +02:00
Andreu Huguet
f39f9690ec
examples : fix WASM Stack Overflow ( #1713 )
...
Fix for problem:
"""
RuntimeError: Aborted(Stack overflow! Stack cookie has been overwritten at 0x12be2b10, expected hex dwords 0x89BACDFE and 0x2135467, but received 0x00000000 0x00000000)
"""
That appears when executing the WASM example with the newer versions.
2024-01-02 16:50:04 +00:00
bobqianic
f9ca90256b
docker : fix the publishing of the CUDA Docker image ( #1704 )
2023-12-30 23:12:31 +02:00
Georgi Gerganov
2623640cd6
scripts : do not sync commits from this repo
2023-12-29 15:03:08 +02:00
Tamotsu Takahashi
d87de61ae6
ci : build with CLBlast + ggml-opencl use GGML_API ( #1576 )
...
* Build with CLBlast
* Declare GGML_API
After rebasing, examples/talk-llama failed:
"D:\a\whisper.cpp\whisper.cpp\build\ALL_BUILD.vcxproj" (build target) (1) ->
"D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj" (default target) (14) ->
(Link target) ->
llama.obj : error LNK2019: unresolved external symbol ggml_cl_free_data referenced in function "public: __cdecl llama_model::~llama_model(void)" (??1llama_model@@QEAA@XZ) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
llama.obj : error LNK2019: unresolved external symbol ggml_cl_transform_tensor referenced in function "public: void __cdecl llama_model_loader::load_all_data(struct ggml_context *,void (__cdecl*)(float,void *),void *,struct llama_mlock *)" (?load_all_data@llama_model_loader@@QEAAXPEAUggml_context@@P6AXMPEAX@Z1PEAUllama_mlock@@@Z) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
D:\a\whisper.cpp\whisper.cpp\build\bin\Release\talk-llama.exe : fatal error LNK1120: 2 unresolved externals [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
2023-12-29 12:23:27 +02:00
bobqianic
f5f485f899
whisper : replace tensor->n_dims with ggml_n_dims(tensor) ( #1694 )
2023-12-29 11:38:35 +02:00
Georgi Gerganov
e77b27c331
sync : ggml (VMM, sync-ggml-am, dotprod ARM fixes, CUDA fixes) ( #1691 )
...
* scripts : add sync-ggml-am.sh
* sync : ggml (VMM, ARM dot prod fix, etc.)
* build : fix CUDA build
* ggml : fix some mul mat cases + add tests for src1 F16
dbd02958fa
2023-12-29 11:30:47 +02:00
Dimo
a5cc3dc8a2
download : fix large q5 model name ( #1695 )
...
fixed typo in large-v3-q5-0 model name to match HF link
2023-12-29 11:14:32 +02:00
bobqianic
37a709f655
whisper : Replace WHISPER_PRINT_DEBUG with WHISPER_LOG_DEBUG ( #1681 )
2023-12-23 12:02:58 +00:00
Georgi Gerganov
3a5302108d
sync : ggml (ggml_scale, ggml_row_size, etc.) ( #1677 )
...
* sync : ggml
* sync : llama.cpp
* talk-llama : fix obsolete param
* ggml-alloc : fix ggml_tallocr_is_own
* talk.wasm : update to new ggml
* ggml : fix type punning in ggml_scale
* ggml : cuda jetson + arm quants warnings
2023-12-22 17:53:39 +02:00
Chaoqun
d2ee117a0a
docker : Dockerize whisper.cpp ( #1674 )
...
* build: add dockerfile for ci
* ci: add action to build/push docker image
* fix: lowercase repository to fix ci
* ci: update cuBLAS flag
* build: install curl and ffmpeg in image
* docs: add docker section
* fix: improve args check when downloading a model
2023-12-22 11:16:02 +00:00
bobqianic
db8ccdb850
CI : Add coverage for talk-llama when WHISPER_CUBLAS=1 ( #1672 )
2023-12-21 22:39:46 +00:00
bobqianic
d2419030b0
examples : Revert CMakeLists.txt for talk-llama ( #1669 )
2023-12-21 22:48:52 +02:00
bobqianic
8986690c2a
cmake : set default CUDA architectures ( #1667 )
2023-12-21 15:44:04 +02:00
Alfredo Montesinos
9286d3f584
bench.py : add different large models ( #1655 )
...
Add the different large models (v1, v2, v3) to the benchmark.
2023-12-19 12:40:14 +02:00
Georgi Gerganov
940de9dbe9
wchess : update README.md
2023-12-14 22:00:47 +02:00
Georgi Gerganov
88112c8afb
release : v1.5.2
2023-12-14 17:56:39 +02:00
Georgi Gerganov
375585c07c
wchess : update readme
2023-12-14 17:51:14 +02:00
fraxy-v
fd99ece8e3
wchess : whisper assisted chess ( #1595 )
...
* wchess: whisper assisted chess
* wchess: fix allowed moves in check
* wchess: touchstart, touchend events
* wchess: css, disabled button
* wchess : html touches
* wchess : minor fixes and code style
* wchess : bump encoder context to 1280
* wchess : index.html
* wchess : fix CI warnings
* wchess : add array header
* wchess : build static library
* wchess : display grammar
* wchess : update UX
* wchess : add comment
* wchess : add README
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 15:58:26 +02:00
Georgi Gerganov
8171e621fc
sync : ggml (Metal fixes, new ops, tests) ( #1633 )
...
* sync : ggml (Metal fixes, new ops, tests)
* cuda : fix bin bcast when src1 and dst have different types
2023-12-13 21:55:03 +02:00
Kreijstal
ec03661b20
cmake : target Windows 8 or above for prefetchVirtualMemory in talk-llama ( #1617 )
...
Since we use prefetchVirtualMemory, we specify that we target Windows 8 or above; otherwise other compilers will refuse to use the prefetchVirtualMemory API. (I understand you are loading it dynamically, but the header definition has this limitation.)
2023-12-12 11:35:00 +00:00
Kreijstal
6335933a5b
cmake : Fix bug in httplib.h for mingw ( #1615 )
...
Fix bug in httplib.h for mingw; please see https://github.com/yhirose/cpp-httplib/issues/1669
2023-12-10 17:47:52 +00:00
Finn Voorhees
885b5563d0
metal : fix ggml_metal_log vargs ( #1606 )
2023-12-08 13:50:50 +02:00
Georgi Gerganov
9521ba6801
whisper.objc : disable timestamps for real-time transcription
2023-12-08 13:43:37 +02:00
Georgi Gerganov
29511d33c7
whisper : more debug messages + fix fallback logic
2023-12-08 13:43:12 +02:00
Georgi Gerganov
7bc4d22337
metal : fix soft_max kernel src1 argument ( #1602 )
2023-12-08 13:39:32 +02:00
Georgi Gerganov
afce6fa113
sync : ggml (new ops, new backend, etc) ( #1602 )
...
* sync : ggml (new ops, new backend, etc)
* whisper : remove obsolete broadcasting code
* ggml : remove backend self-registers + fix ggml_concat + n_task logic
* metal : fix assert
* metal : print resource path
* whisper : fix bug if metal init fails
2023-12-07 22:27:19 +02:00
Oleg Sidorov
3163090d89
server : pass max-len argument to the server ( #1574 )
...
This commit fixes the missing parameter binding for max-len between the input
arguments and wparams.
2023-12-05 23:01:45 +02:00