* passing samples_padded by ref to the threads.
---------
Co-authored-by: Vinith Misra <physicsdemon@gmail.com>
* metal : support permuted matrix multiplications
ggml-ci
* cont : use nb01 directly for row steps
ggml-ci
* cont : add comments [no ci]
* metal : minor refactor
* metal : minor
* [CANN] Adapt to dynamically loadable backends mechanism
* Fix bug: inference output is garbled when running in debug mode for LM models whose type is in the Q4_0 class
* Handle the review comments of this pull request
add intel amx isa detection
add vnni kernel for gemv cases
add vnni and amx kernel support for block_q8_0
code cleanup
fix packing B issue
enable openmp
fine tune amx kernel
switch to aten parallel pattern
add error message for nested parallelism
code cleanup
add f16 support in ggml-amx
add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS
update CMakeLists
update README
fix some compilation warnings
fix compiler warning when amx is not enabled
minor change
ggml-ci
move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp
ggml-ci
update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16
ggml-ci
add amx as a ggml-backend
update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h
minor change
update CMakeLists.txt
minor change
apply weight prepacking in set_tensor method in ggml-backend
fix compile error
ggml-ci
minor change
ggml-ci
update CMakeLists.txt
ggml-ci
add march dependency
minor change
ggml-ci
change ggml_backend_buffer_is_host to return false for amx backend
ggml-ci
fix supports_op
use device reg for AMX backend
ggml-ci
minor change
ggml-ci
minor change
fix rebase
set .buffer_from_host_ptr to be false for AMX backend
* fix: use `vm_allocate` to allocate CPU backend buffer on macOS
* fix: switch to `posix_memalign` to keep existing `free()` usages working
* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS
* style: formatting
* fix: move const outside of `#ifndef`
* style: formatting
* fix: unused var
* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`
* fix: unused var
* fix: page align to `GGUF_DEFAULT_ALIGNMENT`
* fix: page align to `TENSOR_ALIGNMENT`
* fix: convert `TENSOR_ALIGNMENT` to a macro
* fix: increase page size to `32` on iOS
* fix: iOS page size
* fix: `hbw_posix_memalign` alignment
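
The allocation fixes above rely on the fact that memory returned by `posix_memalign` can be released with plain `free()`. A minimal sketch of that pattern follows; `example_aligned_malloc`, `example_aligned_free` and `EXAMPLE_ALIGNMENT` are hypothetical names chosen for illustration and are not the ggml functions or the real `TENSOR_ALIGNMENT` value.

```c
#define _POSIX_C_SOURCE 200112L /* make posix_memalign visible under strict -std modes */

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical alignment used only in this sketch; the actual value used by
   ggml is not stated in this log. */
#define EXAMPLE_ALIGNMENT 64

/* Allocate `size` bytes aligned to EXAMPLE_ALIGNMENT.
   Returns NULL if size is 0 or the allocation fails. */
static void *example_aligned_malloc(size_t size) {
    void *ptr = NULL;

    if (size == 0) {
        return NULL;
    }

    /* posix_memalign requires a power-of-two alignment that is also a
       multiple of sizeof(void *). */
    assert((EXAMPLE_ALIGNMENT & (EXAMPLE_ALIGNMENT - 1)) == 0);
    assert(EXAMPLE_ALIGNMENT % sizeof(void *) == 0);

    if (posix_memalign(&ptr, EXAMPLE_ALIGNMENT, size) != 0) {
        return NULL;
    }
    return ptr;
}

/* Memory from posix_memalign is released with the ordinary free(), so
   call sites that already use free() keep working. */
static void example_aligned_free(void *ptr) {
    free(ptr);
}
```

A platform-specific allocator such as `vm_allocate` would need a matching deallocation call instead of `free()`, which is one plausible reason for wrapping both operations in functions.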
* Vectorize load instructions in dmmv f16 CUDA kernel
Replaces scalar with vector load instructions, which substantially
improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall
speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on
H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup.
* addressed comment
* Update ggml/src/ggml-cuda/dmmv.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* ggml : do not use BLAS with types without to_float
* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies
* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits
it's not really internal if everybody uses it
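
The idea behind the type-traits changes above can be sketched roughly as follows: a getter that returns a pointer into a static table avoids copying the struct on every call, and a caller can check for a missing `to_float` before choosing a conversion-based path. All names here (`example_type_traits`, `example_get_type_traits`, `example_can_use_float_path`) are hypothetical stand-ins, not the ggml definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-type traits record. */
struct example_type_traits {
    const char *type_name;
    size_t      type_size;
    /* Converts n elements to float; NULL when the type has no conversion. */
    void      (*to_float)(const void *src, float *dst, size_t n);
};

enum example_type { EXAMPLE_TYPE_F32, EXAMPLE_TYPE_OPAQUE, EXAMPLE_TYPE_COUNT };

static void example_f32_to_float(const void *src, float *dst, size_t n) {
    const float *s = (const float *)src;
    for (size_t i = 0; i < n; i++) {
        dst[i] = s[i];
    }
}

static const struct example_type_traits example_traits[EXAMPLE_TYPE_COUNT] = {
    [EXAMPLE_TYPE_F32]    = { "f32",    sizeof(float), example_f32_to_float },
    [EXAMPLE_TYPE_OPAQUE] = { "opaque", 1,             NULL },
};

/* Returns a pointer into the static table instead of a struct copy. */
static const struct example_type_traits *example_get_type_traits(enum example_type type) {
    assert((int)type >= 0 && (int)type < EXAMPLE_TYPE_COUNT);
    return &example_traits[type];
}

/* A path that first converts operands to float is only usable when the
   type provides to_float. */
static int example_can_use_float_path(enum example_type type) {
    return example_get_type_traits(type)->to_float != NULL;
}
```

Returning `const struct example_type_traits *` also keeps the table read-only for callers, which fits a getter that is used widely rather than internally.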
* docs : clarify building Android on Termux
* docs : update building Android on Termux
* docs : add cross-compiling for Android
* cmake : link dl explicitly for Android
* ggml : add metal backend registry / device
ggml-ci
* metal : fix names [no ci]
* metal : global registry and device instances
ggml-ci
* cont : alternative initialization of global objects
ggml-ci
* llama : adapt to backend changes
ggml-ci
* fixes
* metal : fix indent
* metal : fix build when MTLGPUFamilyApple3 is not available
ggml-ci
* fix merge
* metal : avoid unnecessary singleton accesses
ggml-ci
* metal : minor fix [no ci]
* metal : g_state -> g_ggml_ctx_dev_main [no ci]
* metal : avoid reference of device context in the backend context
ggml-ci
* metal : minor [no ci]
* metal : fix maxTransferRate check
* metal : remove transfer rate stuff
---------
Co-authored-by: slaren <slarengh@gmail.com>
* Single allocation of encode_async block with non-ARC capture in ggml-metal.m
* Moving Block_release to the deallocation code
* Release encode block when re-setting encoding buffer count if needed
* Update ggml/src/ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This commit removes the buffer_id field from the leaf_alloc struct.
The motivation for this is that the field is only written to and never
read/used as far as I can tell. Each tensor_alloc has a buffer_id field
and this is what caused me to look into this more closely, to
understand what the buffer_id in leaf_alloc was used for.
* Add test for built package existence
* Add more tests for Whisper::Params
* Add more Whisper::Params attributes
* Add tests for callbacks
* Add progress and abort callback features
* [skip ci] Add prompt usage in README
* Change prompt text in example
* Add Params#new_segment_callback= method
* Add tests for Params#new_segment_callback=
* Group tests for #transcribe
* Don't use static for thread-safety
* Set new_segment_callback only when necessary
* Remove redundant check
* [skip ci] Add Ruby version README
* Revert "Group tests for #transcribe"
This reverts commit 71b65b00cc.
* Revert "Add tests for Params#new_segment_callback="
This reverts commit 81e6df3bab.
* Add test for Context#full_n_segments
* Add Context#full_n_segments
* Add tests for lang API
* Add lang API
* Add tests for Context#full_lang_id API
* Add Context#full_lang_id
* Add abnormal test cases for lang
* Raise appropriate errors from lang APIs
* Add tests for Context#full_get_segment_t{0,1} API
* Add Context#full_get_segment_t{0,1}
* Add tests for Context#full_get_segment_speaker_turn_next API
* Add Context#full_get_segment_speaker_turn_next
* Add tests for Context#full_get_segment_text
* Add Context#full_get_segment_text
* Add tests for Params#new_segment_callback=
* Run new segment callback
* Split tests to multiple files
* Use container struct for new segment callback
* Add tests for Params#new_segment_callback_user_data=
* Add Whisper::Params#new_segment_callback_user_data=
* Add GC-related test for new segment callback
* Protect new segment callback related structs from GC
* Add meaningful test for build
* Rename: new_segment_callback_user_data -> new_segment_callback_container
* Add tests for Whisper::Segment
* Add Whisper::Segment and Whisper::Context#each_segment
* Extract c_ruby_whisper_callback_container_allocate()
* Add test for Whisper::Params#on_new_segment
* Add Whisper::Params#on_new_segment
* Assign symbol IDs to variables
* Make extsources.yaml simpler
* Update README
* Add document comments
* Add test for calling Whisper::Params#on_new_segment multiple times
* Add file dependencies to GitHub actions config and .gitignore
* Add more files to ext/.gitignore
* Improve Rakefile
* Remove intermediate files
* Remove unnecessary manipulations from extconf.rb
* Add README and LICENSE to source files
* Manage ext source files using YAML file
* Use extsources.yaml to include files into gem package file
* Add git-managed source files to build dependency
* Add test task
* Download model for test if it does not exist
* Add test for build
* Ignore gem package directory
* Enable GitHub action for Ruby binding
* Fix model name
* Build lib file for test
* Use extension for each platform
* Use extension for each platform on testing
* Move built lib file rather than copy
* Add intermediate files to clean targets