* Fix MSVC compile error C3688
Instead of simply using 'add_compile_options(/utf-8)' to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing '/utf-8' to NVCC.
* Significantly improve inference quality
In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference.
* Significantly improve inference quality
At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue.
* Addressed a few minor issues
Fixed the issue of `fft_out` continuously expanding. Resolved the fallback caused by using 'break' instead of `fft_in[j] = 0`.
* Significantly improve inference quality
Thanks for your patience everyone. It's finally sorted out. Now, the right side of the FFT spectrum is being flipped over to the left, and the amplitudes at corresponding positions on the left and right are added together (the spectrum on the left needs to be shifted by one position), then the average is calculated. FFT_OUT[0] is no longer discarded, making full use of the limited space to pack in more information.
* Add annotation and performance improvement
* Calculate FFT only when fft_in are not all zero
* Some minor performance improvement
* Fixed a bug impacting inference quality
* The first version after all the analysis is completed.
* Fix some bugs and add debug mode
* Fixed several bugs
* Temporarily disable speed-up mode and add debug mode.
* Add debug mode
* Disable speed-up mode and add debug mode
* Fix CI error (#1)
* Fix error
* Fix error
* Fixed several bugs including [BLANK_AUDIO] problem
* Remove Hard-coded hann window
* Some Final Fix (#2)
* Fix error
* Fix error
* Probably the last commit
* Probably the last commit
* whisper : minor coding style changes
* whisper : remove debug from public API
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Instead of simply using 'add_compile_options(/utf-8)' to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing '/utf-8' to NVCC.
Fixed the issue of not being able to find OpenBLAS on the Windows platform. Even though the name of the previously released binary file was whisper-blas-bin-x64.zip, BLAS was actually not enabled. After enabling, the inference speed can increase by 3-4 times.
* expose api to let user control log output
Add
whisper_set_log_callback()
that lets user set a callback for log messages.
Change all the
fprintf(stderr, ...)
to call via the above.
* whisper : add <cstdarg>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Current `progress_step` was hardcoded into whisper.cpp, this resulted in
bindings having to access progress only at that step even if progress
callback was being called at every iteration.
With this change we get greater granularity progress reporting from
whisper.cpp and bindings/implementations can define their own progress step.
* Initial proof of concept Vim plugin
At present, this is likely only slightly better than feature parity with
the existing whisper.nvim
Known issues:
Trailing whitespace
Up to an existing length(5 seconds) of speech may be processed when
listening is enabled
CPU cycles are spent processing speech even when not listening.
Fixing these issues is likely dependent upon future efforts to create a
dedicated library instead of wrapping examples/stream
* Support $WHISPER_CPP_HOME environment variable
A minor misunderstanding of the whisper.nvim implementation resulted in
a plugin that was functional, but not a drop in replacement as it should
be now.
* add multi platform
* add image name
* fix
* fix /bin/sh path
* add missing \
* add all platforms for check
* remove platforms
* remove s390x
* - add arm v6
- format run cmd
* remove arm v6
* - bump checkout to v3
- use setup emsdk action
- add arch to all ubuntu jobs
* mymindstorm/setup-emsdk to v12
* add missing QEMU step
* add fail-fast: false for debug
* add freebsd
* remark all jobs except freebsd for test
* add sudo
* enable all tests again
* format
* check __AVX__ support before include immintrin.h
* try auto detect flag by cmake
* fix check for immintrin.h
* fix include check for immintrin.h
* Remove all platforms for sanitizer build except amd64
We have no clue why they failed.
---------
Co-authored-by: Alon Faraj <alon.faraj@mapcore.com>
* add HuggingFace mirror to download ggml model
* support tdrz via simple hack overriding solm tokens
* fix incorrect translate/transcribe token_ids that are not static const
* add apollo 13 sample for tdrz demo
* render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token
* extend whisper_segment with speaker_turn_next field and save in json output
* fix failing go build
* slipped in some python syntax whoops
* whisper : finalize tinydiarize support (add flag + fixes)
* whisper : tdrz support for word-level timestamps (respect max_len)
* java : try to fix tests after adding tdrz_enable flag
* main : remove TODO leftover
* java : fix params order list after adding "tdrz_enable"
* whisper : fix solm and add nosp token
* main : print tinydiarize help
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* talk-llama : use posix_madvise() instead of madvise() derived from BSD
sed -i 's,\<madvise\>,posix_&,g;s,\<MADV_,POSIX_&,g' examples/talk-llama/llama-util.h
* make : enable Darwin extensions for macOS builds
This is an attempt at fixing macOS build error coming from the fact that
RLIMIT_MEMLOCK define is not available there without Darwin extensions.
* Do not use _GNU_SOURCE gratuitously.
What is needed to build whisper.cpp and examples is availability of
stuff defined in The Open Group Base Specifications Issue 6
(https://pubs.opengroup.org/onlinepubs/009695399/) known also as
Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions.
There is no need to penalize musl libc which simply follows standards.
Not having feature test macros in source code gives greater flexibility
to those wanting to reuse it in 3rd party app, as they can build it with
minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs.
It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2.
* examples : include SDL headers before other headers
This is an attempt at fixing macOS build error coming from SDL2 relying
on Darwin extension memset_pattern4/8/16 coming from Apple's string.h.