whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-08-04 10:22:35 +02:00

Author	SHA1	Message	Date
Thijs Raymakers	6108d3cc58	whisper : use correct seek_end when offset is used (#833 ) Whenever an `offset_ms` is provided, the value of `seek_end` is calculated incorrectly. This causes Whisper to keep transcribing after the end of the file. The current behavior looks like ``` [00:34:40.000 --> 00:34:47.000] This is an example audio file. [00:34:47.000 --> 00:34:49.000] The text has been redacted [00:34:49.000 --> 00:34:51.000] This is the end of the audio. [00:34:51.000 --> 00:34:52.000] * [00:34:52.000 --> 00:34:53.000] * [00:34:53.000 --> 00:34:54.000] * [00:34:55.000 --> 00:34:56.000] * ... ``` The expected behavior should be ``` [00:34:40.000 --> 00:34:47.000] This is an example audio file. [00:34:47.000 --> 00:34:49.000] The text has been redacted [00:34:49.000 --> 00:34:51.000] This is the end of the audio. - end of program - ``` This commit changes the calculation of the `seek_end` variable to only add `seek_start` if a custom `duration_ms` is provided. Otherwise, it defaults to the end of the file. Signed-off-by: Thijs Raymakers <thijs@raymakers.nl>	2023-04-29 18:55:37 +03:00
Georgi Gerganov	bab97c83d0	tests : add "threads" to run-tests.sh	2023-04-29 12:32:28 +03:00
Georgi Gerganov	3eaeb030ff	extra : add sync-ggml.sh script	2023-04-29 12:32:28 +03:00
Georgi Gerganov	acec73ab6e	ggml : sync latest ggml + llama.cpp updates (quantization)	2023-04-29 12:32:28 +03:00
Zollner	5cc17418c7	whisper.android : add some tips (#816 )	2023-04-29 11:00:20 +03:00
Georgi Gerganov	3efb81dec6	build : add WHISPER_COREML_ALLOW_FALLBACK to make / CMake (#812 )	2023-04-29 10:55:24 +03:00
Canis Lupus	94a7cd2a07	whisper : allow non-CoreML fallback when Core ML cannot be loaded (#812 ) if the Core ML model cannot be loaded, continue without Core ML instead of returning. This allows a single build to transcribe using Core ML models where available, and regular models when not.	2023-04-29 10:49:02 +03:00
Georgi Gerganov	3e82ff4747	whisper : fix bug from previous commit	2023-04-29 10:42:14 +03:00
Georgi Gerganov	b5bd2f43c5	whisper : avoid designated initializers	2023-04-29 10:36:50 +03:00
AsukaMinato	94aa56f19e	minor : improve C++ and Python style (#768 ) * use some STL functions * use self.field rather than setattr, use pathlib.Path * recover some format * const some iter * Keep the original * 2 space	2023-04-29 10:06:25 +03:00
Georgi Gerganov	4d89ee2e59	readme : add logo	2023-04-28 22:41:29 +03:00
Laytan Laats	70567eff23	main : escape quotes in csv output (#815 )	2023-04-23 19:01:59 +03:00
Taras Glek	02ec83c5d5	stream : flush upon finishing inference (#811 )	2023-04-23 17:00:30 +03:00
Philipp Zabel	2bd4b8d577	examples : add missing #include <cstdint> (#798 ) common.cpp uses uint8_t and uint64_t, which are defined in <cstdint>.	2023-04-23 16:52:52 +03:00
Tauseef Mohiuddin	eecf2c3d41	main : update escape_double_quotes() function (#776 ) Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string. Changes Made: - Renamed the function to escape_quotes_and_backslashes - Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes. - Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash. Resolves: #769	2023-04-23 16:47:30 +03:00
Georgi Gerganov	c23588cc4b	release : v1.3.0	2023-04-15 17:30:44 +03:00
Georgi Gerganov	5108b30e6d	whisper : pad audio instead of spectrogram (#579 ) Also, fallback only if more temperatures are available and if we are at least 3 seconds before the end of the audio	2023-04-15 17:19:19 +03:00
Georgi Gerganov	f19e23fbd1	whisper : restore decoder temperature fallbacks I disabled this because there were many complaints about slow decoding. The current implementation does not allow batching the decoders when using the "best of" or "beam size" parameters, so the decoding time is proportional to the number of decoders, which is obviously not great. However, now there are even more complaints about wrong decodings and repetition. So, making a compromise by re-enabling the fallbacks, but defaulting to just 2 "best of" / "beam size" decoders. Also, the temperature step is increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum of 2. Also, the stream example now has fallbacks enabled by default. close #471 #477 #508 #612 #719 #731	2023-04-15 16:12:55 +03:00
Jhen-Jie Hong	ea1f8a50d4	ggml, ci : fix build on whisper.android (ARM_NEON) + add CI (#764 ) * ggml : fix undefined symbol by removing inline handle * ggml : make own ggml_aligned_malloc function * ci: add ios/android build	2023-04-15 14:21:58 +03:00
Georgi Gerganov	3dead611bb	whisper : slightly faster Log Mel computation + n-1 FFT threads (#568 )	2023-04-15 14:18:46 +03:00
Georgi Gerganov	355da83690	readme : fix link	2023-04-15 13:30:36 +03:00
Georgi Gerganov	3e5c49e59a	readme : add usage instructions for Core ML	2023-04-15 13:30:07 +03:00
Georgi Gerganov	5e47e223bd	whisper : add Core ML support (#566 ) * coreml : use Core ML encoder inference * coreml : simplify whisper_encode + log messages * whisper : resolve rebase conflicts * coreml : add scripts for CoreML model generation * bench-all : recognize COREML flag	2023-04-15 13:21:27 +03:00
Maximiliano Levi	794ff3074a	whisper : do not launch log_mel threads when n_thread is 1 (#763 )	2023-04-14 22:35:34 +03:00
AfryMask	7e2afa4384	whisper : fix the bug related to word splitting errors in the "tokenize" function. (#760 ) Co-authored-by: AfryMask <afrymask@gmail.com>	2023-04-14 20:35:03 +03:00
Aaron Taylor	1c5edc3cb3	readme : add SwiftWhisper to listed bindings (#755 )	2023-04-14 20:24:00 +03:00
Georgi Gerganov	34b772727d	gitignore : add .test	2023-04-14 20:13:47 +03:00
Bader-eddine Ouaich	2c856fb9e5	whisper : fix potential memory leaks (#740 ) * fix potential memory leak if whisper_init_state failed * fix potential memory leak if gpt2_init failed	2023-04-14 20:05:56 +03:00
Anton Kostin	7727a40dc9	license : update year (#739 )	2023-04-14 20:04:42 +03:00
GitAritron	b5639ed313	whisper : fix typos in whisper.h (#737 ) Fixed a couple of typos (in comments, so nothing major). Keep up the great work 😄	2023-04-14 20:03:16 +03:00
Ali Alameh	2c4ac2627d	stream : support language auto-detect (#501 ) Fixes #445: the language auto-detect "auto" flag does not work in the stream tool	2023-04-14 20:02:18 +03:00
Alex Evgrashin	674a8e579b	readme : add unity bindings (#733 )	2023-04-14 19:59:44 +03:00
DGdev91	001083a769	talk, talk-llama : add basic example script for eleven-labs tts (#728 )	2023-04-14 19:53:58 +03:00
Ivan Gorin	62b51c3070	models : change convert-pt-to-ggml to use .tiktoken tokenizer files (#725 )	2023-04-14 19:50:39 +03:00
LittleLoli	61128870b8	cmake : add msvc compiler args /utf-8 fix error C3688 (#721 ) * force msvc compiler use utf-8 encode * only enable on msvc	2023-04-14 19:36:38 +03:00
Maciek	78548dc03f	talk-llama : correct default speak.sh path (#720 ) There is a `speak.sh` file in `./examples/talk-llama`, as described in the README. However, `talk-llama.cpp` uses `./examples/talk/speak.sh`. This commit corrects that.	2023-04-14 19:36:09 +03:00
LittleLoli	66110dafcc	main : add lrc output support (#718 ) * add lrc output support. * fix wrong comment	2023-04-14 19:35:33 +03:00
Sam	b73a4638ac	readme : make the quick start instructions clearer. (#716 ) Users wanting to make use of this implementation of the whisper model with no prior knowledge of C/C++ may download the Whisper model but fail to use the "make" command as specified, because they forgot or didn't know they needed to clone the repository first. Hope this modification clears things up.	2023-04-14 19:33:06 +03:00
duthils	5f16420333	make : disable avx in case f16c is not available (#706 ) Why: * ggml.c does not support AVX without F16C	2023-04-14 19:31:51 +03:00
bocytko	ccb47e7e10	readme : add shell command example for --print-colors (#710 ) The section of the readme file explaining `--print-colors` includes only a screenshot with directories that are inconsistent with other examples. This commit adds an example shell command, consistent with the remaining examples.	2023-04-14 19:25:23 +03:00
Georgi Gerganov	677ad754a0	ggml : sync latest ggml	2023-04-14 19:20:39 +03:00
Georgi Gerganov	514cd04452	whisper : fix bug in prompt processing (close #705 ) Was dereferencing a dangling pointer	2023-04-14 19:17:07 +03:00
Brian Murray	6704a81255	go : exposed various parts to the Go Interface (#697 )	2023-04-14 18:52:10 +03:00
novag	463e46338c	ggml : fix q4_1 dot product types (#759 ) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-04-14 13:34:20 +03:00
Georgi Gerganov	2f889132c6	ggml : sync latest changes from ggml and llama.cpp	2023-04-13 18:53:44 +03:00
Georgi Gerganov	ebef1e8620	ggml : fix WASM build	2023-04-10 23:18:29 +03:00
Georgi Gerganov	114df388fe	talk-llama : increase context to 2048	2023-04-10 23:09:15 +03:00
Georgi Gerganov	ea36831459	talk-llama : update to latest llama.cpp (improved performance)	2023-04-10 22:59:13 +03:00
Georgi Gerganov	69b8503935	ggml : backport llama.cpp updates (close #709 ) - About x2 overall performance improvement on Apple Silicon - Results should now be the same for different number of threads (not tested)	2023-04-10 22:28:54 +03:00
pajowu	0a2d1210bc	whisper : add progress callback (#600 )	2023-03-30 20:29:29 +03:00
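
The `seek_end` fix in 6108d3cc58 can be illustrated with a minimal sketch. This is not the code from the commit: `SeekRange` and `compute_seek_range` are hypothetical names, and the sketch assumes that seek positions are counted in 10 ms frames and that a `duration_ms` of 0 means "no custom duration". It shows only the rule stated in the commit message: `seek_start` is added to `seek_end` only when a custom duration is given, and otherwise `seek_end` is the end of the file.

```cpp
// Hypothetical helper types and names, for illustration only.
struct SeekRange {
    int start; // first frame to process
    int end;   // one past the last frame to process
};

// Assumption: one frame covers 10 ms of audio, and n_frames is the
// total number of frames in the file.
SeekRange compute_seek_range(int offset_ms, int duration_ms, int n_frames) {
    const int seek_start = offset_ms / 10;

    // duration_ms == 0: no custom duration, so stop at the end of the file.
    // Otherwise: stop duration_ms after the offset.
    const int seek_end = (duration_ms == 0)
        ? n_frames
        : seek_start + duration_ms / 10;

    return {seek_start, seek_end};
}
```

Under these assumptions, an offset with no duration stops exactly at the end of the file. Adding `seek_start` to the file length in that case would place `seek_end` past the end, which matches the trailing `*` segments shown in the commit message.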


563 Commits
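
The two-pass escaping described in eecf2c3d41 can be sketched as follows. This is an illustrative reimplementation based only on the commit message, not the code from the repository: the function name and the `escaped_length` variable come from the message, while the signature and the use of `std::string` are assumptions.

```cpp
#include <cstddef>
#include <string>

// Sketch only: escapes double quotes and backslashes by prefixing each
// with a backslash, following the two loops the commit message describes.
std::string escape_quotes_and_backslashes(const std::string & input) {
    // First loop: compute the output length, counting one extra
    // character for every double quote or backslash.
    std::size_t escaped_length = 0;
    for (char c : input) {
        escaped_length += (c == '"' || c == '\\') ? 2 : 1;
    }

    std::string escaped;
    escaped.reserve(escaped_length);

    // Second loop: copy each character, adding a backslash before
    // double quotes and backslashes.
    for (char c : input) {
        if (c == '"' || c == '\\') {
            escaped += '\\';
        }
        escaped += c;
    }
    return escaped;
}
```

Escaping backslashes as well as quotes keeps the output unambiguous: an input that already ends in a backslash can no longer swallow the closing quote of the field it is written into.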