83c742f1a7
whisper : add option to speed up the audio tempo by x2
...
Uses a phase vocoder to speed up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
Transcription accuracy is degraded, but for slow to normal speech
it still seems to be very good.
I think this could be useful for real-time transcription, i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
c30bffc8a5
ref #22 : add "duration" option
...
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
d5afebd37c
whisper : token-level timestamp refactoring ( #49 , #120 )
...
This turned out pretty well overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitle types. This
means that you can now specify the maximum length of the generated
lines. Simply provide the "-ml" argument, which sets the max length in
number of characters.
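
The line-length limit can be illustrated with a minimal sketch. This is not
the code from whisper.cpp: wrap_tokens is a hypothetical helper, and it
assumes tokens arrive as plain strings and that length is counted in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of whisper.cpp): greedily packs tokens into
// lines so that no line exceeds max_len, unless a single token is already
// longer than max_len. Length is measured in bytes, which equals the number
// of characters only for ASCII text.
std::vector<std::string> wrap_tokens(const std::vector<std::string> & tokens,
                                     std::size_t max_len) {
    assert(max_len > 0);

    std::vector<std::string> lines;
    std::string cur;
    for (const auto & tok : tokens) {
        // Start a new line when appending this token would exceed the limit.
        if (!cur.empty() && cur.size() + tok.size() > max_len) {
            lines.push_back(cur);
            cur.clear();
        }
        cur += tok;
    }
    if (!cur.empty()) {
        lines.push_back(cur);
    }
    return lines;
}
```

Breaking only at token boundaries keeps each line aligned with the
token-level timestamps, so every line can get its own start and end time.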
2022-11-02 21:45:54 +02:00
57fb46f307
main : add option for word-level timestamps (very experimental)
2022-10-30 17:06:57 +02:00
eba62e0fa1
close #113 : fix struct whisper_token_data
2022-10-30 08:23:52 +02:00
dec40be58f
parallel : print time of audio boundaries + fix timings
2022-10-29 19:37:19 +03:00
0b2dc3c82c
parallel : working
2022-10-29 19:37:19 +03:00
85d6e1e1e7
main : fix sampling time + add max_context parameter
2022-10-29 19:37:19 +03:00
72e9cdd6bf
parallel : adding tool for parallel transformer inference
2022-10-29 19:37:19 +03:00
34bb3ab0cf
ggml : add system info functions
2022-10-25 20:53:48 +03:00
7affd309d3
whisper : add new-segment callback
...
Can be used to process new segments as they are being generated.
See main for sample usage: it prints the resulting segments during
inference.
2022-10-22 21:17:21 +03:00
31ff0c6a1f
wip : experimental color coding of tokens based on probabilities
2022-10-22 21:17:21 +03:00
7eeef0358a
ref #52 : improve greedy sampling strategy
...
Force a timestamp token to be sampled if the summed probability of all
timestamp tokens exceeds the probability of every other token.
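
The rule can be sketched as follows. This is a minimal illustration, not the
actual whisper.cpp code: sample_greedy_with_timestamps is a hypothetical
name, and the sketch assumes that all token ids at or above
first_timestamp_id are timestamp tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the whisper.cpp implementation).
// probs[i] is the probability of token i. Tokens with id >= first_timestamp_id
// are assumed to be timestamp tokens (an assumption made for this sketch).
// Returns the id of the chosen token.
std::size_t sample_greedy_with_timestamps(const std::vector<float> & probs,
                                          std::size_t first_timestamp_id) {
    assert(!probs.empty() && first_timestamp_id <= probs.size());

    // Most likely non-timestamp (text) token.
    std::size_t best_text = 0;
    float max_text = -1.0f;
    for (std::size_t i = 0; i < first_timestamp_id; ++i) {
        if (probs[i] > max_text) {
            max_text  = probs[i];
            best_text = i;
        }
    }

    // Total probability mass of the timestamp tokens, and the most likely one.
    std::size_t best_ts = first_timestamp_id;
    float max_ts = -1.0f;
    float sum_ts = 0.0f;
    for (std::size_t i = first_timestamp_id; i < probs.size(); ++i) {
        sum_ts += probs[i];
        if (probs[i] > max_ts) {
            max_ts  = probs[i];
            best_ts = i;
        }
    }

    // Force a timestamp when the combined timestamp mass beats every text token.
    if (first_timestamp_id < probs.size() && sum_ts > max_text) {
        return best_ts;
    }
    return best_text;
}
```

With probabilities {0.30, 0.25, 0.10, 0.20, 0.15} and the last three ids
being timestamps, the timestamp mass is 0.45, which beats the best text
token at 0.30, so the sketch returns id 3 even though id 0 has the highest
individual probability.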
2022-10-18 19:48:15 +03:00
2d171ced32
close #32 : add comment about thread-safety of the C-style API
2022-10-18 18:27:57 +03:00
e30cf83158
ref #57 , #62 , #63 : remove unions in C-api + remove designated initializers
...
We are not ready for designated initializers - many compilers do not
support this C++ feature yet, so this removes its non-trivial usages.
2022-10-18 18:17:24 +03:00
9d5723435f
ref #35 : add <stdbool.h> to whisper.h
...
"bool" type is not implicitly defined for some compilers.
2022-10-10 08:11:18 +03:00
9bbca3110f
ref #9 : add API documentation in whisper.h
2022-10-08 18:09:56 +03:00
2f069335ab
Adding sanitizer tests
2022-10-08 11:43:42 +03:00
481cd685d5
ref #10 : option to keep context in "stream" example
...
The results seem to get worse when we keep the context, so this is
disabled by default.
2022-10-07 22:30:44 +03:00
7787b878e1
ref #16 , #22 : add "offset" argument
...
Allows processing of the input audio to start at some offset from the
beginning. Useful for splitting a long job into multiple tasks.
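
A rough sketch of how a long job might be split into per-task offsets. The
Chunk struct and split_job function are hypothetical and not part of
whisper.cpp; the sketch also assumes offsets are expressed in milliseconds
and that each task knows how much audio to process from its offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one task: where to start and how much to process.
struct Chunk {
    int64_t offset_ms;
    int64_t duration_ms;
};

// Hypothetical helper (not part of whisper.cpp): splits total_ms of audio
// into at most n_tasks consecutive, non-overlapping chunks.
std::vector<Chunk> split_job(int64_t total_ms, int n_tasks) {
    assert(total_ms >= 0 && n_tasks > 0);

    std::vector<Chunk> chunks;
    // Ceiling division, so the chunks cover the whole recording.
    const int64_t step = (total_ms + n_tasks - 1) / n_tasks;
    for (int64_t off = 0; off < total_ms; off += step) {
        // The last chunk may be shorter than the others.
        chunks.push_back({off, std::min(step, total_ms - off)});
    }
    return chunks;
}
```

Each chunk's offset_ms would then be passed as the offset for one task.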
2022-10-07 22:00:40 +03:00
6814cc9b02
Improve result printing
2022-10-04 23:18:15 +03:00
eba33adadd
Extend C-style API with full inference methods
2022-10-04 23:18:15 +03:00
6b77124e01
Initial C-style interface for whisper.cpp
2022-10-04 23:18:15 +03:00