Georgi Gerganov
32fbc8cd04
main : add option to print the progress ( #276 )
2022-12-16 20:20:43 +02:00
Georgi Gerganov
b8065d90f5
main : add "--prompt" command line argument ( #90 )
...
This allows providing an initial prompt that is used at the start of
processing.
2022-12-16 19:43:16 +02:00
Lexevolution
6ed786957e
Add newline per segment for text output ( #254 )
2022-12-11 20:00:29 +02:00
Georgi Gerganov
4698dcdb52
whisper : add mechanism for aborting the whisper_full() computation
2022-11-27 20:42:45 +02:00
Georgi Gerganov
0f619b52ce
main : add stereo-channel-based diarization ( #64 )
...
Not tested - I don't have stereo dialog audio
2022-11-25 22:08:58 +02:00
Georgi Gerganov
bc88eb13c6
examples : add "command" tool ( #171 )
2022-11-25 19:36:57 +02:00
Georgi Gerganov
b8ce25dec1
refactoring : more readable code
2022-11-25 19:28:04 +02:00
Georgi Gerganov
454b91de16
main : fix dangling pointer when using stdin for input ( #65 )
2022-11-24 17:53:51 +02:00
Georgi Gerganov
d7024cf9dc
main, stream : remove --verbose flag ( #178 )
2022-11-24 17:52:04 +02:00
Georgi Gerganov
e5dcdabbb8
unicode : fix character replacement (thanks to @tamo)
2022-11-23 08:24:29 +02:00
Georgi Gerganov
83c742f1a7
whisper : add option to speed up the audio tempo by x2
...
Uses a Phase Vocoder to speed up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but it still seems to be very
good for slow to normal speech.
I think this could be useful for real-time transcription, i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Alan
7519eabf65
Adds support for stdin wav input
2022-11-09 20:37:23 +02:00
Georgi Gerganov
c30bffc8a5
ref #22 : add "duration" option
...
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
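As a rough illustration of what partially processing a recording involves, the sketch below converts a duration in milliseconds into a sample count. It assumes 16 kHz mono input; the helper name `samples_to_process` and the "non-positive means whole recording" convention are hypothetical and not taken from the commit.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Assumption: the audio is 16 kHz mono PCM.
constexpr std::int64_t kSampleRate = 16000;

// Hypothetical helper (not part of whisper.cpp): returns how many of the
// n_samples available samples fall inside the first duration_ms milliseconds.
// A non-positive duration is treated here as "process the whole recording".
std::size_t samples_to_process(std::size_t n_samples, std::int64_t duration_ms) {
    if (duration_ms <= 0) {
        return n_samples;
    }
    const std::int64_t want = duration_ms * kSampleRate / 1000;
    // Never report more samples than the recording actually contains.
    return std::min<std::size_t>(n_samples, static_cast<std::size_t>(want));
}
```

Under these assumptions, a 30 second recording (480000 samples) limited to 10000 ms would be cut to its first 160000 samples.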
Georgi Gerganov
ef47d77492
main : fix generated bash script
2022-11-04 18:30:38 +02:00
Georgi Gerganov
d5afebd37c
whisper : token-level timestamp refactoring ( #49 , #120 )
...
This turned out pretty well overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitle types. This
means that you can now specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters.
2022-11-02 21:45:54 +02:00
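To make the maximum line length idea from the commit above concrete, here is a minimal sketch of greedy line packing. It is not the algorithm in whisper.cpp, which works on tokens with timestamps; the function `wrap_tokens` and its conventions (space-separated tokens, 0 meaning no limit, length measured in bytes) are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of whisper.cpp): greedily packs tokens into
// lines of at most max_len bytes, joining tokens with a single space.
// A token longer than max_len is kept whole on its own line.
// max_len == 0 is treated here as "no limit".
std::vector<std::string> wrap_tokens(const std::vector<std::string> & tokens,
                                     std::size_t max_len) {
    std::vector<std::string> lines;
    std::string cur;
    for (const auto & tok : tokens) {
        if (tok.empty()) {
            continue;
        }
        // Length of the current line if this token were appended to it.
        const std::size_t need =
            cur.empty() ? tok.size() : cur.size() + 1 + tok.size();
        if (max_len > 0 && !cur.empty() && need > max_len) {
            // The token does not fit: close the current line, start a new one.
            lines.push_back(cur);
            cur = tok;
        } else {
            if (!cur.empty()) {
                cur += ' ';
            }
            cur += tok;
        }
    }
    if (!cur.empty()) {
        lines.push_back(cur);
    }
    return lines;
}
```

With a limit of 16, the tokens "And", "so", "my", "fellow", "Americans" would be packed into two lines, "And so my fellow" and "Americans". Counting bytes rather than characters is a simplification; multi-byte UTF-8 text would need a character-aware length.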
Georgi Gerganov
6fb98370ba
main : add some comments for the word-level timestamp algorithm
2022-11-01 22:35:21 +02:00
Georgi Gerganov
0729da9a3b
main : fix some edge cases for word-level timestamps
2022-11-01 22:09:25 +02:00
Georgi Gerganov
dc12994603
Update README.md
2022-10-30 17:11:37 +02:00
Georgi Gerganov
57fb46f307
main : add option for word-level timestamps (very experimental)
2022-10-30 17:06:57 +02:00
Georgi Gerganov
2827cbbbe8
main : merge parallel example in main
2022-10-29 19:37:19 +03:00
Georgi Gerganov
0b2dc3c82c
parallel : working
2022-10-29 19:37:19 +03:00
Georgi Gerganov
85d6e1e1e7
main : fix sampling time + add max_context parameter
2022-10-29 19:37:19 +03:00
Georgi Gerganov
ebb01b9e33
Print system info at start of program
2022-10-27 17:22:19 +03:00
Georgi Gerganov
2400660f3f
Print system info in main
2022-10-26 22:54:09 +03:00
Georgi Gerganov
47e78b7288
Update README.md
2022-10-25 20:53:48 +03:00
Georgi Gerganov
c6710efde2
refactoring : move main + stream in examples + other stuff
2022-10-25 20:53:48 +03:00