Commit Graph

76 Commits

Author SHA1 Message Date
Georgi Gerganov
32fbc8cd04
main : add option to print the progress (#276) 2022-12-16 20:20:43 +02:00
Georgi Gerganov
b8065d90f5
main : add "--prompt" command line argument (#90)
This allows to provide an initial prompt to be used at the start of the
processing.
2022-12-16 19:43:16 +02:00
Lexevolution
6ed786957e
Add newline per segment for text output (#254) 2022-12-11 20:00:29 +02:00
Georgi Gerganov
4698dcdb52 whisper : add mechanism for aborting the whisper_full() computation 2022-11-27 20:42:45 +02:00
Georgi Gerganov
0f619b52ce
main : add stereo-channel-based diarization (#64)
Not tested - I don't have stereo dialog audio
2022-11-25 22:08:58 +02:00
Georgi Gerganov
bc88eb13c6
examples : add "command" tool (#171) 2022-11-25 19:36:57 +02:00
Georgi Gerganov
b8ce25dec1
refactoring : more readable code 2022-11-25 19:28:04 +02:00
Georgi Gerganov
454b91de16
main : fix dangling pointer when using stdin for input (#65) 2022-11-24 17:53:51 +02:00
Georgi Gerganov
d7024cf9dc
main, stream : remove --verbose flag (#178) 2022-11-24 17:52:04 +02:00
Georgi Gerganov
e5dcdabbb8
unicode : fix character replacement (thanks to @tamo) 2022-11-23 08:24:29 +02:00
Georgi Gerganov
83c742f1a7 whisper : add option to speed up the audio tempo by x2
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.

I think this can find application for real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Alan
7519eabf65 Adds support for stdin wav input 2022-11-09 20:37:23 +02:00
Georgi Gerganov
c30bffc8a5
ref #22 : add "duration" option
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
Georgi Gerganov
ef47d77492
main : fix generated bash script 2022-11-04 18:30:38 +02:00
Georgi Gerganov
d5afebd37c
whisper : token-level timestamp refactoring (#49, #120)
This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitles types. This
means that now you can specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters
2022-11-02 21:45:54 +02:00
Georgi Gerganov
6fb98370ba
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00
Georgi Gerganov
0729da9a3b
main : fix some edge cases for word-level timestamps 2022-11-01 22:09:25 +02:00
Georgi Gerganov
dc12994603
Update README.md 2022-10-30 17:11:37 +02:00
Georgi Gerganov
57fb46f307 main : add option for word-leve timestamps (very experimental) 2022-10-30 17:06:57 +02:00
Georgi Gerganov
2827cbbbe8 main : merge parallel example in main 2022-10-29 19:37:19 +03:00
Georgi Gerganov
0b2dc3c82c parallel : working 2022-10-29 19:37:19 +03:00
Georgi Gerganov
85d6e1e1e7 main : fix sampling time + add max_context parameter 2022-10-29 19:37:19 +03:00
Georgi Gerganov
ebb01b9e33
Print system info at start of program 2022-10-27 17:22:19 +03:00
Georgi Gerganov
2400660f3f Print system info in main 2022-10-26 22:54:09 +03:00
Georgi Gerganov
47e78b7288 Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov
c6710efde2 refactoring : move main + stream in examples + other stuff 2022-10-25 20:53:48 +03:00