whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-20 21:23:06 +00:00

Author	SHA1	Message	Date
Sindre Sorhus	d03c60dd7f	ios : add support for Swift Package Manager (#1370 ) * Add support for Swift * Make it build in Xcode * Use the SPM package in the SwiftUI example app	2023-11-07 23:53:31 +02:00
Neil Chudleigh	9edbd0a204	extra: Add benchmark script implemented in Python (#1298 ) * Create bench.py * Various benchmark results * Update benchmark script with hardware name, and file checks * Remove old benchmark results * Add git shorthash * Round to 2 digits on calculated floats * Fix the header reference when sorting results * FIx order of models * Parse file name * Simplify filecheck * Improve print run print statement * Use simplified model name * Update benchmark_results.csv * Process single or lists of processors and threads * Ignore benchmark results, dont check in * Move bench.py to extra folder * Readme section on how to use * Move command to correct location * Use separate list for models that exist * Handle subprocess error in git short hash check * Fix filtered models list initialization	2023-09-25 23:45:15 +08:00
Georgi Gerganov	0839209cab	gitignore : update	2023-09-08 19:45:28 +03:00
AustinMroz	175ffa64ee	examples : vim plugin and LSP server (#1144 ) * Initial proof of concept Vim plugin At present, this is likely only slightly better than feature parity with the existing whisper.nvim Known issues: Trailing whitespace Up to an existing length(5 seconds) of speech may be processed when listening is enabled CPU cycles are spent processing speech even when not listening. Fixing these issues is likely dependent upon future efforts to create a dedicated library instead of wrapping examples/stream * Support $WHISPER_CPP_HOME environment variable A minor misunderstanding of the whisper.nvim implementation resulted in a plugin that was functional, but not a drop in replacement as it should be now. * Initial progress on LSP implementation Libcall is nonviable because the library is immediately freed after a call is made. Further investigation has shown Language Server Protocol as a promising alternative that both simplifies the required logic on the vimscript side and increases the ease with which plugins for other editors could be made in the future. This is a very large undertaking and my progress has slowed substantially. Work is far from being in a usable state, but I wish to keep track of major refactors for organizational purposes. * Rewrite audio windowing of guided transcription One of the defining goals of this venture is allowing consecutive commands to be rattled off without the existing deadzones of the current implementation. * Add unguided_transcription. Cleanup. The unguided transcription implantation heavily borrows from existing example implementations and the guided_transcription logic. A high level pass was done to check that method arguments are accurate to what inputs are actually required. A first attempt at cancellation support was added for record keeping, but will be deleted in a future commit. * Fix compilation. Resolves a large number of compilation errors. No testing has been done yet for execution errors. Update Makefile and .gitignore * Functional unguided_transcription * Functional guided_transcription Fix commandset_list being passed by value Properly register the first token of a multitoken command * Minor changes before time fix I've apparently made an awfully major mistake in thinking that unix time was in milliseconds and will be changing all timekeeping code to use standardized methods. In preparation for this is a number of minor bugfixes. Output is manually flushed. An echo method has been added. registerCommandset now wraps the returned index * Swap timekeeping to use std::chrono * Add work in progress lsp backed whisper.vim plugin Current progress blockers are Adding modality awareness to the command processing (specifically, motion prompting) Improving the VAD to be a little more responsive (testing start of activity) * Reworked vim plugin command loop * Fix change inside Multiple bug fixes that, crucially, bring the plugin to the point where a demonstration video is possible Add better echo messaging so whisper_log isn't required Add loading complete message as indicator when listening has started Insert/append are actually included in command sets Some more heavy handed corrections to prevent a double exit when leaving insert mode As a somewhat hacky fix, the very first space is removed when inserting. This cleans up most use cases, but leaves me unsatisfied with the few cases it would be desired. * Forcibly set commandset_index to 0 after subinsert Also remove unnecessary ! to use builtin vim command * Fix upper A minor scope mistake was causing upper'd inputs to be eaten. This was fixed and echoing was slightly improved for clarity. * Fix formatting Corrects indentation to 4 spaces as project standard Slightly better error support for malformed json input * Remove obsolete vim plugin * Add json.hpp library The same library that is used for the llama.cpp server * Minor cleanups add lsp to the make clean directive. remove a redundant params definition. reorder whisper.vim logging for subtranscriptions Corrections to unlets (variables of argument scope appear immutable) * Fix indentation. Fallback for subTranscription Indentation has been changed to 4 spaces. Unit testing has been set up, I'm opting not to include it in the repository for now. It however, has revealed a bug in the state logic where a subtranscription can be initiated without having a saved command When this occurs, append is added as a fallback * Move audio polling logic to a subfunction While work on the improved vad will continue, It's grown to be a little out of scope. Instead, a future commit will perform multiple detection passes at substretches of audio when a backlog of audio exists. To facilitate this, and prevent code duplication, the vad code has been moved into a subfunction shared by both the unguided and guided transcription functions. * Test for voice over subchunks if backlog > 1s As the existing VAD implementation only checks for a falling edge at the end of an audio chunk. It fails to detect voice in cases where the recorded voice is only at the beginning of the audio. To ameliorate this, when the timestamp would cause analysis of audio over a second in length, it is split into 1 second length subchunks which are individually tested. Results are promising, but there seems to be a remaining bug with unguided transcription likely related to saving context * Limit the maximum length of audio input. This existing VAD implementation only detects falling edges, which means any gap in the users speaking is processed for transcription. This simply establishes a constant maximum length depending on the type of transcription. Uguided gets a generous 10 seconds and guided, 2. While quick testing showed that commands are generally around a half a second to a second, limiting commands to an even second resulted in extreme degradation of quality. (Seemingly always the same output for a given commandset) * Unguided timestamp tracking, cleanup Unguided transcriptions where not setup to allow for passing of timestamp data forward, but have been corrected. No_context is now always set to false. While conceptually desirable for the quality of guided transcription, It was seemingly responsible for prior command inputs ghosting in unguided transcription. Save and Run are now tracked by command number instead of command text. While command_text was provided for convenience, I wish to keep command index authoritative. This gives greater consistency and potentially allows for end users to rename or even translate the spoken versions of these commands * By default, maintain mode. Previously, mode was reset to 0 unless otherwise set. In addition to causing some edge cases, this was didn't mesh well with the existing approach to visual mode. With this change, initial tests indicate visual mode is functional. * Add undo breaks before subtranscriptions Subtranscriptions use undo as a hack to allow for partial responses to be displayed. However, scripts don't cause an undo break mid execution unless specifically instructed to. This meant that multiple unguided transcriptions from a single session would cause a latter to undo a former. This is now fixed and undo should be reasonably usable as a command. * Append instead of insert for new undo sequence When entering and leavening insert mode with `i`, the cursor shifts one column to the left. This is remedied by using append instead of insert for setting these breaks in the undo sequence `-` was also added to the pronunciation dictionary to be pronounced as minus as it was causing a particularly high failure rate. * Move undo sequence breaks to command execution Previously, undo sequence breaks were triggered when there was a command that caused a move to insert mode. This caused commands that changed state (like delete or paste) to be bundled together with into the last command that caused text to be entered. * Fix repeat. Add space, carrot, dollar commands Repeat (.) wasn't being tracked properly just like undo and is being manually tracked now. While efforts have been made to properly handle spaces, it was particularly finicky to add a single space when one is needed. A special 'space' command has been added to insert a single space and move the cursor after it. Carrot and Dollar commands have been added for start of line and end of line respectively. These are both simple to implement, and just a matter of defining a pronunciation. * Return error on duplicate in commandset Not every command in the commandset tokenizes to a single token. Because of this, it's possible for that two commands could resolve to the same single token after subsequent tokens are discarded. This commit adds a simple check for duplicates when a commandset is registered and returns an error if so. Additional code will be required later on the vim side to actually process this error. * Add support for user-defined commands This adds a user definable dictionary from spoken keys to strings or funcrefs. All keys are added to the commandlist and when spoken, trigger the corresponding function. Like "save" and "run", these user commands are only available when the command buffer is empty. * Add readme, update cmake * Add area commandset. Refactor spoken_dict Area commands (inside word, around sentence...) have been given a commandset as considered earlier. Verbose definitions for spoken_dict entries now use dicts instead of lists. This shortens the definition for most keys that require it and scales better with the addition of further commandsets * Add mark, jump. Fix change under visual. Mark (m) and jump (') have been added. When a visual selection was executed upon a command that initiated a subtranscription (change) the area of the visual selection is not properly tracked which causes the attempt to stream in partial response to fail. This is solved by disabling partial transcriptions from being streamed when a subtranscription is started while in visual mode. * Accommodate ignorecase. Fix change. From testing on older different versions of vim, the test for distinguishing an 'R' replace all from an 'r' replace could fail if ignorecase was set. The comparison has been changed to explicitly require case matching Change detection has been moved to the execution section as it was missing the change+motion case. * Support registers. Fix README typo There's no logic to prevent doubled register entry, but the functional result is equivalent to if the same key order was typed into vim. A minor typo in the readme. I've mismemorized the mnemonic for 't' as 'to' instead of till., but 'to' can't be used as it's a homophone with '2'. While there was no mistake in the actual logic, it was misleading to use 'to' in the readme.	2023-08-27 21:35:06 +03:00
Alexey Kharlamov	041be06d58	cmake : build with any BLAS compatible library (#927 ) * Build with any BLAS library * ci: Removed explicit CUDA nvcc path	2023-05-20 21:23:45 +03:00
Nicholas Albion	bc89f285d8	bindings : add java bindings (#931 ) * WIP - java bindings * updated README * failed attempt at JNI * fullTranscribe() test passes * tested on Ubuntu 20 * link to Java bindings	2023-05-20 18:25:02 +03:00
Georgi Gerganov	794b162a46	whisper : add integer quantization support (#540 ) * whisper : add integer quantization support * examples : add common-ggml + prepare to add "quantize" tool * whisper : quantization tool ready * whisper : fix F32 support * whisper : try to fix shared lib linkage * wasm : update quantized models to Q5 * bench.wasm : remove "medium" button * bench.wasm : fix custom model button * ggml : add Q5_0 and Q5_1 WASM SIMD * wasm : add quantized models to all WASM examples * wasm : bump DB version number to 2 * talk-llama : update example to latest llama.cpp * node : increase test timeout to 10s * readme : add information for model quantization * wasm : add links to other examples	2023-04-30 18:51:57 +03:00
Georgi Gerganov	5fd1bdd7fc	whisper : add GPU support via cuBLAS (#834 ) * make : add WHISPER_CUBLAS * make : fix CUBLAS build * whisper : disable Flash Attention + adjust memory buffers * whisper : remove old commented code * readme : add cuBLAS instructions * cmake : add WHISPER_CUBLAS option * gitignore : ignore build-cublas	2023-04-30 12:14:33 +03:00
Georgi Gerganov	5e47e223bd	whisper : add Core ML support (#566 ) * coreml : use Core ML encoder inference * coreml : simlpify whisper_encode + log messages * whisper : resolve rebase conflicts * coreml : add scripts for CoreML model generation * bench-all : recognize COREML flag	2023-04-15 13:21:27 +03:00
Georgi Gerganov	34b772727d	gitignore : add .test	2023-04-14 20:13:47 +03:00
Georgi Gerganov	4a0deb8b1e	talk-llama : add new example + sync ggml from llama.cpp (#664 ) * talk-llama : talk with LLaMA AI * talk.llama : disable EOS token * talk-llama : add README instructions * ggml : fix build in debug	2023-03-27 21:00:32 +03:00
Georgi Gerganov	ad1389003d	release : v1.2.1	2023-02-28 22:29:12 +02:00
Lukas Rist	02c7516c57	go : added wrappers to reset and print timings (#436 )	2023-01-25 18:57:30 +02:00
Georgi Gerganov	9a65269a20	.gitignore : add arm_neon.h	2023-01-23 20:19:04 +02:00
Georgi Gerganov	8de452c18b	Improve decoding (#291 ) * whisper : prepare infra for new decoding strategies * whisper : apply logit filters and compute logprobs * whisper : add whisper_get_logits() * whisper : separate self and cross attention memory Initial step needed for supporting parallel decoders * whisper : move probs_id buffer to whisper_context * whisper : refactor kv cache into separate struct * whisper : move self-attention kv cache to whisper_decoder * whisper : wip decoding parameters + strategies * whisper : wip decoding parameters + strategies (part 2) * whisper : wip decoding parameters + strategies (part 3) * whisper : wip decoding parameters + strategies (part 4) * whisper : fix prompt_past update to not include prompt_init * whisper : temperature + best_of support * whisper : support for compression_ration_threshold We actually use entropy, but it is similar * command : fix example to use logits instead of obsolete probs * whisper : handle empty sequence ranking * whisper : add WHISPER_DEBUG + diagnostic prints + new main args * whisper : minor fixes * whisper : add beam-search support * whisper : bug fix when there no previous context * whisper : add comments * stream : disable temperature fallback For real-time processing, we always want a single decoder running at T=0 * whisper.swiftui : update example - fix paths + add empty folders	2023-01-15 11:29:57 +02:00
Georgi Gerganov	054940e1f6	minor : fix .gitignore to not ignore examples	2022-12-11 11:39:46 +02:00
Georgi Gerganov	3b1aacbe6d	talk : talk with AI in the terminal	2022-12-10 16:51:58 +02:00
Georgi Gerganov	832b4f34c9	make : indentation + .gitignore	2022-12-08 19:42:06 +02:00
Georgi Gerganov	bc88eb13c6	examples : add "command" tool (#171 )	2022-11-25 19:36:57 +02:00
Georgi Gerganov	b8ce25dec1	refactoring : more readable code	2022-11-25 19:28:04 +02:00
Georgi Gerganov	c6710efde2	refactoring : move main + stream in examples + other stuff	2022-10-25 20:53:48 +03:00
Georgi Gerganov	bb1ee266d2	ios : whisper.objc example	2022-10-24 18:23:07 +03:00
Georgi Gerganov	d6b84b2a23	ref #62 : fix build for some compilers For some reason, new version of GCC panic when the struct type is not specified explicitly	2022-10-18 10:57:03 +03:00
Georgi Gerganov	b4a3875b2c	Revert recent sampling change It does not actually help and seems to produce worse results on some of the samples	2022-10-18 08:26:16 +03:00
Georgi Gerganov	0e858f080d	close #56 : build on FreeBSD Thanks to @abelbabel for the contribution	2022-10-17 18:10:16 +03:00
Borislav Stanimirov	28252352d7	Visual Studio ignored dirs	2022-10-11 20:57:33 +03:00
Georgi Gerganov	2f069335ab	Adding sanitizer tests	2022-10-08 11:43:42 +03:00
Georgi Gerganov	877c058179	Add CMake support	2022-10-08 09:02:41 +03:00
Georgi Gerganov	b6bf906730	ref #10 : quick-and-dirty attempt for real-time audio transciption - Processes input in chunks of 3 seconds. - Padding audio with silence - Uses 1 second audio from previous pass - No text context	2022-10-02 17:55:45 +03:00
Georgi Gerganov	b0a11594ae	Initial release	2022-09-25 22:13:49 +03:00

30 Commits