whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-26 23:51:05 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	af6f67b251	whisper : ggml-alloc is now supported	2023-09-10 20:09:17 +03:00
Georgi Gerganov	bed5ad69dd	whisper : allocate encoder and decoder using ggml-alloc	2023-09-10 19:50:34 +03:00
Georgi Gerganov	949ab6328d	whisper : factor out graph builds	2023-09-10 19:23:06 +03:00
Georgi Gerganov	fbc3f8033e	metal : init	2023-09-10 18:38:34 +03:00
bobqianic	9b14418863	whisper : faster beam_search sampling via reduced KV cache copies (#1243) * Faster `beam_search` sampling Refine the KV cache update logic for more intelligent and efficient updating. * Faster `whisper_sample_token_topk` * Update whisper.cpp * Update whisper.cpp * Update whisper.cpp * Reduce `memory allocation` * Add `pointer swapping` * Fixed some bugs * Update whisper.cpp * Apply suggestions from code review * Updated the logic for determining `two-copy` * Updated the logic for determining `two-copy` v2 * whisper : add debug logs + coding style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-10 16:04:27 +03:00
Nicholas Albion	6ddc727fac	java : fixed signing of java artifact using gradle (#1267) * --stacktrace signMavenJavaPublication * added temporary step "Debug gradle signing" * cd bindings/java * use GPG_PRIVATE_KEY and GPG_PASSPHRASE * use secrets.GPG_PRIVATE_KEY and GPG_PASSPHRASE	2023-09-09 18:55:51 +03:00
Georgi Gerganov	acb5278cc8	ci : try to fix gradle action (#1265)	2023-09-08 20:50:15 +03:00
Georgi Gerganov	0839209cab	gitignore : update	2023-09-08 19:45:28 +03:00
Georgi Gerganov	b39809668a	sync : ggml (HBM + Metal + style) (#1264)	2023-09-08 17:58:31 +03:00
Georgi Gerganov	3e9edc6845	ci : upgrade gradle to 2.4.2 (#1263) * ci : upgrade gradle to 2.4.2 * cmake : add comment (#1129)	2023-09-08 17:58:14 +03:00
Georgi Gerganov	bfc73f1fa2	sync : ggml (CUDA faster rope)	2023-09-08 15:01:26 +03:00
Georgi Gerganov	f00c9bba33	cmake : normalize case (#1129)	2023-09-08 14:50:03 +03:00
Przemysław Pawełczyk	b55b505690	build : do not use _GNU_SOURCE gratuitously (#1129) * Do not use _GNU_SOURCE gratuitously. What is needed to build whisper.cpp and examples is availability of stuff defined in The Open Group Base Specifications Issue 6 (https://pubs.opengroup.org/onlinepubs/009695399/) known also as Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions, plus some stuff from BSD that is not specified in POSIX.1. Well, that was true until NUMA support was added recently in ggml, so enable GNU libc extensions for Linux builds to cover that. There is no need to penalize musl libc, which simply follows standards. Not having feature test macros in source code gives greater flexibility to those wanting to reuse it in 3rd party app, as they can build it with minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs. It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2. * examples : include SDL headers before other headers Avoid macOS build error when _DARWIN_C_SOURCE is not defined, brought by SDL2 relying on Darwin extension memset_pattern4/8/16 (from string.h). * make : enable BSD extensions for DragonFlyBSD to expose RLIMIT_MEMLOCK * make : use BSD-specific FTMs to enable alloca on BSDs * make : fix OpenBSD build by exposing newer POSIX definitions * cmake : follow recent FTM improvements from Makefile	2023-09-07 12:36:14 +03:00
Georgi Gerganov	2818de21ff	examples : fix build + compile warnings (close #1256)	2023-09-07 12:33:12 +03:00
Neil Chudleigh	aed5d40607	models : add quantized models to download-ggml-model.sh (#1235) * Add quantized models to download-ggml-model.sh * Update names in download-ggml-model script to normalized ones	2023-09-07 12:16:58 +03:00
Digipom	afa5477d1c	whisper.android : bump gradle plugin and dependencies + a lint pass (#1255)	2023-09-07 12:15:59 +03:00
Nicholas Albion	01fcd42431	sign jar for Maven Central repo	2023-09-07 11:45:44 +10:00
Digipom	f990610776	whisper.android : address ARM's big.LITTLE arch by checking cpu info (#1254) Addresses https://github.com/ggerganov/whisper.cpp/issues/1248	2023-09-06 18:32:30 +03:00
Didzis Gosko	64cb45fd79	make : fix detection of AVX2 on macOS (#1250)	2023-09-06 18:22:21 +03:00
Przemysław Pawełczyk	ace6c12ec6	ggml : posixify pagesize (#1251) * ggml : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml.c * metal : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml-metal.m	2023-09-06 18:19:36 +03:00
Nicholas Albion	cac75be05b	configured publishing.repositories	2023-09-06 13:13:36 +10:00
Georgi Gerganov	c3f319d7c2	ggml : sync latest llama.cpp (view_src + alloc improvements) (#1247) * ggml : sync latest llama.cpp (view_src + alloc improvements) * ggml : fix build	2023-09-05 20:57:27 +03:00
Przemysław Pawełczyk	ba3c333611	make : improve cpuinfo handling on x86 hosts (#1238) * make : simplify and correct x86 ISA extensions detection on the host It got broken in commit `c5f9acf4b7` for Haiku and Mac OS (Intel), which report CPU features in upper case. Now we're finding the names in a case-insensitive manner and as words. SSE3 detection has been corrected for Linux, which uses PNI for that (Prescott New Instructions). * make : use dmesg.boot in FreeBSD/DragonFlyBSD to detect x86 ISA extensions on the host * make : enable x86 ISA extensions on the host both in CFLAGS and CXXFLAGS * make : correct AVX x86 ISA extension detection on macOS (Intel) host It got broken in commit `c5f9acf4b7`. macOS calls it AVX1.0.	2023-09-05 14:58:47 +03:00
Georgi Gerganov	59a3d0cb57	ggml : sync (ggml-alloc, GPU, eps, etc.) (#1220) * ggml : sync (ggml-alloc, GPU, eps, etc.) * ggml : fix build * wasm : fix build	2023-09-05 13:54:40 +03:00
布客飞龙	6780c98e19	readme : update CMake build commands (#1231) * Update README.md * Update README.md: `vcpkg install opencl clblast` * readme : update build commands --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-05 13:53:34 +03:00
Nicholas Albion	2f52783a08	OSSRH_USERNAME -> JIRA_USER	2023-08-31 14:54:02 +10:00
Nicholas Albion	7dec9d8cc4	build-root-directory: bindings/java	2023-08-31 12:04:16 +10:00
Georgi Gerganov	fb0a24fba2	ci : enable java package publishing (#1228)	2023-08-31 09:57:43 +10:00
ChangSeok Oh	8e30bf3c02	ggml : fix compilation errors incurred by -Werror (#1227) The -Werror warning option turns all warnings into errors. This PR makes the compiler happy to build ggml.c and whisper.cpp with the stricter option.	2023-08-30 22:09:15 +03:00
Jhen-Jie Hong	99d3c105f5	whisper.android : fix cmake multiple libraries build (#1224) * whisper.android : fix multiple libraries build * fix flags for default target	2023-08-30 14:45:13 +03:00
Dener Stassun	18e9889418	coreml : wrap inference call in @autoreleasepool to fix memory leak (#1218)	2023-08-29 15:44:38 +03:00
Przemysław Pawełczyk	8e46ba80d3	make : use cpuinfo in MSYS2 to enable x86 ISA extensions on the host (#1216)	2023-08-28 13:28:26 +03:00
Przemysław Pawełczyk	b0d35995c4	make : add support for building on DragonFlyBSD/NetBSD/OpenBSD (#1212)	2023-08-27 21:38:46 +03:00
Przemysław Pawełczyk	25466aa1c3	ggml : fix compiling when SSE3 is available but not SSSE3 (#1210) It got broken in commit `3998465721`.	2023-08-27 21:37:31 +03:00
Przemysław Pawełczyk	601c2d2181	ggml : detect SSSE3 (#1211) * ggml : add ggml_cpu_has_ssse3 * whisper : show SSSE3 in system info * make : detect SSSE3 via cpuinfo	2023-08-27 21:36:41 +03:00
AustinMroz	175ffa64ee	examples : vim plugin and LSP server (#1144) * Initial proof of concept Vim plugin At present, this is likely only slightly better than feature parity with the existing whisper.nvim Known issues: Trailing whitespace Up to an existing length (5 seconds) of speech may be processed when listening is enabled CPU cycles are spent processing speech even when not listening. Fixing these issues is likely dependent upon future efforts to create a dedicated library instead of wrapping examples/stream * Support $WHISPER_CPP_HOME environment variable A minor misunderstanding of the whisper.nvim implementation resulted in a plugin that was functional, but not a drop-in replacement as it should be now. * Initial progress on LSP implementation Libcall is nonviable because the library is immediately freed after a call is made. Further investigation has shown Language Server Protocol as a promising alternative that both simplifies the required logic on the vimscript side and increases the ease with which plugins for other editors could be made in the future. This is a very large undertaking and my progress has slowed substantially. Work is far from being in a usable state, but I wish to keep track of major refactors for organizational purposes. * Rewrite audio windowing of guided transcription One of the defining goals of this venture is allowing consecutive commands to be rattled off without the existing deadzones of the current implementation. * Add unguided_transcription. Cleanup. The unguided transcription implementation heavily borrows from existing example implementations and the guided_transcription logic. A high level pass was done to check that method arguments are accurate to what inputs are actually required. A first attempt at cancellation support was added for record keeping, but will be deleted in a future commit. * Fix compilation. Resolves a large number of compilation errors. No testing has been done yet for execution errors.
Update Makefile and .gitignore * Functional unguided_transcription * Functional guided_transcription Fix commandset_list being passed by value Properly register the first token of a multitoken command * Minor changes before time fix I've apparently made an awfully major mistake in thinking that unix time was in milliseconds and will be changing all timekeeping code to use standardized methods. In preparation for this are a number of minor bugfixes. Output is manually flushed. An echo method has been added. registerCommandset now wraps the returned index * Swap timekeeping to use std::chrono * Add work-in-progress lsp-backed whisper.vim plugin Current progress blockers are Adding modality awareness to the command processing (specifically, motion prompting) Improving the VAD to be a little more responsive (testing start of activity) * Reworked vim plugin command loop * Fix change inside Multiple bug fixes that, crucially, bring the plugin to the point where a demonstration video is possible Add better echo messaging so whisper_log isn't required Add loading complete message as indicator when listening has started Insert/append are actually included in command sets Some more heavy-handed corrections to prevent a double exit when leaving insert mode As a somewhat hacky fix, the very first space is removed when inserting. This cleans up most use cases, but leaves me unsatisfied with the few cases where it would be desired. * Forcibly set commandset_index to 0 after subinsert Also remove unnecessary ! to use builtin vim command * Fix upper A minor scope mistake was causing upper'd inputs to be eaten. This was fixed and echoing was slightly improved for clarity. * Fix formatting Corrects indentation to 4 spaces as project standard Slightly better error support for malformed json input * Remove obsolete vim plugin * Add json.hpp library The same library that is used for the llama.cpp server * Minor cleanups add lsp to the make clean directive.
remove a redundant params definition. reorder whisper.vim logging for subtranscriptions Corrections to unlets (variables of argument scope appear immutable) * Fix indentation. Fallback for subTranscription Indentation has been changed to 4 spaces. Unit testing has been set up, but I'm opting not to include it in the repository for now. It has, however, revealed a bug in the state logic where a subtranscription can be initiated without having a saved command. When this occurs, append is added as a fallback * Move audio polling logic to a subfunction While work on the improved vad will continue, it's grown to be a little out of scope. Instead, a future commit will perform multiple detection passes at substretches of audio when a backlog of audio exists. To facilitate this, and prevent code duplication, the vad code has been moved into a subfunction shared by both the unguided and guided transcription functions. * Test for voice over subchunks if backlog > 1s As the existing VAD implementation only checks for a falling edge at the end of an audio chunk, it fails to detect voice in cases where the recorded voice is only at the beginning of the audio. To ameliorate this, when the timestamp would cause analysis of audio over a second in length, it is split into 1-second subchunks which are individually tested. Results are promising, but there seems to be a remaining bug with unguided transcription likely related to saving context * Limit the maximum length of audio input. This existing VAD implementation only detects falling edges, which means any gap in the user's speaking is processed for transcription. This simply establishes a constant maximum length depending on the type of transcription. Unguided gets a generous 10 seconds and guided, 2. While quick testing showed that commands are generally around half a second to a second, limiting commands to an even second resulted in extreme degradation of quality.
(Seemingly always the same output for a given commandset) * Unguided timestamp tracking, cleanup Unguided transcriptions were not set up to allow for passing of timestamp data forward, but have been corrected. No_context is now always set to false. While conceptually desirable for the quality of guided transcription, it was seemingly responsible for prior command inputs ghosting in unguided transcription. Save and Run are now tracked by command number instead of command text. While command_text was provided for convenience, I wish to keep command index authoritative. This gives greater consistency and potentially allows for end users to rename or even translate the spoken versions of these commands * By default, maintain mode. Previously, mode was reset to 0 unless otherwise set. In addition to causing some edge cases, this didn't mesh well with the existing approach to visual mode. With this change, initial tests indicate visual mode is functional. * Add undo breaks before subtranscriptions Subtranscriptions use undo as a hack to allow for partial responses to be displayed. However, scripts don't cause an undo break mid-execution unless specifically instructed to. This meant that multiple unguided transcriptions from a single session would cause the latter to undo the former. This is now fixed and undo should be reasonably usable as a command. * Append instead of insert for new undo sequence When entering and leaving insert mode with `i`, the cursor shifts one column to the left. This is remedied by using append instead of insert for setting these breaks in the undo sequence. `-` was also added to the pronunciation dictionary to be pronounced as minus as it was causing a particularly high failure rate. * Move undo sequence breaks to command execution Previously, undo sequence breaks were triggered when there was a command that caused a move to insert mode.
This caused commands that changed state (like delete or paste) to be bundled together with the last command that caused text to be entered. * Fix repeat. Add space, caret, dollar commands Repeat (.) wasn't being tracked properly, just like undo, and is being manually tracked now. While efforts have been made to properly handle spaces, it was particularly finicky to add a single space when one is needed. A special 'space' command has been added to insert a single space and move the cursor after it. Caret and Dollar commands have been added for start of line and end of line respectively. These are both simple to implement, and just a matter of defining a pronunciation. * Return error on duplicate in commandset Not every command in the commandset tokenizes to a single token. Because of this, it's possible for two commands to resolve to the same single token after subsequent tokens are discarded. This commit adds a simple check for duplicates when a commandset is registered and returns an error if so. Additional code will be required later on the vim side to actually process this error. * Add support for user-defined commands This adds a user-definable dictionary from spoken keys to strings or funcrefs. All keys are added to the commandlist and, when spoken, trigger the corresponding function. Like "save" and "run", these user commands are only available when the command buffer is empty. * Add readme, update cmake * Add area commandset. Refactor spoken_dict Area commands (inside word, around sentence...) have been given a commandset as considered earlier. Verbose definitions for spoken_dict entries now use dicts instead of lists. This shortens the definition for most keys that require it and scales better with the addition of further commandsets * Add mark, jump. Fix change under visual. Mark (m) and jump (') have been added.
When a visual selection is executed upon a command that initiates a subtranscription (change), the area of the visual selection is not properly tracked, which causes the attempt to stream in a partial response to fail. This is solved by preventing partial transcriptions from being streamed when a subtranscription is started while in visual mode. * Accommodate ignorecase. Fix change. From testing on older versions of vim, the test for distinguishing an 'R' replace all from an 'r' replace could fail if ignorecase was set. The comparison has been changed to explicitly require case matching. Change detection has been moved to the execution section as it was missing the change+motion case. * Support registers. Fix README typo There's no logic to prevent doubled register entry, but the functional result is equivalent to typing the same key sequence into vim. A minor typo in the readme. I've mismemorized the mnemonic for 't' as 'to' instead of 'till', but 'to' can't be used as it's a homophone of '2'. While there was no mistake in the actual logic, it was misleading to use 'to' in the readme.	2023-08-27 21:35:06 +03:00
ardfork	cb5fb0a12d	whisper : initial hipBLAS support (#1209)	2023-08-27 20:03:58 +03:00
Georgi Gerganov	b5bb5c85d4	whisper : allow whisper_full from mel spectrogram - no audio (#1214) Co-authored-by: jbrough <jamie1612@gmail.com>	2023-08-27 20:02:57 +03:00
bobqianic	7e54df414e	whisper : significantly improve the inference quality (#1148) * Fix MSVC compile error C3688 Instead of simply using 'add_compile_options(/utf-8)' to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing '/utf-8' to NVCC. * Significantly improve inference quality In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference. * Significantly improve inference quality At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue. * Addressed a few minor issues Fixed the issue of `fft_out` continuously expanding. Resolved the fallback caused by using 'break' instead of `fft_in[j] = 0`. * Significantly improve inference quality Thanks for your patience everyone. It's finally sorted out. Now, the right side of the FFT spectrum is being flipped over to the left, and the amplitudes at corresponding positions on the left and right are added together (the spectrum on the left needs to be shifted by one position), then the average is calculated. FFT_OUT[0] is no longer discarded, making full use of the limited space to pack in more information.
* Add annotation and performance improvement * Calculate FFT only when fft_in are not all zero * Some minor performance improvement * Fixed a bug impacting inference quality * The first version after all the analysis is completed. * Fix some bugs and add debug mode * Fixed several bugs * Temporarily disable speed-up mode and add debug mode. * Add debug mode * Disable speed-up mode and add debug mode * Fix CI error (#1) * Fix error * Fix error * Fixed several bugs including [BLANK_AUDIO] problem * Remove Hard-coded hann window * Some Final Fix (#2) * Fix error * Fix error * Probably the last commit * Probably the last commit * whisper : minor coding style changes * whisper : remove debug from public API --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-27 19:51:33 +03:00
junkfood	20a80972f4	whisper.android : migrate from ndk-build to CMake (#1204)	2023-08-27 19:35:16 +03:00
Yunès	7ef3f3837e	main : log probs to text file (#1205) * token/probability file generated with -ls * code comment cleaning * main : indentations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-27 19:09:06 +03:00
Fangjun Kuang	aad2dad38a	whisper : minor fixes (#1154)	2023-08-27 19:02:00 +03:00
Marcin Mielniczuk	66f2078878	build : fix OpenBLAS detection under Arch Linux (#1173)	2023-08-25 19:26:34 +03:00
Eric Swanson	8ce20f0f3d	make : fix Linux machines supporting AVX1 not AVX2 (#1162) e.g. ancient CPU E5-2670 (v1) See issue #1126 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-25 15:52:22 +03:00
Alexandr Graschenkov	c84cf87261	whisper : add precalculated values of sin/cos for speeding up FFT (#1142) * Add sin/cos precalculated values to speedup FFT * Update whisper.cpp Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com> * Update whisper.cpp Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2023-08-25 15:51:14 +03:00
alonfaraj	c5f9acf4b7	make : simplify Makefile (#1147) * Simplify Architecture specific in Makefile * unified OS specific check	2023-08-25 15:20:44 +03:00
Marcin Mielniczuk	7decc85eb7	cmake : fix PowerPC build failures introduced in #1174 (#1196)	2023-08-25 15:19:48 +03:00
Marcin Mielniczuk	21e8c67a4f	Fix AVX etc. under GCC/CMake (#1174)	2023-08-19 21:39:03 +03:00
Jhen-Jie Hong	a4bb2df36a	quantize : fix load vocab crash when len is 128 (#1160) * quantize : fix load vocab crash when len is 128 * ci : add quantize job	2023-08-06 11:04:42 +03:00
Duncan McConnell	b948361956	examples : add tinydiarization support for streaming (#1137)	2023-08-03 11:24:07 +03:00


716 Commits