whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-24 14:46:39 +00:00

Author	SHA1	Message	Date
jiez	60f3713026	llama : add gguf_remove_key + remove split meta during quantize (llama/6591) * Remove split metadata when quantize model shards * Find metadata key by enum * Correct loop range for gguf_remove_key and code format * Free kv memory --------- Co-authored-by: z5269887 <z5269887@unsw.edu.au>	2024-05-13 11:02:26 +03:00
Justina Cho	37e6757453	feat: implemented sigmoid function (ggml/806) * added sigmoid function * implemented metal kernel for sigmoid * implemented cuda kernel for sigmoid * added sigmoid unary op and incremented count	2024-05-13 11:02:26 +03:00
Borislav Stanimirov	8dcefdf4a9	build: fix and ignore msvc warnings (ggml/805)	2024-05-13 11:02:26 +03:00
Przemysław Pawełczyk	73d13ad19a	ggml : expose SSE3 and SSSE3 for MSVC when AVX is available (#2128)	2024-05-08 18:33:43 +03:00
Przemysław Pawełczyk	b6680fab50	build : improve disabling AVX-512 (#2129) * cmake : make WHISPER_NO_AVX512=ON disable all subsets of AVX-512 Previously it happened only for MSVC, but it makes sense to have the same behavior for other compilers too. * make : reorder x86 ISA extensions in chronological order And update compiler flags at the end to ease modifying conditions. * make : support WHISPER_NO_AVX512=1 for disabling all AVX-512 subsets. That way you do not have to override each AVX-512 subset setting individually if it has been turned on during autodetection.	2024-05-08 18:32:43 +03:00
Borislav Stanimirov	f760756078	minor: add CMakeSettings.json to gitignore (#2094)	2024-05-08 11:03:21 +03:00
Pedro Probst	58210d6a76	examples : fix node compilation (#2115) * node : fix compilation and update examples * node : fix readme * Update addon.node test	2024-05-02 22:52:55 +01:00
Przemysław Pawełczyk	8fac6455ff	make : change GNU make default CXX from g++ to c++ (#2100)	2024-04-28 22:54:21 +01:00
goldwaving	22b6598cc9	Remove unnecessary memory reallocation in fft (#2080) fft_out needs to be twice the frame_size, not the frame_step. It is resized in fft() anyway, but this change prevents an unnecessary reallocation. n_fft must match the mel filter size, so it is best not to calculate it from the framesize. We only need to get the magnitudes for half the spectrum since the other half is a mirror and not used in the mel filter loop later.	2024-04-28 18:36:12 +01:00
Georgi Gerganov	858452d58d	models : disable old script (#2079)	2024-04-24 14:56:30 +03:00
Georgi Gerganov	7f85e1d7fd	whisper : more prominent log message for sub-1s audio (#2065)	2024-04-24 14:46:06 +03:00
Georgi Gerganov	b0c3cbf2e8	main : pass nullptr when regex is empty (#2070)	2024-04-17 12:23:47 +03:00
AIWintermuteAI	a750868428	readme : add up-to-date repository for Python bindings (#2063) README	2024-04-16 14:15:52 +03:00
Georgi Gerganov	7395c70a74	release : v1.5.5	2024-04-16 14:08:31 +03:00
Emmanuel Schmidbauer	9fab28135c	server : add dtw (#2044) * server.cpp: add dtw * Update examples/server/server.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-15 22:16:58 +03:00
Didzis Gosko	08d3eef97d	build : fix embedded Metal library generation (#2045)	2024-04-15 20:23:05 +03:00
Pedro Probst	1b5439a6c2	node : support no timestamps (#2048) * fix: node: do not compute timestamps if you do not need them * feat: add no_timestamps parameter to node addon	2024-04-15 20:03:34 +03:00
Didzis Gosko	c7f95b7ca2	build : detect AVX512 in Makefile, add AVX512 option in CMake (#2043) * make : add AVX512 detection to Makefile and CMakeLists.txt * make : autodetect more AVX512 instruction subsets * cmake : do not default to AVX512, must be enabled explicitly * cmake : enable a set of AVX512 subsets, when AVX512 is turned on * make : consolidate AVX512 subsets, add AVX512 VBMI * cmake : revert to NO AVX512 setting, add settings for AVX512 VNNI and VBMI * make : re-introduce AVX512VNNI back * cmake : remove superfluous comment line	2024-04-15 20:02:09 +03:00
Kendrick Taylor	5c554c04ff	whisper.nvim : fix missing reference to "model" variable (#2049)	2024-04-15 19:41:28 +03:00
Ikko Eltociear Ashimine	c383f091a1	whisper : update grammar-parser.cpp (#2058) preceeding -> preceding	2024-04-15 19:40:27 +03:00
Georgi Gerganov	8f253ef3af	sync : ggml	2024-04-09 20:27:55 +03:00
Georgi Gerganov	c7dc37f97c	license : update copyright notice + add AUTHORS	2024-04-09 20:27:44 +03:00
Carolinabanana	526332873b	llama : add Command R Plus support (llama/6491) * Add Command R Plus GGUF * Add Command R Plus GGUF * Loading works up to LayerNorm2D * Export new tensors in 1D so they are not quantized. * Fix embedding layer based on Noeda's example * Whitespace * Add line * Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda) * dranger003: Fix block index overflow in CUDA dequantizing. * Reverted blocked multiplication code as it still has issues and could affect other Llama arches * export norms as f32 * fix overflow issues during quant and other cleanup * Type convention Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * dranger003: Fix more int overflow during quant. --------- Co-authored-by: S <seast@Ss-Mac-Studio.local> Co-authored-by: S <s@example.com> Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 20:26:18 +03:00
Abhilash Majumder	1d2721ca72	remove row=1 cond (llama/6532)	2024-04-09 20:26:18 +03:00
Neo Zhang Jianyu	219e601dab	support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (llama/6521)	2024-04-09 20:26:18 +03:00
Georgi Gerganov	3b8aade3c2	scripts : update sync	2024-04-09 20:25:50 +03:00
Georgi Gerganov	52ccd4a3a8	files : rename ./extra to ./scripts	2024-04-09 20:13:41 +03:00
Brad Murray	5275074d37	whisper : fix DTW memory access (#2012) * Fix DTW memory access * Memory fix - Apply changes from denersc	2024-04-09 18:38:19 +03:00
ulatekh	c15b4cda7d	common : fix file-handle leak in read_wav() (#2026) Now it cleans up in case of error.	2024-04-09 18:34:34 +03:00
Rotem Dan	d3cfb6ca2b	main : set stdin to binary mode on Windows (#2025)	2024-04-09 18:33:32 +03:00
slashlib	956ef860bc	cmake : support for CPU BLAS build via Intel MKL (#2024)	2024-04-09 18:32:46 +03:00
ulatekh	671b4bde6c	main : allow a response-file as the sole parameter (#2019) * The "main" example now allows a response-file as the sole parameter. A response-file is a text file with command-line parameters, one per line. Prefix the name of the response-file with "@" to identify it as such. It's used under MS Windows to work around command-line length limits. It may be useful under other platforms to simplify character-escaping. * minor : style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 18:31:16 +03:00
ulatekh	c8eeb93a6a	whisper : suppress tokens with a regex (#1997) * Allow a regular expression to describe tokens to suppress. Example: --suppress-tokens-re "[,\.]\|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens. Technique inspired by https://github.com/openai/whisper/discussions/1041 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Blind change to fix Java test. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 18:27:28 +03:00
ulatekh	319fe5146e	cmake : create solution folders (#2004) * Create solution folders in the CMake build. * Fixed non-SDL2 build. * Fixed emscripten build.	2024-04-09 18:23:33 +03:00
Georgi Gerganov	13c22321d1	sync : ggml	2024-04-07 17:04:56 +03:00
Georgi Gerganov	ccbe9d5676	extra : sync grammar-parser	2024-04-07 17:04:22 +03:00
Georgi Gerganov	81a3c41aa0	talk-llama : sync llama.cpp	2024-04-07 16:21:08 +03:00
Georgi Gerganov	a50207c65d	sync : ggml	2024-04-07 16:18:11 +03:00
Georgi Gerganov	97878e53fd	sync : llama.cpp (skip) ggml-ci	2024-04-07 16:15:57 +03:00
Ouadie EL FAROUKI	61b05815e0	Fixed minor bug when enabling FP16 for non intel targets (llama/6464) * moved INTEL_MKL guard from gemm_impl to gemm (wrapper) * Update ggml-sycl.cpp Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com> --------- Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>	2024-04-07 16:15:57 +03:00
slaren	1dce94cf26	ggml : mul_mat_id use the same tensor for all the experts (llama/6387) * ggml : update mul_mat_id to use the same tensor for all the experts * update cuda * minor * update metal * update test-backend-ops * fix cuda * Update ggml-metal.m Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * update convert.py * update convert-hf-to-gguf.py * update convert.py for mixtral hf models * Update convert-hf-to-gguf.py Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * cuda : support non-pow-2 number of experts * allow quantize to work for split and merged experts models in the same way * cleanup + disable mmap automatically with split tensors models * update imatrix * test-backend-ops : test qwen argsort * update grok model loading * llama : add merged experts tensors to the grok tensor map * minor * gguf : bump version * fix quantizing of merged experts * convert-hf-to-gguf.py : update grok (untested) * make linter happy * cuda/argsort : use shared memory instead of pool memory * convert : fix grok tensor names * metal : add support for non-pow-2 argsort * llama : more loader cleanup, better error checking * cuda : fix warning * llama : still use mmap for loading old models, but copy the data to a host buffer * add review note * llama : remove ffn tensor counting + add sanity check ggml-ci * convert : fix handling of n_experts == None ggml-ci * imatrix : fix ncall counters * llama : produce error if imatrix size does not match * quantize : terminate on errors + trace logs ggml-ci * metal : pad shared memory to 16 bytes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-07 16:15:57 +03:00
Meng, Hengyu	f12e982c0b	Disable iqx on windows as WA (llama/6435) * disable iqx on windows as WA * array instead of global_memory	2024-04-07 16:15:57 +03:00
0cc4m	fa966b9b40	Vulkan k-quant mmq and ggml-backend offload functionality (llama/6155) * Fix Vulkan no kv offload incoherence * Add k-quant mul mat mat shaders * Rework working buffer allocation, reduces vram use noticeably Clean up cpu assist code, replaced with ggml-backend offload function * Default to all dedicated GPUs * Add fallback for integrated GPUs if no dedicated GPUs are found * Add debug info which device is allocating memory * Fix Intel dequant issue Fix validation issue * Fix Vulkan GGML_OP_GET_ROWS implementation * Clean up merge artifacts * Remove Vulkan warning	2024-04-07 16:15:57 +03:00
Neo Zhang Jianyu	b83a9fc9d3	fix set main gpu crash (llama/6339)	2024-04-07 16:15:56 +03:00
slaren	3adbf2fb03	ggml : fix bounds checking of zero size views (llama/6347)	2024-04-07 16:15:56 +03:00
Daniel Bevenius	700d146127	backend : fix typo in scheduler documentation (ggml/781) Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-04-07 16:15:56 +03:00
Georgi Gerganov	a74fde9b4c	extra : sync ggml-cuda folder	2024-04-07 16:10:44 +03:00
Slava Primenko	1d7657f409	ggml: bypass code incompatible with CUDA < 11.1 (#2020) `cudaHostRegisterReadOnly` parameter was only introduced in CUDA 11.1 See this issue for more details: https://github.com/ggerganov/whisper.cpp/issues/2007	2024-04-04 14:49:24 +02:00
Przemysław Pawełczyk	ac283dbce7	ci : add building in MSYS2 environments (Windows) (#1994)	2024-03-30 09:20:20 +02:00
Przemysław Pawełczyk	1e8f28c42a	build : use pkg-config for OpenBLAS (#1778) * make : use pkg-config for finding CFLAGS & LDFLAGS needed by OpenBLAS That way building on *nix like environments (including MSYS2 on Windows) with WHISPER_OPENBLAS=1 works out of the box. Fix handling of WHISPER_OPENBLAS, so that empty value or 0 won't be misinterpreted by make as enabled. Mind that it's not intended to detect CMake false constants (OFF NO FALSE N). make is not CMake. By default OpenBLAS with 64-bit interface is used, but that can be changed with `WHISPER_OPENBLAS_INTERFACE64=0` if 32-bit one is desired. If OpenBLAS headers and library are respectively in include/ and lib/ subdirectories of given path, then you can specify it, e.g. `OPENBLAS_PATH=/usr/local/openblas`, and this will take precedence over any pkg-config file. If there is no pkg-config file (.pc) for OpenBLAS and OPENBLAS_PATH is empty, then headers are assumed to be in /usr/include/openblas and library is assumed to be called 'openblas64' (or 'openblas' if `WHISPER_OPENBLAS_INTERFACE64=0`). If different headers location should be used, then it can be done, e.g. `WHISPER_BLAS_CFLAGS=-I/usr/local/include/openblas`. If different library should be used, it can be specified, e.g. `WHISPER_BLAS_LIB=openblasp64` (pthreads version as seen on Fedora), or you can provide LDFLAGS needed to link with OpenBLAS directly: `WHISPER_BLAS_LDFLAGS="-L/usr/local/lib/openblas -lopenblas64"`. Current solution is flexible enough to handle most cases out there without needlessly hardcoding possible OpenBLAS installation details. cmake : fix how pkg-config is used for finding include dirs and libraries needed by OpenBLAS That way building on *nix like environments (including MSYS2 on Windows) with -DWHISPER_OPENBLAS=ON should work out of the box as long as you have CMake 3.25 or newer. Make OPENBLAS_PATH environment variable supported not only on Windows. It sets OpenBLAS include dir to ${OPENBLAS_PATH}/include and library to ${WHISPER_BLAS_LIB} (name without prefixes and suffixes) in ${OPENBLAS_PATH}/lib and avoids further package finding. By default OpenBLAS with 64-bit interface is used (equivalent to setting `-DWHISPER_BLAS_LIB=openblas64`), but that can be changed with `-DWHISPER_OPENBLAS_INTERFACE64=OFF` (equivalent to setting `-DWHISPER_BLAS_LIB=openblas`) if 32-bit one is desired. Turn on BLA_STATIC for FindBLAS only when WHISPER_STATIC is enabled. BLA_STATIC may not work as expected for pkg-config based operation. Get rid of supporting BLAS_HOME environment variable. If OPENBLAS_PATH is insufficient in your case, there is no pkg-config file to rely on, then you can manually specify include dir, e.g. `-DBLAS_INCLUDE_DIRS=/usr/local/include/openblas`, and library, e.g. `-DBLAS_LIBRARIES=/usr/local/lib/libopenblas.so`. make / cmake : use OpenBLAS with 32-bit interface by default. OpenBLAS w/o INTERFACE64=1 vel USE_64BITINT=1 seems to be more common. * cmake : hardcode "lib" prefix for OpenBLAS lib filename (even on Windows) * cmake : hardcode OpenBLAS library name when building in MSVC (Windows) Most *nix like environments (including MSYS2 on Windows) have OpenBLAS packages that allow coexistence of OpenBLAS builds with 32-bit and 64-bit interface (w/o and w/ OPENBLAS_USE64BITINT defined) and they differ by not having or having "64" suffix in their library filenames. That's not the case for OpenBLAS prebuilt libraries for Windows.	2024-03-29 15:53:26 +02:00
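
Note on 22b6598cc9 (fft): the commit message says only half of the spectrum is needed because the other half is a mirror. The sketch below illustrates that property with a naive DFT of a real-valued frame. It is not the fft() from whisper.cpp; the names bin_magnitude and half_spectrum_magnitudes are hypothetical and exist only for this illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude of DFT bin k for a real-valued frame x (naive O(N) per bin).
// Illustrative only; whisper.cpp uses its own fft() routine.
float bin_magnitude(const std::vector<float> & x, std::size_t k) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    double re = 0.0;
    double im = 0.0;
    for (std::size_t t = 0; t < n; ++t) {
        const double angle = 2.0 * pi * double(k) * double(t) / double(n);
        re += x[t] * std::cos(angle);
        im -= x[t] * std::sin(angle);
    }
    return float(std::sqrt(re * re + im * im));
}

// For real input, |X[k]| == |X[N-k]|, so bins 0..N/2 carry all the
// magnitude information; the remaining bins are the mirror image.
std::vector<float> half_spectrum_magnitudes(const std::vector<float> & x) {
    std::vector<float> mag(x.size() / 2 + 1);
    for (std::size_t k = 0; k < mag.size(); ++k) {
        mag[k] = bin_magnitude(x, k);
    }
    return mag;
}
```

For a frame of N real samples this yields N/2 + 1 magnitudes, so a later filter-bank loop that reads only those bins never touches the mirrored half.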

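
Note on 671b4bde6c (main): the commit message describes a response-file as a text file with one command-line parameter per line, given as the sole parameter and prefixed with "@". A minimal sketch of that idea follows. The helper name expand_response_file and its fallback behaviour (returning the arguments unchanged if the file cannot be opened, skipping empty lines) are assumptions made for illustration and are not taken from the whisper.cpp source.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: if args is exactly one "@path" entry, return the
// lines of that file as the argument list; otherwise return args unchanged.
std::vector<std::string> expand_response_file(const std::vector<std::string> & args) {
    if (args.size() != 1 || args[0].size() < 2 || args[0][0] != '@') {
        return args;
    }
    std::ifstream in(args[0].substr(1));
    if (!in) {
        return args; // assumption: leave the arguments as-is if unreadable
    }
    std::vector<std::string> out;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back(); // tolerate CRLF line endings
        }
        if (!line.empty()) {
            out.push_back(line); // one parameter per line, no shell quoting
        }
    }
    return out;
}
```

Because each line becomes one parameter verbatim, values containing spaces or quotes need no escaping, which matches the commit's remark about simplifying character-escaping.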

1364 Commits