whisper.cpp

Mirror of https://github.com/ggerganov/whisper.cpp.git, synced 2024-12-22 22:12:21 +00:00

Author	SHA1	Message	Date
Neo Zhang	9ad202bee9	add device version in device list (llama/6959) Co-authored-by: arthw <>	2024-05-13 11:02:26 +03:00
agray3	f0d3fb4a7e	Reset schedule earlier to allow overlap with ggml graph computation on device (llama/6933) * Reset schedule earlier to allow overlap with graph computation on device	2024-05-13 11:02:26 +03:00
slaren	9d4c8b8aa5	add basic tensor data validation function (llama/6884) * add basic tensor data validation function * add --check-tensors command line argument. Tensor validation is disabled by default and can be enabled by adding `--check-tensors` to the command line arguments. quantize always validates tensors.	2024-05-13 11:02:26 +03:00
slaren	ecfac1e240	gguf : fix mismatch between alloc and free functions (llama/6929)	2024-05-13 11:02:26 +03:00
Georgi Gerganov	6f7140f568	Merge pull request from GHSA-p5mv-gjc5-mwqv * always use calloc; clamp n_kv on failure to read a kv * ggml : alternative ctx->header.n_kv update --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-05-13 11:02:26 +03:00
Georgi Gerganov	05b17112cf	ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (llama/6906)	2024-05-13 11:02:26 +03:00
Georgi Gerganov	a15fb5cd79	ggml : fix MIN / MAX macros (llama/6904) ggml-ci	2024-05-13 11:02:26 +03:00
Georgi Gerganov	63fd148d8f	ggml : move 32-bit arm compat in ggml-impl.h (llama/6865) ggml-ci	2024-05-13 11:02:26 +03:00
Justine Tunney	6c3971b29b	llamafile : improve sgemm.cpp (llama/6796) * llamafile : improve sgemm.cpp - Re-enable by default - Fix issue described in #6716 - Make code more abstract, elegant, and maintainable - Faster handling of weirdly shaped `m` and `n` edge cases * Address review comments * Help clang produce fma instructions * Address review comments	2024-05-13 11:02:26 +03:00
Dave Airlie	a6d264f331	ggml : fix calloc argument ordering. (llama/6820) Latest gcc complains here: /home/airlied/devel/llama.cpp/ggml-alloc.c: In function ‘ggml_gallocr_new_n’: /home/airlied/devel/llama.cpp/ggml-alloc.c:374:59: warning: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Wcalloc-transposed-args] 374 | ggml_gallocr_t galloc = (ggml_gallocr_t)calloc(sizeof(struct ggml_gallocr), 1); | ^~~~~~ /home/airlied/devel/llama.cpp/ggml-alloc.c:374:59: note: earlier argument should specify number of elements, later size of each element and a bunch more. calloc is specified to take nmemb first then size, so realign the code. In a couple of places there was a * x, 1 so I fixed those to use calloc properly.	2024-05-13 11:02:26 +03:00
Georgi Gerganov	2959686019	ggml : fix ggml_backend_cpu_supports_op() for CPY (llama/0)	2024-05-13 11:02:26 +03:00
slaren	c96b0a938e	ggml : group all experts in a single ggml_mul_mat_id (llama/6505) * ggml : group all experts in a single ggml_mul_mat_id cuda : improve mmid row copy * cuda : fix bin bcast with non-cont src0 * test-backend-ops : only run all mul mat tests for base types * llama : disable moe offloading with SYCL --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-13 11:02:26 +03:00
Georgi Gerganov	c97796aa0f	ggml : fix llamafile sgemm wdata offsets (llama/6710) ggml-ci	2024-05-13 11:02:26 +03:00
Justine Tunney	7a4f7d825e	ggml : add llamafile sgemm (llama/6414) This change upstreams llamafile's cpu matrix multiplication kernels which improve image and prompt evaluation speed. For starters, Q4_0 and Q8_0 weights should go ~40% faster on CPU. The biggest benefits are with data types like f16 / f32, which process prompts 2x faster thus making them faster than quantized data types for prompt evals. This change also introduces bona fide AVX512 support since tinyBLAS is able to exploit the larger register file. For example, on my CPU llama.cpp llava-cli processes an image prompt at 305 tokens/second, using the Q4_K and Q4_0 types, which has always been faster than if we used f16 LLaVA weights, which at HEAD go 188 tokens/second. With this change, f16 LLaVA performance leapfrogs to 464 tokens/second. On Intel Core i9-14900K this change improves F16 prompt perf by 5x. For example, using llama.cpp at HEAD with Mistral 7b f16 to process a 215 token prompt will go 13 tok/sec. This change has fixes making it go 52 tok/sec. It's mostly thanks to my vectorized outer product kernels but also because I added support for correctly counting the number of cores on Alderlake, so the default thread count discounts Intel's new efficiency cores. Only Linux right now can count cores. This work was sponsored by Mozilla, who has given permission to change the license of this code from Apache 2.0 to MIT. To read more about what's improved, and how it works, see: https://justine.lol/matmul/	2024-05-13 11:02:26 +03:00
Shijie	fdb2c87350	llama : add qwen2moe (llama/6074) * support qwen2moe * fix-review * metal : support unary ops for nelements % 4 != 0 * metal : require contiguousness for float4 unary kernels * metal : require contiguousness for float4 unary kernels (cont) * fix-review * names : for brevity "SHARED_EXP" -> "SHEXP" * llama : reuse build_moe_ffn() * llama : add model type name --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-13 11:02:26 +03:00
Neo Zhang Jianyu	98c0b77e0c	fix mul_mat_id() for new input, make the ut pass (llama/6682)	2024-05-13 11:02:26 +03:00
Dave	9d6d50d933	Added support for GGML_OP_CLAMP in Metal (llama/6662) * Added support for GGML_OP_CLAMP in Metal * Corrected size --------- Co-authored-by: dave-fl <dave@Davids-MacBook-Pro.local>	2024-05-13 11:02:26 +03:00
Neo Zhang Jianyu	c1320c1f0c	fix memcpy() crash, add missed cmd in guide, fix softmax (llama/6622) * disable mmap to fix memcpy crash, add missed cmd in guide, fix softmax * refactor to disable mmap for SYCL backend * fix compile error in other os * refactor the solution, use host buf to fix it, instead of disable mmap * keep to support mmap() * use host buff to reduce malloc times * revert to malloc/free solution, for thread safety	2024-05-13 11:02:26 +03:00
Johannes Gäßler	66aaf03a7a	CUDA: fix matrix multiplication logic for tests (llama/6667)	2024-05-13 11:02:26 +03:00
slaren	00a0947c65	metal : unify mul_mv_id kernels (llama/6556)	2024-05-13 11:02:26 +03:00
jiez	60f3713026	llama : add gguf_remove_key + remove split meta during quantize (llama/6591) * Remove split metadata when quantize model shards * Find metadata key by enum * Correct loop range for gguf_remove_key and code format * Free kv memory --------- Co-authored-by: z5269887 <z5269887@unsw.edu.au>	2024-05-13 11:02:26 +03:00
Justina Cho	37e6757453	feat: implemented sigmoid function (ggml/806) * added sigmoid function * implemented metal kernel for sigmoid * implemented cuda kernel for sigmoid * added sigmoid unary op and incremented count	2024-05-13 11:02:26 +03:00
Borislav Stanimirov	8dcefdf4a9	build: fix and ignore msvc warnings (ggml/805)	2024-05-13 11:02:26 +03:00
Przemysław Pawełczyk	73d13ad19a	ggml : expose SSE3 and SSSE3 for MSVC when AVX is available (#2128)	2024-05-08 18:33:43 +03:00
Przemysław Pawełczyk	b6680fab50	build : improve disabling AVX-512 (#2129) * cmake : make WHISPER_NO_AVX512=ON disable all subsets of AVX-512. Previously it happened only for MSVC, but it makes sense to have the same behavior for other compilers too. * make : reorder x86 ISA extensions in chronological order, and update compiler flags at the end to ease modifying conditions. * make : support WHISPER_NO_AVX512=1 for disabling all AVX-512 subsets. That way you do not have to override each AVX-512 subset setting individually if it has been turned on during autodetection.	2024-05-08 18:32:43 +03:00
Borislav Stanimirov	f760756078	minor: add CMakeSettings.json to gitignore (#2094)	2024-05-08 11:03:21 +03:00
Pedro Probst	58210d6a76	examples : fix node compilation (#2115) * node : fix compilation and update examples * node : fix readme * Update addon.node test	2024-05-02 22:52:55 +01:00
Przemysław Pawełczyk	8fac6455ff	make : change GNU make default CXX from g++ to c++ (#2100)	2024-04-28 22:54:21 +01:00
goldwaving	22b6598cc9	Remove unnecessary memory reallocation in fft (#2080) fft_out needs to be twice the frame_size, not the frame_step. It is resized in fft() anyway, but this change prevents an unnecessary reallocation. n_fft must match the mel filter size, so it is best not to calculate it from the frame size. We only need to get the magnitudes for half the spectrum since the other half is a mirror and not used in the mel filter loop later.	2024-04-28 18:36:12 +01:00
Georgi Gerganov	858452d58d	models : disable old script (#2079)	2024-04-24 14:56:30 +03:00
Georgi Gerganov	7f85e1d7fd	whisper : more prominent log message for sub-1s audio (#2065)	2024-04-24 14:46:06 +03:00
Georgi Gerganov	b0c3cbf2e8	main : pass nullptr when regex is empty (#2070)	2024-04-17 12:23:47 +03:00
AIWintermuteAI	a750868428	readme : add up-to-date repository for Python bindings (#2063) README	2024-04-16 14:15:52 +03:00
Georgi Gerganov	7395c70a74	release : v1.5.5	2024-04-16 14:08:31 +03:00
Emmanuel Schmidbauer	9fab28135c	server : add dtw (#2044) * server.cpp: add dtw * Update examples/server/server.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-15 22:16:58 +03:00
Didzis Gosko	08d3eef97d	build : fix embedded Metal library generation (#2045)	2024-04-15 20:23:05 +03:00
Pedro Probst	1b5439a6c2	node : support no timestamps (#2048) * fix: node: do not compute timestamps if you do not need them * feat: add no_timestamps parameter to node addon	2024-04-15 20:03:34 +03:00
Didzis Gosko	c7f95b7ca2	build : detect AVX512 in Makefile, add AVX512 option in CMake (#2043) * make : add AVX512 detection to Makefile and CMakeLists.txt * make : autodetect more AVX512 instruction subsets * cmake : do not default to AVX512, must be enabled explicitly * cmake : enable a set of AVX512 subsets, when AVX512 is turned on * make : consolidate AVX512 subsets, add AVX512 VBMI * cmake : revert to NO AVX512 setting, add settings for AVX512 VNNI and VBMI * make : re-introduce AVX512VNNI back * cmake : remove superfluous comment line	2024-04-15 20:02:09 +03:00
Kendrick Taylor	5c554c04ff	whisper.nvim : fix missing reference to "model" variable (#2049)	2024-04-15 19:41:28 +03:00
Ikko Eltociear Ashimine	c383f091a1	whisper : update grammar-parser.cpp (#2058) preceeding -> preceding	2024-04-15 19:40:27 +03:00
Georgi Gerganov	8f253ef3af	sync : ggml	2024-04-09 20:27:55 +03:00
Georgi Gerganov	c7dc37f97c	license : update copyright notice + add AUTHORS	2024-04-09 20:27:44 +03:00
Carolinabanana	526332873b	llama : add Command R Plus support (llama/6491) * Add Command R Plus GGUF * Add Command R Plus GGUF * Loading works up to LayerNorm2D * Export new tensors in 1D so they are not quantized. * Fix embedding layer based on Noeda's example * Whitespace * Add line * Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda) * dranger003: Fix block index overflow in CUDA dequantizing. * Reverted blocked multiplication code as it still has issues and could affect other Llama arches * export norms as f32 * fix overflow issues during quant and other cleanup * Type convention Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * dranger003: Fix more int overflow during quant. --------- Co-authored-by: S <seast@Ss-Mac-Studio.local> Co-authored-by: S <s@example.com> Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 20:26:18 +03:00
Abhilash Majumder	1d2721ca72	remove row=1 cond (llama/6532)	2024-04-09 20:26:18 +03:00
Neo Zhang Jianyu	219e601dab	support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (llama/6521)	2024-04-09 20:26:18 +03:00
Georgi Gerganov	3b8aade3c2	scripts : update sync	2024-04-09 20:25:50 +03:00
Georgi Gerganov	52ccd4a3a8	files : rename ./extra to ./scripts	2024-04-09 20:13:41 +03:00
Brad Murray	5275074d37	whisper : fix DTW memory access (#2012) * Fix DTW memory access * Memory fix - Apply changes from denersc	2024-04-09 18:38:19 +03:00
ulatekh	c15b4cda7d	common : fix file-handle leak in read_wav() (#2026) Now it cleans up in case of error.	2024-04-09 18:34:34 +03:00
Rotem Dan	d3cfb6ca2b	main : set stdin to binary mode on Windows (#2025)	2024-04-09 18:33:32 +03:00

1484 Commits