whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-19 12:47:52 +00:00

Author	SHA1	Message	Date
Kawrakow	97b12212dd	ggml : SOTA 2-bit quants (add IQ2_XS) (llama/4856) * iq2_xs: basics * iq2_xs: this should have been in the basics * iq2_xs: CUDA and scalar CPU works * iq2_xs: WIP Metal * iq2_xs: Metal now works * iq2_xs: working, but dog slow, ARM_NEON dot product * iq2_xs: better ARM_NEON dot product We are now at 19.5 t/s for TG-128 and 61 t/s for PP-512 when running on the CPU. * iq2_xs: AVX2 dot product - 19.5 t/s * iq2_xs: faster AVX2 dot product 21.4 t/s for TG-128, 59.2 t/s for PP-512. The latter is 2x compared to the previous version. * iq2_xs: had forgotten to delete iq2-data.h * Add llama enum for IQ2_XS --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-11 21:50:01 +02:00
Timothy Cronin	73072a7c73	ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693)	2024-01-11 21:50:00 +02:00
Halalaluyafail3	338442d773	Fix execlp call (ggml/689) NULL can be an integer constant expression with the value zero, in this case the behavior would be undefined because of an incorrect type being passed to the variable arguments.	2024-01-11 21:50:00 +02:00
Kawrakow	10651bddf6	SOTA 2-bit quants (llama/4773) * iq2_xxs: basics * iq2_xxs: scalar and AVX2 dot products Needed to change Q8_K to have quants in the -127...127 range, else the IQ2_XXS AVX implementation becomes very awkward. The alternative would have been to use Q8_0 instead. Perhaps I'll change later, for now this is what we have. * iq2_xxs: ARM_NEON dot product Somehow strangely slow (112 ms/token). * iq2_xxs: WIP Metal Dequantize works, something is still wrong with the dot product. * iq2_xxs: Metal dot product now works We have PP-512 = 475 t/s TG-128 = 47.3 t/s Not the greatest performance, but not complete garbage either. * iq2_xxs: slightly faster dot product TG-128 is now 48.4 t/s * iq2_xxs: slightly faster dot product TG-128 is now 50.9 t/s * iq2_xxs: even faster Metal dot product TG-128 is now 54.1 t/s. Strangely enough, putting the signs lookup table into shared memory has a bigger impact than the grid values being in shared memory. * iq2_xxs: dequantize CUDA kernel - fix conflict with master * iq2_xxs: quantized CUDA dot product (MMVQ) We get TG-128 = 153.1 t/s * iq2_xxs: slightly faster CUDA dot product TG-128 is now at 155.1 t/s. * iq2_xxs: add to llama ftype enum * iq2_xxs: fix MoE on Metal * Fix missing MMQ ops when on hipBLAS I had put the ggml_supports_mmq call at the wrong place. * Fix bug in qequantize_row_iq2_xxs The 0.25f factor was missing. Great detective work by @ggerganov! * Fixing tests * PR suggestion --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-11 21:50:00 +02:00
Georgi Gerganov	c46a74a19d	ggml : do not sched_yield when calling BLAS (llama/4761) * ggml : do not sched_yield when calling BLAS ggml-ci * ggml : fix do_yield logic ggml-ci * ggml : simplify do_yield logic ggml-ci	2024-01-11 21:50:00 +02:00
automaticcat	dbe29d4e33	ggml : add ggml_cpu_has_avx_vnni() (llama/4589) * feat: add avx_vnni based on intel documents * ggml: add avx vnni based on intel document * llama: add avx vnni information display * docs: add more details about using oneMKL and oneAPI for intel processors * docs: add more details about using oneMKL and oneAPI for intel processors * docs: add more details about using oneMKL and oneAPI for intel processors * docs: add more details about using oneMKL and oneAPI for intel processors * docs: add more details about using oneMKL and oneAPI for intel processors * Update ggml.c Fix indentation update Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-03 14:43:51 +02:00
Guillaume Wenzek	cf6f1e4181	ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639) * add more int ops * ggml_compute_forward_dup_bytes * add tests * PR comments * tests : minor indentations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-03 14:43:51 +02:00
Georgi Gerganov	e77b27c331	sync : ggml (VMM, sync-ggml-am, dotprod ARM fixes, CUDA fixes) (#1691 ) * scripts : add sync-ggml-am.sh * sync : ggml (VMM, ARM dot prod fix, etc.) * build : fix CUDA build * ggml : fix some mul mat cases + add tests for src1 F16 `dbd02958fa`	2023-12-29 11:30:47 +02:00
Georgi Gerganov	3a5302108d	sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677 ) * sync : ggml * sync : llama.cpp * talk-llama : fix obsolete param * ggml-alloc : fix ggml_tallocr_is_own * talk.wasm : update to new ggml * ggml : fix type punning in ggml_scale * ggml : cuda jetson + arm quants warnings	2023-12-22 17:53:39 +02:00
Georgi Gerganov	8171e621fc	sync : ggml (Metal fixes, new ops, tests) (#1633 ) * sync : ggml (Metal fixes, new ops, tests) * cuda : fix bin bcast when src1 and dst have different types	2023-12-13 21:55:03 +02:00
Georgi Gerganov	afce6fa113	sync : ggml (new ops, new backend, etc) (#1602 ) * sync : ggml (new ops, new backend, etc) * whisper : remove obsolete broadcasting code * ggml : remove backend self-registers + fix ggml_concat + n_task logic * metal : fix assert * metal : print resource path * whisper : fix bug if metal init fails	2023-12-07 22:27:19 +02:00
Georgi Gerganov	e369243ebd	ggml : re-enable blas for src0 != F32 (#1583 )	2023-12-01 23:57:52 +02:00
Georgi Gerganov	d4353e48f7	sync : ggml (ggml-alloc + linker + gguf fixes) (#1501 )	2023-11-17 10:00:07 +02:00
Georgi Gerganov	b0502836b8	whisper : add full CUDA and Metal offloading (#1472 ) * whisper : migrate to ggml-backend * whisper : fix logit reading * whisper : fix tensor allocation during load * whisper : fix beam-search with CUDA * whisper : free backends + fix compile warning * whisper : print when CUDA is enabled * whisper : fix CoreML * make : clean-up * talk : fix compile warning * whisper : support ggml_conv with CUDA and Metal (#1473) * ggml : add CUDA support for ggml_conv * whisper : remove ggml_repeat for conv bias + single backend * cuda : fix im2col kernel * metal : add im2col support + mul mat-vec f16 x f16 * bench-all : add q4 models * whisper : clean-up * quantize-all : fix * ggml : im2col opts * whisper : avoid whisper_model_data wrapper * whisper : add note that ggml_mul_mat_pad does not work with CUDA * whisper : factor out graph compute in common function * whisper : fixes * whisper : fix UB with measure buffers * whisper : try to fix the parallel whisper_state functionality (#1479) * whisper : try to fix the parallel whisper_state functionality * whisper : fix multi-state Metal * whisper : free backend instances in whisper_state	2023-11-12 15:31:08 +02:00
Georgi Gerganov	0cbef75422	ggml : fix MIN / MAX macro re-definition	2023-11-07 16:08:46 +02:00
Georgi Gerganov	f96e1c5b78	sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) (#1422 ) * sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) * metal : allow env metal variable to override resource path (#1415) * Allow env variable to override resource path * Update ggml-metal.m --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * sync : restore common / main from `master` * sync : restore whisper from `master` * talk-llama : update to latest llama.cpp * ruby : fix build * ggml : fix 32-bit ARM build * ggml : fix MIN / MAX macro collisions + update ios bindings * ggml : fix ifdefs and MIN / MAX again * examples : fix Obj-C and Swift examples * ggml : fix 32-bit ARM compatibility * ggml : one more attempt to fix 32-bit ARM compat * whisper : fix support for larger graphs --------- Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>	2023-11-03 21:35:05 +02:00
Georgi Gerganov	80c1512fd5	sync : ggml (const correctness)	2023-09-15 14:49:56 +03:00
Georgi Gerganov	b8432f28f4	metal : add F32 support + update bench output	2023-09-15 13:56:08 +03:00
Georgi Gerganov	93935980f8	whisper : Metal and ggml-alloc support (#1270 ) * metal : init * whisper : factor out graph builds * whisper : allocate encoder and decoder using ggml-alloc * whisper : ggml-alloc is now supported * whisper : CoreML support ggml-alloc * build : fix ggml-alloc * ios : update submodule * extra : update sync-ggml.sh script to also sync ggml-alloc * ci : see if this is causing the crash * whisper : refactor ggml-alloc init * whisper.android : try to fix build * whisper : initial Metal version * ci : try to debug vmem issue * metal : decoder works on GPU! * metal : add multi-decoder support * ggml : fix ggml_nbytes (probably temp solution) * metal : run "cross" step on the GPU * whisper : remove ggml_repeat in the encoder * whisper : offload the Encoder to Metal * ggml : use simpler ggml_bytes() implementation * ggml-alloc : try to make CI happy by reducing vram to 128GB * whisper : add whisper_allocr to wrap ggml_allocr * whisper : factor out alloc init in a function * cmake : update to support Metal build * whisper : add <functional> header * objc : fix build (no Metal yet) * ios : add Metal support * swiftui : fix build * metal : speed-up KQ multiplication * metal : sync latest llama.cpp kernels * readme : add Metal info * ios : update submodule * coreml : add code to toggle Core ML config (CPU, ANE, GPU) * bench : fix timings by running a pre-heat * bench : start benching the decoder * whisper : add ggml_mul_mat_pad * bench : fix uninitialized vars * whisper : add comment for disabling mul-mat padding * whisper : add description of ggml_mul_mat_pad * whisper : clean-up ggml_mul_mat_pad * metal : remove the "concurrent" flag * bench : variable n_past * ios : update SPM package	2023-09-15 12:18:18 +03:00
Georgi Gerganov	3fec2119e6	whisper : fix bench regression + fix performance when using CPU BLAS (#1275 ) * whisper : fix bench regression * ggml : use sched_yield when using BLAS + add comment	2023-09-12 13:54:04 +03:00
Georgi Gerganov	b39809668a	sync : ggml (HBM + Metal + style) (#1264 )	2023-09-08 17:58:31 +03:00
Przemysław Pawełczyk	b55b505690	build : do not use _GNU_SOURCE gratuitously (#1129 ) * Do not use _GNU_SOURCE gratuitously. What is needed to build whisper.cpp and examples is availability of stuff defined in The Open Group Base Specifications Issue 6 (https://pubs.opengroup.org/onlinepubs/009695399/) known also as Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions, plus some stuff from BSD that is not specified in POSIX.1. Well, that was true until NUMA support was added recently in ggml, so enable GNU libc extensions for Linux builds to cover that. There is no need to penalize musl libc which simply follows standards. Not having feature test macros in source code gives greater flexibility to those wanting to reuse it in 3rd party app, as they can build it with minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs. It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2. * examples : include SDL headers before other headers Avoid macOS build error when _DARWIN_C_SOURCE is not defined, brought by SDL2 relying on Darwin extension memset_pattern4/8/16 (from string.h). * make : enable BSD extensions for DragonFlyBSD to expose RLIMIT_MEMLOCK * make : use BSD-specific FTMs to enable alloca on BSDs * make : fix OpenBSD build by exposing newer POSIX definitions * cmake : follow recent FTM improvements from Makefile	2023-09-07 12:36:14 +03:00
Przemysław Pawełczyk	ace6c12ec6	ggml : posixify pagesize (#1251 ) * ggml : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml.c * metal : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml-metal.m	2023-09-06 18:19:36 +03:00
Georgi Gerganov	c3f319d7c2	ggml : sync latest llama.cpp (view_src + alloc improvements) (#1247 ) * ggml : sync latest llama.cpp (view_src + alloc improvements) * ggml : fix build	2023-09-05 20:57:27 +03:00
Georgi Gerganov	59a3d0cb57	ggml : sync (ggml-alloc, GPU, eps, etc.) (#1220 ) * ggml : sync (ggml-alloc, GPU, eps, etc.) * ggml : fix build * wasm : fix build	2023-09-05 13:54:40 +03:00
ChangSeok Oh	8e30bf3c02	ggml : fix compilation errors incurred by -Werror (#1227 ) The -Werror warning option turns all warnings into errors. This PR makes the compiler happy to build ggml.c and whisper.cpp with the stricter option.	2023-08-30 22:09:15 +03:00
Przemysław Pawełczyk	25466aa1c3	ggml : fix compiling when SSE3 is available but not SSSE3 (#1210 ) It got broken in commit `3998465721`.	2023-08-27 21:37:31 +03:00
Przemysław Pawełczyk	601c2d2181	ggml : detect SSSE3 (#1211 ) * ggml : add ggml_cpu_has_ssse3 * whisper : show SSSE3 in system info * make : detect SSSE3 via cpuinfo	2023-08-27 21:36:41 +03:00
alonfaraj	3998465721	ci : more platforms coverage (#1101 ) * add multi platform * add image name * fix * fix /bin/sh path * add missing \ * add all platforms for check * remove platforms * remove s390x * - add arm v6 - format run cmd * remove arm v6 * - bump checkout to v3 - use setup emsdk action - add arch to all ubuntu jobs * mymindstorm/setup-emsdk to v12 * add missing QEMU step * add fail-fast: false for debug * add freebsd * remark all jobs except freebsd for test * add sudo * enable all tests again * format * check __AVX__ support before include immintrin.h * try auto detect flag by cmake * fix check for immintrin.h * fix include check for immintrin.h * Remove all platforms for sanitizer build except amd64 We have no clue why they failed. --------- Co-authored-by: Alon Faraj <alon.faraj@mapcore.com>	2023-07-16 23:00:34 +03:00
Georgi Gerganov	8ba42095c5	Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027 )" This reverts commit `3f7a03ebe3`.	2023-07-02 21:53:52 +03:00
Georgi Gerganov	d6509bf78d	ggml : sync latest repo (mostly refactoring changes)	2023-07-02 21:46:09 +03:00
Przemysław Pawełczyk	3f7a03ebe3	ggml : do not use _GNU_SOURCE gratuitously (#1027 ) * Do not use _GNU_SOURCE gratuitously. What is needed to build whisper.cpp and examples is availability of stuff defined in The Open Group Base Specifications Issue 6 (https://pubs.opengroup.org/onlinepubs/009695399/) known also as Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions. There is no need to penalize musl libc which simply follows standards. Not having feature test macros in source code gives greater flexibility to those wanting to reuse it in 3rd party app, as they can build it with minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs. It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2. * examples : include SDL headers before other headers This is an attempt at fixing macOS build error coming from SDL2 relying on Darwin extension memset_pattern4/8/16 coming from Apple's string.h.	2023-06-25 16:34:30 +03:00
Georgi Gerganov	5feb0dffba	ggml : sync latest ggml lib	2023-06-25 14:30:44 +03:00
Georgi Gerganov	429b9785c0	ggml : update WASM SIMD	2023-05-20 20:00:06 +03:00
Georgi Gerganov	e410cfc3ce	ggml : sync latest ggml repo - new Q4 and Q8 quantization - updated CUDA	2023-05-20 18:56:30 +03:00
Georgi Gerganov	aaf0d41c7c	ggml : add AVX dot products	2023-05-14 18:56:46 +03:00
Georgi Gerganov	e693074aa6	ggml : sync latest ggml - New Q4 and Q5 formats - Various improvements	2023-05-14 18:04:23 +03:00
Georgi Gerganov	5974c8facd	ggml : fix 32-bit ARM build + quantization	2023-05-02 21:52:26 +03:00
Georgi Gerganov	0bcb64b184	ggml : sync ggml (clBLAST + tensor names)	2023-05-02 21:24:18 +03:00
Georgi Gerganov	feac80dd3f	ggml : fix UB (int << 31)	2023-04-30 22:27:30 +03:00
Georgi Gerganov	794b162a46	whisper : add integer quantization support (#540 ) * whisper : add integer quantization support * examples : add common-ggml + prepare to add "quantize" tool * whisper : quantization tool ready * whisper : fix F32 support * whisper : try to fix shared lib linkage * wasm : update quantized models to Q5 * bench.wasm : remove "medium" button * bench.wasm : fix custom model button * ggml : add Q5_0 and Q5_1 WASM SIMD * wasm : add quantized models to all WASM examples * wasm : bump DB version number to 2 * talk-llama : update example to latest llama.cpp * node : increase test timeout to 10s * readme : add information for model quantization * wasm : add links to other examples	2023-04-30 18:51:57 +03:00
Georgi Gerganov	0ccd6746c9	ggml : fix WASM build	2023-04-29 21:37:23 +03:00
Georgi Gerganov	d9b550c0a1	ggml : fix 32-bit ARM NEON (#836 ) * ggml : add support for 32-bit ARM * ggml : fix * ggml : fix	2023-04-29 21:33:33 +03:00
Georgi Gerganov	e9b091c92a	ggml : use vzip instead of vuzp for consistency	2023-04-29 21:14:09 +03:00
Georgi Gerganov	1f30b99208	ggml : fix WASM build	2023-04-29 20:21:25 +03:00
Georgi Gerganov	05c3ea3bc8	ggml : sync with ggml repo (warning fixes + asserts)	2023-04-29 19:33:28 +03:00
Georgi Gerganov	acec73ab6e	ggml : sync latest ggml + llama.cpp updates (quantization)	2023-04-29 12:32:28 +03:00
Jhen-Jie Hong	ea1f8a50d4	ggml, ci : fix build on whisper.android (ARM_NEON) + add CI (#764 ) * ggml : fix undefined symbol by remove inline handle * ggml : make own ggml_aligned_malloc function * ci: add ios/android build	2023-04-15 14:21:58 +03:00
Georgi Gerganov	677ad754a0	ggml : sync latest ggml	2023-04-14 19:20:39 +03:00
novag	463e46338c	ggml : fix q4_1 dot product types (#759 ) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-04-14 13:34:20 +03:00
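
The `execlp` fix in commit `338442d773` above rests on a detail of C variadic calls: the argument list must end with a null pointer of type `char *`, and a bare `NULL` may expand to the integer constant `0`, which is passed with the wrong type. The sketch below illustrates the corrected pattern on a POSIX system. It is not the code from ggml; `run_program` is a hypothetical helper introduced only for this example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of ggml): run `prog` with no extra
 * arguments and return its exit status, or -1 if it could not be
 * started or did not exit normally. */
int run_program(const char *prog) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1; /* fork failed */
    }
    if (pid == 0) {
        /* Child: the sentinel must be a null *pointer*. A bare NULL may
         * be the integer constant 0, which has the wrong type for a
         * variadic argument, so the cast makes the type explicit. */
        execlp(prog, prog, (char *) NULL);
        _exit(127); /* reached only if execlp itself failed */
    }
    /* Parent: wait for the child and report its exit status. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_program("true")` should return 0 on a typical POSIX system, and a program name that cannot be found yields 127 from the `_exit` fallback in the child.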


125 Commits