whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-21 16:09:55 +00:00

Author	SHA1	Message	Date
Abhilash Majumder	a0ddd8392c	whisper : add SYCL support (#1863) * add changes from llama upstream * add sycl abstraction * add sycl build * update cmake * add sycl build config * fix bug * fix bug * refactor build * fix bug * update build * call build * use sycl header * add examples * add target * fix typecast in quant.c * readd fp16 and readme * fix quant typecast * add sample * add readme * remove cxx file check	2024-02-23 09:22:24 +02:00
Georgi Gerganov	a2506909b1	talk-llama : sync llama.cpp	2024-02-22 23:30:53 +02:00
Georgi Gerganov	7b1ff212d9	sync : ggml	2024-02-22 23:25:38 +02:00
Georgi Gerganov	e5d06cfc0f	ggml : always define ggml_fp16_t as uint16_t (llama/5666) * ggml : always define ggml_fp16_t as uint16_t ggml-ci * ggml : cont ggml-ci * ggml : cont * ggml : cont ggml-ci * ggml : cont ggml-ci * cuda : no longer ggml headers last ggml-ci * ggml : fix q6_K FP16 -> FP32 conversion ggml-ci * ggml : more FP16 -> FP32 conversion fixes ggml-ci	2024-02-22 23:25:33 +02:00
Georgi Gerganov	31891db2e3	ci : fix whitespace	2024-02-22 20:20:34 +02:00
Georgi Gerganov	5fdb27ff80	ggml : 32-bit arm compat (#1891) * ggml : 32-bit arm compat * ggml : add ggml_vqtbl1q_s8 impl * ggml : cont	2024-02-22 18:31:40 +02:00
Georgi Gerganov	6b16927d18	sync : ggml	2024-02-22 15:15:38 +02:00
Georgi Gerganov	ce411498f6	sync : llama.cpp (ggml/0) ggml-ci	2024-02-22 15:12:36 +02:00
Meng, Hengyu	208de95ac7	conext add name (llama/5624) * [SYCL] conext add name * name should start with SYCL*	2024-02-22 15:12:36 +02:00
AidanBeltonS	c2ce39c795	Update ggml_sycl_op_mul_mat_vec_q (llama/5502) * Update ggml_sycl_op_mul_mat_vec_q * Apply suggestions from code review Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> * revert suggestion on macro * fix bug * Add quant type GGML_TYPE_IQ1_S to unsupported * fix format --------- Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-02-22 15:12:36 +02:00
0cc4m	8daa534818	Refactor validation and enumeration platform checks into functions to clean up ggml_vk_instance_init()	2024-02-22 15:12:36 +02:00
0cc4m	9fca69b410	Add check for VK_KHR_portability_enumeration for MoltenVK support	2024-02-22 15:12:36 +02:00
Mathijs de Bruin	b26c645420	Add preprocessor checks for Apple devices. Based on work by @rbourgeat in https://github.com/ggerganov/llama.cpp/pull/5322/files	2024-02-22 15:12:36 +02:00
Mathijs de Bruin	1879ec556e	Resolve ErrorIncompatibleDriver with Vulkan on MacOS. Refs: - https://chat.openai.com/share/7020ce72-65fc-45ec-b7be-9d9d798a5f3f - https://github.com/SaschaWillems/Vulkan/issues/954 - https://github.com/haasn/libplacebo/issues/128 - https://github.com/KhronosGroup/Vulkan-Samples/issues/476	2024-02-22 15:12:35 +02:00
Mathijs de Bruin	c6e53cfc46	Allow for Vulkan build with Accelerate. Closes #5304	2024-02-22 15:12:35 +02:00
slaren	b19f2fb815	cuda : ignore peer access already enabled errors (llama/5597) * cuda : ignore peer access already enabled errors * fix hip	2024-02-22 15:12:35 +02:00
Siddharth Ramakrishnan	a6b0950916	ggml : compute forward no longer pass src tensors (ggml/729) * refactored compute forward to not pass in the src tensors each time * fix merge issues with flags * missed one place in the last commit to fix the is_param / flags issue * minor spacing fix * fixed some variable assignments so all tests locally are passing * new change after merge fix --------- Co-authored-by: siddharthvader <siddharth@coinlist.co>	2024-02-22 15:12:35 +02:00
bssrdf	d352dbd163	ggml : fix conv_2d batch mode (ggml/737) Co-authored-by: bssrdf <bssrdf@gmail.com>	2024-02-22 15:12:32 +02:00
st-gr	eb23f4ef16	openvino : fix convert-whisper-to-openvino.py (#1890) Fix issue: Conversion from Whisper to OpenVino failed #1870 convert-whisper-to-openvino.py stopped working with OpenVINO version 2023.0.0-10926-b4452d56304-releases/2023/0. Error was: TypeError: load(): incompatible function arguments. The following argument types are supported: 1. (self: openvino._pyopenvino.FrontEnd, path: object) -> ov::frontend::InputModel Tested successfully with a large-v3 conversion. Co-authored-by: Stefan Grundmann <grundmanns@sandiego.gov>	2024-02-22 15:11:35 +02:00
Davidson Francis	c56344b509	main : fix file existence check in main.cpp (#1889) In commit `dda4b0e` of PR #1872, I've introduced a check for the existence of files before loading the model. However, I haven't considered the case where whisper.cpp might read from stdin as well, and in such cases, the checks should ignore the "-" argument as it does not represent a regular file. Additionally, this commit removes the usage of 'stat()' in favor of the recently introduced function 'is_file_exist()' in common.cpp from PR #1871. Apologies for the bug introduced in the previous PR and any inconvenience it may have caused.	2024-02-22 15:01:08 +02:00
Georgi Gerganov	59119f4f20	talk-llama : sync llama.cpp	2024-02-20 12:09:57 +02:00
LBlue	276615d708	make : fix CUBLAS link with WSL (#1878)	2024-02-20 12:05:38 +02:00
Georgi Gerganov	b602819b6e	sync : ggml	2024-02-19 15:54:25 +02:00
Georgi Gerganov	c2c606f05b	ggml : resolve merge conflicts (ggml/0) ggml-ci	2024-02-19 15:53:25 +02:00
Georgi Gerganov	83afebe872	common : add IQ1_S (ggml/0) ggml-ci	2024-02-19 15:53:25 +02:00
Georgi Gerganov	a4d8f9d559	ci : enable -Werror for CUDA builds (llama/5579) * cmake : pass -Werror through -Xcompiler ggml-ci * make, cmake : enable CUDA errors on warnings ggml-ci	2024-02-19 15:53:24 +02:00
slaren	5ec1e0edfa	cuda, metal : fix nans in soft_max (llama/5574) * cuda : fix nans in soft_max * metal : fix nans in soft_max --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-19 15:53:24 +02:00
bmwl	30a11b1ab8	ggml : android and old glibc NUMA incompatibility bugfixes (llama/5557) * #ifdef out some code NUMA blocks for Android due to lack of support * added in some __ANDROID__ if def gates around numa code and forced GLIBC prior to 2.29 to use a syscall for getcpu instead of the wrapper * Changed gates on numa platform specific stuff to __gnu_linux__ to skip any platforms without glibc * harmonizing #if defined blocks for numa code to __gnu_linux__ since that's the only model that's being followed anyways --------- Co-authored-by: root <root@nenya.lothlorien.ca>	2024-02-19 15:53:24 +02:00
Georgi Gerganov	f04e6b87d7	ggml : restore vec dot stride arg names (llama/5453)	2024-02-19 15:53:24 +02:00
Georgi Gerganov	0c33928b55	ci : fix wikitext url + compile warnings (llama/5569) ggml-ci	2024-02-19 15:53:24 +02:00
Georgi Gerganov	0775374750	metal : fix unused warnings (llama/0)	2024-02-19 15:53:24 +02:00
Herman Semenov	7d90bb035b	ggml, common, examples, tests : fixed type arguments in printf (llama/5528)	2024-02-19 15:53:24 +02:00
Kawrakow	2c1ad21ba8	1.5 bit quantization (llama/5453) * iq1_s: WIP basics * iq1_s: CUDA is working * iq1_s: scalar CPU dot product * iq1_s: WIP AVX2 dot product - something is not right * Fix tests * Fix shadow warnings * Fix after merge with latest master * iq1_s: AVX2 finally works * iq1_s: ARM_NEON dot product. Works, but not very fast * iq1_s: better grid * iq1_s: use IQ2_XXS for attn_output At a cost of 0.04 extra bpw this gives a big improvement in PPL. * iq1_s: Metal basics Dequantize works, but not dot product * iq1_s: Metal works, but quite slow As usual, Apple Silicon does not like the code I write. * iq1_s: Tests * iq1_s: slightly faster dot product --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-02-19 15:53:23 +02:00
Georgi Gerganov	eca5ff9868	ggml : add ALiBi support for ggml_soft_max_ext (llama/5488)	2024-02-19 15:53:23 +02:00
Ananta Bastola	1b25d2fa0a	ci : add an option to fail on compile warning (llama/3952) * feat(ci): add an option to fail on compile warning * Update CMakeLists.txt * minor : fix compile warnings ggml-ci * ggml : fix unreachable code warnings ggml-ci * ci : disable fatal warnings for windows, ios and tvos * ggml : fix strncpy warning * ci : disable fatal warnings for MPI build * ci : add fatal warnings to ggml-ci ggml-ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-19 15:53:23 +02:00
Georgi Gerganov	74a6acc999	cmake : fix VULKAN and ROCm builds (llama/5525) * cmake : fix VULKAN and ROCm builds * cmake : fix (cont) * vulkan : fix compile warnings ggml-ci * cmake : fix ggml-ci * cmake : minor ggml-ci	2024-02-19 15:53:23 +02:00
bmwl	a4ed8a0821	ggml : add numa options (llama/5377) * Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h * Reverted Makefile * Fixed include * Removed sched.h from ggml.h, moved ggml_get_numa_affinity into ggml.c, removed trailing whitespace and fixed up a few inconsistent variables * removed trailing whitespace * Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h * Reverting Makefile * Fixed a number of issues with the move from BOOL to ggml_numa_strategies. Added a note about mirror mode note being implemented yet * Removing MIRROR_MODE code for this PR * Removing last bit of MIRROR_MODE code for this PR * Removing unneeded branch in server.cpp example and moving get_numa_affinity and making it static * Fixed lingering init_llama_backend() bool calls in tests and examples * Remote enum llama_numa_strategies * Revert bad merge with dynatemp flags * add missing enum ggml_numa_strategies declaration and revert sync problem with master * add missing enum ggml_numa_strategies declaration * fixed ggml_init_numa variable * Update ggml.h Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update READMEs with info about numa flags, change INTERLEAVE strategy name to DISTRIBUTE everywhere, implement the improved distribution strategy from @rankaiyx, fix a spelling mistake and un-merge some bad merges * split numa init out from llama_backend_init and created llama_numa_init. Updated all code paths and samples * Fix up some boolean vs enum comparisons * Added #ifdefs for non-Linux OS that don't have cpu_set_t datatype * Update ggml.h Align enum values Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml.c Remove whitespace Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml.c align paremeters Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/server/server.cpp remove whitespace and align brace Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/common.cpp Remove whitespace and align brace Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * unified ggml_numa_strategy enum and fixed text alignment in server.cpp example * Update ggml.c simplified return for platforms without NUMA support Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * removed redundant else from cli argument processing of --numa * whitespace --------- Co-authored-by: root <root@nenya.lothlorien.ca> Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2024-02-19 15:53:23 +02:00
slaren	9f675e021c	cuda : print message when initialization fails (llama/5512) * cuda : print message when initialization fails * use CUDA_NAME both times	2024-02-19 15:53:23 +02:00
Neuman Vong	a38efcb9fd	vulkan: Find optimal memory type but with fallback (llama/5381) * @0cc4m feedback * More feedback @0cc4m	2024-02-19 15:53:22 +02:00
AT	31591649a0	Early return for zero size calls to get_tensor. (llama/5482) * Early return for zero size calls to get_tensor. Signed-off-by: Adam Treat <treat.adam@gmail.com> * Update ggml-kompute.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml-kompute.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Add an early return to the get/set tensor when the size is null. Signed-off-by: Adam Treat <treat.adam@gmail.com> * Early return after the assertions. Signed-off-by: Adam Treat <treat.adam@gmail.com> * Since we do the early return in the generic backend now no reason to do so here as well. Signed-off-by: Adam Treat <treat.adam@gmail.com> --------- Signed-off-by: Adam Treat <treat.adam@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-19 15:53:22 +02:00
Kawrakow	4f5c46a84f	ggml-quants : fix compiler warnings (shadow variable) (llama/5472) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-02-19 15:53:22 +02:00
Abhilash Majumder	462ffc58db	ggml-sycl: Replace 3d ops with macro (llama/5458) * use macro * use macro * fix format	2024-02-19 15:53:21 +02:00
Georgi Gerganov	65faae0b6a	build : update CBLAS flags + fix unused var warning (#0)	2024-02-19 14:44:46 +02:00
Davidson Francis	dda4b0ed06	main : check if input files exist before proceeding (#1872) Until the most recent commit (`3d42463`), the main.cpp sample file does not check whether the input files exist or not. Consequently, the model is loaded first before reporting whether there was a failure or not when processing a file. In environments with HDD, this can take about 50 seconds or more, depending on the loaded model. This commit addresses this issue by checking in advance whether the input files exist or not.	2024-02-19 10:51:26 +02:00
Felix	07d04280be	examples : clean up common code (#1871) move some utility functions into common.h	2024-02-19 10:50:15 +02:00
Jumper775	917c56ded4	models : fix openvino setup info (#1874)	2024-02-19 02:19:47 +00:00
Georgi Gerganov	3d42463845	models : add update py requirements	2024-02-13 11:51:32 +02:00
Georgi Gerganov	3ffc83d90a	swift : package no longer use ggml dependency (#1861) * Revert "swift : update Package.swift to use ggml as package dependency (#1701)" This reverts commit `993acb5d41`. * spm : add ggml.h	2024-02-12 19:54:11 +02:00
Georgi Gerganov	e3c5e2cba8	whisper : fix external encoder (#1860)	2024-02-12 19:53:51 +02:00
Georgi Gerganov	b742f13e70	sync : ggml	2024-02-12 19:07:56 +02:00


2424 Commits