whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-29 16:58:51 +00:00

Author	SHA1	Message	Date
arizhih	7ae885c1ef	whisper : fix DTW assert (#2299)	2024-07-15 15:50:36 +03:00
Georgi Gerganov	d207c68822	cmake : use WHISPER_EXTRA_FLAGS (#2294)	2024-07-09 18:54:18 +03:00
Borislav Stanimirov	16d72504fe	cmake : allow external ggml	2024-07-09 11:38:15 +03:00
Georgi Gerganov	1c31f9d4a8	cmake : try to fix openvino build (#2281)	2024-07-08 15:36:51 +03:00
Georgi Gerganov	8ecb2f1f68	cmake : remove install of llama convert script [no ci] (#2266)	2024-07-08 14:53:55 +03:00
Georgi Gerganov	5226c3d45c	make : remove llama prints [no ci] (#2265)	2024-07-08 14:53:55 +03:00
Georgi Gerganov	dbf9c15e30	talk-llama : sync llama.cpp	2024-07-08 14:53:55 +03:00
Georgi Gerganov	d3f6c34976	examples : fix compile warnings [no ci] (#0)	2024-07-08 14:53:55 +03:00
Georgi Gerganov	425e2910a3	sync : ggml	2024-07-08 14:53:55 +03:00
Georgi Gerganov	49868aa851	ggml : sync sycl (skip) (#0)	2024-07-08 14:53:55 +03:00
Georgi Gerganov	ff08e30ab5	scripts : fix sync scripts	2024-07-08 14:53:55 +03:00
Daniel Bevenius	95f2a191c0	ggml : remove unnecessary UNUSED macro call (ggml/880) This commit removes an UNUSED macro call that is not needed as the variable n0 is used in the code and will not produce a warning. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-07-08 14:53:55 +03:00
Natsu	00422ec3cf	cmake : add GGML_BUILD and GGML_SHARED macro definitions (llama/8281)	2024-07-08 14:53:55 +03:00
Ouadie EL FAROUKI	c5b05321e9	Enabled more data types for oneMKL gemm_batch (llama/8236)	2024-07-08 14:53:55 +03:00
Johannes Gäßler	5dc636a65a	CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278)	2024-07-08 14:53:55 +03:00
Daniele	73703a144f	CUDA: revert part of the RDNA1 optimizations (llama/8309) The change on the launch_bounds was causing a small performance drop in perplexity of 25 t/s	2024-07-08 14:53:55 +03:00
Johannes Gäßler	e89fdceec2	CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311)	2024-07-08 14:53:55 +03:00
luoyu-intel	29a2739d27	Fix WARP_SIZE=16 bug of Intel GPU (llama/8266) * fix group_norm ut * split softmax * fix softmax * add concat support condition * revert debug code * move QK_WARP_SIZE to presets.hpp	2024-07-08 14:53:55 +03:00
Neo Zhang Jianyu	ee6d17f6b4	rm get_work_group_size() by local cache for performance (llama/8286) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-07-08 14:53:55 +03:00
Daniele	95e90823d9	Define and optimize RDNA1 (llama/8085)	2024-07-08 14:53:55 +03:00
Judd	005cc45df3	fix typo (llama/8267) Co-authored-by: Judd <foldl@boxvest.com>	2024-07-08 14:53:55 +03:00
Clint Herron	c2c60dc9ba	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (llama/8258)	2024-07-08 14:53:55 +03:00
slaren	4af3194b7c	cuda : update supports_op for matrix multiplication (llama/8245)	2024-07-08 14:53:55 +03:00
luoyu-intel	4a2ba1a065	Fix win build conflict of math library (llama/8230) * fix win build conflict of math library * fix the condition: !(win32 & SYCL) * revert warp_size=16	2024-07-08 14:53:55 +03:00
luoyu-intel	f096cc6807	Fix the sub group size of Intel (llama/8106) * use warp_size macro for all sycl kernels * fix mask of permute_sub_group_by_xor * fix rms_norm with correct warp number * fix rms_norm_f32/group_norm_f32 * move norm to norm.cpp file * fix quantize bug * fix mmvq's batch size	2024-07-08 14:53:55 +03:00
Johannes Gäßler	e4bc83ab47	CUDA: refactor and optimize IQ MMVQ (llama/8215) * CUDA: refactor and optimize IQ MMVQ * uint -> uint32_t * __dp4a -> ggml_cuda_dp4a * remove MIN_CC_DP4A checks * change default * try CI fix	2024-07-08 14:53:55 +03:00
zhentaoyu	db7e0dbe6e	Update SYCL-Rope op and Refactor (llama/8157) * align with rope.cu and move sycl-op to a single file	2024-07-08 14:53:55 +03:00
Johannes Gäßler	bf88c94da9	CUDA: fix MMQ stream-k for --split-mode row (llama/8167)	2024-07-08 14:53:55 +03:00
John Balis	3eea171cab	feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) * conv transpose 1d passing test for 1d input and kernel * working for different input and output channel counts, added test for variable stride * initial draft appears to work with stride other than 1 * working with all old and new conv1d tests * added a test for large tensors * removed use cuda hardcoding * restored test-conv-transpose.c * removed unused arugments, and fixed bug where test failure would cause subsequent tests to fail * fixed accumulator bug * added test to test-backend-ops * fixed mistake * addressed review * fixed includes * removed blank lines * style and warning fixes * return failure when test fails * fix supports_op --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-07-08 14:53:55 +03:00
Georgi Gerganov	64a56ebf13	ci : disable java build	2024-07-08 14:26:59 +03:00
Emmanuel Schmidbauer	bec9836849	server : add inference path to make OAI API compatible (#2270)	2024-07-08 14:24:58 +03:00
Georgi Gerganov	c118733a29	sync : ggml + fix sync script	2024-06-26 23:20:19 +03:00
Georgi Gerganov	bb3dd45524	make : disable CUDA graphs	2024-06-26 23:20:13 +03:00
slaren	04e7fa6f4f	ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140)	2024-06-26 23:18:11 +03:00
Georgi Gerganov	9f7f36d4c9	make : disable CUDA mel build	2024-06-26 22:25:25 +03:00
Georgi Gerganov	4a62efbb95	cmake : minor fixes	2024-06-26 21:42:39 +03:00
Georgi Gerganov	0a55a70b9b	make : fix missing -O3 same as https://github.com/ggerganov/llama.cpp/pull/8143	2024-06-26 21:21:12 +03:00
Georgi Gerganov	dc8cc2dd6f	whisper : disable CUDA mel + fix FFMPEG	2024-06-26 20:11:38 +03:00
Georgi Gerganov	3efedb9511	sync : ggml	2024-06-26 19:40:23 +03:00
Georgi Gerganov	e30c679928	whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci]	2024-06-26 19:34:09 +03:00
mky_coder	bf4cb4abad	whisper : optimize fft() function (#2242) Co-authored-by: Mike Fan <60965742+mike-fzy@users.noreply.github.com>	2024-06-18 18:10:33 +03:00
Georgi Gerganov	e293f17d34	talk-llama : sync llama.cpp	2024-06-18 09:45:37 +03:00
Georgi Gerganov	5d950c4b8d	whisper : use ggml_backend_sched (#2239) * whisper : use ggml_backend_sched (wip) * use sched in whisper_allocr * whisper : single backend in whisper_context * whisper : remove whisper_state->backends_used * whisper : remove whisper_context->backend * whisper : reset scheduler after init * whisper : fix external encoder (e.g. CoreML) * whisper : cleanup * whisper : handle null GPU buffer types + fix sycl --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-18 09:39:40 +03:00
Georgi Gerganov	820446e230	fix : remove extra files	2024-06-18 09:39:40 +03:00
Georgi Gerganov	54d5823ebe	scripts : sync ggml-blas	2024-06-18 09:39:40 +03:00
Georgi Gerganov	5181494e9f	build : update make / cmake	2024-06-18 09:39:40 +03:00
Georgi Gerganov	4a6e6e8b30	sync : ggml	2024-06-18 09:39:40 +03:00
slaren	de29b193f6	move BLAS to a separate backend (cont) (llama/6210) ggml-ci	2024-06-18 09:39:40 +03:00
0cc4m	922971041b	Vulkan Shader Refactor, Memory Debugging Option (llama/7947) * Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into vulkan-shaders directory * Improve debug log code * Add memory debug output option * Fix flake8 * Fix unnecessary high llama-3 VRAM use	2024-06-18 09:39:40 +03:00
Georgi Gerganov	63a767a134	scripts : stop sync whisper example from ggml	2024-06-18 09:39:40 +03:00

1760 Commits