whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-21 05:33:06 +00:00

Author	SHA1	Message	Date
Daniel Bevenius	84493d7f3e	cuda : suppress 'noreturn' warn in no_device_code (llama/8414) * cuda : suppress 'noreturn' warn in no_device_code This commit adds a while(true) loop to the no_device_code function in common.cuh. This is done to suppress the warning: ```console /src/ggml-cuda/template-instances/../common.cuh:346:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn] 346 \| } \| ^ ``` The motivation for this is to reduce the number of warnings when compiling with GGML_HIPBLAS=ON. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! cuda : suppress 'noreturn' warn in no_device_code Update __trap macro instead of using a while loop to suppress the warning. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-08-08 22:48:46 +03:00
Johannes Gäßler	15d71189e9	CUDA: optimize and refactor MMQ (llama/8416) * CUDA: optimize and refactor MMQ * explicit q8_1 memory layouts, add documentation	2024-08-08 22:48:46 +03:00
AidanBeltonS	37e962580f	Use multi_ptr to clean up deprecated warnings (llama/8256)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	db0ea7a2f2	ggml : move sgemm sources to llamafile subfolder (llama/8394) ggml-ci	2024-08-08 22:48:46 +03:00
Dibakar Gope	5498b0e6c0	ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780) * Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files * Arm AArch64: minor code refactoring for rebase * Arm AArch64: minor code refactoring for resolving a build issue with cmake * Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 * Arm AArch64: minor code change for resolving a build issue with server-windows * retrigger checks * Arm AArch64: minor code changes for rebase * Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits * Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig * Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 * Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 * Arm AArch64: minor code refactoring * Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat * Arm AArch64: minimize changes in ggml_compute_forward_mul_mat * Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types * Arm AArch64: minor code refactoring * Arm AArch64: minor code refactoring * Arm AArch64: minor code refactoring * rebase on the latest master commit 3fd62a6 and adapt to the new directory structure * Arm AArch64: remove a redundant comment * Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off * Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels * Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type	2024-08-08 22:48:46 +03:00
Alberto Cabrera Pérez	2af4a52c39	sycl : Reenabled mmvq path for the SYCL Nvidia Backend (llama/8372) * SYCL : Reenabled mmvq path for the SYCL Nvidia Backend * Reduced verbosity of comment	2024-08-08 22:48:46 +03:00
Alberto Cabrera Pérez	eee2fe882e	sycl : fix powf call in device code (llama/8368)	2024-08-08 22:48:46 +03:00
Mahesh Madhav	0d1a11e5e2	ggml : loop tiling optimizations for scalar path (ggml/898) Apply a loop tiling technique to the generic path, which provides performance upside for ISAs with enough registers to take advantage of it. Also helps the compiler optimize this path.	2024-08-08 22:48:46 +03:00
Ivan Filipov	b2ead7d6f4	ggml: add support for float16 input tensors in pooling operations (ggml/895) * Add support for float16 tensors in 1d pooling operations * Add support for float16 input tensors in 2d pooling operations * code cleanup remove unnecessary casting during srow ptr initialization --------- Co-authored-by: vanaka11 <vanaka1189@gmail.com>	2024-08-08 22:48:46 +03:00
Tony Wasserka	8da6fd4dff	vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893) This prevents invalid frees when destroying a partially initialized vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer when running out of device memory. Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>	2024-08-08 22:48:46 +03:00
Borislav Stanimirov	ab8ec9e940	cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885)	2024-08-08 22:48:46 +03:00
Matt Stephenson	f68298ce06	whisper : use vulkan as gpu backend when available (#2302) * ggml: use vulkan as gpu backend when available Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com> * whisper: enable using vk as default buffer type Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com> --------- Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>	2024-07-16 10:21:09 +03:00
Georgi Gerganov	49868aa851	ggml : sync sycl (skip) (#0)	2024-07-08 14:53:55 +03:00
Daniel Bevenius	95f2a191c0	ggml : remove unnecessary UNUSED macro call (ggml/880) This commit removes an UNUSED macro call that is not needed as the variable n0 is used in the code and will not produce a warning. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-07-08 14:53:55 +03:00
Natsu	00422ec3cf	cmake : add GGML_BUILD and GGML_SHARED macro definitions (llama/8281)	2024-07-08 14:53:55 +03:00
Ouadie EL FAROUKI	c5b05321e9	Enabled more data types for oneMKL gemm_batch (llama/8236)	2024-07-08 14:53:55 +03:00
Johannes Gäßler	5dc636a65a	CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278)	2024-07-08 14:53:55 +03:00
Daniele	73703a144f	CUDA: revert part of the RDNA1 optimizations (llama/8309) The change on the launch_bounds was causing a small performance drop in perplexity of 25 t/s	2024-07-08 14:53:55 +03:00
Johannes Gäßler	e89fdceec2	CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311)	2024-07-08 14:53:55 +03:00
luoyu-intel	29a2739d27	Fix WARP_SIZE=16 bug of Intel GPU (llama/8266) * fix group_norm ut * split softmax * fix softmax * add concat support condition * revert debug code * move QK_WARP_SIZE to presets.hpp	2024-07-08 14:53:55 +03:00
Neo Zhang Jianyu	ee6d17f6b4	rm get_work_group_size() by local cache for performance (llama/8286) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-07-08 14:53:55 +03:00
Daniele	95e90823d9	Define and optimize RDNA1 (llama/8085)	2024-07-08 14:53:55 +03:00
Judd	005cc45df3	fix typo (llama/8267) Co-authored-by: Judd <foldl@boxvest.com>	2024-07-08 14:53:55 +03:00
Clint Herron	c2c60dc9ba	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (llama/8258)	2024-07-08 14:53:55 +03:00
slaren	4af3194b7c	cuda : update supports_op for matrix multiplication (llama/8245)	2024-07-08 14:53:55 +03:00
luoyu-intel	4a2ba1a065	Fix win build conflict of math library (llama/8230) * fix win build conflict of math library * fix the condition: !(win32 & SYCL) * revert warp_size=16	2024-07-08 14:53:55 +03:00
luoyu-intel	f096cc6807	Fix the sub group size of Intel (llama/8106) * use warp_size macro for all sycl kernels * fix mask of permute_sub_group_by_xor * fix rms_norm with correct warp number * fix rms_norm_f32/group_norm_f32 * move norm to norm.cpp file * fix quantize bug * fix mmvq's batch size	2024-07-08 14:53:55 +03:00
Johannes Gäßler	e4bc83ab47	CUDA: refactor and optimize IQ MMVQ (llama/8215) * CUDA: refactor and optimize IQ MMVQ * uint -> uint32_t * __dp4a -> ggml_cuda_dp4a * remove MIN_CC_DP4A checks * change default * try CI fix	2024-07-08 14:53:55 +03:00
zhentaoyu	db7e0dbe6e	Update SYCL-Rope op and Refactor (llama/8157) * align with rope.cu and move sycl-op to a single file	2024-07-08 14:53:55 +03:00
Johannes Gäßler	bf88c94da9	CUDA: fix MMQ stream-k for --split-mode row (llama/8167)	2024-07-08 14:53:55 +03:00
John Balis	3eea171cab	feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) * conv transpose 1d passing test for 1d input and kernel * working for different input and output channel counts, added test for variable stride * initial draft appears to work with stride other than 1 * working with all old and new conv1d tests * added a test for large tensors * removed use cuda hardcoding * restored test-conv-transpose.c * removed unused arguments, and fixed bug where test failure would cause subsequent tests to fail * fixed accumulator bug * added test to test-backend-ops * fixed mistake * addressed review * fixed includes * removed blank lines * style and warning fixes * return failure when test fails * fix supports_op --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-07-08 14:53:55 +03:00
slaren	04e7fa6f4f	ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140)	2024-06-26 23:18:11 +03:00
Georgi Gerganov	e30c679928	whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci]	2024-06-26 19:34:09 +03:00

333 Commits