whisper.cpp

Mirror of https://github.com/ggerganov/whisper.cpp.git (synced 2025-05-29 21:44:13 +00:00)

Author	SHA1	Message	Date
Georgi Gerganov	6eac06759b	ci : disable ruby workflow (#0)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	2e9a5bd2c4	ci : try to fix FreeBSD (#0)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	58323bf8ed	build : fix aarch64 (#0)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	22058f2dbc	talk-llama : sync llama.cpp	2024-08-08 22:48:46 +03:00
Georgi Gerganov	5b7979a1e6	sync : ggml	2024-08-08 22:48:46 +03:00
slaren	ee14c02365	ggml-backend : fix async copy from CPU (llama/8897) * ggml-backend : fix async copy from CPU * cuda : more reliable async copy, fix stream used when the devices are the same	2024-08-08 22:48:46 +03:00
Ouadie EL FAROUKI	ab39dd34e1	Updated SYCL device filtering (llama/8901) * Updated device filter to depend on default_selector (fixes non-intel device issues) * Small related update to example/sycl Readme	2024-08-08 22:48:46 +03:00
Johannes Gäßler	b1348d3530	CUDA/HIP: fix tests/test-backend-ops (llama/8896)	2024-08-08 22:48:46 +03:00
Johannes Gäßler	90641b5cf4	CUDA: fix padding logic for FP16/FP32 (llama/8884)	2024-08-08 22:48:46 +03:00
Molly Sophia	4160b930f1	ggml : add epsilon as a parameter for group_norm (llama/8818) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-08 22:48:46 +03:00
Justine Tunney	7a96e661e4	ggml : fix overflows in elu function (llama/8866) It's helpful to use expm1f(x), because expf(x)-1 will result in overflow for 25% of single-precision floating point numbers.	2024-08-08 22:48:46 +03:00
jdomke	a902fb4ab2	ggml : reading the runtime sve config of the cpu (llama/8709) * ggml : reading the runtime sve config of the cpu * change to one time init to prevent performance drop * prefix variable to avoid possible conflicts * revert xxhash fix and add brackets --------- Co-authored-by: domke <673751-domke@users.noreply.gitlab.com>	2024-08-08 22:48:46 +03:00
Sigbjørn Skjæret	6cb38c3673	Fix conversion of unnormalized BF16->BF16 weights (llama/7843) * add truncate_bf16 * truncate intermediate fp32 if converting bf16 to bf16 * fix masking in __compute_fp32_to_bf16 * np.int16 no longer used * missing cast and additional numpy 2.x fix * ggml-impl : do not flush bf16 subnormals to zero * ggml : add reference fp32 to bf16 conversion The fast version is no longer equivalent for all platforms because of the handling of subnormal values. * gguf-py : remove flush to zero for bf16 subnormals * gguf-py : remove float32 truncation to bf16 Rounding achieves the same thing in the cases where this was used. * missed prototype update in merge * merge cleanup --------- Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2024-08-08 22:48:46 +03:00
Ouadie EL FAROUKI	9cf14ebcbc	Fixing wrong VDR iq4nl value (llama/8812)	2024-08-08 22:48:46 +03:00
matteo	8e39ee171f	ggml-cuda: Adding support for unified memory (llama/8035) * Adding support for unified memory * adding again the documentation about unified memory * refactoring: Moved the unified memory code in the correct location. * Fixed compilation error when using hipblas * cleaning up the documentation * Updating the documentation Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * adding one more case where the PR should not be enabled --------- Co-authored-by: matteo serva <matteo.serva@gmail.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-08-08 22:48:46 +03:00
Alex O'Connell	d26250f78c	Build: Only include execinfo.h on linux systems that support it (llama/8783) * Only enable backtrace on GLIBC linux systems * fix missing file from copy * use glibc macro instead of defining a custom one	2024-08-08 22:48:46 +03:00
slaren	5218ea21b8	cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800) * cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X * update asserts * only use dmmv for supported types * add test	2024-08-08 22:48:46 +03:00
l3utterfly	e60be821ce	added android implementation of ggml_print_backtrace_symbols (llama/8751) * added android implementation of ggml_print_backtrace_symbols * Update ggml/src/ggml.c Co-authored-by: slaren <slarengh@gmail.com> * Update ggml/src/ggml.c Co-authored-by: slaren <slarengh@gmail.com> * Update ggml/src/ggml.c Co-authored-by: slaren <slarengh@gmail.com> * Update ggml/src/ggml.c Co-authored-by: slaren <slarengh@gmail.com> * Update ggml/src/ggml.c Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-08-08 22:48:46 +03:00
wangshuai09	19708df884	cann: update cmake (llama/8765)	2024-08-08 22:48:46 +03:00
zhentaoyu	3f190addda	Add `TIMESTEP_EMBEDDING` OP (llama/8707) Signed-off-by: zhentaoyu <zhentao.yu@intel.com>	2024-08-08 22:48:46 +03:00
CarterLi999	b355ee7cfa	ggml: bugfix: fix the inactive elements is agnostic for risc-v vector (llama/8748) In these codes, we want to retain the value that they previously held when mask[i] is false. So we should use undisturbed. With the default agnostic policy of rvv intrinsic, these values can be held or be written with 1s. Co-authored-by: carter.li <carter.li@starfivetech.com>	2024-08-08 22:48:46 +03:00
R0CKSTAR	49ac8872b4	cuda : organize vendor-specific headers into vendors directory (llama/8746) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-08-08 22:48:46 +03:00
Meng, Hengyu	8ef98ae7e3	add conv support (llama/8688)	2024-08-08 22:48:46 +03:00
R0CKSTAR	e471adcfa5	feat: Support Moore Threads GPU (llama/8383) * Update doc for MUSA Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Add GGML_MUSA in Makefile Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Add GGML_MUSA in CMake Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * CUDA => MUSA Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * MUSA adds support for __vsubss4 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Fix CI build failure Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-08-08 22:48:46 +03:00
Borislav Stanimirov	aa816c922c	ggml : ignore more msvc warnings (ggml/906)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	b3264eb266	metal : fix struct name (ggml/912) ggml-ci	2024-08-08 22:48:46 +03:00
Conrad Kramer	eb2eb87a58	metal : add abort callback (ggml/905)	2024-08-08 22:48:46 +03:00
0cc4m	83fcb0e486	vulkan : implement Stable Diffusion operators (ggml/904) * Fix Vulkan repeat op * Implement Vulkan concat op * Delete old Vulkan shader generator * Implement Vulkan im2col op * Implement Vulkan unary gelu_quick op * Implement Vulkan group_norm op * Implement Vulkan timestep_embedding op * Implement Vulkan upscale op * Fix Vulkan vk_context tensor extra index issue * Fix Vulkan matmul shader parameter bug * Properly fix Vulkan matmul shader parameter bug * Add Vulkan ADD f16 + f32 -> f16 operator support * Implement Vulkan tanh op * Fix Vulkan group count too large Validation error on non-Nvidia GPUs * Throw error when too much memory is requested * Fix another Vulkan group count too large Validation error on non-Nvidia GPUs * Fix matmul MMQ condition * Implement Vulkan pad op * Fix Vulkan crash when tensor is used multiple times in a compute graph * Add Vulkan CONCAT f16 + f16 -> f16 op * Add Vulkan LEAKY_RELU op	2024-08-08 22:48:46 +03:00
Daniel Bevenius	f7bb412878	ggml : move c parameter comment to ggml_rope_ext (ggml/901) This commit moves the comment for the c parameter from ggml_rope to ggml_rope_ext. The comment is currently incorrect as ggml_rope does not have a c parameter (freq_factors tensor). Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-08-08 22:48:46 +03:00
Georgi Gerganov	ef6dcf0d0c	ggml : resolve sync conflicts (ggml/0) ggml-ci	2024-08-08 22:48:46 +03:00
Georgi Gerganov	c7ea4fd235	common : handle new quant types (ggml/0)	2024-08-08 22:48:46 +03:00
Dibakar Gope	525f190917	ggml : add ggml-aarch64 (ggml/0)	2024-08-08 22:48:46 +03:00
slaren	dd916a2852	ggml : reduce hash table reset cost (llama/8698) * ggml : reduce hash table reset cost * fix unreachable code warnings after GGML_ASSERT(false) * GGML_ASSERT(false) -> GGML_ABORT("fatal error") * GGML_ABORT use format string	2024-08-08 22:48:46 +03:00
DavidKorczynski	0620fe00ec	ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692) `ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_on_alloc`. This fixes it by bailing out if no context is found.	2024-08-08 22:48:46 +03:00
Chen Xi	31d0a9a14f	fix multi-gpu issue on sycl (llama/8554) --------- Signed-off-by: Chen Xi <xi2chen@intel.com> Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>	2024-08-08 22:48:46 +03:00
Georgi Gerganov	c06970dd72	ggml : add and use ggml_cpu_has_llamafile() (llama/8664)	2024-08-08 22:48:46 +03:00
Joe Todd	7598acf525	Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)	2024-08-08 22:48:46 +03:00
Joe Todd	43ddfce969	sycl : Add support for non-release DPC++ & oneMKL (llama/8644) * Update cmake to support nvidia hardware & open-source compiler --------- Signed-off-by: Joe Todd <joe.todd@codeplay.com>	2024-08-08 22:48:46 +03:00
0cc4m	a7e6d2cd9c	Vulkan IQ4_NL Support (llama/8613) * Fix Vulkan matmul tests compile errors * Add Vulkan IQ4_NL support * Fix Vulkan DeepSeek-Coder-V2-Lite MoE support	2024-08-08 22:48:46 +03:00
Jeroen Mostert	86506b0c5c	Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629) The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.	2024-08-08 22:48:46 +03:00
luoyu-intel	11182fae34	fix scratch size of softmax (llama/8642)	2024-08-08 22:48:46 +03:00
Mark Zhuang	0bc8bffe1d	ggml: fix compile error for RISC-V (llama/8623)	2024-08-08 22:48:46 +03:00
Johannes Gäßler	8c4f30497a	CUDA: MMQ code deduplication + iquant support (llama/8495) * CUDA: MMQ code deduplication + iquant support * 1 less parallel job for CI build	2024-08-08 22:48:46 +03:00
Georgi Gerganov	b1ee3a8444	gguf : handle null name during init (llama/8587)	2024-08-08 22:48:46 +03:00
slaren	be9a16fd3f	ggml : fix quant dot product with odd number of blocks (llama/8549) * ggml : fix iq4_nl dot product with odd number of blocks * ggml : fix odd blocks for ARM_NEON (llama/8556) * ggml : fix iq4_nl dot product with odd number of blocks * ggml : fix q4_1 * ggml : fix q5_0 * ggml : fix q5_1 * ggml : fix iq4_nl metal ggml-ci * ggml : fix q4_0 * ggml : fix q8_0 ggml-ci * ggml : remove special Q4_0 code for first 2 blocks * ggml : fix sumf redefinition --------- Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-08-08 22:48:46 +03:00
Clint Herron	f4d9a95b0f	ggml : add friendlier error message to fopen errors (llama/8575) * Add additional error information when model files fail to load. * Adding additional error information to most instances of fopen.	2024-08-08 22:48:46 +03:00
Johannes Gäßler	a8ab3abe09	CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)	2024-08-08 22:48:46 +03:00
65a	fb6a835938	cmake : install all ggml public headers (llama/8480) Co-authored-by: 65a <65a@65a.invalid>	2024-08-08 22:48:46 +03:00
hipudding	8923bb4292	Add Ascend NPU backend (llama/6035) * [CANN] Add Ascend NPU backend Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. CANN (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI. Co-authored-by: wangshuai09 <391746016@qq.com> * delete trailing whitespaces * Modify the code based on review comment * Rename LLAMA_CANN to GGML_CANN * Make ggml-common.h private * add ggml_cann prefix for acl funcs * Add logging for CANN backend * Delete Trailing whitespace --------- Co-authored-by: wangshuai09 <391746016@qq.com>	2024-08-08 22:48:46 +03:00
Johannes Gäßler	fcba6aa352	make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)	2024-08-08 22:48:46 +03:00


1683 Commits