whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-20 21:23:06 +00:00

Author	SHA1	Message	Date
compilade	9bf7250bf9	llama : simplify Mamba with advanced batch splits (llama/8526) * llama : advanced batch splits This includes equal-sequence-length batch splits which are useful to simplify recurrent model operators. * llama : always make recurrent state slots contiguous * ggml : simplify mamba operators * llama : fix integer signedness mixing * llama : logits_all has priority over batch->logits Otherwise, the server embeddings tests failed. This was likely an existing problem but was only detected here because of an additional assertion. * llama : apply suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * llama : fix t5 segfault * llama : fix Mamba session save and restore * llama : minor cosmetic changes * llama : rename llama_reorder_outputs to llama_output_reorder Also move it closer to llama_output_reserve. * llama : fix pooled embeddings when using batches with equal_seqs * minor : add struct members for clarity ggml-ci * llama : fix T5 segfault again * llama : fix Mamba pooled embeddings with multiple sequences Until the pooled embeddings are refactored to allow splitting across ubatches for causal embeddings, recurrent models can only process a single sequence per ubatch when calculating pooled embeddings. * llama : add llama_model_is_recurrent to simplify figuring that out This will make it easier to more cleanly support RWKV-v6 and Mamba-2. * llama : fix simple splits when the batch contains embeddings --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-08-28 13:22:20 +03:00
Daniel Bevenius	60098d6204	ggml : move rope type enum to ggml.h (llama/8949) * ggml : move rope type enum to ggml.h This commit moves the `llama_rope_type` enum from `llama.h` to `ggml.h` and changes its name to `ggml_rope_type`. The motivation for this change is to address the TODO in `llama.h` and use the enum in ggml. Note: This commit does not change the `mode` parameter to be of type `enum ggml_rope_type`. The name `mode` and its usage suggest that it might be more generic and possibly used as a bit field for multiple flags. Further investigation/discussion may be needed to determine if `mode` should be restricted to RoPE types. * squash! ggml : move rope type enum to ggml.h This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from ggml.h, and back the llama_rope_type enum. I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is safe to remove it yet. * squash! ggml : move rope type enum to ggml.h This commit removes the enum ggml_rope_type from ggml.h and replaces it with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has been updated to reflect this change. * squash! ggml : move rope type enum to ggml.h This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX macro/define to be passed to the shader compiler. * squash! ggml : move rope type enum to ggml.h This commit fixes the editorconfig-checker warnings. * squash! ggml : move rope type enum to ggml.h Update comment for ggml_rope function. * Revert "squash! ggml : move rope type enum to ggml.h" This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6. * squash! ggml : move rope type enum to ggml.h Add GGML_ROPE_TYPE_NEOX to rope_common.comp. * remove extra line --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-08-28 13:22:20 +03:00
Johannes Gäßler	8954769aa2	feat: ref. cross entropy, add CUDA, fix grad test (ggml/929)	2024-08-28 13:22:20 +03:00
Johannes Gäßler	1fbd828a5d	examples: add MNIST training + missing ops	2024-08-28 13:22:20 +03:00
Ronsor	3643120690	feat: add new `sin` and `cos` operators (ggml/919) * ggml : add sin/cos operators * ggml-cuda : add sin/cos operators * ggml : add corresponding tests for sin/cos * ggml : add backward computation for sin/cos operators * ggml-vulkan : add sin/cos operators * ggml-vulkan : add sin/cos shader source * metal : add sin, cos --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-08-21 11:07:13 +03:00
hipudding	be88ee1d75	ggml : add CANN backend (llama/0) ggml-ci	2024-08-09 09:58:16 +03:00
Molly Sophia	4160b930f1	ggml : add epsilon as a parameter for group_norm (llama/8818) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-08 22:48:46 +03:00
Sigbjørn Skjæret	6cb38c3673	Fix conversion of unnormalized BF16->BF16 weights (llama/7843) * add truncate_bf16 * truncate intermediate fp32 if converting bf16 to bf16 * fix masking in __compute_fp32_to_bf16 * np.int16 no longer used * missing cast and additional numpy 2.x fix * ggml-impl : do not flush bf16 subnormals to zero * ggml : add reference fp32 to bf16 conversion The fast version is no longer equivalent for all platforms because of the handling of subnormal values. * gguf-py : remove flush to zero for bf16 subnormals * gguf-py : remove float32 truncation to bf16 Rounding achieves the same thing in the cases where this was used. * missed prototype update in merge * merge cleanup --------- Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2024-08-08 22:48:46 +03:00
R0CKSTAR	e471adcfa5	feat: Support Moore Threads GPU (llama/8383) * Update doc for MUSA Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Add GGML_MUSA in Makefile Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Add GGML_MUSA in CMake Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * CUDA => MUSA Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * MUSA adds support for __vsubss4 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Fix CI build failure Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-08-08 22:48:46 +03:00
Conrad Kramer	eb2eb87a58	metal : add abort callback (ggml/905)	2024-08-08 22:48:46 +03:00
Daniel Bevenius	f7bb412878	ggml : move c parameter comment to ggml_rope_ext (ggml/901) This commit moves the comment for the c parameter from ggml_rope to ggml_rope_ext. The comment is currently incorrect as ggml_rope does not have a c parameter (freq_factors tensor). Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-08-08 22:48:46 +03:00
slaren	dd916a2852	ggml : reduce hash table reset cost (llama/8698) * ggml : reduce hash table reset cost * fix unreachable code warnings after GGML_ASSERT(false) * GGML_ASSERT(false) -> GGML_ABORT("fatal error") * GGML_ABORT use format string	2024-08-08 22:48:46 +03:00
Georgi Gerganov	c06970dd72	ggml : add and use ggml_cpu_has_llamafile() (llama/8664)	2024-08-08 22:48:46 +03:00
Johannes Gäßler	a8ab3abe09	CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)	2024-08-08 22:48:46 +03:00
hipudding	8923bb4292	Add Ascend NPU backend (llama/6035) * [CANN] Add Ascend NPU backend Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. CANN (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI. Co-authored-by: wangshuai09 <391746016@qq.com> * delete trailing whitespaces * Modify the code based on review comment * Rename LLAMA_CANN to GGML_CANN * Make ggml-common.h private * add ggml_cann prefix for acl funcs * Add logging for CANN backend * Delete Trailing whitespace --------- Co-authored-by: wangshuai09 <391746016@qq.com>	2024-08-08 22:48:46 +03:00
Georgi Gerganov	2157abaab4	ggml : minor naming changes (llama/8433) * ggml : minor naming changes ggml-ci * ggml : use PRId64 [no ci] * ggml : revert FA K/Q names	2024-08-08 22:48:46 +03:00
Dibakar Gope	5498b0e6c0	ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780) * Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files * Arm AArch64: minor code refactoring for rebase * Arm AArch64: minor code refactoring for resolving a build issue with cmake * Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 * Arm AArch64: minor code change for resolving a build issue with server-windows * retrigger checks * Arm AArch64: minor code changes for rebase * Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits * Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig * Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 * Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 * Arm AArch64: minor code refactoring * Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat * Arm AArch64: minimize changes in ggml_compute_forward_mul_mat * Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types * Arm AArch64: minor code refactoring * Arm AArch64: minor code refactoring * Arm AArch64: minor code refactoring * rebase on the latest master commit 3fd62a6 and adapt to the new directory structure * Arm AArch64: remove a redundant comment * Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off * Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels * Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type	2024-08-08 22:48:46 +03:00
Clint Herron	c2c60dc9ba	Removes multiple newlines at the end of files that are breaking the editorconfig step of CI. (llama/8258)	2024-07-08 14:53:55 +03:00
Georgi Gerganov	e30c679928	whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci]	2024-06-26 19:34:09 +03:00

69 Commits