whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-05-02 00:39:49 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	0e7798677a	ggml : add SSM Metal kernels (llama/8546) * ggml : add ggml_ssm_conv metal impl * ggml : add ssm_scan metal impl ggml-ci	2024-08-28 13:22:20 +03:00
Johannes Gäßler	24d8534bd8	CPU/CUDA: Gemma 2 FlashAttention support (llama/8542) * CPU/CUDA: Gemma 2 FlashAttention support * apply logit_softcap to scale in kernel * disable logit softcapping tests on Metal * remove metal check	2024-08-28 13:22:20 +03:00
compilade	9bf7250bf9	llama : simplify Mamba with advanced batch splits (llama/8526) * llama : advanced batch splits This includes equal-sequence-length batch splits which are useful to simplify recurrent model operators. * llama : always make recurrent state slots contiguous * ggml : simplify mamba operators * llama : fix integer signedness mixing * llama : logits_all has priority over batch->logits Otherwise, the server embeddings tests failed. This was likely an existing problem but was only detected here because of an additional assertion. * llama : apply suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * llama : fix t5 segfault * llama : fix Mamba session save and restore * llama : minor cosmetic changes * llama : rename llama_reorder_outputs to llama_output_reorder Also move it closer to llama_output_reserve. * llama : fix pooled embeddings when using batches with equal_seqs * minor : add struct members for clarity ggml-ci * llama : fix T5 segfault again * llama : fix Mamba pooled embeddings with multiple sequences Until the pooled embeddings are refactored to allow splitting across ubatches for causal embeddings, recurrent models can only process a single sequence per ubatch when calculating pooled embeddings. * llama : add llama_model_is_recurrent to simplify figuring that out This will make it easier to more cleanly support RWKV-v6 and Mamba-2. * llama : fix simple splits when the batch contains embeddings --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-08-28 13:22:20 +03:00
Daniel Bevenius	60098d6204	ggml : move rope type enum to ggml.h (llama/8949) * ggml : move rope type enum to ggml.h This commit moves the `llama_rope_type` enum from `llama.h` to `ggml.h` and changes its name to `ggml_rope_type`. The motivation for this change is to address the TODO in `llama.h` and use the enum in ggml. Note: This commit does not change the `mode` parameter to be of type `enum ggml_rope_type`. The name `mode` and its usage suggest that it might be more generic and possibly used as a bit field for multiple flags. Further investigation/discussion may be needed to determine if `mode` should be restricted to RoPE types. * squash! ggml : move rope type enum to ggml.h This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from ggml.h, and back the llama_rope_type enum. I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is safe to remove it yet. * squash! ggml : move rope type enum to ggml.h This commit removes the enum ggml_rope_type from ggml.h and replaces it with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has been updated to reflect this change. * squash! ggml : move rope type enum to ggml.h This commit contains a suggestion enable the GGML_ROPE_TYPE_NEOX macro/define to be passed to the shader compiler. * squash! ggml : move rope type enum to ggml.h This commit fixes the editorconfig-checker warnings. * squash! ggml : move rope type enum to ggml.h Update comment for ggml_rope function. * Revert "squash! ggml : move rope type enum to ggml.h" This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6. * squash! ggml : move rope type enum to ggml.h Add GGML_ROPE_TYPE_NEOX to rope_common.comp. * remove extra line --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-08-28 13:22:20 +03:00
DavidKorczynski	317293e6a7	ggml: fix div-by-zero (llama/9003) Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724 In order to access the above bug you need to login using one of the emails in https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5 Signed-off-by: David Korczynski <david@adalogics.com>	2024-08-28 13:22:20 +03:00
Johannes Gäßler	8954769aa2	feat: ref. cross entropy, add CUDA, fix grad test (ggml/929)	2024-08-28 13:22:20 +03:00
Johannes Gäßler	df06468d9e	ggml: remove bad assert (ggml/928)	2024-08-28 13:22:20 +03:00
Johannes Gäßler	1fbd828a5d	examples: add MNIST training + missing ops	2024-08-28 13:22:20 +03:00
Ronsor	3643120690	feat: add new `sin` and `cos` operators (ggml/919) * ggml : add sin/cos operators * ggml-cuda : add sin/cos operators * ggml : add corresponding tests for sin/cos * ggml : add backward computation for sin/cos operators * ggml-vulkan : add sin/cos operators * ggml-vulkan : add sin/cos shader source * metal : add sin, cos --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-08-21 11:07:13 +03:00
Salvatore Mesoraca	993f0df419	ggml : support forward pass broadcasting in ggml_sub (ggml/914) * ggml: support forward pass broadcasting in ggml_sub Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32 The check is already performed in ggml_sub_impl Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> --------- Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-08-12 11:58:49 +03:00
Georgi Gerganov	ad37d26983	rpc : sanitize tensor data + warnings (llama/0) Co-authored-by: slaren <slarengh@gmail.com>	2024-08-12 11:58:46 +03:00
Molly Sophia	4160b930f1	ggml : add epsilon as a parameter for group_norm (llama/8818) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-08-08 22:48:46 +03:00
Justine Tunney	7a96e661e4	ggml : fix overflows in elu function (llama/8866) It's helpful to use expm1f(x), because expf(x)-1 will result in overflow for 25% of single-precision floating point numbers.	2024-08-08 22:48:46 +03:00
jdomke	a902fb4ab2	ggml : reading the runtime sve config of the cpu (llama/8709) * ggml : reading the runtime sve config of the cpu * change to one time init to prevent performance drop * prefix variable to avoid possible conflicts * revert xxhash fix and add brackets --------- Co-authored-by: domke <673751-domke@users.noreply.gitlab.com>	2024-08-08 22:48:46 +03:00
Sigbjørn Skjæret	6cb38c3673	Fix conversion of unnormalized BF16->BF16 weights (llama/7843) * add truncate_bf16 * truncate intermediate fp32 if converting bf16 to bf16 * fix masking in __compute_fp32_to_bf16 * np.int16 no longer used * missing cast and additional numpy 2.x fix * ggml-impl : do not flush bf16 subnormals to zero * ggml : add reference fp32 to bf16 conversion The fast version is no longer equivalent for all platforms because of the handling of subnormal values. * gguf-py : remove flush to zero for bf16 subnormals * gguf-py : remove float32 truncation to bf16 Rounding achieves the same thing in the cases where this was used. * missed prototype update in merge * merge cleanup --------- Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2024-08-08 22:48:46 +03:00
Alex O'Connell	d26250f78c	Build: Only include execinfo.h on linux systems that support it (llama/8783) * Only enable backtrace on GLIBC linux systems * fix missing file from copy * use glibc macro instead of defining a custom one	2024-08-08 22:48:46 +03:00
l3utterfly	e60be821ce	added android implementation of ggml_print_backtrace_symbols (llama/8751) * added android implementation of ggml_print_backtrace_symbols * Update ggml/src/ggml.c Co-authored-by: slaren <slarengh@gmail.com> * Update ggml/src/ggml.c Co-authored-by: slaren <slarengh@gmail.com> * Update ggml/src/ggml.c Co-authored-by: slaren <slarengh@gmail.com> * Update ggml/src/ggml.c Co-authored-by: slaren <slarengh@gmail.com> * Update ggml/src/ggml.c Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-08-08 22:48:46 +03:00
Borislav Stanimirov	aa816c922c	ggml : ignore more msvc warnings (ggml/906)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	ef6dcf0d0c	ggml : resolve sync conflicst (ggml/0) ggml-ci	2024-08-08 22:48:46 +03:00
slaren	dd916a2852	ggml : reduce hash table reset cost (llama/8698) * ggml : reduce hash table reset cost * fix unreachable code warnings after GGML_ASSERT(false) * GGML_ASSERT(false) -> GGML_ABORT("fatal error") * GGML_ABORT use format string	2024-08-08 22:48:46 +03:00
DavidKorczynski	0620fe00ec	ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692) `ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_on_alloc`. This fixes it by bailing out if no context is found.	2024-08-08 22:48:46 +03:00
Georgi Gerganov	c06970dd72	ggml : add and use ggml_cpu_has_llamafile() (llama/8664)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	b1ee3a8444	gguf : handle null name during init (llama/8587)	2024-08-08 22:48:46 +03:00
Clint Herron	f4d9a95b0f	ggml : add friendlier error message to fopen errors (llama/8575) * Add additional error information when model files fail to load. * Adding additional error information to most instances of fopen.	2024-08-08 22:48:46 +03:00
hipudding	8923bb4292	Add Ascend NPU backend (llama/6035) * [CANN] Add Ascend NPU backend Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. CANN (Compute Architecture of Neural Networks), developped by Huawei, is a heterogeneous computing architecture for AI. Co-authored-by: wangshuai09 <391746016@qq.com> * delete trailing whitespaces * Modify the code based on review comment * Rename LLAMA_CANN to GGML_CANN * Make ggml-common.h private * add ggml_cann prefix for acl funcs * Add logging for CANN backend * Delete Trailing whitespace --------- Co-authored-by: wangshuai09 <391746016@qq.com>	2024-08-08 22:48:46 +03:00
Xuan Son Nguyen	8807fe608b	Refactor lora adapter support (llama/8332) * lora: load to devide buft * add patch tensor function * correct tensor patch * llama_lora_adapter_apply * correct ggml_backend_tensor_copy * add llm_build_mm * fix auto merge * update based on review comments * add convert script * no more transpose A * add f16 convert * add metadata check * add sanity check * fix ftype * add requirements * fix requirements * fix outfile * conversion: only allow selected models * fix types * cuda : do not use dmmv if the tensor does not have enough cols * llama : lora fixes * do not disable mmap with lora Co-authored-by: slaren <slarengh@gmail.com> * llm_build_lora_mm_id * convert_lora : MoE LoRA conversion support * convert_lora : prefer safetensors, similarly to convert_hf * convert_hf : simplify modify_tensors for InternLM2 * convert_lora : lazy conversion * llama : load and use alpha from LoRA adapters * llama : use llm_build_lora_mm in most model graphs * auto scale * Revert "auto scale" This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464. * remove redundant params * Apply suggestions from code review Co-authored-by: slaren <slarengh@gmail.com> * change kv metadata * move add_type to __init__ * convert_hf : move add_type to main() * convert_lora : use the GGUFWriter from Model instead of overwriting it --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2024-08-08 22:48:46 +03:00
Georgi Gerganov	2157abaab4	ggml : minor naming changes (llama/8433) * ggml : minor naming changes ggml-ci * ggml : use PRId64 [no ci] * ggml : revert FA K/Q names	2024-08-08 22:48:46 +03:00
Georgi Gerganov	db0ea7a2f2	ggml : move sgemm sources to llamafile subfolder (llama/8394) ggml-ci	2024-08-08 22:48:46 +03:00
Dibakar Gope	5498b0e6c0	ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780) * Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions * Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files * Arm AArch64: minor code refactoring for rebase * Arm AArch64: minor code refactoring for resolving a build issue with cmake * Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 * Arm AArch64: minor code change for resolving a build issue with server-windows * retrigger checks * Arm AArch64: minor code changes for rebase * Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits * Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig * Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 * Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 * Arm AArch64: minor code refactoring * Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat * Arm AArch64: minimize changes in ggml_compute_forward_mul_mat * Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types * Arm AArch64: minor code refactoring * Arm AArch64: minor code refactoring * Arm AArch64: minor code refactoring * rebase on the latest master commit 3fd62a6 and adapt to the new directory structure * Arm AArch64: remove a redundant comment * Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off * Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels * Arm AArch64: update docs/build.md README to include compile time flags for buiilding the Q4_0_4_4 quant type	2024-08-08 22:48:46 +03:00
Ivan Filipov	b2ead7d6f4	ggml: add support for float16 input tensors in pooling operations (ggml/895) * Add support for float16 tensors in 1d pooling operations * Add support for float16 input tensors in 2d pooling operations * code cleanup remove unnecessary casting during srow ptr initialization --------- Co-authored-by: vanaka11 <vanaka1189@gmail.com>	2024-08-08 22:48:46 +03:00
Daniel Bevenius	95f2a191c0	ggml : remove unnecessary UNUSED macro call (ggml/880) This commit removes an UNUSED macro call that is not needed as the variable n0 is used in the code and will not produce a warning. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-07-08 14:53:55 +03:00
Judd	005cc45df3	fix typo (llama/8267) Co-authored-by: Judd <foldl@boxvest.com>	2024-07-08 14:53:55 +03:00
Georgi Gerganov	e30c679928	whisper : reorganize source code + improve CMake (#2256 ) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci]	2024-06-26 19:34:09 +03:00

33 Commits