whisper.cpp/ggml/src
Xuan Son Nguyen 8807fe608b Refactor lora adapter support (llama/8332)
* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <slarengh@gmail.com>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-08-08 22:48:46 +03:00
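The bullet points above track a refactor of how LoRA adapters are applied at matmul time (llm_build_lora_mm, per-adapter alpha, user scale). Conceptually, a LoRA-aware matmul adds a scaled low-rank correction to the base product: y = W·x + (alpha/rank)·B·(A·x). A minimal plain-Python sketch of that arithmetic — the function and argument names here are illustrative, not the actual llama.cpp API, which builds the equivalent ggml graph nodes instead:

```python
def matmul(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_mm(W, x, A, B, alpha, rank, scale=1.0):
    """Sketch of a LoRA-aware matmul (names hypothetical).

    W:    base weight, n x k
    A, B: low-rank adapter factors, rank x k and n x rank
    y = W @ x + scale * (alpha / rank) * B @ (A @ x)
    """
    base = matmul(W, x)
    # low-rank update: B @ (A @ x), scaled by alpha/rank and the user scale
    delta = matmul(B, matmul(A, x))
    s = scale * alpha / rank
    return [b + s * d for b, d in zip(base, delta)]
```

The "auto scale" commit (later reverted) and the "load and use alpha" commit both concern the alpha/rank factor above: storing alpha in the adapter's GGUF metadata lets the runtime apply it without the user hand-tuning the scale.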
ggml-cuda               | cuda : suppress 'noreturn' warn in no_device_code (llama/8414)       | 2024-08-08 22:48:46 +03:00
ggml-sycl               | add concat through dim 1/2 (llama/8483)                              | 2024-08-08 22:48:46 +03:00
kompute-shaders         | whisper : reorganize source code + improve CMake (#2256)             | 2024-06-26 19:34:09 +03:00
vulkan-shaders          | whisper : reorganize source code + improve CMake (#2256)             | 2024-06-26 19:34:09 +03:00
CMakeLists.txt          | vulkan : cmake integration (llama/8119)                              | 2024-08-08 22:48:46 +03:00
ggml-alloc.c            | whisper : reorganize source code + improve CMake (#2256)             | 2024-06-26 19:34:09 +03:00
ggml-backend-impl.h     | whisper : reorganize source code + improve CMake (#2256)             | 2024-06-26 19:34:09 +03:00
ggml-backend.c          | fix the mul_mat_id ut issues (llama/8427)                            | 2024-08-08 22:48:46 +03:00
ggml-blas.cpp           | ggml : add NVPL BLAS support (ggml/8329) (llama/8425)                | 2024-08-08 22:48:46 +03:00
ggml-common.h           | ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780)   | 2024-08-08 22:48:46 +03:00
ggml-cuda.cu            | Refactor lora adapter support (llama/8332)                           | 2024-08-08 22:48:46 +03:00
ggml-impl.h             | ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780)   | 2024-08-08 22:48:46 +03:00
ggml-kompute.cpp        | whisper : reorganize source code + improve CMake (#2256)             | 2024-06-26 19:34:09 +03:00
ggml-metal.m            | metal : template-ify some of the kernels (llama/8447)                | 2024-08-08 22:48:46 +03:00
ggml-metal.metal        | metal : template-ify some of the kernels (llama/8447)                | 2024-08-08 22:48:46 +03:00
ggml-quants.c           | ggml : minor naming changes (llama/8433)                             | 2024-08-08 22:48:46 +03:00
ggml-quants.h           | ggml : minor naming changes (llama/8433)                             | 2024-08-08 22:48:46 +03:00
ggml-rpc.cpp            | whisper : reorganize source code + improve CMake (#2256)             | 2024-06-26 19:34:09 +03:00
ggml-sycl.cpp           | add concat through dim 1/2 (llama/8483)                              | 2024-08-08 22:48:46 +03:00
ggml-vulkan-shaders.hpp | whisper : reorganize source code + improve CMake (#2256)             | 2024-06-26 19:34:09 +03:00
ggml-vulkan.cpp         | Vulkan MMQ Fix (llama/8479)                                          | 2024-08-08 22:48:46 +03:00
ggml.c                  | Refactor lora adapter support (llama/8332)                           | 2024-08-08 22:48:46 +03:00
sgemm.cpp               | whisper : reorganize source code + improve CMake (#2256)             | 2024-06-26 19:34:09 +03:00
sgemm.h                 | whisper : reorganize source code + improve CMake (#2256)             | 2024-06-26 19:34:09 +03:00