whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-04-25 13:30:12 +00:00

Author	SHA1	Message	Date
Johannes Gäßler	54a2ee648f	RoPE: fix back, CUDA support for back + noncont. (llama/11240) * fix comments reg. non-cont. RoPE support [no-ci]	2025-02-03 22:00:57 +02:00
issixx	f12559d590	ggml-cpu : fix ggml_graph_compute_thread not terminating on abort (ggml/1065) Some threads kept looping and failed to terminate properly after an abort during CPU execution. Co-authored-by: issi <issi@gmail.com>	2025-02-03 22:00:57 +02:00
Molly Sophia	06209f6683	llama: add support for QRWKV6 model architecture (llama/11001) * WIP: Add support for RWKV6Qwen2 * RWKV: Some graph simplification * Add support for RWKV6Qwen2 with cpu and cuda GLA * RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead * Fix some typos * code format changes * Fix wkv test & add gla test * Fix cuda warning * Update README.md * Update ggml/src/ggml-cuda/gla.cu Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix fused lerp weights loading with RWKV6 * better sanity check skipping for QRWKV6 in llama-quant thanks @compilade Co-authored-by: compilade <git@compilade.net> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: compilade <git@compilade.net>	2025-01-14 10:38:01 +02:00
Djip007	bcf937c216	ggml : more perf with llamafile tinyblas on x86_64 (llama/10714) * more perf with llamafile tinyblas on x86_64: - add bf16 support - change dispatch strategy (thanks: https://github.com/ikawrakow/ik_llama.cpp/pull/71 ) - reduce memory bandwidth with a simpler, more cache-friendly tinyblas dispatch * tinyblas dynamic dispatching * sgemm: add M blocks * - git 2.47 uses short IDs of length 9 - show-progress is not part of GNU Wget2 * remove unstable test	2025-01-04 10:45:01 +02:00
Diego Devesa	3387415bad	ggml : fix const usage in SSE path (llama/10962)	2025-01-04 10:45:01 +02:00
HimariO	e22d38e4f2	llama : add Qwen2VL support + multimodal RoPE (llama/10361) * Barebone Qwen2VL LLM converter * Add Qwen2VL cli entrypoint * [WIP] add qwen2vl arch * Verify m-rope output * Add vl-rope/2d-rope support for qwen2vl ViT * update qwen2vl cli tool * update 5D tensor op workaround * [WIP] qwen2vl vision model * make batch and clip utils compatible with qwen2vl * [WIP] create inference workflow, gguf convert script but fix * correct vision-rope behavior, add the missing last layer back to ViT * add arg parser to qwen2vl_surgery * replace variable size array with vector * cuda-gdb cmake preset * add fp32 mrope, vision rope kernel * add fp16 support for qwen2vl and m-rope * add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION` * fix rope op mode switching, outdated func args * update `llama_hparams` * update to keep up with upstream changes * resolve linter, test errors * add makefile entry, update special image padding token * add mrope unit test, fix a few compiler warnings * rename `mrope` related function, params * minor updates on debug util, bug fixes * add `m-rope` testcase to `test-backend-ops` * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix trailing whitespace * store `llama_hparams.rope_sections` with fixed size array * update position id tensor size check in GGML_OP_ROPE * minor updates * update `ggml_backend__supports_op` of unsupported backends, remove old `rope_section` compare operator --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-18 12:52:16 +02:00
Karol Kontny	e6eed605cf	ggml : Fix compilation issues on ARM platform when building without fp16 (llama/10811)	2024-12-18 12:52:16 +02:00
Diego Devesa	1193e494a9	remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (llama/10797) * other Windows build fixes	2024-12-18 12:52:16 +02:00
Djip007	e990d1b791	ggml : refactor online repacking (llama/10446) * rename ggml-cpu-aarch64.c to .cpp * reformat extra cpu backend. - clean Q4_0_N_M and IQ4_0_N_M - remove from "file" tensor type - allow only with dynamic repack - extract cpu extra bufts and convert to C++ - hbm - "aarch64" - more generic use of extra buffer - generalise extra_supports_op - new API for "cpu-accel": - amx - aarch64 * clang-format * Clean Q4_0_N_M ref Enable restrict on C++ * add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack * added/corrected control on tensor size for Q4 repacking. * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add debug logs on repacks. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-18 12:52:16 +02:00
Diego Devesa	a815940e0e	ggml : add predefined list of CPU backend variants to build (llama/10626) * ggml : add predefined list of CPU backend variants to build * update CPU dockerfiles	2024-12-08 20:14:35 +02:00
Diego Devesa	904e307bce	ggml-cpu : fix HWCAP2_I8MM value (llama/10646)	2024-12-08 20:14:35 +02:00
PAB	b7c64a4352	ggml: add `GGML_SET` Metal kernel + i32 CPU kernel (ggml/1037) * implemented cpu kernel * add i32 test cases in test-backend-ops * typedef `ggml_metal_kargs_set` * implemented `kernel_set` * memcpy	2024-12-08 20:14:35 +02:00
PAB	7895d39508	ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034) * ggml_pad_reflect_1d defined in header * implemented on CPU * called the forward pass * impl Metal kernel * added Metal kernel * added OP_PAD_REFLECT_1D in test-backend-ops.cpp * add test-pad-reflect-1d test case * test case supports multiple backends	2024-12-08 20:14:35 +02:00
Diego Devesa	3daeacad24	ggml : move AMX to the CPU backend (llama/10570) ggml : automatic selection of best CPU backend (llama/10606)	2024-12-08 20:14:35 +02:00
Georgi Gerganov	3623bd58f2	ggml : fix I8MM Q4_1 scaling factor conversion (llama/10562) ggml-ci	2024-12-08 20:14:35 +02:00
Georgi Gerganov	4ca1e72fe0	ggml : fix row condition for i8mm kernels (llama/10561) ggml-ci	2024-12-08 20:14:35 +02:00
Shupei Fan	330273901f	ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541) * ggml-cpu: support IQ4_NL_4_4 by runtime repack * ggml-cpu: add __ARM_FEATURE_DOTPROD guard	2024-12-08 20:14:35 +02:00
Diego Devesa	77e3e4a090	ggml : add support for dynamic loading of backends (llama/10469) --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-08 20:14:35 +02:00
Diego Devesa	8b1c1c30a7	ggml : do not use ARM features not included in the build (llama/10457)	2024-12-08 20:14:35 +02:00
FirstTimeEZ	45cf1634dc	ggml : fix undefined reference to 'getcpu' (llama/10354) https://github.com/ggerganov/llama.cpp/issues/10352	2024-11-20 21:00:08 +02:00
Johannes Gäßler	c9541741e6	ggml: new optimization interface (ggml/988) * remove test2.c, test3.c * store adamw params in tensor * move grads from tensor to graph * avoid segfault upon API misuse * add ggml-opt.h to public headers * remove dependence of ggml-opt.cpp on ggml-cpu.h	2024-11-20 21:00:08 +02:00
Eve	3216efef2e	AVX BF16 and single scale quant optimizations (llama/10212) * use 128 bit loads (I've tried 256->128 to death and it's slower) * double accumulator * avx bf16 vec dot * +3% q4_0 inference * +7% tg +5% pp compared to master * slower f16c version, kept for reference * 256b version, also slow. I tried :) * revert f16 * faster with madd * split to functions * Q8_0 and IQ4_NL, 5-7% faster * fix potential overflow (performance reduced) * 16 bit add for q4_0 only * merge	2024-11-20 21:00:08 +02:00
Charles Xu	3298916e5e	backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921) --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2024-11-20 21:00:08 +02:00
Diego Devesa	746bf2596f	ggml : build backends as libraries (llama/10256) --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-20 21:00:08 +02:00

24 Commits