whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-01 06:50:41 +00:00

Author	SHA1	Message	Date
Diego Devesa	1550be79f1	ggml : add ggml-cpu.h to the public headers (llama/10204)	2024-11-15 15:21:04 +02:00
snadampal	807f848c2f	fix q4_0_8_8 format for corrupted tokens issue (llama/10198) Co-authored-by: EC2 Default User <ec2-user@ip-172-31-62-167.us-west-2.compute.internal>	2024-11-15 15:21:04 +02:00
Zhiyuan Li	42398f13b0	Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration (llama/10133) * rwkv6: rename to wkv6 * rwkv6: support avx2 avx512 armv8 armv9 * rwkv6: update cuda file name * rwkv6: rename params * wkv on sycl * sycl: add some ops * sycl: Enhance OP support judgment * wkv6: drop armv9 and transfer to GGML style ggml-ci * sync : ggml * update the function to use appropriate types * fix define error * Update ggml/src/ggml-cpu.c * add appropriate asserts * move element-wise functions outside * put the declaration outside the loop * rewrite to be more in line with the common pattern for distributing threads * use recommended way GGML_TENSOR_LOCALS --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Meng, Hengyu <airdldl@163.com>	2024-11-15 15:21:04 +02:00
Georgi Gerganov	31c3482a4e	metal : add BF16 support (llama/8439) * ggml : add initial BF16 support ggml-ci * metal : add mul_mat_id BF16 support ggml-ci * metal : check for bfloat support on the Metal device ggml-ci * metal : better var names [no ci] * metal : do not build bfloat kernels when not supported ggml-ci * metal : try to fix BF16 support check ggml-ci * metal : this should correctly check bfloat support	2024-11-15 15:21:04 +02:00
Diego Devesa	50257af686	metal : fix from ptr buffer name (llama/10189)	2024-11-15 15:21:04 +02:00
Georgi Gerganov	d111a0987e	ggml : adjust is_first_call init value (llama/10193) ggml-ci	2024-11-15 15:21:04 +02:00
Georgi Gerganov	915bcd2c63	metal : add quantized FA support (llama/10149) * metal : add quantized FA (vec) support ggml-ci * metal : add quantized FA (non-vec) support * metal : fix support check ggml-ci * metal : clean-up * metal : clean-up (cont) * metal : fix shared memory calc + reduce smem + comments * metal : float-correctness * metal : minor [no ci]	2024-11-15 15:21:04 +02:00
Diego Devesa	f69c8b6f1b	ggml : fix arch check in bf16_to_fp32 (llama/10164)	2024-11-15 15:21:04 +02:00
Eve	8c9044bef0	Q6_K AVX improvements (llama/10118) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86	2024-11-15 15:21:04 +02:00
Diego Devesa	5f8e928194	ggml : fix gelu tables initialization (llama/10172)	2024-11-15 15:21:04 +02:00
Diego Devesa	25da30bd60	ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (llama/10167)	2024-11-15 15:21:04 +02:00
snadampal	542734100e	fix build break on arm64 linux (llama/10166) This fixes the build break from the recent changes to move the CPU backend to separate files https://github.com/ggerganov/llama.cpp/pull/10144	2024-11-15 15:21:04 +02:00
Diego Devesa	b06b4c0c08	cuda : clear error after changing peer access (llama/10153)	2024-11-15 15:21:04 +02:00
Georgi Gerganov	939d36fb4c	metal : simplify f16 and f32 dequant kernels (llama/0)	2024-11-15 15:21:04 +02:00
Georgi Gerganov	1471e41180	metal : move dequantize templates to beginning of MSL source (llama/0)	2024-11-15 15:21:04 +02:00
leo-pony	35949192e9	CANN: adjust backend registry refactor. (llama/10158) remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.	2024-11-15 15:21:04 +02:00
Diego Devesa	9c817edb48	ggml : move CPU backend to a separate file (llama/10144)	2024-11-15 15:21:04 +02:00
Georgi Gerganov	24a0feb5d9	metal : minor fixup in FA kernel (llama/10143) * metal : minor fixup in FA kernel ggml-ci * metal : use the unrolled loop variable * metal : remove unused var	2024-11-15 15:21:04 +02:00
Diego Devesa	2ab8cce7e3	llama : add simple-chat example (llama/10124) * llama : add simple-chat example --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-11-15 15:21:04 +02:00
Diego Devesa	b40c255e98	llama : use smart pointers for ggml resources (llama/10117)	2024-11-15 15:21:04 +02:00
Shupei Fan	ec3e16445e	vulkan : improve ggml_vk_create_buffer error handling (llama/9898)	2024-11-15 15:21:04 +02:00
Georgi Gerganov	0665168ef3	ggml : remove ggml_scratch (llama/10121) ggml-ci	2024-11-15 15:21:04 +02:00
Zhenwei Jin	5f6b992eea	build: fix build error in Windows env with OneAPI setup (llama/10107)	2024-11-15 15:21:04 +02:00
Diego Devesa	3e231ab9cc	llama : fix buffer checks for mamba and rwk (llama/10111) * llama : fix buffer checks for mamba and rwk * llama : fix missing worst case flag during reserve * cuda : fix supports_op for norm * disable sched SET_CAUSE	2024-11-15 15:21:04 +02:00
Diego Devesa	371bfaca8c	ggml : check tensor name lengths in gguf files (llama/10100)	2024-11-15 15:21:04 +02:00
Sergio López	91e30a3a23	kompute: add mul_mat_q4_k shader (llama/10097) This is a more or less direct translation from the Metal implementation to GLSL. Signed-off-by: Sergio Lopez <slp@redhat.com>	2024-11-15 15:21:04 +02:00
Sergio López	1e122d66f9	kompute: add backend registry / device interfaces (llama/10045) Get in line with the other backends by supporting the newer backend/device registry interfaces. Signed-off-by: Sergio Lopez <slp@redhat.com>	2024-11-15 15:21:04 +02:00
Diego Devesa	63a4e09a0f	ggml : fix memory leaks when loading invalid gguf files (llama/10094) * ggml : fix gguf string leak when reading kv pairs fails * ggml : avoid crashing with GGML_ABORT when the KV has an invalid type * ggml : avoid crashing on failed memory allocations when loading a gguf file	2024-11-15 15:21:04 +02:00
xctan	75dd198870	ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (llama/10029) * ggml : RISC-V vector gemv for q4_0_8x8 * ggml : Added WIP rvv q4_0_8x8 gemm * ggml : Added initial implementation of rvv gemm * ggml : optimize gemm to avoid register spillover * ggml : Fix GCC rvv load alignment issue * ggml : Format gemm rvv code * ggml : Fix a typo in RVV q4_0_8_8 GEMM	2024-11-15 15:21:04 +02:00
Diego Devesa	1d48457aa6	llama : refactor model loader with backend registry (llama/10026)	2024-11-15 15:21:04 +02:00
Changyeon Kim	307712a903	ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (llama/9763) * ggml: Add POOL2D OP for GPU ACC to the Vulkan. - The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend. - A GGML_OP_POOL_2D shader has been added. (Pooling) - The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> * [fix] Correct the incorrect order of the parameters. fix casting to int. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> --------- Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>	2024-11-15 15:21:04 +02:00
R0CKSTAR	fbc9a05ddf	musa: workaround for Guilty Lockup in cleaning src0 (llama/10042) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-11-15 15:21:04 +02:00
Yuri Khrustalev	28496ac55e	cmake : make it possible linking ggml as external lib (ggml/1003)	2024-11-15 15:21:04 +02:00
Plamen Minev	b1c06c09b0	metal : fix minor string leaks (ggml/1004)	2024-11-15 15:21:04 +02:00
thewh1teagle	5ccca19f0c	ggml : vulkan logs (#2547)	2024-11-13 21:47:15 +02:00
Ma Mingfei	b5b4b0f5de	ggml : add AMX backend (llama/8998)	2024-11-01 10:19:05 +02:00
Georgi Gerganov	ab36d02560	metal : support permuted matrix multiplications (llama/10033) * metal : support permuted matrix multiplications ggml-ci * cont : use nb01 directly for row steps ggml-ci * cont : add comments [no ci] * metal : minor refactor * metal : minor	2024-11-01 10:19:05 +02:00
Johannes Gäßler	6e67749c00	CUDA: fix insufficient buffer clearing for MMQ (llama/10032)	2024-11-01 10:19:05 +02:00
Johannes Gäßler	ab0385f43b	CUDA: fix MMQ for non-contiguous src0, add tests (llama/10021) * CUDA: fix MMQ for non-contiguous src0, add tests * revise test code	2024-11-01 10:19:05 +02:00
bssrdf	10eb603a3c	increase cuda_cpy block size (ggml/996) Co-authored-by: bssrdf <bssrdf@gmail.com>	2024-11-01 10:19:05 +02:00
Jun Hee Yoo	a3231b2f2e	metal : add POOL2D and fix IM2COL (llama/9943) * add pool_2d Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * fix im2col and add unittest for N>=1024 Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * add tests for N % 1024 != 0 Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * remove trailing whitespaces Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply suggestions Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply more optimization - original IM2COL kernel + _ext with MIN() Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply review: change kernel name of pool_2d Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply review Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * fix more formatting and enhance readability Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> --------- Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>	2024-11-01 10:19:05 +02:00
leo-pony	13db492f83	Adapt to dynamically loadable backends mechanism (llama/9970) * [CANN] Adapt to dynamically loadable backends mechanism * Fix the Bug: inference running result is garbled in debug running model for LM models whose type is Q4_0 class * Handle the review comments of this pull request	2024-11-01 10:19:05 +02:00
Georgi Gerganov	741c138aa1	ggml : add asserts for type conversion in fattn kernels (llama/9971) ggml-ci	2024-11-01 10:19:05 +02:00
Radoslav Gerganov	25f9fee6fb	rpc : pack only RPC structs (llama/9959)	2024-11-01 10:19:05 +02:00
Neo Zhang Jianyu	7c1570bee6	fix mul_mat_vec_q and *_vec_q error (llama/9939) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-11-01 10:19:05 +02:00
Radoslav Gerganov	4078e4c388	rpc : backend refactoring (llama/9912) * rpc : refactor backend Use structs for RPC request/response messages * rpc : refactor server	2024-11-01 10:19:05 +02:00
Ouadie EL FAROUKI	a4a22daa8f	Add SYCL Backend registry, device and Event Interfaces (llama/9705) * implemented missing SYCL event APIs * sycl : Added device and backend reg interfaces * Restructured ggml-sycl.cpp	2024-11-01 10:19:05 +02:00
Ma Mingfei	e1936eb2a5	add amx kernel for gemm (llama/8998) * add intel amx isa detection * add vnni kernel for gemv cases * add vnni and amx kernel support for block_q8_0 * code cleanup * fix packing B issue * enable openmp * fine tune amx kernel * switch to aten parallel pattern * add error message for nested parallelism * code cleanup * add f16 support in ggml-amx * add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS * update CMakeList * update README * fix some compilation warning * fix compiler warning when amx is not enabled * minor change ggml-ci * move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp ggml-ci * update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16 ggml-ci * add amx as an ggml-backend * update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h * minor change * update CMakeLists.txt * minor change * apply weight prepacking in set_tensor method in ggml-backend * fix compile error ggml-ci * minor change ggml-ci * update CMakeLists.txt ggml-ci * add march dependency * minor change ggml-ci * change ggml_backend_buffer_is_host to return false for amx backend ggml-ci * fix supports_op * use device reg for AMX backend ggml-ci * minor change ggml-ci * minor change * fix rebase * set .buffer_from_host_ptr to be false for AMX backend	2024-11-01 10:19:05 +02:00
Diego Devesa	28b044dad9	vulkan : add backend registry / device interfaces (llama/9721) * vulkan : add backend registry / device interfaces * llama : print devices used on model load	2024-11-01 10:19:05 +02:00
Gilad S	b8f11a0a17	fix: allocating CPU buffer with size `0` (llama/9917)	2024-11-01 10:19:05 +02:00

273 Commits