whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-23 14:32:23 +00:00

Author	SHA1	Message	Date
Diego Devesa	2ab8cce7e3	llama : add simple-chat example (llama/10124) * llama : add simple-chat example --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-11-15 15:21:04 +02:00
Diego Devesa	b40c255e98	llama : use smart pointers for ggml resources (llama/10117)	2024-11-15 15:21:04 +02:00
Shupei Fan	ec3e16445e	vulkan : improve ggml_vk_create_buffer error handling (llama/9898)	2024-11-15 15:21:04 +02:00
Georgi Gerganov	0665168ef3	ggml : remove ggml_scratch (llama/10121) ggml-ci	2024-11-15 15:21:04 +02:00
Zhenwei Jin	5f6b992eea	build: fix build error in Windows env with OneAPI setup (llama/10107)	2024-11-15 15:21:04 +02:00
Diego Devesa	3e231ab9cc	llama : fix buffer checks for mamba and rwk (llama/10111) * llama : fix buffer checks for mamba and rwk * llama : fix missing worst case flag during reserve * cuda : fix supports_op for norm * disable sched SET_CAUSE	2024-11-15 15:21:04 +02:00
Diego Devesa	371bfaca8c	ggml : check tensor name lengths in gguf files (llama/10100)	2024-11-15 15:21:04 +02:00
Sergio López	91e30a3a23	kompute: add mul_mat_q4_k shader (llama/10097) This is a more or less direct translation from the Metal implementation to GLSL. Signed-off-by: Sergio Lopez <slp@redhat.com>	2024-11-15 15:21:04 +02:00
Sergio López	1e122d66f9	kompute: add backend registry / device interfaces (llama/10045) Get in line with the other backends by supporting the newer backend/device registry interfaces. Signed-off-by: Sergio Lopez <slp@redhat.com>	2024-11-15 15:21:04 +02:00
Diego Devesa	63a4e09a0f	ggml : fix memory leaks when loading invalid gguf files (llama/10094) * ggml : fix gguf string leak when reading kv pairs fails * ggml : avoid crashing with GGML_ABORT when the KV has an invalid type * ggml : avoid crashing on failed memory allocations when loading a gguf file	2024-11-15 15:21:04 +02:00
xctan	75dd198870	ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (llama/10029) * ggml : RISC-V vector gemv for q4_0_8x8 * ggml : Added WIP rvv q4_0_8x8 gemm * ggml : Added initial implementation of rvv gemm * ggml : optimize gemm to avoid register spillover * ggml : Fix GCC rvv load alignment issue * ggml : Format gemm rvv code * ggml : Fix a typo in RVV q4_0_8_8 GEMM	2024-11-15 15:21:04 +02:00
Diego Devesa	1d48457aa6	llama : refactor model loader with backend registry (llama/10026)	2024-11-15 15:21:04 +02:00
Changyeon Kim	307712a903	ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (llama/9763) * ggml: Add POOL2D OP for GPU ACC to the Vulkan. - The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend. - A GGML_OP_POOL_2D shader has been added. (Pooling) - The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> * [fix] Correct the incorrect order of the parameters. fix casting to int. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> --------- Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>	2024-11-15 15:21:04 +02:00
R0CKSTAR	fbc9a05ddf	musa: workaround for Guilty Lockup in cleaning src0 (llama/10042) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-11-15 15:21:04 +02:00
Yuri Khrustalev	28496ac55e	cmake : make it possible linking ggml as external lib (ggml/1003)	2024-11-15 15:21:04 +02:00
Plamen Minev	b1c06c09b0	metal : fix minor string leaks (ggml/1004)	2024-11-15 15:21:04 +02:00
Georgi Gerganov	498ac0dc27	scripts : update sync	2024-11-15 15:21:04 +02:00
Raiya Araki	03af461de8	ci : fix building workflow for linux/arm64 container (#2555)	2024-11-15 11:07:17 +02:00
KITAITI Makoto	f19463ece2	ruby : extend API (#2551) * Handle objs in Ruby code * Add task to make Makefile * Share commont constance in test suites * Add model-related APIs * Add Whisper::Model class * Add tests for Whisper::Model * Add missing LDFLAG -lstdc++ * Add tests for Whisper.log_set * Add Whisper.set_log * Define log level * Add document on logging * Add license section to README * Add document on Whisper::Model * Fix examples in README * Add test for Model with GC * Make dependency on Makefile more accurate * Fix bug about Whisper::Model and GC	2024-11-13 21:52:56 +02:00
Jhen-Jie Hong	5f8a086e22	whisper.swiftui : add model download list & bench methods (#2546) * swift : fix resources & exclude build * whisper : impl whisper_timings struct & api * whisper.swiftui : model list & bench methods * whisper : return ptr for whisper_get_timings * revert unnecessary change * whisper : avoid designated initializer * whisper.swiftui: code style changes * whisper.swiftui : get device name / os from UIDevice * whisper.swiftui : fix UIDevice usage * whisper.swiftui : add memcpy and ggml_mul_mat (commented)	2024-11-13 21:51:34 +02:00
Wilson Silva	a28d82e373	ruby : fix the instructions (#2548) #prompt doesn't exist but #initial_prompt does	2024-11-13 21:47:42 +02:00
thewh1teagle	5ccca19f0c	ggml : vulkan logs (#2547)	2024-11-13 21:47:15 +02:00
Stefan Sydow	300c07b94d	examples : fix ffmpeg v5 build (#2543) remove call to 'av_register_all()' which does not exist in ffmpeg v5 anymore.	2024-11-13 21:41:52 +02:00
Vin Misra	31aea563a8	whisper : fix extra memory usage (#2534) * passing samples_padded by ref to the threads. * passing samples_padded by ref to the threads. --------- Co-authored-by: Vinith Misra <physicsdemon@gmail.com>	2024-11-06 23:02:11 +02:00
Georgi Gerganov	0377596b77	whisper : backend registry init before model load	2024-11-01 10:19:05 +02:00
Georgi Gerganov	c65d0fd3c8	talk-llama : sync llama.cpp	2024-11-01 10:19:05 +02:00
Georgi Gerganov	d9efb664ac	sync : ggml	2024-11-01 10:19:05 +02:00
Ma Mingfei	b5b4b0f5de	ggml : add AMX backend (llama/8998)	2024-11-01 10:19:05 +02:00
Georgi Gerganov	ab36d02560	metal : support permuted matrix multiplicaions (llama/10033) * metal : support permuted matrix multiplicaions ggml-ci * cont : use nb01 directly for row steps ggml-ci * cont : add comments [no ci] * metal : minor refactor * metal : minor	2024-11-01 10:19:05 +02:00
Johannes Gäßler	6e67749c00	CUDA: fix insufficient buffer clearing for MMQ (llama/10032)	2024-11-01 10:19:05 +02:00
Johannes Gäßler	ab0385f43b	CUDA: fix MMQ for non-contiguous src0, add tests (llama/10021) * CUDA: fix MMQ for non-contiguous src0, add tests * revise test code	2024-11-01 10:19:05 +02:00
bssrdf	10eb603a3c	increase cuda_cpy block size (ggml/996) Co-authored-by: bssrdf <bssrdf@gmail.com>	2024-11-01 10:19:05 +02:00
Jun Hee Yoo	a3231b2f2e	metal : add POOL2D and fix IM2COL (llama/9943) * add pool_2d Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * fix im2col and add unittest for N>=1024 Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * add tests for N % 1024 != 0 Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * remove trailing whitespaces Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply suggestions Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply more optimization - original IM2COL kernel + _ext with MIN() Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply review: change kernel name of pool_2d Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply review Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * fix more formatting and enhance readability Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> --------- Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>	2024-11-01 10:19:05 +02:00
leo-pony	13db492f83	Adapt to dynamically loadable backends mechanism (llama/9970) * [CANN] Adapt to dynamically loadable backends mechanism * Fix the Bug: inference running result is garbled in debug running model for LM models who's type is Q4_0 class * Handle the review comments of this pull request	2024-11-01 10:19:05 +02:00
Georgi Gerganov	741c138aa1	ggml : add asserts for type conversion in fattn kernels (llama/9971) ggml-ci	2024-11-01 10:19:05 +02:00
Radoslav Gerganov	25f9fee6fb	rpc : pack only RPC structs (llama/9959)	2024-11-01 10:19:05 +02:00
Neo Zhang Jianyu	7c1570bee6	fix mul_mat_vec_q and *_vec_q error (llama/9939) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-11-01 10:19:05 +02:00
Radoslav Gerganov	4078e4c388	rpc : backend refactoring (llama/9912) * rpc : refactor backend Use structs for RPC request/response messages * rpc : refactor server	2024-11-01 10:19:05 +02:00
Ouadie EL FAROUKI	a4a22daa8f	Add SYCL Backend registry, device and Event Interfaces (llama/9705) * implemented missing SYCL event APIs * sycl : Added device and backend reg interfaces * Restructured ggml-sycl.cpp	2024-11-01 10:19:05 +02:00
Ma Mingfei	e1936eb2a5	add amx kernel for gemm (llama/8998) add intel amx isa detection add vnni kernel for gemv cases add vnni and amx kernel support for block_q8_0 code cleanup fix packing B issue enable openmp fine tune amx kernel switch to aten parallel pattern add error message for nested parallelism code cleanup add f16 support in ggml-amx add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS update CMakeList update README fix some compilation warning fix compiler warning when amx is not enabled minor change ggml-ci move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp ggml-ci update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16 ggml-ci add amx as an ggml-backend update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h minor change update CMakeLists.txt minor change apply weight prepacking in set_tensor method in ggml-backend fix compile error ggml-ci minor change ggml-ci update CMakeLists.txt ggml-ci add march dependency minor change ggml-ci change ggml_backend_buffer_is_host to return false for amx backend ggml-ci fix supports_op use device reg for AMX backend ggml-ci minor change ggml-ci minor change fix rebase set .buffer_from_host_ptr to be false for AMX backend	2024-11-01 10:19:05 +02:00
Diego Devesa	28b044dad9	vulkan : add backend registry / device interfaces (llama/9721) * vulkan : add backend registry / device interfaces * llama : print devices used on model load	2024-11-01 10:19:05 +02:00
Gilad S	b8f11a0a17	fix: allocating CPU buffer with size `0` (llama/9917)	2024-11-01 10:19:05 +02:00
Gilad S	ff5a838099	fix: use `vm_allocate` to allocate CPU backend buffer on macOS (llama/9875) * fix: use `vm_allocate` to allocate CPU backend buffer on macOS * fix: switch to `posix_memalign` to keep existing `free()` usages work * feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS * style: formatting * fix: move const outside of `#ifndef` * style: formatting * fix: unused var * fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h` * fix: unused var * fix: page align to `GGUF_DEFAULT_ALIGNMENT` * fix: page align to `TENSOR_ALIGNMENT` * fix: convert `TENSOR_ALIGNMENT` to a macro * fix: increase page size to `32` on iOS * fix: iOS page size * fix: `hbw_posix_memalign` alignment	2024-11-01 10:19:05 +02:00
Johannes Gäßler	84713613be	CUDA: fix 1D im2col, add tests (ggml/993)	2024-11-01 10:19:05 +02:00
leo-pony	ded89c9d08	Fix cann compilation error (llama/9891) Fix cann compilation error after merging llama.cpp supports dynamically loadable backends.	2024-11-01 10:19:05 +02:00
agray3	042e95d92f	Vectorize load instructions in dmmv f16 CUDA kernel (llama/9816) * Vectorize load instructions in dmmv f16 CUDA kernel Replaces scalar with vector load instructions, which substantially improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup. * addressed comment * Update ggml/src/ggml-cuda/dmmv.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-11-01 10:19:05 +02:00
Diego Devesa	81110c0174	ggml : move more prints to the ggml log system (llama/9839) * ggml : move more prints to the ggml log system * show BLAS OpenMP warnings in all builds using debug print	2024-11-01 10:19:05 +02:00
Diego Devesa	c313723860	rpc : add backend registry / device interfaces (llama/9812) * rpc : add backend registry / device interfaces * llama : add llama_supports_rpc API * ggml_backend_rpc_start_rpc_server -> ggml_backend_rpc_start_server	2024-11-01 10:19:05 +02:00
R0CKSTAR	e69b2371e2	musa: add docker image support (llama/9685) * mtgpu: add docker image support Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * mtgpu: enable docker workflow Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-11-01 10:19:05 +02:00
Diego Devesa	1531259b2c	ggml : fix BLAS with unsupported types (llama/9775) * ggml : do not use BLAS with types without to_float * ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies * ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits it's not really internal if everybody uses it	2024-11-01 10:19:05 +02:00

... 3 4 5 6 7 ...

1978 Commits