whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2024-12-21 21:47:47 +00:00

Author	SHA1	Message	Date
Vin Misra	31aea563a8	whisper : fix extra memory usage (#2534) * passing samples_padded by ref to the threads. * passing samples_padded by ref to the threads. --------- Co-authored-by: Vinith Misra <physicsdemon@gmail.com>	2024-11-06 23:02:11 +02:00
Georgi Gerganov	0377596b77	whisper : backend registry init before model load	2024-11-01 10:19:05 +02:00
Georgi Gerganov	c65d0fd3c8	talk-llama : sync llama.cpp	2024-11-01 10:19:05 +02:00
Georgi Gerganov	d9efb664ac	sync : ggml	2024-11-01 10:19:05 +02:00
Ma Mingfei	b5b4b0f5de	ggml : add AMX backend (llama/8998)	2024-11-01 10:19:05 +02:00
Georgi Gerganov	ab36d02560	metal : support permuted matrix multiplicaions (llama/10033) * metal : support permuted matrix multiplicaions ggml-ci * cont : use nb01 directly for row steps ggml-ci * cont : add comments [no ci] * metal : minor refactor * metal : minor	2024-11-01 10:19:05 +02:00
Johannes Gäßler	6e67749c00	CUDA: fix insufficient buffer clearing for MMQ (llama/10032)	2024-11-01 10:19:05 +02:00
Johannes Gäßler	ab0385f43b	CUDA: fix MMQ for non-contiguous src0, add tests (llama/10021) * CUDA: fix MMQ for non-contiguous src0, add tests * revise test code	2024-11-01 10:19:05 +02:00
bssrdf	10eb603a3c	increase cuda_cpy block size (ggml/996) Co-authored-by: bssrdf <bssrdf@gmail.com>	2024-11-01 10:19:05 +02:00
Jun Hee Yoo	a3231b2f2e	metal : add POOL2D and fix IM2COL (llama/9943) * add pool_2d Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * fix im2col and add unittest for N>=1024 Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * add tests for N % 1024 != 0 Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * remove trailing whitespaces Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply suggestions Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply more optimization - original IM2COL kernel + _ext with MIN() Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply review: change kernel name of pool_2d Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * apply review Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> * fix more formatting and enhance readability Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com> --------- Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>	2024-11-01 10:19:05 +02:00
leo-pony	13db492f83	Adapt to dynamically loadable backends mechanism (llama/9970) * [CANN] Adapt to dynamically loadable backends mechanism * Fix the Bug: inference running result is garbled in debug running model for LM models who's type is Q4_0 class * Handle the review comments of this pull request	2024-11-01 10:19:05 +02:00
Georgi Gerganov	741c138aa1	ggml : add asserts for type conversion in fattn kernels (llama/9971) ggml-ci	2024-11-01 10:19:05 +02:00
Radoslav Gerganov	25f9fee6fb	rpc : pack only RPC structs (llama/9959)	2024-11-01 10:19:05 +02:00
Neo Zhang Jianyu	7c1570bee6	fix mul_mat_vec_q and *_vec_q error (llama/9939) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-11-01 10:19:05 +02:00
Radoslav Gerganov	4078e4c388	rpc : backend refactoring (llama/9912) * rpc : refactor backend Use structs for RPC request/response messages * rpc : refactor server	2024-11-01 10:19:05 +02:00
Ouadie EL FAROUKI	a4a22daa8f	Add SYCL Backend registry, device and Event Interfaces (llama/9705) * implemented missing SYCL event APIs * sycl : Added device and backend reg interfaces * Restructured ggml-sycl.cpp	2024-11-01 10:19:05 +02:00
Ma Mingfei	e1936eb2a5	add amx kernel for gemm (llama/8998) add intel amx isa detection add vnni kernel for gemv cases add vnni and amx kernel support for block_q8_0 code cleanup fix packing B issue enable openmp fine tune amx kernel switch to aten parallel pattern add error message for nested parallelism code cleanup add f16 support in ggml-amx add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS update CMakeList update README fix some compilation warning fix compiler warning when amx is not enabled minor change ggml-ci move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp ggml-ci update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16 ggml-ci add amx as an ggml-backend update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h minor change update CMakeLists.txt minor change apply weight prepacking in set_tensor method in ggml-backend fix compile error ggml-ci minor change ggml-ci update CMakeLists.txt ggml-ci add march dependency minor change ggml-ci change ggml_backend_buffer_is_host to return false for amx backend ggml-ci fix supports_op use device reg for AMX backend ggml-ci minor change ggml-ci minor change fix rebase set .buffer_from_host_ptr to be false for AMX backend	2024-11-01 10:19:05 +02:00
Diego Devesa	28b044dad9	vulkan : add backend registry / device interfaces (llama/9721) * vulkan : add backend registry / device interfaces * llama : print devices used on model load	2024-11-01 10:19:05 +02:00
Gilad S	b8f11a0a17	fix: allocating CPU buffer with size `0` (llama/9917)	2024-11-01 10:19:05 +02:00
Gilad S	ff5a838099	fix: use `vm_allocate` to allocate CPU backend buffer on macOS (llama/9875) * fix: use `vm_allocate` to allocate CPU backend buffer on macOS * fix: switch to `posix_memalign` to keep existing `free()` usages work * feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS * style: formatting * fix: move const outside of `#ifndef` * style: formatting * fix: unused var * fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h` * fix: unused var * fix: page align to `GGUF_DEFAULT_ALIGNMENT` * fix: page align to `TENSOR_ALIGNMENT` * fix: convert `TENSOR_ALIGNMENT` to a macro * fix: increase page size to `32` on iOS * fix: iOS page size * fix: `hbw_posix_memalign` alignment	2024-11-01 10:19:05 +02:00
Johannes Gäßler	84713613be	CUDA: fix 1D im2col, add tests (ggml/993)	2024-11-01 10:19:05 +02:00
leo-pony	ded89c9d08	Fix cann compilation error (llama/9891) Fix cann compilation error after merging llama.cpp supports dynamically loadable backends.	2024-11-01 10:19:05 +02:00
agray3	042e95d92f	Vectorize load instructions in dmmv f16 CUDA kernel (llama/9816) * Vectorize load instructions in dmmv f16 CUDA kernel Replaces scalar with vector load instructions, which substantially improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup. * addressed comment * Update ggml/src/ggml-cuda/dmmv.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-11-01 10:19:05 +02:00
Diego Devesa	81110c0174	ggml : move more prints to the ggml log system (llama/9839) * ggml : move more prints to the ggml log system * show BLAS OpenMP warnings in all builds using debug print	2024-11-01 10:19:05 +02:00
Diego Devesa	c313723860	rpc : add backend registry / device interfaces (llama/9812) * rpc : add backend registry / device interfaces * llama : add llama_supports_rpc API * ggml_backend_rpc_start_rpc_server -> ggml_backend_rpc_start_server	2024-11-01 10:19:05 +02:00
R0CKSTAR	e69b2371e2	musa: add docker image support (llama/9685) * mtgpu: add docker image support Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * mtgpu: enable docker workflow Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-11-01 10:19:05 +02:00
Diego Devesa	1531259b2c	ggml : fix BLAS with unsupported types (llama/9775) * ggml : do not use BLAS with types without to_float * ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies * ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits it's not really internal if everybody uses it	2024-11-01 10:19:05 +02:00
Diego Devesa	44bc2767fd	ggml : add backend registry / device interfaces to BLAS backend (llama/9752) * ggml : add backend registry / device interfaces to BLAS backend * fix mmap usage when using host buffers	2024-11-01 10:19:05 +02:00
Andrew Minh Nguyen	bd7ace7adc	Update building for Android (llama/9672) * docs : clarify building Android on Termux * docs : update building Android on Termux * docs : add cross-compiling for Android * cmake : link dl explicitly for Android	2024-11-01 10:19:05 +02:00
Georgi Gerganov	315364d7de	ggml : add metal backend registry / device (llama/9713) * ggml : add metal backend registry / device ggml-ci * metal : fix names [no ci] * metal : global registry and device instances ggml-ci * cont : alternative initialization of global objects ggml-ci * llama : adapt to backend changes ggml-ci * fixes * metal : fix indent * metal : fix build when MTLGPUFamilyApple3 is not available ggml-ci * fix merge * metal : avoid unnecessary singleton accesses ggml-ci * metal : minor fix [no ci] * metal : g_state -> g_ggml_ctx_dev_main [no ci] * metal : avoid reference of device context in the backend context ggml-ci * metal : minor [no ci] * metal : fix maxTransferRate check * metal : remove transfer rate stuff --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-11-01 10:19:05 +02:00
Paul Tsochantaris	80753d4da8	metal : single allocation of encode_async block (llama/9747) * Single allocation of encode_async block with non-ARC capture in ggml-metal.m * Moving Block_release to the deallocation code * Release encode block when re-setting encoding buffer count if needed * Update ggml/src/ggml-metal.m --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-01 10:19:05 +02:00
Daniel Bevenius	8f9bdca4c4	ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) This commit removes the buffer_id field from the leaf_alloc struct. The motivation for is that this field is only written to and never read/used as far as I can tell. Each tensor_alloc has a buffer_id field and this is what caused me to look into this more closely, to understand what the buffer_id in leaf_alloc was used for.	2024-11-01 10:19:05 +02:00
Georgi Gerganov	4e10afb5a9	scripts : sync amx	2024-10-31 22:13:24 +02:00
Georgi Gerganov	aa037a60f3	ggml : alloc ggml_contexts on the heap (#2525) * whisper : reduce ggml_context usage * ggml : allocate contexts on the heap (v2) * ggml : aligned malloc -> malloc	2024-10-31 22:00:09 +02:00
Georgi Gerganov	19dca2bb14	ci : fix openblas build (#2511) * ci : fix openblas build * cont : would this work? * ci : I'm sorry, windows * cont : disabled wrong build * ci : fix openblas build with pkgconfiglite (#2517) - choco install pkgconfiglite (vcpkg-pkgconf doesn't contain pkg-config executable?) - vcpkg install openblas (otherwise it is not detected now) --------- Co-authored-by: Tamotsu Takahashi <ttakah+github@gmail.com>	2024-10-30 12:58:26 +02:00
Georgi Gerganov	55e422109b	scripts : add turbo-q8_0 to the benchmark	2024-10-29 19:37:24 +02:00
Georgi Gerganov	3f020fac9d	whisper : minor compile warning	2024-10-29 19:30:26 +02:00
jettoblack	1626b73b03	whisper : move new-segment callback after DTW step (#2515)	2024-10-29 08:47:21 +02:00
KITAITI Makoto	850f7b19d3	ruby : fix installation test (#2519)	2024-10-29 08:45:37 +02:00
KITAITI Makoto	d4bc413505	ruby : add more APIs (#2518) * Add test for built package existence * Add more tests for Whisper::Params * Add more Whisper::Params attributes * Add tests for callbacks * Add progress and abort callback features * [skip ci] Add prompt usage in README * Change prompt text in example	2024-10-28 19:23:23 +02:00
KITAITI Makoto	fc49ee4479	ruby : support new-segment callback (#2506) * Add Params#new_segment_callback= method * Add tests for Params#new_segment_callback= * Group tests for #transcribe * Don't use static for thread-safety * Set new_segment_callback only when necessary * Remove redundant check * [skip ci] Add Ruby version README * Revert "Group tests for #transcribe" This reverts commit `71b65b00cc`. * Revert "Add tests for Params#new_segment_callback=" This reverts commit `81e6df3bab`. * Add test for Context#full_n_segments * Add Context#full_n_segments * Add tests for lang API * Add lang API * Add tests for Context#full_lang_id API * Add Context#full_lang_id * Add abnormal test cases for lang * Raise appropriate errors from lang APIs * Add tests for Context#full_get_segment_t{0,1} API * Add Context#full_get_segment_t{0,1} * Add tests for Context#full_get_segment_speaker_turn_next API * Add Context#full_get_segment_speaker_turn_next * Add tests for Context#full_get_segment_text * Add Context#full_get_setgment_text * Add tests for Params#new_segment_callback= * Run new segment callback * Split tests to multiple files * Use container struct for new segment callback * Add tests for Params#new_segment_callback_user_data= * Add Whisper::Params#new_user_callback_user_data= * Add GC-related test for new segment callback * Protect new segment callback related structs from GC * Add meaningful test for build * Rename: new_segment_callback_user_data -> new_segment_callback_container * Add tests for Whisper::Segment * Add Whisper::Segment and Whisper::Context#each_segment * Extract c_ruby_whisper_callback_container_allocate() * Add test for Whisper::Params#on_new_segment * Add Whisper::Params#on_new_egment * Assign symbol IDs to variables * Make extsources.yaml simpler * Update README * Add document comments * Add test for calling Whisper::Params#on_new_segment multiple times * Add file dependencies to GitHub actions config and .gitignore * Add more files to ext/.gitignore	2024-10-28 15:43:27 +02:00
KITAITI Makoto	c0ea41f6b2	ruby : add Metal support (#2516)	2024-10-28 13:08:09 +02:00
Josscii	0fbaac9c89	whisper : fix index overflow in token-level timestamp logic (#2505)	2024-10-23 15:14:03 +03:00
toboil-features	a5abfe6a90	readme : update links and make commands (#2489) * Update links to headers in README.md * Add link to Vulkan section in README.md * Add "-j" for parallelism for "make" in README.md * Update README.md	2024-10-17 13:25:18 +03:00
KITAITI Makoto	d3f7137cc9	ruby : fix bindings (#2484) * Improve Rakefile * Remove intermediate files * Remove unnecessary manipulations from extconf.rb * Add README and LINCENSE to source files * Manage ext source files using YAML file * Use extsources.yaml to include files into gem package file * Add git-managed source files to build dependency * Add test task * Download model for test if not exists * Add test for build * Ignore gem package directory * Enable GitHub action for Ruby binding * Fix model name * Build lib file for test * Use extension for each platform * Use extension for each platform on testing * Move built lib file rather than copy * Add intermediate files to clean targets	2024-10-16 18:44:04 +03:00
toboil-features	f7c99e49b3	readme : add Vulkan notice (#2488) * Add Vulkan notice in README.md * Fix formatting for Vulkan section in README.md * Fix formatting in README.md	2024-10-16 18:43:26 +03:00
Georgi Gerganov	1d5752fa42	make : fix GGML_VULKAN=1 build (#2485)	2024-10-16 18:42:47 +03:00
Rotem Dan	b6049060dd	whisper : add dtw preset for large-v3-turbo (#2481)	2024-10-15 21:00:21 +03:00
CrispStrobe	06a1da9daf	convert : handle max_target_positions (#2477) as needed eg for https://huggingface.co/primeline/whisper-large-v3-turbo-german/blob/main/config.json	2024-10-14 10:46:33 +03:00
Salman Faroz	746d173592	readme : update the Quick Start section (#2475) navigating into the directory	2024-10-14 10:44:57 +03:00

1855 Commits