Commit Graph

1922 Commits

Author SHA1 Message Date
mahorozte
4af9626702 CUDA: remove unnecessary warp reduce in FA (ggml/1032)
* kqmax_new_j is the same in every thread within a warp after the operation at line 199, so this reduction can be omitted

* the same problem exists in vec32

---------

Co-authored-by: ZhaoXiaoYu <zhao.xiaoyu@zte.com.cn>
2024-12-08 20:14:35 +02:00
PAB
c52d1035de feat: add GGML_UNARY_OP_ARGMAX Metal kernel (ggml/1019)
* implemented argmax kernel

* tpig -> tgpig

* change to strides

* contiguous assertions

* kernel working and tested

* argmax simd parallel implementation

* added 2 new tests for argmax in test-backend-ops

* cosmetic changes

* added 3 test cases for perf eval

* add test_argmax in make_test_cases_perf

* Update test-backend-ops.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-12-08 20:14:35 +02:00
PAB
5773a14980 metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026)
* wip

* wip implementation f32

* kernel conv transpose 1d f32 working

* initial commit
2024-12-08 20:14:35 +02:00
Frankie Robertson
6939147c47 Do not include arm_neon.h when compiling CUDA code (ggml/1028) 2024-12-08 20:14:35 +02:00
Johannes Gäßler
98f9916c9f ggml-opt: fix data corruption (ggml/1022) 2024-12-08 20:14:35 +02:00
KITAITI Makoto
021eef1000
ruby : Add low-level methods to transcribe (#2585)
* Add tests for Whisper::Context#full

* Add Whisper::Context#full

* Add tests for Whisper::Error

* Add document of Whisper::Context#full [skip ci]

* Add additional signature for Whisper::Context#full

* Add description to Whisper::Context#full

* Add test for Whisper::Context#full_parallel

* Add Whisper::Context#full_parallel

* Hide Whisper's instance methods from Ruby code

* Add class to test MemoryView

* Build test class before running test

* Add test for MemoryView

* Make Whisper::Context#full and #full_parallel accept MemoryView

* Use Ruby 3.1 on CI

* Add comment on samples data type

* Update README

* Update README

* Remove unused code
2024-11-28 10:33:07 +02:00
Michael Rienstra
a9d06ce151
models : add q8_0 models to download-ggml-model.sh (#2589) 2024-11-28 10:31:54 +02:00
KITAITI Makoto
8c6a9b8bb6
ruby : Follow source tree change (#2580)
* Follow whisper.cpp source tree change

* Update whispercpp.gemspec

* Follow whisper.cpp log level change

* Fix paths in GitHub workflow for Ruby bindings

* Use GitHub workflow setting for dependency definition

* Use ternary operator
2024-11-21 17:04:29 +02:00
Georgi Gerganov
37c88027e1 whisper : use backend registry (#0) 2024-11-20 21:00:08 +02:00
slaren
9db070a3c5 ggml/sched : do not skip views in pre-assignments 2024-11-20 21:00:08 +02:00
Georgi Gerganov
7fd8d9c220 whisper : adapt to new ggml (wip) 2024-11-20 21:00:08 +02:00
Georgi Gerganov
06e059b8f8 talk-llama : sync llama.cpp 2024-11-20 21:00:08 +02:00
Georgi Gerganov
c9f49d5f9d sync : ggml 2024-11-20 21:00:08 +02:00
Georgi Gerganov
f4c1d7df39 ggml : sync resolve (skip) (#0) 2024-11-20 21:00:08 +02:00
bandoti
339b8e559c Add required ggml-base and backend libs to cmake pkg (llama/10407) 2024-11-20 21:00:08 +02:00
Diego Devesa
5f6d6919b4 cuda : fix CUDA_FLAGS not being applied (llama/10403) 2024-11-20 21:00:08 +02:00
Romain Biessy
8ee767732f sycl : Add option to set the SYCL architecture for all targets (llama/10266)
* Add option to set the SYCL architecture for all targets
* Convert GGML_SYCL_HIP_TARGET to the more generic GGML_SYCL_ARCH option
* Document that setting GGML_SYCL_ARCH can improve the performance
2024-11-20 21:00:08 +02:00
Jeff Bolz
45f1f9144f vulkan: Optimize soft_max (llama/10301)
* vulkan: Optimize soft_max

Large soft_max could already saturate memory, but small/medium sizes were
pretty slow. The bulk of the gains for them comes from using a smaller
workgroup size, and making the workgroup size match the subgroup size also
makes the barriers much cheaper.

Cache some values in locals to avoid refetching/recomputing. And stamp
out a few "template instantiations" so smaller cases will fully unroll.

Add a missing early return for OOB rows. This happens when there are more
than 512 rows and the dispatch is 512 x H.

* vulkan: Further soft_max optimizations

Restore the workgroup size of 512 case, use it for >1024.

Use unrollable loops for more iteration counts.
2024-11-20 21:00:08 +02:00
Alberto Cabrera Pérez
53589c8f12 sycl: Revert MUL_MAT_OP support changes (llama/10385) 2024-11-20 21:00:08 +02:00
Diego Devesa
7ac2f17fac cuda : only use native when supported by cmake (llama/10389) 2024-11-20 21:00:08 +02:00
Jeff Bolz
48862c7b27 vulkan: remove use of null initializer (llama/10372)
Seems like this isn't working for vulkan-over-metal when the array is sized
by a spec constant. Maybe a spirv-cross limitation?
2024-11-20 21:00:08 +02:00
Plamen Minev
44f7d9f4e3 metal : fix offset integer overflows in im2col (ggml/1015)
-- While running StableDiffusion.cpp locally with Metal, some offsets overflow and result in incorrect calculations
2024-11-20 21:00:08 +02:00
0cc4m
fd12302587 Vulkan: Fix device info output format specifiers (llama/10366)
* Vulkan: Fix device info output format specifiers

* Vulkan: Use zu printf specifier for size_t instead of ld
2024-11-20 21:00:08 +02:00
PAB
f80bef4630 metal : add GGML_UNARY_OP_ELU kernel (ggml/1018) 2024-11-20 21:00:08 +02:00
Johannes Gäßler
161b443514 CUDA: fix MMV kernel being used for FP16 src1 (llama/10357) 2024-11-20 21:00:08 +02:00
Johannes Gäßler
ef7fbe1c66 CMake: fix typo in comment [no ci] (llama/10360) 2024-11-20 21:00:08 +02:00
Diego Devesa
0879d3599e llama : only use default buffer types for the KV cache (llama/10358) 2024-11-20 21:00:08 +02:00
Georgi Gerganov
2a444dc5bd metal : refactor kernel args into structs (llama/10238)
* metal : add kernel arg structs (wip)

* metal : fattn args

ggml-ci

* metal : cont + avoid potential int overflow [no ci]

* metal : mul mat struct (wip)

* cont : mul mat vec

* cont : pass by reference

* cont : args is first argument

* cont : use char ptr

* cont : shmem style

* cont : thread counters style

* cont : mul mm id

ggml-ci

* cont : int safety + register optimizations

ggml-ci

* metal : GGML_OP_CONCAT

ggml-ci

* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV

* metal : GGML_OP_REPEAT

* metal : GGML_OP_CPY

* metal : GGML_OP_RMS_NORM

* metal : GGML_OP_NORM

* metal : add TODOs for rest of ops

* ggml : add ggml-metal-impl.h

ggml-ci
2024-11-20 21:00:08 +02:00
FirstTimeEZ
45cf1634dc ggml : fix undefined reference to 'getcpu' (llama/10354)
https://github.com/ggerganov/llama.cpp/issues/10352
2024-11-20 21:00:08 +02:00
Johannes Gäßler
dcb2922d1d CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318) 2024-11-20 21:00:08 +02:00
Johannes Gäßler
3c5c751174 CMake: default to -arch=native for CUDA build (llama/10320) 2024-11-20 21:00:08 +02:00
Diego Devesa
24ad19d0e9 ggml : fix possible buffer use after free in sched reserve (llama/9930) 2024-11-20 21:00:08 +02:00
Georgi Gerganov
bd574b05af ggml : inttypes.h -> cinttypes (llama/0)
ggml-ci
2024-11-20 21:00:08 +02:00
Georgi Gerganov
7e0eafcb1e ggml : adapt AMX to tensor->grad removal (llama/0)
ggml-ci
2024-11-20 21:00:08 +02:00
Georgi Gerganov
75670ae673 ggml : fix compile warnings (llama/0)
ggml-ci
2024-11-20 21:00:08 +02:00
Georgi Gerganov
d4fcdf602b llamafile : fix include path (llama/0)
ggml-ci
2024-11-20 21:00:08 +02:00
Jeff Bolz
1bebb1a116 vulkan: Optimize some mat-vec mul quant shaders (llama/10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
2024-11-20 21:00:08 +02:00
Dan Johansson
ee437cde59 ggml : optimize Q4_0 into Q4_0_X_Y repack (llama/10324) 2024-11-20 21:00:08 +02:00
Srihari-mcw
c1506d38cf Make updates to fix issues with clang-cl builds while using AVX512 flags (llama/10314) 2024-11-20 21:00:08 +02:00
Johannes Gäßler
c9541741e6 ggml: new optimization interface (ggml/988)
* ggml: new optimization interface

remove test2.c, test3.c

store adamw params in tensor

move grads from tensor to graph

* avoid segfault upon API misuse

* add ggml-opt.h to public headers

* remove dependence of ggml-opt.cpp on ggml-cpu.h
2024-11-20 21:00:08 +02:00
Georgi Gerganov
6a55015dc4 ggml : remove duplicated sources from the last sync (ggml/1017)
* ggml : remove duplicated sources from the last sync

ggml-ci

* cont : remove FindSIMD.cmake [no ci]
2024-11-20 21:00:08 +02:00
slaren
7e86030d4d ggml : fix some build issues 2024-11-20 21:00:08 +02:00
Georgi Gerganov
401fbea326 sync : leftovers (ggml/0)
ggml-ci
2024-11-20 21:00:08 +02:00
Georgi Gerganov
44d1cbdfe9 cmake : restore CMakeLists.txt (llama/10256)
ggml-ci
2024-11-20 21:00:08 +02:00
Eve
3216efef2e AVX BF16 and single scale quant optimizations (llama/10212)
* use 128-bit loads (I've tried 256->128 to death and it's slower)

* double accumulator

* avx bf16 vec dot

* +3% q4_0 inference

* +7% tg +5% pp compared to master

* slower f16c version, kept for reference

* 256b version, also slow. I tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16 bit add for q4_0 only

* merge
2024-11-20 21:00:08 +02:00
Romain Biessy
2c0484ebf7 sycl: Use syclcompat::dp4a (llama/10267)
* sycl: Use syclcompat::dp4a

* Using the syclcompat version allows the compiler to optimize the
  operation with native functions

* Update news section

* Update CI Windows oneAPI version to 2025.0

* Reword doc

* Call syclcompat::dp4a inside dpct::dp4a

This reverts commit 90cb61d692d61360b46954a1c7f780bd2e569b73.
2024-11-20 21:00:08 +02:00
Charles Xu
3298916e5e backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921)
* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-11-20 21:00:08 +02:00
Diego Devesa
746bf2596f ggml : build backends as libraries (llama/10256)
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-20 21:00:08 +02:00
Georgi Gerganov
5f7e094ccb scripts : update sync 2024-11-20 21:00:08 +02:00
Georgi Gerganov
6266a9f9e5
release : v1.7.2 2024-11-19 18:54:22 +02:00