whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-04-24 13:06:09 +00:00

Author	SHA1	Message	Date
uvos	b9eab73fa2	HIP/CUDA: set the paramerter value in maintain_cuda_graph instead of replaceing it. (llama/12209) This avoids conflict with internal cuda/hip runtimes memory managment behavior.	2025-03-08 15:13:01 +02:00
Henry Linjamäki	76385c8311	opencl : fix buffer alignment (llama/12197) Fix the following error: ``` ggml-alloc.c:99: not enough space in the buffer ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024) ``` which occurs when `ggml_backend_opencl_context::alignment` is larger than `cl_ptr_base` (hard-coded to `0x1000`). Also, fix `ggml_backend_opencl_context::alignment` was set to `CL_DEVICE_MEM_BASE_ADDR_ALIGN` which was treated as bytes but the value is reported in bits.	2025-03-08 15:13:01 +02:00
Henry Linjamäki	442cd1d2e7	opencl : fix `ulong` kernel args were set from `int` variables (llama/12174) ... which left garbage bits in the upper half of the kernel args. This caused segmentation faults when running PoCL.	2025-03-08 15:13:01 +02:00
simon886212	bc8cb97e02	opencl : fix profile-related errors (llama/12095) Co-authored-by: ubuntu <ubuntu@localhost.localdomain>	2025-03-08 15:13:01 +02:00
Rémy O	8dcadf736b	ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154) * ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions * cmake: Add GGML_BMI2 build option * ggml: enable BMI2 on relevant CPU variants * ggml-cpu: include BMI2 in backend score * ggml-cpu: register BMI2 in ggml_backend_cpu_get_features * ggml-cpu: add __BMI2__ define when using MSVC	2025-03-08 15:13:01 +02:00
Akarshan Biswas	93986b61e0	SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12201)	2025-03-08 15:13:01 +02:00
Plamen Minev	bd1a9e34c9	ggml : fix GGMLMetalClass ODR (llama/12200) -- it might happen if ggml is loaded from 2 separate libraries since each one of them will expose the class. This is more of a guard since we want to use only Metal as embedded library and don't care about the other case.	2025-03-08 15:13:01 +02:00
vmobilis	cc03608e78	ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118) * ggml_compute_forward_concat() for arbitrary tensor type * Check that tensors' type match * ggml-cpu.c: check type of source tensors * ggml-cpu.c: move tensor type check to ggml_compute_forward_concat() * ggml.c: check concatenated tensor type * Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c ..., as it was moved to ggml.c.	2025-03-08 15:13:01 +02:00
Georgi Gerganov	54a54faee4	vulkan : sync (llama/0) ggml-ci	2025-03-08 15:13:01 +02:00
mgroeber9110	96a92ecc4c	ggml : portability fixes for VS 2017 (llama/12150) * Add include files for std::min/max and std::toupper/tolower * win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined * Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode * win32: only use __restrict in MSVC if C11/C17 support is not enabled --------- Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>	2025-03-08 15:13:01 +02:00
David Huang	edd1d8686a	HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032) Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check Adds rocWMMA support to fattn-wmma-f16	2025-03-08 15:13:01 +02:00
ag2s20150909	dc6f4e7c05	ggml : fix kleidiai build (llama/12159) The libggml API has changed, but this has not been updated.	2025-03-08 15:13:01 +02:00
Akarshan Biswas	74c85d154e	SYCL: Move CPY kernels to a separate file and add few missing kernels (llama/12133) * SYCL: refactor and move cpy kernels to a separate file * Add few missing cpy kernels * refactor and add debug logs	2025-03-08 15:13:01 +02:00
Diego Devesa	eb2d8b6ffd	ggml-backend : keep paths in native string type when possible (llama/12144)	2025-03-08 15:13:01 +02:00
Erik Scholz	b442dcd598	CUDA: compress mode option and default to size (llama/12029) cuda 12.8 added the option to specify stronger compression for binaries, so we now default to "size".	2025-03-08 15:13:01 +02:00
William Tambellini	c98681e6d5	ggml : upgrade init_tensor API to return a ggml_status (llama/11854) * Upgrade init_tensor API to return a ggml_status To prepare for an 'abort-free' ggml (ggml not to abort on OOMs but return a OOM status), as agreeed with Diego in the ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status. * misc fixes --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-03-08 15:13:01 +02:00
Rémy O	3bab804981	vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (llama/11595) * vulkan: implement specialized MMV kernels for IQ2 quantizations * vulkan: add MMV kernels for IQ3 quants * vulkan: Increase MMV batch size and unroll IQ LUT setup * vulkan: fix init_iq_shmem for WG sizes larger than tables * vulkan: common batch size for all I-quants	2025-03-08 15:13:01 +02:00
Johannes Gäßler	c927830a70	CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098)	2025-03-08 15:13:01 +02:00
Prashant Vithule	992b51b3d5	ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (llama/12064) * Added SVE Support for Q2_K Quantized Models * Use 4-space indentation in the switch cases * removed comments lines * Remove the loop Retain the curly bracess for better understanding of code * Remove the comment like added for q3_k_q8_k kernel --------- Co-authored-by: vithulep <p.m.vithule1517@gmail.com>	2025-03-08 15:13:01 +02:00
hipudding	2c882cbe4c	CANN: Fix build error with GCC 13 (llama/11990) Remove unused header file that causes compilation failure on ARM platform with GCC 13.	2025-03-08 15:13:01 +02:00
Eve	1fbb119b1e	vulkan: matmul dequantization improvements (llama/12015) * faster dequant for old quants * dont use unpack for iq4_nl * vec2 unpack for q8	2025-03-08 15:13:01 +02:00
Daniele	40dea850fd	vulkan: improve im2col (llama/11826) * vulkan: improve im2col performance	2025-03-08 15:13:01 +02:00
Vladimir Vuksanovic	8255a830a8	cmake: Fix ggml backend dependencies and installation (llama/11818) * Fix dependencies between ggml and backends ggml backends link only to ggml-base and ggml links to all backends. * Fix installation of ggml backends Set up GNUInstallDirs before setting the installation directory of ggml backends	2025-03-08 15:13:01 +02:00
Jeff Bolz	a0f76b2da7	vulkan: fix assertion when qy_needs_dequant (llama/12068) Looks like a copy/paste bug from qx_needs_dequant.	2025-03-08 15:13:01 +02:00
Molly Sophia	394768c48b	ggml-cpu: Fix build with sve (llama/12059) * ggml-cpu: Fix build with sve Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * ggml-cpu: Remove unused variable in sve q3_k vec dot Signed-off-by: Molly Sophia <mollysophia379@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-03-08 15:13:01 +02:00
cmdr2	846e01b2c0	cuda: unary ops as float + de-duplicate (ggml/1130)	2025-03-08 15:13:01 +02:00
cmdr2	6ac8e6b2ce	cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129) * cuda: restrict SILU_BACK to fp32, since fp16 exceeds the desired test threshold * vulkan: specify fp32-only support for certain ops (that are now tested for fp16 as well) * f32 sigmoid in vulkan supports op * Revert "f32 sigmoid in vulkan supports op" This reverts commit c6f04b3c19bf4504c2776149c6d8cd84e0b48acb.	2025-03-08 15:13:01 +02:00
cmdr2	60d2ddebdf	cuda/cpu: Increase support for fp16 unary operations (ggml/1125) * Support fp16 unary operations in the CUDA backend * cpu: increase fp16 support for unary operators in the CPU backend * cuda: increase fp16 support for unary operators in the CUDA backend * Add test cases for fp16 unary operators * metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing * metal: fix PR comments for unary op support after fp16 unary tests	2025-03-08 15:13:01 +02:00
petterreinholdtsen	2e180184a8	Told cmake to install ggml-cpp.h as a public header file. (ggml/1126) It is used by Whisper talk-llama example. Co-authored-by: Petter Reinholdtsen <pere@debian.org>	2025-03-08 15:13:01 +02:00
Ivy233	ef40950c4a	common : more general m_audio_len update logic (#2855 ) Co-authored-by: Ivy233 <wangjinrun@uniontech.com>	2025-03-07 10:10:03 +02:00
Ryan Johnson	c774eec709	go : improve model download (#2756 ) * Updated models download URL * Updated list of models available All of the high efficiency quantized models are rejected when trying to download. They exist on the server. Let's allow them. * added path prefix for whisper-cli in message to user. The message is misleading if this script is called from another script in a different folder. So the message has to be fixed. * undid download URL change I made earlier. Fixed filepath.Join(urlPath, model) bug. * Undid download URL change I made earlier. Seems that the old URL works but only when provided a model to download. Still doesn't explain why there's a different download URL that also works. Please elucidate in docs. * Fixed URLForModel Function's bug filepath.Join is designed for filesystem paths, and it uses backslashes (\) on Windows. URLs, however, require forward slashes (/), so the use of filepath.Join is inappropriate for constructing URLs. The fmt.Sprintf function ensures that forward slashes are used. * Fixed URL trailing / double slash bug Ensure no double slash by trimming trailing '/' from srcUrl if present * Fixed bad download URL, missing ggml prefix Not sure if that was a bug I introduced but it was trying to download without the prefix. * Added question before downloading all models. Added download size estimate HEAD Requests: Efficiently fetches file sizes without downloading the content. Interactive Workflow: Allows the user to make informed decisions about downloading all models. Safe Defaults: Aborts if the user does not explicitly confirm. * Fixed Unbuffered channel warning. warning in context.go : misuse of unbuffered os.Signal channel as argument to signal. The warning indicates that the unbuffered channel used in signal.Notify in context.go may be misused. In Go, unbuffered channels can cause potential deadlocks if signals are sent faster than they are received. * Fixed download size calculation, download URL prefix bug, added link to models URL for user. The URL formatter was prepending the model name to the formatted model name in the URL * Added logs and exes to gitignore * Delete bindings/go/examples/go-model-download/go-model-download.exe * Delete whisper_build.log	2025-03-07 10:03:51 +02:00
Dmitry Atamanov	5b481a27a6	common : fix audio loading by miniaudio (#2862 ) Some checks failed CI / ubuntu-22-clang (linux/amd64, Debug) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/amd64, Release) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/arm64, Debug) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/arm64, Release) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Has been cancelled Details CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Has been cancelled Details CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Has been cancelled Details CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled Details CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled Details CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled Details CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled Details CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled Details CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Has been cancelled Details CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Has been cancelled Details CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled Details CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled Details CI / emscripten (Release) (push) Has been cancelled Details CI / ios-xcode-build (Release) (push) Has been cancelled Details CI / android (push) Has been cancelled Details CI / quantize (push) Has been cancelled Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled Details	2025-03-04 19:05:21 +02:00
Lin Xiaodong	fc7b1ee521	fix: missing include common-whisper (#2858 ) Some checks failed CI / ubuntu-22-clang (linux/arm64, Debug) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/arm64, Release) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Has been cancelled Details CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Has been cancelled Details CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Has been cancelled Details CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled Details CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled Details CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled Details CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled Details CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled Details CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Has been cancelled Details CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Has been cancelled Details CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled Details CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled Details CI / emscripten (Release) (push) Has been cancelled Details CI / ios-xcode-build (Release) (push) Has been cancelled Details CI / android (push) Has been cancelled Details CI / quantize (push) Has been cancelled Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled Details Examples Tests / addon_node-ubuntu-22 (16.x) (push) Has been cancelled Details Examples Tests / addon_node-ubuntu-22 (18.x) (push) Has been cancelled Details	2025-03-02 20:55:11 +02:00
KITAITI Makoto	c42f67e2d2	ruby : follow audio library change (#2851 ) Some checks failed CI / ubuntu-22-clang (linux/amd64, Debug) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/amd64, Release) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/arm64, Debug) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/arm64, Release) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Has been cancelled Details CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Has been cancelled Details CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Has been cancelled Details CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Has been cancelled Details CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled Details CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled Details CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled Details CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled Details CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled Details CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Has been cancelled Details CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Has been cancelled Details CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled Details CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled Details CI / emscripten (Release) (push) Has been cancelled Details CI / ios-xcode-build (Release) (push) Has been cancelled Details CI / android (push) Has been cancelled Details CI / quantize (push) Has been cancelled Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled Details * Enable CPU * Follow audio lib change	2025-02-28 08:09:02 +02:00
Diego Devesa	339a1cba5d	whisper : support GGML_BACKEND_DL (#2843 ) Some checks are pending CI / ubuntu-22-clang (linux/amd64, Debug) (push) Waiting to run Details CI / ubuntu-22-clang (linux/amd64, Release) (push) Waiting to run Details CI / ubuntu-22-clang (linux/arm64, Debug) (push) Waiting to run Details CI / ubuntu-22-clang (linux/arm64, Release) (push) Waiting to run Details CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Waiting to run Details CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Waiting to run Details CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Waiting to run Details CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Waiting to run Details CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Waiting to run Details CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Waiting to run Details CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Waiting to run Details CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Waiting to run Details CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Waiting to run Details CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Waiting to run Details CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Waiting to run Details CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Waiting to run Details CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Waiting to run Details CI / emscripten (Release) (push) Waiting to run Details CI / ios-xcode-build (Release) (push) Waiting to run Details CI / android (push) Waiting to run Details CI / quantize (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Waiting to run Details * whisper : support GGML_BACKEND_DL * fix DTW crash * whisper.objc : fix build - add ggml-cpp.h --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-27 13:35:07 +01:00
Georgi Gerganov	c64f3e8ada	common : separate whisper sources (#2846 ) Some checks failed CI / ubuntu-22-clang (linux/arm64, Debug) (push) Waiting to run Details CI / ubuntu-22-clang (linux/arm64, Release) (push) Waiting to run Details CI / ubuntu-22-clang (linux/ppc64le, Debug) (push) Waiting to run Details CI / ubuntu-22-clang (linux/ppc64le, Release) (push) Waiting to run Details CI / ubuntu-22-gcc-sanitized (linux/amd64, ADDRESS) (push) Waiting to run Details CI / ubuntu-22-gcc-sanitized (linux/amd64, THREAD) (push) Waiting to run Details CI / ubuntu-22-gcc-sanitized (linux/amd64, UNDEFINED) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Waiting to run Details CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Waiting to run Details CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Waiting to run Details CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Waiting to run Details CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Waiting to run Details CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Waiting to run Details CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Waiting to run Details CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Waiting to run Details CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Waiting to run Details CI / emscripten (Release) (push) Waiting to run Details CI / ios-xcode-build (Release) (push) Waiting to run Details CI / android (push) Waiting to run Details CI / quantize (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Waiting to run Details Examples Tests / addon_node-ubuntu-22 (16.x) (push) Has been cancelled Details Examples Tests / addon_node-ubuntu-22 (18.x) (push) Has been cancelled Details * common : separate whisper sources * examples : add chrono * examples : add more headers	2025-02-27 12:50:32 +02:00
Georgi Gerganov	9f83f67221	common : fix build min/max (#2845 ) * common : try to fix build * cont : try another fix	2025-02-27 10:39:13 +02:00
Dmitry Atamanov	7d3da68f79	examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759 )	2025-02-27 09:06:54 +02:00
petterreinholdtsen	b5d21359c1	stream : stop on ^C when no audio is received (#2822 ) Add check for ctrl-c in potentially endless loop while calling audio.get() to receive sound. Co-authored-by: Petter Reinholdtsen <pere@debian.org>	2025-02-27 08:59:51 +02:00
Georgi Gerganov	17addf7104	sync : ggml	2025-02-27 08:55:36 +02:00
cmdr2	cdaee8b4bd	Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121) * Support float16-to-float16 add/sub/mul/div operations in the CUDA backend * Add fp16 support for add/sub/mul/div on the CPU backend * Add test cases for fp16 add/sub/mul/div	2025-02-27 08:55:36 +02:00
Gian-Carlo Pascutto	4b60ff4f92	metal : copy kernels for quant to F32/F16 conversions (llama/12017) metal: use dequantize_q templates --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-27 08:55:36 +02:00
lhez	b43b9d928c	opencl: fix for small models (llama/11950) * opencl: fix small shape gemv, remove unused extensions * opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size * opencl: fix for token length < 4 * opencl: use wave size of 64 for all Adreno GPUs --------- Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com> Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>	2025-02-27 08:55:36 +02:00
Neo Zhang Jianyu	e3cb412a59	Optimize mul_mat for Q4_0 on Intel GPU (llama/12035) * opt performance by reorder for Intel GPU * detect hw type and save opt feature, and print opt feature * correct name * support optimize graph once when compute graph, record the opt status in tensor->extra, make CI passed * add env variable GGML_SYCL_DISABLE_OPT for debug * use syclex::architecture replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT * add performance data * mv getrows functions to separeted files * fix global variables --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2025-02-27 08:55:36 +02:00
Akarshan Biswas	ac301a7d9b	SYCL: Fix GGML_SYCL_DEBUG macro (llama/11995)	2025-02-27 08:55:36 +02:00
Aaron Teo	82e04e7670	ggml-cpu: Support s390x SIMD Instruction Set (llama/12019) * ggml: add s390x ARCH_FLAGS for compilation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add SIMD for s390x using vector intrinsics SIMD is activated for: * ggml_vec_dot_f32 * ggml_vec_dot_f16 * ggml_vec_mad_f32 * ggml_vec_mad_f16 * ggml_vec_mad_f32_unroll * ggml_vec_scale_f32 * ggml_vec_scale_f16 SIMD is NOT activated for: * ggml_vec_dot_f16_unroll (pending bugfix) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix missing escape character in GGML_F32x4_REDUCE Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix s390x GGML_F32x4_REDUCE Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: full SIMD activation for F32,F16 s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add option to disable s390x VXE/VXE2 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: change vecintrin.h include to ggml-cpu-impl * add __VXE__ and __VXE2__ macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * cmake: add s390x target detection for VX/VXE/VXE2 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: move s390x vector intrinsics to ggml-cpu-impl.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x Q8_0 SIMD Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: correct documentation for Q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x reduce code complexity Q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x bugfix typo Q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activated for Q4_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x inline vec_reve Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for Q4_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add VXE backend feature Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: remove test.py Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for quantize_row_q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for quantize_row_q8_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for iq4_xs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: bugfix iq4_xs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for iq4_nl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add float, double, and long vector data type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: clean up iq4_xs SIMD Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix improper use of restrict keyword Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: update warning message for ggml_vec_tbl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: switch to restrict for iq4_nl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: slight dot product speed improvement for q4_1_q8_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for q6_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add missing `_t` to ggml_int8x16x4_t Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix missing `_t` for ggml_vec_xl_s8x4 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix more missing `_t` Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add unroll and prefetch to Q8_0 increase of 3.86% for prompt processing and 32.22% for token generation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: patch Q8_0 to use proper vector sizes Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: optimise Q8_0 dot prod compute kernel further Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add unroll and prefetch to Q4_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: refactor Q6_K variable naming for readability Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q6_K typos Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for Q5_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix wrong charx16_t naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml: Q5_K y0 wrong signness Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q5_K invalid uchar type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q5_K invalid uchar type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for Q4_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q4_K invalid vector intrinsics Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: simplify ggml_padd_s16 compute kernel Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: correct ggml-cpu vxe wording Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: change ggml_aligned_malloc alignment to 256 256 is the cache line size for s390x platforms Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: resolve pr merge via cherry-pick 225bbbf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701) * ggml: resolve pr merge via cherry-pick 4571953 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: cmake remove fork when determining s390x machine type thank you @ericcurtin Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Jinyang He <hejinyang@loongson.cn> Co-authored-by: junchao-zhao <68935141+junchao-loongson@users.noreply.github.com>	2025-02-27 08:55:36 +02:00
Johannes Gäßler	38ac47cd4d	CUDA: app option to compile without FlashAttention (llama/12025)	2025-02-27 08:55:36 +02:00
Johannes Gäßler	2d70cd36d7	CUDA: optimize FA for GQA + large batches (llama/12014)	2025-02-27 08:55:36 +02:00
Gian-Carlo Pascutto	98dab49b9a	cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)	2025-02-27 08:55:36 +02:00
PureJourney	b1385e9aa9	CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984) * CUDA: correct the lowest Maxwell supported by CUDA 12 --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-02-27 08:55:36 +02:00

1 2 3 4 5 ...

2230 Commits