whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-04-17 15:59:21 +00:00

Author	SHA1	Message	Date
Daniel Bevenius	e153b8eaa2	android.java : re-add ggml source updates (#2975) This commit updates the ggml source to include the new unary and binary operations. I merged https://github.com/ggerganov/whisper.cpp/pull/2958 which seems to have overwritten the changes to the ggml source which were added in https://github.com/ggerganov/whisper.cpp/pull/2972. Sorry about this. b2365	2025-03-31 16:14:33 +02:00
Daniel Bevenius	83af237f0b	ci : re-enable freeBSD-latest job (#2973) This commit re-enables the freeBSD-latest job which has been commented out. Refs: https://github.com/ggerganov/whisper.cpp/issues/2781	2025-03-31 15:24:08 +02:00
Daniel Bevenius	7a2e39750a	ci : re-enable android_java job (#2958) This commit re-enables the android_java job in the CI workflow. The job was disabled because of a failing build. The motivation for this is that Commit 226d344f565ea6140e7c6a583bc300a64454af58 ("whisper.android.java : update build with ggml source changes") addressed build issues and it should now be possible to re-enable this job.	2025-03-31 15:14:24 +02:00
Georgi Gerganov	0a40ae9728	android : add new ggml source files ggml-ci	2025-03-31 14:56:53 +03:00
Georgi Gerganov	32cfdcbf42	ruby : add new ggml sources ggml-ci	2025-03-31 14:56:53 +03:00
Georgi Gerganov	cfa42aca09	sync : ggml ggml-ci	2025-03-31 14:56:53 +03:00
Akarshan Biswas	2e2f0f954b	SYCL: Remove misleading ggml_sycl_op_flatten function (llama/12387) * SYCL: Remove misleading ggml_sycl_op_flatten function * remove trailing whitespace * Fix L2 norm from rebase * remove try catch block from element_wise.cpp * remove comment from common.hp * ggml-sycl.cpp: Add try catch sycl::exception block in compute_forward * norm.cpp: remove try catch exception block	2025-03-31 14:56:53 +03:00
Georgi Gerganov	93631b2be6	metal : use constexpr in FA kernels + fix typedef (llama/12659) * metal : use constexpr in FA kernels ggml-ci * cont ggml-ci * cont : fix typedef ggml-ci	2025-03-31 14:56:53 +03:00
R0CKSTAR	f9015b585b	musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (llama/12611) * musa: fix all warnings Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: enable -DLLAMA_FATAL_WARNINGS=ON in run.sh Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: update ci doc (install ccache) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * fix Windows build issue Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-03-31 14:56:53 +03:00
Jay	1880ffd7ff	cmake : fix ccache conflict (llama/12522) If users already set CMAKE_C_COMPILER_LAUNCHER globally, setting it in cmake again will lead to conflict and compile fail. Signed-off-by: Jay <BusyJay@users.noreply.github.com>	2025-03-31 14:56:53 +03:00
Xuan-Son Nguyen	9173932c78	cpu : rm unused variable (ggml/1166)	2025-03-31 14:56:53 +03:00
cmdr2	94c3f3877f	cpu: de-duplicate some of the operators and refactor (ggml/1144) * cpu: de-duplicate some of the operators and refactor * Fix PR comments * Fix PR comments	2025-03-31 14:56:53 +03:00
Sandro Hanea	00086469fb	cmake: improve Vulkan cooperative matrix support checks (#2966) Co-authored-by: Sandro Hanea <me@sandro.rocks>	2025-03-31 13:44:36 +03:00
Daniel Bevenius	2d8e40e2a0	examples : update README links to point to pages deployment (#2971) This commit updates the README links to point to the pages deployment instead of whisper.ggerganov.com.	2025-03-31 12:32:27 +02:00
Daniel Bevenius	e17af6524f	ci : add github pages workflow for wasm examples (#2969) * ci : add github pages workflow for wasm examples This commit adds a github workflow to build and deploy the wasm examples to github pages. The whisper.wasm example is deployed as the main page. This workflow is triggered by a push to master and will deploy the examples to: https://ggerganov.github.io/whisper.cpp/. This requires that the repository has enabled github actions in `Settings` -> `Pages` -> `Build and deployment` -> `Source` be set to `GitHub Actions`. One thing to note is that this commit removes the `talk` example as I'm not sure how this example is built yet. Refs: https://github.com/ggerganov/whisper.cpp/issues/2784	2025-03-31 11:34:40 +02:00
Sacha Arbonel	88d13a17a7	feat: add health check endpoint to server (#2968)	2025-03-31 11:03:41 +03:00
Daniel Bevenius	f92bd59951	whisper : remove unnecessary GGML_UNUSED macro (#2960)	2025-03-30 05:56:10 +02:00
Georgi Gerganov	6e7629b146	sync : ggml ggml-ci	2025-03-28 21:47:42 +02:00
Georgi Gerganov	27533e7f63	metal : improve FA + improve MoE (llama/12612) * ggml : FA with different K, V head sizes (CPU) ggml-ci * metal : add FA with HS=192 * metal : extend FA to support different K and V head sizes ggml-ci * metal : add FA vector kernels for heads K 192 and V 128 ggml-ci * ggml : restrict op on other backends to equal head sizes ggml-ci * metal : optimize FA-vec kernel ggml-ci * metal : FA remove mq registers * metal : improve MoE mul_mat_id condition ggml-ci * metal : fix comments + remove unnecessary addition ggml-ci * metal : avoid too much shared memory usage with mul_mat_id ggml-ci	2025-03-28 21:47:42 +02:00
Icenowy Zheng	1b81415963	vulkan: fix coopmat shader generation when cross-compiling (llama/12272) * vulkan: fix coopmat shader generation when cross-compiling Previously the status of coopmat{,2} support isn't passed to the vulkan-shaders-gen project building on the host, which leads to build failure because of the cross-compiling code expecting coopmat{,2} shaders that didn't get generated. Fix this by passing the coopmat{,2} support status to vulkan-shaders subproject. Signed-off-by: Icenowy Zheng <uwu@icenowy.me> * Only call coop-mat shaders once * Fix whitespace --------- Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Co-authored-by: bandoti <141645996+bandoti@users.noreply.github.com>	2025-03-28 21:47:42 +02:00
amritahs-ibm	0001ec075f	llamafile : ppc64le GEMV forwarding for FP32. (llama/12594) This patch enables usage of MMA when one of the dimensions of the matrix (i.e. either M or N) is 1. This is useful in case of token generation where N < 2. The concept of 'GEMV Forwarding' is used: when one of the matrices has a single row/column, the elements are broadcast instead of using the packing routine to prepack the matrix elements. This change results in a 5% - 15% improvement in total speed (i.e. all tokens/total time) across various batch sizes. This is in comparison with the corresponding dot product implementation. The patch is tested with FP32 models of Meta-Llama-3-8B, Mistral-7B, Llama-2-7B-chat-hf on an IBM POWER10 machine. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>	2025-03-28 21:47:42 +02:00
Radoslav Gerganov	5bad2e5099	rpc : send hash when tensor data is above some fixed threshold (llama/12496) * rpc : send hash when tensor data is above some fixed threshold ref #10095 * rpc : put cache under $HOME/.cache/llama.cpp * try to fix win32 build * another try to fix win32 build * remove llama as dependency	2025-03-28 21:47:42 +02:00
lhez	6fc0ae2f5a	opencl: add multi and vision rope, `gelu_quick` and `im2col` (llama/12600) * opencl: add `im2col` * opencl: add `gelu_quick` * opencl: add mrope * opencl: add vision rope	2025-03-28 21:47:42 +02:00
Amanda Der Bedrosian	de6b38c6d9	bindings.go : add DetectedLanguage to go bindings (#2947) Adding in DetectedLanguage(), a function to retrieve the detected language that's populated by processing audio. Also adding in a unit test to test the success. Co-authored-by: Amanda Der Bedrosian <aderbedrosian@sdl.com>	2025-03-28 12:26:22 +01:00
Daniel Bevenius	46d6e0abc1	ruby : fix test failures in test_whisper (#2955) * bindings.ruby : fix test failures in test_whisper This commit updates the parallel tests to use 2 processors instead of the number of processors on the system. It also comments out the setting of the log callback to an empty lambda as this causes a segfault when enabled. The motivation for the change to the number of processors is that if one has a large number of processors, for example I have 16 on the machine I used to test this, this would cause the following warning to be printed: ```console whisper_full_with_state: input is too short - 680 ms < 1000 ms. consider padding the input audio with silence ``` This is logged from: ```c++ int whisper_full_with_state( struct whisper_context * ctx, struct whisper_state * state, struct whisper_full_params params, const float * samples, int n_samples) { ... if (seek_end < seek_start + 100) { WHISPER_LOG_WARN("%s: input is too short - %d ms < 1000 ms. consider padding the input audio with silence\n", __func__, (seek_end - seek_start)*10); return 0; } ``` This will return early and there will be no segment callbacks to be invoked which in turn will cause the tests to fail. bindings.ruby : fix warnings in tests This commit fixes the following warnings in the Ruby tests: ```console /whisper/bindings/ruby/tests/test_segment.rb:52: warning: ambiguity between regexp and two divisions: wrap regexp in parentheses or add a space after `/' operator ``` And also adds a '_' prefix to some unused variables to avoid warnings. * bindings.ruby : enable Whisper.log_set in tests The commit reverts the commenting out of the Whisper.log_set call in the test_whisper.rb tests. I'm no longer getting segfaults when running the tests with this which was the case earlier. One theory could be that I rebased this to include the latest ggml sync to master to make sure things still worked. With the latest changes in ggml, I can't reproduce the segfaults.	2025-03-28 17:29:56 +09:00
Lin Xiaodong	1279f0d0bc	examples : support progress_callback API for addon.node (#2941) * feat: progress supported * fix: missing params * style: Format the code to improve readability Unified code indentation ensures consistent coding style, enhancing code readability and maintainability. * feat: support prompt api --------- Co-authored-by: linxiaodong <calm.lin@wukongsch.com>	2025-03-28 06:34:26 +01:00
Georgi Gerganov	f28bf5d186	xcf : fix visionOS build ref: https://github.com/ggml-org/llama.cpp/pull/12415 ggml-ci	2025-03-27 11:06:03 +02:00
Georgi Gerganov	1fbdfb1d36	files : remove old wkv6 (#0) ggml-ci	2025-03-27 11:06:03 +02:00
Georgi Gerganov	ee5581633b	sync : ggml ggml-ci	2025-03-27 11:06:03 +02:00
Georgi Gerganov	8ca67df291	ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0)	2025-03-27 11:06:03 +02:00
amritahs-ibm	fc6d343e76	llamafile : ppc64le MMA implementation for Q4_0. (llama/12489) This change upstreams llamafile's cpu matrix multiplication kernels for ppc64le ISA using MMA builtins. This patch handles matrix multiplication between quantised datatypes, block_q4_0 and block_q8_0. This change results in a 5% - 50% improvement in total speed (i.e. all tokens/total time) across various batch sizes. The patch is tested with Meta-Llama-3-8B, Mistral-7B, Llama-2-7B-chat-hf models on an IBM POWER10 machine. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>	2025-03-27 11:06:03 +02:00
Akarshan Biswas	3199356d3a	SYCL: implement memset ggml backend buffer interface (llama/12580) * SYCL: implement memset ggml backend buffer interface * use GGML_ABORT macro * Do not wait for all queues to finish for memset operation	2025-03-27 11:06:03 +02:00
Slobodan Josic	e0c43b0bbf	HIP: Add support for RDNA4 targets (llama/12372)	2025-03-27 11:06:03 +02:00
Georgi Gerganov	f4f619ea8e	metal : refactor mat-vec code (llama/12569) * metal : refactor mat-vec code ggml-ci * metal : rename all_sum -> sum_all ggml-ci * metal : fix comments [no ci] * metal : fix nr constant [no ci] * metal : mv q6_K support nr0 > 1 ggml-ci * metal : reduce register pressure ggml-ci * metal : fix typo [no ci] * metal : reduce register pressure ggml-ci	2025-03-27 11:06:03 +02:00
Georgi Gerganov	3c4d363872	ggml : fix MUL_MAT_ID repack with Q8_K (llama/12544) * ggml : fix MUL_MAT_ID repack with Q8_K ggml-ci * ggml : improve repack templates ggml-ci	2025-03-27 11:06:03 +02:00
Dan Johansson	15aa189329	ggml-cpu : update KleidiAI to v1.5.0 (llama/12568) ggml-cpu : bug fix related to KleidiAI LHS packing Signed-off-by: Dan Johansson <dan.johansson@arm.com>	2025-03-27 11:06:03 +02:00
Akarshan Biswas	c53d5c9e85	SYCL: disable Q4_0 reorder optimization (llama/12560) ggml-ci	2025-03-27 11:06:03 +02:00
lhez	ba6f584f30	opencl: simplify kernel embedding logic in cmakefile (llama/12503) Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>	2025-03-27 11:06:03 +02:00
R0CKSTAR	a219941812	CUDA: Fix clang warnings (llama/12540) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-03-27 11:06:03 +02:00
Jeff Bolz	a2cc8c2666	vulkan: fix mul_mat_vec failure in backend tests (llama/12529) The OOB calculation could be wrong if the last iteration was during one of the unrolled loops. Adjust the unrolling counts to avoid this. Add a couple new backend tests that hit this failure on NVIDIA GPUs.	2025-03-27 11:06:03 +02:00
Georgi Gerganov	388ed98220	ggml : fix quantized cpy op (llama/12310) * ggml : fix quantized cpy op ggml-ci * tests : add cpy tests for all types ggml-ci * tests : add BF16 copy tests ggml-ci * tests : fix loop for same-type copy ggml-ci * tests : add option to permute the dst tensor ggml-ci	2025-03-27 11:06:03 +02:00
R0CKSTAR	d487a28ae1	musa: refine compute capability (llama/12493) * musa: refine compute capability Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-03-27 11:06:03 +02:00
Jeff Bolz	cbb88c4050	vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505) * tests: add mul_mat perf/functional tests for p021/nc vulkan shaders * vulkan: Optimize mul_mat_vec p021 and nc shaders. These shaders are used in attention calculations, and when the KV cache grows large they start to dominate the run time. For the nc shader (which is called with large 'k' dimension), use unrolling and vector loads. For the p021 shader (which is called with large 'm' and small 'k' dimensions), take advantage of grouped query attention to reuse loads from the A matrix for the whole group, and reduce the number of workgroups (too much overhead from tiny dispatches). Using subgroupAdd in the p021 shader also helps, use that conditionally.	2025-03-27 11:06:03 +02:00
stduhpf	13455c0b5f	Vulkan: RTE rounding for cpy to quant (llama/12480) * Vulkan: RTE rounding for cpy to quant Co-Authored-By: Jeff Bolz <jbolz@nvidia.com> * remove trailing whitespace * avoid duplicating pipeline_cpy_f32_quant * fix copypasting issue * remove duplicated code --------- Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-03-27 11:06:03 +02:00
Eve	2f77a9e9bd	vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (llama/12472)	2025-03-27 11:06:03 +02:00
蕭澧邦	fa2b5249ff	Fix build on Windows when ccache enabled (ggml/9954) (llama/9976) * [SYCL] Fix build on Windows when ccache enabled (llama/9954) * take effect only on windows and force it to icl --------- Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>	2025-03-27 11:06:03 +02:00
Svetlozar Georgiev	5b854ebba5	sycl: cleanup oneDNN related code (llama/12097)	2025-03-27 11:06:03 +02:00
Srihari-mcw	8058f19d0b	ggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture (llama/12332) * Add block interleaving support for Q4_K quantization * Remove whitespaces and fix CI/CD issues * Update pointer of bsums from int16_t to const int16_t * Add vector version of quantize_q8_K_4x8 function * Update code formatting based on review comments	2025-03-27 11:06:03 +02:00
Gaurav Garg	ae6a9bb9a5	CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183) - Find out active blocks per SM using cudaOccupancyMaxActiveBlocksPerMultiprocessor API. Use this value to determine the optimal parallel_blocks value. - Prefer vector flash attention kernels over MMA kernel for BS=1 Fixes Issue: #12182 --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-03-27 11:06:03 +02:00
Jeff Bolz	24faba9e9b	vulkan: optimize iq1 coopmat2 dequant functions (llama/12427)	2025-03-27 11:06:03 +02:00


2365 Commits