whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-05-31 06:20:58 +00:00

Author	SHA1	Message	Date
Daniel Bevenius	9fb9025a40	ggml : skip intermediate .air file when compiling .metallib (llama/12247) This commit updates the compilation of default.metallib to skip the intermediate .air (Apple Intermediate Representation) file. The motivation for this change is to simplify the custom command a little and avoid generating and then removing the .air file.	2025-03-27 11:06:03 +02:00
Christian Kastner	3c2abb01e8	cmake: Enable specifying exact PowerPC CPU architecture (ggml/1138) In the process, guard automatic CPU detection with GGML_NATIVE. https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html#index-mcpu-10	2025-03-27 11:06:03 +02:00
Christian Kastner	efd9407e22	cmake: Comment out GGML_BIN_DIR for now (ggml/1139) Nothing installs to it yet, so when attempting to use the cmake package, set_and_check() triggers an error if the directory doesn't already exist for other reasons.	2025-03-27 11:06:03 +02:00
Daniel Bevenius	b82ac32a6c	ggml : add logging for native build options/vars (#2935) This commit adds debug level logging for the native build options and variables to ggml/CMakeLists.txt. The motivation for this is that it can be useful to see the effective result of `GGML_NATIVE`, `GGML_NATIVE_DEFAULT`, and `INS_ENB` for a cmake build. I've found myself adding similar logging a few times now, so I thought it might be a good idea to add this. Example output, specifying `-DCMAKE_MESSAGE_LOG_LEVEL=DEBUG` when running cmake produces the following output: ```console -- GGML_NATIVE : OFF -- GGML_NATIVE_DEFAULT : OFF -- INS_ENB : OFF ```	2025-03-24 09:53:38 +01:00
Daniel Bevenius	6e8242f7fe	examples : command.wasm updates (#2904 ) This commit updates the command.wasm example by adding a server.py script to make it easy to start a local http server to try out the example, updates the build instructions, and also addresses some of the compiler warnings that were being generated. * emscripten : fix TOTAL_STACK for wasm This commit moves the TOTAL_STACK setting from the compile flags to the linker flags. This is because the TOTAL_STACK setting is a linker setting. The motivation for this change is that currently the following warnings are generated when building: ```console em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument] em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument] em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument] em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument] em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument] em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument] ``` * examples : suppress C++17 deprecation warning for std::codecvt_utf8 This commit suppresses the C++17 deprecation warning for std::codecvt_utf8 similar to what is done in examples/talk-llama/unicode.cpp. 
The motivation for this change is to suppress these warnings: ```console /Users/danbev/work/ai/whisper-work/examples/common.cpp:251:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations] 251 \| std::wstring_convert<std::codecvt_utf8<wchar_t>> converter; \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/codecvt:193:28: note: 'codecvt_utf8<wchar_t>' has been explicitly marked deprecated here 193 \| class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 codecvt_utf8 : public __codecvt_utf8<_Elem> { \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17' 723 \| # define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED' 688 \| # define _LIBCPP_DEPRECATED __attribute__((__deprecated__)) \| ^ /Users/danbev/work/ai/whisper-work/examples/common.cpp:251:10: warning: 'wstring_convert<std::codecvt_utf8<wchar_t>>' is deprecated [-Wdeprecated-declarations] 251 \| std::wstring_convert<std::codecvt_utf8<wchar_t>> converter; \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/locale:3145:28: note: 'wstring_convert<std::codecvt_utf8<wchar_t>>' has been explicitly marked deprecated here 3145 \| class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 wstring_convert { \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17' 723 \| # define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED' 688 \| # define _LIBCPP_DEPRECATED __attribute__((__deprecated__)) \| ^ 
/Users/danbev/work/ai/whisper-work/examples/common.cpp:257:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations] 257 \| std::wstring_convert<std::codecvt_utf8<wchar_t>> converter; \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/codecvt:193:28: note: 'codecvt_utf8<wchar_t>' has been explicitly marked deprecated here 193 \| class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 codecvt_utf8 : public __codecvt_utf8<_Elem> { \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17' 723 \| # define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED' 688 \| # define _LIBCPP_DEPRECATED __attribute__((__deprecated__)) \| ^ /Users/danbev/work/ai/whisper-work/examples/common.cpp:257:10: warning: 'wstring_convert<std::codecvt_utf8<wchar_t>>' is deprecated [-Wdeprecated-declarations] 257 \| std::wstring_convert<std::codecvt_utf8<wchar_t>> converter; \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/locale:3145:28: note: 'wstring_convert<std::codecvt_utf8<wchar_t>>' has been explicitly marked deprecated here 3145 \| class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 wstring_convert { \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17' 723 \| # define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED \| ^ /Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED' 688 \| # define _LIBCPP_DEPRECATED __attribute__((__deprecated__)) \| ^ 4 warnings generated. 
``` * ggml : suppress double-promotion warning in GGML_F16x4_REDUCE This commit adds a cast to `ggml_float` in the `GGML_F16x4_REDUCE` macro to suppress a double-promotion warning. Currently the following warning is generated when compiling the command.wasm example: ```console /whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:1592:5: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion] 1592 \| GGML_F16_VEC_REDUCE(sumf, sum); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /Users/danbev/work/ai/whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:932:37: note: expanded from macro 'GGML_F16_VEC_REDUCE' 932 \| #define GGML_F16_VEC_REDUCE GGML_F16x4_REDUCE \| ^ /Users/danbev/work/ai/whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:920:44: note: expanded from macro 'GGML_F16x4_REDUCE' 918 \| res = wasm_f32x4_extract_lane(x[0], 0) + \ \| ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 919 \| wasm_f32x4_extract_lane(x[0], 1) + \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 920 \| wasm_f32x4_extract_lane(x[0], 2) + \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~ 921 \| wasm_f32x4_extract_lane(x[0], 3); \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:1640:9: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion] 1640 \| GGML_F16_VEC_REDUCE(sumf[k], sum[k]); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /Users/danbev/work/ai/whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:932:37: note: expanded from macro 'GGML_F16_VEC_REDUCE' 932 \| #define GGML_F16_VEC_REDUCE GGML_F16x4_REDUCE \| ^ /Users/danbev/work/ai/whisper-work/ggml/src/ggml-cpu/ggml-cpu.c:920:44: note: expanded from macro 'GGML_F16x4_REDUCE' 918 \| res = wasm_f32x4_extract_lane(x[0], 0) + \ \| ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 919 \| wasm_f32x4_extract_lane(x[0], 1) + \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 920 \| wasm_f32x4_extract_lane(x[0], 2) + \ \| 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~ 921 \| wasm_f32x4_extract_lane(x[0], 3); \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2 warnings generated. ``` wasm_f32x4_extract_lane returns a 32-bit float and this is what the addition is performed on. But there is an implicit conversion from 32-bit float to 64-bit double when the result is assigned to `res`, which is of type `ggml_float`. My understanding here is that this is intentional and adding a cast to `ggml_float` should suppress the warning. * emscripten : add -Wno-deprecated for emscripten This commit adds -Wno-deprecated to the CMAKE_CXX_FLAGS for emscripten builds. The motivation for this is that currently there are a number of warnings generated like the following: ```console warning: JS library symbol '$print' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated] warning: JS library symbol '$printErr' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated] em++: warning: warnings in JS library compilation [-Wjs-compiler] em++: warning: linker setting ignored during compilation: 'ENVIRONMENT' [-Wunused-command-line-argument] warning: JS library symbol '$print' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated] warning: JS library symbol '$printErr' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated] em++: warning: warnings in JS library compilation [-Wjs-compiler] warning: JS library symbol '$print' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated] warning: JS library symbol '$printErr' is deprecated. 
Please open a bug if you have a continuing need for this symbol [-Wdeprecated] em++: warning: warnings in JS library compilation [-Wjs-compiler] em++: warning: linker setting ignored during compilation: 'ENVIRONMENT' [-Wunused-command-line-argument] em++: warning: linker setting ignored during compilation: 'ENVIRONMENT' [-Wunused-command-line-argument] ``` The downside of this is that we might miss other deprecation warnings in the future so I'm not sure if this is acceptable. But it makes the wasm examples cleaner without the warnings. * examples : fix tautological-compare warning in stb_vorbis.c [no ci] This commit applies a fix to address a tautological-compare warning in stb_vorbis.c. The motivation for this is that currently the following warning is generated when compiling the command-wasm example: ```console /Users/danbev/work/ai/whisper-work/examples/stb_vorbis.c:1404:75: warning: pointer comparison always evaluates to false [-Wtautological-compare] 1404 \| if (f->stream_start + loc >= f->stream_end \|\| f->stream_start + loc < f->stream_start) { \| ^ 1 warning generated. ``` This fix was taken from an open pull request on the stb repository that addresses this issue: https://github.com/nothings/stb/pull/1746 * squash! examples : update command.wasm instructions [no ci] This commit adds a Python script to serve the wasm examples built in the `build-em` directory. Initially I thought that it would be enough to start a simple python server but I did not notice that there was an error in the browser console when I did that: ```console command.js:1 Uncaught (in promise) DataCloneError: Failed to execute 'postMessage' on 'Worker': SharedArrayBuffer transfer requires self.crossOriginIsolated. 
at command.js:1:1206224 at new Promise (<anonymous>) at loadWasmModuleToWorker (command.js:1:1204981) at Array.map (<anonymous>) at Object.loadWasmModuleToAllWorkers (command.js:1:1206428) at command.js:1:1204318 at callRuntimeCallbacks (command.js:1:1202062) at preRun (command.js:1:6136) at run (command.js:1:1294094) at removeRunDependency (command.js:1:7046) ``` We need a few CORS headers to be set, and in order to hopefully make this easy for users a Python script is added to the examples directory. This should be able to serve all the wasm examples provided they have been built. command.wasm's README.md is updated to reflect this change. * examples : remove unused functions This commit removed the unused functions convert_to_utf8 and convert_to_wstring from examples/common.cpp. * Revert "examples : fix tautological-compare warning in stb_vorbis.c [no ci]" This reverts commit 8e3c47d96141c7675c985562ebdc705e839e338a. We should not make this change here and instead when the upstream PR is merged we can sync with it. Refs: https://github.com/ggerganov/whisper.cpp/issues/2784	2025-03-20 07:02:18 +01:00
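The double-promotion point in commit 6e8242f7fe can be illustrated in plain C. This is only a minimal sketch, not the actual `GGML_F16x4_REDUCE` macro: `acc_t` is a stand-in for `ggml_float` (which the commit message describes as a `double`), and `reduce4` is a hypothetical helper that replaces the WASM lane extraction with an ordinary array.

```c
/* Stand-in for ggml_float; the commit message describes it as a double. */
typedef double acc_t;

/* Hypothetical helper: adds four float lanes in float precision, then
 * widens the sum once. The explicit cast states that the float -> double
 * conversion is intended, which is the kind of change the commit
 * describes for silencing -Wdouble-promotion. */
static acc_t reduce4(const float lanes[4]) {
    return (acc_t)(lanes[0] + lanes[1] + lanes[2] + lanes[3]);
}
```

Calling `reduce4` on an array such as `{1.0f, 2.0f, 3.0f, 4.0f}` performs the additions in `float` and converts only the final sum, so the numeric behaviour is the same as with the implicit conversion; only the intent is made visible to the compiler.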
Georgi Gerganov	4ffb8e3e4d	cmake : fix ggml-config (ggml/0)	2025-03-08 15:13:01 +02:00
Rémy O	eebf6bc0bd	ggml-cpu: faster AVX2 variant for IQ1_M (llama/12216)	2025-03-08 15:13:01 +02:00
BB-fat	dc8f423b40	metal : simplify kernel arguments using a struct (ggml/3229) (llama/12194) * metal : refactor im2col parameters into a struct * metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets * metal : refactor sum_rows parameters into a struct * metal : refactor soft_max parameters into a struct * metal : refactor diag_mask_inf parameters into a struct * metal : refactor ssm_conv parameters into a struct * metal : refactor ssm_scan parameters into a struct * metal : refactor get_rows parameters into a struct * metal : refactor group_norm parameters into a struct * metal : refactor conv_transpose_1d parameters into a struct * metal : refactor upscale parameters into a struct * metal : refactor pad parameters into a struct * metal : refactor pad_reflect_1d parameters into a struct * metal : refactor arange parameters into a struct * metal : refactor timestep_embedding parameters into a struct * metal : refactor argsort parameters into a struct * metal : refactor leaky_relu parameters into a struct * metal : refactor pool_2d parameters into a struct * metal : fix trailing whitespace --------- Co-authored-by: alexju <alexju@tencent.com>	2025-03-08 15:13:01 +02:00
Daniel Bevenius	548e7052f1	metal : fix default.metallib build (llama/12224) This commit updates the custom command to build the default.metallib file to use the correct path to ../ggml-common.h by using the variable METALLIB_COMMON. The motivation for this change is that currently when building and specifying GGML_METAL_EMBED_LIBRARY=OFF the following error is generated: ```console [ 11%] Linking CXX shared library ../../bin/libggml.dylib [ 11%] Built target ggml make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'. Stop. make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2 ``` With the above change the build could progress but there was a follow-on error about not being able to find the ggml-common.h file in ggml-metal.metal, where it was included as a relative path: ```console [ 11%] Compiling Metal kernels /Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'? ^~~~~~~~~~~~~~~~~~ "ggml-common.h" 1 error generated. ``` Removing the relative path then allowed the build to complete successfully.	2025-03-08 15:13:01 +02:00
lhez	a34cb73dc2	opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (llama/12217) * opencl: support noncontiguous `norm` * opencl: support noncontiguous `rms_norm` * opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`	2025-03-08 15:13:01 +02:00
xiaofei	82f9496657	cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (llama/12094) Signed-off-by: Ray Lee <hburaylee@gmail.com> Co-authored-by: Ray Lee <hburaylee@gmail.com>	2025-03-08 15:13:01 +02:00
Johannes Gäßler	e3c85e75bd	CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)	2025-03-08 15:13:01 +02:00
uvos	b9eab73fa2	HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (llama/12209) This avoids conflict with internal cuda/hip runtimes memory management behavior.	2025-03-08 15:13:01 +02:00
Henry Linjamäki	76385c8311	opencl : fix buffer alignment (llama/12197) Fix the following error: ``` ggml-alloc.c:99: not enough space in the buffer ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024) ``` which occurs when `ggml_backend_opencl_context::alignment` is larger than `cl_ptr_base` (hard-coded to `0x1000`). Also, fix `ggml_backend_opencl_context::alignment`, which was set to `CL_DEVICE_MEM_BASE_ADDR_ALIGN` and treated as bytes although the value is reported in bits.	2025-03-08 15:13:01 +02:00
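The bits-versus-bytes mix-up described in commit 76385c8311 can be sketched as follows. This is an illustrative fragment under stated assumptions, not the ggml OpenCL backend code: `align_bits_to_bytes` and `align_up` are hypothetical helper names, and the alignment is assumed to be a power of two.

```c
#include <stddef.h>

/* Converts an alignment reported in bits (as the commit says
 * CL_DEVICE_MEM_BASE_ADDR_ALIGN is) into bytes. */
static size_t align_bits_to_bytes(size_t bits) {
    return bits / 8;
}

/* Rounds offset up to the next multiple of alignment.
 * Assumes alignment is a non-zero power of two. */
static size_t align_up(size_t offset, size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For a device that reports 1024 bits, the byte alignment is 128; treating the raw value as bytes would over-align every allocation by a factor of eight and waste buffer space.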
Henry Linjamäki	442cd1d2e7	opencl : fix `ulong` kernel args were set from `int` variables (llama/12174) ... which left garbage bits in the upper half of the kernel args. This caused segmentation faults when running PoCL.	2025-03-08 15:13:01 +02:00
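The failure mode in commit 442cd1d2e7 (an 8-byte kernel argument filled from a 4-byte `int`) can be sketched without any OpenCL dependency. Everything here is hypothetical: `set_arg` merely mimics an argument-setting call that copies `size` bytes from a pointer, and `pass_as_ulong` shows the widening step; neither is taken from the ggml sources. The input is assumed to be non-negative.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an argument-setting call that copies `size`
 * bytes from `value` into an argument slot. */
static void set_arg(unsigned char *slot, size_t size, const void *value) {
    memcpy(slot, value, size);
}

/* Widens the int into a 64-bit variable first, so all 8 bytes copied into
 * the slot are defined. Passing the address of the int itself with a size
 * of 8 would read past the 4-byte object. */
static uint64_t pass_as_ulong(int n) {
    uint64_t wide = (uint64_t)n;
    unsigned char slot[sizeof(uint64_t)];
    set_arg(slot, sizeof wide, &wide);

    uint64_t seen; /* what the receiving side would observe */
    memcpy(&seen, slot, sizeof seen);
    return seen;
}
```

The design point is that the size passed to the copy and the type of the object being copied from must agree; widening at the call site keeps the upper half of the argument well defined.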
simon886212	bc8cb97e02	opencl : fix profile-related errors (llama/12095) Co-authored-by: ubuntu <ubuntu@localhost.localdomain>	2025-03-08 15:13:01 +02:00
Rémy O	8dcadf736b	ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154) * ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions * cmake: Add GGML_BMI2 build option * ggml: enable BMI2 on relevant CPU variants * ggml-cpu: include BMI2 in backend score * ggml-cpu: register BMI2 in ggml_backend_cpu_get_features * ggml-cpu: add __BMI2__ define when using MSVC	2025-03-08 15:13:01 +02:00
Akarshan Biswas	93986b61e0	SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12201)	2025-03-08 15:13:01 +02:00
Plamen Minev	bd1a9e34c9	ggml : fix GGMLMetalClass ODR (llama/12200) -- it might happen if ggml is loaded from 2 separate libraries since each one of them will expose the class. This is more of a guard since we want to use only Metal as embedded library and don't care about the other case.	2025-03-08 15:13:01 +02:00
vmobilis	cc03608e78	ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118) * ggml_compute_forward_concat() for arbitrary tensor type * Check that tensors' type match * ggml-cpu.c: check type of source tensors * ggml-cpu.c: move tensor type check to ggml_compute_forward_concat() * ggml.c: check concatenated tensor type * Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c ..., as it was moved to ggml.c.	2025-03-08 15:13:01 +02:00
Georgi Gerganov	54a54faee4	vulkan : sync (llama/0) ggml-ci	2025-03-08 15:13:01 +02:00
mgroeber9110	96a92ecc4c	ggml : portability fixes for VS 2017 (llama/12150) * Add include files for std::min/max and std::toupper/tolower * win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined * Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode * win32: only use __restrict in MSVC if C11/C17 support is not enabled --------- Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>	2025-03-08 15:13:01 +02:00
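The `restrict` handling described in commit 96a92ecc4c can be sketched with a small portability macro. `MY_RESTRICT` and `scale` are hypothetical names chosen for illustration; the actual `GGML_RESTRICT` definition may differ in its conditions.

```c
/* Hypothetical portability macro in the spirit of GGML_RESTRICT. */
#if defined(__cplusplus)
#  define MY_RESTRICT            /* C++ has no restrict keyword */
#elif defined(_MSC_VER) && (!defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L)
#  define MY_RESTRICT __restrict /* MSVC plain C mode without C11/C17 */
#else
#  define MY_RESTRICT restrict   /* standard C99 and later */
#endif

/* Example use: dst and src are promised not to alias. */
static void scale(float *MY_RESTRICT dst, const float *MY_RESTRICT src,
                  int n, float s) {
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i] * s;
    }
}
```

Routing every use through one macro means the compiler-specific spelling lives in a single place, which is the motivation the commit gives for replacing the bare keyword everywhere.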
David Huang	edd1d8686a	HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032) Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check Adds rocWMMA support to fattn-wmma-f16	2025-03-08 15:13:01 +02:00
ag2s20150909	dc6f4e7c05	ggml : fix kleidiai build (llama/12159) The libggml API has changed, but this has not been updated.	2025-03-08 15:13:01 +02:00
Akarshan Biswas	74c85d154e	SYCL: Move CPY kernels to a separate file and add few missing kernels (llama/12133) * SYCL: refactor and move cpy kernels to a separate file * Add few missing cpy kernels * refactor and add debug logs	2025-03-08 15:13:01 +02:00
Diego Devesa	eb2d8b6ffd	ggml-backend : keep paths in native string type when possible (llama/12144)	2025-03-08 15:13:01 +02:00
Erik Scholz	b442dcd598	CUDA: compress mode option and default to size (llama/12029) cuda 12.8 added the option to specify stronger compression for binaries, so we now default to "size".	2025-03-08 15:13:01 +02:00
William Tambellini	c98681e6d5	ggml : upgrade init_tensor API to return a ggml_status (llama/11854) * Upgrade init_tensor API to return a ggml_status To prepare for an 'abort-free' ggml (ggml not to abort on OOMs but return an OOM status), as agreed with Diego in the ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status. * misc fixes --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-03-08 15:13:01 +02:00
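The "abort-free" idea in commit c98681e6d5 (report an out-of-memory condition through a status code rather than aborting) can be sketched as below. The names `my_status`, `my_tensor`, and `my_init_tensor` are hypothetical and do not reproduce the real `ggml_status` values or the `init_tensor()` signature.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical status codes, standing in for ggml_status. */
enum my_status { MY_STATUS_OK = 0, MY_STATUS_ALLOC_FAILED = 1 };

struct my_tensor {
    float *data;
    size_t n;
};

/* Returns a status instead of aborting when the allocation cannot be made,
 * leaving the decision about recovery to the caller. */
static enum my_status my_init_tensor(struct my_tensor *t, size_t n) {
    t->data = NULL;
    t->n = 0;
    if (n == 0 || n > SIZE_MAX / sizeof(float)) {
        return MY_STATUS_ALLOC_FAILED; /* empty or overflowing request */
    }
    float *p = calloc(n, sizeof *p);
    if (p == NULL) {
        return MY_STATUS_ALLOC_FAILED;
    }
    t->data = p;
    t->n = n;
    return MY_STATUS_OK;
}
```

A caller checks the returned value and can free other buffers, retry with a smaller size, or surface the error, none of which is possible once the library has aborted the process.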
Rémy O	3bab804981	vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (llama/11595) * vulkan: implement specialized MMV kernels for IQ2 quantizations * vulkan: add MMV kernels for IQ3 quants * vulkan: Increase MMV batch size and unroll IQ LUT setup * vulkan: fix init_iq_shmem for WG sizes larger than tables * vulkan: common batch size for all I-quants	2025-03-08 15:13:01 +02:00
Johannes Gäßler	c927830a70	CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098)	2025-03-08 15:13:01 +02:00
Prashant Vithule	992b51b3d5	ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (llama/12064) * Added SVE Support for Q2_K Quantized Models * Use 4-space indentation in the switch cases * removed comments lines * Remove the loop Retain the curly braces for better understanding of code * Remove the comment like added for q3_k_q8_k kernel --------- Co-authored-by: vithulep <p.m.vithule1517@gmail.com>	2025-03-08 15:13:01 +02:00
hipudding	2c882cbe4c	CANN: Fix build error with GCC 13 (llama/11990) Remove unused header file that causes compilation failure on ARM platform with GCC 13.	2025-03-08 15:13:01 +02:00
Eve	1fbb119b1e	vulkan: matmul dequantization improvements (llama/12015) * faster dequant for old quants * dont use unpack for iq4_nl * vec2 unpack for q8	2025-03-08 15:13:01 +02:00
Daniele	40dea850fd	vulkan: improve im2col (llama/11826) * vulkan: improve im2col performance	2025-03-08 15:13:01 +02:00
Vladimir Vuksanovic	8255a830a8	cmake: Fix ggml backend dependencies and installation (llama/11818) * Fix dependencies between ggml and backends ggml backends link only to ggml-base and ggml links to all backends. * Fix installation of ggml backends Set up GNUInstallDirs before setting the installation directory of ggml backends	2025-03-08 15:13:01 +02:00
Jeff Bolz	a0f76b2da7	vulkan: fix assertion when qy_needs_dequant (llama/12068) Looks like a copy/paste bug from qx_needs_dequant.	2025-03-08 15:13:01 +02:00
Molly Sophia	394768c48b	ggml-cpu: Fix build with sve (llama/12059) * ggml-cpu: Fix build with sve Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * ggml-cpu: Remove unused variable in sve q3_k vec dot Signed-off-by: Molly Sophia <mollysophia379@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-03-08 15:13:01 +02:00
cmdr2	846e01b2c0	cuda: unary ops as float + de-duplicate (ggml/1130)	2025-03-08 15:13:01 +02:00
cmdr2	6ac8e6b2ce	cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129) * cuda: restrict SILU_BACK to fp32, since fp16 exceeds the desired test threshold * vulkan: specify fp32-only support for certain ops (that are now tested for fp16 as well) * f32 sigmoid in vulkan supports op * Revert "f32 sigmoid in vulkan supports op" This reverts commit c6f04b3c19bf4504c2776149c6d8cd84e0b48acb.	2025-03-08 15:13:01 +02:00
cmdr2	60d2ddebdf	cuda/cpu: Increase support for fp16 unary operations (ggml/1125) * Support fp16 unary operations in the CUDA backend * cpu: increase fp16 support for unary operators in the CPU backend * cuda: increase fp16 support for unary operators in the CUDA backend * Add test cases for fp16 unary operators * metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing * metal: fix PR comments for unary op support after fp16 unary tests	2025-03-08 15:13:01 +02:00
petterreinholdtsen	2e180184a8	Told cmake to install ggml-cpp.h as a public header file. (ggml/1126) It is used by Whisper talk-llama example. Co-authored-by: Petter Reinholdtsen <pere@debian.org>	2025-03-08 15:13:01 +02:00
Diego Devesa	339a1cba5d	whisper : support GGML_BACKEND_DL (#2843) * whisper : support GGML_BACKEND_DL * fix DTW crash * whisper.objc : fix build - add ggml-cpp.h --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-27 13:35:07 +01:00
cmdr2	cdaee8b4bd	Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121) * Support float16-to-float16 add/sub/mul/div operations in the CUDA backend * Add fp16 support for add/sub/mul/div on the CPU backend * Add test cases for fp16 add/sub/mul/div	2025-02-27 08:55:36 +02:00
Gian-Carlo Pascutto	4b60ff4f92	metal : copy kernels for quant to F32/F16 conversions (llama/12017) metal: use dequantize_q templates --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-27 08:55:36 +02:00
lhez	b43b9d928c	opencl: fix for small models (llama/11950) * opencl: fix small shape gemv, remove unused extensions * opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size * opencl: fix for token length < 4 * opencl: use wave size of 64 for all Adreno GPUs --------- Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com> Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>	2025-02-27 08:55:36 +02:00
Neo Zhang Jianyu	e3cb412a59	Optimize mul_mat for Q4_0 on Intel GPU (llama/12035) * opt performance by reorder for Intel GPU * detect hw type and save opt feature, and print opt feature * correct name * support optimize graph once when compute graph, record the opt status in tensor->extra, make CI passed * add env variable GGML_SYCL_DISABLE_OPT for debug * use syclex::architecture replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT * add performance data * mv getrows functions to separate files * fix global variables --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2025-02-27 08:55:36 +02:00
Akarshan Biswas	ac301a7d9b	SYCL: Fix GGML_SYCL_DEBUG macro (llama/11995)	2025-02-27 08:55:36 +02:00
Aaron Teo	82e04e7670	ggml-cpu: Support s390x SIMD Instruction Set (llama/12019) * ggml: add s390x ARCH_FLAGS for compilation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add SIMD for s390x using vector intrinsics SIMD is activated for: * ggml_vec_dot_f32 * ggml_vec_dot_f16 * ggml_vec_mad_f32 * ggml_vec_mad_f16 * ggml_vec_mad_f32_unroll * ggml_vec_scale_f32 * ggml_vec_scale_f16 SIMD is NOT activated for: * ggml_vec_dot_f16_unroll (pending bugfix) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix missing escape character in GGML_F32x4_REDUCE Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix s390x GGML_F32x4_REDUCE Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: full SIMD activation for F32,F16 s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add option to disable s390x VXE/VXE2 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: change vecintrin.h include to ggml-cpu-impl * add __VXE__ and __VXE2__ macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * cmake: add s390x target detection for VX/VXE/VXE2 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: move s390x vector intrinsics to ggml-cpu-impl.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x Q8_0 SIMD Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: correct documentation for Q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x reduce code complexity Q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x bugfix typo Q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activated for Q4_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x inline vec_reve Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for Q4_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add VXE backend feature Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: remove test.py Signed-off-by: 
Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for quantize_row_q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for quantize_row_q8_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for iq4_xs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: bugfix iq4_xs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for iq4_nl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add float, double, and long vector data type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: clean up iq4_xs SIMD Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix improper use of restrict keyword Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: update warning message for ggml_vec_tbl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: switch to restrict for iq4_nl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: slight dot product speed improvement for q4_1_q8_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for q6_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add missing `_t` to ggml_int8x16x4_t Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix missing `_t` for ggml_vec_xl_s8x4 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix more missing `_t` Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add unroll and prefetch to Q8_0 increase of 3.86% for prompt processing and 32.22% for token generation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: patch Q8_0 to use proper vector sizes Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: optimise Q8_0 dot prod compute kernel further Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add unroll and prefetch to Q4_1 Signed-off-by: Aaron Teo 
<aaron.teo1@ibm.com> * ggml: refactor Q6_K variable naming for readability Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q6_K typos Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for Q5_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix wrong charx16_t naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml: Q5_K y0 wrong signness Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q5_K invalid uchar type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q5_K invalid uchar type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for Q4_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q4_K invalid vector intrinsics Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: simplify ggml_padd_s16 compute kernel Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: correct ggml-cpu vxe wording Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: change ggml_aligned_malloc alignment to 256 256 is the cache line size for s390x platforms Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: resolve pr merge via cherry-pick 225bbbf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701) * ggml: resolve pr merge via cherry-pick 4571953 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: cmake remove fork when determining s390x machine type thank you @ericcurtin Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Jinyang He <hejinyang@loongson.cn> Co-authored-by: junchao-zhao <68935141+junchao-loongson@users.noreply.github.com>	2025-02-27 08:55:36 +02:00
Johannes Gäßler	38ac47cd4d	CUDA: app option to compile without FlashAttention (llama/12025)	2025-02-27 08:55:36 +02:00
Johannes Gäßler	2d70cd36d7	CUDA: optimize FA for GQA + large batches (llama/12014)	2025-02-27 08:55:36 +02:00

610 Commits