whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-06-01 15:00:44 +00:00

Author	SHA1	Message	Date
uvos	43c744ce8b	HIP: require at least HIP 5.5	2025-02-03 22:00:57 +02:00
uvos	fc2e44490d	HIP: Prepare reduction operators for wave 64	2025-02-03 22:00:57 +02:00
uvos	f41fdad200	CUDA/HIP: add warp_size to cuda_device_info	2025-02-03 22:00:57 +02:00
Rémy Oudompheng	80fa576254	vulkan: implement initial support for IQ2 and IQ3 quantizations (llama/11360) * vulkan: initial support for IQ3_S * vulkan: initial support for IQ3_XXS * vulkan: initial support for IQ2_XXS * vulkan: initial support for IQ2_XS * vulkan: optimize Q3_K by removing branches * vulkan: implement dequantize variants for coopmat2 * vulkan: initial support for IQ2_S * vulkan: vertically realign code * port failing dequant callbacks from mul_mm * Fix array length mismatches * vulkan: avoid using workgroup size before it is referenced * tests: increase timeout for Vulkan llvmpipe backend --------- Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-02-03 22:00:57 +02:00
Jeff Bolz	75e7d0585e	vulkan: Catch pipeline creation failure and print an error message (llama/11436) * vulkan: Catch pipeline creation failure and print an error message Also, fix some warnings from my on-demand compile change. * vulkan: fix pipeline creation logging	2025-02-03 22:00:57 +02:00
uvos	682a6f5f87	HIP: Supress transformation warning in softmax.cu loops with bounds not known at compile time can not be unrolled. when ncols_template == 0, the bounds of the loop are not constexpr, thus llvm cant unroll the loops here.	2025-02-03 22:00:57 +02:00
Nikita Sarychev	115716d109	HIP: Only call rocblas_initialize on rocblas versions with the multiple instantation bug (llama/11080) This disables the workaround on rocblas fixed versions (>=4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all tensile objects.	2025-02-03 22:00:57 +02:00
someone13574	b2cfef655b	cmake : don't fail on `GGML_CPU=OFF` (llama/11457)	2025-02-03 22:00:57 +02:00
Akarshan Biswas	22e3df0afa	SYCL : SOFTMAX F16 mask support and other fixes (llama/11261) Implemented ggml_sycl_op_soft_max() F16 src1(mask) support for which a pragma deprecation warning was added during #5021. To do this, had to decouple it from ggml_sycl_op_flatten which always considered src1 to be of fp32 type(many OP functions are dependent on it). * SYCL: SOFTMAX F16 mask support and other fixes * test-backend-ops: Add F16 mask test cases	2025-02-03 22:00:57 +02:00
Haus1	028511d349	AMD: parse the architecture as supplied by gcnArchName (llama/11244) The value provided by minor doesn't include stepping for AMD, parse the value returned by gcnArchName instead to retrieve an accurate ID.	2025-02-03 22:00:57 +02:00
Ihar Hrachyshka	70c4038842	metal: Handle null returned from MTLCreateSystemDefaultDevice() (llama/11441) This fixes segmentation fault error when running tests when no metal devices are available (for example, when not linked with Core Graphics framework or otherwise).	2025-02-03 22:00:57 +02:00
Georgi Gerganov	8639c003a9	metal : use residency sets (llama/11427) * metal : use residency sets ggml-ci * metal : restore commandBufferWithUnretainedReferences calls [no ci] * metal : release descriptors ggml-ci * metal : check env GGML_METAL_NO_RESIDENCY ggml-ci * metal : fix build + clean-up ggml-ci	2025-02-03 22:00:57 +02:00
bandoti	d5d831da65	cmake: add ggml find package (llama/11369) * Add initial ggml cmake package * Add build numbers to ggml find-package * Expand variables with GGML_ prefix * Guard against adding to cache variable twice * Add git to msys2 workflow * Handle ggml-cpu-* variants * Link ggml/ggml-base libraries to their targets * Replace main-cmake-pkg with simple-cmake-pkg * Interface features require c_std_90 * Fix typo * Removed unnecessary bracket from status message * Update examples/simple-cmake-pkg/README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/simple-cmake-pkg/README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-03 22:00:57 +02:00
Jeff Bolz	7230a6e1c8	vulkan: compile shaders on-demand (llama/11406) Reduce first-run startup time and memory consumption. Should fix #11339.	2025-02-03 22:00:57 +02:00
uvos	a160fa0f3a	Hip: disable VMM on hip as it seams that it dosent work in some configurations (llama/11420)	2025-02-03 22:00:57 +02:00
uvos	0282ad8fd1	hip : Add hipGraph and VMM support to ROCM (llama/11362) * Add hipGraph support * Enable VMM on rocm	2025-02-03 22:00:57 +02:00
Johannes Gäßler	9e467815d4	CUDA: fix FP16 cuBLAS GEMM (llama/11396)	2025-02-03 22:00:57 +02:00
uvos	727891d9bf	rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (llama/11356)	2025-02-03 22:00:57 +02:00
Johannes Gäßler	c262dc80e2	CPU/CUDA: fix (GQA) mul mat back, add CUDA support (llama/11380)	2025-02-03 22:00:57 +02:00
Bernhard M. Wiedemann	30767b4c4e	cmake : avoid -march=native when reproducible build is wanted (llama/11366) See https://reproducible-builds.org/ for why this is good and https://reproducible-builds.org/specs/source-date-epoch/ for the definition of this variable. Without this patch, compiling on different machines produced different binaries, which made verification of results difficult. Fixes: #11317 This patch was done while working on reproducible builds for openSUSE.	2025-02-03 22:00:57 +02:00
amd-dwang	16eeb31933	Vulkan-run-test: fix mmq_wg_denoms (llama/11343) There should be a copy-and-paste error here. mmq_wg_denoms should be used together with warptile_mmq, instead of wg_denoms.	2025-02-03 22:00:57 +02:00
Jeff Bolz	ba523d5e22	vulkan: sort shaders for more deterministic binary (llama/11315) Fixes #11306.	2025-02-03 22:00:57 +02:00
Jeff Bolz	3736706139	vulkan: fix diag_mask_inf (llama/11323) With robustbufferaccess disabled, this shader was showing OOB stores. There is a bounds check in the code, but the workgrouop dimensions were reversed vs CUDA and it was running the wrong number of threads. So fix the workgroup dimensions and disable robustness for this pipeline.	2025-02-03 22:00:57 +02:00
Radoslav Gerganov	58640aa456	rpc : better caching of the base buffer pointer (llama/11331) There is no need to use map, just store the base pointer in the buffer context.	2025-02-03 22:00:57 +02:00
Georgi Gerganov	5183a05e56	metal : fix out-of-bounds write (llama/11314) ggml-ci	2025-02-03 22:00:57 +02:00
Jeff Bolz	0dcada42d4	vulkan: fix coopmat2 validation failures (llama/11284) mul mat and flash attention shaders were loading f32 types directly into A/B matrices, which happens to work but is technically invalid usage. For FA, we can load it as an Accumulator matrix and convert and this is not in the inner loop and is cheap enough. For mul mat, it's more efficient to do this conversion in a separate pass and have the input(s) be f16. coopmat2 requires SPIR-V 1.6 (related using to LocalSizeId). LocalSizeId requires maintenance4 be enabled, and SPIR-V 1.6 requires Vulkan 1.3.	2025-02-03 22:00:57 +02:00
Nicolò Scipione	d507b4cebe	SYCL: Introducing memory host pool (llama/11251) * Implement host pool for matrix_info Creating a new memory pool on the host to store memory location for matrix_info needed to launch gemm_batch from oneMKL/oneMath. Removing complex support in gemm_batch since it is not used in llama.cpp * Remove unnecessary headers and cast * Reorder member variable to avoid warning on initialization * Formatting * Remove unused variable * Address PR review feedback - remove warning --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-02-03 22:00:57 +02:00
Georgi Gerganov	90171055f3	cmake : add sanitizer flags for llama.cpp (llama/11279) * cmake : add sanitizer flags for llama.cpp ggml-ci * tests : fix compile warnings ggml-ci * cmake : move sanitizer flags to llama_add_compile_flags ggml-ci * cmake : move llama.cpp compile flags to top level lists ggml-ci * cmake : apply only sanitizer flags at top level ggml-ci * tests : fix gguf context use in same_tensor_data * gguf-test: tensor data comparison * dummy : trigger ggml-ci * unicode : silence gcc warnings ggml-ci * ci : use sanitizer builds only in Debug mode ggml-ci * cmake : add status messages [no ci] --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-02-03 22:00:57 +02:00
Jeff Bolz	668306ff2b	vulkan: fix coopmat2 flash attention for non-contiguous inputs (llama/11281) Add code similar to mul_mm_cm2 to force alignment of strides, to avoid a performance regression. Add noncontiguous FA tests in test-backend-ops. Fixes #11268.	2025-02-03 22:00:57 +02:00
Radoslav Gerganov	fdc21fc87b	rpc : early register backend devices (llama/11262) Early register RPC devices and do not propagate RPC specifics in the llama model structures. ref: #10609	2025-02-03 22:00:57 +02:00
Jeff Bolz	7183a1eb72	vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (llama/11166) * vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl Shaders are based on cpy.cu. * vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32 * ggml: copy q->f32 assumes some contiguity in the destination	2025-02-03 22:00:57 +02:00
Jeff Bolz	09f3c66648	vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (llama/11206) Do masking on whole dwords, fetch all scales at once.	2025-02-03 22:00:57 +02:00
Jeff Bolz	62e2414620	vulkan: optimize coopmat2 q2_k dequant function (llama/11130)	2025-02-03 22:00:57 +02:00
Johannes Gäßler	de49024e49	CUDA: backwards pass for misc. ops, add tests (llama/11257) * CUDA: backwards pass for misc. ops, add tests * remove restrict from pointers	2025-02-03 22:00:57 +02:00
fj-y-saito	db6383094c	ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (llama/11227) * Add SVE support for q4_K_q8_K * Update ggml/src/ggml-cpu/ggml-cpu-quants.c change to use K_SCALE_SIZE Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-03 22:00:57 +02:00
Eve	164f13c6a9	vulkan: scale caching for k quants + misc fixes (llama/11081) * q6_k scale caching * 16 bit unpack * q4_k test (slow) * revert it * q3_k * q2_k * little stuff * try precalculating products of a and q2_k scales * Revert "try precalculating products of a and q2_k scales" This reverts commit 65110b81f23f66331a50c6e889a7c1ab9470a86b. * unpack should be u16, add vim swap to gitignore (about time) * better q4_k scales * q5_k * better q6_k with separate paths for all threads and partial threads in use, plus some more optimizations * q2_k better dequant * q3_k optimizations * q3_k use hmask simd from cpu avx version * make the caches happy * q3_k separate out calculation * q2_k separate out * little stuff * use calc_superblock everywhere * q2_k optimize scale calculation * more barriers	2025-02-03 22:00:57 +02:00
Junil Kim	02aa86230a	fix: ggml: fix vulkan-shaders-gen build (llama/10448) * fix: ggml: fix vulkan-shaders-gen build The vulkan-shaders-gen target was not being built correctly in case of cross-compilation. Other outputs need to be built for the cross compile target, but vulkan-shaders-gen needs to be built for the host. * refactor: ggml: Improve vulkan-shaders-gen toolchain setup - Add GGML_SHADERS_GEN_TOOLCHAIN CMake option. - Auto-detect host toolchain if not set. * refactor: ggml: Improve vulkan-shaders-gen toolchain setup Use configure_file to generate host_toolchain.cmake from template * fix: ggml: Fix compile error Fix compile error not finding vulkan-shaders-gen * fix: vulkan-shaders-gen build and path handling Fix build issues with vulkan-shaders-gen: - Add target dependency for correct build order - Use CMAKE_HOST_SYSTEM_NAME for executable suffix - Fix MSVC output directory in host toolchain - Normalize path handling for cross-compilation * fix: improve host compiler detection in vulkan shader build Improve host compiler detection for vulkan shader generation: - Add NO_CMAKE_FIND_ROOT_PATH to all compiler searches - Consolidate compiler detection logic - Fix Windows-specific MSVC detection - Ensure correct compiler search in cross-compilation * refactor: Simplify CMake function for detecting host compiler Simplified the CMake function to improve the process of detecting the host compiler. * fix: Remove unnecessary Vulkan library linkage in CMakeLists.txt Since `vulkan-shader-gen.cpp` only requires the `glslc` executable and not the Vulkan headers or libraries, CMakeLists.txt needs to be corrected. (See: ecc93d0558fc3ecb8a5af69d2ece02fae4710ade) * refactor: Rename host_toolchain.cmake.in - Rename host_toolchain.cmake.in to cmake/host-toolchain.cmake.in * refactor: GGML_VULKAN_SHADERS_GEN_TOOLCHAIN Rename the macro GGML_SHADERS_GEN_TOOLCHAIN to GGML_VULKAN_SHADERS_GEN_TOOLCHAIN	2025-02-03 22:00:57 +02:00
Johannes Gäßler	54a2ee648f	RoPE: fix back, CUDA support for back + noncont. (llama/11240) * RoPE: fix back, CUDA support for back + noncont. * fix comments reg. non-cont. RoPE support [no-ci]	2025-02-03 22:00:57 +02:00
Akarshan Biswas	9700cfb0a3	SYCL: Add gated linear attention kernel (llama/11175) * SYCL: Add Gated Linear attention kernel * glahpp: add a space at the end of file * gla: Put the barrier inside the main logic loop	2025-02-03 22:00:57 +02:00
William Tambellini	8e0143e205	ggml : add option to not print stack on abort (ggml/1081) * Add option to not print stack on abort Add option/envvar to disable stack printing on abort. Also link some unittests with Threads to fix link errors on ubuntu/g++11. * Update ggml/src/ggml.c --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-02-03 22:00:57 +02:00
issixx	f12559d590	ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) some threads kept looping and failed to terminate properly after an abort during CPU execution. Co-authored-by: issi <issi@gmail.com>	2025-02-03 22:00:57 +02:00
Georgi Gerganov	589b40810a	ci : dummy commit to trigger CI Some checks are pending CI / ubuntu-latest-clang (linux/amd64, Debug) (push) Waiting to run Details CI / ubuntu-latest-clang (linux/amd64, Release) (push) Waiting to run Details CI / ubuntu-latest-clang (linux/arm64, Debug) (push) Waiting to run Details CI / ubuntu-latest-clang (linux/arm64, Release) (push) Waiting to run Details CI / ubuntu-latest-clang (linux/ppc64le, Debug) (push) Waiting to run Details CI / ubuntu-latest-clang (linux/ppc64le, Release) (push) Waiting to run Details CI / ubuntu-latest-gcc-sanitized (linux/amd64, ADDRESS) (push) Waiting to run Details CI / ubuntu-latest-gcc-sanitized (linux/amd64, THREAD) (push) Waiting to run Details CI / ubuntu-latest-gcc-sanitized (linux/amd64, UNDEFINED) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Waiting to run Details CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Waiting to run Details CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Waiting to run Details CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Waiting to run Details CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Waiting to run Details CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Waiting to run Details CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Waiting to run Details CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Waiting to run Details CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Waiting to run Details CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Waiting to run Details CI / emscripten (Release) (push) Waiting to run Details CI / ios-xcode-build (Release) (push) Waiting to run Details CI / android (push) Waiting to run Details CI / quantize (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Waiting to run Details	2025-02-03 16:32:48 +02:00
KITAITI Makoto	7ffcd05267	ruby : Make context accept initial parameters, API to retrieve a segment and more (#2749 ) * Fix type signature for Whisper.log_set * Use cache file for model when offline * Extract ruby_whisper_transcribe() into a file * Extract Whisper::Error * Use FileList for ext/.{c,cpp,h} Extract Whisper::Segment * Extract Whisper::Model * Extract Whisper::Params * Extract Whisper::Context * Extract log_callback function * Write base code in C rather than C++ * Use chdir instead of Dir.chdir in Rakefile * Define alloc func for Whisper::Model * Define Whisper::Params' calback and user data reader * Add test for Whisper::Params.new with keyword arguments * Make Whisper::Params.new accept keyword arguments * Update type signatures * Update README * Update CLEAN targets * Fix document comment for Whisper::Params#new_segment_callback= * Use macro to define params * Fix dependency of build task * Set Whisper.finalize_log_callback visibility to private * Make Whisper::Context#full and full_parallel return self * Add test for Whisper::Context#full_get_segment * Add Whisper::Context#full_get_segment * Update signatures * Update README * Fix signature * Resplace #initialize with .new in signature file [skip ci] * Fix potential overflow	2025-01-21 09:39:54 +02:00
Corey Earwood	7a423f1c00	whisper.objc : fix build and CI Some checks failed CI / ubuntu-latest-clang (linux/amd64, Debug) (push) Has been cancelled Details CI / ubuntu-latest-clang (linux/amd64, Release) (push) Has been cancelled Details CI / ubuntu-latest-clang (linux/arm64, Debug) (push) Has been cancelled Details CI / ubuntu-latest-clang (linux/arm64, Release) (push) Has been cancelled Details CI / ubuntu-latest-clang (linux/ppc64le, Debug) (push) Has been cancelled Details CI / ubuntu-latest-clang (linux/ppc64le, Release) (push) Has been cancelled Details CI / ubuntu-latest-gcc-sanitized (linux/amd64, ADDRESS) (push) Has been cancelled Details CI / ubuntu-latest-gcc-sanitized (linux/amd64, THREAD) (push) Has been cancelled Details CI / ubuntu-latest-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled Details CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled Details CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled Details CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled Details CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled Details CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Has been cancelled Details CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Has been cancelled Details CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled Details CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled Details CI / emscripten (Release) (push) Has been cancelled Details CI / ios-xcode-build (Release) (push) Has been cancelled Details CI / android (push) Has been cancelled Details CI / quantize (push) Has been cancelled Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled Details	2025-01-18 12:06:06 +02:00
Georgi Gerganov	99b011a9f5	talk-llama : sync llama.cpp Some checks failed CI / ubuntu-latest-clang (linux/amd64, Debug) (push) Has been cancelled Details CI / ubuntu-latest-clang (linux/amd64, Release) (push) Has been cancelled Details CI / ubuntu-latest-clang (linux/arm64, Debug) (push) Has been cancelled Details CI / ubuntu-latest-clang (linux/arm64, Release) (push) Has been cancelled Details CI / ubuntu-latest-clang (linux/ppc64le, Debug) (push) Has been cancelled Details CI / ubuntu-latest-clang (linux/ppc64le, Release) (push) Has been cancelled Details CI / ubuntu-latest-gcc-sanitized (linux/amd64, ADDRESS) (push) Has been cancelled Details CI / ubuntu-latest-gcc-sanitized (linux/amd64, THREAD) (push) Has been cancelled Details CI / ubuntu-latest-gcc-sanitized (linux/amd64, UNDEFINED) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/amd64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/arm64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/amd64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm/v7, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/arm64, icx, icpx, ON) (push) Has been cancelled Details CI / ubuntu-22-cmake-sycl-fp16 (linux/ppc64le, icx, icpx, ON) (push) Has been cancelled Details CI / windows-msys2 (Release, clang-x86_64, CLANG64) (push) Has been cancelled Details CI / windows-msys2 (Release, ucrt-x86_64, UCRT64) (push) Has been cancelled Details CI / windows (Win32, Release, win32-x86, x86, 2.28.5, ON) (push) Has been cancelled Details CI / windows (x64, Release, win32-x86-64, x64, 2.28.5, ON) (push) Has been cancelled Details CI / windows-blas (Win32, ON, Release, x86, 2.28.5, ON) (push) Has been cancelled Details CI / windows-blas (x64, ON, Release, x64, 2.28.5, ON) (push) Has been cancelled Details CI / windows-cublas (x64, Release, ON, 11.8.0, ON, 2.28.5) (push) Has been cancelled Details CI / windows-cublas (x64, Release, ON, 12.2.0, ON, 2.28.5) (push) Has been cancelled Details CI / emscripten (Release) (push) Has been cancelled Details CI / ios-xcode-build (Release) (push) Has been cancelled Details CI / android (push) Has been cancelled Details CI / quantize (push) Has been cancelled Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/main.Dockerfile platform:linux/amd64 tag:main]) (push) Has been cancelled Details	2025-01-14 10:38:01 +02:00
Georgi Gerganov	19d95f9f9a	sync : ggml	2025-01-14 10:38:01 +02:00
Johannes Gäßler	d5ef1737d8	GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030) ggml-ci	2025-01-14 10:38:01 +02:00
lhez	1deb41f0e7	ggml : add opencl backend (skip) (llama/10693) --------- Co-authored-by: Skyler Szot <quic_sszot@quicinc.com> Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com> Co-authored-by: Alexander Angus <quic_aangus@quicinc.com> Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com> Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>	2025-01-14 10:38:01 +02:00
Andreas Kieslinger	2425caf4fd	cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (llama/11042) * Refactor: Moves cuda graph executable update step to separate function. * Refactor: Moves cuda graph update check to separate function. * Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability. * Fix: Adds missing reference to maintain_cuda_graph() definition. * Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function. * Refactor: Moves node graph checks and copy ops into individual function for improved readability. * Refactor: Removes code permanently excluded from compilation to increase readability. * Style: Adds missing newline * Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one * Refactor: Makes 'cuda_graph_update_required' a local variable * remove double lines between functions --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-01-14 10:38:01 +02:00
Radoslav Gerganov	a4b00bcaaf	ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (llama/11211) Build fails when using HIP and GGML_BACKEND_DL: ``` /usr/bin/ld: ../ggml/src/libggml.so: undefined reference to `ggml_backend_cuda_reg' collect2: error: ld returned 1 exit status ``` This patch fixes this.	2025-01-14 10:38:01 +02:00

1 2 3 4 5 ...

2109 Commits