Commit Graph

  • 799eacdde4 ruby : Add parallel transcription support (#3222) KITAITI Makoto 2025-06-04 14:50:18 +09:00
  • 82f461eaa4 ci : add mirror for ports.ubuntu.com (ARM packages) (#3221) Daniel Bevenius 2025-06-03 07:56:58 +02:00
  • 269dad68a2 bindings.java : apply whisperParams in fullTranscribeWithTime instead of ignoring them (#3201) Joas Dev 2025-06-02 23:15:21 -05:00
  • 121d27a495 musa: correct MUSA SDK rc4.0.1 download URL (#3217) R0CKSTAR 2025-06-03 12:02:12 +08:00
  • e05af2457b ci : use mirrors.kernel.org for Ubuntu packages (#3220) Daniel Bevenius 2025-06-02 16:46:40 +02:00
  • b505539670 node : add language detection support (#3190) Daniel Bevenius 2025-06-02 14:58:05 +02:00
  • 7fd6fa8097 talk-llama : sync llama.cpp Georgi Gerganov 2025-06-01 14:07:36 +03:00
  • 3f46282cbe sync : ggml Georgi Gerganov 2025-06-01 14:03:21 +03:00
  • 1e16340f4b threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (llama/12995) Max Krasnyansky 2025-05-31 15:39:19 -07:00
  • 4a50254998 CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) (llama/13895) Shawn yang 2025-05-31 14:48:04 +08:00
  • a5aff28198 CUDA: fix typo in FlashAttention code (llama/13926) Johannes Gäßler 2025-05-30 21:22:03 +02:00
  • 6c0472ab8f sched : avoid changing cur_copy when a graph is already allocated (llama/13922) Diego Devesa 2025-05-30 09:56:19 -07:00
  • b14cee184a cuda : prevent using split buffers with 3d/4d matrices (llama/13919) Diego Devesa 2025-05-30 07:37:18 -07:00
  • f7f92d0aab SYCL: Add mrope kernel (llama/13755) Akarshan Biswas 2025-05-30 19:40:57 +05:30
  • 1893359cfd cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (llama/13890) Christian Kastner 2025-05-30 01:28:54 +02:00
  • ea643c6ae3 arm64: optimize q4_k_q8_k kernel with i8mm (llama/13886) Yibo Cai 2025-05-29 19:39:20 +08:00
  • 1d7b3c79f4 cmake: Factor out CPU architecture detection (llama/13883) Christian Kastner 2025-05-29 12:50:25 +02:00
  • ccfaac2bb0 ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (llama/13882) Vineel Abhinav 2025-05-29 14:48:43 +05:30
  • 1230d37bca ggml: aarch64: Implement SVE F32 kernels for vector functions (llama/13843) Vineel Abhinav 2025-05-29 11:31:33 +05:30
  • 9a500394ad CUDA: fix FA tg at long context for CC >= 8.9 (llama/13852) Johannes Gäßler 2025-05-28 13:33:37 +02:00
  • 0035b8527c CANN: Add SOC TYPE printing in cmake configuration (llama/13837) leo-pony 2025-05-28 11:54:20 +08:00
  • 3623186312 opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm (llama/13787) lhez 2025-05-27 12:56:08 -07:00
  • 67beac47f3 opencl: mark mul_mat f32f32 as supporting non-contiguous tensors (llama/13790) lhez 2025-05-27 12:53:14 -07:00
  • 47a19bae25 vulkan: use timestamp queries for GGML_VULKAN_PERF (llama/13817) Jeff Bolz 2025-05-27 11:39:07 -05:00
  • 3d5c7ca4bc SYCL: add gelu_erf kernel (llama/13749) Akarshan Biswas 2025-05-27 20:52:59 +05:30
  • 4dfb2c2215 ggml : add ggml_repeat_4d (llama/13824) Xuan-Son Nguyen 2025-05-27 15:53:55 +02:00
  • ad433403ce vulkan : Remove unexpected ; (ggml/1253) Kai Pastor 2025-05-31 12:49:55 +02:00
  • 4064dd6484 cmake : Fix broken CMake error messages (ggml/1252) Kai Pastor 2025-05-31 12:39:19 +02:00
  • fd75c4995b ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247) Radoslav Gerganov 2025-05-30 09:11:09 +03:00
  • 0251445005 ruby : add Core ML support (#3214) KITAITI Makoto 2025-06-01 18:16:02 +09:00
  • 98dfe8dc26 vad : revisit timestamp alignment/mapping (#3173) Daniel Bevenius 2025-05-30 06:28:46 +02:00
  • e5e900dd00 ruby : handle build options on installation (#3206) KITAITI Makoto 2025-05-30 01:32:49 +09:00
  • 4d18e52f55 ggml : Fix backtrace breaking Windows build (#3203) Daniel Tang 2025-05-29 06:26:58 -04:00
  • ca890f566f sync : ggml Georgi Gerganov 2025-05-29 09:49:46 +03:00
  • 48dddbbac1 ggml : install dynamic backends (ggml/1240) Radoslav Gerganov 2025-05-29 09:49:27 +03:00
  • 5ea2c37a4c ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) Daniel Tang 2025-05-27 20:58:46 -04:00
  • 73a8c5fb94 whisper : remove whisper_load_backends function (#3196) Daniel Bevenius 2025-05-29 08:03:17 +02:00
  • 1f5fdbecb4 ruby : add VAD support, migration to Ruby's newer API (#3197) KITAITI Makoto 2025-05-28 20:05:12 +09:00
  • 5720426d97 whisper : install shared libs when using GGML_BACKEND_DL (#3195) Simon Booth 2025-05-28 09:15:04 +01:00
  • b9d27b1358 tests : add a new benchmark test for long-form audio (#3185) Fujimoto Seiji 2025-05-28 14:08:44 +09:00
  • 0ed00d9d30 ci : update windows-blas uploads action (#3192) Daniel Bevenius 2025-05-27 18:01:31 +02:00
  • 527fe6aaeb sync : fix builds - musa, ruby Georgi Gerganov 2025-05-27 18:02:37 +03:00
  • 26eb48cb08 talk-llama : sync llama.cpp Georgi Gerganov 2025-05-27 17:08:24 +03:00
  • 546928c33f sync : ggml Georgi Gerganov 2025-05-27 17:07:06 +03:00
  • 15ae9dc2a4 ggml : riscv: add xtheadvector support (llama/13720) xctan 2025-05-27 21:21:36 +08:00
  • 2e7a1e3e43 ggml-cpu: x86 feature detection is specific to x86 (llama/13811) Christian Kastner 2025-05-27 13:18:39 +02:00
  • b75babebb2 ggml : allow CUDA graphs when using pipeline parallelism (llama/13814) Diego Devesa 2025-05-27 04:05:18 -07:00
  • cc7a0105ef cuda : avoid cuGetErrorString (llama/13791) Georgi Gerganov 2025-05-26 22:14:52 +03:00
  • 195fde8804 SYCL: Add non contiguous support in RMS_NORM and NORM kernels (llama/13611) Akarshan Biswas 2025-05-26 21:10:36 +05:30
  • 25e27904ca sycl: Add more debug prints (llama/13640) Romain Biessy 2025-05-26 10:28:53 +02:00
  • 474f7be8b6 vulkan: mark IM2COL as supporting non-contig (llama/13783) Jeff Bolz 2025-05-25 23:02:07 -05:00
  • e35fecc2a1 CANN: Add the basic supports of Flash Attention kernel (llama/13627) Bizhao Shi 2025-05-26 10:20:18 +08:00
  • 1cd7028428 SYCL: revert "sycl: simplify bin_bcast_kernel (ggml/13383)" (llama/13752) Akarshan Biswas 2025-05-25 12:38:37 +05:30
  • 99596d6031 ggml-cpu : set openmp wait time if not set (llama/13758) Diego Devesa 2025-05-24 13:26:47 -07:00
  • 2d6c6862f7 ggml : add ggml_gelu_erf() CUDA kernel (llama/13719) Xuan-Son Nguyen 2025-05-24 13:06:47 +02:00
  • f1576b2659 CUDA: fix race condition in FA vector kernels (llama/13742) Johannes Gäßler 2025-05-24 11:46:19 +02:00
  • 994b4f86ab CANN: Support MUL_MAT_ID for q8_0 and q4_0 (llama/13705) Chenguang Li 2025-05-23 16:47:53 +08:00
  • 3e7eaccf55 ggml : fix the order of ggml_unary_op (llama/13718) Xuan-Son Nguyen 2025-05-23 08:12:48 +02:00
  • 191f040414 vulkan: support CPY from any type to itself (llama/13695) Jeff Bolz 2025-05-23 00:45:02 -04:00
  • 2d49d4a9b5 vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (llama/13696) Jeff Bolz 2025-05-23 00:33:45 -04:00
  • 000d65befb use LOG_WARN to replace std::cerr (llama/13657) Judd 2025-05-23 12:33:08 +08:00
  • f0803e6646 sycl : Remove waits from function calls (llama/13702) Nicolò Scipione 2025-05-22 13:54:43 +02:00
  • 730a00be8a SYCL: Avoid using with SYCL-Graph for unsupported nodes (llama/13587) Ewan Crawford 2025-05-22 09:24:09 +01:00
  • 316600e8ee opencl: Add support for multiple devices (llama/12622) Henry Linjamäki 2025-05-22 02:21:45 +03:00
  • 42f2b3bb65 opencl: fix couple crashes (llama/12795) Henry Linjamäki 2025-05-21 23:21:17 +03:00
  • dd6ef64060 ggml : add ggml_gelu_erf() (llama/13667) Xuan-Son Nguyen 2025-05-21 16:26:33 +02:00
  • 131ee546ca musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (llama/13647) R0CKSTAR 2025-05-21 09:58:49 +08:00
  • 4712f7b663 vulkan: fix warnings (llama/13626) Eve 2025-05-20 21:35:16 +00:00
  • 926fe234e9 CUDA: skip fully masked-out KV in FA vec kernel (llama/13584) Johannes Gäßler 2025-05-20 14:45:07 +02:00
  • f44b53480f sycl: disable reorder for sycl mulmat (llama/13536) Svetlozar Georgiev 2025-05-20 10:34:15 +01:00
  • e04e8f1c79 metal : fix typo in FA kernel comments (llama/13651) Georgi Gerganov 2025-05-20 10:41:40 +03:00
  • ee3f177cba sycl : Overcoming workaround for mmap() allocation on Windows (llama/13482) Nicolò Scipione 2025-05-20 02:54:43 +02:00
  • 0b69f74e15 Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (llama/13607) 0cc4m 2025-05-19 17:54:08 +02:00
  • e415db0ed7 sync : ggml Georgi Gerganov 2025-05-27 17:06:49 +03:00
  • 2bb7694edb docs : convert README_sycl.md to utf8 format [no ci] (#3191) Daniel Bevenius 2025-05-27 10:53:50 +02:00
  • 450de0787e node : enable no_prints to suppress all output (#3189) Daniel Bevenius 2025-05-27 05:51:47 +02:00
  • ea9f206f18 talk-llama : fix for swedish umlauts + expose model inference settings in talk-llama.cpp (#3187) matteng1 2025-05-26 07:57:39 +02:00
  • 13d92d08ae docs : fix VAD section heading levels (#3186) KITAITI Makoto 2025-05-23 17:38:26 +09:00
  • aab6976465 ci : use dynamic libopenblas.dll for window-blas (#3177) Daniel Bevenius 2025-05-23 05:48:08 +02:00
  • 78b31ca782 server : Add k6 Load Testing Script (#3175) Sacha Arbonel 2025-05-22 10:03:04 +02:00
  • cbe557f9b1 docs : add VAD model download instructions [no ci] (#3180) Daniel Bevenius 2025-05-22 07:49:29 +02:00
  • 273af4aab9 docs : replace typo "]" with ")" in README (#3179) Alpaim 2025-05-22 06:49:44 +03:00
  • bd1cb0c8e3 whisper : remove redundant assignments (#3178) Daniel Bevenius 2025-05-21 13:23:20 +02:00
  • 62dc8f7d7b whisper : update CMakeLists.txt to handle deprecated gpu Warnings (#3163) Jugal Haresh Sheth 2025-05-20 10:58:25 +01:00
  • 2c4b904596 ruby : add GGML_SYCL_DNN option to ruby bindings (#3172) Daniel Bevenius 2025-05-19 17:59:43 +02:00
  • 6b6cf19c65 talk-llama : sync llama.cpp Georgi Gerganov 2025-05-19 13:39:12 +03:00
  • 05501c218d sync : ggml Georgi Gerganov 2025-05-19 13:38:44 +03:00
  • 9da3fc27be CANN: Support MOE Model MUL_MAT_ID (llama/13042) Chenguang Li 2025-05-19 14:21:17 +08:00
  • 2c13651e08 cmake: use the current build config for vulkan-shaders-gen (llama/13595) Gilad S. 2025-05-17 21:26:43 +03:00
  • 13dca86c56 vulkan: move common FA code to flash_attn_base.comp (llama/13556) Jeff Bolz 2025-05-17 16:14:55 +09:00
  • 6d61a09bc4 vulkan: use scalar FA rather than coopmat2 when N==1 (llama/13554) Jeff Bolz 2025-05-17 15:35:47 +09:00
  • 4fedad988b metal : add FA-vec kernel for head size 64 (llama/13583) Georgi Gerganov 2025-05-16 20:32:58 +03:00
  • a8e17a244d sycl : fixed compilation warnings (llama/13582) Łukasz Ślusarczyk 2025-05-16 12:15:29 +02:00
  • 0c76acd08a gguf : use ggml log system (llama/13571) Diego Devesa 2025-05-15 10:13:11 -07:00
  • 27964db1be sycl: simplify bin_bcast_kernel (llama/13383) Atharva Dubey 2025-05-15 16:39:52 +01:00
  • 8081e7a23d sycl: reordered Q4_K MMVQ (llama/13109) Svetlozar Georgiev 2025-05-15 16:35:44 +01:00
  • d807c497a4 sycl: use oneDNN for matrices multiplication (llama/12972) Łukasz Ślusarczyk 2025-05-15 16:53:41 +02:00
  • 8e9bf548f4 arm64: optimize q6_k_q8_k kernel with i8mm (llama/13519) Yibo Cai 2025-05-15 03:53:52 +08:00
  • 0dda27bc0b CUDA: fix crash on large batch size for quant. MoE (llama/13537) Johannes Gäßler 2025-05-14 16:41:02 +02:00
  • ffa4720f25 CUDA: faster Deepseek FA, add Turing support (llama/13435) Johannes Gäßler 2025-05-14 16:08:20 +02:00