Commit Graph

  • b6f3fa4059 stream.wasm : add HEAPU8 to exported runtime methods (#3130) [master] Daniel Bevenius 2025-05-08 16:58:34 +02:00
  • cb2bd11ee8 sync : ggml Georgi Gerganov 2025-05-07 17:45:14 +03:00
  • 09e6b66025 cuda : remove nrows_x in mul_mat_q_process_tile (llama/13325) R0CKSTAR 2025-05-07 15:48:23 +08:00
  • d41cf26a0f CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (llama/13135) Johannes Gäßler 2025-05-06 23:35:51 +02:00
  • 3c67195be9 SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (llama/13254) Akarshan Biswas 2025-05-06 20:27:06 +05:30
  • f9f78a773f CUDA: fix bad asserts for partial offload (llama/13337) Johannes Gäßler 2025-05-06 13:58:51 +02:00
  • be55e25cac CUDA: fix --split-mode row for MMQ (llama/13323) Johannes Gäßler 2025-05-06 08:36:46 +02:00
  • 2ffdda99e8 CUDA: fix logic for clearing padding with -ngl 0 (llama/13320) Johannes Gäßler 2025-05-05 22:32:13 +02:00
  • 9bbedc51cc SYCL: Disable mul_mat kernels for noncontiguous tensor b (llama/13308) Akarshan Biswas 2025-05-05 13:39:10 +05:30
  • 1e1fa27add rpc : use backend registry, support dl backends (llama/13304) Diego Devesa 2025-05-04 21:25:43 +02:00
  • e1bdd148c5 ggml : activate s390x simd for Q3_K (llama/13301) Aaron Teo 2025-05-05 01:49:12 +08:00
  • 7fa8bb303f CUDA: fix race condition in MMQ stream-k fixup (llama/13299) Johannes Gäßler 2025-05-04 14:16:39 +02:00
  • 7564f5e6f1 CUDA: fix race condition in MMQ ids_dst (llama/13294) Johannes Gäßler 2025-05-04 13:58:38 +02:00
  • 22ba2e27ce vulkan: Additional type support for unary, binary, and copy (llama/13266) Jeff Bolz 2025-05-04 00:17:16 -05:00
  • 0676b2dab2 ci : add bindings-java jar artifact to release (#3126) Daniel Bevenius 2025-05-07 16:26:54 +02:00
  • 4a512cb153 cli : avoid std::exchange Georgi Gerganov 2025-05-07 13:22:47 +03:00
  • 76171ce199 sync : ggml Georgi Gerganov 2025-05-07 13:17:48 +03:00
  • 5eac2a3fbb vulkan : fix lint (llama/0) Georgi Gerganov 2025-05-02 20:57:07 +03:00
  • 42938398f9 ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148) shalinib-ibm 2025-05-02 22:23:12 +05:30
  • a8fe90ae15 rpc : avoid uninitialized memory in serialize_tensor (llama/13210) Justin Santa Barbara 2025-05-01 17:32:11 -04:00
  • c5a5a2da5b ggml: Don't assert fail when tensor data changes (llama/13222) Jesse Gross 2025-05-01 13:46:10 -07:00
  • 8316bfd82b build : fix build info on windows (llama/13239) Diego Devesa 2025-05-01 21:48:08 +02:00
  • fd1cb9fc12 vulkan: Add bfloat16 support (llama/12554) Jeff Bolz 2025-05-01 13:49:39 -05:00
  • 17f6b8225e vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (llama/13191) Jeff Bolz 2025-05-01 13:19:31 -05:00
  • 6374ea32ca vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204) Acly 2025-05-02 18:02:34 +02:00
  • 3a66f9f248 ci : zip windows artifacts for release uploading (#3124) Daniel Bevenius 2025-05-07 13:12:08 +02:00
  • 0055356fbc cli : avoid std::exchange [sync-ggml-25-05-07] Georgi Gerganov 2025-05-07 13:22:47 +03:00
  • eeaa1cd035 sync : ggml Georgi Gerganov 2025-05-07 13:17:48 +03:00
  • a652c8bf72 vulkan : fix lint (llama/0) Georgi Gerganov 2025-05-02 20:57:07 +03:00
  • 0630539c8a ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148) shalinib-ibm 2025-05-02 22:23:12 +05:30
  • a7988d76db rpc : avoid uninitialized memory in serialize_tensor (llama/13210) Justin Santa Barbara 2025-05-01 17:32:11 -04:00
  • 37ac0264ef ggml: Don't assert fail when tensor data changes (llama/13222) Jesse Gross 2025-05-01 13:46:10 -07:00
  • 5a9ccde7da build : fix build info on windows (llama/13239) Diego Devesa 2025-05-01 21:48:08 +02:00
  • cde0e50536 vulkan: Add bfloat16 support (llama/12554) Jeff Bolz 2025-05-01 13:49:39 -05:00
  • df458380d6 vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (llama/13191) Jeff Bolz 2025-05-01 13:19:31 -05:00
  • 87b88ed01c vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204) Acly 2025-05-02 18:02:34 +02:00
  • 9b584b0cc0 ci : add zip extension to xcframework artifact name (#3120) Daniel Bevenius 2025-05-07 12:02:29 +02:00
  • 09846f4e12 whisper: remove MSVC warnings pragmas (#3090) Daniel Bevenius 2025-05-05 13:09:35 +02:00
  • bcf1ed0163 server: update abort mechanism to handle HTTP connection closure (#3112) Sacha Arbonel 2025-05-05 07:16:54 +02:00
  • 934d4b3083 cli : support "-" for stdout like stdin (#3050) Daniel Tang 2025-05-05 01:15:39 -04:00
  • 988dcd4b5b docs : Update cli documentation (#3102) Arpit Jain 2025-05-02 20:18:33 +08:00
  • 9f540ad8cb cmake : removed stdc++fs (#3097) Jared Tweed 2025-05-02 02:41:35 -07:00
  • 1fa17bc752 server : update httplib.h to version 0.20.0 (#3101) Sacha Arbonel 2025-05-02 06:09:41 +02:00
  • 366082d072 ruby : refine HTTP cache feature (#3109) KITAITI Makoto 2025-05-01 23:04:53 +09:00
  • 0778b6ff5f talk-llama : sync llama.cpp Georgi Gerganov 2025-05-01 10:43:30 +03:00
  • 5cd59c9396 sync : ggml Georgi Gerganov 2025-05-01 10:42:48 +03:00
  • d052e64d42 CUDA: batched+noncont MMQ, refactor bs>1 MoE code (llama/13199) Johannes Gäßler 2025-04-30 23:12:59 +02:00
  • 780750a108 vulkan: use uint array index to avoid glslang bug (llama/13193) Jeff Bolz 2025-04-30 07:38:37 -05:00
  • 919c78e618 ggml : fix ppc64le build (llama/13176) shalinib-ibm 2025-04-30 16:47:08 +05:30
  • dc288f84cd feat(ggml-cpu): enable z17 compile (llama/13182) Aaron Teo 2025-04-30 17:47:35 +08:00
  • 1543a3600c CUDA: fix non-cont. inputs for batched mat mul (llama/13155) Johannes Gäßler 2025-04-29 16:00:27 +02:00
  • 4872355f6e fix(rpc): Improve input validation and error handling (llama/13069) Ville Vesilehto 2025-04-28 21:00:20 +03:00
  • 1a76e97c28 SYCL: Add all missing unary kernels (llama/13074) Akarshan Biswas 2025-04-28 15:03:25 +05:30
  • 7017c1d37d musa: fix typo in cc control (llama/13144) R0CKSTAR 2025-04-28 15:33:28 +08:00
  • 670bf02662 CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (llama/13137) Johannes Gäßler 2025-04-28 09:29:26 +02:00
  • 9fff2f751c musa: fix build warning (llama/13129) R0CKSTAR 2025-04-27 19:22:49 +08:00
  • 46392f733f ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (llama/13107) SXX 2025-04-26 22:05:31 +08:00
  • eeb259909e change the reorder tensor from init to execute OP (llama/13003) Neo Zhang Jianyu 2025-04-25 17:37:51 +08:00
  • fe21ddf0dc rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (llama/12943) Radoslav Gerganov 2025-04-25 10:08:08 +03:00
  • 33bdbfbb33 ggml : fix ggml_gallocr_ptr type (ggml/1205) Diego Devesa 2025-04-30 15:20:40 +02:00
  • 0f49edf0f3 whisper : add check that target name exists (#3103) Daniel Bevenius 2025-05-01 10:05:24 +02:00
  • 25efcfe3ed server : add --no-gpu option to print usage output (#3098) Daniel Bevenius 2025-05-01 08:15:12 +02:00
  • edbd4cb7f5 ruby : ignore "Downloading" output in test_log_suppress (#3106) Daniel Bevenius 2025-05-01 08:12:48 +02:00
  • 3ae9b8416a make : fix samples glob pattern (#3100) Georgi Gerganov 2025-04-30 14:21:51 +03:00
  • 10acc21fa3 make : fix samples glob pattern [gg/make-fix-glob] Georgi Gerganov 2025-04-30 14:20:50 +03:00
  • 55d73a13f5 ggml : suppress Windows compiler warnings (#3075) Daniel Bevenius 2025-04-29 15:47:55 +02:00
  • 2e30e6df59 whisper : fix grammar advance stack warning (#3087) Daniel Bevenius 2025-04-28 19:11:38 +02:00
  • f0171f0616 examples : expose language detection probabilities to server example (#3044) Sacha Arbonel 2025-04-28 18:25:45 +02:00
  • b7db9e7aac whisper : remove empty .gitmodules file [no ci] (#3085) Daniel Bevenius 2025-04-28 15:52:05 +02:00
  • f3c42399a3 talk-llama : sync llama.cpp (#3084) Georgi Gerganov 2025-04-28 16:40:23 +03:00
  • 28dcdff4c5 ci : disable publishing of java binding [no ci] (#3086) Daniel Bevenius 2025-04-28 15:38:52 +02:00
  • 50218b935d build : Add Moore Threads GPU support and update GitHub workflow for MUSA build (#3069) R0CKSTAR 2025-04-28 16:06:41 +08:00
  • f9b2dfdd8c examples : fix deprecated FFmpeg functions (#3073) Pedro 2025-04-28 01:16:50 -03:00
  • 50fda73f4c ruby : add encoder begin callback related methods (#3076) KITAITI Makoto 2025-04-26 04:33:11 +09:00
  • 1c20f46887 ci : enable bindings java job (#3070) Daniel Bevenius 2025-04-25 14:56:06 +02:00
  • adaea088bc ruby : add cmake option (#0) Georgi Gerganov 2025-04-24 20:38:43 +03:00
  • 6c0d843f9d cuda : fix unused variable compile warning (#0) Georgi Gerganov 2025-04-24 18:59:06 +03:00
  • efb800557f sync : ggml Georgi Gerganov 2025-04-24 18:41:48 +03:00
  • 337becefb9 opencl : remove obsolete files (skip) (ggml/1200) Georgi Gerganov 2025-04-24 18:41:17 +03:00
  • 11ae30c19e sync : ggml Georgi Gerganov 2025-04-24 18:41:36 +03:00
  • 88c3cecd43 opencl: split ggml-opencl.cl into multiple files and cleanup (llama/12886) lhez 2025-04-24 17:46:49 +03:00
  • fe4acb33e3 ggml : fix trailing whitespaces (llama/0) Georgi Gerganov 2025-04-24 17:22:27 +03:00
  • fd5a3e1bc6 CUDA: use switch statements in constexpr functions (llama/13095) Johannes Gäßler 2025-04-24 15:57:10 +02:00
  • 01e1600edd metal : fix floating-point range of attention scores in FA kernels (llama/13090) Georgi Gerganov 2025-04-24 10:38:30 +03:00
  • cf3eb291ab vulkan: matmul gcn tuning (llama/13016) Eve 2025-04-24 07:18:33 +00:00
  • 3d54b68ea7 CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (llama/13014) Johannes Gäßler 2025-04-22 21:27:40 +02:00
  • 11218294db ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (llama/12871) Diego Devesa 2025-04-21 18:13:51 +02:00
  • 33c89ade7d SYCL: Add non-contiguous support in ROPE (llama/12993) Akarshan Biswas 2025-04-21 19:13:30 +05:30
  • 27a56e7243 vulkan: support noncontiguous rms_norm (llama/13031) Jeff Bolz 2025-04-20 03:50:02 -05:00
  • f4ca3e2f9c metal: add neg operator (llama/13029) Jeffrey Morgan 2025-04-19 22:28:40 -07:00
  • 0287a5c51b SYCL: Refactor and enable FP16 in binary broadcast OPs (llama/12975) Akarshan Biswas 2025-04-18 19:27:56 +05:30
  • 24d29c55df rpc : add RPC_CMD_HELLO (llama/12955) Radoslav Gerganov 2025-04-18 10:13:42 +03:00
  • 36019c35a3 graph : make FA compatible with MLA + add initial Metal kernels (llama/12953) Georgi Gerganov 2025-04-17 18:16:36 +03:00
  • 4e936e2afa ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (llama/12970) Alan Gray 2025-04-17 14:19:42 +01:00
  • 314ce5981e CANN: Add support for async operator submission (llama/12864) hipudding 2025-04-17 20:34:16 +08:00
  • cb7642b0f5 opencl: fix incorrect local_size index in profiling log (llama/12868) kimminsu 2025-04-17 06:25:57 +09:00
  • 7db8f278f0 vulkan: enable coopmat2 FA gqa and split_k optimizations more often (llama/12931) Jeff Bolz 2025-04-16 13:37:25 -05:00
  • be42a19eab CANN: Add 310P operator support check (llama/12962) Chenguang Li 2025-04-16 16:21:05 +08:00
  • b8755670ca metal : add FA-vec kernels for head size 96 (llama/12952) Georgi Gerganov 2025-04-15 14:45:05 +03:00
  • 483eecae62 CANN: Add x86 build ci (llama/12950) hipudding 2025-04-15 19:08:55 +08:00