Georgi Gerganov
d111a0987e
ggml : adjust is_first_call init value (llama/10193)
...
ggml-ci
2024-11-15 15:21:04 +02:00
Georgi Gerganov
915bcd2c63
metal : add quantized FA support (llama/10149)
...
* metal : add quantized FA (vec) support
ggml-ci
* metal : add quantized FA (non-vec) support
* metal : fix support check
ggml-ci
* metal : clean-up
* metal : clean-up (cont)
* metal : fix shared memory calc + reduce smem + comments
* metal : float-correctness
* metal : minor [no ci]
2024-11-15 15:21:04 +02:00
Diego Devesa
f69c8b6f1b
ggml : fix arch check in bf16_to_fp32 (llama/10164)
2024-11-15 15:21:04 +02:00
Eve
8c9044bef0
Q6_K AVX improvements (llama/10118)
...
* q6_k instruction reordering attempt
* better subtract method
* should be theoretically faster
small improvement with shuffle lut, likely because all loads are already done at that stage
* optimize bit fiddling
* handle -32 offset separately. bsums exists for a reason!
* use shift
* Update ggml-quants.c
* have to update ci macos version to 13 as 12 doesn't work now. 13 is still x86
2024-11-15 15:21:04 +02:00
Diego Devesa
5f8e928194
ggml : fix gelu tables initialization (llama/10172)
2024-11-15 15:21:04 +02:00
Diego Devesa
25da30bd60
ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (llama/10167)
2024-11-15 15:21:04 +02:00
snadampal
542734100e
fix build break on arm64 linux (llama/10166)
...
This fixes the build break from the recent changes
to move the CPU backend to separate files
https://github.com/ggerganov/llama.cpp/pull/10144
2024-11-15 15:21:04 +02:00
Diego Devesa
b06b4c0c08
cuda : clear error after changing peer access (llama/10153)
2024-11-15 15:21:04 +02:00
Georgi Gerganov
939d36fb4c
metal : simplify f16 and f32 dequant kernels (llama/0)
2024-11-15 15:21:04 +02:00
Georgi Gerganov
1471e41180
metal : move dequantize templates to beginning of MSL source (llama/0)
2024-11-15 15:21:04 +02:00
leo-pony
35949192e9
CANN: adjust backend registry refactor. (llama/10158)
...
Remove buffer->iface.get_name, which was used in CANN, as it was removed in the backend registry refactor PR.
2024-11-15 15:21:04 +02:00
Diego Devesa
9c817edb48
ggml : move CPU backend to a separate file (llama/10144)
2024-11-15 15:21:04 +02:00
Georgi Gerganov
24a0feb5d9
metal : minor fixup in FA kernel (llama/10143)
...
* metal : minor fixup in FA kernel
ggml-ci
* metal : use the unrolled loop variable
* metal : remove unused var
2024-11-15 15:21:04 +02:00
Diego Devesa
2ab8cce7e3
llama : add simple-chat example (llama/10124)
...
* llama : add simple-chat example
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-11-15 15:21:04 +02:00
Diego Devesa
b40c255e98
llama : use smart pointers for ggml resources (llama/10117)
2024-11-15 15:21:04 +02:00
Shupei Fan
ec3e16445e
vulkan : improve ggml_vk_create_buffer error handling (llama/9898)
2024-11-15 15:21:04 +02:00
Georgi Gerganov
0665168ef3
ggml : remove ggml_scratch (llama/10121)
...
ggml-ci
2024-11-15 15:21:04 +02:00
Zhenwei Jin
5f6b992eea
build: fix build error in Windows env with OneAPI setup (llama/10107)
2024-11-15 15:21:04 +02:00
Diego Devesa
3e231ab9cc
llama : fix buffer checks for mamba and rwk (llama/10111)
...
* llama : fix buffer checks for mamba and rwk
* llama : fix missing worst case flag during reserve
* cuda : fix supports_op for norm
* disable sched SET_CAUSE
2024-11-15 15:21:04 +02:00
Diego Devesa
371bfaca8c
ggml : check tensor name lengths in gguf files (llama/10100)
2024-11-15 15:21:04 +02:00
Sergio López
91e30a3a23
kompute: add mul_mat_q4_k shader (llama/10097)
...
This is a more or less direct translation from the Metal implementation
to GLSL.
Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-11-15 15:21:04 +02:00
Sergio López
1e122d66f9
kompute: add backend registry / device interfaces (llama/10045)
...
Get in line with the other backends by supporting the newer
backend/device registry interfaces.
Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-11-15 15:21:04 +02:00
Diego Devesa
63a4e09a0f
ggml : fix memory leaks when loading invalid gguf files (llama/10094)
...
* ggml : fix gguf string leak when reading kv pairs fails
* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type
* ggml : avoid crashing on failed memory allocations when loading a gguf file
2024-11-15 15:21:04 +02:00
xctan
75dd198870
ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (llama/10029)
...
* ggml : RISC-V vector gemv for q4_0_8x8
* ggml : Added WIP rvv q4_0_8x8 gemm
* ggml : Added initial implementation of rvv gemm
* ggml : optimize gemm to avoid register spillover
* ggml : Fix GCC rvv load alignment issue
* ggml : Format gemm rvv code
* ggml : Fix a typo in RVV q4_0_8_8 GEMM
2024-11-15 15:21:04 +02:00
Diego Devesa
1d48457aa6
llama : refactor model loader with backend registry (llama/10026)
2024-11-15 15:21:04 +02:00
Changyeon Kim
307712a903
ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (llama/9763)
...
* ggml: Add POOL2D OP for GPU ACC to the Vulkan.
- The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
- A GGML_OP_POOL_2D shader has been added. (Pooling)
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* [fix] Correct the incorrect order of the parameters.
fix casting to int.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
---------
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
2024-11-15 15:21:04 +02:00
R0CKSTAR
fbc9a05ddf
musa: workaround for Guilty Lockup in cleaning src0 (llama/10042)
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-11-15 15:21:04 +02:00
Yuri Khrustalev
28496ac55e
cmake : make it possible to link ggml as an external lib (ggml/1003)
2024-11-15 15:21:04 +02:00
Plamen Minev
b1c06c09b0
metal : fix minor string leaks (ggml/1004)
2024-11-15 15:21:04 +02:00
Georgi Gerganov
498ac0dc27
scripts : update sync
2024-11-15 15:21:04 +02:00
Raiya Araki
03af461de8
ci : fix building workflow for linux/arm64 container (#2555)
2024-11-15 11:07:17 +02:00
KITAITI Makoto
f19463ece2
ruby : extend API (#2551)
...
* Handle objs in Ruby code
* Add task to make Makefile
* Share common constants in test suites
* Add model-related APIs
* Add Whisper::Model class
* Add tests for Whisper::Model
* Add missing LDFLAG -lstdc++
* Add tests for Whisper.log_set
* Add Whisper.set_log
* Define log level
* Add document on logging
* Add license section to README
* Add document on Whisper::Model
* Fix examples in README
* Add test for Model with GC
* Make dependency on Makefile more accurate
* Fix bug about Whisper::Model and GC
2024-11-13 21:52:56 +02:00
Jhen-Jie Hong
5f8a086e22
whisper.swiftui : add model download list & bench methods (#2546)
...
* swift : fix resources & exclude build
* whisper : impl whisper_timings struct & api
* whisper.swiftui : model list & bench methods
* whisper : return ptr for whisper_get_timings
* revert unnecessary change
* whisper : avoid designated initializer
* whisper.swiftui: code style changes
* whisper.swiftui : get device name / os from UIDevice
* whisper.swiftui : fix UIDevice usage
* whisper.swiftui : add memcpy and ggml_mul_mat (commented)
2024-11-13 21:51:34 +02:00
Wilson Silva
a28d82e373
ruby : fix the instructions (#2548)
...
#prompt doesn't exist but #initial_prompt does
2024-11-13 21:47:42 +02:00
thewh1teagle
5ccca19f0c
ggml : vulkan logs (#2547)
2024-11-13 21:47:15 +02:00
Stefan Sydow
300c07b94d
examples : fix ffmpeg v5 build (#2543)
...
remove call to 'av_register_all()' which does not exist in ffmpeg v5 anymore.
2024-11-13 21:41:52 +02:00
Vin Misra
31aea563a8
whisper : fix extra memory usage (#2534)
...
* passing samples_padded by ref to the threads.
---------
Co-authored-by: Vinith Misra <physicsdemon@gmail.com>
2024-11-06 23:02:11 +02:00
Georgi Gerganov
0377596b77
whisper : backend registry init before model load
2024-11-01 10:19:05 +02:00
Georgi Gerganov
c65d0fd3c8
talk-llama : sync llama.cpp
2024-11-01 10:19:05 +02:00
Georgi Gerganov
d9efb664ac
sync : ggml
2024-11-01 10:19:05 +02:00
Ma Mingfei
b5b4b0f5de
ggml : add AMX backend (llama/8998)
2024-11-01 10:19:05 +02:00
Georgi Gerganov
ab36d02560
metal : support permuted matrix multiplications (llama/10033)
...
* metal : support permuted matrix multiplications
ggml-ci
* cont : use nb01 directly for row steps
ggml-ci
* cont : add comments [no ci]
* metal : minor refactor
* metal : minor
2024-11-01 10:19:05 +02:00
Johannes Gäßler
6e67749c00
CUDA: fix insufficient buffer clearing for MMQ (llama/10032)
2024-11-01 10:19:05 +02:00
Johannes Gäßler
ab0385f43b
CUDA: fix MMQ for non-contiguous src0, add tests (llama/10021)
...
* CUDA: fix MMQ for non-contiguous src0, add tests
* revise test code
2024-11-01 10:19:05 +02:00
bssrdf
10eb603a3c
increase cuda_cpy block size (ggml/996)
...
Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-11-01 10:19:05 +02:00
Jun Hee Yoo
a3231b2f2e
metal : add POOL2D and fix IM2COL (llama/9943)
...
* add pool_2d
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* fix im2col and add unittest for N>=1024
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* add tests for N % 1024 != 0
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* remove trailing whitespaces
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* apply suggestions
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* apply more optimization
- original IM2COL kernel + _ext with MIN()
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* apply review: change kernel name of pool_2d
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* apply review
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* fix more formatting and enhance readability
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
---------
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
2024-11-01 10:19:05 +02:00
leo-pony
13db492f83
Adapt to dynamically loadable backends mechanism (llama/9970)
...
* [CANN] Adapt to dynamically loadable backends mechanism
* Fix the bug: inference output is garbled when running in debug mode for LM models whose type is in the Q4_0 class
* Handle the review comments of this pull request
2024-11-01 10:19:05 +02:00
Georgi Gerganov
741c138aa1
ggml : add asserts for type conversion in fattn kernels (llama/9971)
...
ggml-ci
2024-11-01 10:19:05 +02:00
Radoslav Gerganov
25f9fee6fb
rpc : pack only RPC structs (llama/9959)
2024-11-01 10:19:05 +02:00
Neo Zhang Jianyu
7c1570bee6
fix mul_mat_vec_q and *_vec_q error (llama/9939)
...
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-11-01 10:19:05 +02:00