Commit Graph

377 Commits

Author SHA1 Message Date
Alberto Cabrera Pérez
eee2fe882e sycl : fix powf call in device code (llama/8368) 2024-08-08 22:48:46 +03:00
Mahesh Madhav
0d1a11e5e2 ggml : loop tiling optimizations for scalar path (ggml/898)
Apply a loop tiling technique to the generic path, which provides
performance upside for ISAs with enough registers to take advantage
of it. Also helps the compiler optimize this path.
2024-08-08 22:48:46 +03:00
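A minimal sketch of the loop tiling idea described in the commit message above, not the code from the commit itself. It assumes a plain row-major float matrix-vector product; the function name `matvec_tiled` and the tile height `TILE_ROWS` are hypothetical. Each tile keeps several accumulators live at once, so a single load of `x[c]` is reused across rows. That reuse is the kind of gain that depends on the ISA having enough registers.

```c
#include <stddef.h>

#define TILE_ROWS 4  /* hypothetical tile height, not taken from the commit */

/* Computes y[r] = dot(a[r, :], x) for an nrows x ncols row-major matrix a. */
void matvec_tiled(const float *a, const float *x, float *y,
                  size_t nrows, size_t ncols) {
    size_t r = 0;

    /* Full tiles: TILE_ROWS independent accumulators share each load of x[c]. */
    for (; r + TILE_ROWS <= nrows; r += TILE_ROWS) {
        float acc[TILE_ROWS] = {0.0f};
        for (size_t c = 0; c < ncols; ++c) {
            const float xc = x[c];
            for (size_t t = 0; t < TILE_ROWS; ++t) {
                acc[t] += a[(r + t) * ncols + c] * xc;
            }
        }
        for (size_t t = 0; t < TILE_ROWS; ++t) {
            y[r + t] = acc[t];
        }
    }

    /* Remainder rows that do not fill a whole tile. */
    for (; r < nrows; ++r) {
        float acc = 0.0f;
        for (size_t c = 0; c < ncols; ++c) {
            acc += a[r * ncols + c] * x[c];
        }
        y[r] = acc;
    }
}
```

The fixed-trip-count inner loop over `t` is easy for a compiler to unroll, which is one plausible reading of "helps the compiler optimize this path".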
Ivan Filipov
b2ead7d6f4 ggml: add support for float16 input tensors in pooling operations (ggml/895)
* Add support for float16 tensors in 1d pooling operations

* Add support for float16 input tensors in 2d pooling operations

* code cleanup

remove unnecessary casting during srow ptr initialization

---------

Co-authored-by: vanaka11 <vanaka1189@gmail.com>
2024-08-08 22:48:46 +03:00
Tony Wasserka
8da6fd4dff vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893)
This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.

Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
2024-08-08 22:48:46 +03:00
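The failure mode described in the commit message above can be sketched in plain C. This is an illustration under stated assumptions, not the Vulkan backend code: `handle_t`, `NULL_HANDLE`, `struct buffer`, and every function here are hypothetical stand-ins, so the example builds without `vulkan.h`. The point is that when every handle starts out null, a destroy routine can safely skip whatever was never created.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Vulkan handles and VK_NULL_HANDLE. */
typedef void *handle_t;
#define NULL_HANDLE ((handle_t)NULL)

struct buffer {
    handle_t buf;  /* stand-in for a buffer handle */
    handle_t mem;  /* stand-in for a device memory handle */
};

int live_handles = 0;  /* counts handles that were created and not yet freed */

/* Simulated resource creation; returns NULL_HANDLE when `fail` is nonzero. */
handle_t handle_create(int fail) {
    if (fail) return NULL_HANDLE;
    handle_t h = malloc(1);
    if (h != NULL_HANDLE) live_handles++;
    return h;
}

/* Frees only handles that were actually created, then resets them. */
void buffer_destroy(struct buffer *b) {
    if (b->mem != NULL_HANDLE) { free(b->mem); live_handles--; }
    if (b->buf != NULL_HANDLE) { free(b->buf); live_handles--; }
    b->buf = NULL_HANDLE;
    b->mem = NULL_HANDLE;
}

/* Returns 0 on success, -1 if any step fails (possibly leaving a partial buffer). */
int buffer_create(struct buffer *b, int fail_mem) {
    /* Null-initialize first so a partial failure never leaves garbage handles. */
    b->buf = NULL_HANDLE;
    b->mem = NULL_HANDLE;

    b->buf = handle_create(0);
    if (b->buf == NULL_HANDLE) return -1;

    b->mem = handle_create(fail_mem);  /* e.g. out of device memory */
    if (b->mem == NULL_HANDLE) return -1;

    return 0;
}
```

Without the null initialization at the top of `buffer_create`, `buffer_destroy` would read an indeterminate `mem` value after a failed allocation and could free an invalid pointer.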
Borislav Stanimirov
ab8ec9e940 cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885) 2024-08-08 22:48:46 +03:00
Matt Stephenson
f68298ce06 whisper : use vulkan as gpu backend when available (#2302)
* ggml: use vulkan as gpu backend when available

Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>

* whisper: enable using vk as default buffer type

Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>

---------

Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
2024-07-16 10:21:09 +03:00
Georgi Gerganov
49868aa851 ggml : sync sycl (skip) (#0) 2024-07-08 14:53:55 +03:00
Daniel Bevenius
95f2a191c0 ggml : remove unnecessary UNUSED macro call (ggml/880)
This commit removes an UNUSED macro call that is not needed as the
variable n0 is used in the code and will not produce a warning.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-08 14:53:55 +03:00
Natsu
00422ec3cf cmake : add GGML_BUILD and GGML_SHARED macro definitions (llama/8281) 2024-07-08 14:53:55 +03:00
Ouadie EL FAROUKI
c5b05321e9 Enabled more data types for oneMKL gemm_batch (llama/8236) 2024-07-08 14:53:55 +03:00
Johannes Gäßler
5dc636a65a CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278) 2024-07-08 14:53:55 +03:00
Daniele
73703a144f CUDA: revert part of the RDNA1 optimizations (llama/8309)
The change on the launch_bounds was causing a small performance drop in perplexity of 25 t/s
2024-07-08 14:53:55 +03:00
Johannes Gäßler
e89fdceec2 CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311) 2024-07-08 14:53:55 +03:00
luoyu-intel
29a2739d27 Fix WARP_SIZE=16 bug of Intel GPU (llama/8266)
* fix group_norm ut

* split softmax

* fix softmax

* add concat support condition

* revert debug code

* move QK_WARP_SIZE to presets.hpp
2024-07-08 14:53:55 +03:00
Neo Zhang Jianyu
ee6d17f6b4 rm get_work_group_size() by local cache for performance (llama/8286)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-07-08 14:53:55 +03:00
Daniele
95e90823d9 Define and optimize RDNA1 (llama/8085) 2024-07-08 14:53:55 +03:00
Judd
005cc45df3 fix typo (llama/8267)
Co-authored-by: Judd <foldl@boxvest.com>
2024-07-08 14:53:55 +03:00
Clint Herron
c2c60dc9ba Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (llama/8258) 2024-07-08 14:53:55 +03:00
slaren
4af3194b7c cuda : update supports_op for matrix multiplication (llama/8245) 2024-07-08 14:53:55 +03:00
luoyu-intel
4a2ba1a065 Fix win build conflict of math library (llama/8230)
* fix win build conflict of math library

* fix the condition: !(win32 & SYCL)

* revert warp_size=16
2024-07-08 14:53:55 +03:00
luoyu-intel
f096cc6807 Fix the sub group size of Intel (llama/8106)
* use warp_size macro for all sycl kernels

* fix mask of permute_sub_group_by_xor

* fix rms_norm with correct warp number

* fix rms_norm_f32/group_norm_f32

* move norm to norm.cpp file

* fix quantize bug

* fix mmvq's batch size
2024-07-08 14:53:55 +03:00
Johannes Gäßler
e4bc83ab47 CUDA: refactor and optimize IQ MMVQ (llama/8215)
* CUDA: refactor and optimize IQ MMVQ

* uint -> uint32_t

* __dp4a -> ggml_cuda_dp4a

* remove MIN_CC_DP4A checks

* change default

* try CI fix
2024-07-08 14:53:55 +03:00
zhentaoyu
db7e0dbe6e Update SYCL-Rope op and Refactor (llama/8157)
* align with rope.cu and move sycl-op to a single file
2024-07-08 14:53:55 +03:00
Johannes Gäßler
bf88c94da9 CUDA: fix MMQ stream-k for --split-mode row (llama/8167) 2024-07-08 14:53:55 +03:00
John Balis
3eea171cab feat: cuda implementation for ggml_conv_transpose_1d (ggml/854)
* conv transpose 1d passing test for 1d input and kernel

* working for different input and output channel counts, added test for variable stride

* initial draft appears to work with stride other than 1

* working with all old and new conv1d tests

* added a test for large tensors

* removed use cuda hardcoding

* restored test-conv-transpose.c

* removed unused arguments, and fixed bug where test failure would cause subsequent tests to fail

* fixed accumulator bug

* added test to test-backend-ops

* fixed mistake

* addressed review

* fixed includes

* removed blank lines

* style and warning fixes

* return failure when test fails

* fix supports_op

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-07-08 14:53:55 +03:00
slaren
04e7fa6f4f ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140) 2024-06-26 23:18:11 +03:00
Georgi Gerganov
e30c679928 whisper : reorganize source code + improve CMake (#2256)
* scripts : update sync [no ci]

* files : reorganize [no ci]

* sync : llama.cpp

* cmake : link math library

* cmake : build normal ggml library

* files : move headers to include

* objc : fix path to ggml-metal.h

* ci : fix WHISPER_CUDA -> GGML_CUDA

* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00