Georgi Gerganov
8ecb2f1f68
cmake : remove install of llama convert script [no ci] (#2266)
2024-07-08 14:53:55 +03:00
Georgi Gerganov
5226c3d45c
make : remove llama prints [no ci] (#2265)
2024-07-08 14:53:55 +03:00
Georgi Gerganov
dbf9c15e30
talk-llama : sync llama.cpp
2024-07-08 14:53:55 +03:00
Georgi Gerganov
d3f6c34976
examples : fix compile warnings [no ci] (#0)
2024-07-08 14:53:55 +03:00
Georgi Gerganov
425e2910a3
sync : ggml
2024-07-08 14:53:55 +03:00
Georgi Gerganov
49868aa851
ggml : sync sycl (skip) (#0)
2024-07-08 14:53:55 +03:00
Georgi Gerganov
ff08e30ab5
scripts : fix sync scripts
2024-07-08 14:53:55 +03:00
Daniel Bevenius
95f2a191c0
ggml : remove unnecessary UNUSED macro call (ggml/880)
...
This commit removes an UNUSED macro call that is not needed as the
variable n0 is used in the code and will not produce a warning.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-08 14:53:55 +03:00
Natsu
00422ec3cf
cmake : add GGML_BUILD and GGML_SHARED macro definitions (llama/8281)
2024-07-08 14:53:55 +03:00
Ouadie EL FAROUKI
c5b05321e9
Enabled more data types for oneMKL gemm_batch (llama/8236)
2024-07-08 14:53:55 +03:00
Johannes Gäßler
5dc636a65a
CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278)
2024-07-08 14:53:55 +03:00
Daniele
73703a144f
CUDA: revert part of the RDNA1 optimizations (llama/8309)
...
The change on the launch_bounds was causing a small performance drop in perplexity of 25 t/s
2024-07-08 14:53:55 +03:00
Johannes Gäßler
e89fdceec2
CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311)
2024-07-08 14:53:55 +03:00
luoyu-intel
29a2739d27
Fix WARP_SIZE=16 bug of Intel GPU (llama/8266)
...
* fix group_norm ut
* split softmax
* fix softmax
* add concat support condition
* revert debug code
* move QK_WARP_SIZE to presets.hpp
2024-07-08 14:53:55 +03:00
Neo Zhang Jianyu
ee6d17f6b4
rm get_work_group_size() by local cache for performance (llama/8286)
...
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-07-08 14:53:55 +03:00
Daniele
95e90823d9
Define and optimize RDNA1 (llama/8085)
2024-07-08 14:53:55 +03:00
Judd
005cc45df3
fix typo (llama/8267)
...
Co-authored-by: Judd <foldl@boxvest.com>
2024-07-08 14:53:55 +03:00
Clint Herron
c2c60dc9ba
Removes multiple newlines at the end of files that are breaking the editorconfig step of CI. (llama/8258)
2024-07-08 14:53:55 +03:00
slaren
4af3194b7c
cuda : update supports_op for matrix multiplication (llama/8245)
2024-07-08 14:53:55 +03:00
luoyu-intel
4a2ba1a065
Fix win build conflict of math library (llama/8230)
...
* fix win build conflict of math library
* fix the condition: !(win32 & SYCL)
* revert warp_size=16
2024-07-08 14:53:55 +03:00
luoyu-intel
f096cc6807
Fix the sub group size of Intel (llama/8106)
...
* use warp_size macro for all sycl kernels
* fix mask of permute_sub_group_by_xor
* fix rms_norm with correct warp number
* fix rms_norm_f32/group_norm_f32
* move norm to norm.cpp file
* fix quantize bug
* fix mmvq's batch size
2024-07-08 14:53:55 +03:00
Johannes Gäßler
e4bc83ab47
CUDA: refactor and optimize IQ MMVQ (llama/8215)
...
* CUDA: refactor and optimize IQ MMVQ
* uint -> uint32_t
* __dp4a -> ggml_cuda_dp4a
* remove MIN_CC_DP4A checks
* change default
* try CI fix
2024-07-08 14:53:55 +03:00
zhentaoyu
db7e0dbe6e
Update SYCL-Rope op and Refactor (llama/8157)
...
* align with rope.cu and move sycl-op to a single file
2024-07-08 14:53:55 +03:00
Johannes Gäßler
bf88c94da9
CUDA: fix MMQ stream-k for --split-mode row (llama/8167)
2024-07-08 14:53:55 +03:00
John Balis
3eea171cab
feat: cuda implementation for ggml_conv_transpose_1d (ggml/854)
...
* conv transpose 1d passing test for 1d input and kernel
* working for different input and output channel counts, added test for variable stride
* initial draft appears to work with stride other than 1
* working with all old and new conv1d tests
* added a test for large tensors
* removed use cuda hardcoding
* restored test-conv-transpose.c
* removed unused arguments, and fixed bug where test failure would cause subsequent tests to fail
* fixed accumulator bug
* added test to test-backend-ops
* fixed mistake
* addressed review
* fixed includes
* removed blank lines
* style and warning fixes
* return failure when test fails
* fix supports_op
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-07-08 14:53:55 +03:00
Georgi Gerganov
64a56ebf13
ci : disable java build
2024-07-08 14:26:59 +03:00
Emmanuel Schmidbauer
bec9836849
server : add inference path to make OAI API compatible (#2270)
2024-07-08 14:24:58 +03:00
Georgi Gerganov
c118733a29
sync : ggml + fix sync script
2024-06-26 23:20:19 +03:00
Georgi Gerganov
bb3dd45524
make : disable CUDA graphs
2024-06-26 23:20:13 +03:00
slaren
04e7fa6f4f
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140)
2024-06-26 23:18:11 +03:00
Georgi Gerganov
9f7f36d4c9
make : disable CUDA mel build
2024-06-26 22:25:25 +03:00
Georgi Gerganov
4a62efbb95
cmake : minor fixes
2024-06-26 21:42:39 +03:00
Georgi Gerganov
0a55a70b9b
make : fix missing -O3
...
same as https://github.com/ggerganov/llama.cpp/pull/8143
2024-06-26 21:21:12 +03:00
Georgi Gerganov
dc8cc2dd6f
whisper : disable CUDA mel + fix FFMPEG
2024-06-26 20:11:38 +03:00
Georgi Gerganov
3efedb9511
sync : ggml
2024-06-26 19:40:23 +03:00
Georgi Gerganov
e30c679928
whisper : reorganize source code + improve CMake (#2256)
...
* scripts : update sync [no ci]
* files : reorganize [no ci]
* sync : llama.cpp
* cmake : link math library
* cmake : build normal ggml library
* files : move headers to include
* objc : fix path to ggml-metal.h
* ci : fix WHISPER_CUDA -> GGML_CUDA
* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00
mky_coder
bf4cb4abad
whisper : optimize fft() function (#2242)
...
Co-authored-by: Mike Fan <60965742+mike-fzy@users.noreply.github.com>
2024-06-18 18:10:33 +03:00
Georgi Gerganov
e293f17d34
talk-llama : sync llama.cpp
2024-06-18 09:45:37 +03:00
Georgi Gerganov
5d950c4b8d
whisper : use ggml_backend_sched (#2239)
...
* whisper : use ggml_backend_sched (wip)
* use sched in whisper_allocr
* whisper : single backend in whisper_context
* whisper : remove whisper_state->backends_used
* whisper : remove whisper_context->backend
* whisper : reset scheduler after init
* whisper : fix external encoder (e.g. CoreML)
* whisper : cleanup
* whisper : handle null GPU buffer types + fix sycl
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-18 09:39:40 +03:00
Georgi Gerganov
820446e230
fix : remove extra files
2024-06-18 09:39:40 +03:00
Georgi Gerganov
54d5823ebe
scripts : sync ggml-blas
2024-06-18 09:39:40 +03:00
Georgi Gerganov
5181494e9f
build : update make / cmake
2024-06-18 09:39:40 +03:00
Georgi Gerganov
4a6e6e8b30
sync : ggml
2024-06-18 09:39:40 +03:00
slaren
de29b193f6
move BLAS to a separate backend (cont) (llama/6210)
...
ggml-ci
2024-06-18 09:39:40 +03:00
0cc4m
922971041b
Vulkan Shader Refactor, Memory Debugging Option (llama/7947)
...
* Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into vulkan-shaders directory
* Improve debug log code
* Add memory debug output option
* Fix flake8
* Fix unnecessarily high llama-3 VRAM use
2024-06-18 09:39:40 +03:00
Georgi Gerganov
63a767a134
scripts : stop sync whisper example from ggml
2024-06-18 09:39:40 +03:00
Georgi Gerganov
30841fa786
cmake : fix sycl build (#0)
2024-06-16 18:19:48 +03:00
Georgi Gerganov
3b1ac03828
ggml : remove OpenCL (#0)
2024-06-16 18:19:48 +03:00
Georgi Gerganov
990de617b5
sycl : sync (#0)
2024-06-16 18:19:48 +03:00
Georgi Gerganov
6975600b4b
cuda : enable CUDA graphs (#0)
2024-06-16 18:19:48 +03:00