Johannes Gäßler | d052e64d42 | CUDA: batched+noncont MMQ, refactor bs>1 MoE code (llama/13199) | 2025-05-01 13:29:02 +03:00

Johannes Gäßler | 3d54b68ea7 | CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (llama/13014) | 2025-04-24 20:39:16 +03:00
  * CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
  * fix logic for RoPE support, CUDA graphs

Diego Devesa | 2a4b5c9d7e | cuda : optimize argmax (llama/10441) | 2024-12-08 20:14:35 +02:00
  * cuda : optimize argmax
  * remove unused parameter (ggml-ci)
  * fixup : use full warps (ggml-ci)
  * Apply suggestions from code review
  * fix ub
  * ggml : check ne00 <= INT32_MAX in argmax and argsort
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

slaren | dd916a2852 | ggml : reduce hash table reset cost (llama/8698) | 2024-08-08 22:48:46 +03:00
  * ggml : reduce hash table reset cost
  * fix unreachable code warnings after GGML_ASSERT(false)
  * GGML_ASSERT(false) -> GGML_ABORT("fatal error")
  * GGML_ABORT use format string

Johannes Gäßler | 15d71189e9 | CUDA: optimize and refactor MMQ (llama/8416) | 2024-08-08 22:48:46 +03:00
  * CUDA: optimize and refactor MMQ
  * explicit q8_1 memory layouts, add documentation

Georgi Gerganov | e30c679928 | whisper : reorganize source code + improve CMake (#2256) | 2024-06-26 19:34:09 +03:00
  * scripts : update sync [no ci]
  * files : reorganize [no ci]
  * sync : llama.cpp
  * cmake : link math library
  * cmake : build normal ggml library
  * files : move headers to include
  * objc : fix path to ggml-metal.h
  * ci : fix WHISPER_CUDA -> GGML_CUDA
  * scripts : sync LICENSE [no ci]