commit 2fc1d20f9e
Author: Ivan
Date:   2024-09-24 19:45:08 +03:00

    cuda: add q8_0->f32 cpy operation (llama/9571)

    llama: enable K-shift for quantized KV cache
    It will fail on unsupported backends or quant types.

commit 709a22b92d
Author: slaren
Date:   2024-09-24 19:45:08 +03:00

    cuda : fix defrag with quantized KV (llama/9319)

commit dd916a2852
Author: slaren
Date:   2024-08-08 22:48:46 +03:00

    ggml : reduce hash table reset cost (llama/8698)

    * ggml : reduce hash table reset cost
    * fix unreachable code warnings after GGML_ASSERT(false)
    * GGML_ASSERT(false) -> GGML_ABORT("fatal error")
    * GGML_ABORT use format string

commit c2c60dc9ba
Author: Clint Herron
Date:   2024-07-08 14:53:55 +03:00

    Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (llama/8258)

commit e30c679928
Author: Georgi Gerganov
Date:   2024-06-26 19:34:09 +03:00

    whisper : reorganize source code + improve CMake (#2256)

    * scripts : update sync [no ci]
    * files : reorganize [no ci]
    * sync : llama.cpp
    * cmake : link math library
    * cmake : build normal ggml library
    * files : move headers to include
    * objc : fix path to ggml-metal.h
    * ci : fix WHISPER_CUDA -> GGML_CUDA
    * scripts : sync LICENSE [no ci]