Georgi Gerganov
de4d067f1e
talk-llama : sync llama.cpp
2024-03-15 14:21:59 +02:00
Georgi Gerganov
8e409d1113
talk-llama : sync llama.cpp
2024-03-08 11:55:50 +02:00
Georgi Gerganov
05d1b61af4
talk-llama : sync llama.cpp
2024-03-08 11:52:47 +02:00
Georgi Gerganov
25d313b38b
talk-llama : sync llama.cpp
2024-02-28 13:04:05 +02:00
Georgi Gerganov
3170841ed9
talk-llama : sync llama.cpp
2024-02-25 20:00:10 +02:00
Georgi Gerganov
a2506909b1
talk-llama : sync llama.cpp
2024-02-22 23:30:53 +02:00
Georgi Gerganov
59119f4f20
talk-llama : sync llama.cpp
2024-02-20 12:09:57 +02:00
Georgi Gerganov
551529290d
talk-llama : sync llama.cpp
2024-02-12 10:39:58 +02:00
Georgi Gerganov
02b4c52c12
talk-llama : sync llama.cpp
2024-02-10 10:10:59 +02:00
Georgi Gerganov
e72e4158de
talk-llama : sync llama.cpp
2024-01-28 19:44:10 +02:00
Georgi Gerganov
ef3c9ed9eb
talk-llama : sync llama.cpp
2024-01-27 17:24:53 +02:00
Georgi Gerganov
1f50a7d29f
sync : llama.cpp
2024-01-17 21:23:33 +02:00
Georgi Gerganov
2a5874441d
talk-llama : llama.cpp
2024-01-14 11:06:28 +02:00
Georgi Gerganov
f001a3b7b6
talk-llama : sync llama.cpp
2024-01-14 00:13:17 +02:00
Georgi Gerganov
40ae0962f4
talk-llama : sync llama.cpp
2024-01-12 22:04:51 +02:00
Georgi Gerganov
00b7a4be02
talk-llama : sync llama.cpp
2024-01-11 22:10:10 +02:00
Georgi Gerganov
3b8c2dff57
talk-llama : sync latest llama.cpp
2024-01-06 17:22:57 +02:00
Georgi Gerganov
3a5302108d
sync : ggml (ggml_scale, ggml_row_size, etc.) ( #1677 )
* sync : ggml
* sync : llama.cpp
* talk-llama : fix obsolete param
* ggml-alloc : fix ggml_tallocr_is_own
* talk.wasm : update to new ggml
* ggml : fix type punning in ggml_scale
* ggml : cuda jetson + arm quants warnings
2023-12-22 17:53:39 +02:00
Georgi Gerganov
f96e1c5b78
sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) ( #1422 )
* sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.)
* metal : allow env metal variable to override resource path (#1415 )
* Allow env variable to override resource path
* Update ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* sync : restore common / main from `master`
* sync : restore whisper from `master`
* talk-llama : update to latest llama.cpp
* ruby : fix build
* ggml : fix 32-bit ARM build
* ggml : fix MIN / MAX macro collisions + update ios bindings
* ggml : fix ifdefs and MIN / MAX again
* examples : fix Obj-C and Swift examples
* ggml : fix 32-bit ARM compatibility
* ggml : one more attempt to fix 32-bit ARM compat
* whisper : fix support for larger graphs
---------
Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>
2023-11-03 21:35:05 +02:00
Georgi Gerganov
1ca4041b86
talk-llama : update to latest llama.cpp
2023-09-15 20:06:31 +03:00
Georgi Gerganov
77eab3fbfe
talk-llama : sync latest llama.cpp ( close #922 , close #954 )
2023-05-23 14:04:39 +03:00
Georgi Gerganov
0cb820e0f9
talk-llama : fix build + sync latest llama.cpp
2023-05-14 18:46:42 +03:00
Luis Herrera
4e4d00c67a
talk-llama : only copy used KV cache in get / set state ( #890 )
---------
Co-authored-by: ejones <evan.q.jones@gmail.com>
2023-05-08 20:59:21 +03:00
Luis Herrera
be5911a9f3
talk-llama : add --session support ( #845 )
* feat: adding session support
* readme: adding --session info in examples/talk-llama
* llama: adding session fixes
* readme: updating session doc
* talk-llama: update the value of need_to_save_session to true in order to save the session in the subsequent interaction
* talk-llama: adding missing function which updates session_tokens
2023-05-01 20:18:10 +03:00
Georgi Gerganov
794b162a46
whisper : add integer quantization support ( #540 )
* whisper : add integer quantization support
* examples : add common-ggml + prepare to add "quantize" tool
* whisper : quantization tool ready
* whisper : fix F32 support
* whisper : try to fix shared lib linkage
* wasm : update quantized models to Q5
* bench.wasm : remove "medium" button
* bench.wasm : fix custom model button
* ggml : add Q5_0 and Q5_1 WASM SIMD
* wasm : add quantized models to all WASM examples
* wasm : bump DB version number to 2
* talk-llama : update example to latest llama.cpp
* node : increase test timeout to 10s
* readme : add information for model quantization
* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
Georgi Gerganov
ea36831459
talk-llama : update to latest llama.cpp (improved performance)
2023-04-10 22:59:13 +03:00
Georgi Gerganov
4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp ( #664 )
* talk-llama : talk with LLaMA AI
* talk.llama : disable EOS token
* talk-llama : add README instructions
* ggml : fix build in debug
2023-03-27 21:00:32 +03:00