Commit Graph

16 Commits

Author SHA1 Message Date
afa1bca1e3 fix(llama.cpp): set -1 as default for max tokens (#2087)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-20 20:20:10 +02:00
03adc1f60d Add tensor_parallel_size setting to vllm setting items (#2085)
Signed-off-by: Taikono-Himazin <kazu@po.harenet.ne.jp>
2024-04-20 14:37:02 +00:00
bbea62b907 feat(functions): support models with no grammar, add tests (#2068)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-18 22:43:12 +02:00
af9e5a2d05 Revert #1963 (#2056)
* Revert "fix(fncall): fix regression introduced in #1963 (#2048)"

This reverts commit 6b06d4e0af.

* Revert "fix: action-tmate back to upstream, dead code removal (#2038)"

This reverts commit fdec8a9d00.

* Revert "feat(grpc): return consumed token count and update response accordingly (#2035)"

This reverts commit e843d7df0e.

* Revert "refactor: backend/service split, channel-based llm flow (#1963)"

This reverts commit eed5706994.

* feat(grpc): return consumed token count and update response accordingly

Fixes: #1920

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-17 23:33:49 +02:00
eed5706994 refactor: backend/service split, channel-based llm flow (#1963)
Refactor: channel based llm flow and services split

---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-04-13 09:45:34 +02:00
12c0d9443e feat: use tokenizer.apply_chat_template() in vLLM (#1990)
Use tokenizer.apply_chat_template() in vLLM

Signed-off-by: Ludovic LEROUX <ludovic@inpher.io>
2024-04-11 19:20:22 +02:00
8342553214 fix(llama.cpp): set better defaults for llama.cpp (#1961)
fix(defaults): set better defaults for llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-06 22:56:45 +02:00
ff77d3bc22 fix(seed): generate random seed per-request if -1 is set (#1952)
* fix(seed): generate random seed per-request if -1 is set

Also update ci with new workflows and allow the aio tests to run with an
api key

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(openvino): Add OpenVINO example

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-03 22:25:47 +02:00
9bc209ba73 fix(welcome): stable model list (#1949) 2024-04-02 19:25:32 +02:00
e8f02c083f fix(functions): respect when selected from string (#1940)
* fix(functions): respect when selected from string

* fix(toolschoice): decode both string and objects
2024-04-01 19:39:54 +02:00
600152df23 fix(config): pass by config options, respect defaults (#1878)
This bug had the unpleasant effect that it ignored defaults passed by
the CLI. For instance threads could be changed only via model config
file.
2024-03-22 20:55:11 +01:00
843f93e1ab fix(config): default to debug=false if not set (#1853) 2024-03-18 18:59:39 +01:00
b9e77d394b feat(model-help): display help text in markdown (#1825)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-03-13 21:50:46 +01:00
f895d06605 fix(config): set better defaults for inferencing (#1822)
* fix(defaults): set better defaults for inferencing

This changeset aim to have better defaults and to properly detect when
no inference settings are provided with the model.

If not specified, we defaults to mirostat sampling, and offload all the
GPU layers (if a GPU is detected).

Related to https://github.com/mudler/LocalAI/issues/1373 and https://github.com/mudler/LocalAI/issues/1723

* Adapt tests

* Also pre-initialize default seed
2024-03-13 10:05:30 +01:00
939411300a Bump vLLM version + more options when loading models in vLLM (#1782)
* Bump vLLM version to 0.3.2

* Add vLLM model loading options

* Remove transformers-exllama

* Fix install exllama
2024-03-01 22:48:53 +01:00
1c312685aa refactor: move remaining api packages to core (#1731)
* core 1

* api/openai/files fix

* core 2 - core/config

* move over core api.go and tests to the start of core/http

* move over localai specific endpoints to core/http, begin the service/endpoint split there

* refactor big chunk on the plane

* refactor chunk 2 on plane, next step: port and modify changes to request.go

* easy fixes for request.go, major changes not done yet

* lintfix

* json tag lintfix?

* gitignore and .keep files

* strange fix attempt: rename the config dir?
2024-03-01 16:19:53 +01:00