Commit Graph

35 Commits

Author SHA1 Message Date
Ettore Di Giacinto
f943c4b803
Revert "feat: include tokens usage for streamed output" (#4336)
Revert "feat: include tokens usage for streamed output (#4282)"

This reverts commit 0d6c3a7d57.
2024-12-08 17:53:36 +01:00
Ettore Di Giacinto
cea5a0ea42
feat(template): read jinja templates from gguf files (#4332)
* Read jinja templates as fallback

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Move templating out of model loader

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Test TemplateMessages

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Set role and content from transformers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Tests: be more flexible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* More jinja

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small refactoring and adaptations

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-08 13:50:33 +01:00
mintyleaf
0d6c3a7d57
feat: include tokens usage for streamed output (#4282)
Use pb.Reply instead of []byte with Reply.GetMessage() in llama grpc to get the proper usage data in reply streaming mode at the last [DONE] frame

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-11-28 14:47:56 +01:00
siddimore
50a3b54e34
feat(api): add correlationID to Track Chat requests (#3668)
* Add CorrelationID to chat request

Signed-off-by: Siddharth More <siddimore@gmail.com>

* remove get_token_metrics

Signed-off-by: Siddharth More <siddimore@gmail.com>

* Add CorrelationID to proto

Signed-off-by: Siddharth More <siddimore@gmail.com>

* fix correlation method name

Signed-off-by: Siddharth More <siddimore@gmail.com>

* Update core/http/endpoints/openai/chat.go

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Siddharth More <siddimore@gmail.com>

* Update core/http/endpoints/openai/chat.go

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Siddharth More <siddimore@gmail.com>

---------

Signed-off-by: Siddharth More <siddimore@gmail.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-09-28 17:23:56 +02:00
Ettore Di Giacinto
191bc2e50a
feat(api): allow to pass audios to backends (#3603)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-19 12:26:53 +02:00
Ettore Di Giacinto
fbb9facda4
feat(api): allow to pass videos to backends (#3601)
This prepares the API to receive videos as well for video understanding.

It works similarly to images, where the request should be in the form:

{
 "type": "video_url",
 "video_url": { "url": "url or base64 data" }
}

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-19 11:21:59 +02:00
Ettore Di Giacinto
cf747bcdec
feat: extract output with regexes from LLMs (#3491)
* feat: extract output with regexes from LLMs

This changset adds `extract_regex` to the LLM config. It is a list of
regexes that can match output and will be used to re extract text from
the LLM output. This is particularly useful for LLMs which outputs final
results into tags.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add tests, enhance output in case of configuration error

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-13 13:27:36 +02:00
Ettore Di Giacinto
fbaae8528d
fix(chat): re-generated uuid, created, and text on each request (#3359)
This was noticed by models returning content besides function calls.
Sadly we can't test that easily in the CI so it got unnoticed.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-08-22 10:56:05 +02:00
Ettore Di Giacinto
e198347886
feat(openai): add json_schema format type and strict mode (#3193)
* feat(openai): add json_schema and strict mode

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* handle err vs _

security scanners prefer if we put these branches in, and I tend to agree.

Signed-off-by: Dave <dave@gray101.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
2024-08-07 15:27:02 -04:00
Ettore Di Giacinto
2169c3497d
feat(grammar): add llama3.1 schema (#3015)
* wip

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* get rid of panics

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* expose it properly from the config

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Simplify

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* forgot to commit

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Remove focus on test

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-26 20:11:29 +02:00
Ettore Di Giacinto
5eda7f578d
refactor: break down json grammar parser in different files (#3004)
* refactor: break down json grammar parser in different files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: patch to `refactor_grammars` - propagate errors (#3006)

propagate errors around

Signed-off-by: Dave Lee <dave@gray101.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave Lee <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
2024-07-25 08:41:00 +02:00
Ettore Di Giacinto
bf9dd1de7f
feat(functions): parse broken JSON when we parse the raw results, use dynamic rules for grammar keys (#2912)
* feat(functions): enhance parsing with broken JSON when we parse the raw results

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* breaking: make function name by default

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(grammar): dynamically generate grammars with mutating keys

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor: simplify condition

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-18 17:52:22 +02:00
Ettore Di Giacinto
59ef426fbf
feat(model-list): be consistent, skip known files from listing (#2760)
fix(model-list): be consistent, skip known files from listing

This changeset does two things:

- Removes the dependency of listing models from the OpenAI schema.
- Tries to reduce confusion between ListModels() in model loader and in
  the service - now there is only one ListModels which is in services
and does not depend anymore on the OpenAI schema
- The OpenAI-schema functions were moved nearby the OpenAI specific
  endpoints that needs the schema
- Drops the ListModel Service structure as there was no real need for
  it.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-10 15:28:39 +02:00
Sertaç Özercan
5866fc8ded
chore: fix go.mod module (#2635)
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-06-23 08:24:36 +00:00
Ettore Di Giacinto
3b7a78adda
fix(stream): do not break channel consumption (#2517)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-07 17:20:42 +02:00
Ettore Di Giacinto
3f7212c660
feat(functions): better free string matching, allow to expect strings after JSON (#2445)
Allow now any non-character, both as suffix and prefix when mixed grammars are enabled

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-31 09:36:27 +02:00
Prajwal S Nayak
4d98dd9ce7
feat(image): support response_type in the OpenAI API request (#2347)
* Change response_format type to string to match OpenAI Spec

Signed-off-by: prajwal <prajwalnayak7@gmail.com>

* updated response_type type to interface

Signed-off-by: prajwal <prajwalnayak7@gmail.com>

* feat: correctly parse generic struct

Signed-off-by: mudler <mudler@localai.io>

* add tests

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: prajwal <prajwalnayak7@gmail.com>
Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: mudler <mudler@localai.io>
2024-05-29 14:40:54 +02:00
Ettore Di Giacinto
669cd06dd9
feat(functions): allow parallel calls with mixed/no grammars (#2432)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-28 21:06:09 +02:00
Ettore Di Giacinto
491e1d752b
feat(functions): relax mixedgrammars (#2365)
* feat(functions): relax mixedgrammars

Extend even more the functionalities and when mixed mode is enabled,
tolerate also both strings and JSON in the result - in this case we make
sure that the JSON can be correctly parsed.

This also updates the examples and the gallery model to configure the
grammar.

The changeset also breaks current function/grammar configuration as it
reserves now a stanza in the YAML config.

For example:

```yaml
function:
  grammar:
    # This allows the grammar to also return messages
    mixed_mode: true
    # Suffix to add to the grammar
    # prefix: '<tool_call>\n'
    # Force parallel calls in the grammar
    # parallel_calls: true
```

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor, add a way to disable mixed json and freestring

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix linting issues

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-22 00:14:16 +02:00
Ettore Di Giacinto
beb598e4f9
feat(functions): mixed JSON BNF grammars (#2328)
feat(functions): support mixed JSON BNF grammar

This PR provides new options to control how functions are extracted from
the LLM, and also provides more control on how JSON grammars can be used
(also in conjunction).

New YAML settings introduced:

- `grammar_message`: when enabled, the generated grammar can also decide
  to push strings and not only JSON objects. This allows the LLM to pick
to either respond freely or using JSON.
- `grammar_prefix`: Allows to prefix a string to the JSON grammar
  definition.
- `replace_results`: Is a map that allows to replace strings in the LLM
  result.

As an example, consider the following settings for Hermes-2-Pro-Mistral,
which allow extracting both JSON results coming from the model, and the
ones coming from the grammar:

```yaml
function:
  # disable injecting the "answer" tool
  disable_no_action: true
  # This allows the grammar to also return messages
  grammar_message: true
  # Suffix to add to the grammar
  grammar_prefix: '<tool_call>\n'
  return_name_in_function_response: true
  # Without grammar uncomment the lines below
  # Warning: this is relying only on the capability of the
  # LLM model to generate the correct function call.
  # no_grammar: true
  # json_regex_match: "(?s)<tool_call>(.*?)</tool_call>"
  replace_results:
    "<tool_call>": ""
    "\'": "\""
```

Note: To disable entirely grammars usage in the example above, uncomment the
`no_grammar` and `json_regex_match`.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-15 20:03:18 +02:00
Ettore Di Giacinto
c89271b2e4
feat(llama.cpp): add distributed llama.cpp inferencing (#2324)
* feat(llama.cpp): support distributed llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: let tweak how chat messages are merged together

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Makefile: register to ALL_GRPC_BACKENDS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring, allow disable auto-detection of backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* minor fixups

Signed-off-by: mudler <mudler@localai.io>

* feat: add cmd to start rpc-server from llama.cpp

Signed-off-by: mudler <mudler@localai.io>

* ci: add ccache

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>
2024-05-15 01:17:02 +02:00
Ettore Di Giacinto
84e2407afa
feat(functions): allow to set JSON matcher (#2319)
Signed-off-by: mudler <mudler@localai.io>
2024-05-14 09:39:20 +02:00
Ettore Di Giacinto
efa32a2677
feat(grammar): support models with specific construct (#2291)
When enabling grammar with functions, it might be useful to
allow more flexibility to support models that are fine-tuned against returning
function calls of the form of { "name": "function_name", "arguments" {...} }
rather then { "function": "function_name", "arguments": {..} }.

This might call out to a more generic approach later on, but for the moment being we can easily support both
as we have just to specific different types.

If needed we can expand on this later on

Signed-off-by: mudler <mudler@localai.io>
2024-05-12 01:13:22 +02:00
Ettore Di Giacinto
bbea62b907
feat(functions): support models with no grammar, add tests (#2068)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-18 22:43:12 +02:00
Ettore Di Giacinto
af9e5a2d05
Revert #1963 (#2056)
* Revert "fix(fncall): fix regression introduced in #1963 (#2048)"

This reverts commit 6b06d4e0af.

* Revert "fix: action-tmate back to upstream, dead code removal (#2038)"

This reverts commit fdec8a9d00.

* Revert "feat(grpc): return consumed token count and update response accordingly (#2035)"

This reverts commit e843d7df0e.

* Revert "refactor: backend/service split, channel-based llm flow (#1963)"

This reverts commit eed5706994.

* feat(grpc): return consumed token count and update response accordingly

Fixes: #1920

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-17 23:33:49 +02:00
Dave
eed5706994
refactor: backend/service split, channel-based llm flow (#1963)
Refactor: channel based llm flow and services split

---------

Signed-off-by: Dave Lee <dave@gray101.com>
2024-04-13 09:45:34 +02:00
Ludovic Leroux
12c0d9443e
feat: use tokenizer.apply_chat_template() in vLLM (#1990)
Use tokenizer.apply_chat_template() in vLLM

Signed-off-by: Ludovic LEROUX <ludovic@inpher.io>
2024-04-11 19:20:22 +02:00
cryptk
b85dad0286
feat: first pass at improving logging (#1956)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-04 09:24:22 +02:00
Ettore Di Giacinto
35290e146b
fix(grammar): respect JSONmode and grammar from user input (#1935)
* fix(grammar): Fix JSON mode and custom grammar

* tests(aio): add jsonmode test

* tests(aio): add functioncall test

* fix(aio): use hermes-2-pro-mistral as llm for CPU profile

* add phi-2-orange
2024-03-31 13:04:09 +02:00
Ettore Di Giacinto
957f428fd5
fix(tools): correctly render tools response in templates (#1932)
* fix(tools): allow to correctly display both Functions and Tools

* models(hermes-2-pro): correctly display function results
2024-03-30 19:02:07 +01:00
Ettore Di Giacinto
123a5a2e16
feat(swagger): Add swagger API doc (#1926)
* makefile(build): add minimal and api build target

* feat(swagger): Add swagger
2024-03-29 22:29:33 +01:00
Ettore Di Giacinto
e533dcf506
feat(functions/aio): all-in-one images, function template enhancements (#1862)
* feat(startup): allow to specify models from local files

* feat(aio): add Dockerfile, make targets, aio profiles

* feat(template): add Function and LastMessage

* add hermes2-pro-mistral

* update hermes2 definition

* feat(template): add sprig

* feat(template): expose FunctionCall

* feat(aio): switch llm for text
2024-03-21 01:12:20 +01:00
Dave
1c312685aa
refactor: move remaining api packages to core (#1731)
* core 1

* api/openai/files fix

* core 2 - core/config

* move over core api.go and tests to the start of core/http

* move over localai specific endpoints to core/http, begin the service/endpoint split there

* refactor big chunk on the plane

* refactor chunk 2 on plane, next step: port and modify changes to request.go

* easy fixes for request.go, major changes not done yet

* lintfix

* json tag lintfix?

* gitignore and .keep files

* strange fix attempt: rename the config dir?
2024-03-01 16:19:53 +01:00
Ettore Di Giacinto
db926896bd
Revert "[Refactor]: Core/API Split" (#1550)
Revert "[Refactor]: Core/API Split (#1506)"

This reverts commit ab7b4d5ee9.
2024-01-05 18:04:46 +01:00
Dave
ab7b4d5ee9
[Refactor]: Core/API Split (#1506)
Refactors api folder to core, creates firm split between backend code and api frontend.
2024-01-05 15:34:56 +01:00