We default to a soft kill, however, we might want to force killing
backends after a while to avoid hanging requests (which may hallucinate
indefinetly)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(refactor): track internally started models by ID
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Just extend options, no need to copy
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Improve debugging for rerankers failures
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify model loading with rerankers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Be more consistent when generating model options
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Uncommitted code
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make deleteProcess more idiomatic
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt CLI for sound generation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup threads definition
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Handle corner case where c.Seed is nil
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Consistently use ModelOptions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt new code to refactoring
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Dave <dave@gray101.com>
* chore(refactor): track grpcProcess in the model structure
This avoids to have to handle in two parts the data relative to the same
model. It makes it easier to track and use mutex with.
This also fixes races conditions while accessing to the model.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): run protogen-go before starting aio tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): install protoc in aio tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(refactor): drop duplicated shutdown logics
- Handle locking in Shutdown and CheckModelIsLoaded in a more go-idiomatic way
- Drop duplicated code and re-organize shutdown code
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: drop leftover
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: improve logging and add missing locks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(shutdown): do not shutdown immediately busy backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(refactor): avoid duplicate functions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: multiplicative backoff for shutdown (#3547)
* multiplicative backoff for shutdown
Rather than always retry every two seconds, back off the shutdown attempt rate?
Signed-off-by: Dave <dave@gray101.com>
* Update loader.go
Signed-off-by: Dave <dave@gray101.com>
* add clamp of 2 minutes
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Dave <dave@gray101.com>
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave <dave@gray101.com>
Signed-off-by: Dave Lee <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
Due to a previous refactor we moved the client constructor tight to the
model address, however that was just a string which we would use to
build the client each time.
With this change we make the loader to return a *Model which carries a
constructor for the client and stores the client on the first
connection.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
fix(model-list): be consistent, skip known files from listing
This changeset does two things:
- Removes the dependency of listing models from the OpenAI schema.
- Tries to reduce confusion between ListModels() in model loader and in
the service - now there is only one ListModels which is in services
and does not depend anymore on the OpenAI schema
- The OpenAI-schema functions were moved nearby the OpenAI specific
endpoints that needs the schema
- Drops the ListModel Service structure as there was no real need for
it.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): op now supports deletion of models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Wire things with WebUI(WIP)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* minor improvements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(template): isolate and add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
* feat(tools): support Tools in the API
Co-authored-by: =?UTF-8?q?Stephan=20A=C3=9Fmus?= <stephan.assmus@sap.com>
* feat(tools): support function streaming
* Adhere to new return types when using tools instead of functions
* Keep backward compatibility with function calling
* Evaluate function names in chat templates
* Disable recovery with --debug
* Correctly stream out the entire result
* Detect when llm chooses to reply and to not perform any action in SSE
* Feedback from code review
---------
Co-authored-by: =?UTF-8?q?Stephan=20A=C3=9Fmus?= <stephan.assmus@sap.com>
* feat: Allow inline templates
* feat: Allow to specify url in model config files
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat: support 'huggingface://' format
* style: reuse-code from gallery
---------
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat: allow to run parallel requests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Aman Karmani <aman@tmm1.net>
Lays some of the groundwork for LLAMA2 compatibility as well as other future models with complex prompting schemes.
Started small refactoring in pkg/model/loader.go regarding template loading. Currently still a part of ModelLoader, but should be easy to add template loading for situations other than overall prompt templates and the new chat-specific per-message templates
Adds support for new chat-endpoint-specific, per-message templates as an alternative to the existing Role: XYZ sprintf method.
Includes a temporary prompt template as an example, since I have a few questions before we merge in the model-gallery side changes (see )
Minor debug logging changes.