Ludovic Leroux 0135e1e3b9
fix: vllm - use AsyncLLMEngine to allow true streaming mode (#1749)
* fix: use vllm AsyncLLMEngine to bring true stream

Current vLLM implementation uses the LLMEngine, which was designed for offline batch inference, which results in the streaming mode outputing all blobs at once at the end of the inference.

This PR reworks the gRPC server to use asyncio and gRPC.aio, in combination with vLLM's AsyncLLMEngine to bring true stream mode.

This PR also passes more parameters to vLLM during inference (presence_penalty, frequency_penalty, stop, ignore_eos, seed, ...).

* Remove unused import
2024-02-24 11:48:45 +01:00
2024-02-16 15:11:53 +01:00
2023-04-13 01:13:14 +02:00
2023-07-06 21:25:10 +02:00
2023-04-19 18:43:10 +02:00
2024-01-18 19:41:08 +01:00
2024-02-24 00:06:46 +01:00
2024-02-22 16:35:06 +01:00
2023-05-04 15:01:29 +02:00



LocalAI

LocalAI forks LocalAI stars LocalAI pull-requests

💡 Get help - FAQ 💭Discussions 💬 Discord 📖 Documentation website

💻 Quickstart 📣 News 🛫 Examples 🖼️ Models 🚀 Roadmap

testsBuild and Releasebuild container imagesBump dependenciesArtifact Hub

Follow LocalAI_API Join LocalAI Discord Community

LocalAI is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API thats compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU.

🔥🔥 Hot topics / Roadmap

Roadmap

Hot topics (looking for contributors):

If you want to help and contribute, issues up for grabs: https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22

💻 Getting started

For a detailed step-by-step introduction, refer to the Getting Started guide. For those in a hurry, here's a straightforward one-liner to launch a LocalAI instance with phi-2 using docker:

docker run -ti -p 8080:8080 localai/localai:v2.7.0-ffmpeg-core phi-2

🚀 Features

💻 Usage

Check out the Getting started section in our documentation.

🔗 Community and integrations

Build and deploy custom containers:

WebUIs:

Model galleries

UI / Management Programs

Other:

🔗 Resources

📖 🎥 Media, Blogs, Social

Citation

If you utilize this repository, data in a downstream project, please consider citing it with:

@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},

❤️ Sponsors

Do you find LocalAI useful?

Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.

A huge thank you to our generous sponsors who support this project:

Spectro Cloud logo_600x600px_transparent bg
Spectro Cloud
Spectro Cloud kindly supports LocalAI by providing GPU and computing resources to run tests on lamdalabs!

And a huge shout-out to individuals sponsoring the project by donating hardware or backing the project.

🌟 Star history

LocalAI Star history Chart

📖 License

LocalAI is a community-driven project created by Ettore Di Giacinto.

MIT - Author Ettore Di Giacinto

🙇 Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

🤗 Contributors

This is a community project, a special thanks to our contributors! 🤗

Description
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
Readme MIT 154 MiB
Languages
Go 87.4%
Python 3.2%
JavaScript 2.9%
HTML 2.5%
C++ 2.1%
Other 1.8%