mirror of
https://github.com/mudler/LocalAI.git
synced 2024-12-24 06:46:39 +00:00
0135e1e3b9
Common commands for conda environments
Create a new empty conda environment
conda create --name <env-name> python=<python-version> -y
conda create --name autogptq python=3.11 -y
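Running `conda create` again for an environment that already exists prompts to replace it, so a setup script may want to check first. The following is a minimal sketch, assuming `conda` is on the `PATH`; `has_env` and `ensure_env` are hypothetical helper names introduced here, not conda commands.

```shell
#!/bin/sh
# has_env and ensure_env are hypothetical helper names, not conda commands.

# Succeeds if the environment name in $1 appears in the first column of
# `conda env list` output supplied on stdin.
has_env() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Create the environment $1 with Python version $2 unless it already exists.
ensure_env() {
    if conda env list | has_env "$1"; then
        echo "environment $1 already exists"
    else
        conda create --name "$1" python="$2" -y
    fi
}
```

After sourcing the functions, `ensure_env autogptq 3.11` either reports the existing environment or creates it with the same command shown above.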
Activate the environment
With conda 4.4 or later
conda activate autogptq
With conda versions older than 4.4
source activate autogptq
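The choice between the two activation commands can be made from the version string that `conda --version` reports. This is a rough sketch, assuming a dotted version such as `4.12.0`; `activation_cmd` is a hypothetical helper name, and it only prints the command rather than running it.

```shell
#!/bin/sh
# activation_cmd is a hypothetical helper name, not part of conda.
# Usage: activation_cmd <conda-version> <env-name>
# Prints the activation command suited to the given conda version.
activation_cmd() {
    version=$1
    env_name=$2
    major=${version%%.*}   # text before the first dot
    rest=${version#*.}     # text after the first dot
    minor=${rest%%.*}      # text before the next dot
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 4 ]; }; then
        echo "conda activate $env_name"
    else
        echo "source activate $env_name"
    fi
}
```

To use the installed version, pass it in explicitly, for example `activation_cmd "$(conda --version 2>&1 | awk '{print $2}')" autogptq`.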
Install packages into your environment
By using conda
conda install <your-package-name>
Sometimes you need to install a package from the conda-forge channel instead
conda install -c conda-forge <your-package-name>
Or by using pip
pip install <your-package-name>
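The three install commands above can be chained so that each one is tried only if the previous one fails. This is a sketch under assumptions: the target environment is already activated, and `install_pkg` is a hypothetical helper name introduced for illustration.

```shell
#!/bin/sh
# install_pkg is a hypothetical helper name, not part of conda or pip.
# Tries the default channels first, then conda-forge, then pip,
# and reports which source succeeded.
install_pkg() {
    pkg=$1
    if conda install -y "$pkg"; then
        echo "installed $pkg from the default channels"
    elif conda install -y -c conda-forge "$pkg"; then
        echo "installed $pkg from conda-forge"
    elif pip install "$pkg"; then
        echo "installed $pkg with pip"
    else
        echo "could not install $pkg" >&2
        return 1
    fi
}
```

Trying conda before pip keeps as many packages as possible under conda's dependency tracking; pip is used only as a last resort.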