+++
disableToc = false
title = "vLLM"
weight = 4
+++
vLLM is a fast and easy-to-use library for LLM inference and serving. LocalAI has a built-in integration with vLLM that can be used to run models. You can check out vLLM performance here.
## Setup
Create a YAML file for the model you want to use with vLLM. To set up a model, you just need to specify the model name in the YAML config file:
```yaml
name: vllm
backend: vllm
parameters:
  model: "facebook/opt-125m"

# Uncomment to specify a quantization method (optional)
# quantization: "awq"
```
The backend will automatically download the required files in order to run the model.
## Usage
Use the completions endpoint by specifying the `vllm` backend:
```bash
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
  "model": "vllm",
  "prompt": "Hello, my name is",
  "temperature": 0.1, "top_p": 0.1
}'
```
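Since LocalAI exposes an OpenAI-compatible API, the same request can also be made programmatically. Below is a minimal Python sketch using only the standard library; it assumes LocalAI is listening on `localhost:8080` (adjust `base_url` for your deployment) and mirrors the fields from the curl example above:

```python
import json
import urllib.request

# Completion request payload, same fields as the curl example.
payload = {
    "model": "vllm",
    "prompt": "Hello, my name is",
    "temperature": 0.1,
    "top_p": 0.1,
}

def complete(base_url="http://localhost:8080"):
    """POST the payload to LocalAI's OpenAI-compatible completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    result = complete()
    # As in the OpenAI API, the generated text is in the first choice.
    print(result["choices"][0]["text"])
```

The response follows the standard OpenAI completions schema, so the generated text is read from `choices[0].text`.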