mirror of
https://github.com/mudler/LocalAI.git
synced 2024-12-23 14:32:25 +00:00
39 lines
1003 B
Markdown
+++
disableToc = false
title = "🆕 vLLM"
weight = 4
+++

[vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference.

LocalAI has a built-in integration with vLLM that can be used to run models. For `vllm` performance figures, see the [performance section](https://github.com/vllm-project/vllm#performance) of its README.

## Setup

To set up a model, create a YAML configuration file for it. You only need to select the `vllm` backend and specify the model name:
```yaml
name: vllm
backend: vllm
parameters:
  model: "facebook/opt-125m"

# Uncomment to specify a quantization method (optional)
# quantization: "awq"
```

The backend automatically downloads the files required to run the model.

## Usage

Call the `completions` endpoint and pass the model name defined in the YAML file (`vllm` in the example above), which routes the request to the `vllm` backend:
```
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
  "model": "vllm",
  "prompt": "Hello, my name is",
  "temperature": 0.1, "top_p": 0.1
}'
```
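
The same request can also be sent from a script. The following is a minimal Python sketch, not an official client: it uses only the standard library and assumes LocalAI is reachable at `http://localhost:8080` as in the `curl` example. The helper names `build_completion_request` and `complete` are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Assumption: LocalAI listens on localhost:8080, as in the curl example.
BASE_URL = "http://localhost:8080"


def build_completion_request(prompt, model="vllm", temperature=0.1, top_p=0.1):
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": model,  # the `name` field from the YAML config file
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a LocalAI instance running, `complete("Hello, my name is")` should return the parsed JSON body of the response. Building the request separately from sending it makes the payload easy to inspect without a running server.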