+++
disableToc = false
title = "vLLM"
weight = 4
+++
vLLM is a fast and easy-to-use library for LLM inference and serving. LocalAI has a built-in integration with vLLM that can be used to run models. You can check out vLLM performance here.
## Setup
Create a YAML file for the model you want to use with vLLM. To set up a model, you just need to specify the model name in the YAML config file:
```yaml
name: vllm
backend: vllm
parameters:
  model: "facebook/opt-125m"

# Uncomment to specify a quantization method (optional)
# quantization: "awq"
```
The backend will automatically download the required files in order to run the model.
## Usage
Use the completions endpoint by specifying the `vllm` backend:
```bash
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
  "model": "vllm",
  "prompt": "Hello, my name is",
  "temperature": 0.1, "top_p": 0.1
}'
```
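Since LocalAI exposes an OpenAI-compatible API, the same request can also be made programmatically. Below is a minimal Python sketch using only the standard library; it assumes LocalAI is listening on `localhost:8080` (adjust `base_url` for your deployment) and mirrors the fields from the curl example above:

```python
import json
import urllib.request

# Completion request payload, same fields as the curl example.
payload = {
    "model": "vllm",
    "prompt": "Hello, my name is",
    "temperature": 0.1,
    "top_p": 0.1,
}

def complete(base_url="http://localhost:8080"):
    """POST the payload to LocalAI's OpenAI-compatible completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    result = complete()
    # As in the OpenAI API, the generated text is in the first choice.
    print(result["choices"][0]["text"])
```

The response follows the standard OpenAI completions schema, so the generated text is read from `choices[0].text`.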