mirror of
https://github.com/mudler/LocalAI.git
synced 2025-01-04 11:54:12 +00:00
c5c77d2b0d
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
39 lines
1.2 KiB
Markdown
39 lines
1.2 KiB
Markdown
|
|
+++
|
|
disableToc = false
|
|
title = "🦙 AutoGPTQ"
|
|
weight = 3
|
|
+++
|
|
|
|
[AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) is an easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
|
|
|
|
## Prerequisites
|
|
|
|
This is an extra backend - in the container images is already available and there is nothing to do for the setup.
|
|
|
|
If you are building LocalAI locally, you need to install [AutoGPTQ manually](https://github.com/PanQiWei/AutoGPTQ#quick-installation).
|
|
|
|
|
|
## Model setup
|
|
|
|
The models are automatically downloaded from `huggingface` if not present the first time. It is possible to define models via `YAML` config file, or just by querying the endpoint with the `huggingface` repository model name. For example, create a `YAML` config file in `models/`:
|
|
|
|
```
|
|
name: orca
|
|
backend: autogptq
|
|
model_base_name: "orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order"
|
|
parameters:
|
|
model: "TheBloke/orca_mini_v2_13b-GPTQ"
|
|
# ...
|
|
```
|
|
|
|
Test with:
|
|
|
|
```bash
|
|
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
|
|
"model": "orca",
|
|
"messages": [{"role": "user", "content": "How are you?"}],
|
|
"temperature": 0.1
|
|
}'
|
|
```
|