
mirror of https://github.com/mudler/LocalAI.git synced 2024-12-28 08:28:51 +00:00

Ettore Di Giacinto c5c77d2b0d

docs: Initial import from localai-website (#1312)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2023-11-22 18:13:50 +01:00

+++
disableToc = false
title = "🦙 AutoGPTQ"
weight = 3
+++

AutoGPTQ is an easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Prerequisites

This is an extra backend. It is already available in the container images, so no setup is required there.

If you are building LocalAI locally, you need to install AutoGPTQ manually.

Model setup

Models are downloaded automatically from Hugging Face the first time they are used, if they are not already present. You can define a model via a YAML config file, or simply query the endpoint using the Hugging Face repository name as the model name. For example, create a YAML config file in models/:

name: orca
backend: autogptq
model_base_name: "orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order"
parameters:
  model: "TheBloke/orca_mini_v2_13b-GPTQ"
# ...

Test with:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "orca",
   "messages": [{"role": "user", "content": "How are you?"}],
   "temperature": 0.1
 }'
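The same request can also be sent from a script. The following is a minimal Python sketch using only the standard library, not an official client. The helper names `build_chat_request` and `chat` are hypothetical. It assumes LocalAI is listening on `localhost:8080` as in the curl example, and that the response follows the OpenAI-style chat completion layout.

```python
import json
import urllib.request

# Assumption: LocalAI is reachable at the same address used in the curl example.
BASE_URL = "http://localhost:8080"


def build_chat_request(model, prompt, temperature=0.1):
    """Build the JSON body for /v1/chat/completions (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def chat(model, prompt, base_url=BASE_URL, timeout=600):
    """POST a chat request and return the decoded JSON response.

    The generous timeout is an assumption: the first call may trigger
    a model download, which can take a while.
    """
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a running instance, `chat("orca", "How are you?")` sends the same payload as the curl command. Assuming an OpenAI-style response, the reply text would be at `["choices"][0]["message"]["content"]`. As noted above, the `model` argument can in principle also be a Hugging Face repository name such as `"TheBloke/orca_mini_v2_13b-GPTQ"`. Whether a given repository also needs settings like `model_base_name` is not covered here.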
