+++
disableToc = false
title = "🦙 Exllama"
weight = 2
+++
[Exllama](https://github.com/turboderp/exllama) is "a more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights".
## Prerequisites
This is an extra backend: it is already available in the container images, so no additional setup is needed there. If you are building LocalAI locally, you need to install exllama manually first, as sketched below.
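A minimal install sketch, assuming a Python environment that LocalAI's external backends can reach; the exact integration steps can vary between LocalAI releases, so treat this as illustrative rather than definitive:

```bash
# Clone the upstream exllama project and install its Python
# dependencies (torch, safetensors, sentencepiece, ninja).
git clone https://github.com/turboderp/exllama
pip install -r exllama/requirements.txt
```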
## Model setup
Download the model as a folder inside the models directory and create a YAML file specifying the `exllama` backend. For instance, with the `TheBloke/WizardLM-7B-uncensored-GPTQ` model:
```bash
$ git lfs install
$ cd models && git clone https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ
$ ls models/
.keep  WizardLM-7B-uncensored-GPTQ/  exllama.yaml
$ cat models/exllama.yaml
name: exllama
parameters:
  model: WizardLM-7B-uncensored-GPTQ
backend: exllama
# ...
```
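With the model and configuration in place, point LocalAI at the models directory. A minimal sketch, assuming a locally built `local-ai` binary; flag names may differ between releases:

```bash
# Start LocalAI serving the models directory on the default port.
./local-ai --models-path ./models --address :8080
```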
Test with the following request; note that the `model` field must match the `name` defined in the YAML file:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "exllama",
   "messages": [{"role": "user", "content": "How are you?"}],
   "temperature": 0.1
}'
```