+++
disableToc = false
title = "🦙 Exllama"
weight = 2
+++
[Exllama](https://github.com/turboderp/exllama) is "a more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights".
## Prerequisites
This is an extra backend: it is already available in the container images, so no additional setup is needed there. If you are building LocalAI locally, you need to install exllama manually first, as sketched below.
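A minimal install sketch, assuming a Python environment that LocalAI's external backends can reach; the exact integration steps can vary between LocalAI releases, so treat this as illustrative rather than definitive:

```bash
# Clone the upstream exllama project and install its Python
# dependencies (torch, safetensors, sentencepiece, ninja).
git clone https://github.com/turboderp/exllama
pip install -r exllama/requirements.txt
```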
## Model setup
Download the model as a folder inside the models directory and create a YAML file specifying the `exllama` backend. For instance, with the `TheBloke/WizardLM-7B-uncensored-GPTQ` model:
```bash
$ git lfs install
$ cd models && git clone https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ
$ ls models/
.keep  WizardLM-7B-uncensored-GPTQ/  exllama.yaml
$ cat models/exllama.yaml
name: exllama
parameters:
  model: WizardLM-7B-uncensored-GPTQ
backend: exllama
# ...
```
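With the model and configuration in place, point LocalAI at the models directory. A minimal sketch, assuming a locally built `local-ai` binary; flag names may differ between releases:

```bash
# Start LocalAI serving the models directory on the default port.
./local-ai --models-path ./models --address :8080
```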
Test with the following request; note that the `model` field must match the `name` defined in the YAML file:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "exllama",
   "messages": [{"role": "user", "content": "How are you?"}],
   "temperature": 0.1
}'
```