+++
disableToc = false
title = "Model compatibility"
weight = 4
+++
LocalAI is compatible with the models supported by llama.cpp, and also supports GPT4ALL-J and cerebras-GPT in the ggml format.
{{% notice note %}}
LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.
{{% /notice %}}
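For instance, a minimal model configuration pinning a model to the `gpt4all-j` backend might look like the sketch below; the file name and model file are illustrative placeholders, not shipped defaults:

```yaml
# models/gpt4all-j.yaml -- illustrative sketch, adjust names to your setup
name: my-gpt4all-j            # model name exposed through the API
backend: gpt4all-j            # a backend name from the compatibility table below
parameters:
  model: ggml-gpt4all-j.bin   # model file placed in your models directory
```

A request for `my-gpt4all-j` will then always be served by the `gpt4all-j` backend instead of relying on automatic detection.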
## Hardware requirements
Depending on the model you are attempting to run, you might need more RAM or CPU resources; see the llama.cpp documentation for indicative memory requirements of `ggml`-based backends. `rwkv` is less expensive on resources.
## Model compatibility table
Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all the compatible model families and the associated binding repositories.
Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
---|---|---|---|---|---|---|
[llama.cpp]({{%relref "model-compatibility/llama-cpp" %}}) | Vicuna, Alpaca, LLaMa | yes | GPT and Functions | yes** | yes | CUDA, openCL, cuBLAS, Metal |
gpt4all-llama | Vicuna, Alpaca, LLaMa | yes | GPT | no | yes | N/A |
gpt4all-mpt | MPT | yes | GPT | no | yes | N/A |
gpt4all-j | GPT4ALL-J | yes | GPT | no | yes | N/A |
falcon-ggml (binding) | Falcon (*) | yes | GPT | no | no | N/A |
gpt2 (binding) | GPT2, Cerebras | yes | GPT | no | no | N/A |
dolly (binding) | Dolly | yes | GPT | no | no | N/A |
gptj (binding) | GPTJ | yes | GPT | no | no | N/A |
mpt (binding) | MPT | yes | GPT | no | no | N/A |
replit (binding) | Replit | yes | GPT | no | no | N/A |
gptneox (binding) | GPT NeoX, RedPajama, StableLM | yes | GPT | no | no | N/A |
starcoder (binding) | Starcoder | yes | GPT | no | no | N/A |
bloomz (binding) | Bloom | yes | GPT | no | no | N/A |
rwkv (binding) | rwkv | yes | GPT | no | yes | N/A |
bert (binding) | bert | no | Embeddings only | yes | no | N/A |
whisper | whisper | no | Audio | no | no | N/A |
stablediffusion (binding) | stablediffusion | no | Image | no | no | N/A |
langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
piper (binding) | Any piper onnx model | no | Text to voice | no | no | N/A |
falcon (binding) | Falcon *** | yes | GPT | no | yes | CUDA |
huggingface-embeddings sentence-transformers | BERT | no | Embeddings only | yes | no | N/A |
bark | bark | no | Audio generation | no | no | yes |
AutoGPTQ | GPTQ | yes | GPT | yes | no | N/A |
exllama | GPTQ | yes | GPT only | no | no | N/A |
diffusers | SD,... | no | Image generation | no | no | N/A |
vall-e-x | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
vllm | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}})); a short example is sketched after the footnotes below.
- * 7b ONLY
- ** doesn't seem to be accurate
- *** 7b and 40b with the `ggccv` format, for instance: https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML
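As a sketch of the note above, a configuration selecting a BERT embeddings backend could look like the following; the backend is assumed here to be registered as `bert-embeddings` (check the exact name for your LocalAI version), and the model file name is a placeholder:

```yaml
# models/text-embeddings.yaml -- illustrative sketch, names are placeholders
name: text-embedding-ada-002       # name to request from the embeddings endpoint
backend: bert-embeddings           # assumed backend name, verify for your version
embeddings: true                   # serve this model on the embeddings endpoint
parameters:
  model: bert-MiniLM-L6-v2q4_0.bin # quantized BERT model file in the models directory
```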
Tested with:
- Automatically by CI with OpenLLAMA and GPT4ALL.
- LLaMA 🦙
- Vicuna
- Alpaca
- GPT4ALL (see also using GPT4All)
- GPT4ALL-J (no changes required)
- Koala 🐨
- Cerebras-GPT
- WizardLM
- RWKV models with rwkv.cpp
- bloom.cpp
- Chinese LLaMA / Alpaca
- Vigogne (French)
- OpenBuddy 🐶 (Multilingual)
- Pygmalion 7B / Metharme 7B
- HuggingFace Inference models available through API
- Falcon
Note: you might need to convert some models from older formats to the new format. For indications, see for instance the README in llama.cpp on running gpt4all models.