+++
disableToc = false
title = "Model compatibility"
weight = 4
+++

LocalAI is compatible with the models supported by llama.cpp, and it also supports GPT4ALL-J and cerebras-GPT with ggml.

{{% notice note %}}

LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.

{{% /notice %}}
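
For example, a minimal model configuration that pins a model to a specific backend could look like the following. This is a sketch: the model name, file name, and the `backend: llama` value (selecting the llama.cpp backend) are illustrative, so adjust them to your setup.

```yaml
# my-model.yaml — placed in the models directory (illustrative example)
name: my-model                 # name the model is exposed as by the API
backend: llama                 # explicit backend selection (llama.cpp)
parameters:
  model: ggml-model-q4_0.bin   # placeholder model file in the models directory
context_size: 1024             # context window size
```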

## Hardware requirements

Depending on the model you are attempting to run, you might need more RAM or CPU resources; for ggml-based backends, see the README in llama.cpp for indicative memory requirements. rwkv is less expensive on resources.

## Model compatibility table

Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all the compatible model families and the associated binding repository.

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| [llama.cpp]({{%relref "model-compatibility/llama-cpp" %}}) | Vicuna, Alpaca, LLaMa | yes | GPT and Functions | yes** | yes | CUDA, openCL, cuBLAS, Metal |
| gpt4all-llama | Vicuna, Alpaca, LLaMa | yes | GPT | no | yes | N/A |
| gpt4all-mpt | MPT | yes | GPT | no | yes | N/A |
| gpt4all-j | GPT4ALL-J | yes | GPT | no | yes | N/A |
| falcon-ggml (binding) | Falcon (*) | yes | GPT | no | no | N/A |
| gpt2 (binding) | GPT2, Cerebras | yes | GPT | no | no | N/A |
| dolly (binding) | Dolly | yes | GPT | no | no | N/A |
| gptj (binding) | GPTJ | yes | GPT | no | no | N/A |
| mpt (binding) | MPT | yes | GPT | no | no | N/A |
| replit (binding) | Replit | yes | GPT | no | no | N/A |
| gptneox (binding) | GPT NeoX, RedPajama, StableLM | yes | GPT | no | no | N/A |
| starcoder (binding) | Starcoder | yes | GPT | no | no | N/A |
| bloomz (binding) | Bloom | yes | GPT | no | no | N/A |
| rwkv (binding) | rwkv | yes | GPT | no | yes | N/A |
| bert (binding) | bert | no | Embeddings only | yes | no | N/A |
| whisper | whisper | no | Audio | no | no | N/A |
| stablediffusion (binding) | stablediffusion | no | Image | no | no | N/A |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
| piper (binding) | Any piper onnx model | no | Text to voice | no | no | N/A |
| falcon (binding) | Falcon *** | yes | GPT | no | yes | CUDA |
| huggingface-embeddings (sentence-transformers) | BERT | no | Embeddings only | yes | no | N/A |
| bark | bark | no | Audio generation | no | no | yes |
| AutoGPTQ | GPTQ | yes | GPT | yes | no | N/A |
| exllama | GPTQ | yes | GPT only | no | no | N/A |
| diffusers | SD, ... | no | Image generation | no | no | N/A |
| vall-e-x | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| vllm | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |

Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}})).
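
For instance, switching a model definition to one of the bindings above is just a matter of changing the `backend` field. Here is a sketch for rwkv, where the model name and file name are placeholders:

```yaml
name: rwkv-example            # placeholder model name
backend: rwkv                 # backend name taken from the table above
parameters:
  model: rwkv-model.bin       # placeholder model file
```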

## Tested with

Note: You might need to convert some older models to the new format; for instructions, see for instance the README in llama.cpp on running gpt4all.