Mirror of https://github.com/mudler/LocalAI.git (synced 2024-12-18 20:27:57 +00:00)
docs: add aikit to integrations (#1412)
* docs: add aikit to integrations

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* docs: add to readme

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

---------

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
Co-authored-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com>
Parent: 86fac272d8; commit: 1b7ed5e2e6
README.md (19 lines changed)
@@ -21,7 +21,7 @@
</p>

> :bulb: Get help - [❓FAQ](https://localai.io/faq/) [💭Discussions](https://github.com/go-skynet/LocalAI/discussions) [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) [:book: Documentation website](https://localai.io/)
>
> [💻 Quickstart](https://localai.io/basics/getting_started/) [📣 News](https://localai.io/basics/news/) [ 🛫 Examples ](https://github.com/go-skynet/LocalAI/tree/master/examples/) [ 🖼️ Models ](https://localai.io/models/) [ 🚀 Roadmap ](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)

[![tests](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml)[![Build and Release](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml)[![build container images](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml)[![Bump dependencies](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml)[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/localai)](https://artifacthub.io/packages/search?repo=localai)
@@ -54,7 +54,7 @@
<p align="center">

<a href="https://twitter.com/intent/tweet?text=Check%20this%20GitHub%20repository%20out.%20LocalAI%20-%20Let%27s%20you%20easily%20run%20LLM%20locally.&url=https://github.com/go-skynet/LocalAI&hashtags=LocalAI,AI" target="blank">
<img src="https://img.shields.io/twitter/follow/_LocalAI?label=Share Repo on Twitter&style=social" alt="Follow _LocalAI"/></a>
<a href="https://t.me/share/url?text=Check%20this%20GitHub%20repository%20out.%20LocalAI%20-%20Let%27s%20you%20easily%20run%20LLM%20locally.&url=https://github.com/go-skynet/LocalAI" target="_blank"><img src="https://img.shields.io/twitter/url?label=Telegram&logo=Telegram&style=social&url=https://github.com/go-skynet/LocalAI" alt="Share on Telegram"/></a>
<a href="https://api.whatsapp.com/send?text=Check%20this%20GitHub%20repository%20out.%20LocalAI%20-%20Let%27s%20you%20easily%20run%20LLM%20locally.%20https://github.com/go-skynet/LocalAI"><img src="https://img.shields.io/twitter/url?label=whatsapp&logo=whatsapp&style=social&url=https://github.com/go-skynet/LocalAI" /></a> <a href="https://www.reddit.com/submit?url=https://github.com/go-skynet/LocalAI&title=Check%20this%20GitHub%20repository%20out.%20LocalAI%20-%20Let%27s%20you%20easily%20run%20LLM%20locally.
" target="blank">
@@ -85,12 +85,12 @@ In a nutshell:

- Local, OpenAI drop-in alternative REST API. You own your data.
- NO GPU required. NO Internet access is required either.
- Optional GPU acceleration is available for `llama.cpp`-compatible LLMs. See also the [build section](https://localai.io/basics/build/index.html).
- Supports multiple models
- 🏃 Once loaded the first time, it keeps models loaded in memory for faster inference
- ⚡ Doesn't shell out, but uses C++ bindings for faster inference and better performance.

LocalAI was created by [Ettore Di Giacinto](https://github.com/mudler/) and is a community-driven project focused on making AI accessible to anyone. Contributions, feedback, and PRs are welcome!

Note that this started as a [fun weekend project](https://localai.io/#backstory) to try to create the necessary pieces for a full AI assistant like `ChatGPT`. The community is growing fast and we are working hard to make it better and more stable. If you want to help, please consider contributing (see below)!

@@ -112,6 +112,9 @@ Check out the [Getting started](https://localai.io/basics/getting_started/index.

### 🔗 Community and integrations

Build and deploy custom containers:
- https://github.com/sozercan/aikit

WebUIs:
- https://github.com/Jirubizu/localai-admin
- https://github.com/go-skynet/LocalAI-frontend
@@ -129,7 +132,7 @@ Other:
- [How to install in Kubernetes](https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes)
- [Projects integrating LocalAI](https://localai.io/integrations/)
- [How-tos section](https://localai.io/howtos/) (curated by our community)

## :book: 🎥 [Media, Blogs, Social](https://localai.io/basics/news/#media-blogs-social)

- [Create a slackbot for teams and OSS projects that answer to documentation](https://mudler.pm/posts/smart-slackbot-for-teams/)
@@ -159,12 +162,12 @@ Support the project by becoming [a backer or sponsor](https://github.com/sponsor

A huge thank you to our generous sponsors who support this project:

| ![Spectro Cloud logo_600x600px_transparent bg](https://github.com/go-skynet/LocalAI/assets/2420543/68a6f3cb-8a65-4a4d-99b5-6417a8905512) |
|:-----------------------------------------------:|
| [Spectro Cloud](https://www.spectrocloud.com/) |
| Spectro Cloud kindly supports LocalAI by providing GPU and computing resources to run tests on Lambda Labs! |

And a huge shout-out to individuals sponsoring the project by donating hardware or backing the project.

- [Sponsor list](https://github.com/sponsors/mudler)
- JDAM00 (donating HW for the CI)
docs/content/integrations/AIKit.md (new file, 178 lines)
@@ -0,0 +1,178 @@
+++
disableToc = false
title = "AIKit"
description = "AI + BuildKit = AIKit: Build and deploy large language models easily"
weight = 2
+++

GitHub Link - https://github.com/sozercan/aikit

[AIKit](https://github.com/sozercan/aikit) is a quick, easy, and local or cloud-agnostic way to get started hosting and deploying large language models (LLMs) for inference. No GPU, internet access, or additional tools are needed to get started except for [Docker](https://docs.docker.com/desktop/install/linux-install/)!

AIKit uses [LocalAI](https://localai.io/) under the hood to run inference. LocalAI provides a drop-in replacement REST API that is compatible with the OpenAI API, so you can use any OpenAI API-compatible client, such as [Kubectl AI](https://github.com/sozercan/kubectl-ai), [Chatbot-UI](https://github.com/sozercan/chatbot-ui), and many more, to send requests to open-source LLMs powered by AIKit!

> At this time, AIKit is tested with the LocalAI `llama` backend. Other backends may work but are not tested. Please open an issue if you'd like to see support for other backends.

## Features

- 🐳 No GPU, Internet access or additional tools needed except for [Docker](https://docs.docker.com/desktop/install/linux-install/)!
- 🤏 Minimal image size, resulting in fewer vulnerabilities and a smaller attack surface with a custom [distroless](https://github.com/GoogleContainerTools/distroless)-based image
- 🚀 Easy-to-use declarative configuration
- ✨ OpenAI API-compatible, so it works with any OpenAI API-compatible client
- 🚢 Kubernetes deployment ready
- 📦 Supports multiple models with a single image
- 🖥️ Supports GPU-accelerated inferencing with NVIDIA GPUs
- 🔐 Signed images for `aikit` and pre-made models

## Pre-made Models

AIKit comes with pre-made models that you can use out-of-the-box!

### CPU

- 🦙 Llama 2 7B Chat: `ghcr.io/sozercan/llama2:7b`
- 🦙 Llama 2 13B Chat: `ghcr.io/sozercan/llama2:13b`
- 🐬 Orca 2 13B: `ghcr.io/sozercan/orca2:13b`

### NVIDIA CUDA

- 🦙 Llama 2 7B Chat (CUDA): `ghcr.io/sozercan/llama2:7b-cuda`
- 🦙 Llama 2 13B Chat (CUDA): `ghcr.io/sozercan/llama2:13b-cuda`
- 🐬 Orca 2 13B (CUDA): `ghcr.io/sozercan/orca2:13b-cuda`

> CUDA models include CUDA v12. Use them with [NVIDIA GPU acceleration](#gpu-acceleration-support).

## Quick Start

### Creating an image

> This section shows how to create a custom image with models of your choosing. If you want to use one of the pre-made models, skip to [running models](#running-models).
>
> See the [models folder](./models/) for pre-made model definitions. You can find more model examples at [go-skynet/model-gallery](https://github.com/go-skynet/model-gallery).

Create an `aikitfile.yaml` with the following structure:

```yaml
#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
models:
  - name: llama-2-7b-chat
    source: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
```

> This is the simplest way to get started building an image. For the full `aikitfile` specification, see [specs](docs/specs.md).

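If you generate aikitfiles from a script, for example one per model, the Python sketch below renders the minimal structure from the example above. `render_aikitfile` and `Model` are hypothetical names, not part of AIKit; the helper emits only the fields used in that example (`#syntax`, `apiVersion`, `models[].name`, `models[].source`) and does not validate anything against the full `aikitfile` specification. Run as a script, it prints the file, so `python render_aikitfile.py > aikitfile.yaml` (hypothetical file name) produces input for the build step.

```python
"""Render a minimal aikitfile.yaml (hypothetical helper, not part of AIKit)."""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    name: str    # name clients pass in the "model" field of API requests
    source: str  # URL of the model weights, e.g. a GGUF file


def render_aikitfile(models: List[Model],
                     frontend: str = "ghcr.io/sozercan/aikit:latest") -> str:
    """Return aikitfile YAML text using only the fields from the example above."""
    if not models:
        raise ValueError("at least one model is required")
    # The #syntax directive tells BuildKit which frontend image to use.
    lines = [f"#syntax={frontend}", "apiVersion: v1alpha1", "models:"]
    for m in models:
        # Values are written unquoted: fine for plain names and URLs like these,
        # but not for strings that contain ": " or " #".
        lines.append(f"  - name: {m.name}")
        lines.append(f"    source: {m.source}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_aikitfile([
        Model("llama-2-7b-chat",
              "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/"
              "llama-2-7b-chat.Q4_K_M.gguf"),
    ]), end="")
```
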
First, create a Buildx builder instance (which runs BuildKit). Alternatively, if you are using Docker v24 with the [containerd image store](https://docs.docker.com/storage/containerd/) enabled, you can skip this step.

```bash
docker buildx create --use --name aikit-builder
```

Then build your image with:

```bash
docker buildx build . -t my-model -f aikitfile.yaml --load
```

This builds a local container image with your model(s). You can see the image with:

```bash
docker images
REPOSITORY   TAG       IMAGE ID       CREATED             SIZE
my-model     latest    e7b7c5a4a2cb   About an hour ago   5.51GB
```

### Running models

You can start the inference server for your models with:

```bash
# for pre-made models, replace "my-model" with the image name
docker run -d --rm -p 8080:8080 my-model
```

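Because the container is started detached (`-d`), `docker run` returns before the server is ready to accept requests. The Python sketch below polls the server until it responds. It assumes that a successful `GET /v1/models` (part of the OpenAI-compatible API that LocalAI implements) is a good enough readiness signal; `wait_for_server` is a hypothetical helper and its default timeout and interval are arbitrary. It only shows that the API server is up: LocalAI may load the model on the first inference request, so that request can still be slow. Call `wait_for_server()` after `docker run -d` and before sending requests.

```python
"""Wait until an AIKit/LocalAI container's HTTP API answers (hypothetical helper)."""
import http.client
import time
import urllib.request


def wait_for_server(base_url: str = "http://localhost:8080",
                    timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll GET {base_url}/v1/models; return True on HTTP 200, False after `timeout` seconds."""
    url = base_url.rstrip("/") + "/v1/models"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, socket timeouts, and HTTP error statuses
            # (urllib's HTTPError is an OSError subclass) all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```
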
You can then send requests to `localhost:8080` to run inference from your models. For example:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "llama-2-7b-chat",
    "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
  }'
{"created":1701236489,"object":"chat.completion","id":"dd1ff40b-31a7-4418-9e32-42151ab6875a","model":"llama-2-7b-chat","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"\nKubernetes is a container orchestration system that automates the deployment, scaling, and management of containerized applications in a microservices architecture."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
```

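Because the API is OpenAI-compatible, the `curl` call above can be made from any HTTP client. The Python sketch below uses only the standard library; `build_chat_request`, `extract_reply`, and `chat` are hypothetical helper names, and it assumes a server on `localhost:8080` serving a model named `llama-2-7b-chat`, as in the example. `extract_reply` strips surrounding whitespace because the sample response above starts with a newline. For example, `print(chat("explain kubernetes in a sentence"))` sends the same request as the `curl` command.

```python
"""Minimal OpenAI-compatible chat client using only the standard library (sketch)."""
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build the same JSON body as the curl example: a single user message."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def extract_reply(response: dict) -> str:
    """Return the assistant text from a chat.completion response."""
    return response["choices"][0]["message"]["content"].strip()


def chat(prompt: str, model: str = "llama-2-7b-chat",
         base_url: str = "http://localhost:8080") -> str:
    """POST to /v1/chat/completions and return the assistant's reply."""
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous, arbitrary timeout: CPU inference on large models can be slow.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return extract_reply(json.load(resp))
```
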
## Kubernetes Deployment

It is easy to get started deploying your models to Kubernetes!

Make sure you have a Kubernetes cluster running, that `kubectl` is configured to talk to it, and that your model images are accessible from the cluster.

> You can use [kind](https://kind.sigs.k8s.io/) to create a local Kubernetes cluster for testing purposes.

```bash
# create a deployment
# for pre-made models, replace "my-model" with the image name
kubectl create deployment my-llm-deployment --image=my-model

# expose it as a service
kubectl expose deployment my-llm-deployment --port=8080 --target-port=8080 --name=my-llm-service

# easy to scale up and down as needed
kubectl scale deployment my-llm-deployment --replicas=3

# port-forward for testing locally (this command blocks; run it in a separate terminal)
kubectl port-forward service/my-llm-service 8080:8080

# send requests to your model
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "llama-2-7b-chat",
    "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
  }'
{"created":1701236489,"object":"chat.completion","id":"dd1ff40b-31a7-4418-9e32-42151ab6875a","model":"llama-2-7b-chat","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"\nKubernetes is a container orchestration system that automates the deployment, scaling, and management of containerized applications in a microservices architecture."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
```

> For an example Kubernetes deployment and service YAML, see the [kubernetes folder](./kubernetes/). Note that these are only examples; you may need to customize them (for example, with properly configured resource requests and limits) based on your needs.

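If you prefer declarative manifests to the imperative commands above, the Python sketch below builds a roughly equivalent Deployment and Service and prints them as a JSON `List`, which `kubectl apply -f` accepts. This is not the YAML from the AIKit `kubernetes` folder: `llm_manifests` is a hypothetical helper, the container name `llm` is an arbitrary choice, and the `app: <deployment name>` label mirrors what `kubectl create deployment` sets. Resource requests and limits are deliberately left out. For example, `python llm_manifests.py > llm.json && kubectl apply -f llm.json` (hypothetical file names) creates both objects with three replicas.

```python
"""Build a Deployment and Service roughly equivalent to the kubectl commands above (sketch)."""
import json


def llm_manifests(name: str = "my-llm-deployment", image: str = "my-model",
                  service_name: str = "my-llm-service", replicas: int = 1,
                  port: int = 8080) -> dict:
    """Return a v1 List holding a Deployment and a ClusterIP Service."""
    labels = {"app": name}  # same label `kubectl create deployment` applies
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "llm",  # hypothetical container name
                        "image": image,
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": service_name},
        # Same port mapping as `kubectl expose --port=8080 --target-port=8080`.
        "spec": {"selector": labels, "ports": [{"port": port, "targetPort": port}]},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    print(json.dumps(llm_manifests(replicas=3), indent=2))
```
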
## GPU Acceleration Support

> At this time, only NVIDIA GPU acceleration is supported. Please open an issue if you'd like to see support for other GPU vendors.

### NVIDIA

AIKit supports GPU-accelerated inferencing with the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit). You must also have [NVIDIA Drivers](https://www.nvidia.com/en-us/drivers/unix/) installed on your host machine.

For Kubernetes, the [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator) provides a streamlined way to install the NVIDIA drivers and container toolkit and configure your cluster to use GPUs.

To get started with GPU-accelerated inferencing, set the following in your `aikitfile` and build your model:

```yaml
runtime: cuda    # use NVIDIA CUDA runtime
f16: true        # use float16 precision
gpu_layers: 35   # number of layers to offload to GPU
low_vram: true   # for devices with low VRAM
```

> Make sure to customize these values based on your model and GPU specs.

After building the model, you can run it with the [`--gpus all`](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#gpu-enumeration) flag to enable GPU support:

```bash
# for pre-made models, replace "my-model" with the image name
docker run --rm --gpus all -p 8080:8080 my-model
```

If GPU acceleration is working, you'll see output similar to the following in the debug logs:

```bash
5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr ggml_init_cublas: found 1 CUDA devices:
5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr Device 0: Tesla T4, compute capability 7.5
...
5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr llm_load_tensors: using CUDA for GPU acceleration
5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr llm_load_tensors: mem required = 70.41 MB (+ 2048.00 MB per state)
5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr llm_load_tensors: offloading 32 repeating layers to GPU
5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr llm_load_tensors: offloading non-repeating layers to GPU
5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr llm_load_tensors: offloading v cache to GPU
5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr llm_load_tensors: offloading k cache to GPU
5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr llm_load_tensors: offloaded 35/35 layers to GPU
5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr llm_load_tensors: VRAM used: 5869 MB
```
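
To check offloading from a script rather than by reading the logs, the Python sketch below looks for the `offloaded N/M layers to GPU` message shown above. It assumes that exact llama.cpp log wording, which may change between versions; `gpu_offload` and `fully_offloaded` are hypothetical helpers. Pass them the container's log text, for example the output of `docker logs <container> 2>&1`. A result such as `(35, 35)` means every layer was offloaded, while `None` means no such line was found (for example, because the model has not been loaded yet or is running on the CPU).

```python
"""Report GPU layer offloading from LocalAI debug logs (sketch; assumes llama.cpp's message format)."""
import re
from typing import Optional, Tuple

# Matches e.g. "llm_load_tensors: offloaded 35/35 layers to GPU".
OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def gpu_offload(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (offloaded, total) from the last matching log line, or None if there is none."""
    matches = OFFLOAD_RE.findall(log_text)
    if not matches:
        return None
    # Use the last match so a model reloaded later in the log wins.
    offloaded, total = matches[-1]
    return int(offloaded), int(total)


def fully_offloaded(log_text: str) -> bool:
    """True if the log reports that every layer was offloaded to the GPU."""
    result = gpu_offload(log_text)
    return result is not None and result[0] == result[1]
```
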