Mirror of https://github.com/mudler/LocalAI.git (synced 2024-12-18 12:26:26 +00:00)

chore: drop examples folder now that LocalAI-examples has been created (#4017)

Signed-off-by: Dave Lee <dave@gray101.com>

This commit is contained in:
parent 3d4bb757d2
commit cde0139363
.bruno/LocalAI Test Requests/model gallery/model delete.bru (new file, 11 lines)

@@ -0,0 +1,11 @@
meta {
  name: model delete
  type: http
  seq: 7
}

post {
  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
  body: none
  auth: none
}

.bruno/LocalAI Test Requests/transcription/gb1.ogg (new binary file; not shown)

.bruno/LocalAI Test Requests/transcription/transcribe.bru (new file, 16 lines)

@@ -0,0 +1,16 @@
meta {
  name: transcribe
  type: http
  seq: 1
}

post {
  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/v1/audio/transcriptions
  body: multipartForm
  auth: none
}

body:multipart-form {
  file: @file(transcription/gb1.ogg)
  model: whisper-1
}
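For reference, a minimal Python sketch of the same transcription request the Bruno file above describes. This is not part of the request collection; it assumes a LocalAI instance reachable at http://localhost:8080 (the PROTOCOL/HOST/PORT placeholders), a local `gb1.ogg` file, and the `requests` library.

```python
# Hypothetical equivalent of the "transcribe" Bruno request above.
# Assumes LocalAI is reachable at http://localhost:8080 and that gb1.ogg
# sits in the current directory (both are assumptions for illustration).
import requests

url = "http://localhost:8080/v1/audio/transcriptions"

with open("gb1.ogg", "rb") as audio:
    # Multipart form with the audio file and the model name, as in the .bru body.
    response = requests.post(
        url,
        files={"file": ("gb1.ogg", audio, "audio/ogg")},
        data={"model": "whisper-1"},
        timeout=300,
    )

response.raise_for_status()
print(response.json().get("text", ""))
```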
@@ -85,6 +85,7 @@ local-ai run oci://localai/phi-2:latest

## 📰 Latest project news

- Oct 2024: examples moved to [LocalAI-examples](https://github.com/mudler/LocalAI-examples)
- Aug 2024: 🆕 FLUX-1, [P2P Explorer](https://explorer.localai.io)
- July 2024: 🔥🔥 🆕 P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723
- June 2024: 🆕 You can now browse the model gallery without LocalAI! Check out https://models.localai.io

@@ -1,190 +0,0 @@
# Examples

| [ChatGPT OSS alternative](https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui) | [Image generation](https://localai.io/api-endpoints/index.html#image-generation) |
|---|---|
| ![Screenshot from 2023-04-26 23-59-55](https://user-images.githubusercontent.com/2420543/234715439-98d12e03-d3ce-4f94-ab54-2b256808e05e.png) | ![b6441997879](https://github.com/go-skynet/LocalAI/assets/2420543/d50af51c-51b7-4f39-b6c2-bf04c403894c) |

| [Telegram bot](https://github.com/go-skynet/LocalAI/tree/master/examples/telegram-bot) | [Flowise](https://github.com/go-skynet/LocalAI/tree/master/examples/flowise) |
|---|---|
| ![Screenshot from 2023-06-09 00-36-26](https://github.com/go-skynet/LocalAI/assets/2420543/e98b4305-fa2d-41cf-9d2f-1bb2d75ca902) | ![Screenshot from 2023-05-30 18-01-03](https://github.com/go-skynet/LocalAI/assets/2420543/02458782-0549-4131-971c-95ee56ec1af8) |

Here is a list of projects that can easily be integrated with the LocalAI backend.

### Projects

### AutoGPT

_by [@mudler](https://github.com/mudler)_

This example shows how to use AutoGPT with LocalAI.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/autoGPT/)

### Chatbot-UI

_by [@mkellerman](https://github.com/mkellerman)_

![Screenshot from 2023-04-26 23-59-55](https://user-images.githubusercontent.com/2420543/234715439-98d12e03-d3ce-4f94-ab54-2b256808e05e.png)

This integration shows how to use LocalAI with [mckaywrigley/chatbot-ui](https://github.com/mckaywrigley/chatbot-ui).

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui/)

There is also a separate example that shows how to manually set up a model: [example](https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui-manual/)

### K8sGPT

_by [@mudler](https://github.com/mudler)_

This example shows how to use LocalAI inside Kubernetes with [k8sgpt](https://k8sgpt.ai).

![Screenshot from 2023-06-19 23-58-47](https://github.com/go-skynet/go-ggml-transformers.cpp/assets/2420543/cab87409-ee68-44ae-8d53-41627fb49509)

### Fine-tuning a model and converting it to gguf to use it with LocalAI

_by [@mudler](https://github.com/mudler)_

This is an end-to-end example of how to fine-tune a model with [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) and convert it to gguf to use it with LocalAI.

[Check it out here](https://github.com/mudler/LocalAI/tree/master/examples/e2e-fine-tuning/)

### Flowise

_by [@mudler](https://github.com/mudler)_

This example shows how to use [FlowiseAI/Flowise](https://github.com/FlowiseAI/Flowise) with LocalAI.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/flowise/)

### Discord bot

_by [@mudler](https://github.com/mudler)_

Run a Discord bot which lets you talk directly with a model.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/discord-bot/), or for a live demo you can talk with our bot in #random-bot in our Discord server.

### Langchain

_by [@dave-gray101](https://github.com/dave-gray101)_

A ready-to-use example showing end-to-end how to integrate LocalAI with langchain.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/langchain/)

### Langchain Python

_by [@mudler](https://github.com/mudler)_

A ready-to-use example showing end-to-end how to integrate LocalAI with langchain.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/langchain-python/)

### LocalAI functions

_by [@mudler](https://github.com/mudler)_

A ready-to-use example showing how to use OpenAI functions with LocalAI.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/functions/)

### LocalAI WebUI

_by [@dhruvgera](https://github.com/dhruvgera)_

![image](https://user-images.githubusercontent.com/42107491/235344183-44b5967d-ba22-4331-804c-8da7004a5d35.png)

A light, community-maintained web interface for LocalAI.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/localai-webui/)

### How to run rwkv models

_by [@mudler](https://github.com/mudler)_

A full example on how to run RWKV models with LocalAI.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/rwkv/)

### PrivateGPT

_by [@mudler](https://github.com/mudler)_

A full example on how to run PrivateGPT with LocalAI.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/privateGPT/)

### Slack bot

_by [@mudler](https://github.com/mudler)_

Run a Slack bot which lets you talk directly with a model.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/slack-bot/)

### Slack bot (Question answering)

_by [@mudler](https://github.com/mudler)_

Run a Slack bot, ideally for teams, which lets you ask questions about a documentation website or a GitHub repository.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/slack-qa-bot/)

### Question answering on documents with llama-index

_by [@mudler](https://github.com/mudler)_

Shows how to integrate with [Llama-Index](https://gpt-index.readthedocs.io/en/stable/getting_started/installation.html) to enable question answering on a set of documents.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/query_data/)

### Question answering on documents with langchain and chroma

_by [@mudler](https://github.com/mudler)_

Shows how to integrate with `Langchain` and `Chroma` to enable question answering on a set of documents.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/langchain-chroma/)

### Telegram bot

_by [@mudler](https://github.com/mudler)_

![Screenshot from 2023-06-09 00-36-26](https://github.com/go-skynet/LocalAI/assets/2420543/e98b4305-fa2d-41cf-9d2f-1bb2d75ca902)

Use LocalAI to power a Telegram bot assistant, with image generation and audio support!

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/telegram-bot/)

### Template for Runpod.io

_by [@fHachenberg](https://github.com/fHachenberg)_

Allows running any LocalAI-compatible model as a backend on the servers of https://runpod.io

[Check it out here](https://runpod.io/gsc?template=uv9mtqnrd0&ref=984wlcra)

### Continue

_by [@gruberdev](https://github.com/gruberdev)_

<img src="continue/img/screen.png" width="600" height="200" alt="Screenshot">

Demonstrates how to integrate an open-source copilot alternative that enhances code analysis, completion, and improvements. This approach seamlessly integrates with any LocalAI model, offering a more user-friendly experience.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/continue/)

### Streamlit bot

_by [@majoshi1](https://github.com/majoshi1)_

![Screenshot](streamlit-bot/streamlit-bot.png)

A chat bot made using `Streamlit` & LocalAI.

[Check it out here](https://github.com/go-skynet/LocalAI/tree/master/examples/streamlit-bot/)

## Want to contribute?

Create an issue, and put `Example: <description>` in the title! We will post your examples here.

@@ -1,9 +0,0 @@
# CPU .env docs: https://localai.io/howtos/easy-setup-docker-cpu/
# GPU .env docs: https://localai.io/howtos/easy-setup-docker-gpu/

OPENAI_API_KEY=sk---anystringhere
OPENAI_API_BASE=http://api:8080/v1
# Models to preload at start
# Here we configure gpt4all as gpt-3.5-turbo and bert as embeddings,
# see other options in the model gallery at https://github.com/go-skynet/model-gallery
PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}, { "url": "github:go-skynet/model-gallery/bert-embeddings.yaml", "name": "text-embedding-ada-002"}]

@@ -1,36 +0,0 @@
# AutoGPT

Example of integration with [AutoGPT](https://github.com/Significant-Gravitas/Auto-GPT).

## Run

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/autoGPT

cp -rfv .env.example .env

# Edit the .env file to set a different model by editing `PRELOAD_MODELS`.
vim .env

docker-compose run --rm auto-gpt
```

Note: The example automatically downloads the `gpt4all` model as it is under a permissive license. The GPT4All model does not seem to be enough to run AutoGPT. WizardLM-7b-uncensored seems to perform better (with `f16: true`).

## Without docker

Run AutoGPT with `OPENAI_API_BASE` pointing to the LocalAI endpoint. If you run it locally for instance:

```
OPENAI_API_BASE=http://localhost:8080 python ...
```

Note: you need a model named `gpt-3.5-turbo` and `text-embedding-ada-002`. You can preload those in LocalAI at start by setting in the env:

```
PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}, { "url": "github:go-skynet/model-gallery/bert-embeddings.yaml", "name": "text-embedding-ada-002"}]
```
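As a hedged illustration of the `OPENAI_API_BASE` redirection described above (not part of the original example), the same effect can be reproduced from Python by pointing an OpenAI-compatible client at the LocalAI endpoint. This sketch assumes the `openai` Python package (v1 or later), a LocalAI instance on http://localhost:8080 exposing its OpenAI routes under `/v1`, and a model preloaded under the name `gpt-3.5-turbo`.

```python
# Illustrative only: this mirrors what AutoGPT does internally once
# OPENAI_API_BASE points at LocalAI. Endpoint, key, and model name are
# assumptions taken from the .env example above.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # the LocalAI endpoint
    api_key="sk---anystringhere",         # any string works; LocalAI does not check it
)

reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello from LocalAI"}],
)
print(reply.choices[0].message.content)
```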
@@ -1,42 +0,0 @@
version: "3.9"
services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    ports:
      - 8080:8080
    env_file:
      - .env
    environment:
      - DEBUG=true
      - MODELS_PATH=/models
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai" ]
  auto-gpt:
    image: significantgravitas/auto-gpt
    depends_on:
      api:
        condition: service_healthy
      redis:
        condition: service_started
    env_file:
      - .env
    environment:
      MEMORY_BACKEND: ${MEMORY_BACKEND:-redis}
      REDIS_HOST: ${REDIS_HOST:-redis}
    profiles: ["exclude-from-up"]
    volumes:
      - ./auto_gpt_workspace:/app/autogpt/auto_gpt_workspace
      - ./data:/app/data
      ## allow auto-gpt to write logs to disk
      - ./logs:/app/logs
      ## uncomment following lines if you want to make use of these files
      ## you must have them existing in the same folder as this docker-compose.yml
      #- type: bind
      #  source: ./azure.yaml
      #  target: /app/azure.yaml
      #- type: bind
      #  source: ./ai_settings.yaml
      #  target: /app/ai_settings.yaml
  redis:
    image: "redis/redis-stack-server:latest"

@@ -1,25 +0,0 @@
# Use an official Python runtime as a parent image
FROM python:3.12-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY requirements.txt /app

# Install c++ compiler
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt \
 && DEBIAN_FRONTEND=noninteractive apt-get remove -y build-essential \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

COPY . /app

# Run app.py when the container launches
CMD ["chainlit", "run", "-h", "--host", "0.0.0.0", "main.py" ]

@@ -1,25 +0,0 @@
# LocalAI Demonstration with Embeddings and Chainlit

This demonstration shows you how to use embeddings with existing data in `LocalAI`, and how to integrate it with Chainlit for an interactive querying experience. We are using the `llama_index` library to facilitate the embedding and querying processes, and `chainlit` to provide an interactive interface. The `Weaviate` client is used as the embedding source.

## Prerequisites

Before proceeding, make sure you have the following installed:
- Weaviate client
- LocalAI and its dependencies
- Chainlit and its dependencies

## Getting Started

1. Clone this repository:
2. Navigate to the project directory:
3. Run the example: `chainlit run main.py`

# Highlight on `llama_index` and `chainlit`

`llama_index` is the key library that facilitates the process of embedding and querying data in LocalAI. It provides a seamless interface to integrate various components, such as `WeaviateVectorStore`, `LocalAI`, `ServiceContext`, and more, for a smooth querying experience.

`chainlit` is used to provide an interactive interface for users to query the data and see the results in real-time. It integrates with llama_index to handle the querying process and display the results to the user.

In this example, `llama_index` is used to set up the `VectorStoreIndex` and `QueryEngine`, and `chainlit` is used to handle the user interactions with `LocalAI` and display the results.

@@ -1,16 +0,0 @@
localAI:
  temperature: 0
  modelName: gpt-3.5-turbo
  apiBase: http://local-ai.default
  apiKey: stub
  streaming: True
weviate:
  url: http://weviate.local
  index: AIChroma
query:
  mode: hybrid
  topK: 1
  alpha: 0.0
  chunkSize: 1024
embedding:
  model: BAAI/bge-small-en-v1.5

@@ -1,82 +0,0 @@
import os

import weaviate
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores import WeaviateVectorStore

from llama_index.query_engine.retriever_query_engine import RetrieverQueryEngine
from llama_index.callbacks.base import CallbackManager
from llama_index import (
    LLMPredictor,
    ServiceContext,
    StorageContext,
    VectorStoreIndex,
)
import chainlit as cl

from llama_index.llms import LocalAI
from llama_index.embeddings import HuggingFaceEmbedding
import yaml

# Load the configuration file
with open("config.yaml", "r") as ymlfile:
    cfg = yaml.safe_load(ymlfile)

# Get the values from the configuration file or set the default values
temperature = cfg['localAI'].get('temperature', 0)
model_name = cfg['localAI'].get('modelName', "gpt-3.5-turbo")
api_base = cfg['localAI'].get('apiBase', "http://local-ai.default")
api_key = cfg['localAI'].get('apiKey', "stub")
streaming = cfg['localAI'].get('streaming', True)
weaviate_url = cfg['weviate'].get('url', "http://weviate.default")
index_name = cfg['weviate'].get('index', "AIChroma")
query_mode = cfg['query'].get('mode', "hybrid")
topK = cfg['query'].get('topK', 1)
alpha = cfg['query'].get('alpha', 0.0)
embed_model_name = cfg['embedding'].get('model', "BAAI/bge-small-en-v1.5")
chunk_size = cfg['query'].get('chunkSize', 1024)


embed_model = HuggingFaceEmbedding(model_name=embed_model_name)


llm = LocalAI(temperature=temperature, model_name=model_name, api_base=api_base, api_key=api_key, streaming=streaming)
llm.globally_use_chat_completions = True
client = weaviate.Client(weaviate_url)
vector_store = WeaviateVectorStore(weaviate_client=client, index_name=index_name)
storage_context = StorageContext.from_defaults(vector_store=vector_store)


@cl.on_chat_start
async def factory():

    llm_predictor = LLMPredictor(
        llm=llm
    )

    service_context = ServiceContext.from_defaults(embed_model=embed_model, callback_manager=CallbackManager([cl.LlamaIndexCallbackHandler()]), llm_predictor=llm_predictor, chunk_size=chunk_size)

    index = VectorStoreIndex.from_vector_store(
        vector_store,
        storage_context=storage_context,
        service_context=service_context
    )

    query_engine = index.as_query_engine(vector_store_query_mode=query_mode, similarity_top_k=topK, alpha=alpha, streaming=True)

    cl.user_session.set("query_engine", query_engine)


@cl.on_message
async def main(message: cl.Message):
    query_engine = cl.user_session.get("query_engine")
    response = await cl.make_async(query_engine.query)(message.content)

    response_message = cl.Message(content="")

    for token in response.response_gen:
        await response_message.stream_token(token=token)

    if response.response_txt:
        response_message.content = response.response_txt

    await response_message.send()

@@ -1,6 +0,0 @@
llama_index==0.11.20
requests==2.32.3
weaviate_client==4.9.0
transformers
torch
chainlit

@@ -1,50 +0,0 @@
# chatbot-ui

Example of integration with [mckaywrigley/chatbot-ui](https://github.com/mckaywrigley/chatbot-ui).

![Screenshot from 2023-04-26 23-59-55](https://user-images.githubusercontent.com/2420543/234715439-98d12e03-d3ce-4f94-ab54-2b256808e05e.png)

## Setup

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/chatbot-ui

# (optional) Checkout a specific LocalAI tag
# git checkout -b build <TAG>

# Download gpt4all-j to models/
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j

# start with docker-compose
docker-compose up -d --pull always
# or you can build the images with:
# docker-compose up -d --build
```

Then browse to `http://localhost:3000` to view the Web UI.

## Pointing chatbot-ui to a separately managed LocalAI service

If you want to use the [chatbot-ui example](https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui) with an externally managed LocalAI service, you can alter the `docker-compose.yaml` file so that it looks like the below. You will notice the file is smaller, because we have removed the section that would normally start the LocalAI service. Take care to update the IP address (or FQDN) that the chatbot-ui service tries to access (marked `<<LOCALAI_IP>>` below):

```yaml
version: '3.6'

services:
  chatgpt:
    image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 3000:3000
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://<<LOCALAI_IP>>:8080'
```

Once you've edited the `docker-compose.yaml`, you can start it with `docker compose up`, then browse to `http://localhost:3000` to view the Web UI.

## Accessing chatbot-ui

Open http://localhost:3000 for the Web UI.

@@ -1,24 +0,0 @@
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      - MODELS_PATH=/models
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai" ]

  chatgpt:
    image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 3000:3000
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://api:8080'

@@ -1 +0,0 @@
../models

@@ -1,46 +0,0 @@
# chatbot-ui

Example of integration with [mckaywrigley/chatbot-ui](https://github.com/mckaywrigley/chatbot-ui).

![Screenshot from 2023-04-26 23-59-55](https://user-images.githubusercontent.com/2420543/234715439-98d12e03-d3ce-4f94-ab54-2b256808e05e.png)

## Run

In this example LocalAI will download the gpt4all model and set it up as "gpt-3.5-turbo". See the `docker-compose.yaml`
```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/chatbot-ui

# start with docker-compose
docker-compose up --pull always

# or you can build the images with:
# docker-compose up -d --build
```

Then browse to `http://localhost:3000` to view the Web UI.

## Pointing chatbot-ui to a separately managed LocalAI service

If you want to use the [chatbot-ui example](https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui) with an externally managed LocalAI service, you can alter the `docker-compose.yaml` file so that it looks like the below. You will notice the file is smaller, because we have removed the section that would normally start the LocalAI service. Take care to update the IP address (or FQDN) that the chatbot-ui service tries to access (marked `<<LOCALAI_IP>>` below):

```yaml
version: '3.6'

services:
  chatgpt:
    image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 3000:3000
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://<<LOCALAI_IP>>:8080'
```

Once you've edited the `docker-compose.yaml`, you can start it with `docker compose up`, then browse to `http://localhost:3000` to view the Web UI.

## Accessing chatbot-ui

Open http://localhost:3000 for the Web UI.

@@ -1,37 +0,0 @@
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    # As initially LocalAI will download the models defined in PRELOAD_MODELS
    # you might need to tweak the healthcheck values here according to your network connection.
    # Here we give a timespan of 20m to download all the required files.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      - MODELS_PATH=/models
      # You can preload different models here as well.
      # See: https://github.com/go-skynet/model-gallery
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}]'
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai" ]
  chatgpt:
    depends_on:
      api:
        condition: service_healthy
    image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 3000:3000
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://api:8080'

@@ -1,95 +0,0 @@
## Advanced configuration

This section contains examples on how to install models manually with config files.

### Prerequisites

First clone LocalAI:

```bash
git clone https://github.com/go-skynet/LocalAI

cd LocalAI
```

Setup the model you prefer from the examples below and then start LocalAI:

```bash
docker compose up -d --pull always
```

If LocalAI is already started, you can restart it with

```bash
docker compose restart
```

See also the getting started guide: https://localai.io/basics/getting_started/

You can also start LocalAI just with docker:

```
docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:master --models-path /models --threads 4
```

### Mistral

To setup mistral copy the files inside `mistral` in the `models` folder:

```bash
cp -r examples/configurations/mistral/* models/
```

Now download the model:

```bash
wget https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/resolve/main/mistral-7b-openorca.Q6_K.gguf -O models/mistral-7b-openorca.Q6_K.gguf
```

### LLaVA

![llava](https://github.com/mudler/LocalAI/assets/2420543/cb0a0897-3b58-4350-af66-e6f4387b58d3)

#### Setup

```
cp -r examples/configurations/llava/* models/
wget https://huggingface.co/mys/ggml_bakllava-1/resolve/main/ggml-model-q4_k.gguf -O models/ggml-model-q4_k.gguf
wget https://huggingface.co/mys/ggml_bakllava-1/resolve/main/mmproj-model-f16.gguf -O models/mmproj-model-f16.gguf
```

#### Try it out

```
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "llava",
  "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
```

### Phi-2

```
cp -r examples/configurations/phi-2.yaml models/

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "phi-2",
  "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
}'
```

### Mixtral

```
cp -r examples/configurations/mixtral/* models/
wget https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q2_K.gguf -O models/mixtral-8x7b-instruct-v0.1.Q2_K.gguf
```

#### Test it out

```
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
  "model": "mixtral",
  "prompt": "How fast is light?",
  "temperature": 0.1 }'
```
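The curl calls above can also be issued from Python. The following is a small sketch that is not part of the original README; it assumes the phi-2 configuration from this section has been copied into `models/` and that LocalAI is listening on http://localhost:8080.

```python
# Sketch of the same chat-completions call shown with curl above,
# assuming the phi-2 configuration has been installed into models/.
import requests

payload = {
    "model": "phi-2",
    "messages": [{"role": "user", "content": "How are you doing?"}],
    "temperature": 0.1,
}

response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json=payload,
    timeout=300,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```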
@@ -1,3 +0,0 @@
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
{{.Input}}
ASSISTANT:

@@ -1,19 +0,0 @@
backend: llama-cpp
context_size: 4096
f16: true
threads: 11
gpu_layers: 90
mmap: true
name: llava
roles:
  user: "USER:"
  assistant: "ASSISTANT:"
  system: "SYSTEM:"
parameters:
  model: ggml-model-q4_k.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat: chat-simple
mmproj: mmproj-model-f16.gguf

@@ -1,3 +0,0 @@
{{.Input}}
<|im_start|>assistant

@@ -1,3 +0,0 @@
<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
{{if .Content}}{{.Content}}{{end}}
<|im_end|>

@@ -1 +0,0 @@
{{.Input}}

@@ -1,16 +0,0 @@
name: mistral
mmap: true
parameters:
  model: mistral-7b-openorca.Q6_K.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat_message: chatml
  chat: chatml-block
  completion: completion
context_size: 4096
f16: true
stopwords:
- <|im_end|>
threads: 4

@@ -1 +0,0 @@
[INST] {{.Input}} [/INST]

@@ -1 +0,0 @@
[INST] {{.Input}} [/INST]

@@ -1,16 +0,0 @@
context_size: 512
f16: true
threads: 11
gpu_layers: 90
name: mixtral
mmap: true
parameters:
  model: mixtral-8x7b-instruct-v0.1.Q2_K.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
  batch: 512
  tfz: 1.0
template:
  chat: mixtral-chat
  completion: mixtral

@@ -1,29 +0,0 @@
name: phi-2
context_size: 2048
f16: true
gpu_layers: 90
mmap: true
trimsuffix:
- "\n"
parameters:
  model: huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
  seed: -1

mirostat: 2
mirostat_eta: 1.0
mirostat_tau: 1.0
template:
  chat: &template |-
    Instruct: {{.Input}}
    Output:
  completion: *template

usage: |
  To use this model, interact with the API (in another terminal) with curl for instance:
  curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
      "model": "phi-2",
      "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
  }'

@@ -1,53 +0,0 @@
# Continue

![logo](https://continue.dev/docs/assets/images/continue-cover-logo-aa135cc83fe8a14af480d1633ed74eb5.png)

This document presents an example of integration with [continuedev/continue](https://github.com/continuedev/continue).

![Screenshot](https://continue.dev/docs/assets/images/continue-screenshot-1f36b99467817f755739d7f4c4c08fe3.png)

For a live demonstration, please click on the link below:

- [How it works (Video demonstration)](https://www.youtube.com/watch?v=3Ocrc-WX4iQ)

## Integration Setup Walkthrough

1. [As outlined in `continue`'s documentation](https://continue.dev/docs/getting-started), install the [Visual Studio Code extension from the marketplace](https://marketplace.visualstudio.com/items?itemName=Continue.continue) and open it.
2. In this example, LocalAI will download the gpt4all model and set it up as "gpt-3.5-turbo". Refer to the `docker-compose.yaml` file for details.

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/continue

# Start with docker-compose
docker-compose up --build -d
```

3. Type `/config` within Continue's VSCode extension, or edit the file located at `~/.continue/config.py` on your system with the following configuration:

```py
from continuedev.src.continuedev.libs.llm.openai import OpenAI

config = ContinueConfig(
    ...
    models=Models(
        default=OpenAI(
            api_key="my-api-key",
            model="gpt-3.5-turbo",
            api_base="http://localhost:8080",
        )
    ),
)
```

This setup enables you to make queries directly to your model running in the Docker container. Note that the `api_key` does not need to be properly set up; it is included here as a placeholder.

If editing the configuration seems confusing, you may copy and paste the provided default `config.py` file over the existing one in `~/.continue/config.py` after initializing the extension in the VSCode IDE.

## Additional Resources

- [Official Continue documentation](https://continue.dev/docs/intro)
- [Documentation page on using self-hosted models](https://continue.dev/docs/customization#self-hosting-an-open-source-model)
- [Official extension link](https://marketplace.visualstudio.com/items?itemName=Continue.continue)

@@ -1,148 +0,0 @@
"""
This is the Continue configuration file.

See https://continue.dev/docs/customization to learn more.
"""

import subprocess

from continuedev.src.continuedev.core.main import Step
from continuedev.src.continuedev.core.sdk import ContinueSDK
from continuedev.src.continuedev.core.models import Models
from continuedev.src.continuedev.core.config import CustomCommand, SlashCommand, ContinueConfig
from continuedev.src.continuedev.plugins.context_providers.github import GitHubIssuesContextProvider
from continuedev.src.continuedev.plugins.context_providers.google import GoogleContextProvider
from continuedev.src.continuedev.plugins.policies.default import DefaultPolicy
from continuedev.src.continuedev.libs.llm.openai import OpenAI, OpenAIServerInfo
from continuedev.src.continuedev.libs.llm.ggml import GGML

from continuedev.src.continuedev.plugins.steps.open_config import OpenConfigStep
from continuedev.src.continuedev.plugins.steps.clear_history import ClearHistoryStep
from continuedev.src.continuedev.plugins.steps.feedback import FeedbackStep
from continuedev.src.continuedev.plugins.steps.comment_code import CommentCodeStep
from continuedev.src.continuedev.plugins.steps.share_session import ShareSessionStep
from continuedev.src.continuedev.plugins.steps.main import EditHighlightedCodeStep
from continuedev.src.continuedev.plugins.context_providers.search import SearchContextProvider
from continuedev.src.continuedev.plugins.context_providers.diff import DiffContextProvider
from continuedev.src.continuedev.plugins.context_providers.url import URLContextProvider


class CommitMessageStep(Step):
    """
    This is a Step, the building block of Continue.
    It can be used below as a slash command, so that
    run will be called when you type '/commit'.
    """
    async def run(self, sdk: ContinueSDK):

        # Get the root directory of the workspace
        dir = sdk.ide.workspace_directory

        # Run git diff in that directory
        diff = subprocess.check_output(
            ["git", "diff"], cwd=dir).decode("utf-8")

        # Ask the LLM to write a commit message,
        # and set it as the description of this step
        self.description = await sdk.models.default.complete(
            f"{diff}\n\nWrite a short, specific (less than 50 chars) commit message about the above changes:")


config = ContinueConfig(

    # If set to False, we will not collect any usage data
    # See here to learn what anonymous data we collect: https://continue.dev/docs/telemetry
    allow_anonymous_telemetry=True,

    models=Models(
        default=OpenAI(
            api_key="my-api-key",
            model="gpt-3.5-turbo",
            openai_server_info=OpenAIServerInfo(
                api_base="http://localhost:8080",
                model="gpt-3.5-turbo"
            )
        )
    ),
    # Set a system message with information that the LLM should always keep in mind
    # E.g. "Please give concise answers. Always respond in Spanish."
    system_message=None,

    # Set temperature to any value between 0 and 1. Higher values will make the LLM
    # more creative, while lower values will make it more predictable.
    temperature=0.5,

    # Custom commands let you map a prompt to a shortened slash command
    # They are like slash commands, but more easily defined - write just a prompt instead of a Step class
    # Their output will always be in chat form
    custom_commands=[
        # CustomCommand(
        #     name="test",
        #     description="Write unit tests for the highlighted code",
        #     prompt="Write a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
        # )
    ],

    # Slash commands let you run a Step from a slash command
    slash_commands=[
        # SlashCommand(
        #     name="commit",
        #     description="This is an example slash command. Use /config to edit it and create more",
        #     step=CommitMessageStep,
        # )
        SlashCommand(
            name="edit",
            description="Edit code in the current file or the highlighted code",
            step=EditHighlightedCodeStep,
        ),
        SlashCommand(
            name="config",
            description="Customize Continue - slash commands, LLMs, system message, etc.",
            step=OpenConfigStep,
        ),
        SlashCommand(
            name="comment",
            description="Write comments for the current file or highlighted code",
            step=CommentCodeStep,
        ),
        SlashCommand(
            name="feedback",
            description="Send feedback to improve Continue",
            step=FeedbackStep,
        ),
        SlashCommand(
            name="clear",
            description="Clear step history",
            step=ClearHistoryStep,
        ),
        SlashCommand(
            name="share",
            description="Download and share the session transcript",
            step=ShareSessionStep,
        )
    ],

    # Context providers let you quickly select context by typing '@'
    # Uncomment the following to
    # - quickly reference GitHub issues
    # - show Google search results to the LLM
    context_providers=[
        # GitHubIssuesContextProvider(
        #     repo_name="<your github username or organization>/<your repo name>",
        #     auth_token="<your github auth token>"
        # ),
        # GoogleContextProvider(
        #     serper_api_key="<your serper.dev api key>"
        # )
        SearchContextProvider(),
        DiffContextProvider(),
        URLContextProvider(
            preset_urls=[
                # Add any common urls you reference here so they appear in autocomplete
            ]
        )
    ],

    # Policies hold the main logic that decides which Step to take next
    # You can use them to design agents, or deeply customize Continue
    policy=DefaultPolicy()
)

@@ -1,27 +0,0 @@
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    # As initially LocalAI will download the models defined in PRELOAD_MODELS
    # you might need to tweak the healthcheck values here according to your network connection.
    # Here we give a timespan of 20m to download all the required files.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      - MODELS_PATH=/models
      # You can preload different models here as well.
      # See: https://github.com/go-skynet/model-gallery
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}]'
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai" ]

(binary image file removed, 196 KiB; not shown)

@@ -1,9 +0,0 @@
# CPU .env docs: https://localai.io/howtos/easy-setup-docker-cpu/
# GPU .env docs: https://localai.io/howtos/easy-setup-docker-gpu/

OPENAI_API_KEY=x
DISCORD_BOT_TOKEN=x
DISCORD_CLIENT_ID=x
OPENAI_API_BASE=http://api:8080
ALLOWED_SERVER_IDS=x
SERVER_TO_MODERATION_CHANNEL=1:1

@@ -1,76 +0,0 @@
# discord-bot

![Screenshot from 2023-05-01 07-58-19](https://user-images.githubusercontent.com/2420543/235413924-0cb2e75b-f2d6-4119-8610-44386e44afb8.png)

## Setup

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/discord-bot

# (optional) Checkout a specific LocalAI tag
# git checkout -b build <TAG>

# Download gpt4all-j to models/
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j

# Set the discord bot options (see: https://github.com/go-skynet/gpt-discord-bot#setup)
cp -rfv .env.example .env
vim .env

# start with docker-compose
docker-compose up -d --build
```

Note: see setup options here: https://github.com/go-skynet/gpt-discord-bot#setup

Open up the URL in the console and give permission to the bot in your server. Start a thread with `/chat ..`

## Kubernetes

- install the local-ai chart first
- change OPENAI_API_BASE to point to the API address and apply the discord-bot manifest:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: discord-bot
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: localai
  namespace: discord-bot
  labels:
    app: localai
spec:
  selector:
    matchLabels:
      app: localai
  replicas: 1
  template:
    metadata:
      labels:
        app: localai
      name: localai
    spec:
      containers:
        - name: localai-discord
          env:
          - name: OPENAI_API_KEY
            value: "x"
          - name: DISCORD_BOT_TOKEN
            value: ""
          - name: DISCORD_CLIENT_ID
            value: ""
          - name: OPENAI_API_BASE
            value: "http://local-ai.default.svc.cluster.local:8080"
          - name: ALLOWED_SERVER_IDS
            value: "xx"
          - name: SERVER_TO_MODERATION_CHANNEL
            value: "1:1"
          image: quay.io/go-skynet/gpt-discord-bot:main
```

@@ -1,21 +0,0 @@
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      - MODELS_PATH=/models
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai" ]

  bot:
    image: quay.io/go-skynet/gpt-discord-bot:main
    env_file:
      - .env

@@ -1 +0,0 @@
../models

@@ -1,83 +0,0 @@
This is an example of fine-tuning a LLM model to use with [LocalAI](https://github.com/mudler/LocalAI) written by [@mudler](https://github.com/mudler).

Specifically, this example shows how to use [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) to fine-tune a LLM model to consume with LocalAI as a `gguf` model.

A notebook is provided that currently works on _very small_ datasets on Google Colab on the free instance. It is far from producing good models, but it gives a sense of how to use the code with a better dataset and configuration, and how to use the resulting model with LocalAI. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mudler/LocalAI/blob/master/examples/e2e-fine-tuning/notebook.ipynb)

## Requirements

For this example you will need a GPU with at least 12GB of VRAM and a Linux box.
The notebook is tested on Google Colab with a Tesla T4 GPU.

## Clone this directory

Clone the repository and enter the example directory:

```bash
git clone http://github.com/mudler/LocalAI
cd LocalAI/examples/e2e-fine-tuning
```

## Install dependencies

```bash
# Install axolotl and dependencies
git clone https://github.com/OpenAccess-AI-Collective/axolotl && pushd axolotl && git checkout 797f3dd1de8fd8c0eafbd1c9fdb172abd9ff840a && popd #0.3.0
pip install packaging
pushd axolotl && pip install -e '.[flash-attn,deepspeed]' && popd

# https://github.com/oobabooga/text-generation-webui/issues/4238
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.0/flash_attn-2.3.0+cu117torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

Configure accelerate:

```bash
accelerate config default
```

## Fine-tuning

We will need to configure axolotl. This example provides an `axolotl.yaml` file that uses openllama-3b for fine-tuning. Copy the `axolotl.yaml` file and edit it to your needs. The dataset needs to sit next to it as `dataset.json`. The format used is `completion`, which is a list of JSON objects with a `text` field containing the full text to train the LLM with.

If you have a big dataset, you can pre-tokenize it to speed up the fine-tuning process:

```bash
# Optional pre-tokenize (run only if big dataset)
python -m axolotl.cli.preprocess axolotl.yaml
```

Now we are ready to start the fine-tuning process:
```bash
# Fine-tune
accelerate launch -m axolotl.cli.train axolotl.yaml
```

After we have finished the fine-tuning, we merge the Lora base with the model:
```bash
# Merge lora
python3 -m axolotl.cli.merge_lora axolotl.yaml --lora_model_dir="./qlora-out" --load_in_8bit=False --load_in_4bit=False
```

And we convert it to the gguf format that LocalAI can consume:

```bash
# Convert to gguf
git clone https://github.com/ggerganov/llama.cpp.git
pushd llama.cpp && make GGML_CUDA=1 && popd

# We need to convert the pytorch model into ggml for quantization
# It creates 'ggml-model-f16.bin' in the 'merged' directory.
pushd llama.cpp && python convert.py --outtype f16 \
    ../qlora-out/merged/pytorch_model-00001-of-00002.bin && popd

# Start off by making a basic q4_0 4-bit quantization.
# It's important to have 'ggml' in the name of the quant for some
# software to recognize its file format.
pushd llama.cpp && ./quantize ../qlora-out/merged/ggml-model-f16.gguf \
    ../custom-model-q4_0.bin q4_0
```

Now you should have ended up with a `custom-model-q4_0.bin` file that you can copy into the LocalAI models directory and use with LocalAI.
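As a hedged follow-up that is not part of the original README: once `custom-model-q4_0.bin` sits in the LocalAI models directory, it can be exercised through the completions endpoint. The sketch below assumes LocalAI is running on http://localhost:8080 and exposes the copied file under its filename (no dedicated YAML config written for it); both are assumptions for illustration.

```python
# Hedged sketch: exercise the fine-tuned gguf through LocalAI's completions API.
# The endpoint and the model name (the bare filename) are assumptions.
import requests

response = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "model": "custom-model-q4_0.bin",
        "prompt": "The fine-tuned model says:",
        "temperature": 0.2,
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```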
@@ -1,63 +0,0 @@

base_model: openlm-research/open_llama_3b_v2
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
push_dataset_to_hub: false
datasets:
  - path: dataset.json
    ds_type: json
    type: completion
dataset_prepared_path:
val_set_size: 0.05
adapter: qlora
lora_model_dir:
sequence_len: 1024
sample_packing: true
lora_r: 8
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:
output_dir: ./qlora-out
gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_32bit
torchdistx_path:
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false
gptq_groupsize:
gptq_model_v1:
warmup_steps: 20
eval_steps: 0.05
save_steps:
debug:
deepspeed:
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

(notebook file diff suppressed because it is too large)

@ -1,30 +0,0 @@
# flowise

Example of integration with [FlowiseAI/Flowise](https://github.com/FlowiseAI/Flowise).

![Screenshot from 2023-05-30 18-01-03](https://github.com/go-skynet/LocalAI/assets/2420543/02458782-0549-4131-971c-95ee56ec1af8)

You can check a demo video in the Flowise PR: https://github.com/FlowiseAI/Flowise/pull/123

## Run

In this example LocalAI will download the gpt4all model and set it up as "gpt-3.5-turbo". See the `docker-compose.yaml`.

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/flowise

# start with docker-compose
docker-compose up --pull always
```

## Accessing flowise

Open http://localhost:3000.

## Using LocalAI

Search for LocalAI in the integrations, and use `http://api:8080/` as the URL.
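Before wiring up the integration in the Flowise UI, you can check that LocalAI is up and has finished preloading the model (a quick sanity check against the compose setup above):

```bash
# From the host, LocalAI is published on port 8080
curl http://localhost:8080/v1/models
# Inside the compose network (what Flowise uses), the same API is reachable at http://api:8080/
```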
@ -1,37 +0,0 @@
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    # As initially LocalAI will download the models defined in PRELOAD_MODELS
    # you might need to tweak the healthcheck values here according to your network connection.
    # Here we give a timespan of 20m to download all the required files.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      - MODELS_PATH=/models
      # You can preload different models here as well.
      # See: https://github.com/go-skynet/model-gallery
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}]'
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai"]
  flowise:
    depends_on:
      api:
        condition: service_healthy
    image: flowiseai/flowise
    ports:
      - 3000:3000
    volumes:
      - ~/.flowise:/root/.flowise
    command: /bin/sh -c "sleep 3; flowise start"
@ -1,13 +0,0 @@
# CPU .env docs: https://localai.io/howtos/easy-setup-docker-cpu/
# GPU .env docs: https://localai.io/howtos/easy-setup-docker-gpu/

OPENAI_API_KEY=sk---anystringhere
OPENAI_API_BASE=http://api:8080/v1
# Models to preload at start
# Here we configure gpt4all as gpt-3.5-turbo and bert as embeddings,
# see other options in the model gallery at https://github.com/go-skynet/model-gallery
PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/openllama-7b-open-instruct.yaml", "name": "gpt-3.5-turbo"}]

## Change the default number of threads
#THREADS=14
@ -1,5 +0,0 @@
FROM python:3.12-slim-bullseye
COPY . /app
WORKDIR /app
RUN pip install --no-cache-dir -r requirements.txt
ENTRYPOINT [ "python", "./functions-openai.py" ]
@ -1,21 +0,0 @@
# LocalAI functions

Example of using LocalAI functions; see the [OpenAI](https://openai.com/blog/function-calling-and-other-api-updates) blog post for background.

## Run

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/functions

cp -rfv .env.example .env

# Edit the .env file to set a different model by editing `PRELOAD_MODELS`.
vim .env

docker-compose run --rm functions
```

Note: the example automatically downloads the `openllama` model, as it is under a permissive license.
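The Python script in this folder drives function calling through the OpenAI client; the equivalent raw request to LocalAI looks roughly like this (a sketch, assuming the compose setup above with the model preloaded as `gpt-3.5-turbo`):

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo",
  "messages": [{"role": "user", "content": "What is the weather like in Boston?"}],
  "functions": [{
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
      },
      "required": ["location"]
    }
  }],
  "function_call": "auto"
}'
```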
@ -1,23 +0,0 @@
version: "3.9"
services:
  api:
    image: quay.io/go-skynet/local-ai:master
    ports:
      - 8080:8080
    env_file:
      - .env
    environment:
      - DEBUG=true
      - MODELS_PATH=/models
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai"]
  functions:
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      api:
        condition: service_healthy
    env_file:
      - .env
@ -1,76 +0,0 @@
import openai
import json

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    weather_info = {
        "location": location,
        "temperature": "72",
        "unit": unit,
        "forecast": ["sunny", "windy"],
    }
    return json.dumps(weather_info)


def run_conversation():
    # Step 1: send the conversation and available functions to GPT
    messages = [{"role": "user", "content": "What's the weather like in Boston?"}]
    functions = [
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    ]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        functions=functions,
        function_call="auto",  # auto is default, but we'll be explicit
    )
    response_message = response["choices"][0]["message"]

    # Step 2: check if GPT wanted to call a function
    if response_message.get("function_call"):
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "get_current_weather": get_current_weather,
        }  # only one function in this example, but you can have multiple
        function_name = response_message["function_call"]["name"]
        function_to_call = available_functions[function_name]
        function_args = json.loads(response_message["function_call"]["arguments"])
        function_response = function_to_call(
            location=function_args.get("location"),
            unit=function_args.get("unit"),
        )

        # Step 4: send the info on the function call and function response to GPT
        messages.append(response_message)  # extend conversation with assistant's reply
        messages.append(
            {
                "role": "function",
                "name": function_name,
                "content": function_response,
            }
        )  # extend conversation with function response
        second_response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
        )  # get a new response from GPT where it can see the function response
        return second_response


print(run_conversation())
@ -1,2 +0,0 @@
langchain==0.3.4
openai==1.52.2
@ -1,83 +0,0 @@
name: Use LocalAI in GHA
on:
  pull_request:
    types:
      - closed

jobs:
  notify-discord:
    if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
    env:
      MODEL_NAME: hermes-2-theta-llama-3-8b
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # needed to checkout all branches for this Action to work
      # Starts the LocalAI container
      - name: Start LocalAI
        run: |
          echo "Starting LocalAI..."
          docker run -ti -d --name local-ai -p 8080:8080 localai/localai:master-ffmpeg-core run --debug $MODEL_NAME
          until [ "`docker inspect -f {{.State.Health.Status}} local-ai`" == "healthy" ]; do echo "Waiting for container to be ready"; docker logs --tail 10 local-ai; sleep 2; done
      # Check the PR diff using the current branch and the base branch of the PR
      - uses: GrantBirki/git-diff-action@v2.7.0
        id: git-diff-action
        with:
          json_diff_file_output: diff.json
          raw_diff_file_output: diff.txt
          file_output_only: "true"
      # Ask LocalAI to explain the diff
      - name: Summarize
        env:
          DIFF: ${{ steps.git-diff-action.outputs.raw-diff-path }}
        id: summarize
        run: |
          input="$(cat $DIFF)"

          # Define the LocalAI API endpoint
          API_URL="http://localhost:8080/chat/completions"

          # Create a JSON payload using jq to handle special characters
          json_payload=$(jq -n --arg input "$input" '{
            model: "'$MODEL_NAME'",
            messages: [
              {
                role: "system",
                content: "Write a message summarizing the change diffs"
              },
              {
                role: "user",
                content: $input
              }
            ]
          }')

          # Send the request to LocalAI
          response=$(curl -s -X POST $API_URL \
            -H "Content-Type: application/json" \
            -d "$json_payload")

          # Extract the summary from the response
          summary="$(echo $response | jq -r '.choices[0].message.content')"

          # Print the summary
          # -H "Authorization: Bearer $API_KEY" \
          echo "Summary:"
          echo "$summary"
          echo "payload sent"
          echo "$json_payload"
          {
            echo 'message<<EOF'
            echo "$summary"
            echo EOF
          } >> "$GITHUB_OUTPUT"
      # Send the summary somewhere (e.g. Discord)
      - name: Discord notification
        env:
          DISCORD_WEBHOOK: ${{ secrets.DISCORD_WEBHOOK_URL }}
          DISCORD_USERNAME: "discord-bot"
          DISCORD_AVATAR: ""
        uses: Ilshidur/action-discord@master
        with:
          args: ${{ steps.summarize.outputs.message }}
File diff suppressed because one or more lines are too long
@ -1,17 +0,0 @@
# Insomnia

Developer testing request collection for [Insomnia](https://insomnia.rest/), an open-source REST client.

## Instructions

* Install Insomnia as normal
* [Import](https://docs.insomnia.rest/insomnia/import-export-data) `Insomnia_LocalAI.json`
* Control + E opens the environment settings:

| **Parameter Name** | **Default Value** | **Description**                           |
|--------------------|-------------------|-------------------------------------------|
| HOST               | localhost         | LocalAI base URL                          |
| PORT               | 8080              | LocalAI port                              |
| DEFAULT_MODEL      | gpt-3.5-turbo     | Name of the model used on most requests.  |

**You may want to duplicate the localhost environment into a "Private" environment to avoid saving private settings back to this file.**
@ -1,72 +0,0 @@
# k8sgpt example

This example shows how to use LocalAI with k8sgpt.

![Screenshot from 2023-06-19 23-58-47](https://github.com/go-skynet/go-ggml-transformers.cpp/assets/2420543/cab87409-ee68-44ae-8d53-41627fb49509)

## Create the cluster locally with Kind (optional)

If you want to test this locally without a remote Kubernetes cluster, you can use kind.

Install [kind](https://kind.sigs.k8s.io/) and create a cluster:

```
kind create cluster
```

## Setup LocalAI

We will use [helm](https://helm.sh/docs/intro/install/):

```
helm repo add go-skynet https://go-skynet.github.io/helm-charts/
helm repo update

# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/k8sgpt

# Modify values.yaml preload_models with the models you want to install.
# CHANGE the URL to a model in huggingface.
helm install local-ai go-skynet/local-ai --create-namespace --namespace local-ai --values values.yaml
```

## Setup K8sGPT

```
# Install k8sgpt
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace --version 0.0.17
```

Apply the k8sgpt-operator configuration:

```
kubectl apply -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local-ai
  namespace: default
spec:
  backend: localai
  baseUrl: http://local-ai.local-ai.svc.cluster.local:8080/v1
  noCache: false
  model: gpt-3.5-turbo
  version: v0.3.0
  enableAI: true
EOF
```

## Test

Apply a broken pod:

```
kubectl apply -f broken-pod.yaml
```

## ArgoCD Deployment Example

[Deploy K8sgpt + localai with Argocd](https://github.com/tyler-harpool/gitops/tree/main/infra/k8gpt)
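Once the broken pod is applied, the operator should analyze it and store its findings as custom resources; a sketch of how to inspect them (the exact resource name and namespace depend on the k8sgpt-operator version):

```bash
# Results are stored as custom resources by the k8sgpt-operator
kubectl get results -n k8sgpt-operator-system
kubectl get results -n k8sgpt-operator-system -o yaml
```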
@ -1,14 +0,0 @@
apiVersion: v1
kind: Pod
metadata:
  name: broken-pod
spec:
  containers:
    - name: broken-pod
      image: nginx:1.27.2
      # The liveness probe targets port 90 while nginx listens on 80,
      # so the pod fails its probe and gives k8sgpt something to diagnose.
      livenessProbe:
        httpGet:
          path: /
          port: 90
        initialDelaySeconds: 3
        periodSeconds: 3
@ -1,96 +0,0 @@
replicaCount: 1

deployment:
  # https://quay.io/repository/go-skynet/local-ai?tab=tags
  image: quay.io/go-skynet/local-ai:v1.40.0
  env:
    threads: 4
    debug: "true"
    context_size: 512
    galleries: '[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]'
    preload_models: '[{ "id": "huggingface@thebloke__open-llama-13b-open-instruct-ggml__open-llama-13b-open-instruct.ggmlv3.q3_k_m.bin", "name": "gpt-3.5-turbo", "overrides": { "f16": true, "mmap": true }}]'
  modelsPath: "/models"

resources:
  {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

# Prompt templates to include
# Note: the keys of this map will be the names of the prompt template files
promptTemplates:
  {}
  # ggml-gpt4all-j.tmpl: |
  #   The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
  #   ### Prompt:
  #   {{.Input}}
  #   ### Response:

# Models to download at runtime
models:
  # Whether to force download models even if they already exist
  forceDownload: false

  # The list of URLs to download models from
  # Note: the name of the file will be the name of the loaded model
  list:
    #- url: "https://gpt4all.io/models/ggml-gpt4all-j.bin"
    #  basicAuth: base64EncodedCredentials

# Persistent storage for models and prompt templates.
# PVC and HostPath are mutually exclusive. If both are enabled,
# PVC configuration takes precedence. If neither are enabled, ephemeral
# storage is used.
persistence:
  pvc:
    enabled: false
    size: 6Gi
    accessModes:
      - ReadWriteOnce

    annotations: {}

    # Optional
    storageClass: ~

  hostPath:
    enabled: false
    path: "/models"

service:
  type: ClusterIP
  port: 8080
  annotations: {}
  # If using an AWS load balancer, you'll need to override the default 60s load balancer idle timeout
  # service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "1200"

ingress:
  enabled: false
  className: ""
  annotations:
    {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: chart-example.local
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

nodeSelector: {}

tolerations: []

affinity: {}
@ -1,68 +0,0 @@
apiVersion: v1
kind: Namespace
metadata:
  name: local-ai
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc
  namespace: local-ai
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-ai
  namespace: local-ai
  labels:
    app: local-ai
spec:
  selector:
    matchLabels:
      app: local-ai
  replicas: 1
  template:
    metadata:
      labels:
        app: local-ai
      name: local-ai
    spec:
      containers:
        - args:
            - phi-2
          env:
            - name: DEBUG
              value: "true"
          name: local-ai
          image: quay.io/go-skynet/local-ai:master-sycl-f32-ffmpeg-core
          imagePullPolicy: Always
          resources:
            limits:
              gpu.intel.com/i915: 1
          volumeMounts:
            - name: models-volume
              mountPath: /build/models
      volumes:
        - name: models-volume
          persistentVolumeClaim:
            claimName: models-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: local-ai
  namespace: local-ai
spec:
  selector:
    app: local-ai
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
@ -1,69 +0,0 @@
apiVersion: v1
kind: Namespace
metadata:
  name: local-ai
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc
  namespace: local-ai
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-ai
  namespace: local-ai
  labels:
    app: local-ai
spec:
  selector:
    matchLabels:
      app: local-ai
  replicas: 1
  template:
    metadata:
      labels:
        app: local-ai
      name: local-ai
    spec:
      runtimeClassName: "nvidia"
      containers:
        - args:
            - phi-2
          env:
            - name: DEBUG
              value: "true"
          name: local-ai
          image: quay.io/go-skynet/local-ai:master-cublas-cuda12
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: models-volume
              mountPath: /build/models
      volumes:
        - name: models-volume
          persistentVolumeClaim:
            claimName: models-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: local-ai
  namespace: local-ai
spec:
  selector:
    app: local-ai
  type: NodePort
  ports:
    - protocol: TCP
      targetPort: 8080
      port: 8080
@ -1,65 +0,0 @@
apiVersion: v1
kind: Namespace
metadata:
  name: local-ai
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc
  namespace: local-ai
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-ai
  namespace: local-ai
  labels:
    app: local-ai
spec:
  selector:
    matchLabels:
      app: local-ai
  replicas: 1
  template:
    metadata:
      labels:
        app: local-ai
      name: local-ai
    spec:
      containers:
        - args:
            - phi-2
          env:
            - name: DEBUG
              value: "true"
          name: local-ai
          image: quay.io/go-skynet/local-ai:master-ffmpeg-core
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: models-volume
              mountPath: /build/models
      volumes:
        - name: models-volume
          persistentVolumeClaim:
            claimName: models-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: local-ai
  namespace: local-ai
spec:
  selector:
    app: local-ai
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
@ -1,8 +0,0 @@
# CPU .env docs: https://localai.io/howtos/easy-setup-docker-cpu/
# GPU .env docs: https://localai.io/howtos/easy-setup-docker-gpu/

THREADS=4
CONTEXT_SIZE=512
MODELS_PATH=/models
DEBUG=true
# BUILD_TYPE=generic
examples/langchain-chroma/.gitignore
@ -1,4 +0,0 @@
db/
state_of_the_union.txt
models/bert
models/ggml-gpt4all-j
@ -1,63 +0,0 @@
# Data query example

This example makes use of [langchain and chroma](https://blog.langchain.dev/langchain-chroma/) to enable question answering on a set of documents.

## Setup

Download the models and start the API:

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/langchain-chroma

wget https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin -O models/bert
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j

# configure your .env
# NOTE: ensure that THREADS does not exceed your machine's CPU cores
mv .env.example .env

# start with docker-compose
docker-compose up -d --build

# tail the logs & wait until the build completes
docker logs -f langchain-chroma-api-1
```

### Python requirements

```
pip install -r requirements.txt
```

### Create a storage

In this step we will create a local vector database from our document set, so that later we can ask questions about it with the LLM.

Note: **OPENAI_API_KEY** is not required. However, the library might fail if no API key is passed, so an arbitrary string can be used.

```bash
export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=sk-

wget https://raw.githubusercontent.com/hwchase17/chat-your-data/master/state_of_the_union.txt
python store.py
```

After it finishes, a directory "db" will be created with the vector index database.

## Query

We can now query the dataset.

```bash
export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=sk-

python query.py
# President Trump recently stated during a press conference regarding tax reform legislation that "we're getting rid of all these loopholes." He also mentioned that he wants to simplify the system further through changes such as increasing the standard deduction amount and making other adjustments aimed at reducing taxpayers' overall burden.
```

Keep in mind that results can be hit or miss!
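If the stored embeddings look off, you can hit the embeddings endpoint directly to confirm the embedding model responds (a quick sanity check; `text-embedding-ada-002` mirrors the name that `store.py` and `query.py` request, and must map to an embeddings model in your LocalAI setup):

```bash
curl http://localhost:8080/v1/embeddings -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-ada-002", "input": "A test sentence"}'
```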
@ -1,15 +0,0 @@
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    env_file:
      - ../../.env
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai"]
@ -1 +0,0 @@
../models
@ -1,23 +0,0 @@
import os
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.vectorstores.base import VectorStoreRetriever

base_path = os.environ.get('OPENAI_API_BASE', 'http://localhost:8080/v1')

# Load and process the text
embedding = OpenAIEmbeddings(model="text-embedding-ada-002", openai_api_base=base_path)
persist_directory = 'db'

# Now we can load the persisted database from disk, and use it as normal.
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", openai_api_base=base_path)
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
retriever = VectorStoreRetriever(vectorstore=vectordb)
qa = RetrievalQA.from_llm(llm=llm, retriever=retriever)

query = "What the president said about taxes ?"
print(qa.run(query))
@ -1,4 +0,0 @@
langchain==0.3.3
openai==1.52.2
chromadb==0.5.13
llama-index==0.11.20
@ -1,25 +0,0 @@
import os
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader

base_path = os.environ.get('OPENAI_API_BASE', 'http://localhost:8080/v1')

# Load and process the text
loader = TextLoader('state_of_the_union.txt')
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=70)
texts = text_splitter.split_documents(documents)

# Embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
persist_directory = 'db'

embedding = OpenAIEmbeddings(model="text-embedding-ada-002", openai_api_base=base_path)
vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)

vectordb.persist()
vectordb = None
@ -1,68 +0,0 @@
# langchain-huggingface

Example of integration with the HuggingFace Inference API, with the help of [langchaingo](https://github.com/tmc/langchaingo).

## Setup

Download LocalAI and start the API:

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/langchain-huggingface

docker-compose up -d
```

Note: ensure you have set the `HUGGINGFACEHUB_API_TOKEN` environment variable; you can generate a token
on the [Settings / Access Tokens](https://huggingface.co/settings/tokens) page of the HuggingFace site.

This is an example `.env` file for LocalAI:

```ini
MODELS_PATH=/models
CONTEXT_SIZE=512
HUGGINGFACEHUB_API_TOKEN=hg_123456
```

## Using remote models

Now you can use any remote model available via the HuggingFace API. For example, let's enable the
[gpt2](https://huggingface.co/gpt2) model in the `gpt-3.5-turbo.yaml` config:

```yml
name: gpt-3.5-turbo
parameters:
  model: gpt2
  top_k: 80
  temperature: 0.2
  top_p: 0.7
context_size: 1024
backend: "langchain-huggingface"
stopwords:
- "HUMAN:"
- "GPT:"
roles:
  user: " "
  system: " "
template:
  completion: completion
  chat: gpt4all
```

Here you can see that the field `parameters.model` is set to `gpt2` and `backend` is set to `langchain-huggingface`.

## How to use

```shell
# Now API is accessible at localhost:8080
curl http://localhost:8080/v1/models
# {"object":"list","data":[{"id":"gpt-3.5-turbo","object":"model"}]}

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "gpt-3.5-turbo",
     "prompt": "A long time ago in a galaxy far, far away",
     "temperature": 0.7
   }'
```
@ -1,15 +0,0 @@
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    env_file:
      - ../../.env
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai"]
@ -1 +0,0 @@
../models
@ -1,29 +0,0 @@
## Langchain-python

Langchain example from the [quickstart](https://python.langchain.com/en/latest/getting_started/getting_started.html).

To interact with langchain, you can just set the `OPENAI_API_BASE` URL and provide a token with an arbitrary string.

See the example below:

```
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/langchain-python

# start with docker-compose
docker-compose up --pull always

pip install langchain
pip install openai

export OPENAI_API_BASE=http://localhost:8080
# Note: **OPENAI_API_KEY** is not required. However, the library might fail if no API key is passed, so an arbitrary string can be used.
export OPENAI_API_KEY=sk-

python test.py
# A good company name for a company that makes colorful socks would be "Colorsocks".

python agent.py
```
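Before running the scripts, it can help to confirm the API is reachable and the model has been loaded (a quick check, assuming the default port from the compose file):

```bash
curl http://localhost:8080/v1/models
# Expect gpt-3.5-turbo (or whichever model you preloaded) in the returned list
```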
@ -1,44 +0,0 @@
## This is a fork/based from https://gist.github.com/wiseman/4a706428eaabf4af1002a07a114f61d6

from io import StringIO
import sys
import os
from typing import Dict, Optional

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents.tools import Tool
from langchain.llms import OpenAI

base_path = os.environ.get('OPENAI_API_BASE', 'http://localhost:8080/v1')
model_name = os.environ.get('MODEL_NAME', 'gpt-3.5-turbo')

class PythonREPL:
    """Simulates a standalone Python REPL."""

    def __init__(self):
        pass

    def run(self, command: str) -> str:
        """Run command and returns anything printed."""
        old_stdout = sys.stdout
        sys.stdout = mystdout = StringIO()
        try:
            exec(command, globals())
            sys.stdout = old_stdout
            output = mystdout.getvalue()
        except Exception as e:
            sys.stdout = old_stdout
            output = str(e)
        return output

llm = OpenAI(temperature=0.0, openai_api_base=base_path, model_name=model_name)
python_repl = Tool(
    "Python REPL",
    PythonREPL().run,
    """A Python shell. Use this to execute python commands. Input should be a valid python command.
    If you expect output it should be printed out.""",
)
tools = [python_repl]
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("What is the 10th fibonacci number?")
@ -1,27 +0,0 @@
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    # As initially LocalAI will download the models defined in PRELOAD_MODELS
    # you might need to tweak the healthcheck values here according to your network connection.
    # Here we give a timespan of 20m to download all the required files.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      - MODELS_PATH=/models
      # You can preload different models here as well.
      # See: https://github.com/go-skynet/model-gallery
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}]'
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai"]
@ -1,6 +0,0 @@
from langchain.llms import OpenAI

llm = OpenAI(temperature=0.9, model_name="gpt-3.5-turbo")
text = "What would be a good company name for a company that makes colorful socks?"
print(llm(text))
examples/langchain/.gitignore
@ -1,2 +0,0 @@
models/ggml-koala-13B-4bit-128g
models/ggml-gpt4all-j
@ -1,6 +0,0 @@
FROM node:lts-alpine
COPY ./langchainjs-localai-example /app
WORKDIR /app
RUN npm install
RUN npm run build
ENTRYPOINT [ "npm", "run", "start" ]
@ -1,5 +0,0 @@
FROM python:3.13-bullseye
COPY ./langchainpy-localai-example /app
WORKDIR /app
RUN pip install --no-cache-dir -r requirements.txt
ENTRYPOINT [ "python", "./full_demo.py" ]
@ -1,30 +0,0 @@
# langchain

Example of using langchain with the standard OpenAI LLM module and LocalAI. It has Docker Compose profiles for both the TypeScript and Python versions.

**Please Note** - This is a tech demo example at this time. ggml-gpt4all-j has pretty terrible results for most langchain applications with the settings used in this example.

## Setup

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/langchain

# (optional) - Edit the example code in typescript.
# vi ./langchainjs-localai-example/index.ts

# Download gpt4all-j to models/
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j

# start with docker-compose for typescript!
docker-compose --profile ts up --build

# or start with docker-compose for python!
docker-compose --profile py up --build
```

## Copyright

Some of the example code in index.mts and full_demo.py is adapted from the langchainjs project and is Copyright (c) Harrison Chase. Used under the terms of the MIT license, as is the remainder of this code.
@ -1,43 +0,0 @@
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      - MODELS_PATH=/models
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai"]

  js:
    build:
      context: .
      dockerfile: JS.Dockerfile
    profiles:
      - js
      - ts
    depends_on:
      - "api"
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_BASE=http://api:8080/v1'
      - 'MODEL_NAME=gpt-3.5-turbo' # ggml-gpt4all-j' # ggml-koala-13B-4bit-128g'

  py:
    build:
      context: .
      dockerfile: PY.Dockerfile
    profiles:
      - py
    depends_on:
      - "api"
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_BASE=http://api:8080/v1'
      - 'MODEL_NAME=gpt-3.5-turbo' # ggml-gpt4all-j' # ggml-koala-13B-4bit-128g'
@ -1,2 +0,0 @@
node_modules/
dist/
@ -1,20 +0,0 @@
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "launch",
            "name": "Launch Program",
            // "skipFiles": [
            //     "<node_internals>/**"
            // ],
            "program": "${workspaceFolder}\\dist\\index.mjs",
            "outFiles": [
                "${workspaceFolder}/**/*.js"
            ]
        }
    ]
}
Some files were not shown because too many files have changed in this diff.