LocalAI/examples/query_data
James Braza e34b5f0119
Cleaning up examples/ models and starter .env files ()
Closes https://github.com/go-skynet/LocalAI/issues/1066 and
https://github.com/go-skynet/LocalAI/issues/1065

Standardizes all `examples/`:
- Models in one place (other than `rwkv`, which was one-offy)
- Env files as `.env.example` with `cp`
    - Also standardizes comments and links docs
2023-10-02 18:14:10 +02:00

Data query example

This example makes use of Llama-Index to enable question answering on a set of documents.

It loosely follows the quickstart.

Summary of the steps:

  • prepare the dataset (and store it in the data directory)
  • prepare a vector index database to run queries on
  • run queries
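To make the three steps concrete, here is a toy sketch of the idea behind a vector index. It is not what Llama-Index does internally: the word-count "embedding", the hard-coded documents, and the function names are all assumptions made for illustration, whereas the real example uses an embedding model served by LocalAI.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents):
    """Step 2: pair every document with its vector."""
    return [(doc, embed(doc)) for doc in documents]


def query(index, question, top_k=1):
    """Step 3: return the top_k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


# Step 1: a hypothetical, hard-coded dataset instead of files in the data directory
documents = [
    "LocalAI exposes an OpenAI compatible API.",
    "The vector index is persisted in the storage directory.",
]
index = build_index(documents)
```

Calling query(index, "Where is the vector index persisted?") returns the second document, because it is the only one sharing words with the question. In the real example the retrieved text is then handed to the LLM to phrase an answer.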

Requirements

You will need a training data set. Copy it into the data directory.

Setup

Start the API:

# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/query_data

wget https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin -O models/bert
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j

# start with docker-compose
docker-compose up -d --build
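Before moving on, it can help to confirm that the API is reachable. The sketch below assumes LocalAI serves the OpenAI-style model listing at /models under the base URL and that the response carries a "data" list of entries with an "id" field; the helper names are hypothetical.

```python
import json
import urllib.request


def models_url(api_base):
    """Build the model-listing URL from an OpenAI-style base such as http://localhost:8080/v1."""
    return api_base.rstrip("/") + "/models"


def list_models(api_base="http://localhost:8080/v1", timeout=5):
    """Return the model ids reported by the API; raises URLError if it is unreachable."""
    with urllib.request.urlopen(models_url(api_base), timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: the payload follows the OpenAI "list models" shape.
    return [entry["id"] for entry in payload.get("data", [])]
```

Once the containers are up, print(list_models()) should show the models found in the models directory. A connection error usually means the container is still starting.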

Create a storage

In this step we will create a local vector database from our document set, so that we can later ask the LLM questions about it.

Note: OPENAI_API_KEY is not required. However, the library might fail if no API key is passed, so an arbitrary string can be used.

export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=sk-

python store.py

After it finishes, a directory "storage" will be created with the vector index database.
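A quick way to confirm that the step produced something is to list what ended up in that directory. This is a minimal sketch using only the standard library; the function name is hypothetical, and the exact files written depend on the installed Llama-Index version, so none are assumed here.

```python
from pathlib import Path


def storage_summary(storage_dir="storage"):
    """Map each file in the storage directory to its size in bytes."""
    root = Path(storage_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} not found; run store.py first")
    return {p.name: p.stat().st_size for p in sorted(root.iterdir()) if p.is_file()}
```

An empty result or a FileNotFoundError suggests that store.py did not complete.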

Query

We can now query the dataset.

export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=sk-

python query.py

Update

To update our vector database, run update.py:

export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=sk-

python update.py
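Since every step repeats the same two exports, a small wrapper can set them once. The sketch below is an assumption about how one might script this, not part of the example itself: localai_env and run_step are hypothetical helpers, and values already present in the environment take precedence over the defaults.

```python
import os
import subprocess
import sys


def localai_env(base_env=None, api_base="http://localhost:8080/v1", api_key="sk-"):
    """Copy the environment and fill in the two variables only if they are unset."""
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("OPENAI_API_BASE", api_base)
    env.setdefault("OPENAI_API_KEY", api_key)
    return env


def run_step(script):
    """Run one of store.py, query.py or update.py with the LocalAI settings applied."""
    return subprocess.run([sys.executable, script], env=localai_env(), check=True)
```

For example, run_step("store.py") followed by run_step("query.py") mirrors the Create a storage and Query sections above.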