LocalAI/gpt-vision.md at f93fe303500cf2b8f90a6e38dad9bba36303239d

mirror of https://github.com/mudler/LocalAI.git synced 2025-05-06 10:38:17 +00:00

Ettore Di Giacinto 148adebe16

Also change icons on GPT vision page

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2024-06-03 16:58:53 +02:00

1.9 KiB


+++
disableToc = false
title = "🥽 GPT Vision"
weight = 14
url = "/features/gpt-vision/"
+++

LocalAI supports image understanding through LLaVA and implements OpenAI's GPT Vision API.

Usage

OpenAI docs: https://platform.openai.com/docs/guides/vision

To have LocalAI understand an image and reply based on what it sees, use the /v1/chat/completions endpoint, for example with curl:

```sh
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "llava", "temperature": 0.9,
     "messages": [{"role": "user", "content": [{"type": "text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}]}]
   }'
```

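The request above points at a public image URL. To ask about a local file, a common pattern with OpenAI's Vision API is to inline the image as a base64 data URL (data:<mime>;base64,<data>). The POSIX shell sketch below assumes your LocalAI version accepts data URLs in image_url in the same way. The helper names json_escape and build_payload are hypothetical, not part of LocalAI, and the MIME type defaults to image/jpeg.

```sh
#!/bin/sh
# Sketch: send a *local* image to a LocalAI vision model by inlining it as a
# base64 data URL instead of a public https:// URL.
# Assumption: your LocalAI version accepts data URLs in "image_url", as
# OpenAI's Vision API does.
set -eu

# json_escape TEXT (hypothetical helper): escape backslashes and double
# quotes for use inside a JSON string. Newlines are NOT escaped, so keep
# TEXT on a single line.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload MODEL PROMPT IMAGE_PATH [MIME_TYPE] (hypothetical helper):
# print a /v1/chat/completions request body that embeds the image file.
build_payload() {
  mime=${4:-image/jpeg}
  # Encode the file and drop the line wrapping that base64 may insert.
  b64=$(base64 < "$3" | tr -d '\n')
  printf '{"model": "%s", "temperature": 0.9, "messages": [{"role": "user", "content": [{"type": "text", "text": "%s"}, {"type": "image_url", "image_url": {"url": "data:%s;base64,%s"}}]}]}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$mime" "$b64"
}

# Usage (requires a running LocalAI on localhost:8080); curl's -d @- reads
# the request body from stdin:
#   build_payload llava "What is in the image?" ./photo.jpg |
#     curl http://localhost:8080/v1/chat/completions \
#       -H "Content-Type: application/json" -d @-
```

Embedding the bytes avoids making the server fetch a URL, at the cost of a larger request body (base64 adds roughly a third to the file size).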
Grammars and function tools can also be used together with the vision API, for example to constrain the answer to "yes" or "no":

```sh
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "llava", "temperature": 0.9, "grammar": "root ::= (\"yes\" | \"no\")",
     "messages": [{"role": "user", "content": [{"type": "text", "text": "Is there some grass in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}]}]
   }'
```

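Because the grammar in the request above limits the reply to exactly yes or no, a script can branch on the answer without parsing free text. The sketch below defines a hypothetical helper, reply_is_yes. It assumes the OpenAI-style response shape, where the reply sits in choices[0].message.content, and matches that field with grep rather than a full JSON parser.

```sh
#!/bin/sh
# Sketch: turn a grammar-constrained yes/no reply into a shell exit status.
# Assumes an OpenAI-style response where the reply text is in
# choices[0].message.content. reply_is_yes is a hypothetical helper name.

# reply_is_yes RESPONSE_JSON: succeed only if the reply content is "yes".
# The closing quote in the pattern keeps words like "yesterday" from matching.
reply_is_yes() {
  printf '%s' "$1" | grep -Eq '"content"[[:space:]]*:[[:space:]]*"yes"'
}

# Usage with the grammar request above (needs a running LocalAI):
#   resp=$(curl -s http://localhost:8080/v1/chat/completions \
#            -H "Content-Type: application/json" -d "$REQUEST_JSON")
#   if reply_is_yes "$resp"; then echo "grass found"; fi
```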
Setup

All-in-One images already ship the LLaVA model under the name gpt-4-vision-preview, so no setup is needed in that case.

To set up LLaVA models, follow the full example in the configuration examples.
