From 7fc0947d033dc1b5a547cf516ef7dc130ab26593 Mon Sep 17 00:00:00 2001
From: ianarawjo
Date: Mon, 19 Jun 2023 20:39:01 -0400
Subject: [PATCH] Update README.md

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index eba4327..c9d5ca0 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,9 @@
 ChainForge is a data flow prompt engineering environment for analyzing and evaluating LLM responses. It is geared towards early-stage, quick-and-dirty exploration of prompts and response quality that goes beyond ad-hoc chatting with individual LLMs. With ChainForge, you can:
  - Query multiple LLMs at once to test prompt ideas and variations quickly and effectively.
  - Compare response quality across prompt permutations, across models, and across model settings to choose the best prompt and model for your use case.
- - Setup an evaluation metric (scoring function) and immediately visualize results across prompts, prompt parameters, models, and model settings.
+ - Setup an evaluation metric (scoring function) and immediately visualize results across prompts, prompt parameters, models, and model settings.
+
+ChainForge comes with a number of example evaluation flows so you can get a sense of what's possible, including 188 example flows generated from benchmarks in OpenAI evals.
 
 **This is an open alpha of Chainforge.** Functionality is powerful but limited. We currently support OpenAI models GPT3.5 and GPT4, Anthropic's Claude, Google PaLM2, and [Dalai](https://github.com/cocktailpeanut/dalai)-hosted models Alpaca and Llama. You can change the exact model and individual model settings. Visualization nodes support numeric and boolean evaluation metrics. Try it and let us know what you'd like to see in the future! :)