From 3bfee02ec08dfd6e32201f5c50f0de1ee669bff9 Mon Sep 17 00:00:00 2001
From: ianarawjo
Date: Sun, 11 Jun 2023 12:19:04 -0400
Subject: [PATCH] Add Tabular Data node and new Evaluator features to GUIDE.md

---
 GUIDE.md | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 47 insertions(+), 8 deletions(-)

diff --git a/GUIDE.md b/GUIDE.md
index e6a81d8..8edf099 100644
--- a/GUIDE.md
+++ b/GUIDE.md
@@ -1,4 +1,4 @@
-# User Guide
+# Node and Features Guide
 
 An explanation of all nodes and features currently available in the alpha version of ChainForge.
 
@@ -59,6 +59,30 @@ You can also add a prompt template as a field, and an input hook will appear:
 This way, you can chain prompt templates together to, for instance, test what the best prompt _template_ is for your use case. All prompt variables will be accessible later on in an evaluation chain, including the templates themselves.
 
+------------------
+## Tabular Data Node
+
+Tabular data provides an easy way to enter associated prompt parameters or to import existing datasets and benchmarks. A typical use case is **ground truth evaluation**, where we have some inputs to a prompt and an "ideal" or expected answer:
+
+[Image: Screen Shot 2023-06-10 at 2 23 13 PM]
+
+Here, we see that **variables `{first}`, `{last}`, and `{invention}` "carry together" when filling the prompt template**: ChainForge knows they are all associated with one another, connected via the row. Thus, it constructs 4 prompts from the input parameters. This is different from using separate TextFields nodes as input, which would compute the cross product of all inputs (as described in the Prompt Node section above).
+
+You can press Import data to import files in `jsonl`, `xlsx`, or `csv` format.
+
+> **Note**
+> Excel and CSV files must have a header row with column names.
+
+To insert or delete a row, right-click on a row cell:
+
+[Image: tabular-data-row-dropdown]
+
+To insert, rename, or delete a column, click the column's `...` button:
+
+[Image: tabular-data-col-dropdown]
+
+You can also change cell text by simply editing it.
+
 ------------------
 ## CSV Node
 Create a comma-separated list of values to input into a prompt parameter:
 
@@ -96,15 +120,20 @@
 For instance, here is a basic evaluator to check the length of the response:
 
 [Image: Screen Shot 2023-05-22 at 1 50 13 PM]
 
-The `response` argument is a `ResponseInfo` object:
+The `response` argument is a `ResponseInfo` object. From the source code:
 
 ```python
 class ResponseInfo:
-  text: str
-  prompt: str
-  var: dict
-  llm: str
+  """Stores info about a single LLM response. Passed to evaluator functions."""
+  text: str    # The text of the LLM response
+  prompt: str  # The text of the prompt used to query the LLM
+  var: dict    # A dictionary of arguments that filled in the prompt template used to generate the final prompt
+  meta: dict   # A dictionary of metadata ('metavars') that is 'carried alongside' data used to generate the prompt
+  llm: str     # The name of the LLM queried (the nickname in ChainForge)
+
+  def __str__(self):
+    return self.text
 ```
 
-Use `var` to get access to prompt parameter values. For instance, suppose we have the prompt:
+Use `var` to get access to the values that were input into a prompt template. For instance, suppose we have the prompt:
 
 > What year was {game} released?
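The GUIDE.md lines elided between this hunk and the next presumably define the helper functions that appear in the next hunk's context. As a rough sketch only, assuming a regex-based `extract_year` and a hard-coded, hypothetical ground-truth table inside `release_year` (not the guide's actual code), such an evaluator might look like:

```python
import re

def release_year(game: str) -> str:
    # Hypothetical ground-truth lookup; the real guide likely obtains this differently.
    known_years = {"Portal": "2007", "Minecraft": "2011"}
    return known_years.get(game, "")

def extract_year(text: str) -> str:
    # Grab the first four-digit year mentioned in the LLM's response, if any.
    match = re.search(r"\b(19|20)\d{2}\b", text)
    return match.group(0) if match else ""

def evaluate(response):
    # response.var holds the values that filled the prompt template, e.g. {'game': 'Portal'}.
    return release_year(response.var['game']) == extract_year(response.text)
```

The comparison returns a Boolean, which is one of the return types Evaluator Nodes accept (see the list at the end of this patch).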
@@ -126,7 +155,17 @@ def evaluate(response):
     return release_year(response.var['game']) == extract_year(response.text)
 ```
 
-**Return values must currently be of the following types:**
+A simpler solution, however, may be to use a Tabular Data node and `response.meta`, with the year of the game's release as a column of the table. For instance, here is an analogous situation, comparing the LLM's response to the 'ground truth' answer for math problems:
+
+[Image: Screen Shot 2023-06-11 at 11 51 28 AM]
+
+We use `response.meta['Expected']` to get the value of the table column associated with the inputs. Notice that "Expected" _was not an input parameter to the prompt_. Instead, "Expected" is associated with the prompt input variable `question` (which you could access using `response.var['question']`). Using `meta` (short for metadata) like this can be quite useful when writing more complex evaluations.
+
+If you're curious about the response format or need to debug your evaluations, Evaluator Nodes expose `print` output from within the `evaluate` function, so you can use Python's `print` function or `raise` an exception to get feedback:
+
+[Image: Screen Shot 2023-06-10 at 8 29 38 PM]
+
+### Return values of Evaluator Nodes must currently be of the following types:
 - Numeric
 - Boolean (`true` or `false`)
 - Dictionaries with numeric data (key-value pairs of type `{<str>: <number>}`
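To make the `response.meta` pattern above concrete, here is a minimal sketch of an evaluator for the math-problem setup described in this hunk. The column name `Expected` and the input variable `question` come from the patch text; the string normalization and substring check are illustrative assumptions, not the guide's actual code:

```python
def evaluate(response):
    # 'Expected' is a Tabular Data column carried alongside the 'question' input;
    # it was never substituted into the prompt, so it appears in response.meta, not response.var.
    expected = str(response.meta['Expected']).strip().lower()
    answer = response.text.strip().lower()
    # Hypothetical check: does the expected answer appear anywhere in the LLM's reply?
    return expected in answer
```

An exact string match can be too strict for free-form LLM answers; substring containment (as here) or parsing both sides into numbers are common relaxations.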