### Set a prompt and number of responses requested
Below is a Prompt Node (right) with a TextFields node as input data. You can write your prompt in the text field at the top. Use `{}` template hooks to declare input variables, which you can attach to other nodes. For example, here is a prompt node with one input parameter:
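A hypothetical illustration of such a template: the prompt `Tell me a joke about {topic}` declares one input variable, `{topic}`; a TextFields node attached to it could supply values like `cats` or `the weather`, producing one filled prompt per value.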
With ChainForge, you can query one or multiple LLMs simultaneously with the same prompts. Click `Add +` in the drop-down list to add an LLM, or click the Trash icon to remove one. GPT3.5 (ChatGPT) is added by default.
See the `INSTALL_GUIDE.md` for currently supported LLMs.
### Prompt Templating in ChainForge
ChainForge uses single braces `{var}` for variables. You can escape braces with `\`; for instance, `function foo() \{ return true; \}` in a TextFields
node will generate a prompt `function foo() { return true; }`.
> **Warning**
> All of your prompt variables should have unique names across an entire flow. If you use duplicate names, behavior is not guaranteed.
ChainForge includes power features for generating tons of permutations of prompts via template variables.
If you have multiple template variables input to a prompt node, ChainForge will calculate the _cross product_ of all inputs: all combinations of all input variables.
For instance, for the prompt `What {time} did {game} come out in the US?`, where `time` could be `year` or `month` and `game` could be one of the 3 games `Pokemon Blue`, `Kirby's Dream Land`, and `Ocarina of Time`, we have `2 x 3 = 6` combinations (see the sketch after this list):
- `What year did Pokemon Blue come out in the US?`
- `What month did Pokemon Blue come out in the US?`
- `What year did Kirby's Dream Land come out in the US?`
- `What month did Kirby's Dream Land come out in the US?`
- `What year did Ocarina of Time come out in the US?`
- `What month did Ocarina of Time come out in the US?`
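If you're curious how this works, here is a minimal Python sketch of the idea (not ChainForge's actual implementation; the template and values are taken from the example above):

```python
from itertools import product

template = "What {time} did {game} come out in the US?"
inputs = {
    "time": ["year", "month"],
    "game": ["Pokemon Blue", "Kirby's Dream Land", "Ocarina of Time"],
}

# Cross product of all input variables: 2 x 3 = 6 filled prompts.
names = list(inputs)
for values in product(*(inputs[name] for name in names)):
    print(template.format(**dict(zip(names, values))))
```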
### Tabular Data Node
Tabular data provides an easy way to enter associated prompt parameters or import existing datasets and benchmarks. A typical use case is **ground truth evaluation**, where we have some inputs to a prompt and an "ideal" or expected answer:
<imgwidth="1377"alt="Screen Shot 2023-06-10 at 2 23 13 PM"src="https://github.com/ianarawjo/ChainForge/assets/5251713/e3dd6941-47d4-4eee-b8b1-d9007f7aae15">
Here, we see **variables `{first}`, `{last}`, and `{invention}` "carry together" when filling the prompt template**: ChainForge knows they are all associated with one another, connected via the row. Thus, it constructs 4 prompts from the input parameters. This is different from using separate TextFields nodes as input, which will calculate the cross product of all inputs (as described in Prompt Node above).
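The difference is easy to see in a small Python sketch (hypothetical rows and template, standing in for the table in the screenshot above): a Tabular Data node fills the template once per row, rather than taking the cross product of each column's values.

```python
# Each row supplies all of its variables together ("carry together"),
# so N rows produce exactly N prompts.
rows = [
    {"first": "Alexander", "last": "Graham Bell", "invention": "the telephone"},
    {"first": "Thomas",    "last": "Edison",      "invention": "the phonograph"},
]
template = "Did {first} {last} really invent {invention}?"

prompts = [template.format(**row) for row in rows]  # 2 rows -> 2 prompts
# With separate TextFields nodes instead, the cross product would mix
# names and inventions that don't belong together (2 x 2 x 2 = 8 prompts).
```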
Press `Import data` to import files in `jsonl`, `xlsx`, or `csv` format.
> **Note**
> Excel and CSV files must have a header row with column names.
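For example, a minimal valid CSV might look like this (hypothetical data, matching the sketch above):

```csv
first,last,invention
Alexander,Graham Bell,the telephone
Thomas,Edison,the phonograph
```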
To insert a row or delete one, right-click on a row cell:
### Evaluation Nodes
Score responses by writing an evaluate function in Python or JavaScript. This section refers to the Python evaluator, but the JavaScript one is similar.
A simpler solution, however, may be to use a Tabular Data node with the year of each game's release as a column, and read it in the evaluator via `response.meta`. For instance, here is an analogous situation of comparing the LLM's response to the 'ground truth' for math problem questions:
<imgwidth="1770"alt="Screen Shot 2023-06-11 at 11 51 28 AM"src="https://github.com/ianarawjo/ChainForge/assets/5251713/3a038fa6-46af-42d8-ac82-e94f7c239b10">
We use `response.meta['Expected']` to get the table value associated with the prompt's inputs. Notice that "Expected" _was not an input parameter to the prompt_. Instead, "Expected" is associated with the prompt input variable `question` (which you could access using `response.var['question']`). Using `meta` (short for metadata) like this can be quite useful when writing more complex evaluations.
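For instance, a ground-truth check over the math-problems table might look like the following. This is a minimal sketch, assuming the table has `question` and `Expected` columns and that an exact string match is an acceptable check:

```python
def evaluate(response):
    # response.text is the LLM's reply; response.var holds prompt variables
    # (e.g., response.var['question']), and response.meta holds table columns
    # that were not used in the prompt itself, such as 'Expected'.
    answer = response.text.strip()
    expected = str(response.meta['Expected']).strip()
    return answer == expected  # boolean: did the LLM match the ground truth?
```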
If you're curious about the response format or need to debug your evaluations, Evaluator Nodes expose `print` output within the `evaluate` function, so you can use `print` statements or `raise` exceptions to get feedback:
<imgwidth="377"alt="Screen Shot 2023-06-10 at 8 29 38 PM"src="https://github.com/ianarawjo/ChainForge/assets/5251713/6863c427-ef59-4e8d-92c3-fe8e92ad7415">
### Return values of Evaluator Nodes
Return values of the `evaluate` function must currently be numeric, boolean (`true`/`false`), or a dictionary of numeric metrics (key-value pairs). If you return a dictionary with more than one key, the metrics will be plotted in a parallel coordinates plot; for example, a dictionary with four keys produces a plot with four parallel axes.
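As an illustration, here is a sketch of an evaluator returning four (hypothetical) metrics per response; each key becomes one axis of the plot:

```python
def evaluate(response):
    text = response.text
    return {
        "num_words": len(text.split()),
        "num_chars": len(text),
        "num_sentences": text.count(".") + text.count("!") + text.count("?"),
        "mentions_price": int("$" in text),
    }
```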
### Vis Node
> **Note**
> Currently, you can only attach Evaluator nodes to Vis Nodes. This may change in the future.
**Currently supported plots by type of evaluation results, number of LLMs, and number of prompt parameters:**
- Numeric, one LLM, one prompt parameter: Simple box-and-whiskers plot
- Numeric, multiple LLMs, no prompt parameters: Simple box-and-whiskers plot, where categories are LLMs
- Numeric, multiple LLMs, one prompt parameter: Box-and-whiskers plot grouped by LLM
- Numeric, one LLM, two prompt parameters: 3D scatterplot (_**Experimental**_)
- Boolean, multiple LLMs, no prompt parameters: Stacked histogram of true/false values per LLM
- Boolean, multiple LLMs, one prompt parameter: Stacked histogram of `true` values per LLM, grouped by parameter value
- Dictionary (key-metric), one LLM, one prompt parameter: Parallel coordinates plot
- _Dictionary (key-metric), multiple LLMs: Currently unsupported. To compare across LLMs with multiple metrics, for now remove all but one LLM from the upstream prompt node to select which LLM you wish to display._
### Exporting and importing flows
You can export your flow as a `cforge` file (JSON) by clicking the `Export` button at the top of the screen.
Import flows via the `Import` button.
> **Note**
> The exported file contains the entire cache of LLM responses, stored in the `cache/` directory where the `chainforge` package is installed.
> When you import a flow, these cache files are re-saved to the importing user's local `cache/` directory. This saves money and time:
> LLMs don't need to be re-queried by the user importing your flow. If for some reason you wish to delete cached responses, remove the `cache` folder in the package directory.