<img width="702" alt="Screen Shot 2023-05-22 at 12 40 12 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/9116b310-4104-4955-bf52-be0e852db5ba">
Increase `Num responses per prompt` to sample `n` responses for every query to every LLM.
Note that if you have multiple template variables, ChainForge will calculate the _cross product_ of all inputs: all combinations of all input variables.
For instance, for the prompt `What {time} did {game} come out in the US?` where `time` could be `year` or `month`, and `game` could be one of 3 games `Pokemon Blue`, `Kirby's Dream Land`, and `Ocarina of Time`, we have `2 x 3 = 6` combinations:
- `What year did Pokemon Blue come out in the US?`
- `What month did Pokemon Blue come out in the US?`
- `What year did Kirby's Dream Land come out in the US?`
- `What month did Kirby's Dream Land come out in the US?`
- `What year did`... etc.
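
The cross product above can be reproduced with a few lines of Python. This is only a sketch of the idea, not ChainForge's internal code; the `expand_prompts` helper is a hypothetical name introduced for illustration.

```python
from itertools import product

template = "What {time} did {game} come out in the US?"
variables = {
    "time": ["year", "month"],
    "game": ["Pokemon Blue", "Kirby's Dream Land", "Ocarina of Time"],
}


def expand_prompts(template, variables):
    """Hypothetical helper: fill the template with every combination of values."""
    names = list(variables)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*(variables[name] for name in names))
    ]


prompts = expand_prompts(template, variables)
for prompt in prompts:
    print(prompt)
```

The number of prompts grows multiplicatively with each added variable, and each prompt is then sent `n` times to every selected LLM, so the total number of queries can grow quickly.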
**Add / change LLMs to query**: Click `Add +` in the drop-down list to add an LLM, or click the Trash icon to remove one. GPT3.5 (ChatGPT) is added by default. (See the `INSTALL_GUIDE.md` for currently supported LLMs.)
**Prompt the selected LLMs with the provided query**:
If you return a dictionary with more than one key, metrics will be plotted in a parallel coordinates plot. For example, for four keys in the dictionary:
<img width="466" alt="Screen Shot 2023-05-22 at 12 58 17 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/a220f8d4-5fd7-4bfd-adfa-4062440078fd">
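
As a rough sketch, an evaluator that returns a dictionary with four keys might look like the following. It assumes the evaluator is a function named `evaluate` that receives a response object with a `text` attribute; neither is spelled out in this section, so check both against the Evaluator node's own documentation. `FakeResponse` is a hypothetical stand-in so the snippet runs on its own, and the four metrics are arbitrary examples.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the response object passed to an evaluator."""
    text: str


def evaluate(response):
    """Return several metrics at once; each key becomes one metric."""
    text = response.text
    words = text.split()
    return {
        "num_chars": len(text),
        "num_words": len(words),
        # Crude sentence count based on terminal punctuation.
        "num_sentences": sum(text.count(mark) for mark in ".!?"),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }


metrics = evaluate(FakeResponse("Pokemon Blue came out in 1998."))
print(metrics)
```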
Plot evaluation scores quickly with a plot that makes sense for the shape of the input data.
To plot data, attach the output of an Evaluator node to a Vis Node. The output you see
will depend on the shape of your input data (see below). Use the `MultiSelect` at the top to select the prompt parameters you're interested in.
For instance, in `basic_comparison.cforge` in `examples/`, we can plot length of response by `{game}` across LLMs:
<img width="555" alt="Screen Shot 2023-05-22 at 1 02 02 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/04a36f5a-beae-4873-aaaf-7e62eb0bcccc">
Or maybe we don't care about the `{game}` parameter, only the overall length of responses per LLM to get a sense of the response complexity.
We can plot this by simply removing the parameter:
<img width="553" alt="Screen Shot 2023-05-22 at 1 02 07 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/f91302db-b1ec-40b8-a545-1ce52cc487ec">
> **Note**
> Currently, you can only attach Evaluator nodes to Vis Nodes. This may change in the future.
**Currently supported plots by type of evaluation results, number of LLMs, and number of prompt parameters:**
- Numeric, one LLM, one prompt parameter: Simple box-and-whiskers plot
- Numeric, multiple LLMs, no prompt parameters: Simple box-and-whiskers plot, where categories are LLMs
- Numeric, multiple LLMs, one prompt parameter: Box-and-whiskers plot grouped by LLM
- Numeric, one LLM, two prompt parameters: 3D scatterplot (_**Experimental**_)
- Boolean, multiple LLMs, no prompt parameters: Stacked histogram of true/false values per LLM
- Boolean, multiple LLMs, one prompt parameter: Stacked histogram of `true` values per LLM, grouped by parameter value
- Dictionary (key-metric), one LLM, one prompt parameter: Parallel coordinates plot
- _Dictionary (key-metric), multiple LLMs: Currently unsupported.
To compare across LLMs with multiple metrics, remove all but one LLM from the upstream prompt node to select which LLM you wish to display._
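
The list above can be read as a lookup from the shape of the evaluation results to a plot type. The sketch below restates it as a Python table for quick reference; it is purely illustrative and is not how ChainForge selects plots internally. The names `SUPPORTED_PLOTS` and `choose_plot` are hypothetical, and combinations not listed above return `None`.

```python
# Illustrative restatement of the list above; not ChainForge's implementation.
# Keys are (result type, "one" or "multiple" LLMs, number of prompt parameters).
SUPPORTED_PLOTS = {
    ("numeric", "one", 1): "simple box-and-whiskers plot",
    ("numeric", "multiple", 0): "simple box-and-whiskers plot, categories are LLMs",
    ("numeric", "multiple", 1): "box-and-whiskers plot grouped by LLM",
    ("numeric", "one", 2): "3D scatterplot (experimental)",
    ("boolean", "multiple", 0): "stacked histogram of true/false values per LLM",
    ("boolean", "multiple", 1): "stacked histogram per LLM, grouped by parameter value",
    ("dictionary", "one", 1): "parallel coordinates plot",
}


def choose_plot(result_type, num_llms, num_params):
    """Return the listed plot for this shape, or None if it is not listed."""
    if num_llms < 1:
        return None
    llm_bucket = "one" if num_llms == 1 else "multiple"
    return SUPPORTED_PLOTS.get((result_type, llm_bucket, num_params))


print(choose_plot("numeric", 3, 1))
print(choose_plot("dictionary", 2, 1))
```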
You can export your flow as a `cforge` file (JSON) by clicking the `Export` button at the top of the screen.
Import flows via the `Import` button.
> **Note**
> The exported file contains the entire cache of LLM responses, available in the `cache/` directory where the `chainforge` package is installed.
> When you import a flow, these cache files are re-saved to the importing user's local `cache/` directory. This saves money and time:
> LLMs don't need to be re-queried by the user importing your flow. If for some reason you wish to delete cached responses, remove the `cache` folder in the package directory.
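
Because an exported `cforge` file is JSON, it can be inspected with Python's standard `json` module before you import it. The sketch below only lists the top-level keys and makes no assumption about what those keys are, since the file's internal layout is not described here. The `summarize_flow` helper and the `my_flow.cforge` path are hypothetical.

```python
import json
from pathlib import Path


def summarize_flow(path):
    """Hypothetical helper: return the sorted top-level keys of an exported flow."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Only a JSON object has keys; anything else yields an empty summary.
    return sorted(data) if isinstance(data, dict) else []


if __name__ == "__main__":
    flow_path = Path("my_flow.cforge")  # placeholder path; replace with your export
    if flow_path.exists():
        print(summarize_flow(flow_path))
```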