From cca6205b70d21d54438eb78a0bc51d6c54a2e276 Mon Sep 17 00:00:00 2001
From: Ian Arawjo
Date: Mon, 8 Jan 2024 18:35:14 -0500
Subject: [PATCH] Remove guide and install_guide, as they are outdated and repeats of the docs
---
 GUIDE.md         | 272 -----------------------------------------------
 INSTALL_GUIDE.md | 138 ------------------------
 2 files changed, 410 deletions(-)
 delete mode 100644 GUIDE.md
 delete mode 100644 INSTALL_GUIDE.md

diff --git a/GUIDE.md b/GUIDE.md
deleted file mode 100644
index 8147fbb..0000000
--- a/GUIDE.md
+++ /dev/null
@@ -1,272 +0,0 @@

# Node and Features Guide

An explanation of all nodes and features currently available in the alpha version of ChainForge.

## Prompt Nodes and Prompt Templating

### Set a prompt and number of responses requested
Below is a Prompt Node (right) with a TextFields node as input data. You can write your prompt in the text field at the top. Use `{}` template hooks to declare input variables, which you can attach to other nodes. For example, here is a prompt node with one input parameter:

Screen Shot 2023-05-22 at 12 40 12 PM

You can increase `Num responses per prompt` to sample more than one response for every prompt to every LLM.

### Set LLMs to query

With ChainForge, you can query one or multiple LLMs simultaneously with the same prompts. Click `Add +` in the drop-down list to add an LLM, or click the Trash icon to remove one. GPT3.5 (ChatGPT) is added by default.

See `INSTALL_GUIDE.md` for currently supported LLMs.

### Prompt Templating in ChainForge

ChainForge uses single braces `{var}` for variables. You can escape braces with `\`; for instance, `function foo() \{ return true; \}` in a TextFields node will generate the prompt `function foo() { return true; }`.

> **Warning**
> All of your prompt variables should have unique names across an entire flow. If you use duplicate names, behavior is not guaranteed.

ChainForge includes powerful features for generating many permutations of prompts via template variables.
If you have multiple template variables input to a prompt node, ChainForge will calculate the _cross product_ of all inputs: all combinations of all input variables.
For instance, for the prompt `What {time} did {game} come out in the US?`, where `time` could be `year` or `month` and `game` could be one of the 3 games `Pokemon Blue`, `Kirby's Dream Land`, and `Ocarina of Time`, we have `2 x 3 = 6` combinations:

 - `What year did Pokemon Blue come out in the US?`
 - `What month did Pokemon Blue come out in the US?`
 - `What year did Kirby's Dream Land come out in the US?`
 - `What month did Kirby's Dream Land come out in the US?`
 - `What year did`... etc

There is an exception: if multiple inputs are the columns of a Tabular Data node, those variables will _carry together_ (see the sketch at the end of this section).
This lets you pass associated information, such as a city and a country, defined in rows of a table.
For more information, see the Tabular Data section below.

Finally, you may use a special hashtag `#` before a template variable name
to denote an _implicit_ template variable that should be filled
_using prior variable and metavariable history associated with the input to that node_.
This is best explained with a practical example:

Screen Shot 2023-08-01 at 11 30 01 AM

Here, I have a Prompt Node with an _explicit_ template `{question}`. Each input (a value in a table row)
has an associated metavariable, the value of the column `Expected`. I can use this value in any later prompt template via `{#Expected}`,
even if it is further down a prompt chain. Note that we could've also used `{#question}` in the LLM Scorer here
to insert the original value of `{question}` associated with each response into the LLM Scorer's prompt.

See the Code Evaluator section below for more details on what `vars` and `metavars` are.
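To make these filling rules concrete, here is a minimal Python sketch of the two behaviors, cross product versus carry-together (illustrative only; this is not ChainForge's actual implementation):

```python
from itertools import product

template = "What {time} did {game} come out in the US?"
times = ["year", "month"]
games = ["Pokemon Blue", "Kirby's Dream Land", "Ocarina of Time"]

# Independent inputs (e.g., separate TextFields nodes): cross product.
# Yields all 2 x 3 = 6 combinations.
cross = [template.format(time=t, game=g) for t, g in product(times, games)]

# Columns of a single Tabular Data node: carry together, row by row.
# Yields exactly one prompt per table row.
rows = [("year", "Pokemon Blue"), ("month", "Kirby's Dream Land")]
carried = [template.format(time=t, game=g) for t, g in rows]

print(len(cross), len(carried))  # 6 2
```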
### Query the selected LLMs with all prompts

When you are ready, hover over the Run button:

Screen Shot 2023-05-22 at 1 45 43 PM

A tooltip will provide feedback on how many responses it will send (sometimes
this can be quite a lot if you provide many values to a template variable).
If you are sure, press Run:

![prompt-node-run-example](https://github.com/ianarawjo/ChainForge/assets/5251713/888c9805-442c-43a2-8402-f003a96f56db)

ChainForge will now query all LLMs simultaneously (within reasonable rate limits)
and provide live feedback on its current progress.

> **Note**
> Once requests are sent, you cannot currently stop them mid-way through.
> However, all responses from an LLM are cached the moment ChainForge receives them, so you won't lose money if something goes wrong.
> Due to the asynchronous nature of API requests, requests may sometimes fail to complete or hang.
> If you get stuck or want to stop sending requests, restart the backend server by pressing `Ctrl+C` twice and re-running `chainforge serve`.
> We are working on improving this user experience in the future.

------------------
## TextFields Node

Text fields provide a way to define input values for prompt parameters. Each text field counts as a single input value to a prompt template.
Click the `+` button to add a text field:

Screen Shot 2023-05-22 at 1 48 02 PM

You can also add a prompt template as a field, and an input hook will appear:

Screen Shot 2023-05-22 at 1 47 37 PM

This way, you can chain prompt templates together to, for instance, test which prompt _template_ is best for your use case.
All prompt variables will be accessible later on in an evaluation chain, including the templates themselves.

------------------
## Tabular Data Node

Tabular data provides an easy way to enter associated prompt parameters or import existing datasets and benchmarks. A typical use case is **ground truth evaluation**, where we have some inputs to a prompt and an "ideal" or expected answer:

Screen Shot 2023-06-10 at 2 23 13 PM

Here, we see that **variables `{first}`, `{last}`, and `{invention}` "carry together" when filling the prompt template**: ChainForge knows they are all associated with one another, connected via the row. Thus, it constructs 4 prompts from the input parameters. This is different from using separate TextFields nodes as input, which would calculate the cross product of all inputs (as described under Prompt Node above).

You can press Import data to import files in `jsonl`, `xlsx`, or `csv` format.

> **Note**
> Excel and CSV files must have a header row with column names.

To insert or delete a row, right-click on a row cell:

tabular-data-row-dropdown

To insert, rename, or delete a column, click the column's `...` button:

tabular-data-col-dropdown

You can also change cell text by simply editing it.

------------------
## CSV Node
Create a comma-separated list of values to input into a prompt parameter:

Screen Shot 2023-05-22 at 1 48 45 PM

You can escape `,` by enclosing values in quotes, e.g. `"this,is,an,example"`.
_You cannot currently add a prompt template to a CSV node._
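As an illustration of the quoting rule, here is a sketch using Python's standard `csv` module, which follows the same quoting convention (ChainForge's exact parsing may differ):

```python
import csv
import io

# Unquoted commas split values; quoted values keep their commas.
line = 'apple, banana, "this,is,an,example", cherry'
values = next(csv.reader(io.StringIO(line), skipinitialspace=True))
print(values)  # ['apple', 'banana', 'this,is,an,example', 'cherry']
```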
------------------
## Inspect node

Inspect responses by attaching an Inspect node to Prompt or Evaluator nodes. Group responses by input variables or LLMs, at arbitrary depth:

Screen Shot 2023-05-19 at 4 14 38 PM

Use `Export Data` to export the data as an Excel `xlsx` file, e.g.:

Screen Shot 2023-05-19 at 11 13 22 AM

will produce:

Screen Shot 2023-05-19 at 11 16 46 AM

If you've scored responses with an evaluator node, this exports the scores as well.

------------------
## Code Evaluator Node

Score responses by writing an evaluate function in Python or JavaScript. This section refers to the Python evaluator, but the JavaScript one is similar.

To use a code evaluator, you must declare a `def evaluate(response)` function, which will be called by ChainForge for every response in the input.
You can add other helper functions or `import` statements as well.

For instance, here is a basic evaluator to check the length of the response:

Screen Shot 2023-05-22 at 1 50 13 PM

The `response` argument is a `ResponseInfo` object. From the source code:
```python
class ResponseInfo:
    """Stores info about a single LLM response. Passed to evaluator functions."""
    text: str  # The text of the LLM response
    prompt: str  # The text of the prompt used to query the LLM
    var: dict  # A dictionary of arguments that filled in the prompt template used to generate the final prompt
    meta: dict  # A dictionary of metadata ('metavars') that is 'carried alongside' data used to generate the prompt
    llm: str  # The name of the LLM queried (the nickname in ChainForge)

    def __str__(self):
        return self.text
```
Use `var` to access values that were input into a prompt template. For instance, suppose we have the prompt:

> What year was {game} released?

We can use `response.var['game']` to get the value of `game` input to the specific prompt that generated an individual response.
You might use this data to, for instance, compare the output value (say, a year) to a database:

```python
import re

def release_year(game):
    # ... Look up the game's release year in a database here ...
    ...

def extract_year(text):
    # Use a regex to extract a year from the response text.
    # Note the non-capturing group (?:...): with a capturing group,
    # findall would return only '19' or '20' rather than the full year.
    matches = re.findall(r'\b(?:19|20)\d{2}\b', text)
    return matches[0] if len(matches) > 0 else ''

def evaluate(response):
    return release_year(response.var['game']) == extract_year(response.text)
```

A simpler solution, however, may be to use a Tabular Data node and `response.meta`, with the year of the game's release as a column of the table. For instance, here is an analogous situation of comparing the LLM's response to the 'ground truth' for math problem questions:

Screen Shot 2023-06-11 at 11 51 28 AM

We use `response.meta['Expected']` to get the value of the table column associated with the inputs. Notice that "Expected" _was not an input parameter to the prompt_. Instead, "Expected" is associated with the prompt input variable `question` (which you could access using `response.var['question']`). Using `meta` (short for metadata) like this can be quite useful when writing more complex evaluations.
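For instance, a minimal sketch of such a ground-truth check might look like the following (the matching logic is up to you and will depend on how the LLM formats its answers):

```python
def evaluate(response):
    # 'Expected' is a column of the Tabular Data node, carried along
    # with each response as a metavariable (not a prompt parameter).
    return response.meta['Expected'].strip() in response.text
```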
If you're curious about the response format or need to debug your evaluations, Evaluator Nodes expose the `print` output of your `evaluate` function, so you can use Python's `print` or raise an `Exception` to get feedback:

Screen Shot 2023-06-10 at 8 29 38 PM

### Return values of Evaluator Nodes must currently be of the following types:
 - Numeric
 - Boolean (`true` or `false`)
 - Dictionaries with numeric data (key-value pairs of type `{<str>: <number>}`)

If you return a dictionary with more than one key, metrics will be plotted in a parallel coordinates plot. For example, for four keys in the dictionary:

Screen Shot 2023-05-18 at 11 10 10 AM

You can also use a single-key dictionary to label the metric axis of a Vis Node:

Screen Shot 2023-05-22 at 12 57 02 PM

------------------
## LLM Scorer Node

An LLM Scorer uses a single model to score responses (by default, GPT-4 at temperature 0). You must
write a scoring prompt that includes the expected format of output (e.g., "Reply true or false."). The
text of the input will be pasted directly below your prompt, inside triple-backtick fences.

For instance, here is GPT-4 scoring whether Falcon-7b's responses to math problems are true:

Screen Shot 2023-08-01 at 11 30 01 AM

We've used an implicit template variable, `{#Expected}`, to use the metavariable "Expected" associated with each response (from the table to the left).

> **Note**
> You can also use LLMs to score responses through prompt chaining. However, this requires running outputs through a code evaluator node.
> The LLM Scorer simplifies the process by attaching LLM scores directly as evaluation results, without modifying which LLM generated the response.

------------------
## Vis Node

Visualization nodes are the heart of ChainForge.
Plot evaluation scores quickly with a plot that makes sense for the shape of the input data.

To plot data, attach the output of an Evaluator node to a Vis Node. The output you see
will depend on the shape of your input data (see below). Use the `MultiSelect` at the top to select the prompt parameters you're interested in.
For instance, in `basic_comparison.cforge` in `examples/`, we can plot the length of responses by `{game}` across LLMs:

Screen Shot 2023-05-22 at 1 02 02 PM

Or maybe we don't care about the `{game}` parameter, only the overall length of responses per LLM, to get a sense of response complexity.
We can plot this by simply removing the parameter:

Screen Shot 2023-05-22 at 1 02 07 PM

> **Note**
> Currently, you can only attach Evaluator nodes to Vis Nodes. This may change in the future.

**Currently supported plots by type of evaluation results, number of LLMs, and number of prompt parameters:**
 - Numeric, one LLM, one prompt parameter: Simple box-and-whiskers plot
 - Numeric, multiple LLMs, no prompt parameters: Simple box-and-whiskers plot, where categories are LLMs
 - Numeric, multiple LLMs, one prompt parameter: Box-and-whiskers plot grouped by LLM
 - Numeric, one LLM, two prompt parameters: 3D scatterplot (_**Experimental**_)
 - Boolean, multiple LLMs, no prompt parameters: Stacked histogram of true/false values per LLM
 - Boolean, multiple LLMs, one prompt parameter: Stacked histogram of `true` values per LLM, grouped by parameter value
 - Dictionary (key-metric), one LLM, one prompt parameter: Parallel coordinates plot
 - _Dictionary (key-metric), multiple LLMs: Currently unsupported.
   To compare across LLMs with multiple metrics, remove all but one LLM from the upstream prompt node to select which LLM you wish to display._
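As an example of the dictionary case described above, an evaluator sketch like the following (the metric names here are arbitrary) returns multiple named metrics, which a Vis Node renders as a parallel coordinates plot:

```python
def evaluate(response):
    # Each key becomes a named metric (one axis of the plot).
    text = response.text
    return {
        "num_words": len(text.split()),
        "num_chars": len(text),
        "num_lines": text.count("\n") + 1,
    }
```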
------------------
## Exporting / Importing flows

Share your evaluation flows with others.
You can export your flow as a `cforge` file (JSON) by clicking the `Export` button at the top of the screen.
Import flows via the `Import` button.

> **Note**
> The exported file contains the entire cache of LLM responses, which is stored in the `cache/` directory where the `chainforge` package is installed.
> When you import a flow, these cache files are re-saved to the importing user's local `cache/` directory. This saves money and time:
> LLMs don't need to be re-queried by the user importing your flow. If for some reason you wish to delete cached responses, remove the `cache` folder in the package directory.

diff --git a/INSTALL_GUIDE.md b/INSTALL_GUIDE.md
deleted file mode 100644
index 56d328d..0000000
--- a/INSTALL_GUIDE.md
+++ /dev/null
@@ -1,138 +0,0 @@

# Installation Guide

There are two ways to use ChainForge:
 - Via the web version, hosted at https://chainforge.ai/play/. This requires no installation, but is slightly more limited than the full version.
 - By installing it on your local machine. This has the benefit of being able to load API keys from environment variables, run Python evaluator nodes, and query Dalai-hosted models.

This document concerns the latter (local installation).

## Step 1. Install on your machine
The simplest and safest way to install the latest public build of ChainForge is to:
 1. Create a new directory and `cd` into it
 2. _(Optional, but recommended!)_ Create a virtual environment. On Mac, you can do
    ```bash
    python -m venv venv
    source venv/bin/activate
    ```
 3. Install `chainforge` via `pip`:
    ```bash
    pip install chainforge
    ```
 4. Run:
    ```bash
    chainforge serve
    ```
    > **Note**
    > If you'd like to run ChainForge on a different hostname and port, specify `--host` and `--port`. For instance, `chainforge serve --host 0.0.0.0 --port 3400`
 5. Open [localhost:8000](http://localhost:8000/) in a recent version of Google Chrome, Mozilla Firefox, Microsoft Edge (Chromium), or Brave browser.

> **Note**
> The ChainForge beta version does not currently support other browsers, but if you want support, please open an Issue or make a Pull Request. The main barrier at the moment is that the CSS formatting renders slightly differently in Safari and other browsers.

## Step 2. Get and set API keys for certain model providers

Though you can run ChainForge, you can't do anything with it without the ability to call an LLM. We currently support the following model providers:
 - OpenAI models GPT3.5 and GPT4, including all variants and function calls
 - HuggingFace models (via the HuggingFace Inference and Inference Endpoints API)
 - Anthropic models
 - Google PaLM2 chat and text bison models
 - Aleph Alpha Luminous models
 - Azure OpenAI endpoints
 - (Locally run) Alpaca and Llama models, served via [Dalai](https://github.com/cocktailpeanut/dalai) (Alpaca.7b at port 4000)

To query models like Alpaca and Llama run on your local machine via Dalai, [install `dalai`](https://github.com/cocktailpeanut/dalai) and follow the instructions to download `alpaca.7b`. When everything is set up, run `npx dalai serve 4000`.
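For reference, the whole setup looks roughly like this (the install command follows Dalai's README; check there for the current syntax):

```bash
# One-time: download the alpaca.7b weights via Dalai
npx dalai alpaca install 7B

# Then serve models on port 4000, where ChainForge expects them
npx dalai serve 4000
```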
### How to Set API keys for specific model providers (non-Dalai models)
To use a specific model provider, you need to do two things:
 1. **Get an API key.** HuggingFace API keys [are free](https://huggingface.co/docs/api-inference/quicktour). OpenAI API keys are easy to access, and you can even [get one for free during a trial period](https://openaimaster.com/how-to-get-openai-api-key-for-free/). For other providers, see their pages and sign up for access.
 2. **Set the relevant API key in ChainForge.** You can input your API keys manually via the Settings button in the top-right corner. However, this can become tedious fast. If you'd prefer not to be bothered every time you load ChainForge, you can set them as environment variables. To do so, follow [this guide](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety), section 3, "Use Environment Variables in place of your API key." When following the instructions, **swap OPENAI_API_KEY for the alias of your specific model provider, listed below:**
    - OpenAI: `OPENAI_API_KEY`
    - HuggingFace: `HUGGINGFACE_API_KEY`
    - Anthropic: `ANTHROPIC_API_KEY`
    - Google PaLM2: `PALM_API_KEY`
    - Aleph Alpha: `ALEPH_ALPHA_API_KEY`
    - Azure OpenAI: Set two keys, `AZURE_OPENAI_KEY` and `AZURE_OPENAI_ENDPOINT`. Note that the endpoint should look like a base URL. For examples of what these keys look like, see the [Azure OpenAI documentation](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/chatgpt-quickstart?tabs=command-line&pivots=programming-language-javascript).

    When you are done setting the API key(s), **reopen your terminal**. _(This is because the terminal loads the environment variables when it is first opened, so it needs to be refreshed before running `chainforge serve`.)_

For instance, to set an OpenAI API key as an environment variable on a Mac, do this from the terminal:
```bash
echo "export OPENAI_API_KEY='yourkey'" >> ~/.zshrc
source ~/.zshrc
echo $OPENAI_API_KEY
```
Then, make sure to **reopen your terminal**.

## Step 3. Check out Examples!

Click Example Flows to get a sense of what ChainForge is capable of. A popular choice is ground truth evaluations, which use Tabular Data nodes.

---------------------------------
# For developers

Below is a guide to running the beta version of ChainForge directly from source, for people who want to modify, develop, or extend it.
Note that these steps may change in the future.

### Install requirements
Before you can run ChainForge, you need to install its dependencies. `cd` into `chainforge` and run

```bash
pip install -r requirements.txt
```

to install requirements. (Ideally, you will run this in a `virtualenv`.)

To install Node.js requirements, first make sure you have Node.js installed. Then `cd` into `chainforge/react-server` and run:

```bash
npm install
```

> You might run into dependency conflicts. You can re-run with `--force` to force continued installation.

### Serving ChainForge manually

To serve ChainForge manually, you have two options:
 1. Run everything from a single Python script, which requires building the React app to static files, or
 2. Serve the React front-end separately from the Flask back-end and take advantage of React hot reloading.

We recommend the former option for end-users, and the latter for developers.
#### Option 1: Build React app as static files (end-users)

`cd` into the `react-server` directory and run:

```
npm run build
```

Wait a moment while it builds the React app to static files.

#### Option 2: Serve React front-end with hot reloading (developers)

`cd` into the `react-server` directory and run the following to serve the React front-end:

```
npm run start
```

### Serving the backend

Regardless of which option you chose, `cd` into the root ChainForge directory and run:

```bash
python -m chainforge.app serve
```

This script spins up a Flask server on port 8000. Note that most of the app logic runs fully in the browser, but some calls (like loading API keys as environment variables, or querying Dalai-hosted models or the Anthropic API) still go through the Flask server.

If you built the React app statically, go to `localhost:8000` in a web browser to view the app (ideally in Google Chrome).
If you served the React app with hot reloading via `npm run start`, go to the address you ran it on (usually `localhost:3000`).

## Problems?

Open an [Issue](https://github.com/ianarawjo/ChainForge/issues).

# Contributing to ChainForge

If you want to contribute, welcome! Please [fork this repository](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) and submit a [Pull Request](https://github.com/ianarawjo/ChainForge/pulls) with your changes.

If you have access to the main repository, we request that you add a branch `dev/` and develop changes from there. When you are ready to push changes, say to address an open Issue, make a Pull Request on the `experimental` repository and assign the main developer (Ian Arawjo) to it.