docs: Fix images in Custom Task Evaluation (GITBOOK-749)
Eric Xiao authored and gitbook-bot committed Jul 22, 2024
1 parent 70c6487 commit ee7365e
Showing 5 changed files with 4 additions and 4 deletions.
Binary file added docs/.gitbook/assets/Benchmark Performance.png
Binary file added docs/.gitbook/assets/Choose LLM.png
Binary file added docs/.gitbook/assets/Create Template.png
Binary file added docs/.gitbook/assets/Golden Dataset.png
8 changes: 4 additions & 4 deletions docs/evaluation/concepts-evals/building-your-own-evals.md
@@ -20,7 +20,7 @@ Then, you need the **golden dataset**. This should be representative of the type

Building such a dataset is laborious, but you can often find a standardized one for the most common use cases (as we did in the code above).

<figure><img src="https://storage.cloud.google.com/arize-assets/phoenix/assets/images/Create_Your_Own_Template_Golden_Dataset.png" alt=""><figcaption><p>Golden Dataset</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/Golden Dataset.png" alt=""><figcaption><p>Build a golden dataset</p></figcaption></figure>

The Eval inferences are designed for easy benchmarking and come as pre-set, downloadable test inferences. The inferences are pre-tested; many are hand-crafted and designed for testing specific Eval tasks.
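For example, a pre-tested benchmark dataset can be pulled down in a few lines. This is a minimal sketch assuming Phoenix's `download_benchmark_dataset` helper; the task and dataset names are illustrative, not part of this commit:

```python
# A minimal sketch, assuming Phoenix's evals package exposes
# download_benchmark_dataset; the task and dataset names are illustrative.
from phoenix.evals import download_benchmark_dataset

# Pull a pre-tested golden dataset for binary relevance classification.
df = download_benchmark_dataset(
    task="binary-relevance-classification",
    dataset_name="wiki_qa-train",
)
df.head()
```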

@@ -37,7 +37,7 @@ df.head()

Then you need to decide **which LLM** you want to use for evaluation. This could be a different LLM from the one you are using for your application. For example, you may be using Llama for your application and GPT-4 for your eval. Often this choice is influenced by questions of cost and accuracy.

<figure><img src="https://storage.cloud.google.com/arize-assets/phoenix/assets/images/Create_Your_Own_Template_Pick_Model.png" alt=""><figcaption><p>Decide your LLM for evaluation</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/Choose LLM.png" alt=""><figcaption><p>Decide on LLM for evaluation</p></figcaption></figure>

### 4. Build the Eval Template

@@ -51,7 +51,7 @@ Be explicit about the following:
* **What are we asking?** In our example, we’re asking the LLM to tell us if the document was relevant to the query.
* **What are the possible output formats?** In our example, it is binary relevant/irrelevant, but it can also be multi-class (e.g., fully relevant, partially relevant, not relevant).

<figure><img src="https://storage.cloud.google.com/arize-assets/phoenix/assets/images/Create_Your_Own_Template.png" alt=""><figcaption><p>Building the eval template</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/Create Template.png" alt=""><figcaption><p>Build eval template</p></figcaption></figure>

To create a new template, all you need to do is set the input string passed to the Eval function.
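For instance, a custom relevance template might look like the following sketch, assuming Phoenix's `PromptTemplate` and `llm_classify`; the prompt wording, column names, and rails are illustrative:

```python
# A minimal sketch of a custom relevance template; column names
# ('query', 'reference') and the prompt wording are hypothetical.
from phoenix.evals import OpenAIModel, PromptTemplate, llm_classify

MY_RELEVANCE_TEMPLATE = PromptTemplate(
    "You are comparing a document to a question.\n"
    "Question: {query}\n"
    "Document: {reference}\n"
    "Respond with a single word: 'relevant' or 'irrelevant'."
)

results = llm_classify(
    dataframe=df,                      # the golden dataset from step 2
    template=MY_RELEVANCE_TEMPLATE,
    model=OpenAIModel(model="gpt-4"),
    rails=["relevant", "irrelevant"],  # constrain output to the allowed labels
)
```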

@@ -94,4 +94,4 @@ MY_CUSTOM_TEMPLATE = PromptTemplate("This is a test {prompt}")

You now need to run the eval across your golden dataset. Then you can **generate metrics** (overall accuracy, precision, recall, F1, etc.) to determine the benchmark. It is important to look at more than just overall accuracy. We’ll discuss that below in more detail.

<figure><img src="https://storage.cloud.google.com/arize-assets/phoenix/assets/images/Create_Your_Own_Template_Benchmark.png" alt=""><figcaption><p>Benchmark performance</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/Benchmark Performance.png" alt=""><figcaption><p>Benchmark performance</p></figcaption></figure>
