Update sample documentation and add column_mapping (#140)
* update promptflow-eval dependencies to azure-ai-evaluation

* clear local variables

* fix errors and remove 'question' col from data

* small fix in evaluator config

* update docs and add column mapping

* pre-commit fixes
slister1001 authored Oct 29, 2024
1 parent 933d29e commit af861e2
Showing 8 changed files with 89 additions and 17 deletions.
50 changes: 50 additions & 0 deletions scenarios/evaluate/README.md
@@ -0,0 +1,50 @@
---
page_type: sample
languages:
- python
products:
- ai-services
- azure-openai
description: Samples for evaluating Generative AI models with the azure-ai-evaluation SDK.
---

## Evaluate

### Overview

This tutorial provides a step-by-step guide on how to evaluate Generative AI models with Azure. Each of these samples uses the `azure-ai-evaluation` SDK.

### Objective

The main objective of this tutorial is to help users understand the process of evaluating an AI model in Azure. By the end of this tutorial, you should be able to:

- Simulate interactions with an AI model
- Evaluate both deployed model endpoints and applications
- Evaluate using quantitative NLP metrics, qualitative metrics, and custom metrics

Our samples cover the following tools for evaluation of AI models in Azure:

| Sample name | adversarial | simulator | conversation starter | index | raw text | against model endpoint | against app | qualitative metrics | custom metrics | quantitative NLP metrics |
|----------------------------------------|-------------|-----------|---------------------|-------|----------|-----------------------|-------------|---------------------|----------------|----------------------|
| simulate_adversarial.ipynb | X | X | | | | X | | | | |
| simulate_conversation_starter.ipynb | | X | X | | | X | | | | |
| simulate_input_index.ipynb | | X | | X | | X | | | | |
| simulate_input_text.ipynb | | X | | | X | X | | | | |
| evaluate_endpoints.ipynb | | | | | | X | | X | | |
| evaluate_app.ipynb | | | | | | | X | X | | |
| evaluate_qualitative.ipynb | | | | | | X | | X | | |
| evaluate_custom.ipynb | | | | | | X | | | X | |
| evaluate_quantitative.ipynb | | | | | | X | | | | X |
| evaluate_safety_risk.ipynb | X | | | | | X | | | | |
| simulate_and_evaluate_endpoint.py | | X | | | X | X | | X | | |



### Pre-requisites

To use the `azure-ai-evaluation` SDK, install it with:

```bash
pip install azure-ai-evaluation
```

Python 3.8 or later is required to use this package.

- See our Python reference documentation for the `azure-ai-evaluation` SDK [here](https://aka.ms/azureaieval-python-ref) for more granular details on input/output requirements and usage instructions.
- Check out our GitHub repo for the `azure-ai-evaluation` SDK [here](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/evaluation/azure-ai-evaluation).
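
As a quick orientation, here is a minimal sketch of running a batch evaluation with the SDK; the dataset path and its columns are placeholder assumptions, not files from this repo:

```python
# Minimal sketch (assumed inputs): score a JSONL dataset with an NLP metric.
# Each line of data.jsonl is assumed to contain "response" and "ground_truth" columns.
from azure.ai.evaluation import BleuScoreEvaluator, evaluate

result = evaluate(
    data="data.jsonl",                          # placeholder dataset path
    evaluators={"bleu": BleuScoreEvaluator()},  # NLP metrics need no LLM deployment
)
print(result["metrics"])  # aggregate scores across all rows
```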


### Programming Languages
- Python

### Estimated Runtime: 30 mins
@@ -375,7 +375,13 @@
" \"relevance\": relevance_evaluator,\n",
" },\n",
" evaluator_config={\n",
" \"relevance\": {\"response\": \"${target.response}\", \"context\": \"${data.context}\", \"query\": \"${data.query}\"},\n",
" \"relevance\": {\n",
" \"column_mapping\": {\n",
" \"response\": \"${target.response}\",\n",
" \"context\": \"${data.context}\",\n",
" \"query\": \"${data.query}\",\n",
" },\n",
" },\n",
" },\n",
" )"
]
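
In the diff above, the data bindings for each evaluator now sit under a `column_mapping` key rather than directly on the evaluator name. A hedged sketch of how the surrounding `evaluate()` call plausibly fits together (the model config, target function, and dataset path are stand-ins, not the notebook's actual code):

```python
from azure.ai.evaluation import RelevanceEvaluator, evaluate

# Stand-in model config; the notebook builds its own from environment settings.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-deployment>",
}
relevance_evaluator = RelevanceEvaluator(model_config=model_config)

def target(query: str) -> dict:
    """Stand-in application target; its return keys surface as ${target.*} columns."""
    return {"response": f"echo: {query}"}

result = evaluate(
    data="data.jsonl",  # assumed to provide the ${data.*} columns referenced below
    target=target,
    evaluators={"relevance": relevance_evaluator},
    evaluator_config={
        "relevance": {
            "column_mapping": {
                "response": "${target.response}",  # produced by the target at run time
                "context": "${data.context}",      # read from the input dataset
                "query": "${data.query}",
            },
        },
    },
)
```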
@@ -8,16 +8,16 @@ products:
description: Evaluate with quantitative evaluators
---

## Evaluate with math evaluators
## Evaluate with quantitative NLP evaluators

### Overview

This notebook demonstrates how to use math-based evaluators to assess the quality of generated text by comparing it to reference text.
This notebook demonstrates how to use NLP-based evaluators to assess the quality of generated text by comparing it to reference text.

### Objective

The primary goal of this tutorial is to guide users in leveraging the `azure-ai-evaluation` SDK to evaluate datasets using various math metrics. By the end of this tutorial, you'll be able to:
- Understand different math evaluators such as `BleuScoreEvaluator`, `GleuScoreEvaluator`, `MeteorScoreEvaluator`, and `RougeScoreEvaluator`.
The primary goal of this tutorial is to guide users in leveraging the `azure-ai-evaluation` SDK to evaluate datasets using various NLP metrics. By the end of this tutorial, you'll be able to:
- Understand different NLP evaluators such as `BleuScoreEvaluator`, `GleuScoreEvaluator`, `MeteorScoreEvaluator`, and `RougeScoreEvaluator`.
- Evaluate dataset using these evaluators.
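
For a concrete feel of these metrics, here is a small, hedged sketch of calling a few of the evaluators directly on a single response/reference pair (the example strings are made up):

```python
# Hedged sketch: score one generated response against a reference with NLP metrics.
from azure.ai.evaluation import BleuScoreEvaluator, GleuScoreEvaluator, MeteorScoreEvaluator

response = "Paris is the capital of France."       # generated text (made-up example)
ground_truth = "The capital of France is Paris."   # reference text (made-up example)

for name, evaluator in [
    ("bleu", BleuScoreEvaluator()),
    ("gleu", GleuScoreEvaluator()),
    ("meteor", MeteorScoreEvaluator()),
]:
    # Each evaluator returns a dict keyed by its metric name, e.g. {"bleu_score": ...}.
    print(name, evaluator(response=response, ground_truth=ground_truth))
```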

### Programming Languages
@@ -4,11 +4,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluate with quantitative evaluators\n",
"# Evaluate with quantitative NLP evaluators\n",
"\n",
"## Objective\n",
"This notebook demonstrates how to use math-based evaluators to assess the quality of generated text by comparing it to reference text. By the end of this tutorial, you'll be able to:\n",
" - Understand different math evaluators such as `BleuScoreEvaluator`, `GleuScoreEvaluator`, `MeteorScoreEvaluator`, and `RougeScoreEvaluator`.\n",
"This notebook demonstrates how to use NLP-based evaluators to assess the quality of generated text by comparing it to reference text. By the end of this tutorial, you'll be able to:\n",
" - Understand different NLP evaluators such as `BleuScoreEvaluator`, `GleuScoreEvaluator`, `MeteorScoreEvaluator`, and `RougeScoreEvaluator`.\n",
" - Evaluate dataset using these evaluators.\n",
"\n",
"## Time\n",
@@ -34,7 +34,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Math Evaluators"
"## NLP Evaluators"
]
},
{
@@ -256,16 +256,24 @@
" \"similarity\": similarity_evaluator,\n",
" },\n",
" evaluator_config={\n",
" \"content_safety\": {\"query\": \"${data.query}\", \"response\": \"${target.response}\"},\n",
" \"coherence\": {\"response\": \"${target.response}\", \"query\": \"${data.query}\"},\n",
" \"relevance\": {\"response\": \"${target.response}\", \"context\": \"${data.context}\", \"query\": \"${data.query}\"},\n",
" \"content_safety\": {\"column_mapping\": {\"query\": \"${data.query}\", \"response\": \"${target.response}\"}},\n",
" \"coherence\": {\"column_mapping\": {\"response\": \"${target.response}\", \"query\": \"${data.query}\"}},\n",
" \"relevance\": {\n",
" \"column_mapping\": {\"response\": \"${target.response}\", \"context\": \"${data.context}\", \"query\": \"${data.query}\"}\n",
" },\n",
" \"groundedness\": {\n",
" \"response\": \"${target.response}\",\n",
" \"context\": \"${data.context}\",\n",
" \"query\": \"${data.query}\",\n",
" \"column_mapping\": {\n",
" \"response\": \"${target.response}\",\n",
" \"context\": \"${data.context}\",\n",
" \"query\": \"${data.query}\",\n",
" }\n",
" },\n",
" \"fluency\": {\n",
" \"column_mapping\": {\"response\": \"${target.response}\", \"context\": \"${data.context}\", \"query\": \"${data.query}\"}\n",
" },\n",
" \"similarity\": {\n",
" \"column_mapping\": {\"response\": \"${target.response}\", \"context\": \"${data.context}\", \"query\": \"${data.query}\"}\n",
" },\n",
" \"fluency\": {\"response\": \"${target.response}\", \"context\": \"${data.context}\", \"query\": \"${data.query}\"},\n",
" \"similarity\": {\"response\": \"${target.response}\", \"context\": \"${data.context}\", \"query\": \"${data.query}\"},\n",
" },\n",
")"
]
4 changes: 4 additions & 0 deletions scenarios/evaluate/evaluate_safety_risk/README.md
@@ -23,6 +23,10 @@ The main objective of this tutorial is to help users understand how to use the a
- Evaluate the generated dataset for Protected Material and Indirect Attack Jailbreak vulnerability
- Use Azure AI Content Safety filter prompts to mitigate found vulnerabilities

### Basic requirements

To use Azure AI Safety Evaluation for different scenarios (simulation, annotation, etc.), you need an **Azure AI Project**. You should provide the Azure AI project to run your safety evaluations or simulations with. First [create an Azure AI hub](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/ai-resources), then [create an Azure AI project](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/create-projects?tabs=ai-studio). You **do not** need to provide your own LLM deployment, as the Azure AI Safety Evaluation service hosts adversarial models for both simulation and evaluation of harmful content and connects to them via your Azure AI project. Ensure that your Azure AI project is in one of the supported regions for your desired evaluation metric:

#### Region support for evaluations

| Region | Hate and unfairness, sexual, violent, self-harm, XPIA | Groundedness | Protected material |
| - | - | - | - |
| UK South | Will be deprecated 12/1/24 | no | no |
| East US 2 | yes | yes | yes |
| Sweden Central | yes | yes | no |
| US North Central | yes | no | no |
| France Central | yes | no | no |
| Switzerland West | yes | no | no |

For built-in quality and performance metrics, you connect your own deployment of LLMs and can therefore evaluate in any region your deployment is in.

#### Region support for adversarial simulation

| Region | Adversarial simulation |
| - | - |
| UK South | yes |
| East US 2 | yes |
| Sweden Central | yes |
| US North Central | yes |
| France Central | yes |
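
To make the project requirement concrete, here is a hedged sketch of wiring an Azure AI project into a built-in safety evaluator; the subscription, resource group, project name, and example strings are all placeholders:

```python
# Hedged sketch: run a service-hosted safety evaluator through your Azure AI project.
# All identifiers below are placeholders for your own Azure resources.
from azure.ai.evaluation import ViolenceEvaluator
from azure.identity import DefaultAzureCredential

azure_ai_project = {
    "subscription_id": "<your-subscription-id>",
    "resource_group_name": "<your-resource-group>",
    "project_name": "<your-ai-project-name>",
}

violence_evaluator = ViolenceEvaluator(
    credential=DefaultAzureCredential(),
    azure_ai_project=azure_ai_project,  # the hosted evaluation model is reached via this project
)

result = violence_evaluator(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(result)  # severity label and score for the violence category
```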

### Programming Languages
- Python

4 changes: 4 additions & 0 deletions scenarios/evaluate/simulate_adversarial/README.md
@@ -24,4 +24,8 @@ By the end of this tutorial, you should be able to:
### Programming Languages
- Python

### Basic requirements

To use Azure AI Safety Evaluation for different scenarios (simulation, annotation, etc.), you need an **Azure AI Project**. You should provide the Azure AI project to run your safety evaluations or simulations with. First [create an Azure AI hub](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/ai-resources), then [create an Azure AI project](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/create-projects?tabs=ai-studio). You **do not** need to provide your own LLM deployment, as the Azure AI Safety Evaluation service hosts adversarial models for both simulation and evaluation of harmful content and connects to them via your Azure AI project. Ensure that your Azure AI project is in one of the supported regions for your desired evaluation metric:

#### Region support for evaluations

| Region | Hate and unfairness, sexual, violent, self-harm, XPIA | Groundedness | Protected material |
| - | - | - | - |
| UK South | Will be deprecated 12/1/24 | no | no |
| East US 2 | yes | yes | yes |
| Sweden Central | yes | yes | no |
| US North Central | yes | no | no |
| France Central | yes | no | no |
| Switzerland West | yes | no | no |

For built-in quality and performance metrics, you connect your own deployment of LLMs and can therefore evaluate in any region your deployment is in.

#### Region support for adversarial simulation

| Region | Adversarial simulation |
| - | - |
| UK South | yes |
| East US 2 | yes |
| Sweden Central | yes |
| US North Central | yes |
| France Central | yes |

### Estimated Runtime: 20 mins
