Adding sample to evaluate groundedness (#142)
* update promptflow-eval dependencies to azure-ai-evaluation

* clear local variables

* fix errors and remove 'question' col from data

* small fix in evaluator config

* add groundedness sample

* adding and fixing readme
slister1001 authored Nov 11, 2024
1 parent faaa35d commit c06c3c7
Showing 3 changed files with 402 additions and 1 deletion.
24 changes: 23 additions & 1 deletion scenarios/evaluate/simulate_adversarial/README.md
@@ -26,6 +26,28 @@ By the end of this tutorial, you should be able to:

### Basic requirements

To use Azure AI Safety Evaluation for different scenarios (simulation, annotation, etc.), you need an **Azure AI Project.** You should provide an Azure AI project to run your safety evaluations or simulations with. First, [create an Azure AI hub](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/ai-resources), then [create an Azure AI project](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/create-projects?tabs=ai-studio). You **do not** need to provide your own LLM deployment as the Azure AI Safety Evaluation service hosts adversarial models for both simulation and evaluation of harmful content and connects to it via your Azure AI project. Ensure that your Azure AI project is in one of the supported regions for your desired evaluation metric:

#### Region support for evaluations

| Region | Hate and unfairness, sexual, violent, self-harm, XPIA | Groundedness | Protected material |
| - | - | - | - |
| UK South | Will be deprecated 12/1/24 | no | no |
| East US 2 | yes | yes | yes |
| Sweden Central | yes | yes | no |
| US North Central | yes | no | no |
| France Central | yes | no | no |
| Switzerland West | yes | no | no |

For built-in quality and performance metrics, you connect your own LLM deployment, so you can evaluate in any region your deployment is in.

#### Region support for adversarial simulation
| Region | Adversarial simulation |
| - | - |
|UK South | yes|
|East US 2 | yes|
|Sweden Central | yes|
|US North Central | yes|
|France Central | yes|
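
The Azure AI project described under Basic requirements is usually handed to the SDK as a small dictionary. The following is a minimal sketch, not the sample's own code: the environment variable names are hypothetical, and the dictionary keys (`subscription_id`, `resource_group_name`, `project_name`) are assumed to match what the `azure-ai-evaluation` version you install expects, so check them against its documentation.

```python
import os

# Hypothetical environment variable names chosen for this sketch.
# The dictionary keys are an assumption about the SDK's expected shape.
REQUIRED_VARS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "project_name": "AZURE_PROJECT_NAME",
}


def load_azure_ai_project(env=os.environ):
    """Build the project dictionary, failing early if a value is missing."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED_VARS.items()}
```

Calling `load_azure_ai_project()` with no arguments reads from the process environment; passing a plain dictionary instead makes the helper easy to exercise without touching real credentials.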

### Estimated Runtime: 20 mins
53 changes: 53 additions & 0 deletions scenarios/evaluate/simulate_evaluate_groundedness/README.md
@@ -0,0 +1,53 @@
---
page_type: sample
languages:
- python
products:
- ai-services
- azure-openai
description: Simulator and evaluator for assessing groundedness in custom applications using adversarial questions
---

## Simulator and Evaluator for Groundedness (simulate_evaluate_groundedness.ipynb)

### Overview

This tutorial provides a step-by-step guide on how to use the simulator and evaluator to assess the groundedness of responses in a custom application.

### Objective

The main objective of this tutorial is to help users understand the process of creating and using a simulator and evaluator to test the groundedness of responses in a custom application. By the end of this tutorial, you should be able to:
- Use the simulator to generate adversarial questions
- Run the evaluator to assess the groundedness of the responses

### Programming Languages
- Python

### Basic Requirements

To use Azure AI Safety Evaluation for different scenarios (simulation, annotation, etc.), you need an **Azure AI Project.** You should provide an Azure AI project to run your safety evaluations or simulations with. First, [create an Azure AI hub](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/ai-resources) then [create an Azure AI project](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/create-projects?tabs=ai-studio). You **do not** need to provide your own LLM deployment as the Azure AI Safety Evaluation service hosts adversarial models for both simulation and evaluation of harmful content and connects to it via your Azure AI project. Ensure that your Azure AI project is in one of the supported regions for your desired evaluation metric:

#### Region Support for Evaluations

| Region | Hate and Unfairness, Sexual, Violent, Self-Harm, XPIA | Groundedness | Protected Material |
| - | - | - | - |
| UK South | Will be deprecated 12/1/24 | no | no |
| East US 2 | yes | yes | yes |
| Sweden Central | yes | yes | no |
| US North Central | yes | no | no |
| France Central | yes | no | no |
| Switzerland West | yes | no | no |

For built-in quality and performance metrics, you connect your own LLM deployment, so you can evaluate in any region your deployment is in.

#### Region Support for Adversarial Simulation

| Region | Adversarial Simulation |
| - | - |
| UK South | yes |
| East US 2 | yes |
| Sweden Central | yes |
| US North Central | yes |
| France Central | yes |
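
Because this sample combines simulation with groundedness evaluation, it can help to check a project's region against both tables before running the notebook. The sketch below simply transcribes the two tables above into Python; the capability labels (`safety`, `groundedness`, `protected_material`) and the function name are made up for illustration, and the data is only as current as this README.

```python
# Transcribed from the two region tables in this README. The capability
# labels are illustrative names, not SDK identifiers.
EVALUATION_REGIONS = {
    "UK South": {"safety"},  # listed as "Will be deprecated 12/1/24"
    "East US 2": {"safety", "groundedness", "protected_material"},
    "Sweden Central": {"safety", "groundedness"},
    "US North Central": {"safety"},
    "France Central": {"safety"},
    "Switzerland West": {"safety"},
}

SIMULATION_REGIONS = {
    "UK South",
    "East US 2",
    "Sweden Central",
    "US North Central",
    "France Central",
}


def regions_supporting(capability, need_simulation=False):
    """Return the sorted region names that support the given capability."""
    regions = {
        region
        for region, capabilities in EVALUATION_REGIONS.items()
        if capability in capabilities
    }
    if need_simulation:
        # Keep only regions that also appear in the simulation table.
        regions &= SIMULATION_REGIONS
    return sorted(regions)


print(regions_supporting("groundedness", need_simulation=True))
# prints ['East US 2', 'Sweden Central']
```

According to the tables, only East US 2 and Sweden Central cover both groundedness evaluation and adversarial simulation.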

### Estimated Runtime: 20 mins