Adding sample to evaluate groundedness (#142)

* update promptflow-eval dependencies to azure-ai-evaluation
* clear local variables
* fix errors and remove 'question' col from data
* small fix in evaluator config
* add groundedness sample
* adding and fixing readme
1 parent faaa35d, commit c06c3c7
Showing 3 changed files with 402 additions and 1 deletion.
53 changes: 53 additions & 0 deletions in scenarios/evaluate/simulate_evaluate_groundedness/README.md
---
page_type: sample
languages:
- python
products:
- ai-services
- azure-openai
description: Simulator and evaluator for assessing groundedness in custom applications using adversarial questions
---

## Simulator and Evaluator for Groundedness (simulate_evaluate_groundedness.ipynb)

### Overview

This tutorial provides a step-by-step guide on how to use the simulator and evaluator to assess the groundedness of responses in a custom application.

### Objective

The main objective of this tutorial is to help users understand the process of creating and using a simulator and evaluator to test the groundedness of responses in a custom application. By the end of this tutorial, you should be able to:

- Use the simulator to generate adversarial questions
- Run the evaluator to assess the groundedness of the responses, as sketched below

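
The snippet below is a minimal sketch of these two steps with the `azure-ai-evaluation` package, not a copy of the notebook: the environment variable names, the `my_app` placeholder target, and the choice of the `ADVERSARIAL_QA` scenario are assumptions, and the notebook may structure the flow differently.

```python
# Illustrative sketch only; see simulate_evaluate_groundedness.ipynb for the full sample.
import asyncio
import os

from azure.ai.evaluation import GroundednessEvaluator
from azure.ai.evaluation.simulator import AdversarialScenario, AdversarialSimulator
from azure.identity import DefaultAzureCredential

# Azure AI project that hosts the safety-evaluation service (see Basic Requirements below).
azure_ai_project = {
    "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
    "resource_group_name": os.environ["AZURE_RESOURCE_GROUP"],
    "project_name": os.environ["AZURE_PROJECT_NAME"],
}


def my_app(question: str) -> tuple[str, str]:
    """Placeholder for your application: returns (answer, context the answer is grounded on)."""
    context = "Contoso accepts returns within 30 days with a receipt."
    return "Returns are accepted within 30 days with a receipt.", context


async def app_callback(messages, stream=False, session_state=None, context=None):
    # The simulator calls this target with the adversarial question as the last message.
    question = messages["messages"][-1]["content"]
    answer, app_context = my_app(question)
    messages["messages"].append(
        {"role": "assistant", "content": answer, "context": app_context}
    )
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": app_context,
    }


async def main():
    # Step 1: generate adversarial questions against the target application.
    simulator = AdversarialSimulator(
        azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()
    )
    outputs = await simulator(
        scenario=AdversarialScenario.ADVERSARIAL_QA,  # scenario choice is an assumption
        target=app_callback,
        max_simulation_results=3,
    )

    # Step 2: score groundedness of each response with your own Azure OpenAI deployment.
    groundedness = GroundednessEvaluator(
        model_config={
            "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
            "api_key": os.environ["AZURE_OPENAI_API_KEY"],
            "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT"],
        }
    )
    for item in outputs:
        query = item["messages"][0]["content"]
        response = item["messages"][-1]["content"]
        ctx = item["messages"][-1].get("context", "")
        print(groundedness(query=query, response=response, context=ctx))


if __name__ == "__main__":
    asyncio.run(main())
```

Note that the target callback returns the context the application grounded its answer on; without that context, the groundedness evaluator has nothing to check the response against.
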
### Programming Languages

- Python

### Basic Requirements

To use Azure AI Safety Evaluation for different scenarios (simulation, annotation, etc.), you need an **Azure AI project** to run your safety evaluations or simulations against. First, [create an Azure AI hub](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/ai-resources), then [create an Azure AI project](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/create-projects?tabs=ai-studio). You **do not** need to provide your own LLM deployment: the Azure AI Safety Evaluation service hosts adversarial models for both simulation and evaluation of harmful content and connects to them via your Azure AI project. Ensure that your Azure AI project is in one of the supported regions for your desired evaluation metric:

#### Region Support for Evaluations

| Region | Hate and Unfairness, Sexual, Violent, Self-Harm, XPIA | Groundedness | Protected Material |
| - | - | - | - |
| UK South | Will be deprecated 12/1/24 | no | no |
| East US 2 | yes | yes | yes |
| Sweden Central | yes | yes | no |
| US North Central | yes | no | no |
| France Central | yes | no | no |
| Switzerland West | yes | no | no |

For built-in quality and performance metrics, you connect your own LLM deployment, so you can evaluate in any region where that deployment is available.

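For example, a groundedness evaluation can be run as a batch over a JSONL dataset with the `evaluate` API, pointed at any Azure OpenAI deployment you control. A hedged sketch, assuming a file with `query`, `response`, and `context` fields per line (the file name and environment variables are placeholders, not the notebook's exact values):

```python
import os

from azure.ai.evaluation import GroundednessEvaluator, evaluate

# Your own Azure OpenAI deployment; it can live in any region.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT"],
}

# Batch-evaluate a JSONL dataset (one {"query", "response", "context"} object per line).
results = evaluate(
    data="simulated_outputs.jsonl",  # placeholder path
    evaluators={"groundedness": GroundednessEvaluator(model_config=model_config)},
)
print(results["metrics"])
```
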
#### Region Support for Adversarial Simulation

| Region | Adversarial Simulation |
| - | - |
| UK South | yes |
| East US 2 | yes |
| Sweden Central | yes |
| US North Central | yes |
| France Central | yes |

### Estimated Runtime: 20 mins