[AI Evaluation] Add EvaluationMetric property for contexts used in evaluation #6033

@shyamnamboodiripad

Description

When looking at scores for metrics like Equivalence and Groundedness in the report, there is currently no way to know the grounding context that was used for the evaluation. We have heard feedback that this makes it harder for anyone viewing the report to assess or debug why the corresponding score was low or high.

This issue tracks the following changes to address this:

  • Introduce a first-class property with type Dictionary<string, string>? on EvaluationMetric to store the contexts (if any) that were used as part of the evaluation.
  • Update GroundednessEvaluator and EquivalenceEvaluator to store their respective contexts in this property.
  • Update the generated evaluation report to display the contexts. For now, we can display the contexts (if available) when hovering over the card for a particular metric. Eventually, it should be possible to click on a metric's card and view its associated contexts in a details section below the cards.
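As a rough illustration of the first two bullets, the proposed property and its use by an evaluator might look something like the sketch below. The property name `Context` and the dictionary key are illustrative assumptions, not the final API shape; `EvaluationMetric` and `GroundednessEvaluator` are the existing types named above.

```csharp
using System.Collections.Generic;

public class EvaluationMetric
{
    // Proposed first-class property: the contexts (if any) that were
    // used as part of the evaluation, keyed by a descriptive name.
    // (Hypothetical name "Context"; the issue only specifies the type
    // Dictionary<string, string>?.)
    public Dictionary<string, string>? Context { get; set; }
}
```

An evaluator such as `GroundednessEvaluator` could then record its grounding context on the metric it produces, e.g.:

```csharp
// Hypothetical usage inside an evaluator; "Grounding Context" is an
// assumed key, and groundingContext is the text supplied to the evaluator.
metric.Context = new Dictionary<string, string>
{
    ["Grounding Context"] = groundingContext
};
```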

Metadata

Labels

area-ai-eval (Microsoft.Extensions.AI.Evaluation and related)
