Update binarization to be individual params #40168
remove required keys
Pull Request Overview
This PR refactors evaluators to accept individual threshold parameters instead of using a single dictionary-based threshold, improving parameter clarity and type safety.
- Updated QAEvaluator to have separate parameters for groundedness, relevance, coherence, fluency, similarity, and f1_score thresholds.
- Updated ContentSafetyEvaluator to accept individual thresholds for violence, sexual content, self-harm, and hate/unfairness evaluations.
- Updated RougeScoreEvaluator, sample usage, and tests to use individual threshold parameters.
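The refactor described above can be illustrated with a minimal, hypothetical sketch. HypotheticalQAEvaluator, its binarize method, the "pass"/"fail" labels, and the default values are assumptions made for illustration, not the azure-ai-evaluation API, and the sketch assumes higher scores are better. It shows the design choice: keyword-only, per-metric threshold parameters turn a misspelled metric name into an immediate TypeError, whereas a misspelled key in a threshold dictionary may be silently ignored unless the code validates keys.

```python
from typing import Dict, Mapping, Union

Number = Union[int, float]


class HypotheticalQAEvaluator:
    """Illustrative stand-in, NOT the azure-ai-evaluation QAEvaluator.

    Each metric gets its own keyword-only threshold parameter instead of
    one threshold dictionary. Defaults here are assumed, not documented.
    """

    def __init__(
        self,
        *,
        groundedness_threshold: int = 3,
        relevance_threshold: int = 3,
        f1_score_threshold: float = 0.5,
    ) -> None:
        self._thresholds: Dict[str, Number] = {
            "groundedness": groundedness_threshold,
            "relevance": relevance_threshold,
            "f1_score": f1_score_threshold,
        }

    def binarize(self, scores: Mapping[str, Number]) -> Dict[str, str]:
        # Assumption: a score passes when it meets or exceeds its threshold.
        return {
            name: "pass" if scores[name] >= threshold else "fail"
            for name, threshold in self._thresholds.items()
        }


evaluator = HypotheticalQAEvaluator(groundedness_threshold=4, f1_score_threshold=0.6)
print(evaluator.binarize({"groundedness": 3, "relevance": 5, "f1_score": 0.7}))
# {'groundedness': 'fail', 'relevance': 'pass', 'f1_score': 'pass'}
```

Calling HypotheticalQAEvaluator(groundednes_threshold=4), with the typo, raises TypeError at construction time rather than falling back to a default threshold.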
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
File | Description |
---|---|
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_qa/_qa.py | Replaced dictionary-based thresholds with individual parameters and updated type checking. |
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_content_safety.py | Replaced the single threshold parameter with individual thresholds, each type-checked as int. |
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_rouge/_rouge.py | Updated threshold parameters to individual floats with corresponding type checks. |
sdk/evaluation/azure-ai-evaluation/samples/evaluation_samples_threshold.py | Updated usage examples for QAEvaluator and RougeScoreEvaluator to reflect individual thresholds. |
sdk/evaluation/azure-ai-evaluation/tests/unittests/test_evaluators/test_threshold_behavior.py | Updated tests to use individual threshold parameters instead of dictionaries. |
Comments suppressed due to low confidence (1)
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_qa/_qa.py:82
- Consider enforcing a stricter type check for 'f1_score_threshold' (e.g., ensure it is a float) rather than accepting an int, to be consistent with its documented type.
for name, value in [
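The suggestion in the suppressed comment can be sketched as a validation loop in the spirit of the truncated line above. validate_thresholds and its (name, value, expected type) tuples are hypothetical; the actual loop in _qa.py is not reproduced here.

```python
from typing import Iterable, Tuple, Type


def validate_thresholds(specs: Iterable[Tuple[str, object, Type]]) -> None:
    """Hypothetical helper: raise TypeError unless each value has its expected type.

    bool is rejected explicitly because bool is a subclass of int in Python,
    so isinstance(True, int) would otherwise succeed.
    """
    for name, value, expected in specs:
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(
                f"{name} must be of type {expected.__name__}, got {type(value).__name__}"
            )


# Passes: each value matches its declared type.
validate_thresholds([("groundedness_threshold", 3, int), ("f1_score_threshold", 0.5, float)])

# Strict float check: an int is rejected for f1_score_threshold.
try:
    validate_thresholds([("f1_score_threshold", 1, float)])
except TypeError as exc:
    print(exc)  # f1_score_threshold must be of type float, got int
```

Checking against (int, float) instead of float would accept 1 as well; the choice trades consistency with the documented type against caller convenience.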
API change check: APIView has identified API-level changes in this PR and created the following API reviews.
Description
Please add an informative description that covers the changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new swagger spec, a link to the pull request containing these swagger spec changes has been included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines