Adding intent_resolution_evaluator to prp/agent_evaluators branch #40065

JoseCSantos · 2025-03-13T18:56:28Z

Description

Please add an informative description that covers that changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new swagger spec, a link to the pull request containing these swagger spec changes has been included above.

All SDK Contribution checklist:

The pull request does not introduce [breaking changes]
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

…valuators

github-actions · 2025-03-13T18:56:52Z

Thank you for your contribution @JoseCSantos! We will review the pull request and get back to you soon.

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py

...e-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/intent_resolution.prompty

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_common/utils.py

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py

stevepon · 2025-03-17T22:59:30Z

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py

+        Example with list of messages:
+            evaluator = IntentResolutionEvaluator(model_config)
+            query: [{'role': 'system', 'content': 'You are a friendly and helpful customer service agent.'}, {'createdAt': 1700000060, 'role': 'user', 'content': [{'type': 'text', 'text': 'Hi, I need help with the last 2 orders on my account #888. Could you please update me on their status?'}]}]
+            response: [{'createdAt': 1700000070, 'run_id': '0', 'role': 'assistant', 'content': [{'type': 'text', 'text': 'Hello! Let me quickly look up your account details.'}]}, {'createdAt': 1700000075, 'run_id': '0', 'role': 'assistant', 'content': [{'type': 'tool_call', 'tool_call': {'id': 'tool_call_20250310_001', 'type': 'function', 'function': {'name': 'get_orders', 'arguments': {'account_number': '888'}}}}]}, {'createdAt': 1700000080, 'run_id': '0', 'tool_call_id': 'tool_call_20250310_001', 'role': 'tool', 'content': [{'type': 'tool_result', 'tool_result': '[{ "order_id": "123" }, { "order_id": "124" }]'}]}, {'createdAt': 1700000085, 'run_id': '0', 'role': 'assistant', 'content': [{'type': 'text', 'text': 'Thanks for your patience. I see two orders on your account. Let me fetch the details for both.'}]}, {'createdAt': 1700000090, 'run_id': '0', 'role': 'assistant', 'content': [{'type': 'tool_call', 'tool_call': {'id': 'tool_call_20250310_002', 'type': 'function', 'function': {'name': 'get_order', 'arguments': {'order_id': '123'}}}}, {'type': 'tool_call', 'tool_call': {'id': 'tool_call_20250310_003', 'type': 'function', 'function': {'name': 'get_order', 'arguments': {'order_id': '124'}}}}]}, {'createdAt': 1700000095, 'run_id': '0', 'tool_call_id': 'tool_call_20250310_002', 'role': 'tool', 'content': [{'type': 'tool_result', 'tool_result': '{ "order": { "id": "123", "status": "shipped", "delivery_date": "2025-03-15" } }'}]}, {'createdAt': 1700000100, 'run_id': '0', 'tool_call_id': 'tool_call_20250310_003', 'role': 'tool', 'content': [{'type': 'tool_result', 'tool_result': '{ "order": { "id": "124", "status": "delayed", "expected_delivery": "2025-03-20" } }'}]}, {'createdAt': 1700000105, 'run_id': '0', 'role': 'assistant', 'content': [{'type': 'text', 'text': 'The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025. Is there anything else I can help you with?'}]}]


I've mentioned this elsewhere but maybe not on this PR. This is a nit, does not need to be done now. But I think for examples we should try to use formatted datetime instead of unixtime. The LLMs will understand it much better, and since users are liable to follow any examples we provide, we should try to set a good example in that respect.

…0065) * Add intent resolution evaluator * updated intent_resolution evaluator logic * Remove spurious print statements * Address reviewers feedback * add threshold key, update result to pass/fail rather than True/False * Add example + remove repeated fields * Harden check_score_is_valid function

* Tool Call Accuracy Evaluator (#40068) * Tool Call Accuracy Evaluator * Review comments * Updating score key and output structure * Tool Call Accuracy Evaluator * Review comments * Updating score key and output structure * Updating prompt * Renaming parameter * Converter from AI Service threads/runs to evaluator-compatible schema (#40047) * WIP AIAgentConverter * Added the v1 of the converter * Updated the AIAgentConverter with different output schemas. * ruff format * Update the top schema to have: query, response, tool_definitions * "agentic" is not a recognized word, change the wording. * System message always comes first in query with multiple runs. * Add support for getting inputs from local files with run_ids. * Export AIAgentConverter through azure.ai.evaluation, local read updates * Use from ._models import * Ruff format again. * For ComputeInstance and AmlCompute update disableLocalAuth property based on ssh_public_access (#39934) * add disableLocalAuth for computeInstance * fix disableLocalAuthAuth issue for amlCompute * update compute instance * update recordings * temp changes * Revert "temp changes" This reverts commit 64e3c38. * update recordings * fix tests * Simplify the API by rolling up the static methods and hiding internals. * Lock the ._converters._ai_services behind an import error. * Print to install azure-ai-projects if we can't import AIAgentConverter * By default, include all previous runs' tool calls and results. * Don't crash if there is no content in historical thread messages. * Parallelize the calls to get step_details for each run_id. * Addressing PR comments. * Use a single underscore to hide internal static members. --------- Co-authored-by: Prashant Dhote <168401122+pdhotems@users.noreply.github.com> * Adding intent_resolution_evaluator to prp/agent_evaluators branch (#40065) * Add intent resolution evaluator * updated intent_resolution evaluator logic * Remove spurious print statements * Address reviewers feedback * add threshold key, update result to pass/fail rather than True/False * Add example + remove repeated fields * Harden check_score_is_valid function * Add Task Adherence and Completeness (#40098) * Agentic Evaluator - Response Completeness * Added Change Log for Response Completeness Agentic Evaluator * Task Adherence Agentic Evaluator * Add Task Adherence Evaluator to changelog * fixing contracts for Completeness and Task Adherence Evaluators * Enhancing Contract for Task Adherence and Response Completeness Agentic Evaluator * update completeness implementation. * update the completeness evaluator response to include threshold comparison. * updating the implementation for completeness. * updating the type for completeness score. * updating the parsing logic for llm output of completeness. * updating the response dict for completeness. * Adding Task adherence * Adding Task Adherence evaluator with samples * Delete old files * updating the exception for completeness evaluator. * Changing docstring * Adding changelog * Use _result_key * Add admonition --------- Co-authored-by: Shiprajain01 <shiprajain01@microsoft.com> Co-authored-by: ShipraJain01 <103409614+ShipraJain01@users.noreply.github.com> Co-authored-by: Chandra Sekhar Gupta Aravapalli <caravapalli@microsoft.com> * Adding bug bash sample and instructions (#40125) * Adding bug bash sample and instructions * Updating instructions * Update instructions.md * Adding instructions and evaluator to agent evaluation sample * add bug bash sample notebook for response completeness evaluator. (#40139) * add bug bash sample notebook for response completeness evaluator. * update the notebook for completeness. --------- Co-authored-by: Chandra Sekhar Gupta Aravapalli <caravapalli@microsoft.com> * Sample specific for tool call accuracy evaluator (#40135) * Update instructions.md * Add IntentResolution evaluator bug bash notebook (#40144) * Add intent resolution evaluator * updated intent_resolution evaluator logic * Remove spurious print statements * Address reviewers feedback * add threshold key, update result to pass/fail rather than True/False * Add example + remove repeated fields * Harden check_score_is_valid function * Sample notebook to demo intent_resolution evaluator * Add synthetic data and section on how to test data from disk * Update instructions.md * Update _tool_call_accuracy.py * Improve task adherence prompt and add sample notebook for bugbash (#40146) * For ComputeInstance and AmlCompute update disableLocalAuth property based on ssh_public_access (#39934) * add disableLocalAuth for computeInstance * fix disableLocalAuthAuth issue for amlCompute * update compute instance * update recordings * temp changes * Revert "temp changes" This reverts commit 64e3c38. * update recordings * fix tests * Add resource prefix for safe secret standard alerts (#40028) Add the prefix to identify RGs that we are creating in our TME tenant to identify them as potentially using local auth and violating our safe secret standards. Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com> * Add examples to task_adherence prompt. Add Task Adherence sample notebook * Undo changes to New-TestResources.ps1 * Add sample .env file --------- Co-authored-by: Prashant Dhote <168401122+pdhotems@users.noreply.github.com> Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com> Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com> * [AIAgentConverter] Added support for converting entire threads. (#40178) * Implemented prepare_evaluation_data * Add support for retrieving multiple threads into the same file. * Parallelize thread preparing across threads. * Set the maximum number of workers in thread pools to 10. * Users/singankit/tool call accuracy evaluator tests (#40190) * Raising error when tool call not found * Adding unit tests for tool call accuracy evaluator * Updating sample * update output of converter for tool calls * add built-ins * handle file search * remove extra files * revert * revert --------- Co-authored-by: Ankit Singhal <30610298+singankit@users.noreply.github.com> Co-authored-by: Sandy <16922860+thecsw@users.noreply.github.com> Co-authored-by: Prashant Dhote <168401122+pdhotems@users.noreply.github.com> Co-authored-by: Jose Santos <jcsantos@microsoft.com> Co-authored-by: ghyadav <103428325+ghyadav@users.noreply.github.com> Co-authored-by: Shiprajain01 <shiprajain01@microsoft.com> Co-authored-by: ShipraJain01 <103409614+ShipraJain01@users.noreply.github.com> Co-authored-by: Chandra Sekhar Gupta Aravapalli <caravapalli@microsoft.com> Co-authored-by: Ankit Singhal <anksing@microsoft.com> Co-authored-by: Chandra Sekhar Gupta <38103118+guptha23@users.noreply.github.com> Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com> Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com> Co-authored-by: spon <stevenpon@microsoft.com>

JoseCSantos added 4 commits March 13, 2025 18:37

Add intent resolution evaluator

0c8fca1

updated intent_resolution evaluator logic

2497aec

Added initial version of intent_resolution_evaluator into prp/agent_e…

23e6027

…valuators

Remove spurious print statements

9b996ef

JoseCSantos requested a review from a team as a code owner March 13, 2025 18:56

github-actions bot added Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Evaluation Issues related to the client library for Azure AI Evaluation labels Mar 13, 2025

stevepon reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...e-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/intent_resolution.prompty Outdated Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...e-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/intent_resolution.prompty Outdated Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...e-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/intent_resolution.prompty Outdated Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...e-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/intent_resolution.prompty Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...e-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/intent_resolution.prompty Show resolved Hide resolved

stevepon reviewed Mar 13, 2025

View reviewed changes

...e-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/intent_resolution.prompty Outdated Show resolved Hide resolved

kristapratico removed customer-reported Issues that are reported by GitHub users external to the Azure organization. Community Contribution Community members are working on the issue labels Mar 13, 2025

singankit reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

singankit reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

singankit reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

singankit reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

singankit reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

singankit reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

singankit reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

singankit reviewed Mar 13, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

Address reviewers feedback

ab09de4

stevepon reviewed Mar 14, 2025

View reviewed changes

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_common/utils.py Outdated Show resolved Hide resolved

stevepon reviewed Mar 14, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Show resolved Hide resolved

stevepon reviewed Mar 14, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Show resolved Hide resolved

stevepon reviewed Mar 14, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Show resolved Hide resolved

changliu2 reviewed Mar 15, 2025

View reviewed changes

...azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py Outdated Show resolved Hide resolved

JoseCSantos added 4 commits March 15, 2025 00:47

add threshold key, update result to pass/fail rather than True/False

076c7d1

Add example + remove repeated fields

30fb73b

Harden check_score_is_valid function

e624022

Merge branch 'prp/agent_evaluators' into prp/agent_evaluators

956e01a

stevepon reviewed Mar 17, 2025

View reviewed changes

stevepon approved these changes Mar 17, 2025

View reviewed changes

singankit approved these changes Mar 17, 2025

View reviewed changes

singankit merged commit 0bee1d5 into Azure:prp/agent_evaluators Mar 17, 2025
3 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding intent_resolution_evaluator to prp/agent_evaluators branch #40065

Adding intent_resolution_evaluator to prp/agent_evaluators branch #40065

JoseCSantos commented Mar 13, 2025 •

edited

Loading

github-actions bot commented Mar 13, 2025

stevepon Mar 17, 2025

Adding intent_resolution_evaluator to prp/agent_evaluators branch #40065

Adding intent_resolution_evaluator to prp/agent_evaluators branch #40065

Conversation

JoseCSantos commented Mar 13, 2025 • edited Loading

Description

All SDK Contribution checklist:

General Guidelines and Best Practices

Testing Guidelines

github-actions bot commented Mar 13, 2025

stevepon Mar 17, 2025

Choose a reason for hiding this comment

JoseCSantos commented Mar 13, 2025 •

edited

Loading