Adding bug bash sample and instructions #40125

singankit · 2025-03-18T20:46:26Z

Description

Please add an informative description that covers that changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new swagger spec, a link to the pull request containing these swagger spec changes has been included above.

All SDK Contribution checklist:

The pull request does not introduce [breaking changes]
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

JoseCSantos · 2025-03-18T21:02:18Z

Great PR. Great first step to evaluate the evaluators.

JoseCSantos · 2025-03-18T21:06:08Z

sdk/evaluation/azure-ai-evaluation/samples/bug_bash/agent_evaluation.ipynb

+    "        \"resource_group_name\": \"<resource_group_name>\",\n",
+    "    }\n",
+    ")\n",
+    "pprint(f'AI Foundary URL: {response.get(\"studio_url\")}')"


Maybe best to show a sample output here ?

* Adding bug bash sample and instructions * Updating instructions

* Tool Call Accuracy Evaluator (#40068) * Tool Call Accuracy Evaluator * Review comments * Updating score key and output structure * Tool Call Accuracy Evaluator * Review comments * Updating score key and output structure * Updating prompt * Renaming parameter * Converter from AI Service threads/runs to evaluator-compatible schema (#40047) * WIP AIAgentConverter * Added the v1 of the converter * Updated the AIAgentConverter with different output schemas. * ruff format * Update the top schema to have: query, response, tool_definitions * "agentic" is not a recognized word, change the wording. * System message always comes first in query with multiple runs. * Add support for getting inputs from local files with run_ids. * Export AIAgentConverter through azure.ai.evaluation, local read updates * Use from ._models import * Ruff format again. * For ComputeInstance and AmlCompute update disableLocalAuth property based on ssh_public_access (#39934) * add disableLocalAuth for computeInstance * fix disableLocalAuthAuth issue for amlCompute * update compute instance * update recordings * temp changes * Revert "temp changes" This reverts commit 64e3c38. * update recordings * fix tests * Simplify the API by rolling up the static methods and hiding internals. * Lock the ._converters._ai_services behind an import error. * Print to install azure-ai-projects if we can't import AIAgentConverter * By default, include all previous runs' tool calls and results. * Don't crash if there is no content in historical thread messages. * Parallelize the calls to get step_details for each run_id. * Addressing PR comments. * Use a single underscore to hide internal static members. --------- Co-authored-by: Prashant Dhote <168401122+pdhotems@users.noreply.github.com> * Adding intent_resolution_evaluator to prp/agent_evaluators branch (#40065) * Add intent resolution evaluator * updated intent_resolution evaluator logic * Remove spurious print statements * Address reviewers feedback * add threshold key, update result to pass/fail rather than True/False * Add example + remove repeated fields * Harden check_score_is_valid function * Add Task Adherence and Completeness (#40098) * Agentic Evaluator - Response Completeness * Added Change Log for Response Completeness Agentic Evaluator * Task Adherence Agentic Evaluator * Add Task Adherence Evaluator to changelog * fixing contracts for Completeness and Task Adherence Evaluators * Enhancing Contract for Task Adherence and Response Completeness Agentic Evaluator * update completeness implementation. * update the completeness evaluator response to include threshold comparison. * updating the implementation for completeness. * updating the type for completeness score. * updating the parsing logic for llm output of completeness. * updating the response dict for completeness. * Adding Task adherence * Adding Task Adherence evaluator with samples * Delete old files * updating the exception for completeness evaluator. * Changing docstring * Adding changelog * Use _result_key * Add admonition --------- Co-authored-by: Shiprajain01 <shiprajain01@microsoft.com> Co-authored-by: ShipraJain01 <103409614+ShipraJain01@users.noreply.github.com> Co-authored-by: Chandra Sekhar Gupta Aravapalli <caravapalli@microsoft.com> * Adding bug bash sample and instructions (#40125) * Adding bug bash sample and instructions * Updating instructions * Update instructions.md * Adding instructions and evaluator to agent evaluation sample * add bug bash sample notebook for response completeness evaluator. (#40139) * add bug bash sample notebook for response completeness evaluator. * update the notebook for completeness. --------- Co-authored-by: Chandra Sekhar Gupta Aravapalli <caravapalli@microsoft.com> * Sample specific for tool call accuracy evaluator (#40135) * Update instructions.md * Add IntentResolution evaluator bug bash notebook (#40144) * Add intent resolution evaluator * updated intent_resolution evaluator logic * Remove spurious print statements * Address reviewers feedback * add threshold key, update result to pass/fail rather than True/False * Add example + remove repeated fields * Harden check_score_is_valid function * Sample notebook to demo intent_resolution evaluator * Add synthetic data and section on how to test data from disk * Update instructions.md * Update _tool_call_accuracy.py * Improve task adherence prompt and add sample notebook for bugbash (#40146) * For ComputeInstance and AmlCompute update disableLocalAuth property based on ssh_public_access (#39934) * add disableLocalAuth for computeInstance * fix disableLocalAuthAuth issue for amlCompute * update compute instance * update recordings * temp changes * Revert "temp changes" This reverts commit 64e3c38. * update recordings * fix tests * Add resource prefix for safe secret standard alerts (#40028) Add the prefix to identify RGs that we are creating in our TME tenant to identify them as potentially using local auth and violating our safe secret standards. Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com> * Add examples to task_adherence prompt. Add Task Adherence sample notebook * Undo changes to New-TestResources.ps1 * Add sample .env file --------- Co-authored-by: Prashant Dhote <168401122+pdhotems@users.noreply.github.com> Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com> Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com> * [AIAgentConverter] Added support for converting entire threads. (#40178) * Implemented prepare_evaluation_data * Add support for retrieving multiple threads into the same file. * Parallelize thread preparing across threads. * Set the maximum number of workers in thread pools to 10. * Users/singankit/tool call accuracy evaluator tests (#40190) * Raising error when tool call not found * Adding unit tests for tool call accuracy evaluator * Updating sample * update output of converter for tool calls * add built-ins * handle file search * remove extra files * revert * revert --------- Co-authored-by: Ankit Singhal <30610298+singankit@users.noreply.github.com> Co-authored-by: Sandy <16922860+thecsw@users.noreply.github.com> Co-authored-by: Prashant Dhote <168401122+pdhotems@users.noreply.github.com> Co-authored-by: Jose Santos <jcsantos@microsoft.com> Co-authored-by: ghyadav <103428325+ghyadav@users.noreply.github.com> Co-authored-by: Shiprajain01 <shiprajain01@microsoft.com> Co-authored-by: ShipraJain01 <103409614+ShipraJain01@users.noreply.github.com> Co-authored-by: Chandra Sekhar Gupta Aravapalli <caravapalli@microsoft.com> Co-authored-by: Ankit Singhal <anksing@microsoft.com> Co-authored-by: Chandra Sekhar Gupta <38103118+guptha23@users.noreply.github.com> Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com> Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com> Co-authored-by: spon <stevenpon@microsoft.com>

Adding bug bash sample and instructions

52de920

singankit marked this pull request as ready for review March 18, 2025 20:46

singankit requested a review from a team as a code owner March 18, 2025 20:46

github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Mar 18, 2025

JoseCSantos approved these changes Mar 18, 2025

View reviewed changes

Updating instructions

647763b

singankit merged commit b940ea4 into prp/agent_evaluators Mar 18, 2025
3 of 4 checks passed

singankit deleted the users/singankit/prp_bug_bash_sample branch March 18, 2025 22:17

singankit added a commit that referenced this pull request Mar 24, 2025

Adding bug bash sample and instructions (#40125)

86ac038

* Adding bug bash sample and instructions * Updating instructions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding bug bash sample and instructions #40125

Adding bug bash sample and instructions #40125

singankit commented Mar 18, 2025

JoseCSantos commented Mar 18, 2025

JoseCSantos Mar 18, 2025

Adding bug bash sample and instructions #40125

Adding bug bash sample and instructions #40125

Conversation

singankit commented Mar 18, 2025

Description

All SDK Contribution checklist:

General Guidelines and Best Practices

Testing Guidelines

JoseCSantos commented Mar 18, 2025

JoseCSantos Mar 18, 2025

Choose a reason for hiding this comment