* Tool Call Accuracy Evaluator (#40068)
* Tool Call Accuracy Evaluator
* Review comments
* Updating score key and output structure
* Tool Call Accuracy Evaluator
* Review comments
* Updating score key and output structure
* Updating prompt
* Renaming parameter
* Converter from AI Service threads/runs to evaluator-compatible schema (#40047)
* WIP AIAgentConverter
* Added the v1 of the converter
* Updated the AIAgentConverter with different output schemas.
* ruff format
* Update the top schema to have: query, response, tool_definitions
* "agentic" is not a recognized word, change the wording.
* System message always comes first in query with multiple runs.
* Add support for getting inputs from local files with run_ids.
* Export AIAgentConverter through azure.ai.evaluation, local read updates
* Use from ._models import
* Ruff format again.
* For ComputeInstance and AmlCompute update disableLocalAuth property based on ssh_public_access (#39934)
* add disableLocalAuth for computeInstance
* fix disableLocalAuth issue for amlCompute
* update compute instance
* update recordings
* temp changes
* Revert "temp changes"
This reverts commit 64e3c38.
* update recordings
* fix tests
* Simplify the API by rolling up the static methods and hiding internals.
* Lock the ._converters._ai_services behind an import error.
* Print to install azure-ai-projects if we can't import AIAgentConverter
* By default, include all previous runs' tool calls and results.
* Don't crash if there is no content in historical thread messages.
* Parallelize the calls to get step_details for each run_id.
* Addressing PR comments.
* Use a single underscore to hide internal static members.
---------
Co-authored-by: Prashant Dhote <168401122+pdhotems@users.noreply.github.com>
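The import-guard notes above (hiding the converter behind an import error and printing an install hint) describe a common optional-dependency pattern. A minimal sketch follows; `optional_import` is a hypothetical helper written for illustration, not the code in `azure.ai.evaluation`, and the real message text may differ.

```python
import importlib


def optional_import(module_name: str, install_hint: str):
    """Return the imported module, or None (after printing a hint) if it is missing.

    Hypothetical helper for illustration only; not the SDK's implementation.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Tell the user how to get the optional dependency instead of crashing.
        print(f"Could not import {module_name}. {install_hint}")
        return None


# Assumed usage: guard a feature that needs an optional package.
projects = optional_import(
    "azure.ai.projects",
    "Install azure-ai-projects to use AIAgentConverter.",
)
```

Returning `None` rather than re-raising keeps the rest of the package importable when the optional dependency is absent.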
* Adding intent_resolution_evaluator to prp/agent_evaluators branch (#40065)
* Add intent resolution evaluator
* updated intent_resolution evaluator logic
* Remove spurious print statements
* Address reviewers feedback
* add threshold key, update result to pass/fail rather than True/False
* Add example + remove repeated fields
* Harden check_score_is_valid function
* Add Task Adherence and Completeness (#40098)
* Agentic Evaluator - Response Completeness
* Added Change Log for Response Completeness Agentic Evaluator
* Task Adherence Agentic Evaluator
* Add Task Adherence Evaluator to changelog
* fixing contracts for Completeness and Task Adherence Evaluators
* Enhancing Contract for Task Adherence and Response Completeness Agentic Evaluator
* update completeness implementation.
* update the completeness evaluator response to include threshold comparison.
* updating the implementation for completeness.
* updating the type for completeness score.
* updating the parsing logic for llm output of completeness.
* updating the response dict for completeness.
* Adding Task adherence
* Adding Task Adherence evaluator with samples
* Delete old files
* updating the exception for completeness evaluator.
* Changing docstring
* Adding changelog
* Use _result_key
* Add admonition
---------
Co-authored-by: Shiprajain01 <shiprajain01@microsoft.com>
Co-authored-by: ShipraJain01 <103409614+ShipraJain01@users.noreply.github.com>
Co-authored-by: Chandra Sekhar Gupta Aravapalli <caravapalli@microsoft.com>
* Adding bug bash sample and instructions (#40125)
* Adding bug bash sample and instructions
* Updating instructions
* Update instructions.md
* Adding instructions and evaluator to agent evaluation sample
* add bug bash sample notebook for response completeness evaluator. (#40139)
* add bug bash sample notebook for response completeness evaluator.
* update the notebook for completeness.
---------
Co-authored-by: Chandra Sekhar Gupta Aravapalli <caravapalli@microsoft.com>
* Sample specific for tool call accuracy evaluator (#40135)
* Update instructions.md
* Add IntentResolution evaluator bug bash notebook (#40144)
* Add intent resolution evaluator
* updated intent_resolution evaluator logic
* Remove spurious print statements
* Address reviewers feedback
* add threshold key, update result to pass/fail rather than True/False
* Add example + remove repeated fields
* Harden check_score_is_valid function
* Sample notebook to demo intent_resolution evaluator
* Add synthetic data and section on how to test data from disk
* Update instructions.md
* Update _tool_call_accuracy.py
* Improve task adherence prompt and add sample notebook for bugbash (#40146)
* For ComputeInstance and AmlCompute update disableLocalAuth property based on ssh_public_access (#39934)
* add disableLocalAuth for computeInstance
* fix disableLocalAuth issue for amlCompute
* update compute instance
* update recordings
* temp changes
* Revert "temp changes"
This reverts commit 64e3c38.
* update recordings
* fix tests
* Add resource prefix for safe secret standard alerts (#40028)
Add the prefix to identify RGs that we are creating in our TME
tenant to identify them as potentially using local auth and violating
our safe secret standards.
Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com>
* Add examples to task_adherence prompt. Add Task Adherence sample notebook
* Undo changes to New-TestResources.ps1
* Add sample .env file
---------
Co-authored-by: Prashant Dhote <168401122+pdhotems@users.noreply.github.com>
Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com>
Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com>
* [AIAgentConverter] Added support for converting entire threads. (#40178)
* Implemented prepare_evaluation_data
* Add support for retrieving multiple threads into the same file.
* Parallelize thread preparing across threads.
* Set the maximum number of workers in thread pools to 10.
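The parallelization notes above (preparing threads in parallel with at most 10 workers) can be sketched with the standard library. This is a rough illustration under stated assumptions: `prepare_thread` and `prepare_threads` are hypothetical names, and the converter's actual internals are not shown in this commit message.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 10  # worker cap mentioned in the commit note


def prepare_thread(thread_id: str) -> dict:
    # Hypothetical stand-in for the per-thread conversion work
    # (in practice this would call the service and build evaluator input).
    return {"thread_id": thread_id, "evaluations": []}


def prepare_threads(thread_ids: list[str]) -> list[dict]:
    # Fan the per-thread work out over a bounded pool.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(prepare_thread, thread_ids))
```

A bounded pool limits the number of concurrent requests against the service while still overlapping network waits.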
* Users/singankit/tool call accuracy evaluator tests (#40190)
* Raising error when tool call not found
* Adding unit tests for tool call accuracy evaluator
* Updating sample
* update output of converter for tool calls
* add built-ins
* handle file search
* remove extra files
* revert
* revert
---------
Co-authored-by: Ankit Singhal <30610298+singankit@users.noreply.github.com>
Co-authored-by: Sandy <16922860+thecsw@users.noreply.github.com>
Co-authored-by: Prashant Dhote <168401122+pdhotems@users.noreply.github.com>
Co-authored-by: Jose Santos <jcsantos@microsoft.com>
Co-authored-by: ghyadav <103428325+ghyadav@users.noreply.github.com>
Co-authored-by: Shiprajain01 <shiprajain01@microsoft.com>
Co-authored-by: ShipraJain01 <103409614+ShipraJain01@users.noreply.github.com>
Co-authored-by: Chandra Sekhar Gupta Aravapalli <caravapalli@microsoft.com>
Co-authored-by: Ankit Singhal <anksing@microsoft.com>
Co-authored-by: Chandra Sekhar Gupta <38103118+guptha23@users.noreply.github.com>
Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com>
Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com>
Co-authored-by: spon <stevenpon@microsoft.com>