Converter from AI Service threads/runs to evaluator-compatible schema #40047

Merged

Changes from all commits (22 commits):
03af2ba  WIP AIAgentConverter (thecsw, Mar 11, 2025)
e24d77f  Added the v1 of the converter (thecsw, Mar 11, 2025)
73e3939  Updated the AIAgentConverter with different output schemas. (thecsw, Mar 12, 2025)
c53b72f  ruff format (thecsw, Mar 12, 2025)
149a4cc  Update the top schema to have: query, response, tool_definitions (thecsw, Mar 12, 2025)
8d87168  "agentic" is not a recognized word, change the wording. (thecsw, Mar 12, 2025)
b3f5ef2  System message always comes first in query with multiple runs. (thecsw, Mar 12, 2025)
465c1c7  Add support for getting inputs from local files with run_ids. (thecsw, Mar 12, 2025)
5219616  Export AIAgentConverter through azure.ai.evaluation, local read updates (thecsw, Mar 13, 2025)
eed1375  Use from ._models import (thecsw, Mar 13, 2025)
3758eae  Ruff format again. (thecsw, Mar 13, 2025)
c33e4f7  For ComputeInstance and AmlCompute update disableLocalAuth property b… (pdhotems, Mar 13, 2025)
1741980  Simplify the API by rolling up the static methods and hiding internals. (thecsw, Mar 13, 2025)
55294ae  Merge branch 'main' into sandy/ai_services_evaluator (thecsw, Mar 13, 2025)
a30abdf  Lock the ._converters._ai_services behind an import error. (thecsw, Mar 13, 2025)
7ec8c35  Print to install azure-ai-projects if we can't import AIAgentConverter (thecsw, Mar 13, 2025)
50e819f  By default, include all previous runs' tool calls and results. (thecsw, Mar 14, 2025)
7637357  Don't crash if there is no content in historical thread messages. (thecsw, Mar 14, 2025)
8bb6cd3  Parallelize the calls to get step_details for each run_id. (thecsw, Mar 14, 2025)
6deb358  Merge branch 'prp/agent_evaluators' into sandy/ai_services_evaluator (thecsw, Mar 16, 2025)
cc8df22  Addressing PR comments. (thecsw, Mar 17, 2025)
85def50  Use a single underscore to hide internal static members. (thecsw, Mar 17, 2025)
azure/ai/evaluation/__init__.py

@@ -38,6 +38,16 @@
OpenAIModelConfiguration,
)

# The converter from the AI service to the evaluator schema requires a dependency on
# azure-ai-projects, but we don't want to force users who install azure-ai-evaluation to
# pull it in as well. So we only import the converter when the dependency is available.
try:
from ._converters._ai_services import AIAgentConverter
_patch_all = ["AIAgentConverter"]
except ImportError:
print("Could not import AIAgentConverter. Please install the dependency with `pip install azure-ai-projects`.")
_patch_all = []

__all__ = [
"evaluate",
"CoherenceEvaluator",
@@ -72,3 +82,5 @@
"ISAEvaluator",
"ToolCallAccuracyEvaluator",
]

__all__.extend([p for p in _patch_all if p not in __all__])
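For illustration, here is a minimal consumer-side sketch of the guarded export above. It assumes azure-ai-evaluation itself is installed; the `None` fallback is my own convention, not part of the PR.

```python
# Sketch of the caller's view of the guarded export. If azure-ai-projects is
# missing, AIAgentConverter is simply not exported, so importing it raises
# ImportError immediately rather than failing later at call time.
try:
    from azure.ai.evaluation import AIAgentConverter
except ImportError:
    AIAgentConverter = None  # converter features unavailable without azure-ai-projects

if AIAgentConverter is None:
    print("Install the extra dependency: pip install azure-ai-projects")
```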
New file (copyright header only):

@@ -0,0 +1,3 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------

Large diffs are not rendered by default. (This is likely the converter implementation itself, `_converters/_ai_services.py`.)

New file (the converter's message models; the path is likely `_converters/_models.py`, per the "Use from ._models import" commit):

@@ -0,0 +1,259 @@
import datetime
import json

from pydantic import BaseModel

from azure.ai.projects.models import RunStepFunctionToolCall

from typing import List, Optional, Union

# Message role constants.
_SYSTEM = "system"
Review thread on `_SYSTEM = "system"`:

Contributor: Newer APIs use "developer" here (they're essentially interchangeable, it just depends on the API version), so you might want to make sure that's supported too.

Contributor (author): Interesting, I didn't know that. What API version should we be bound to, or should we set it to "developer" by default? I'll look into versioning.

Contributor: I think for output we can just use "system", at least for now. But when reading in messages from threads and the like, you might need to cover the case where it says "developer". Or maybe we should just default to whatever the thread itself uses.

Contributor: Actually, for this converter's purpose I don't think it matters. I take it back.

Contributor (author): Let's write "system" for now. I was thinking of possibly capturing the model the run used, so if it were o1, we could mark the instructions as coming from "developer". The thread itself only gives us instructions without an explicit marking for either.

_USER = "user"
_AGENT = "assistant"
_TOOL = "tool"

# Constant definitions for what tool details include.
_TOOL_CALL = "tool_call"
_TOOL_RESULT = "tool_result"
_FUNCTION = "function"

# This is returned by AI services in the API to filter against tool invocations.
_TOOL_CALLS = "tool_calls"


class Message(BaseModel):
"""Represents a message in a conversation with agents, assistants, and tools. We need to export these structures
to JSON for evaluators and we have custom fields such as createdAt, run_id, and tool_call_id, so we cannot use
the standard pydantic models provided by OpenAI.

:param createdAt: The timestamp when the message was created. Optional.
:type createdAt: Optional[Union[datetime.datetime, int]]
:param run_id: The ID of the run associated with the message. Optional.
:type run_id: Optional[str]
:param role: The role of the message sender (e.g., system, user, tool, assistant).
:type role: str
:param content: The content of the message, which can be a string or a list of dictionaries.
:type content: Union[str, List[dict]]
Review thread on `content`:

Contributor: Not for this PR (it can be a quick follow-up, so we don't need to be blocked), but I've suggested we include `name` in order to support multi-agent flows. Any objections? @singankit?

Contributor: Can the user provide a name? I don't see it being generated by the service right now.

"""

createdAt: Optional[Union[datetime.datetime, int]] = None # SystemMessage wouldn't have this
run_id: Optional[str] = None
tool_call_id: Optional[str] = None # see ToolMessage
role: str
content: Union[str, List[dict]]


class SystemMessage(Message):
"""Represents a system message in a conversation with agents, assistants, and tools.

:param role: The role of the message sender, which is always 'system'.
:type role: str
"""

role: str = _SYSTEM


class UserMessage(Message):
"""Represents a user message in a conversation with agents, assistants, and tools.

:param role: The role of the message sender, which is always 'user'.
:type role: str
"""

role: str = _USER


class ToolMessage(Message):
"""Represents a tool message in a conversation with agents, assistants, and tools.

:param run_id: The ID of the run associated with the message.
:type run_id: str
:param role: The role of the message sender, which is always 'tool'.
:type role: str
:param tool_call_id: The ID of the tool call associated with the message. Optional.
:type tool_call_id: Optional[str]
"""

run_id: str
role: str = _TOOL
tool_call_id: Optional[str] = None


class AssistantMessage(Message):
"""Represents an assistant message.

:param run_id: The ID of the run associated with the message.
:type run_id: str
:param role: The role of the message sender, which is always 'assistant'.
:type role: str
"""

run_id: str
role: str = _AGENT


class ToolDefinition(BaseModel):
"""Represents a tool definition that will be used in the agent.

:param name: The name of the tool.
:type name: str
:param description: A description of the tool. Optional.
:type description: Optional[str]
:param parameters: The parameters required by the tool.
:type parameters: dict
"""

name: str
description: Optional[str] = None
parameters: dict


class ToolCall:
"""Represents a tool call, used as an intermediate step in the conversion process.

:param created: The timestamp when the tool call was created.
:type created: datetime.datetime
:param completed: The timestamp when the tool call was completed.
:type completed: datetime.datetime
:param details: The details of the tool call.
:type details: RunStepFunctionToolCall
"""

def __init__(self, created: datetime.datetime, completed: datetime.datetime, details: RunStepFunctionToolCall):
self.created = created
self.completed = completed
self.details = details


class EvaluatorData(BaseModel):
"""Represents the result of a conversion.

:param query: A list of messages representing the system message, chat history, and user query.
:type query: List[Message]
:param response: A list of messages representing the assistant's response, including tool calls and results.
:type response: List[Message]
:param tool_definitions: A list of tool definitions used in the agent.
:type tool_definitions: List[ToolDefinition]
"""

query: List[Message]
response: List[Message]
tool_definitions: List[ToolDefinition]

def to_json(self):
"""Converts the result to a JSON string.

:return: The JSON representation of the result.
:rtype: str
"""
return self.model_dump_json(exclude={}, exclude_none=True)
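To make the query/response/tool_definitions schema concrete, here is a hedged sketch that builds an `EvaluatorData` payload by hand. All field values are invented, and it assumes these models are importable (the commits suggest `azure.ai.evaluation._converters._models`).

```python
# Hand-built example of the evaluator schema; values are illustrative only.
result = EvaluatorData(
    query=[
        SystemMessage(content="You are a helpful weather agent."),
        UserMessage(content="What's the weather in Seattle?", createdAt=1742169600),
    ],
    response=[
        AssistantMessage(
            run_id="run_abc",
            content=[{"type": "text", "text": "It is 54F and cloudy."}],
            createdAt=1742169605,
        ),
    ],
    tool_definitions=[
        ToolDefinition(
            name="fetch_weather",
            description="Look up the current weather for a city.",
            parameters={"type": "object", "properties": {"city": {"type": "string"}}},
        ),
    ],
)
# exclude_none=True drops unset fields such as tool_call_id from the output.
print(result.to_json())
```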


def break_tool_call_into_messages(tool_call: ToolCall, run_id: str) -> List[Message]:
"""
Breaks a tool call into a list of messages, including the tool call and its result.

:param tool_call: The tool call to be broken into messages.
:type tool_call: ToolCall
:param run_id: The ID of the run associated with the messages.
:type run_id: str
:return: A list of messages representing the tool call and its result.
:rtype: List[Message]
"""
# We will use this as our accumulator.
messages: List[Message] = []

# As of March 17th, 2025, we only support custom functions, because the built-in code interpreter and
# Bing grounding tools do not report their calls the same way. Code interpreter usually omits the tool
# call entirely, and Bing grounding only exposes "bing_grounding" details with a "requesturl" field
# holding the API path plus query string, without arguments or results.
# TODO: Work with AI Services to add converter support for BingGrounding and CodeInterpreter.
if not hasattr(tool_call.details, _FUNCTION):
return messages

# This is the internals of the content object that will be included with the tool call.
tool_call_id = tool_call.details.id
content_tool_call = {
"type": _TOOL_CALL,
_TOOL_CALL: {
"id": tool_call_id,
"type": _FUNCTION,
_FUNCTION: {
"name": tool_call.details.function.name,
"arguments": safe_loads(tool_call.details.function.arguments),
},
},
}

# We format it into an assistant message whose content is a singleton list holding the content object.
# One might expect a tool message here, since this is the call, but the target schema treats the call
# as the assistant's action of invoking the tool.
messages.append(AssistantMessage(run_id=run_id, content=[to_dict(content_tool_call)], createdAt=tool_call.created))

# Now, onto the tool result, which only includes the result of the function call.
content_tool_call_result = {"type": _TOOL_RESULT, _TOOL_RESULT: safe_loads(tool_call.details.function.output)}

# Since this is a tool's action of returning, we put it as a tool message.
messages.append(
ToolMessage(
run_id=run_id,
tool_call_id=tool_call_id,
content=[to_dict(content_tool_call_result)],
createdAt=tool_call.completed,
)
)
return messages
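A hedged sketch of this function in action. Constructing a real `RunStepFunctionToolCall` is outside this diff, so a `SimpleNamespace` stands in with the same attribute shape the code reads (`.id`, `.function.name`, `.function.arguments`, `.function.output`); all values are invented.

```python
import datetime
from types import SimpleNamespace

# Stand-in for RunStepFunctionToolCall with the attributes
# break_tool_call_into_messages actually touches.
fake_call = ToolCall(
    created=datetime.datetime(2025, 3, 17, 12, 0, 0),
    completed=datetime.datetime(2025, 3, 17, 12, 0, 2),
    details=SimpleNamespace(
        id="call_123",
        function=SimpleNamespace(
            name="fetch_weather",
            arguments='{"city": "Seattle"}',
            output='{"temp_f": 54}',
        ),
    ),
)

# Produces two messages: an AssistantMessage carrying the tool_call content,
# then a ToolMessage carrying the tool_result content.
for message in break_tool_call_into_messages(fake_call, run_id="run_abc"):
    print(message.role, message.content)
```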


def to_dict(obj) -> dict:
"""
Converts an object to a dictionary.

:param obj: The object to be converted.
:type obj: Any
:return: The dictionary representation of the object.
:rtype: dict
"""
return json.loads(json.dumps(obj))


def safe_loads(data: str) -> Union[dict, str]:
"""
Safely loads a JSON string into a Python dictionary or returns the original string if loading fails.

:param data: The JSON string to be loaded.
:type data: str
:return: The loaded dictionary or the original string.
:rtype: Union[dict, str]
"""
try:
return json.loads(data)
except json.JSONDecodeError:
return data
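The behavior in two lines: valid JSON is parsed, anything else passes through untouched (example values are mine).

```python
assert safe_loads('{"city": "Seattle"}') == {"city": "Seattle"}
assert safe_loads("plain text tool output") == "plain text tool output"
```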


def convert_message(msg: dict) -> Message:
"""
Converts a dictionary to the appropriate Message subclass.

:param msg: The message dictionary.
:type msg: dict
:return: The Message object.
:rtype: Message
"""
role = msg["role"]
if role == "system":
return SystemMessage(content=str(msg["content"]))
elif role == "user":
return UserMessage(content=msg["content"], createdAt=msg["createdAt"])
elif role == "assistant":
return AssistantMessage(run_id=str(msg["run_id"]), content=msg["content"], createdAt=msg["createdAt"])
elif role == "tool":
return ToolMessage(
run_id=str(msg["run_id"]),
tool_call_id=str(msg["tool_call_id"]),
content=msg["content"],
createdAt=msg["createdAt"],
)
else:
raise ValueError(f"Unknown role: {role}")
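A quick dispatch sketch (field values invented) showing how a serialized dict maps back to the matching subclass.

```python
# Round-trips a tool-result dict back into a ToolMessage.
message = convert_message(
    {
        "role": "tool",
        "run_id": "run_abc",
        "tool_call_id": "call_123",
        "content": [{"type": "tool_result", "tool_result": {"temp_f": 54}}],
        "createdAt": 1742169606,
    }
)
assert isinstance(message, ToolMessage)
```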
1 change: 1 addition & 0 deletions sdk/ml/azure-ai-ml/CHANGELOG.md
@@ -3,6 +3,7 @@
### Features Added

### Bugs Fixed
- Fix for compute instance: the disableLocalAuth property should depend on whether SSH public access is enabled.

## 1.26.0 (2025-03-11)

2 changes: 1 addition & 1 deletion sdk/ml/azure-ai-ml/assets.json
@@ -2,5 +2,5 @@
"AssetsRepo": "Azure/azure-sdk-assets",
"AssetsRepoPrefixPath": "python",
"TagPrefix": "python/ml/azure-ai-ml",
"Tag": "python/ml/azure-ai-ml_a2c955e6e2"
"Tag": "python/ml/azure-ai-ml_305b890d5b"
}
AmlCompute `_to_rest_object`:

@@ -251,7 +251,7 @@ def _to_rest_object(self) -> ComputeResource:
),
)
remote_login_public_access = "Enabled"
- disableLocalAuth = not (self.ssh_public_access_enabled and self.ssh_settings is not None)
+ disableLocalAuth = not (self.ssh_settings)
if self.ssh_public_access_enabled is not None:
remote_login_public_access = "Enabled" if self.ssh_public_access_enabled else "Disabled"

ComputeInstance `_to_rest_object`:

@@ -280,12 +280,14 @@ def _to_rest_object(self) -> ComputeResource:
subnet_resource = None

ssh_settings = None
disable_local_auth = True
if self.ssh_public_access_enabled is not None or self.ssh_settings is not None:
ssh_settings = CiSShSettings()
ssh_settings.ssh_public_access = "Enabled" if self.ssh_public_access_enabled else "Disabled"
ssh_settings.admin_public_key = (
self.ssh_settings.ssh_key_value if self.ssh_settings and self.ssh_settings.ssh_key_value else None
)
disable_local_auth = not self.ssh_public_access_enabled

personal_compute_instance_settings = None
if self.create_on_behalf_of:
@@ -330,6 +332,7 @@
description=self.description,
compute_type=self.type,
properties=compute_instance_prop,
disable_local_auth=disable_local_auth,
)
return ComputeResource(
location=self.location,
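Condensing the ComputeInstance change above into a standalone predicate (the function name and test values are mine, not the SDK's): local auth now stays disabled unless SSH public access is explicitly enabled.

```python
# Equivalent restatement of the new disable_local_auth logic for ComputeInstance.
def compute_instance_disable_local_auth(ssh_public_access_enabled, ssh_settings) -> bool:
    if ssh_public_access_enabled is None and ssh_settings is None:
        return True  # no SSH configuration at all: keep local auth disabled
    return not ssh_public_access_enabled

assert compute_instance_disable_local_auth(None, None) is True
assert compute_instance_disable_local_auth(True, None) is False
assert compute_instance_disable_local_auth(False, None) is True
```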
12 changes: 2 additions & 10 deletions sdk/ml/azure-ai-ml/tests/compute/e2etests/test_compute.py
@@ -30,6 +30,8 @@ def test_aml_compute_create_and_delete(self, client: MLClient, rand_compute_name
assert compute_resource_get.name == compute_name
assert compute_resource_get.tier == "dedicated"
assert compute_resource_get.location == compute.location
assert compute_resource_get.ssh_public_access_enabled == True
assert compute_resource_get.ssh_settings.admin_username == "azureuser"

compute_resource_get.idle_time_before_scale_down = 200
compute_update_poller = client.compute.begin_update(compute_resource_get)
@@ -46,7 +48,6 @@
# so this is a preferred approach to assert
assert isinstance(outcome, LROPoller)

- @pytest.mark.skip(reason="not enough capacity")
def test_compute_instance_create_and_delete(
self, client: MLClient, rand_compute_name: Callable[[str], str]
) -> None:
@@ -65,20 +66,11 @@ def test_compute_instance_create_and_delete(
assert isinstance(compute_resource_list, ItemPaged)
compute_resource_get = client.compute.get(name=compute_name)
assert compute_resource_get.name == compute_name
- assert compute_resource_get.identity.type == "system_assigned"
outcome = client.compute.begin_delete(name=compute_name)
# the compute is getting deleted, but not waiting on the poller! so immediately returning
# so this is a preferred approach to assert
assert isinstance(outcome, LROPoller)

- @pytest.mark.skipif(
-     condition=not is_live(),
-     reason=(
-         "Test takes 5 minutes in automation. "
-         "Already have unit tests verifying correct _restclient method is called. "
-         "Can be validated in live build only."
-     ),
- )
def test_compute_instance_stop_start_restart(
self, client: MLClient, rand_compute_name: Callable[[str], str]
) -> None: