Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-Modal-Content-Safety-Evaluators #38002

Merged
merged 107 commits into from
Oct 28, 2024
Merged
Show file tree
Hide file tree
Changes from 92 commits
Commits
Show all changes
107 commits
Select commit Hold shift + click to select a range
368bdf4
Initial-Commit-multimodal
w-javed Oct 3, 2024
920c46c
Fix
w-javed Oct 4, 2024
17c7dac
Sync eng/common directory with azure-sdk-tools for PR 9092 (#37713)
azure-sdk Oct 3, 2024
5d8ca40
Removing private parameter from __call__ of AdversarialSimulator (#37…
nagkumar91 Oct 3, 2024
6e5bd48
Enabling option to disable response payload on writes (#37365)
FabianMeiswinkel Oct 3, 2024
10f9ac7
deprecate azure_germany (#37654)
xiangyan99 Oct 4, 2024
db68d9d
Add default impl to handle token challenges (#37652)
xiangyan99 Oct 4, 2024
793c3fc
Make Credentials Required for Content Safety and Protected Materials …
needuv Oct 4, 2024
4d4e5bc
addFeedRangesAndUseFeedRangeInQueryChangeFeed (#37687)
xinlian12 Oct 4, 2024
22f081c
Update release date for core (#37723)
xiangyan99 Oct 4, 2024
71e44d4
Improvements to mindependency dev_requirement conflict resolution (#3…
scbedd Oct 4, 2024
ee45fa1
Need to add environment to subscription configuration (#37726)
azure-sdk Oct 4, 2024
2e2366b
Enable samples for formrecognizer (#37676)
xiangyan99 Oct 4, 2024
0faf959
Merge branch 'main' into multi-moodal-sdk-support
w-javed Oct 5, 2024
088ed3b
multi-modal-changes
w-javed Oct 14, 2024
3ff2c2a
Merge-conflicts
w-javed Oct 14, 2024
5ff2668
fixes
w-javed Oct 14, 2024
5c270cd
Fix with latest
w-javed Oct 16, 2024
c473df7
merge-conflicts
w-javed Oct 16, 2024
99d0cf0
dict-fix
w-javed Oct 16, 2024
d570130
adding-protected-material
w-javed Oct 17, 2024
4bc8a34
Merge branch 'main' into multi-moodal-sdk-support
w-javed Oct 17, 2024
6ad0164
adding-protected-material
w-javed Oct 17, 2024
7655b9e
adding-protected-material
w-javed Oct 17, 2024
3c4b816
bumping-version
w-javed Oct 17, 2024
255add0
adding assets
w-javed Oct 17, 2024
49a8ad8
Added image in simulator
w-javed Oct 17, 2024
f64d4d3
Merge branch 'main' into multi-moodal-sdk-support
w-javed Oct 17, 2024
acad134
Added image in simulator
w-javed Oct 17, 2024
60eae73
bumping-version
w-javed Oct 18, 2024
e14c6d7
push-asset
w-javed Oct 18, 2024
b12ef57
merge-conflict-fix
w-javed Oct 18, 2024
e070237
assets
w-javed Oct 18, 2024
66548f5
pushing asset
w-javed Oct 18, 2024
f05726c
merge-conflicts
w-javed Oct 19, 2024
1fe065b
remove-containt-on-key
w-javed Oct 19, 2024
82fd655
asset
w-javed Oct 19, 2024
6dddb96
asset2
w-javed Oct 19, 2024
c5fa8cf
asset3
w-javed Oct 19, 2024
74b3582
asset4
w-javed Oct 19, 2024
2e11e9d
adding conftest
w-javed Oct 20, 2024
651fc00
conftest
w-javed Oct 20, 2024
3ed59e8
cred fix
w-javed Oct 20, 2024
4031d46
asset-new
w-javed Oct 20, 2024
b5fc1c5
fix
w-javed Oct 20, 2024
24a52aa
asset
w-javed Oct 20, 2024
73e62c6
adding multi-modal-without-tests
w-javed Oct 20, 2024
c89b341
asset-from-main
w-javed Oct 20, 2024
b63910d
asset-from-main
w-javed Oct 20, 2024
ca4c3e6
fix
w-javed Oct 20, 2024
8b28458
adding one test only
w-javed Oct 20, 2024
1eb9304
new asset
w-javed Oct 20, 2024
dd53e67
tests,fix: Sanitizer should replace with enum value not enum name
kdestin Oct 21, 2024
9fcd7f8
test-asset
w-javed Oct 21, 2024
d64704d
[AutoRelease] t2-containerservicefleet-2024-09-24-42036(can only be m…
azure-sdk Oct 21, 2024
78b11c9
[AutoRelease] t2-dns-2024-09-25-81486(can only be merged by SDK owner…
azure-sdk Oct 21, 2024
6962ca2
[AutoRelease] t2-appconfiguration-2024-10-09-68726(can only be merged…
azure-sdk Oct 21, 2024
aa2bb08
code and test (#37855)
azure-sdk Oct 21, 2024
52f3784
[AutoRelease] t2-servicefabricmanagedclusters-2024-10-08-57405(can on…
azure-sdk Oct 21, 2024
09724d8
[AutoRelease] t2-containerinstance-2024-10-21-66631(can only be merge…
azure-sdk Oct 21, 2024
9f5f7f9
[sdk generation pipeline] bump typespec-python 0.36.1 (#38008)
msyyc Oct 21, 2024
45e049c
[AutoRelease] t2-dnsresolver-2024-10-12-16936(can only be merged by S…
azure-sdk Oct 21, 2024
617f8aa
Merge branch 'main' into multi-moodal-sdk-support-one-test
w-javed Oct 21, 2024
706eea3
new asset after fix in conftest
w-javed Oct 21, 2024
bd08adf
asset
w-javed Oct 21, 2024
3c71da9
chore: Update assets.json
kdestin Oct 21, 2024
6d48318
Move perf pipelines to TME subscription (#38020)
azure-sdk Oct 21, 2024
14d4675
fix
w-javed Oct 21, 2024
d4b8272
after-comments
w-javed Oct 21, 2024
33e3075
fix
w-javed Oct 21, 2024
237443b
fix
w-javed Oct 21, 2024
525379e
asset
w-javed Oct 22, 2024
9603f21
Merge branch 'main' into multi-moodal-sdk-support-one-test
w-javed Oct 22, 2024
af143e8
new asset with 1 test recording only
w-javed Oct 22, 2024
511e4b5
chore: Update assets.json
kdestin Oct 22, 2024
ac94148
conftest fix
w-javed Oct 23, 2024
4f445f6
Merge branch 'main' into multi-moodal-sdk-support-one-test
w-javed Oct 23, 2024
dc7cb7d
assets change
w-javed Oct 23, 2024
48bec25
new test
w-javed Oct 23, 2024
307b4e4
few changes
w-javed Oct 23, 2024
20093f8
removing proxy start
w-javed Oct 23, 2024
e819a80
added all tests
w-javed Oct 24, 2024
6f1595e
merge with asset
w-javed Oct 24, 2024
60a823f
asset
w-javed Oct 24, 2024
b6334eb
fixes
w-javed Oct 25, 2024
9092831
Merge branch 'main' into multi-moodal-sdk-support-one-test
w-javed Oct 25, 2024
93ba2f0
fixes with asset
w-javed Oct 25, 2024
1be5ef1
asset-after-tax
w-javed Oct 25, 2024
78f8ec2
enabling 2 more tests
w-javed Oct 25, 2024
84d4eac
unit test fix
w-javed Oct 25, 2024
7a9eb2b
asset
w-javed Oct 25, 2024
30eba68
new asset
w-javed Oct 25, 2024
ca7cdfe
fixes per comments
w-javed Oct 25, 2024
9afeb5f
changes by black
w-javed Oct 25, 2024
dd67a01
merge fix
w-javed Oct 25, 2024
c5fb4f1
merge fix
w-javed Oct 25, 2024
1701076
pylint fix
w-javed Oct 25, 2024
9397aaf
merge conflict
w-javed Oct 25, 2024
a1d9be9
pylint fix
w-javed Oct 26, 2024
47ff5fd
ground test fix
w-javed Oct 26, 2024
a7689a2
fixes - pylint, black, mypy
w-javed Oct 26, 2024
9c09880
more tests
w-javed Oct 26, 2024
ebd21d3
docstring fixes
w-javed Oct 26, 2024
c9db879
merge conflict
w-javed Oct 26, 2024
9fdef18
doc string fix
w-javed Oct 26, 2024
058c37a
asset
w-javed Oct 26, 2024
1e15809
few updates after Nagkumar review
w-javed Oct 28, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
# Release History


## 1.0.0b5 (Unreleased)

### Features Added
- Adding evaluator for multimodal use cases

### Breaking Changes
- Renamed environment variable `PF_EVALS_BATCH_USE_ASYNC` to `AI_EVALS_BATCH_USE_ASYNC`.
Expand Down
2 changes: 1 addition & 1 deletion sdk/evaluation/azure-ai-evaluation/assets.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,5 @@
"AssetsRepo": "Azure/azure-sdk-assets",
"AssetsRepoPrefixPath": "python",
"TagPrefix": "python/evaluation/azure-ai-evaluation",
"Tag": "python/evaluation/azure-ai-evaluation_1390701e9d"
"Tag": "python/evaluation/azure-ai-evaluation_3eeaa3bdee"
}
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,14 @@
SexualEvaluator,
ViolenceEvaluator,
)
from ._evaluators._multimodal._content_safety_multimodal import (
ContentSafetyMultimodalEvaluator,
HateUnfairnessMultimodalEvaluator,
SelfHarmMultimodalEvaluator,
SexualMultimodalEvaluator,
ViolenceMultimodalEvaluator,
)
from ._evaluators._multimodal._protected_material import ProtectedMaterialMultimodalEvaluator
from ._evaluators._f1_score import F1ScoreEvaluator
from ._evaluators._fluency import FluencyEvaluator
from ._evaluators._gleu import GleuScoreEvaluator
Expand Down Expand Up @@ -63,4 +71,10 @@
"Conversation",
"Message",
"EvaluationResult",
"ContentSafetyMultimodalEvaluator",
"HateUnfairnessMultimodalEvaluator",
"SelfHarmMultimodalEvaluator",
"SexualMultimodalEvaluator",
"ViolenceMultimodalEvaluator",
"ProtectedMaterialMultimodalEvaluator"
]
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,12 @@
from urllib.parse import urlparse

import jwt
import json

from promptflow.core._errors import MissingRequiredPackage
from azure.ai.evaluation._exceptions import ErrorBlame, ErrorCategory, ErrorTarget, EvaluationException
from azure.ai.evaluation._http_utils import AsyncHttpPipeline, get_async_http_client
from azure.ai.evaluation._model_configurations import AzureAIProject
from azure.ai.evaluation._model_configurations import AzureAIProject, Message
from azure.core.credentials import TokenCredential
from azure.core.pipeline.policies import AsyncRetryPolicy

Expand All @@ -26,7 +28,7 @@
_InternalAnnotationTasks,
_InternalEvaluationMetrics,
)
from .utils import get_harm_severity_level
from .utils import get_harm_severity_level, retrieve_content_type

try:
version = importlib.metadata.version("azure-ai-evaluation")
Expand Down Expand Up @@ -169,7 +171,6 @@ async def submit_request(query: str, response: str, metric: str, rai_svc_url: st
operation_id = result["location"].split("/")[-1]
return operation_id


async def fetch_result(operation_id: str, rai_svc_url: str, credential: TokenCredential, token: str) -> Dict:
"""Fetch the annotation result from Responsible AI service

Expand Down Expand Up @@ -265,7 +266,11 @@ def _parse_content_harm_response(batch_response: List[Dict], metric_name: str) -
if key == EvaluationMetrics.HATE_FAIRNESS:
key = EvaluationMetrics.HATE_UNFAIRNESS

result: Dict[str, Union[str, float]] = {key: math.nan, key + "_score": math.nan, key + "_reason": ""}
result: Dict[str, Union[str, float]] = {
(key.value if hasattr(key, 'value') else key): math.nan,
f"{(key.value if hasattr(key, 'value') else key)}_score": math.nan,
f"{(key.value if hasattr(key, 'value') else key)}_reason": math.nan
}

response = batch_response[0]
if metric_name not in response:
Expand Down Expand Up @@ -314,6 +319,8 @@ def _parse_content_harm_response(batch_response: List[Dict], metric_name: str) -
reason = ""

harm_score = metric_value
if metric_value == 'n/a':
return result
if not math.isnan(metric_value):
# int(math.nan) causes a value error, and math.nan is already handled
# by get_harm_severity_level
Expand Down Expand Up @@ -442,3 +449,106 @@ async def evaluate_with_rai_service(
result = parse_response(annotation_response, metric_name)

return result

def generate_payload_multimodal(content_type: str, messages, metric: str) -> Dict:
"""Generate the payload for the annotation request
:param content_type: The type of the content representing multimodal or images.
:type content_type: str
:param messages: The normalized list of messages to be entered as the "Contents" in the payload.
:type messages: str
:param metric: The evaluation metric to use. This determines the task type, and whether a "MetricList" is needed
in the payload.
:type metric: str
:return: The payload for the annotation request.
:rtype: Dict
"""
include_metric = True
task = Tasks.CONTENT_HARM
if metric == EvaluationMetrics.PROTECTED_MATERIAL:
task = Tasks.PROTECTED_MATERIAL
include_metric = False

return (
{
"ContentType": content_type,
"Contents": [{"messages" : messages }],
"AnnotationTask": task,
"MetricList": [metric],
}
if include_metric
else {
"ContentType": content_type,
"Contents": [{"messages" : messages }],
"AnnotationTask": task,
}
)

async def submit_multimodal_request(messages, metric: str, rai_svc_url: str, token: str) -> str:
"""Submit request to Responsible AI service for evaluation and return operation ID
:param messages: The normalized list of messages to be entered as the "Contents" in the payload.
:type messages: str
:param metric: The evaluation metric to use.
:type metric: str
:param rai_svc_url: The Responsible AI service URL.
:type rai_svc_url: str
:param token: The Azure authentication token.
:type token: str
:return: The operation ID.
:rtype: str
"""
## handle json payload and payload from inference sdk strongly type messages
if len(messages) > 0 and not isinstance(messages[0], Dict):
w-javed marked this conversation as resolved.
Show resolved Hide resolved
try:
from azure.ai.inference.models import ChatRequestMessage
except ImportError:
error_message = "Please install 'azure-ai-inference' package to use SystemMessage, UserMessage, AssistantMessage"
raise MissingRequiredPackage(message=error_message)
else:
if len(messages) > 0 and isinstance(messages[0], ChatRequestMessage):
messages = [message.as_dict() for message in messages]

filtered_messages = [message for message in messages if message["role"] != "system"]
assistant_messages = [message for message in messages if message["role"] == "assistant"]
content_type = retrieve_content_type(assistant_messages, metric)
payload = generate_payload_multimodal(content_type, filtered_messages, metric)

## calling rai service for annotation
url = rai_svc_url + "/submitannotation"
headers = get_common_headers(token)
async with get_async_http_client() as client:
response = await client.post( # pylint: disable=too-many-function-args,unexpected-keyword-arg
url, json=payload, headers=headers, timeout=CommonConstants.DEFAULT_HTTP_TIMEOUT
)
if response.status_code != 202:
print("Fail evaluating '%s' with error message: %s" % (payload["Contents"], response.text))
w-javed marked this conversation as resolved.
Show resolved Hide resolved
response.raise_for_status()
result = response.json()
operation_id = result["location"].split("/")[-1]
return operation_id

async def evaluate_with_rai_service_multimodal(
messages, metric_name: str, project_scope: AzureAIProject, credential: TokenCredential
):
""" "Evaluate the content safety of the response using Responsible AI service
:param messages: The normalized list of messages.
:type messages: str
:param metric_name: The evaluation metric to use.
:type metric_name: str
:param project_scope: The Azure AI project scope details.
:type project_scope: Dict
:param credential: The Azure authentication credential.
:type credential:
~azure.core.credentials.TokenCredential
:return: The parsed annotation result.
:rtype: List[List[Dict]]
"""

# Get RAI service URL from discovery service and check service availability
token = await fetch_or_reuse_token(credential)
rai_svc_url = await get_rai_svc_url(project_scope, token)
await ensure_service_availability(rai_svc_url, token, Tasks.CONTENT_HARM)
# Submit annotation request and fetch result
operation_id = await submit_multimodal_request(messages, metric_name, rai_svc_url, token)
annotation_response = cast(List[Dict], await fetch_result(operation_id, rai_svc_url, credential, token))
result = parse_response(annotation_response, metric_name)
return result
Original file line number Diff line number Diff line change
Expand Up @@ -272,3 +272,33 @@ def validate_annotation(v: object, annotation: Union[str, type, object]) -> bool
validate_annotation(v, annotations[k])

return cast(T_TypedDict, o)

def retrieve_content_type(assistant_messages: list, metric: str) -> str:
w-javed marked this conversation as resolved.
Show resolved Hide resolved
"""Get the content type for service payload.

:param messages: The list of messages to be annotated by evaluation service
:type messages: list
:param metric: A string representing the metric type
:type metric: str
:return: A text representing the content type. Example: 'text', or 'image'
:rtype: str
"""
# Check if metric is "protected_material"
if metric == "protected_material":
return "image"

# Ensure there are messages
if assistant_messages:
w-javed marked this conversation as resolved.
Show resolved Hide resolved
# Iterate through each message
for item in assistant_messages:
# Ensure "content" exists in the message and is iterable
if "content" in item:
for content in item["content"]:
# Check if the content type is "image_url"
if content.get("type") == "image_url":
return "image"
w-javed marked this conversation as resolved.
Show resolved Hide resolved
# Default return if no image was found
return "text"

# Default return if no messages
return "text"
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@
import tempfile
from pathlib import Path
from typing import Any, Dict, NamedTuple, Optional, Tuple, Union
import uuid
import base64

import pandas as pd
from promptflow.client import PFClient
Expand Down Expand Up @@ -80,6 +82,32 @@ def _azure_pf_client_and_triad(trace_destination) -> Tuple[PFClient, AzureMLWork

return azure_pf_client, ws_triad

def _store_multimodal_content(messages, tmpdir: str):
# verify if images folder exists
images_folder_path = os.path.join(tmpdir, "images")
os.makedirs(images_folder_path, exist_ok=True)

# traverse all messages and replace base64 image data with new file name.
for message in messages:
if "content" in message:
for content in message["content"]:
w-javed marked this conversation as resolved.
Show resolved Hide resolved
if content.get("type") == "image_url":
image_url = content.get("image_url")
if image_url and 'url' in image_url and image_url['url'].startswith("data:image/jpg;base64,"):
# Extract the base64 string
base64image = image_url['url'].replace("data:image/jpg;base64,", "")

# Generate a unique filename
image_file_name = f"{str(uuid.uuid4())}.jpg"
image_url['url'] = f"images/{image_file_name}" # Replace the base64 URL with the file path

# Decode the base64 string to binary image data
image_data_binary = base64.b64decode(base64image)

# Write the binary image data to the file
image_file_path = os.path.join(images_folder_path, image_file_name)
with open(image_file_path, "wb") as f:
f.write(image_data_binary)

def _log_metrics_and_instance_results(
metrics: Dict[str, Any],
Expand Down Expand Up @@ -110,6 +138,14 @@ def _log_metrics_and_instance_results(
artifact_name = EvalRun.EVALUATION_ARTIFACT if run else EvalRun.EVALUATION_ARTIFACT_DUMMY_RUN

with tempfile.TemporaryDirectory() as tmpdir:
# storing multi_modal images if exists
col_name = "inputs.conversation"
if col_name in instance_results.columns:
for key, item in instance_results[col_name].items():
if "messages" in item:
_store_multimodal_content(item["messages"], tmpdir)

# storing artifact result
tmp_path = os.path.join(tmpdir, artifact_name)

with open(tmp_path, "w", encoding=DefaultOpenEncoding.WRITE) as f:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -99,10 +99,10 @@ def __init__(
self._eval_last_turn = eval_last_turn
self._parallel = parallel
self._evaluators: List[Callable[..., Dict[str, Union[str, float]]]] = [
ViolenceEvaluator(azure_ai_project, credential),
SexualEvaluator(azure_ai_project, credential),
SelfHarmEvaluator(azure_ai_project, credential),
HateUnfairnessEvaluator(azure_ai_project, credential),
ViolenceEvaluator(credential, azure_ai_project),
SexualEvaluator(credential, azure_ai_project),
SelfHarmEvaluator(credential, azure_ai_project),
HateUnfairnessEvaluator(credential, azure_ai_project),
w-javed marked this conversation as resolved.
Show resolved Hide resolved
]

def __call__(self, *, conversation: list, **kwargs):
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
from ._content_safety_multimodal import ContentSafetyMultimodalEvaluator
from ._content_safety_multimodal_base import ContentSafetyMultimodalEvaluatorBase
from ._hate_unfairness import HateUnfairnessMultimodalEvaluator
from ._self_harm import SelfHarmMultimodalEvaluator
from ._sexual import SexualMultimodalEvaluator
from ._violence import ViolenceMultimodalEvaluator
from ._protected_material import ProtectedMaterialMultimodalEvaluator

__all__ = [
"ContentSafetyMultimodalEvaluator",
"ContentSafetyMultimodalEvaluatorBase",
"ViolenceMultimodalEvaluator",
"SexualMultimodalEvaluator",
"SelfHarmMultimodalEvaluator",
"HateUnfairnessMultimodalEvaluator",
"ProtectedMaterialMultimodalEvaluator",
]
Loading
Loading