Skip to content

Commit

Permalink
Multi-Modal-Content-Safety-Evaluators (#38002)
Browse files Browse the repository at this point in the history
* Initial-Commit-multimodal

* Fix

* Sync eng/common directory with azure-sdk-tools for PR 9092 (#37713)

* Export the subscription data from the service connection

* Update deploy-test-resources.yml

---------

Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com>
Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com>

* Removing private parameter from __call__ of AdversarialSimulator (#37709)

* Update task_query_response.prompty

remove required keys

* Update task_simulate.prompty

* Update task_query_response.prompty

* Update task_simulate.prompty

* Remove private variable and use kwargs

* Add experimental tag to adv sim

---------

Co-authored-by: Nagkumar Arkalgud <nagkumar@naarkalgworkmac.lan>

* Enabling option to disable response payload on writes (#37365)

* Initial draft

* Adding tests

* Renaming parameter

* Update container.py

* Renaming test file

* Fixing LINT issues

* Update container.py

* Update _base.py

* Update _base.py

* Fixing tests

* Fixing tests

* Adding support to disable response payload on write for AIO

* Update CHANGELOG.md

* Update _cosmos_client.py

* Reacting to code review comments

* Addressing code review feedback

* Addressed CR feedback

* Fixing pyLint errors

* Fixing pylint errors

* Update test_crud.py

* Fixing svc regression

* Update sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py

Co-authored-by: Anna Tisch <antisch@microsoft.com>

* Reacting to code review feedback.

* Update container.py

* Update test_query_vector_similarity.py

---------

Co-authored-by: Anna Tisch <antisch@microsoft.com>

* deprecate azure_germany (#37654)

* deprecate azure_germany

* update

* update

* Update sdk/identity/azure-identity/azure/identity/_constants.py

Co-authored-by: Paul Van Eck <paulvaneck@microsoft.com>

* update

---------

Co-authored-by: Paul Van Eck <paulvaneck@microsoft.com>

* Add default impl to handle token challenges (#37652)

* Add default impl to handle token challenges

* update version

* update

* update

* update

* update

* Update sdk/core/azure-core/azure/core/pipeline/policies/_utils.py

Co-authored-by: Paul Van Eck <paulvaneck@microsoft.com>

* Update sdk/core/azure-core/azure/core/pipeline/policies/_utils.py

Co-authored-by: Paul Van Eck <paulvaneck@microsoft.com>

* update

* Update sdk/core/azure-core/tests/test_utils.py

Co-authored-by: Paul Van Eck <paulvaneck@microsoft.com>

* Update sdk/core/azure-core/azure/core/pipeline/policies/_utils.py

Co-authored-by: Paul Van Eck <paulvaneck@microsoft.com>

* update

---------

Co-authored-by: Paul Van Eck <paulvaneck@microsoft.com>

* Make Credentials Required for Content Safety and Protected Materials Evaluators (#37707)

* Make Credentials Required for Content Safety Evaluators

* fix a typo

* lint, fix content safety evaluator

* revert test change

* remove credential from rai_service

* addFeedRangesAndUseFeedRangeInQueryChangeFeed (#37687)

* Add getFeedRanges API 
* Add feedRange support in query changeFeed


Co-authored-by: annie-mac <xinlian@microsoft.com>

* Update release date for core (#37723)

* Improvements to mindependency dev_requirement conflict resolution (#37669)

* during mindependency runs, dev_requirements on local relative paths are now checked for conflict with the targeted set of minimum dependencies
* multiple type clarifications within azure-sdk-tools
* added tests for new conflict resolution logic

---------

Co-authored-by: McCoy Patiño <39780829+mccoyp@users.noreply.github.com>

* Need to add environment to subscription configuration (#37726)

Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com>

* Enable samples for formrecognizer (#37676)

* multi-modal-changes

* fixes

* Fix with latest

* dict-fix

* adding-protected-material

* adding-protected-material

* adding-protected-material

* bumping-version

* adding assets

* Added image in simulator

* Added image in simulator

* bumping-version

* push-asset

* assets

* pushing asset

* remove-containt-on-key

* asset

* asset2

* asset3

* asset4

* adding conftest

* conftest

* cred fix

* asset-new

* fix

* asset

* adding multi-modal-without-tests

* asset-from-main

* asset-from-main

* fix

* adding one test only

* new asset

* tests,fix: Sanitizer should replace with enum value not enum name

* test-asset

* [AutoRelease] t2-containerservicefleet-2024-09-24-42036(can only be merged by SDK owner) (#37538)

* code and test

* Update CHANGELOG.md

* update-testcase

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>

* [AutoRelease] t2-dns-2024-09-25-81486(can only be merged by SDK owner) (#37560)

* code and test

* update-testcase

* Update CHANGELOG.md

* Update test_mgmt_dns_test.py

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>
Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com>

* [AutoRelease] t2-appconfiguration-2024-10-09-68726(can only be merged by SDK owner) (#37800)

* code and test

* update-testcase

* Update pyproject.toml

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>
Co-authored-by: Yuchao Yan <yuchaoyan@microsoft.com>

* code and test (#37855)

Co-authored-by: azure-sdk <PythonSdkPipelines>

* [AutoRelease] t2-servicefabricmanagedclusters-2024-10-08-57405(can only be merged by SDK owner) (#37768)

* code and test

* update-testcase

* update-testcases

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>

* [AutoRelease] t2-containerinstance-2024-10-21-66631(can only be merged by SDK owner) (#38005)

* code and test

* update-testcase

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>
Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com>

* [sdk generation pipeline] bump typespec-python 0.36.1 (#38008)

* update version

* update package.json

* [AutoRelease] t2-dnsresolver-2024-10-12-16936(can only be merged by SDK owner) (#37864)

* code and test

* update-testcase

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>
Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com>
Co-authored-by: Yuchao Yan <yuchaoyan@microsoft.com>

* new asset after fix in conftest

* asset

* chore: Update assets.json

* Move perf pipelines to TME subscription (#38020)

Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com>

* fix

* after-comments

* fix

* asset

* new asset with 1 test recording only

* chore: Update assets.json

* conftest fix

* assets change

* new test

* few changes

* removing proxy start

* added all tests

* asset

* fixes

* fixes with asset

* asset-after-tax

* enabling 2 more tests

* unit test fix

* asset

* new asset

* fixes per comments

* changes by black

* merge fix

* pylint fix

* pylint fix

* ground test fix

* fixes - pylint, black, mypy

* more tests

* docstring fixes

* doc string fix

* asset

* few updates after Nagkumar review

---------

Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com>
Co-authored-by: Wes Haggard <Wes.Haggard@microsoft.com>
Co-authored-by: Wes Haggard <weshaggard@users.noreply.github.com>
Co-authored-by: Nagkumar Arkalgud <nagkumar91@users.noreply.github.com>
Co-authored-by: Nagkumar Arkalgud <nagkumar@naarkalgworkmac.lan>
Co-authored-by: Fabian Meiswinkel <fabianm@microsoft.com>
Co-authored-by: Anna Tisch <antisch@microsoft.com>
Co-authored-by: Xiang Yan <xiangsjtu@gmail.com>
Co-authored-by: Paul Van Eck <paulvaneck@microsoft.com>
Co-authored-by: Neehar Duvvuri <40341266+needuv@users.noreply.github.com>
Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com>
Co-authored-by: annie-mac <xinlian@microsoft.com>
Co-authored-by: Scott Beddall <45376673+scbedd@users.noreply.github.com>
Co-authored-by: McCoy Patiño <39780829+mccoyp@users.noreply.github.com>
Co-authored-by: kdestin <101366538+kdestin@users.noreply.github.com>
Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>
Co-authored-by: Yuchao Yan <yuchaoyan@microsoft.com>
  • Loading branch information
19 people authored Oct 28, 2024
1 parent 558336a commit 5b78782
Show file tree
Hide file tree
Showing 28 changed files with 1,680 additions and 31 deletions.
2 changes: 1 addition & 1 deletion sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
# Release History


## 1.0.0b5 (Unreleased)

### Features Added
Expand All @@ -23,6 +22,7 @@ outputs = asyncio.run(custom_simulator(
max_conversation_turns=1,
))
```
- Adding evaluator for multimodal use cases

### Breaking Changes
- Renamed environment variable `PF_EVALS_BATCH_USE_ASYNC` to `AI_EVALS_BATCH_USE_ASYNC`.
Expand Down
2 changes: 1 addition & 1 deletion sdk/evaluation/azure-ai-evaluation/assets.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,5 @@
"AssetsRepo": "Azure/azure-sdk-assets",
"AssetsRepoPrefixPath": "python",
"TagPrefix": "python/evaluation/azure-ai-evaluation",
"Tag": "python/evaluation/azure-ai-evaluation_f0444ef220"
"Tag": "python/evaluation/azure-ai-evaluation_eb4989f81d"
}
14 changes: 14 additions & 0 deletions sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,14 @@
SexualEvaluator,
ViolenceEvaluator,
)
from ._evaluators._multimodal._content_safety_multimodal import (
ContentSafetyMultimodalEvaluator,
HateUnfairnessMultimodalEvaluator,
SelfHarmMultimodalEvaluator,
SexualMultimodalEvaluator,
ViolenceMultimodalEvaluator,
)
from ._evaluators._multimodal._protected_material import ProtectedMaterialMultimodalEvaluator
from ._evaluators._f1_score import F1ScoreEvaluator
from ._evaluators._fluency import FluencyEvaluator
from ._evaluators._gleu import GleuScoreEvaluator
Expand Down Expand Up @@ -65,4 +73,10 @@
"Conversation",
"Message",
"EvaluationResult",
"ContentSafetyMultimodalEvaluator",
"HateUnfairnessMultimodalEvaluator",
"SelfHarmMultimodalEvaluator",
"SexualMultimodalEvaluator",
"ViolenceMultimodalEvaluator",
"ProtectedMaterialMultimodalEvaluator",
]
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,12 @@

import jwt

from promptflow.core._errors import MissingRequiredPackage
from azure.ai.evaluation._exceptions import ErrorBlame, ErrorCategory, ErrorTarget, EvaluationException
from azure.ai.evaluation._http_utils import AsyncHttpPipeline, get_async_http_client
from azure.ai.evaluation._model_configurations import AzureAIProject
from azure.core.credentials import TokenCredential
from azure.core.exceptions import HttpResponseError
from azure.core.pipeline.policies import AsyncRetryPolicy

from .constants import (
Expand All @@ -26,7 +28,7 @@
Tasks,
_InternalEvaluationMetrics,
)
from .utils import get_harm_severity_level
from .utils import get_harm_severity_level, retrieve_content_type

try:
version = importlib.metadata.version("azure-ai-evaluation")
Expand Down Expand Up @@ -281,7 +283,11 @@ def _parse_content_harm_response(
if key == EvaluationMetrics.HATE_FAIRNESS:
key = EvaluationMetrics.HATE_UNFAIRNESS

result: Dict[str, Union[str, float]] = {key: math.nan, key + "_score": math.nan, key + "_reason": ""}
result: Dict[str, Union[str, float]] = {
(key.value if hasattr(key, "value") else key): math.nan,
f"{(key.value if hasattr(key, 'value') else key)}_score": math.nan,
f"{(key.value if hasattr(key, 'value') else key)}_reason": math.nan,
}

response = batch_response[0]
if metric_name not in response:
Expand Down Expand Up @@ -330,6 +336,8 @@ def _parse_content_harm_response(
reason = ""

harm_score = metric_value
if metric_value == "n/a":
return result
if not math.isnan(metric_value):
# int(math.nan) causes a value error, and math.nan is already handled
# by get_harm_severity_level
Expand Down Expand Up @@ -465,3 +473,109 @@ async def evaluate_with_rai_service(
result = parse_response(annotation_response, metric_name, metric_display_name)

return result


def generate_payload_multimodal(content_type: str, messages, metric: str) -> Dict:
"""Generate the payload for the annotation request
:param content_type: The type of the content representing multimodal or images.
:type content_type: str
:param messages: The normalized list of messages to be entered as the "Contents" in the payload.
:type messages: str
:param metric: The evaluation metric to use. This determines the task type, and whether a "MetricList" is needed
in the payload.
:type metric: str
:return: The payload for the annotation request.
:rtype: Dict
"""
include_metric = True
task = Tasks.CONTENT_HARM
if metric == EvaluationMetrics.PROTECTED_MATERIAL:
task = Tasks.PROTECTED_MATERIAL
include_metric = False

if include_metric:
return {
"ContentType": content_type,
"Contents": [{"messages": messages}],
"AnnotationTask": task,
"MetricList": [metric],
}
return {
"ContentType": content_type,
"Contents": [{"messages": messages}],
"AnnotationTask": task,
}


async def submit_multimodal_request(messages, metric: str, rai_svc_url: str, token: str) -> str:
"""Submit request to Responsible AI service for evaluation and return operation ID
:param messages: The normalized list of messages to be entered as the "Contents" in the payload.
:type messages: str
:param metric: The evaluation metric to use.
:type metric: str
:param rai_svc_url: The Responsible AI service URL.
:type rai_svc_url: str
:param token: The Azure authentication token.
:type token: str
:return: The operation ID.
:rtype: str
"""
## handle json payload and payload from inference sdk strongly type messages
if len(messages) > 0 and not isinstance(messages[0], dict):
try:
from azure.ai.inference.models import ChatRequestMessage
except ImportError as ex:
error_message = (
"Please install 'azure-ai-inference' package to use SystemMessage, UserMessage, AssistantMessage"
)
raise MissingRequiredPackage(message=error_message) from ex
if len(messages) > 0 and isinstance(messages[0], ChatRequestMessage):
messages = [message.as_dict() for message in messages]

filtered_messages = [message for message in messages if message["role"] != "system"]
assistant_messages = [message for message in messages if message["role"] == "assistant"]
content_type = retrieve_content_type(assistant_messages, metric)
payload = generate_payload_multimodal(content_type, filtered_messages, metric)

## calling rai service for annotation
url = rai_svc_url + "/submitannotation"
headers = get_common_headers(token)
async with get_async_http_client() as client:
response = await client.post( # pylint: disable=too-many-function-args,unexpected-keyword-arg
url, json=payload, headers=headers
)
if response.status_code != 202:
raise HttpResponseError(
message=f"Received unexpected HTTP status: {response.status_code} {response.text()}", response=response
)
result = response.json()
operation_id = result["location"].split("/")[-1]
return operation_id


async def evaluate_with_rai_service_multimodal(
messages, metric_name: str, project_scope: AzureAIProject, credential: TokenCredential
):
""" "Evaluate the content safety of the response using Responsible AI service
:param messages: The normalized list of messages.
:type messages: str
:param metric_name: The evaluation metric to use.
:type metric_name: str
:param project_scope: The Azure AI project scope details.
:type project_scope: Dict
:param credential: The Azure authentication credential.
:type credential:
~azure.core.credentials.TokenCredential
:return: The parsed annotation result.
:rtype: List[List[Dict]]
"""

# Get RAI service URL from discovery service and check service availability
token = await fetch_or_reuse_token(credential)
rai_svc_url = await get_rai_svc_url(project_scope, token)
await ensure_service_availability(rai_svc_url, token, Tasks.CONTENT_HARM)
# Submit annotation request and fetch result
operation_id = await submit_multimodal_request(messages, metric_name, rai_svc_url, token)
annotation_response = cast(List[Dict], await fetch_result(operation_id, rai_svc_url, credential, token))
result = parse_response(annotation_response, metric_name)
return result
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,9 @@

import nltk
from typing_extensions import NotRequired, Required, TypeGuard

from promptflow.core._errors import MissingRequiredPackage
from azure.ai.evaluation._constants import AZURE_OPENAI_TYPE, OPENAI_TYPE
from azure.ai.evaluation._exceptions import ErrorBlame, ErrorCategory, EvaluationException
from azure.ai.evaluation._exceptions import ErrorBlame, ErrorCategory, ErrorTarget, EvaluationException
from azure.ai.evaluation._model_configurations import (
AzureAIProject,
AzureOpenAIModelConfiguration,
Expand Down Expand Up @@ -312,3 +312,100 @@ def remove_optional_singletons(eval_class, singletons):
if param in singletons:
del required_singletons[param]
return required_singletons


def retrieve_content_type(assistant_messages: List, metric: str) -> str:
"""Get the content type for service payload.
:param assistant_messages: The list of messages to be annotated by evaluation service
:type assistant_messages: list
:param metric: A string representing the metric type
:type metric: str
:return: A text representing the content type. Example: 'text', or 'image'
:rtype: str
"""
# Check if metric is "protected_material"
if metric == "protected_material":
return "image"

# Iterate through each message
for item in assistant_messages:
# Ensure "content" exists in the message and is iterable
content = item.get("content", [])
for message in content:
if message.get("type", "") == "image_url":
return "image"
# Default return if no image was found
return "text"


def validate_conversation(conversation):
def raise_exception(msg, target):
raise EvaluationException(
message=msg,
internal_message=msg,
target=target,
category=ErrorCategory.INVALID_VALUE,
blame=ErrorBlame.USER_ERROR,
)

if not conversation or "messages" not in conversation:
raise_exception(
"Attribute 'messages' is missing in the request",
ErrorTarget.CONTENT_SAFETY_CHAT_EVALUATOR,
)
messages = conversation["messages"]
if not isinstance(messages, list):
raise_exception(
"'messages' parameter must be a JSON-compatible list of chat messages",
ErrorTarget.CONTENT_SAFETY_MULTIMODAL_EVALUATOR,
)
expected_roles = {"user", "assistant", "system"}
image_found = False
for num, message in enumerate(messages, 1):
if not isinstance(message, dict):
try:
from azure.ai.inference.models import (
ChatRequestMessage,
UserMessage,
AssistantMessage,
SystemMessage,
ImageContentItem,
)
except ImportError as ex:
raise MissingRequiredPackage(
message="Please install 'azure-ai-inference' package to use SystemMessage, AssistantMessage"
) from ex

if isinstance(messages[0], ChatRequestMessage) and not isinstance(
message, (UserMessage, AssistantMessage, SystemMessage)
):
raise_exception(
f"Messages must be a strongly typed class of ChatRequestMessage. Message number: {num}",
ErrorTarget.CONTENT_SAFETY_MULTIMODAL_EVALUATOR,
)

if isinstance(message.content, list) and any(
isinstance(item, ImageContentItem) for item in message.content
):
image_found = True
continue
if message.get("role") not in expected_roles:
raise_exception(
f"Invalid role provided: {message.get('role')}. Message number: {num}",
ErrorTarget.CONTENT_SAFETY_MULTIMODAL_EVALUATOR,
)
content = message.get("content")
if not isinstance(content, (str, list)):
raise_exception(
f"Content in each turn must be a string or array. Message number: {num}",
ErrorTarget.CONTENT_SAFETY_MULTIMODAL_EVALUATOR,
)
if isinstance(content, list):
if any(item.get("type") == "image_url" and "url" in item.get("image_url", {}) for item in content):
image_found = True
if not image_found:
raise_exception(
"Message needs to have multi-modal input like images.",
ErrorTarget.CONTENT_SAFETY_MULTIMODAL_EVALUATOR,
)
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@
import tempfile
from pathlib import Path
from typing import Any, Dict, NamedTuple, Optional, Tuple, Union
import uuid
import base64

import pandas as pd
from promptflow.client import PFClient
Expand Down Expand Up @@ -81,6 +83,33 @@ def _azure_pf_client_and_triad(trace_destination) -> Tuple[PFClient, AzureMLWork
return azure_pf_client, ws_triad


def _store_multimodal_content(messages, tmpdir: str):
# verify if images folder exists
images_folder_path = os.path.join(tmpdir, "images")
os.makedirs(images_folder_path, exist_ok=True)

# traverse all messages and replace base64 image data with new file name.
for message in messages:
for content in message.get("content", []):
if content.get("type") == "image_url":
image_url = content.get("image_url")
if image_url and "url" in image_url and image_url["url"].startswith("data:image/jpg;base64,"):
# Extract the base64 string
base64image = image_url["url"].replace("data:image/jpg;base64,", "")

# Generate a unique filename
image_file_name = f"{str(uuid.uuid4())}.jpg"
image_url["url"] = f"images/{image_file_name}" # Replace the base64 URL with the file path

# Decode the base64 string to binary image data
image_data_binary = base64.b64decode(base64image)

# Write the binary image data to the file
image_file_path = os.path.join(images_folder_path, image_file_name)
with open(image_file_path, "wb") as f:
f.write(image_data_binary)


def _log_metrics_and_instance_results(
metrics: Dict[str, Any],
instance_results: pd.DataFrame,
Expand Down Expand Up @@ -110,6 +139,15 @@ def _log_metrics_and_instance_results(
artifact_name = EvalRun.EVALUATION_ARTIFACT if run else EvalRun.EVALUATION_ARTIFACT_DUMMY_RUN

with tempfile.TemporaryDirectory() as tmpdir:
# storing multi_modal images if exists
col_name = "inputs.conversation"
if col_name in instance_results.columns:
for item in instance_results[col_name].items():
value = item[1]
if "messages" in value:
_store_multimodal_content(value["messages"], tmpdir)

# storing artifact result
tmp_path = os.path.join(tmpdir, artifact_name)

with open(tmp_path, "w", encoding=DefaultOpenEncoding.WRITE) as f:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -99,10 +99,10 @@ def __init__(
self._eval_last_turn = eval_last_turn
self._parallel = parallel
self._evaluators: List[Callable[..., Dict[str, Union[str, float]]]] = [
ViolenceEvaluator(azure_ai_project, credential),
SexualEvaluator(azure_ai_project, credential),
SelfHarmEvaluator(azure_ai_project, credential),
HateUnfairnessEvaluator(azure_ai_project, credential),
ViolenceEvaluator(credential, azure_ai_project),
SexualEvaluator(credential, azure_ai_project),
SelfHarmEvaluator(credential, azure_ai_project),
HateUnfairnessEvaluator(credential, azure_ai_project),
]

def __call__(self, *, conversation: list, **kwargs):
Expand Down
Loading

0 comments on commit 5b78782

Please sign in to comment.