[textanalytics] auto LD and script detection #26342

Merged
3 changes: 2 additions & 1 deletion .vscode/cspell.json
@@ -455,7 +455,8 @@
"verfasst",
"engelska",
"fhir",
"FHIR"
"FHIR",
"naam"
]
},
{
6 changes: 6 additions & 0 deletions sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md
@@ -18,6 +18,12 @@
`VolumeUnit`, and `WeightUnit`.
- Added the Abstractive Summarization feature and related models: `AbstractSummaryAction`, `AbstractSummaryResult`, `AbstractiveSummary`,
`SummaryContext`, `PhraseControl`, and `PhraseControlStrategy`. Access the feature through the `begin_analyze_actions` API.
- Added automatic language detection to long-running operation APIs. Pass `auto` into the document `language` hint to use this feature.
- Added `autodetect_default_language` to long-running operation APIs. Pass as the default/fallback language for automatic language detection.
- Added property `detected_language` to `RecognizeEntitiesResult`, `RecognizePiiEntitiesResult`, `AnalyzeHealthcareEntitiesResult`,
`ExtractKeyPhrasesResult`, `RecognizeLinkedEntitiesResult`, `AnalyzeSentimentResult`, `RecognizeCustomEntitiesResult`,
`ClassifyDocumentResult`, `ExtractSummaryResult`, and `AbstractSummaryResult` to indicate the language detected by automatic language detection.
- Added property `script` to `DetectedLanguage` to indicate the script of the input document.
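
As a rough illustration of the entries above, the sketch below builds a batch in which every document carries the `auto` language hint. The helper name `with_auto_language` is hypothetical and not part of the SDK; the `id`, `text`, and `language` keys follow the dict-style document input that the long-running APIs accept. The resulting list would then be passed to a long-running API such as `begin_analyze_actions`, optionally together with `autodetect_default_language` as the fallback.

```python
from typing import Dict, List


def with_auto_language(texts: List[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: wrap raw strings as document dicts that
    request automatic language detection via the "auto" hint."""
    return [
        {"id": str(index), "text": text, "language": "auto"}
        for index, text in enumerate(texts)
    ]


# Two documents in different languages, both asking the service to detect.
documents = with_auto_language(["Hello world", "Hallo wereld, mijn naam is Jan"])
```

Setting the hint per document rather than once per batch keeps the option of mixing `auto` with explicit language codes in the same request.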

### Breaking Changes


Large diffs are not rendered by default.

@@ -174,6 +174,9 @@ def entities_result(
statistics=TextDocumentStatistics._from_generated( # pylint: disable=protected-access
entity.statistics
),
detected_language=DetectedLanguage._from_generated( # pylint: disable=protected-access
entity.detected_language
) if hasattr(entity, "detected_language") and entity.detected_language else None
)


@@ -194,6 +197,9 @@ def linked_entities_result(
statistics=TextDocumentStatistics._from_generated( # pylint: disable=protected-access
entity.statistics
),
detected_language=DetectedLanguage._from_generated( # pylint: disable=protected-access
entity.detected_language
) if hasattr(entity, "detected_language") and entity.detected_language else None
)


@@ -211,6 +217,9 @@ def key_phrases_result(
statistics=TextDocumentStatistics._from_generated( # pylint: disable=protected-access
phrases.statistics
),
detected_language=DetectedLanguage._from_generated( # pylint: disable=protected-access
phrases.detected_language
) if hasattr(phrases, "detected_language") and phrases.detected_language else None
)


@@ -237,6 +246,9 @@ def sentiment_result(
)
for s in sentiment.sentences
],
detected_language=DetectedLanguage._from_generated( # pylint: disable=protected-access
sentiment.detected_language
) if hasattr(sentiment, "detected_language") and sentiment.detected_language else None
)


@@ -260,6 +272,9 @@ def pii_entities_result(
statistics=TextDocumentStatistics._from_generated( # pylint: disable=protected-access
entity.statistics
),
detected_language=DetectedLanguage._from_generated( # pylint: disable=protected-access
entity.detected_language
) if hasattr(entity, "detected_language") and entity.detected_language else None
)
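
The five hunks above repeat one guard: convert `detected_language` only when the generated result has the attribute and it is non-empty, otherwise return `None`. A minimal sketch of that guard as a shared helper follows. `DetectedLanguageSketch` and `detected_language_or_none` are illustrative stand-ins and do not exist in the SDK, and the field names on the stand-in model are assumptions.

```python
from types import SimpleNamespace
from typing import Any, Optional


class DetectedLanguageSketch:
    """Illustrative stand-in for the SDK's DetectedLanguage model."""

    def __init__(self, name: str, iso6391_name: str, script: Optional[str] = None) -> None:
        self.name = name
        self.iso6391_name = iso6391_name
        self.script = script

    @classmethod
    def from_generated(cls, generated: Any) -> "DetectedLanguageSketch":
        # "script" is read defensively in case the generated object lacks it.
        return cls(generated.name, generated.iso6391_name, getattr(generated, "script", None))


def detected_language_or_none(generated_result: Any) -> Optional[DetectedLanguageSketch]:
    """Equivalent to the `hasattr(...) and ... else None` guard in the diff:
    the attribute may be absent, or present but empty."""
    value = getattr(generated_result, "detected_language", None)
    return DetectedLanguageSketch.from_generated(value) if value else None


# A result without the attribute, and one that carries a detected language.
without_attr = SimpleNamespace()
with_language = SimpleNamespace(
    detected_language=SimpleNamespace(name="Dutch", iso6391_name="nl", script="Latin")
)
```

Factoring the guard out would shorten each mapper to a single call, at the cost of one more private helper.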


@@ -621,7 +621,7 @@ def _healthcare_result_callback(
@validate_multiapi_args(
version_method_added="v3.1",
args_mapping={
"2022-10-01-preview": ["fhir_version", "document_type"],
"2022-10-01-preview": ["fhir_version", "document_type", "autodetect_default_language"],
"2022-05-01": ["display_name"]
}
)
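
The `args_mapping` argument ties keyword arguments to the API version that introduced them. The sketch below shows one way such a check could work. It is not the SDK's `validate_multiapi_args` implementation: `require_api_version` and `FakeClient` are hypothetical names, and the sketch assumes date-style version strings that can be compared as plain strings.

```python
import functools
from typing import Any, Callable, Dict, List


def require_api_version(args_mapping: Dict[str, List[str]]) -> Callable:
    """Hypothetical decorator: reject keyword arguments that were added
    in an API version newer than the one the client is using."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            for version_added, names in args_mapping.items():
                # Simplification: date-style versions ("2022-05-01",
                # "2022-10-01-preview") are compared lexicographically.
                if self.api_version < version_added:
                    unsupported = [name for name in names if name in kwargs]
                    if unsupported:
                        raise ValueError(
                            f"{unsupported} not available in API version "
                            f"{self.api_version}; requires {version_added} or newer"
                        )
            return func(self, *args, **kwargs)

        return wrapper

    return decorator


class FakeClient:
    """Stand-in client that only records its API version."""

    def __init__(self, api_version: str) -> None:
        self.api_version = api_version

    @require_api_version({"2022-10-01-preview": ["autodetect_default_language"]})
    def begin_analyze(self, documents: list, **kwargs: Any) -> Dict[str, Any]:
        # Return the accepted keyword arguments so the effect is visible.
        return kwargs
```

Failing at call time, before any request is sent, gives callers on an older API version a clear error instead of a service-side rejection.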
@@ -652,9 +652,12 @@ def begin_analyze_healthcare_entities(
:keyword bool show_stats: If set to true, response will contain document level statistics.
:keyword str language: The 2 letter ISO 639-1 representation of language for the
entire batch. For example, use "en" for English; "es" for Spanish etc.
If not set, uses "en" for English as default. Per-document language will
take precedence over whole batch language. See https://aka.ms/talangs for
supported languages in Language API.
For automatic language detection, use "auto" (Only supported by API version
2022-10-01-preview and newer). If not set, uses "en" for English as default.
Per-document language will take precedence over whole batch language.
See https://aka.ms/talangs for supported languages in Language API.
:keyword autodetect_default_language: Default/fallback language to use for documents requesting
automatic language detection.
:keyword str display_name: An optional display name to set for the requested analysis.
:keyword str string_index_type: Specifies the method used to interpret string offsets.
`UnicodeCodePoint`, the Python encoding, is the default. To override the Python default,
@@ -695,7 +698,7 @@ def begin_analyze_healthcare_entities(
.. versionadded:: 2022-05-01
The *display_name* keyword argument.
.. versionadded:: 2022-10-01-preview
The *fhir_version* and *document_type* keyword arguments.
The *fhir_version*, *document_type*, and *autodetect_default_language* keyword arguments.

.. admonition:: Example:

@@ -717,6 +720,7 @@ def begin_analyze_healthcare_entities(
display_name = kwargs.pop("display_name", None)
fhir_version = kwargs.pop("fhir_version", None)
document_type = kwargs.pop("document_type", None)
autodetect_default_language = kwargs.pop("autodetect_default_language", None)

if continuation_token:
return cast(
@@ -759,6 +763,7 @@ def begin_analyze_healthcare_entities(
body=models.AnalyzeTextJobsInput(
analysis_input=docs,
display_name=display_name,
default_language=autodetect_default_language,
tasks=[
models.HealthcareLROTask(
task_name="0",
@@ -1065,7 +1070,10 @@ def _analyze_result_callback(
@distributed_trace
@validate_multiapi_args(
version_method_added="v3.1",
custom_wrapper=check_for_unsupported_actions_types
custom_wrapper=check_for_unsupported_actions_types,
args_mapping={
"2022-10-01-preview": ["autodetect_default_language"],
}
)
def begin_analyze_actions(
self,
@@ -1133,9 +1141,12 @@ def begin_analyze_actions(
:keyword str display_name: An optional display name to set for the requested analysis.
:keyword str language: The 2 letter ISO 639-1 representation of language for the
entire batch. For example, use "en" for English; "es" for Spanish etc.
If not set, uses "en" for English as default. Per-document language will
take precedence over whole batch language. See https://aka.ms/talangs for
supported languages in Language API.
For automatic language detection, use "auto" (Only supported by API version
2022-10-01-preview and newer). If not set, uses "en" for English as default.
Per-document language will take precedence over whole batch language.
See https://aka.ms/talangs for supported languages in Language API.
:keyword autodetect_default_language: Default/fallback language to use for documents requesting
automatic language detection.
:keyword bool show_stats: If set to true, response will contain document level statistics.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
@@ -1172,6 +1183,7 @@ def begin_analyze_actions(
.. versionadded:: 2022-10-01-preview
The *ExtractSummaryAction* and *AbstractSummaryAction* input options and the corresponding
*ExtractSummaryResult* and *AbstractSummaryResult* result objects.
The *autodetect_default_language* keyword argument.

.. admonition:: Example:

@@ -1191,6 +1203,7 @@ def begin_analyze_actions(
polling_interval = kwargs.pop("polling_interval", 5)
language = language_arg if language_arg is not None else self._default_language
bespoke = kwargs.pop("bespoke", False)
autodetect_default_language = kwargs.pop("autodetect_default_language", None)

if continuation_token:
return cast(
@@ -1246,6 +1259,7 @@ def begin_analyze_actions(
body=models.AnalyzeTextJobsInput(
analysis_input=docs,
display_name=display_name,
default_language=autodetect_default_language,
tasks=generated_tasks
),
cls=response_cls,
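
Both `begin_analyze_healthcare_entities` and `begin_analyze_actions` follow the same flow: pop `autodetect_default_language` from `kwargs` with a `None` default, then forward it as `default_language` on the job input. A minimal sketch of that pass-through is below, using a plain dict in place of the generated `AnalyzeTextJobsInput` model. `build_job_input` is a hypothetical name, not an SDK function.

```python
from typing import Any, Dict, List, Optional


def build_job_input(documents: List[Dict[str, str]], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for assembling the job request body."""
    display_name: Optional[str] = kwargs.pop("display_name", None)
    # Popping removes the keyword so it is not forwarded a second time
    # with whatever remains in kwargs.
    autodetect_default_language: Optional[str] = kwargs.pop(
        "autodetect_default_language", None
    )
    return {
        "analysis_input": documents,
        "display_name": display_name,
        # The public keyword is renamed to the service-side field here.
        "default_language": autodetect_default_language,
        "remaining_kwargs": kwargs,
    }


body = build_job_input(
    [{"id": "0", "text": "Hallo wereld", "language": "auto"}],
    autodetect_default_language="nl",
    polling_interval=5,
)
```

When the keyword is omitted, `default_language` stays `None`, which leaves the fallback choice to the service.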
@@ -1320,13 +1334,16 @@ def begin_analyze_actions(

@distributed_trace
@validate_multiapi_args(
version_method_added="2022-05-01"
version_method_added="2022-05-01",
args_mapping={
"2022-10-01-preview": ["autodetect_default_language"],
}
)
def begin_recognize_custom_entities(
self,
documents: Union[List[str], List[TextDocumentInput], List[Dict[str, str]]],
project_name,
deployment_name,
project_name: str,
deployment_name: str,
**kwargs: Any,
) -> TextAnalysisLROPoller[ItemPaged[Union[RecognizeCustomEntitiesResult, DocumentError]]]:
"""Start a long-running custom named entity recognition operation.
@@ -1345,9 +1362,12 @@ def begin_recognize_custom_entities(
:param str deployment_name: This field indicates the deployment name for the model.
:keyword str language: The 2 letter ISO 639-1 representation of language for the
entire batch. For example, use "en" for English; "es" for Spanish etc.
If not set, uses "en" for English as default. Per-document language will
take precedence over whole batch language. See https://aka.ms/talangs for
supported languages in Language API.
For automatic language detection, use "auto" (Only supported by API version
2022-10-01-preview and newer). If not set, uses "en" for English as default.
Per-document language will take precedence over whole batch language.
See https://aka.ms/talangs for supported languages in Language API.
:keyword autodetect_default_language: Default/fallback language to use for documents requesting
automatic language detection.
:keyword bool show_stats: If set to true, response will contain document level statistics.
:keyword bool disable_service_logs: If set to true, you opt-out of having your text input
logged on the service side for troubleshooting. By default, the Language service logs your
@@ -1379,6 +1399,8 @@ def begin_recognize_custom_entities(

.. versionadded:: 2022-05-01
The *begin_recognize_custom_entities* client method.
.. versionadded:: 2022-10-01-preview
The *autodetect_default_language* keyword argument.

.. admonition:: Example:

@@ -1438,13 +1460,16 @@ def begin_recognize_custom_entities(

@distributed_trace
@validate_multiapi_args(
version_method_added="2022-05-01"
version_method_added="2022-05-01",
args_mapping={
"2022-10-01-preview": ["autodetect_default_language"],
}
)
def begin_single_label_classify(
self,
documents: Union[List[str], List[TextDocumentInput], List[Dict[str, str]]],
project_name,
deployment_name,
project_name: str,
deployment_name: str,
**kwargs: Any,
) -> TextAnalysisLROPoller[ItemPaged[Union[ClassifyDocumentResult, DocumentError]]]:
"""Start a long-running custom single label classification operation.
@@ -1463,9 +1488,12 @@ def begin_single_label_classify(
:param str deployment_name: This field indicates the deployment name for the model.
:keyword str language: The 2 letter ISO 639-1 representation of language for the
entire batch. For example, use "en" for English; "es" for Spanish etc.
If not set, uses "en" for English as default. Per-document language will
take precedence over whole batch language. See https://aka.ms/talangs for
supported languages in Language API.
For automatic language detection, use "auto" (Only supported by API version
2022-10-01-preview and newer). If not set, uses "en" for English as default.
Per-document language will take precedence over whole batch language.
See https://aka.ms/talangs for supported languages in Language API.
:keyword autodetect_default_language: Default/fallback language to use for documents requesting
automatic language detection.
:keyword bool show_stats: If set to true, response will contain document level statistics.
:keyword bool disable_service_logs: If set to true, you opt-out of having your text input
logged on the service side for troubleshooting. By default, the Language service logs your
@@ -1493,6 +1521,8 @@ def begin_single_label_classify(

.. versionadded:: 2022-05-01
The *begin_single_label_classify* client method.
.. versionadded:: 2022-10-01-preview
The *autodetect_default_language* keyword argument.

.. admonition:: Example:

@@ -1550,13 +1580,16 @@ def begin_single_label_classify(

@distributed_trace
@validate_multiapi_args(
version_method_added="2022-05-01"
version_method_added="2022-05-01",
args_mapping={
"2022-10-01-preview": ["autodetect_default_language"],
}
)
def begin_multi_label_classify(
self,
documents: Union[List[str], List[TextDocumentInput], List[Dict[str, str]]],
project_name,
deployment_name,
project_name: str,
deployment_name: str,
**kwargs: Any,
) -> TextAnalysisLROPoller[ItemPaged[Union[ClassifyDocumentResult, DocumentError]]]:
"""Start a long-running custom multi label classification operation.
@@ -1575,9 +1608,12 @@ def begin_multi_label_classify(
:param str deployment_name: This field indicates the deployment name for the model.
:keyword str language: The 2 letter ISO 639-1 representation of language for the
entire batch. For example, use "en" for English; "es" for Spanish etc.
If not set, uses "en" for English as default. Per-document language will
take precedence over whole batch language. See https://aka.ms/talangs for
supported languages in Language API.
For automatic language detection, use "auto" (Only supported by API version
2022-10-01-preview and newer). If not set, uses "en" for English as default.
Per-document language will take precedence over whole batch language.
See https://aka.ms/talangs for supported languages in Language API.
:keyword autodetect_default_language: Default/fallback language to use for documents requesting
automatic language detection.
:keyword bool show_stats: If set to true, response will contain document level statistics.
:keyword bool disable_service_logs: If set to true, you opt-out of having your text input
logged on the service side for troubleshooting. By default, the Language service logs your
@@ -1605,6 +1641,8 @@ def begin_multi_label_classify(

.. versionadded:: 2022-05-01
The *begin_multi_label_classify* client method.
.. versionadded:: 2022-10-01-preview
The *autodetect_default_language* keyword argument.

.. admonition:: Example:
