[formrecognizer] adds AsyncLROPoller and continuation token support #11650

Merged: 22 commits, merged on Jun 3, 2020
sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md: 36 changes (25 additions, 11 deletions)
@@ -4,34 +4,48 @@

**Breaking Changes**

- All asynchronous long-running operation methods now return an instance of an `AsyncLROPoller` from `azure-core`
- All asynchronous long-running operation methods are renamed with the `begin_` prefix to indicate that an `AsyncLROPoller` is returned:
    - `train_model` is renamed to `begin_training`
    - `recognize_receipts` is renamed to `begin_recognize_receipts`
    - `recognize_receipts_from_url` is renamed to `begin_recognize_receipts_from_url`
    - `recognize_content` is renamed to `begin_recognize_content`
    - `recognize_content_from_url` is renamed to `begin_recognize_content_from_url`
    - `recognize_custom_forms` is renamed to `begin_recognize_custom_forms`
    - `recognize_custom_forms_from_url` is renamed to `begin_recognize_custom_forms_from_url`
- Sync method `begin_train_model` is renamed to `begin_training`
- `training_files` parameter of `begin_training` is renamed to `training_files_url`
- `use_labels` parameter of `begin_training` is renamed to `use_training_labels`
- `list_model_infos` method is renamed to `list_custom_models`
- Removed `get_form_training_client` from `FormRecognizerClient`
- Added `get_form_recognizer_client` to `FormTrainingClient`
- An `HttpResponseError` is now raised if a model with `status=="invalid"` is returned from the `begin_training` methods
- `PageRange` is renamed to `FormPageRange`
- `first_page` and `last_page` on `FormPageRange` are renamed to `first_page_number` and `last_page_number`, respectively
- `FormField` no longer has a `page_number` attribute
- `use_training_labels` is now a required positional parameter in the `begin_training` APIs
- `stream` and `url` parameters of `FormRecognizerClient` methods are renamed to `form` and `form_url`, respectively
- For the `begin_recognize_receipts` and `begin_recognize_receipts_from_url` methods, these parameters are instead renamed to `receipt` and `receipt_url`
- `created_on` and `last_modified` are renamed to `requested_on` and `completed_on` in the `CustomFormModel` and `CustomFormModelInfo` models
- `models` property of `CustomFormModel` is renamed to `submodels`
- `CustomFormSubModel` is renamed to `CustomFormSubmodel`
- `begin_recognize_receipts` APIs now return `RecognizedReceipt` instead of `USReceipt`
- Removed `USReceipt`. For how to handle the return value of `begin_recognize_receipts`, see the recognize receipt samples in the [samples directory](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/samples).
- Removed `USReceiptItem`. For how to access the individual items on a receipt, see the recognize receipt samples in the [samples directory](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/samples).
- Removed `USReceiptType` and the `receipt_type` property from `RecognizedReceipt`. See the recognize receipt samples in the [samples directory](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/samples) for details.
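Most of the method renames above are mechanical, so they can be captured in a lookup table. The sketch below is a hypothetical migration helper (not part of `azure-ai-formrecognizer`) whose mapping is transcribed from the rename entries in this list. It only renames method calls: the parameter renames (`stream` to `form`, `url` to `form_url`, and so on) and the new poller return types still need manual changes, e.g. calling `result()` on the returned poller.

```python
import re

# Hypothetical helper, not part of the SDK: mapping transcribed from the
# method renames in the 1.0.0b3 Breaking Changes list.
METHOD_RENAMES = {
    # async methods now carry the begin_ prefix and return an AsyncLROPoller
    "train_model": "begin_training",
    "recognize_receipts": "begin_recognize_receipts",
    "recognize_receipts_from_url": "begin_recognize_receipts_from_url",
    "recognize_content": "begin_recognize_content",
    "recognize_content_from_url": "begin_recognize_content_from_url",
    "recognize_custom_forms": "begin_recognize_custom_forms",
    "recognize_custom_forms_from_url": "begin_recognize_custom_forms_from_url",
    # sync client renames
    "begin_train_model": "begin_training",
    "list_model_infos": "list_custom_models",
}

# Match ".old_name(" only, so longer names and already-prefixed calls such as
# ".begin_recognize_content(" are left untouched.
_CALL = re.compile(r"\.(" + "|".join(map(re.escape, METHOD_RENAMES)) + r")\(")


def migrate_method_names(source: str) -> str:
    """Rename old method calls in a snippet of Python source text."""
    return _CALL.sub(lambda m: ".{}(".format(METHOD_RENAMES[m.group(1)]), source)


print(migrate_method_names("poller = await client.recognize_receipts_from_url(url)"))
```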

**New features**

- Support for copying a custom model from one Form Recognizer resource to another
- Authentication using `azure-identity` credentials is now supported
    - See the [Azure Identity documentation](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/identity/azure-identity/README.md) for more information
- `page_number` attribute has been added to `FormTable`
- All long running operation methods now accept the keyword argument `continuation_token` to restart the poller from a saved state
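A continuation token lets a process save a running operation and resume polling it later without re-sending the initial request. The sketch below is a self-contained toy model of that idea; `ToyPoller`, its `from_continuation_token` classmethod and the fake service are hypothetical. As we understand azure-core, the real token comes from the poller's `continuation_token()` method and is passed back through the `continuation_token` keyword described here, and its format is opaque, so it should be stored and returned as-is rather than parsed.

```python
import base64
import json
from typing import Callable


class ToyPoller:
    """Hypothetical stand-in for an LRO poller; NOT azure-core's implementation."""

    def __init__(self, operation_url: str, get_status: Callable[[str], str]) -> None:
        self._operation_url = operation_url  # where the service reports progress
        self._get_status = get_status        # fake "GET operation_url" call

    def status(self) -> str:
        return self._get_status(self._operation_url)

    def continuation_token(self) -> str:
        # Save just enough state to find the same operation again later.
        state = {"operation_url": self._operation_url}
        return base64.b64encode(json.dumps(state).encode("utf-8")).decode("ascii")

    @classmethod
    def from_continuation_token(cls, token: str, get_status: Callable[[str], str]) -> "ToyPoller":
        # Rebuild the poller from saved state; no initial request is sent.
        state = json.loads(base64.b64decode(token).decode("utf-8"))
        return cls(state["operation_url"], get_status)


# Fake service: operation URL -> status.
service = {"https://example.invalid/operations/42": "running"}
poller = ToyPoller("https://example.invalid/operations/42", service.__getitem__)
token = poller.continuation_token()  # e.g. persist this string somewhere

service["https://example.invalid/operations/42"] = "succeeded"  # work finishes meanwhile
resumed = ToyPoller.from_continuation_token(token, service.__getitem__)
print(resumed.status())
```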

**Dependency updates**

- Adopted [azure-core](https://pypi.org/project/azure-core/) version 1.6.0 or greater
kristapratico marked this conversation as resolved.

## 1.0.0b2 (2020-05-06)

sdk/formrecognizer/azure-ai-formrecognizer/README.md: 10 changes (5 additions, 5 deletions)
@@ -140,10 +140,10 @@
Long-running operations are operations that consist of an initial request, followed by polling the service at intervals
to determine whether the operation has completed or failed and, if it has succeeded, to get the result.
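The loop below sketches that request-then-poll pattern with a fake service. The function name, the `"succeeded"`/`"failed"` status strings and the response shape are assumptions for illustration; the SDK's pollers implement this loop for you, including honoring `Retry-After` headers, which this sketch ignores.

```python
import time
from typing import Any, Callable, Dict


def run_long_running_operation(
    start: Callable[[], str],
    get_status: Callable[[str], Dict[str, Any]],
    polling_interval: float = 5.0,
) -> Any:
    """Send the initial request, then poll until the operation finishes."""
    operation_id = start()  # initial request; the service returns something to poll
    while True:
        response = get_status(operation_id)
        if response["status"] == "succeeded":
            return response["result"]
        if response["status"] == "failed":
            raise RuntimeError("operation {} failed".format(operation_id))
        time.sleep(polling_interval)  # still running: wait, then poll again


# Fake service that needs two extra polls before it succeeds.
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "succeeded", "result": ["page 1", "page 2"]},
])
pages = run_long_running_operation(lambda: "op-1", lambda _op: next(responses), polling_interval=0)
print(pages)
```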

Methods that train models, recognize values from forms, or copy models are modeled as long-running operations.
The client exposes a `begin_<method-name>` method that returns an `LROPoller` (synchronous client) or an `AsyncLROPoller`
(asynchronous client). Callers should wait for the operation to complete by calling `result()` on the poller returned
from the `begin_<method-name>` method. Sample code snippets are provided to illustrate using long-running operations
[below](#examples "Examples").
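On the asynchronous client, as we understand the async samples, the `begin_` call is itself awaited to obtain the `AsyncLROPoller`, and `result()` is awaited as well. The toy below reproduces only that two-step `await` shape with `asyncio`; `ToyAsyncPoller`, `begin_toy_operation` and the status dictionaries are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Iterable


class ToyAsyncPoller:
    """Hypothetical async poller with the same `await poller.result()` shape."""

    def __init__(self, get_status: Callable[[], Awaitable[Dict[str, Any]]], polling_interval: float) -> None:
        self._get_status = get_status
        self._polling_interval = polling_interval

    async def result(self) -> Any:
        while True:
            response = await self._get_status()
            if response["status"] == "succeeded":
                return response["result"]
            if response["status"] == "failed":
                raise RuntimeError("operation failed")
            await asyncio.sleep(self._polling_interval)  # yields to other tasks while waiting


async def begin_toy_operation(statuses: Iterable[Dict[str, Any]], polling_interval: float = 0.0) -> ToyAsyncPoller:
    """Hypothetical async begin_ method: send the initial request, return a poller."""
    remaining = iter(statuses)

    async def get_status() -> Dict[str, Any]:
        return next(remaining)

    return ToyAsyncPoller(get_status, polling_interval)


async def main() -> Any:
    poller = await begin_toy_operation([{"status": "running"}, {"status": "succeeded", "result": "done"}])
    return await poller.result()


print(asyncio.run(main()))
```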


## Examples
@@ -254,7 +254,7 @@
credential = AzureKeyCredential("<api_key>")
form_training_client = FormTrainingClient(endpoint, credential)

container_sas_url = "xxx" # training documents uploaded to blob storage
poller = form_training_client.begin_training(container_sas_url)
model = poller.result()

# Custom model information
Synchronous `FormRecognizerClient` module:
@@ -10,6 +10,7 @@
Any,
IO,
Union,
List,
TYPE_CHECKING
)
from azure.core.tracing.decorator import distributed_trace
@@ -27,6 +28,7 @@
from ._polling import AnalyzePolling
if TYPE_CHECKING:
from azure.core.credentials import AzureKeyCredential, TokenCredential
from ._models import RecognizedReceipt, FormPage, RecognizedForm


class FormRecognizerClient(object):
@@ -66,7 +68,7 @@ def __init__(self, endpoint, credential, **kwargs):
authentication_policy = get_authentication_policy(credential)
self._client = FormRecognizer(
endpoint=endpoint,
credential=credential,  # type: ignore

Contributor: What is the typing issue here?

Member Author: `error: Argument "credential" to "FormRecognizerClient" has incompatible type "Union[AzureKeyCredential, TokenCredential]"; expected "TokenCredential"`

sdk_moniker=USER_AGENT,
authentication_policy=authentication_policy,
**kwargs
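The error in the thread arises because this wrapper accepts either credential type while the generated client's `credential` parameter is annotated as `TokenCredential` only; the key credential appears to be handled by the separately passed `authentication_policy`. The sketch below reproduces the situation with hypothetical stub classes and shows `typing.cast` as one alternative to `# type: ignore`. This is a design option for illustration, not what the PR does.

```python
from typing import Union, cast


class AzureKeyCredentialStub:
    """Hypothetical stand-in for azure.core.credentials.AzureKeyCredential."""

    def __init__(self, key: str) -> None:
        self.key = key


class TokenCredentialStub:
    """Hypothetical stand-in for azure-core's TokenCredential protocol."""

    def get_token(self, *scopes: str) -> str:
        return "fake-token"


def generated_client(credential: TokenCredentialStub) -> str:
    # Like the generated code, this only *annotates* TokenCredential; actual
    # authentication would come from a separately supplied policy.
    return type(credential).__name__


def wrapper_client(credential: Union[AzureKeyCredentialStub, TokenCredentialStub]) -> str:
    # Passing `credential` directly makes mypy report an incompatible-type
    # error like the one quoted above. cast() silences only this argument,
    # whereas `# type: ignore` silences every error reported on that line.
    return generated_client(cast(TokenCredentialStub, credential))


print(wrapper_client(AzureKeyCredentialStub("<api_key>")))
```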
@@ -78,7 +80,7 @@ def _receipt_callback(self, raw_response, _, headers): # pylint: disable=unused

@distributed_trace
def begin_recognize_receipts(self, receipt, **kwargs):
# type: (Union[bytes, IO[bytes]], Any) -> LROPoller[List[RecognizedReceipt]]
"""Extract field text and semantic values from a given US sales receipt.
The input document must be of one of the supported content types - 'application/pdf',
'image/jpeg', 'image/png' or 'image/tiff'.
@@ -93,6 +95,7 @@ def begin_recognize_receipts(self, receipt, **kwargs):
see :class:`~azure.ai.formrecognizer.FormContentType`.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.RecognizedReceipt`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.RecognizedReceipt]]
@@ -109,6 +112,7 @@ def begin_recognize_receipts(self, receipt, **kwargs):
"""

polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)
content_type = kwargs.pop("content_type", None)
if content_type == "application/json":
raise TypeError("Call begin_recognize_receipts_from_url() to analyze a receipt from a url.")
@@ -125,12 +129,13 @@ def begin_recognize_receipts(self, receipt, **kwargs):
cls=kwargs.pop("cls", self._receipt_callback),
polling=LROBasePolling(timeout=polling_interval, **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)
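The updated type comments parameterize the poller by what `result()` returns, e.g. `LROPoller[List[RecognizedReceipt]]`. Assuming the poller class is generic over its result type, as those subscripted comments imply, the pattern looks like the sketch below; `TypedPoller`, `Receipt` and `begin_recognize_toy_receipts` are hypothetical stand-ins.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class TypedPoller(Generic[T]):
    """Hypothetical poller parameterized by the type that result() returns."""

    def __init__(self, compute_result: Callable[[], T]) -> None:
        self._compute_result = compute_result

    def result(self) -> T:
        return self._compute_result()


class Receipt:
    """Hypothetical stand-in for RecognizedReceipt."""

    def __init__(self, total: float) -> None:
        self.total = total


def begin_recognize_toy_receipts() -> TypedPoller[List[Receipt]]:
    # With the parameterized return type, a type checker knows that
    # result() yields List[Receipt], so `.total` below is checked.
    return TypedPoller(lambda: [Receipt(12.5), Receipt(3.0)])


receipts = begin_recognize_toy_receipts().result()
print(sum(r.total for r in receipts))
```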

@distributed_trace
def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
# type: (str, Any) -> LROPoller[List[RecognizedReceipt]]
"""Extract field text and semantic values from a given US sales receipt.
The input document must be the location (Url) of the receipt to be analyzed.

@@ -141,6 +146,7 @@ def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
Whether or not to include text elements such as lines and words in addition to form fields.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.RecognizedReceipt`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.RecognizedReceipt]]
@@ -157,6 +163,7 @@ def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
"""

polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)

Contributor: Since you're just popping `continuation_token` and putting it back in, you should just not pop it at all, as the `kwargs` passed into the function call will already have it.

Member Author: I'd like to do that, but in many of the methods I'm controlling the LRO algorithm, so I need to pass in, e.g., `LROBasePolling(**kwargs)` myself into the generated call. I need to pop it here so it doesn't get passed into the polling method.

include_text_content = kwargs.pop("include_text_content", False)

return self._client.begin_analyze_receipt_async(
@@ -165,6 +172,7 @@ def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
cls=kwargs.pop("cls", self._receipt_callback),
polling=LROBasePolling(timeout=polling_interval, **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)
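The review thread above turns on a subtle point: the same `**kwargs` feed both the polling object and the generated call, so anything left in `kwargs` reaches both. The sketch below models that with hypothetical names (`ToyPolling`, `generated_begin`, `begin_recognize`); `ToyPolling` keeps every extra keyword as configuration, roughly as the author's reply describes for `LROBasePolling(**kwargs)`. Popping the token first keeps it out of the polling configuration while still forwarding it explicitly.

```python
from typing import Any, Dict, Optional, Tuple


class ToyPolling:
    """Hypothetical polling object that keeps every extra keyword as config."""

    def __init__(self, timeout: float = 30, **operation_config: Any) -> None:
        self.timeout = timeout
        self.operation_config: Dict[str, Any] = operation_config


def generated_begin(polling: ToyPolling, continuation_token: Optional[str] = None,
                    **kwargs: Any) -> Tuple[ToyPolling, Optional[str]]:
    """Hypothetical generated begin_* call: just reports what it received."""
    return polling, continuation_token


def begin_recognize(**kwargs: Any) -> Tuple[ToyPolling, Optional[str]]:
    polling_interval = kwargs.pop("polling_interval", 5)
    # Pop the token *before* building the polling object so it is not
    # swallowed into ToyPolling's config, then forward it explicitly.
    continuation_token = kwargs.pop("continuation_token", None)
    return generated_begin(
        polling=ToyPolling(timeout=polling_interval, **kwargs),
        continuation_token=continuation_token,
        **kwargs
    )


polling, token = begin_recognize(polling_interval=1, continuation_token="abc", logging_enable=True)
print(token, polling.operation_config)
```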

@@ -174,7 +182,7 @@ def _content_callback(self, raw_response, _, headers): # pylint: disable=unused

@distributed_trace
def begin_recognize_content(self, form, **kwargs):
# type: (Union[bytes, IO[bytes]], Any) -> LROPoller[List[FormPage]]
"""Extract text and content/layout information from a given document.
The input document must be of one of the supported content types - 'application/pdf',
'image/jpeg', 'image/png' or 'image/tiff'.
@@ -186,6 +194,7 @@ def begin_recognize_content(self, form, **kwargs):
see :class:`~azure.ai.formrecognizer.FormContentType`.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.FormPage`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.FormPage]]
@@ -202,6 +211,7 @@ def begin_recognize_content(self, form, **kwargs):
"""

polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)
content_type = kwargs.pop("content_type", None)
if content_type == "application/json":
raise TypeError("Call begin_recognize_content_from_url() to analyze a document from a url.")
@@ -215,38 +225,42 @@ def begin_recognize_content(self, form, **kwargs):
cls=kwargs.pop("cls", self._content_callback),
polling=LROBasePolling(timeout=polling_interval, **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)
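`cls=kwargs.pop("cls", self._content_callback)` lets a caller replace the client's default deserialization hook with their own. The sketch below models that override with a fake generated operation; all names are hypothetical, and the three-argument `(raw_response, deserialized, headers)` call shape is assumed from the callbacks defined in this file rather than taken from the generated code.

```python
from typing import Any, Callable, Dict, List, Optional

Callback = Callable[[Dict[str, Any], Any, Dict[str, str]], Any]


def generated_analyze(cls: Optional[Callback] = None) -> Any:
    """Hypothetical generated operation that honors an optional `cls` hook."""
    raw_response = {"analyzeResult": {"pages": [1, 2]}}             # fake wire payload
    deserialized = raw_response["analyzeResult"]                      # fake default model
    headers = {"Operation-Location": "https://example.invalid/op/1"}  # fake headers
    if cls is not None:
        return cls(raw_response, deserialized, headers)
    return deserialized


def content_callback(raw_response: Dict[str, Any], _: Any, headers: Dict[str, str]) -> List[str]:
    # Same (raw_response, _, headers) shape as _content_callback in the diff.
    return ["page {}".format(n) for n in raw_response["analyzeResult"]["pages"]]


def begin_recognize_content(**kwargs: Any) -> Any:
    # A caller-supplied cls wins; otherwise fall back to the client's own callback.
    return generated_analyze(cls=kwargs.pop("cls", content_callback))


print(begin_recognize_content())
print(begin_recognize_content(cls=lambda raw, model, headers: headers["Operation-Location"]))
```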

@distributed_trace
def begin_recognize_content_from_url(self, form_url, **kwargs):
# type: (str, Any) -> LROPoller[List[FormPage]]
"""Extract text and layout information from a given document.
The input document must be the location (Url) of the document to be analyzed.

:param str form_url: The url of the form to analyze. The input must be a valid, encoded url
of one of the supported formats: JPEG, PNG, PDF and TIFF.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.FormPage`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.FormPage]]
:raises ~azure.core.exceptions.HttpResponseError:
"""

polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)

return self._client.begin_analyze_layout_async(
file_stream={"source": form_url},
cls=kwargs.pop("cls", self._content_callback),
polling=LROBasePolling(timeout=polling_interval, **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)

@distributed_trace
def begin_recognize_custom_forms(self, model_id, form, **kwargs):
# type: (str, Union[bytes, IO[bytes]], Any) -> LROPoller[List[RecognizedForm]]
"""Analyze a custom form with a model trained with or without labels. The form
to analyze should be of the same type as the forms that were used to train the model.
The input document must be of one of the supported content types - 'application/pdf',
@@ -262,6 +276,7 @@ def begin_recognize_custom_forms(self, model_id, form, **kwargs):
see :class:`~azure.ai.formrecognizer.FormContentType`.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.RecognizedForm`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.RecognizedForm]]
@@ -282,6 +297,7 @@ def begin_recognize_custom_forms(self, model_id, form, **kwargs):

cls = kwargs.pop("cls", None)
polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)
content_type = kwargs.pop("content_type", None)
if content_type == "application/json":
raise TypeError("Call begin_recognize_custom_forms_from_url() to analyze a document from a url.")
@@ -303,12 +319,13 @@ def analyze_callback(raw_response, _, headers): # pylint: disable=unused-argume
cls=deserialization_callback,
polling=LROBasePolling(timeout=polling_interval, lro_algorithms=[AnalyzePolling()], **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)
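Passing `lro_algorithms=[AnalyzePolling()]` is how these methods control the polling strategy. As we understand azure-core's `LROBasePolling`, it walks its algorithm list and uses the first algorithm whose `can_poll()` accepts the initial response, so supplying an explicit list replaces the default choice. The toy below models only that selection step. What `AnalyzePolling` does internally is not shown in this diff, and the header-based algorithms and the "default" list here are hypothetical.

```python
from typing import Any, Dict, Sequence

Response = Dict[str, Dict[str, Any]]  # {"headers": {...}, "body": {...}}


class HeaderPollingAlgorithm:
    """Hypothetical algorithm: poll the URL found in one specific header."""

    def __init__(self, header: str) -> None:
        self.header = header

    def can_poll(self, response: Response) -> bool:
        return self.header in response["headers"]

    def polling_url(self, response: Response) -> str:
        return response["headers"][self.header]


def select_algorithm(algorithms: Sequence[HeaderPollingAlgorithm], response: Response) -> HeaderPollingAlgorithm:
    # The first algorithm that accepts the initial response is used, so
    # passing an explicit list replaces the default choice entirely.
    for algorithm in algorithms:
        if algorithm.can_poll(response):
            return algorithm
    raise ValueError("no polling algorithm can handle this response")


initial = {
    "headers": {
        "Location": "https://example.invalid/location",
        "Operation-Location": "https://example.invalid/operations/7",
    },
    "body": {},
}
default_algorithms = [HeaderPollingAlgorithm("Location"), HeaderPollingAlgorithm("Operation-Location")]
custom_algorithms = [HeaderPollingAlgorithm("Operation-Location")]

print(select_algorithm(default_algorithms, initial).polling_url(initial))
print(select_algorithm(custom_algorithms, initial).polling_url(initial))
```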

@distributed_trace
def begin_recognize_custom_forms_from_url(self, model_id, form_url, **kwargs):
# type: (str, str, Any) -> LROPoller[List[RecognizedForm]]
"""Analyze a custom form with a model trained with or without labels. The form
to analyze should be of the same type as the forms that were used to train the model.
The input document must be the location (Url) of the document to be analyzed.
@@ -320,6 +337,7 @@ def begin_recognize_custom_forms_from_url(self, model_id, form_url, **kwargs):
Whether or not to include text elements such as lines and words in addition to form fields.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.RecognizedForm`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.RecognizedForm]]
@@ -331,6 +349,7 @@ def begin_recognize_custom_forms_from_url(self, model_id, form_url, **kwargs):

cls = kwargs.pop("cls", None)
polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)
include_text_content = kwargs.pop("include_text_content", False)

def analyze_callback(raw_response, _, headers): # pylint: disable=unused-argument
@@ -345,6 +364,7 @@ def analyze_callback(raw_response, _, headers): # pylint: disable=unused-argume
cls=deserialization_callback,
polling=LROBasePolling(timeout=polling_interval, lro_algorithms=[AnalyzePolling()], **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)
