[formrecognizer] adds AsyncLROPoller and continuation token support (#11650)

* regenerate code

* async poller and continuation token changes

* update tests

* update samples

* update shared requirements

* updates after cimultidict change in azure-core

* update readme/changelog

* mypy

* one more update on readme

* try with azure-core whitelisted context

* revert dev reqs

* update changelog with new dependency on azure-core 1.6.0

* forgot to apply changes

* update type hints

* fix for tests

* fix type hints, changelog; delete recording for test that doesn't exist
kristapratico authored Jun 3, 2020
1 parent 3094344 commit a9de6f0
Showing 47 changed files with 1,245 additions and 525 deletions.
36 changes: 25 additions & 11 deletions sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md
@@ -4,34 +4,48 @@

**Breaking Changes**

- `training_files` parameter of `begin_train_model` is renamed to `training_files_url`
- `use_labels` parameter of `begin_train_model` is renamed to `use_training_labels`
- All asynchronous long running operation methods now return an instance of an `AsyncLROPoller` from `azure-core`
- All asynchronous long running operation methods are renamed with the `begin_` prefix to indicate that an `AsyncLROPoller` is returned:
- `train_model` is renamed to `begin_training`
- `recognize_receipts` is renamed to `begin_recognize_receipts`
- `recognize_receipts_from_url` is renamed to `begin_recognize_receipts_from_url`
- `recognize_content` is renamed to `begin_recognize_content`
- `recognize_content_from_url` is renamed to `begin_recognize_content_from_url`
- `recognize_custom_forms` is renamed to `begin_recognize_custom_forms`
- `recognize_custom_forms_from_url` is renamed to `begin_recognize_custom_forms_from_url`
- Sync method `begin_train_model` renamed to `begin_training`
- `training_files` parameter of `begin_training` is renamed to `training_files_url`
- `use_labels` parameter of `begin_training` is renamed to `use_training_labels`
- `list_model_infos` method has been renamed to `list_custom_models`
- Removed `get_form_training_client` from `FormRecognizerClient`
- Added `get_form_recognizer_client` to `FormTrainingClient`
- A `HttpResponseError` is now raised if a model with `status=="invalid"` is returned from the `begin_train_model()` or `train_model()` methods
- A `HttpResponseError` is now raised if a model with `status=="invalid"` is returned from the `begin_training` methods
- `PageRange` is renamed to `FormPageRange`
- `first_page` and `last_page` renamed to `first_page_number` and `last_page_number`, respectively on `FormPageRange`
- `FormField` does not have a page_number.
- `begin_recognize_receipts` APIs now return `RecognizedReceipt` instead of `USReceipt`
- `USReceiptType` is renamed to `ReceiptType`
- `use_training_labels` is now a required positional param in the `begin_training` APIs.
- `stream` and `url` parameters found on methods for `FormRecognizerClient` have been renamed to `form` and `form_url`, respectively.
- For recognize receipt methods, parameters have been renamed to `receipt` and `receipt_url`.
- `FormField` does not have a page_number
- `use_training_labels` is now a required positional param in the `begin_training` APIs
- `stream` and `url` parameters found on methods for `FormRecognizerClient` have been renamed to `form` and `form_url`, respectively
- For `begin_recognize_receipts` methods, parameters have been renamed to `receipt` and `receipt_url`
- `created_on` and `last_modified` are renamed to `requested_on` and `completed_on` in the
`CustomFormModel` and `CustomFormModelInfo` models.
`CustomFormModel` and `CustomFormModelInfo` models
- `models` property of `CustomFormModel` is renamed to `submodels`
- `CustomFormSubModel` is renamed to `CustomFormSubmodel`
- `begin_recognize_receipts` APIs now return `RecognizedReceipt` instead of `USReceipt`
- Removed `USReceipt`. To see how to deal with the return value of `begin_recognize_receipts`, see the recognize receipt samples in the [samples directory](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/samples) for details.
- Removed `USReceiptItem`. To see how to access the individual items on a receipt, see the recognize receipt samples in the [samples directory](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/samples) for details.
- Removed `ReceiptType` and the `receipt_type` property from `RecognizedReceipt`. See the recognize receipt samples in the [samples directory](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/samples) for details.
- Removed `USReceiptType` and the `receipt_type` property from `RecognizedReceipt`. See the recognize receipt samples in the [samples directory](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/samples) for details.

**New features**

- Support to copy a custom model from one Form Recognizer resource to another
- Authentication using `azure-identity` credentials now supported
- see the [Azure Identity documentation](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/identity/azure-identity/README.md) for more information
- `page_number` attribute has been added to `FormTable`
- All long running operation methods now accept the keyword argument `continuation_token` to restart the poller from a saved state

**Dependency updates**

- Adopted [azure-core](https://pypi.org/project/azure-core/) version 1.6.0 or greater

## 1.0.0b2 (2020-05-06)

10 changes: 5 additions & 5 deletions sdk/formrecognizer/azure-ai-formrecognizer/README.md
@@ -140,10 +140,10 @@ Long-running operations are operations which consist of an initial request sent
followed by polling the service at intervals to determine whether the operation has completed or failed, and if it has
succeeded, to get the result.

Methods that train models or recognize values from forms are modeled as long-running operations. The client exposes
a `begin_<method-name>` method that returns an `LROPoller`. Callers should wait for the operation to complete by
calling `result()` on the operation returned from the `begin_<method-name>` method. Sample code snippets are provided
to illustrate using long-running operations [below](#examples "Examples").
Methods that train models, recognize values from forms, or copy models are modeled as long-running operations.
The client exposes a `begin_<method-name>` method that returns an `LROPoller` or `AsyncLROPoller`. Callers should wait
for the operation to complete by calling `result()` on the operation returned from the `begin_<method-name>` method.
Sample code snippets are provided to illustrate using long-running operations [below](#examples "Examples").
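
The wait-then-resume flow described above can be sketched without the service. This is a minimal illustration only: `SketchPoller` and `begin_sketch_operation` are hypothetical stand-ins written for this example, not the `azure-core` pollers or any client method, and the token format (base64-encoded JSON) is an assumption chosen for readability.

```python
import base64
import json


class SketchPoller:
    """Hypothetical stand-in for a poller; not the azure-core implementation."""

    def __init__(self, operation_id, polls_remaining):
        self._operation_id = operation_id
        self._polls_remaining = polls_remaining

    def continuation_token(self):
        # Serialize just enough state to rebuild this poller later.
        state = {
            "operation_id": self._operation_id,
            "polls_remaining": self._polls_remaining,
        }
        return base64.b64encode(json.dumps(state).encode("utf-8")).decode("ascii")

    def poll_once(self):
        # Simulate one status request to the service.
        if self._polls_remaining > 0:
            self._polls_remaining -= 1

    def done(self):
        return self._polls_remaining == 0

    def result(self):
        # Block until the simulated operation has finished, then return its result.
        while not self.done():
            self.poll_once()
        return "result of {}".format(self._operation_id)


def begin_sketch_operation(operation_id=None, **kwargs):
    """Start a new operation, or rebuild a poller from a saved token."""
    continuation_token = kwargs.pop("continuation_token", None)
    if continuation_token is not None:
        state = json.loads(base64.b64decode(continuation_token).decode("utf-8"))
        return SketchPoller(state["operation_id"], state["polls_remaining"])
    return SketchPoller(operation_id, polls_remaining=3)


poller = begin_sketch_operation("op-1")
poller.poll_once()
token = poller.continuation_token()  # save this string anywhere durable

resumed = begin_sketch_operation(continuation_token=token)
final = resumed.result()
```

The point of the sketch is that the token carries the operation's identity, so the resumed poller does not need the original input again.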


## Examples
@@ -254,7 +254,7 @@ credential = AzureKeyCredential("<api_key>")
form_training_client = FormTrainingClient(endpoint, credential)

container_sas_url = "xxx" # training documents uploaded to blob storage
poller = form_training_client.begin_train_model(container_sas_url)
poller = form_training_client.begin_training(container_sas_url)
model = poller.result()

# Custom model information
@@ -10,6 +10,7 @@
Any,
IO,
Union,
List,
TYPE_CHECKING
)
from azure.core.tracing.decorator import distributed_trace
@@ -27,6 +28,7 @@
from ._polling import AnalyzePolling
if TYPE_CHECKING:
from azure.core.credentials import AzureKeyCredential, TokenCredential
from ._models import RecognizedReceipt, FormPage, RecognizedForm


class FormRecognizerClient(object):
@@ -66,7 +68,7 @@ def __init__(self, endpoint, credential, **kwargs):
authentication_policy = get_authentication_policy(credential)
self._client = FormRecognizer(
endpoint=endpoint,
credential=credential,
credential=credential, # type: ignore
sdk_moniker=USER_AGENT,
authentication_policy=authentication_policy,
**kwargs
@@ -78,7 +80,7 @@ def _receipt_callback(self, raw_response, _, headers): # pylint: disable=unused

@distributed_trace
def begin_recognize_receipts(self, receipt, **kwargs):
# type: (Union[bytes, IO[bytes]], Any) -> LROPoller
# type: (Union[bytes, IO[bytes]], Any) -> LROPoller[List[RecognizedReceipt]]
"""Extract field text and semantic values from a given US sales receipt.
The input document must be of one of the supported content types - 'application/pdf',
'image/jpeg', 'image/png' or 'image/tiff'.
@@ -93,6 +95,7 @@ def begin_recognize_receipts(self, receipt, **kwargs):
see :class:`~azure.ai.formrecognizer.FormContentType`.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.RecognizedReceipt`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.RecognizedReceipt]]
@@ -109,6 +112,7 @@ def begin_recognize_receipts(self, receipt, **kwargs):
"""

polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)
content_type = kwargs.pop("content_type", None)
if content_type == "application/json":
raise TypeError("Call begin_recognize_receipts_from_url() to analyze a receipt from a url.")
@@ -125,12 +129,13 @@ def begin_recognize_receipts(self, receipt, **kwargs):
cls=kwargs.pop("cls", self._receipt_callback),
polling=LROBasePolling(timeout=polling_interval, **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)

@distributed_trace
def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
# type: (str, Any) -> LROPoller
# type: (str, Any) -> LROPoller[List[RecognizedReceipt]]
"""Extract field text and semantic values from a given US sales receipt.
The input document must be the location (Url) of the receipt to be analyzed.
@@ -141,6 +146,7 @@ def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
Whether or not to include text elements such as lines and words in addition to form fields.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.RecognizedReceipt`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.RecognizedReceipt]]
@@ -157,6 +163,7 @@ def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
"""

polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)
include_text_content = kwargs.pop("include_text_content", False)

return self._client.begin_analyze_receipt_async(
@@ -165,6 +172,7 @@ def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
cls=kwargs.pop("cls", self._receipt_callback),
polling=LROBasePolling(timeout=polling_interval, **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)

@@ -174,7 +182,7 @@ def _content_callback(self, raw_response, _, headers): # pylint: disable=unused

@distributed_trace
def begin_recognize_content(self, form, **kwargs):
# type: (Union[bytes, IO[bytes]], Any) -> LROPoller
# type: (Union[bytes, IO[bytes]], Any) -> LROPoller[List[FormPage]]
"""Extract text and content/layout information from a given document.
The input document must be of one of the supported content types - 'application/pdf',
'image/jpeg', 'image/png' or 'image/tiff'.
@@ -186,6 +194,7 @@ def begin_recognize_content(self, form, **kwargs):
see :class:`~azure.ai.formrecognizer.FormContentType`.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.FormPage`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.FormPage]]
@@ -202,6 +211,7 @@ def begin_recognize_content(self, form, **kwargs):
"""

polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)
content_type = kwargs.pop("content_type", None)
if content_type == "application/json":
raise TypeError("Call begin_recognize_content_from_url() to analyze a document from a url.")
@@ -215,38 +225,42 @@ def begin_recognize_content(self, form, **kwargs):
cls=kwargs.pop("cls", self._content_callback),
polling=LROBasePolling(timeout=polling_interval, **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)

@distributed_trace
def begin_recognize_content_from_url(self, form_url, **kwargs):
# type: (str, Any) -> LROPoller
# type: (str, Any) -> LROPoller[List[FormPage]]
"""Extract text and layout information from a given document.
The input document must be the location (Url) of the document to be analyzed.
:param str form_url: The url of the form to analyze. The input must be a valid, encoded url
of one of the supported formats: JPEG, PNG, PDF and TIFF.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.FormPage`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.FormPage]]
:raises ~azure.core.exceptions.HttpResponseError:
"""

polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)

return self._client.begin_analyze_layout_async(
file_stream={"source": form_url},
cls=kwargs.pop("cls", self._content_callback),
polling=LROBasePolling(timeout=polling_interval, **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)

@distributed_trace
def begin_recognize_custom_forms(self, model_id, form, **kwargs):
# type: (str, Union[bytes, IO[bytes]], Any) -> LROPoller
# type: (str, Union[bytes, IO[bytes]], Any) -> LROPoller[List[RecognizedForm]]
"""Analyze a custom form with a model trained with or without labels. The form
to analyze should be of the same type as the forms that were used to train the model.
The input document must be of one of the supported content types - 'application/pdf',
@@ -262,6 +276,7 @@ def begin_recognize_custom_forms(self, model_id, form, **kwargs):
see :class:`~azure.ai.formrecognizer.FormContentType`.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.RecognizedForm`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.RecognizedForm]]
@@ -282,6 +297,7 @@ def begin_recognize_custom_forms(self, model_id, form, **kwargs):

cls = kwargs.pop("cls", None)
polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)
content_type = kwargs.pop("content_type", None)
if content_type == "application/json":
raise TypeError("Call begin_recognize_custom_forms_from_url() to analyze a document from a url.")
@@ -303,12 +319,13 @@ def analyze_callback(raw_response, _, headers): # pylint: disable=unused-argume
cls=deserialization_callback,
polling=LROBasePolling(timeout=polling_interval, lro_algorithms=[AnalyzePolling()], **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)

@distributed_trace
def begin_recognize_custom_forms_from_url(self, model_id, form_url, **kwargs):
# type: (str, str, Any) -> LROPoller
# type: (str, str, Any) -> LROPoller[List[RecognizedForm]]
"""Analyze a custom form with a model trained with or without labels. The form
to analyze should be of the same type as the forms that were used to train the model.
The input document must be the location (Url) of the document to be analyzed.
@@ -320,6 +337,7 @@ def begin_recognize_custom_forms_from_url(self, model_id, form_url, **kwargs):
Whether or not to include text elements such as lines and words in addition to form fields.
:keyword int polling_interval: Waiting time between two polls for LRO operations
if no Retry-After header is present. Defaults to 5 seconds.
:keyword str continuation_token: A continuation token to restart a poller from a saved state.
:return: An instance of an LROPoller. Call `result()` on the poller
object to return a list[:class:`~azure.ai.formrecognizer.RecognizedForm`].
:rtype: ~azure.core.polling.LROPoller[list[~azure.ai.formrecognizer.RecognizedForm]]
@@ -331,6 +349,7 @@ def begin_recognize_custom_forms_from_url(self, model_id, form_url, **kwargs):

cls = kwargs.pop("cls", None)
polling_interval = kwargs.pop("polling_interval", POLLING_INTERVAL)
continuation_token = kwargs.pop("continuation_token", None)
include_text_content = kwargs.pop("include_text_content", False)

def analyze_callback(raw_response, _, headers): # pylint: disable=unused-argument
@@ -345,6 +364,7 @@ def analyze_callback(raw_response, _, headers): # pylint: disable=unused-argume
cls=deserialization_callback,
polling=LROBasePolling(timeout=polling_interval, lro_algorithms=[AnalyzePolling()], **kwargs),
error_map=error_map,
continuation_token=continuation_token,
**kwargs
)
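
Each method in this diff follows the same keyword-handling pattern: options the client layer consumes itself are popped from `kwargs` before the remainder is forwarded to the generated client. The sketch below isolates that pattern. `begin_sketch_call` is a hypothetical function for illustration only, it returns a plain dict instead of calling a service, and the default of 5 simply mirrors the docstring's "Defaults to 5 seconds".

```python
def begin_sketch_call(document, **kwargs):
    """Hypothetical wrapper showing the pop-then-forward keyword pattern."""
    # Pop options handled at this layer so they are not forwarded a second time.
    polling_interval = kwargs.pop("polling_interval", 5)
    continuation_token = kwargs.pop("continuation_token", None)
    content_type = kwargs.pop("content_type", None)

    # Reject the content type reserved for the URL-based variant.
    if content_type == "application/json":
        raise TypeError("Call the *_from_url() variant to analyze a document from a url.")

    # Whatever is left in kwargs is what a lower layer would receive.
    return {
        "document": document,
        "polling_interval": polling_interval,
        "continuation_token": continuation_token,
        "forwarded": kwargs,
    }


call = begin_sketch_call(b"%PDF", polling_interval=2, logging_enable=True)

try:
    begin_sketch_call(b"{}", content_type="application/json")
    rejected = False
except TypeError:
    rejected = True
```

Using `pop` rather than `get` matters here: a key left in `kwargs` would be passed along with the explicit argument of the same name, which Python rejects as a duplicate keyword.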
