[formrecognizer] v2.1 #15448

Merged
merged 18 commits into from
Nov 23, 2020
c4713e3
[formrecognizer] 2.1-preview.2 gen and impl (#14929)
kristapratico Nov 3, 2020
c57116a
[formrecognizer] add copy tests for new features in v2.1 (#14987)
kristapratico Nov 4, 2020
23acc78
[formrecognizer] adds language param (#14984)
kristapratico Nov 5, 2020
5bf3add
[form recognizer] add tests for invoice multipage (#15012)
iscai-msft Nov 5, 2020
e4b36e2
[formrecognizer] fix sphinx errors and unify transform testcase for F…
kristapratico Nov 6, 2020
79964f2
[formrecognizer] allow None for required params when passing a contin…
kristapratico Nov 6, 2020
3e4959a
receipt test - total value fixed (#15159)
kristapratico Nov 9, 2020
78ab27c
add python 3.9 to setup classifiers (#15208)
kristapratico Nov 11, 2020
a7e0adf
[formrecognizer] adds tests for forms in other languages (#15173)
kristapratico Nov 11, 2020
04d33d2
[formrecognizer] unskip receipt/business card tests and re-record wit…
kristapratico Nov 11, 2020
6be8888
[formrecognizer] Remove business card ContactNames page_number workar…
kristapratico Nov 12, 2020
ec6443e
uncomment receipt test assertions - regression fixed (#15345)
kristapratico Nov 16, 2020
fc58de3
[formrecognizer] doc updates (#15346)
kristapratico Nov 17, 2020
fd8869e
[formrecognizer] try using existing resource in live tests (#15483)
kristapratico Nov 20, 2020
bd9d657
fix env var name (#15486)
kristapratico Nov 20, 2020
aed68f3
[formrecognizer] updates to test in prod (#15491)
kristapratico Nov 20, 2020
176de7d
update timeout for live runs for FR (#15492)
kristapratico Nov 21, 2020
d97833e
remove sample tests - will be run in smoke tests (#15494)
kristapratico Nov 22, 2020
11 changes: 10 additions & 1 deletion sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,9 @@ This version of the SDK defaults to the latest supported API version, which curr
**New features**

- New methods `begin_recognize_business_cards` and `begin_recognize_business_cards_from_url` introduced to the SDK. Use these
methods to recognize data from business cards.
methods to recognize data from business cards
- New methods `begin_recognize_invoices` and `begin_recognize_invoices_from_url` introduced to the SDK. Use these
methods to recognize data from invoices
- Recognize receipt methods now take keyword argument `locale` to optionally indicate the locale of the receipt for
improved results
- Added ability to create a composed model from the `FormTrainingClient` by calling method `begin_create_composed_model()`
Expand All @@ -21,6 +23,13 @@ also be populated with any selection marks found on the page
- Added model type `CustomFormModelProperties` that includes information like if a model is a composed model
- Added property `model_id` to `CustomFormSubmodel` and `TrainingDocumentInfo`
- Added properties `model_id` and `form_type_confidence` to `RecognizedForm`
- `appearance` property added to `FormLine` to indicate the style of extracted text - like "handwriting" or "other"
- Added keyword argument `pages` to `begin_recognize_content` and `begin_recognize_content_from_url` to specify the page
numbers to analyze
- Added property `bounding_box` to `FormTable`
- Content-type `image/bmp` now supported by recognize content and prebuilt models
- Added keyword argument `language` to `begin_recognize_content` and `begin_recognize_content_from_url` to specify
which language to process document in
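
As a rough illustration of the new `pages` and `language` keyword arguments on `begin_recognize_content`, the sketch below forwards each keyword only when the caller supplies it. The helper name `recognize_content_subset` is hypothetical and not part of the SDK, and the example page ranges and language code in the comments are assumptions. The client is duck-typed, so any object exposing `begin_recognize_content` works.

```python
def recognize_content_subset(client, document, pages=None, language=None):
    """Call begin_recognize_content, passing optional keywords only when set.

    `client` is expected to behave like a FormRecognizerClient. This helper
    is a hypothetical wrapper for illustration, not an SDK function.
    """
    kwargs = {}
    if pages is not None:
        kwargs["pages"] = list(pages)  # e.g. ["1", "3-5"] (assumed format)
    if language is not None:
        kwargs["language"] = language  # e.g. "en" (assumed value)
    poller = client.begin_recognize_content(document, **kwargs)
    return poller.result()  # blocks until the long-running operation finishes
```

Omitting both keywords leaves the call identical to the pre-2.1 form, so existing callers would not need to change.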

**Dependency updates**

Expand Down
49 changes: 42 additions & 7 deletions sdk/formrecognizer/azure-ai-formrecognizer/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,10 @@ from form documents. It includes the following main functionalities:

* Custom models - Recognize field values and table data from forms. These models are trained with your own data, so they're tailored to your forms.
* Content API - Recognize text, table structures, and selection marks, along with their bounding box coordinates, from documents. Corresponds to the REST service's Layout API.
* Prebuilt receipt model - Recognize data from sales receipts using a prebuilt model.
* Prebuilt business card model - Recognize data from business cards using a prebuilt model.
* Prebuilt models - Recognize data using the following prebuilt models
* Receipt model - Recognize data from sales receipts using a prebuilt model.
* Business card model - Recognize data from business cards using a prebuilt model.
* Invoice model - Recognize data from invoices using a prebuilt model.

[Source code][python-fr-src] | [Package (PyPI)][python-fr-pypi] | [API reference documentation][python-fr-ref-docs] | [Product documentation][python-fr-product-docs] | [Samples][python-fr-samples]

Expand Down Expand Up @@ -132,8 +134,10 @@ form_recognizer_client = FormRecognizerClient(
`FormRecognizerClient` provides operations for:

- Recognizing form fields and content using custom models trained to recognize your custom forms. These values are returned in a collection of `RecognizedForm` objects.
- Recognizing common fields from sales receipts, using a pre-trained receipt model. These fields and metadata are returned in a collection of `RecognizedForm` objects.
- Recognizing common fields from business cards, using a pre-trained business card model. These fields and metadata are returned in a collection of `RecognizedForm` objects.
- Recognizing common fields from the following form types using prebuilt models. These fields and metadata are returned in a collection of `RecognizedForm` objects.
- Sales receipts. See fields found on a receipt [here][service_recognize_receipt].
- Business cards. See fields found on a business card [here][service_recognize_business_cards].
- Invoices. See fields found on an invoice [here][service_recognize_invoice].
- Recognizing form content, including tables, lines, words, and selection marks, without the need to train a model. Form content is returned in a collection of `FormPage` objects.

Sample code snippets are provided to illustrate using a FormRecognizerClient [here](#recognize-forms-using-a-custom-model "Recognize Forms Using a Custom Model").
Expand All @@ -156,7 +160,7 @@ Long-running operations are operations which consist of an initial request sent
followed by polling the service at intervals to determine whether the operation has completed or failed, and if it has
succeeded, to get the result.

Methods that train models, recognize values from forms, or copy models are modeled as long-running operations.
Methods that train models, recognize values from forms, or copy/compose models are modeled as long-running operations.
The client exposes a `begin_<method-name>` method that returns an `LROPoller` or `AsyncLROPoller`. Callers should wait
for the operation to complete by calling `result()` on the poller object returned from the `begin_<method-name>` method.
Sample code snippets are provided to illustrate using long-running operations [below](#examples "Examples").
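
The polling pattern described above can be sketched in plain Python. This is a toy model only: `SketchPoller` and `begin_fake_operation` are hypothetical names, the status strings are assumptions, and none of it reflects how `LROPoller` is implemented in `azure-core`. It only shows the shape of "start the operation, get a poller back, block on `result()`".

```python
import time


class SketchPoller:
    """Toy stand-in for a poller; NOT azure.core's LROPoller."""

    def __init__(self, check_status, interval=0.0):
        self._check_status = check_status  # callable returning (status, payload)
        self._interval = interval          # seconds to wait between polls

    def result(self):
        # Keep polling until the operation reports success or failure.
        while True:
            status, payload = self._check_status()
            if status == "succeeded":
                return payload
            if status == "failed":
                raise RuntimeError("operation failed: {}".format(payload))
            time.sleep(self._interval)


def begin_fake_operation(statuses, interval=0.0):
    """Hypothetical begin_<method-name>: returns a poller immediately."""
    remaining = iter(statuses)
    return SketchPoller(lambda: next(remaining), interval)


poller = begin_fake_operation(
    [("running", None), ("running", None), ("succeeded", {"fields": 3})]
)
print(poller.result())
```

The caller never sees the intermediate "running" states; `result()` hides the polling loop and either returns the final payload or raises.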
Expand All @@ -170,6 +174,7 @@ The following section provides several code snippets covering some of the most c
* [Recognize Content](#recognize-content "Recognize Content")
* [Recognize Receipts](#recognize-receipts "Recognize receipts")
* [Recognize Business Cards](#recognize-business-cards "Recognize business cards")
* [Recognize Invoices](#recognize-invoices "Recognize invoices")
* [Train a Model](#train-a-model "Train a model")
* [Manage Your Models](#manage-your-models "Manage Your Models")

Expand Down Expand Up @@ -216,7 +221,7 @@ result = poller.result()
```

### Recognize Content
Recognize text and table structures, along with their bounding box coordinates, from documents.
Recognize text, selection marks, and table structures, along with their bounding box coordinates, from documents.

```python
from azure.ai.formrecognizer import FormRecognizerClient
Expand All @@ -235,6 +240,7 @@ page = poller.result()

table = page[0].tables[0] # page 1, table 1
print("Table found on page {}:".format(table.page_number))
print("Table location {}:".format(table.bounding_box))
for cell in table.cells:
    print("Cell text: {}".format(cell.text))
    print("Location: {}".format(cell.bounding_box))
Expand Down Expand Up @@ -309,6 +315,30 @@ for business_card in result:
print("{}: {} has confidence {}".format(item.name, item.value, item.confidence))
```

### Recognize Invoices
Recognize data from invoices using a prebuilt model. Invoice fields recognized by the service can be found [here][service_recognize_invoice].

```python
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://<region>.api.cognitive.microsoft.com/"
credential = AzureKeyCredential("<api_key>")

form_recognizer_client = FormRecognizerClient(endpoint, credential)

with open("<path to your invoice>", "rb") as fd:
    invoice = fd.read()

poller = form_recognizer_client.begin_recognize_invoices(invoice)
result = poller.result()

for invoice in result:
    for name, field in invoice.fields.items():
        print("{}: {} has confidence {}".format(name, field.value, field.confidence))
```
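
Since every recognized field carries a `confidence`, one possible follow-up is to keep only the fields the service is reasonably sure about. The sketch below is an assumption-laden illustration: `confident_fields` is a hypothetical helper, the `0.8` threshold is arbitrary, and the sample data uses `SimpleNamespace` stand-ins with made-up values rather than real `FormField` objects.

```python
from types import SimpleNamespace


def confident_fields(fields, threshold=0.8):
    """Return {name: value} for fields whose confidence meets the threshold.

    `fields` is any mapping shaped like `invoice.fields`: name -> object with
    `.value` and `.confidence`. Fields whose confidence is None are skipped.
    """
    selected = {}
    for name, field in fields.items():
        if field.confidence is not None and field.confidence >= threshold:
            selected[name] = field.value
    return selected


# Stand-in data shaped like `invoice.fields`; names and values are illustrative.
sample = {
    "VendorName": SimpleNamespace(value="Contoso", confidence=0.95),
    "InvoiceTotal": SimpleNamespace(value=110.0, confidence=0.42),
}
print(confident_fields(sample))
```

With a real result, the same helper could be called as `confident_fields(invoice.fields)` inside the `for invoice in result:` loop.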


### Train a model
Train a custom model on your own form type. The resulting model can be used to recognize values from the types of forms it was trained on.
Provide a container SAS URL to your Azure Storage Blob container where you're storing the training documents.
Expand Down Expand Up @@ -439,13 +469,14 @@ These code samples show common scenario operations with the Azure Form Recognize
* Recognize receipts: [sample_recognize_receipts.py][sample_recognize_receipts]
* Recognize receipts from a URL: [sample_recognize_receipts_from_url.py][sample_recognize_receipts_from_url]
* Recognize business cards: [sample_recognize_business_cards.py][sample_recognize_business_cards]
* Recognize invoices: [sample_recognize_invoices.py][sample_recognize_invoices]
* Recognize content: [sample_recognize_content.py][sample_recognize_content]
* Recognize custom forms: [sample_recognize_custom_forms.py][sample_recognize_custom_forms]
* Train a model without labels: [sample_train_model_without_labels.py][sample_train_model_without_labels]
* Train a model with labels: [sample_train_model_with_labels.py][sample_train_model_with_labels]
* Manage custom models: [sample_manage_custom_models.py][sample_manage_custom_models]
* Copy a model between Form Recognizer resources: [sample_copy_model.py][sample_copy_model]
* Create a composed model from a collection of models trained with labels: |[sample_create_composed_model.py][sample_create_composed_model]
* Create a composed model from a collection of models trained with labels: [sample_create_composed_model.py][sample_create_composed_model]

### Async APIs
This library also includes a complete async API supported on Python 3.5+. To use it, you must
Expand All @@ -456,6 +487,7 @@ are found under the `azure.ai.formrecognizer.aio` namespace.
* Recognize receipts: [sample_recognize_receipts_async.py][sample_recognize_receipts_async]
* Recognize receipts from a URL: [sample_recognize_receipts_from_url_async.py][sample_recognize_receipts_from_url_async]
* Recognize business cards: [sample_recognize_business_cards_async.py][sample_recognize_business_cards_async]
* Recognize invoices: [sample_recognize_invoices_async.py][sample_recognize_invoices_async]
* Recognize content: [sample_recognize_content_async.py][sample_recognize_content_async]
* Recognize custom forms: [sample_recognize_custom_forms_async.py][sample_recognize_custom_forms_async]
* Train a model without labels: [sample_train_model_without_labels_async.py][sample_train_model_without_labels_async]
Expand Down Expand Up @@ -510,6 +542,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
[default_azure_credential]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/identity/azure-identity#defaultazurecredential
[service_recognize_receipt]: https://aka.ms/formrecognizer/receiptfields
[service_recognize_business_cards]: https://aka.ms/formrecognizer/businesscardfields
[service_recognize_invoice]: https://aka.ms/formrecognizer/invoicefields
[sdk_logging_docs]: https://docs.microsoft.com/azure/developer/python/azure-sdk-logging

[cla]: https://cla.microsoft.com
Expand All @@ -531,6 +564,8 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
[sample_recognize_receipts_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_recognize_receipts_async.py
[sample_recognize_business_cards]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_recognize_business_cards.py
[sample_recognize_business_cards_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_recognize_business_cards_async.py
[sample_recognize_invoices]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_recognize_invoices.py
[sample_recognize_invoices_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_recognize_invoices_async.py
[sample_train_model_with_labels]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_train_model_with_labels.py
[sample_train_model_with_labels_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_train_model_with_labels_async.py
[sample_train_model_without_labels]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_train_model_without_labels.py
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,12 @@
from ._form_recognizer_client import FormRecognizerClient
from ._form_training_client import FormTrainingClient

from ._generated.v2_1_preview_2.models import (
    Appearance,
    Style
)


from ._models import (
FormElement,
LengthUnit,
Expand Down Expand Up @@ -67,6 +73,8 @@
'FieldValueType',
'CustomFormModelProperties',
'FormSelectionMark',
'Appearance',
'Style'
]

__VERSION__ = VERSION
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@
class FormRecognizerApiVersion(str, Enum):
    """Form Recognizer API versions supported by this package"""

    #: this is the default version
    V2_1_PREVIEW = "2.1-preview.1"
    #: This is the default version
    V2_1_PREVIEW = "2.1-preview.2"
    V2_0 = "2.0"


Expand Down
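
The `str, Enum` base in the hunk above means each member is also a plain string. A minimal sketch of what that buys, using a local copy named `ApiVersionSketch` (a hypothetical name, so the example runs without the SDK installed) with the post-change values:

```python
from enum import Enum


class ApiVersionSketch(str, Enum):
    """Local copy for illustration; mirrors the values after this change."""

    V2_1_PREVIEW = "2.1-preview.2"
    V2_0 = "2.0"


# Because of the `str` mix-in, members compare equal to plain strings.
assert ApiVersionSketch.V2_0 == "2.0"

# Looking a member up by its value returns the same singleton member.
assert ApiVersionSketch("2.1-preview.2") is ApiVersionSketch.V2_1_PREVIEW
```

One consequence of this design is that callers can pass either the enum member or the raw version string wherever a version is compared by equality.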