[formrecognizer] add strongly-typed receipt wrapper sample (#12128)
* add strongly typed receipt samples

* update sample tests

* add link to doc showing receipt fields available

* update receipt fields link to aka.ms
kristapratico authored Jun 23, 2020
1 parent 78ffbba commit 17ad110
Showing 11 changed files with 276 additions and 5 deletions.
Original file line number Diff line number Diff line change
@@ -87,6 +87,9 @@ def begin_recognize_receipts(self, receipt, **kwargs):
The input document must be of one of the supported content types - 'application/pdf',
'image/jpeg', 'image/png' or 'image/tiff'.
See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields
:param receipt: JPEG, PNG, PDF and TIFF type file stream or bytes.
Currently only supports US sales receipts.
:type receipt: bytes or IO[bytes]
@@ -141,6 +144,9 @@ def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
"""Extract field text and semantic values from a given US sales receipt.
The input document must be the location (Url) of the receipt to be analyzed.
See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields
:param str receipt_url: The url of the receipt to analyze. The input must be a valid, encoded url
of one of the supported formats: JPEG, PNG, PDF and TIFF. Currently only supports
US sales receipts.
Original file line number Diff line number Diff line change
@@ -95,6 +95,9 @@ async def begin_recognize_receipts(
The input document must be of one of the supported content types - 'application/pdf',
'image/jpeg', 'image/png' or 'image/tiff'.
See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields
:param receipt: JPEG, PNG, PDF and TIFF type file stream or bytes.
Currently only supports US sales receipts.
:type receipt: bytes or IO[bytes]
@@ -155,6 +158,9 @@ async def begin_recognize_receipts_from_url(
"""Extract field text and semantic values from a given US sales receipt.
The input document must be the location (Url) of the receipt to be analyzed.
See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields
:param str receipt_url: The url of the receipt to analyze. The input must be a valid, encoded url
of one of the supported formats: JPEG, PNG, PDF and TIFF. Currently only supports
US sales receipts.
5 changes: 4 additions & 1 deletion sdk/formrecognizer/azure-ai-formrecognizer/samples/README.md
Original file line number Diff line number Diff line change
@@ -60,6 +60,7 @@ what you can do with the Azure Form Recognizer client library.

|**Advanced Sample File Name**|**Description**|
|----------------|-------------|
|[sample_strongly_typing_recognized_form.py][sample_strongly_typing_recognized_form] and [sample_strongly_typing_recognized_form_async.py][sample_strongly_typing_recognized_form_async]|Use the fields in your recognized forms to create an object with strongly-typed fields|
|[sample_get_bounding_boxes.py][sample_get_bounding_boxes] and [sample_get_bounding_boxes_async.py][sample_get_bounding_boxes_async]|Get info to visualize the outlines of form content and fields, which can be used for manual validation|
|[sample_differentiate_output_models_trained_with_and_without_labels.py][sample_differentiate_output_models_trained_with_and_without_labels] and [sample_differentiate_output_models_trained_with_and_without_labels_async.py][sample_differentiate_output_models_trained_with_and_without_labels_async]|See the differences in output when using a custom model trained with labeled data and one trained with unlabeled data|

@@ -94,4 +95,6 @@ what you can do with the Azure Form Recognizer client library.
[sample_train_model_without_labels]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_train_model_without_labels.py
[sample_train_model_without_labels_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_train_model_without_labels_async.py
[sample_copy_model]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_copy_model.py
[sample_copy_model_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_copy_model_async.py
[sample_strongly_typing_recognized_form]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_strongly_typing_recognized_form.py
[sample_strongly_typing_recognized_form_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_strongly_typing_recognized_form_async.py
Original file line number Diff line number Diff line change
@@ -10,7 +10,12 @@
FILE: sample_recognize_receipts_async.py
DESCRIPTION:
-This sample demonstrates how to recognize US sales receipts from a file.
+This sample demonstrates how to recognize and extract common fields from US receipts,
+using a pre-trained receipt model. For a suggested approach to extracting information
+from receipts, see sample_strongly_typed_recognized_form_async.py.
+See fields found on a receipt here:
+https://aka.ms/azsdk/python/formrecognizer/receiptfields
USAGE:
python sample_recognize_receipts_async.py
Original file line number Diff line number Diff line change
@@ -10,7 +10,12 @@
FILE: sample_recognize_receipts_from_url_async.py
DESCRIPTION:
-This sample demonstrates how to recognize US sales receipts from a URL.
+This sample demonstrates how to recognize and extract common fields from a US receipt URL,
+using a pre-trained receipt model. For a suggested approach to extracting information
+from receipts, see sample_strongly_typed_recognized_form_async.py.
+See fields found on a receipt here:
+https://aka.ms/azsdk/python/formrecognizer/receiptfields
USAGE:
python sample_recognize_receipts_from_url_async.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
# coding: utf-8

# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------

"""
FILE: sample_strongly_typed_recognized_form_async.py
DESCRIPTION:
This sample demonstrates how to use the fields in your recognized forms to create an object with
strongly-typed fields. The pre-trained receipt method will be used to illustrate this sample, but
note that a similar approach can be used for any custom form as long as you properly update the
fields' names and types.
See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields
USAGE:
python sample_strongly_typed_recognized_form_async.py
Set the environment variables with your own values before running the sample:
1) AZURE_FORM_RECOGNIZER_ENDPOINT - the endpoint to your Cognitive Services resource.
2) AZURE_FORM_RECOGNIZER_KEY - your Form Recognizer API key
"""

import os
import asyncio
from azure.ai.formrecognizer import FormField


class Receipt(object):
    """Creates a strongly-typed Receipt class from the fields returned in a RecognizedForm.
    If a specific field is not found on the receipt, its value will be None.
    See fields found on a receipt here:
    https://aka.ms/azsdk/python/formrecognizer/receiptfields
    """

    def __init__(self, form):
        self.receipt_type = form.fields.get("ReceiptType", FormField())
        self.merchant_name = form.fields.get("MerchantName", FormField())
        self.merchant_address = form.fields.get("MerchantAddress", FormField())
        self.merchant_phone_number = form.fields.get("MerchantPhoneNumber", FormField())
        self.receipt_items = self.convert_to_receipt_item(form.fields.get("Items", FormField()))
        self.subtotal = form.fields.get("Subtotal", FormField())
        self.tax = form.fields.get("Tax", FormField())
        self.tip = form.fields.get("Tip", FormField())
        self.total = form.fields.get("Total", FormField())
        self.transaction_date = form.fields.get("TransactionDate", FormField())
        self.transaction_time = form.fields.get("TransactionTime", FormField())

    def convert_to_receipt_item(self, items):
        """Converts Items in a receipt to a list of strongly-typed ReceiptItem
        """
        # A missing "Items" field falls back to an empty FormField, whose value is None,
        # so check the value as well as the field before iterating.
        if items is None or items.value is None:
            return []
        return [ReceiptItem(item) for item in items.value]


class ReceiptItem(object):
    """Creates a strongly-typed ReceiptItem for every receipt item found in a RecognizedForm
    """

    def __init__(self, item):
        self.name = item.value.get("Name", FormField())
        self.quantity = item.value.get("Quantity", FormField())
        self.price = item.value.get("Price", FormField())
        self.total_price = item.value.get("TotalPrice", FormField())


class StronglyTypedRecognizedFormSampleAsync(object):

    async def strongly_typed_receipt_async(self):
        path_to_sample_forms = os.path.abspath(os.path.join(os.path.abspath(__file__), "..", "..", "./sample_forms/receipt/contoso-allinone.jpg"))

        from azure.core.credentials import AzureKeyCredential
        from azure.ai.formrecognizer.aio import FormRecognizerClient

        endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
        key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

        async with FormRecognizerClient(
            endpoint=endpoint, credential=AzureKeyCredential(key)
        ) as form_recognizer_client:

            with open(path_to_sample_forms, "rb") as f:
                poller = await form_recognizer_client.begin_recognize_receipts(receipt=f)
            receipts = await poller.result()

            for receipt in receipts:
                receipt = Receipt(receipt)
                print("Receipt Type: {} has confidence: {}".format(receipt.receipt_type.value, receipt.receipt_type.confidence))
                print("Merchant Name: {} has confidence: {}".format(receipt.merchant_name.value, receipt.merchant_name.confidence))
                print("Transaction Date: {} has confidence: {}".format(receipt.transaction_date.value, receipt.transaction_date.confidence))
                print("Receipt items:")
                for item in receipt.receipt_items:
                    print("...Item Name: {} has confidence: {}".format(item.name.value, item.name.confidence))
                    print("...Item Quantity: {} has confidence: {}".format(item.quantity.value, item.quantity.confidence))
                    print("...Individual Item Price: {} has confidence: {}".format(item.price.value, item.price.confidence))
                    print("...Total Item Price: {} has confidence: {}".format(item.total_price.value, item.total_price.confidence))
                print("Subtotal: {} has confidence: {}".format(receipt.subtotal.value, receipt.subtotal.confidence))
                print("Tax: {} has confidence: {}".format(receipt.tax.value, receipt.tax.confidence))
                print("Tip: {} has confidence: {}".format(receipt.tip.value, receipt.tip.confidence))
                print("Total: {} has confidence: {}".format(receipt.total.value, receipt.total.confidence))


async def main():
    sample = StronglyTypedRecognizedFormSampleAsync()
    await sample.strongly_typed_receipt_async()


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
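
The entry point above drives the coroutine through an explicit event loop. On Python 3.7 and later, `asyncio.run` is the usual shorthand for the same thing. A minimal sketch, assuming a hypothetical `fake_recognize` coroutine in place of the Form Recognizer call (it is not part of the SDK):

```python
import asyncio


async def fake_recognize():
    """Hypothetical stand-in for the awaitable service call."""
    await asyncio.sleep(0)  # yield to the event loop once, as a real request would
    return ["receipt-1"]


async def main():
    receipts = await fake_recognize()
    return receipts


if __name__ == "__main__":
    # asyncio.run creates a new event loop, runs main() to completion,
    # and closes the loop afterwards.
    print(asyncio.run(main()))
```

The sample itself keeps `get_event_loop()` and `run_until_complete()`, which also work on Python versions that predate `asyncio.run`.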
Original file line number Diff line number Diff line change
@@ -10,7 +10,12 @@
FILE: sample_recognize_receipts.py
DESCRIPTION:
-This sample demonstrates how to recognize US sales receipts from a file.
+This sample demonstrates how to recognize and extract common fields from US receipts,
+using a pre-trained receipt model. For a suggested approach to extracting information
+from receipts, see sample_strongly_typed_recognized_form.py.
+See fields found on a receipt here:
+https://aka.ms/azsdk/python/formrecognizer/receiptfields
USAGE:
python sample_recognize_receipts.py
Original file line number Diff line number Diff line change
@@ -10,7 +10,12 @@
FILE: sample_recognize_receipts_from_url.py
DESCRIPTION:
-This sample demonstrates how to recognize US sales receipts from a URL.
+This sample demonstrates how to recognize and extract common fields from a US receipt URL,
+using a pre-trained receipt model. For a suggested approach to extracting information
+from receipts, see sample_strongly_typed_recognized_form.py.
+See fields found on a receipt here:
+https://aka.ms/azsdk/python/formrecognizer/receiptfields
USAGE:
python sample_recognize_receipts_from_url.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
# coding: utf-8

# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------

"""
FILE: sample_strongly_typed_recognized_form.py
DESCRIPTION:
This sample demonstrates how to use the fields in your recognized forms to create an object with
strongly-typed fields. The pre-trained receipt method will be used to illustrate this sample, but
note that a similar approach can be used for any custom form as long as you properly update the
fields' names and types.
See fields found on a receipt here:
https://aka.ms/azsdk/python/formrecognizer/receiptfields
USAGE:
python sample_strongly_typed_recognized_form.py
Set the environment variables with your own values before running the sample:
1) AZURE_FORM_RECOGNIZER_ENDPOINT - the endpoint to your Cognitive Services resource.
2) AZURE_FORM_RECOGNIZER_KEY - your Form Recognizer API key
"""

import os
from azure.ai.formrecognizer import FormField


class Receipt(object):
    """Creates a strongly-typed Receipt class from the fields returned in a RecognizedForm.
    If a specific field is not found on the receipt, its value will be None.
    See fields found on a receipt here:
    https://aka.ms/azsdk/python/formrecognizer/receiptfields
    """

    def __init__(self, form):
        self.receipt_type = form.fields.get("ReceiptType", FormField())
        self.merchant_name = form.fields.get("MerchantName", FormField())
        self.merchant_address = form.fields.get("MerchantAddress", FormField())
        self.merchant_phone_number = form.fields.get("MerchantPhoneNumber", FormField())
        self.receipt_items = self.convert_to_receipt_item(form.fields.get("Items", FormField()))
        self.subtotal = form.fields.get("Subtotal", FormField())
        self.tax = form.fields.get("Tax", FormField())
        self.tip = form.fields.get("Tip", FormField())
        self.total = form.fields.get("Total", FormField())
        self.transaction_date = form.fields.get("TransactionDate", FormField())
        self.transaction_time = form.fields.get("TransactionTime", FormField())

    def convert_to_receipt_item(self, items):
        """Converts Items in a receipt to a list of strongly-typed ReceiptItem
        """
        # A missing "Items" field falls back to an empty FormField, whose value is None,
        # so check the value as well as the field before iterating.
        if items is None or items.value is None:
            return []
        return [ReceiptItem(item) for item in items.value]


class ReceiptItem(object):
    """Creates a strongly-typed ReceiptItem for every receipt item found in a RecognizedForm
    """

    def __init__(self, item):
        self.name = item.value.get("Name", FormField())
        self.quantity = item.value.get("Quantity", FormField())
        self.price = item.value.get("Price", FormField())
        self.total_price = item.value.get("TotalPrice", FormField())


class StronglyTypedRecognizedFormSample(object):

    def strongly_typed_receipt(self):
        path_to_sample_forms = os.path.abspath(os.path.join(os.path.abspath(__file__), "..", "./sample_forms/receipt/contoso-allinone.jpg"))

        from azure.core.credentials import AzureKeyCredential
        from azure.ai.formrecognizer import FormRecognizerClient

        endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
        key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

        form_recognizer_client = FormRecognizerClient(
            endpoint=endpoint, credential=AzureKeyCredential(key)
        )
        with open(path_to_sample_forms, "rb") as f:
            poller = form_recognizer_client.begin_recognize_receipts(receipt=f)
        receipts = poller.result()

        for receipt in receipts:
            receipt = Receipt(receipt)
            print("Receipt Type: {} has confidence: {}".format(receipt.receipt_type.value, receipt.receipt_type.confidence))
            print("Merchant Name: {} has confidence: {}".format(receipt.merchant_name.value, receipt.merchant_name.confidence))
            print("Transaction Date: {} has confidence: {}".format(receipt.transaction_date.value, receipt.transaction_date.confidence))
            print("Receipt items:")
            for item in receipt.receipt_items:
                print("...Item Name: {} has confidence: {}".format(item.name.value, item.name.confidence))
                print("...Item Quantity: {} has confidence: {}".format(item.quantity.value, item.quantity.confidence))
                print("...Individual Item Price: {} has confidence: {}".format(item.price.value, item.price.confidence))
                print("...Total Item Price: {} has confidence: {}".format(item.total_price.value, item.total_price.confidence))
            print("Subtotal: {} has confidence: {}".format(receipt.subtotal.value, receipt.subtotal.confidence))
            print("Tax: {} has confidence: {}".format(receipt.tax.value, receipt.tax.confidence))
            print("Tip: {} has confidence: {}".format(receipt.tip.value, receipt.tip.confidence))
            print("Total: {} has confidence: {}".format(receipt.total.value, receipt.total.confidence))


if __name__ == '__main__':
    sample = StronglyTypedRecognizedFormSample()
    sample.strongly_typed_receipt()
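
The wrapper pattern in this sample can be tried without an Azure resource. The following is a minimal sketch, not the SDK's API: `StubField` and `StubForm` are hypothetical stand-ins for `FormField` and `RecognizedForm`, assumed only to expose `value`, `confidence`, and a `fields` dict. It shows that a field missing from the form falls back to an empty field whose `value` is `None`, and that a missing `Items` field yields an empty list.

```python
class StubField(object):
    """Hypothetical stand-in for FormField: holds a value and a confidence."""

    def __init__(self, value=None, confidence=None):
        self.value = value
        self.confidence = confidence


class StubForm(object):
    """Hypothetical stand-in for RecognizedForm: exposes a dict of fields."""

    def __init__(self, fields):
        self.fields = fields


class ReceiptItem(object):
    def __init__(self, item):
        # Each item is a field whose value is a dict of sub-fields.
        self.name = item.value.get("Name", StubField())
        self.total_price = item.value.get("TotalPrice", StubField())


class Receipt(object):
    def __init__(self, form):
        self.merchant_name = form.fields.get("MerchantName", StubField())
        self.total = form.fields.get("Total", StubField())
        items = form.fields.get("Items", StubField())
        # A missing Items field has value None, so guard before iterating.
        self.receipt_items = [ReceiptItem(i) for i in (items.value or [])]


form = StubForm({
    "MerchantName": StubField("Contoso", 0.9),
    "Items": StubField([StubField({"Name": StubField("Coffee", 0.8)})]),
})
receipt = Receipt(form)
print(receipt.merchant_name.value)          # field present on the form
print(receipt.total.value)                  # field absent, so value is None
print(receipt.receipt_items[0].name.value)  # nested item field
```

Callers can then read `receipt.total.value` without first checking whether the key exists, at the cost of having to treat `None` as "not found".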
Original file line number Diff line number Diff line change
@@ -110,3 +110,8 @@ def test_sample_train_model_with_labels(self, resource_group, location, form_rec
    def test_sample_train_model_without_labels(self, resource_group, location, form_recognizer_account, form_recognizer_account_key):
        os.environ['CONTAINER_SAS_URL'] = self.get_settings_value("FORM_RECOGNIZER_STORAGE_CONTAINER_SAS_URL")
        _test_file('sample_train_model_without_labels.py', form_recognizer_account, form_recognizer_account_key)

    @pytest.mark.live_test_only
    @GlobalFormRecognizerAccountPreparer()
    def test_sample_strongly_typing_recognized_form(self, resource_group, location, form_recognizer_account, form_recognizer_account_key):
        _test_file('sample_strongly_typing_recognized_form.py', form_recognizer_account, form_recognizer_account_key)
Original file line number Diff line number Diff line change
@@ -108,3 +108,7 @@ def test_sample_train_model_without_labels_async(self, resource_group, location,
        os.environ['CONTAINER_SAS_URL'] = self.get_settings_value("FORM_RECOGNIZER_STORAGE_CONTAINER_SAS_URL")
        _test_file('sample_train_model_without_labels_async.py', form_recognizer_account, form_recognizer_account_key)

    @pytest.mark.live_test_only
    @GlobalFormRecognizerAccountPreparer()
    def test_sample_strongly_typing_recognized_form_async(self, resource_group, location, form_recognizer_account, form_recognizer_account_key):
        _test_file('sample_strongly_typing_recognized_form_async.py', form_recognizer_account, form_recognizer_account_key)
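
`_test_file` is defined outside this diff, so its body is not shown here. As a rough sketch only, a helper of this kind might set the two environment variables the samples read and run the sample in a subprocess. `run_sample` and its behaviour are assumptions for illustration, not the repository's implementation.

```python
import os
import subprocess
import sys


def run_sample(path, endpoint, key):
    """Hypothetical helper: run a sample script with credentials in its environment."""
    env = dict(os.environ)  # copy, so the parent process environment is untouched
    env["AZURE_FORM_RECOGNIZER_ENDPOINT"] = endpoint
    env["AZURE_FORM_RECOGNIZER_KEY"] = key
    proc = subprocess.run(
        [sys.executable, path],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

A test could then assert that the return code is 0, which is roughly what a sample smoke test needs to establish.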
