Description
Is this related to an existing feature request or issue?
No
Which AWS Lambda Powertools utility does this relate to?
Other
Summary
When an AWS Lambda function processes a batch of items from Amazon SQS, Amazon Kinesis Data Streams, or Amazon DynamoDB Streams and fails while processing any item in the batch, the entire batch is returned to the queue or stream by default. This means that items that were already processed successfully risk being processed multiple times, which increases the Lambda function invocation count, costs, and the overall time it takes to consume items from these event sources.
By adding the Batch Processing Utility, we can make it simple to use the AWS Lambda function response type ReportBatchItemFailures, where partial batch item failures can be reported, and thereby help reduce the number of items that are re-processed. The utility will automatically monitor the processing of each item within a batch and report which items failed. This enables developers to focus on writing business logic while the Batch Processing Utility automatically reports partial failures within a batch.
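The partial-failure contract itself is defined by the Lambda service: the function returns a `batchItemFailures` list containing the identifiers of failed items, and only those items are retried. A minimal sketch of that response shape (the helper function name is illustrative, not part of any proposed API):

```python
# Sketch of the response shape Lambda expects when ReportBatchItemFailures
# is enabled: only the listed item identifiers are retried; all other items
# in the batch are considered successfully processed.
def build_partial_failure_response(failed_message_ids):
    return {
        "batchItemFailures": [
            {"itemIdentifier": message_id} for message_id in failed_message_ids
        ]
    }

response = build_partial_failure_response(["msg-2", "msg-5"])
# -> {'batchItemFailures': [{'itemIdentifier': 'msg-2'},
#                           {'itemIdentifier': 'msg-5'}]}
```

Returning an empty `batchItemFailures` list tells Lambda the whole batch succeeded; the utility would produce this response automatically.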
Use case
There are multiple use cases for adding the Batch Processing Utility to Lambda Powertools for .NET:
- Enable developers to get "plug-n-play" support for automated reporting of partial failures when consuming batches from SQS Queues, Kinesis Data Streams and DynamoDB Streams.
- By reporting partial failures, we can reduce the number of items being re-processed, leading to fewer Lambda function invocations, reduced cost, and faster consumption of events.
- Improve feature parity across the different AWS Lambda Powertools projects.
Proposal
Key features include:
- Reduce the number of items being re-processed by automatically reporting partial failures when processing batches from Amazon SQS Queues, Amazon Kinesis Data Streams and Amazon DynamoDB Streams.
- Batch items are processed in isolation: a failure while processing one item will not cause the Lambda function to fail immediately.
- Ease of use (simple to set up and configure).
- Extensibility of the batch processing logic (e.g., by decorating the batch processor and/or deriving from it). This provides extension points for handling per-item failure/success, hooks for running code before and after a batch has been processed, and an option for decorating the per-item batch processing logic. All of this is very similar to the extension points currently available in AWS Lambda Powertools for Python.
- Support for enabling and configuring parallel processing of items in a batch (to utilize multiple cores if available). This also means that focus will be on providing a thread-safe utility.
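To make the per-item isolation concrete, here is a minimal sketch in Python (a language-agnostic illustration; the function and field names are hypothetical, not the proposed .NET API). Each record is handled inside a try/except, so one failing item is reported rather than aborting the batch:

```python
# Hypothetical sketch of the core batch-processing loop: each record is
# handled in isolation, exceptions are caught per item, and the identifiers
# of failed items are collected for the ReportBatchItemFailures response.
def process_batch(records, record_handler):
    failures = []
    for record in records:
        try:
            record_handler(record)
        except Exception:
            # One failing item does not abort the batch; it is only reported.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

def handler(record):
    if record["body"] == "bad":
        raise ValueError("cannot process item")

batch = [
    {"messageId": "1", "body": "ok"},
    {"messageId": "2", "body": "bad"},
    {"messageId": "3", "body": "ok"},
]
# process_batch(batch, handler)
# -> {'batchItemFailures': [{'itemIdentifier': '2'}]}
```

A parallel variant would run `record_handler` for each record concurrently while collecting failures in a thread-safe way, which is why thread safety is called out above.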
Out of scope
Integration with the Idempotency Utility
Even though reporting partial failures will reduce the number of items being re-processed, it cannot be guaranteed that successfully processed items are only processed once. To guard against this, it makes sense to look at the Idempotency Utility and see if we can provide simple utilization of that utility within the Batch Processing Utility. However, since the Idempotency Utility is not available yet, the use of this utility is considered out of scope for now.
Potential challenges
Event Source Data Classes
When processing a batch event, we should provide an object that describes the specific event, e.g., an SQSEvent. Lambda Powertools for Python has an Event Source Data Classes utility containing classes for different event types, including SQSEvent, KinesisStreamEvent and DynamoDBStreamEvent. In the AWS Lambda for .NET project, classes are already defined for each of these events:
- Amazon.Lambda.SQSEvents (github, nuget)
- Amazon.Lambda.KinesisEvents (github, nuget)
- Amazon.Lambda.DynamoDBEvents (github, nuget)
While there are benefits to using event types that are already defined and maintained (instead of replicating and maintaining them in this project as well), it also means taking a dependency on another project / team, which to some degree goes against "keeping it lean" and minimizing external dependencies (as per the AWS Lambda Powertools Tenets).
Required Configuration of Event Source Mapping
The utility will require that ReportBatchItemFailures is included in the FunctionResponseTypes list of the event source mapping configuration. We need to be clear on this in the documentation for AWS Lambda Powertools, as this could otherwise turn into a source of confusion.
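As an illustration, an SQS-triggered function defined with AWS SAM would need to opt in like this (resource and handler names are placeholders):

```yaml
# AWS SAM template fragment (resource names are placeholders).
# Without ReportBatchItemFailures in FunctionResponseTypes, Lambda ignores
# the batchItemFailures response and retries the whole batch on any failure.
ProcessorFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: MyFunction::MyFunction.Function::Handler
    Events:
      QueueEvent:
        Type: SQS
        Properties:
          Queue: !GetAtt SourceQueue.Arn
          FunctionResponseTypes:
            - ReportBatchItemFailures
```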
Dependencies and Integrations
Dependencies
Depending on what direction we decide to go in terms of Event Source Data Classes (see Potential Challenges above), we might take a dependency on:
- Amazon.Lambda.SQSEvents (github, nuget)
- Amazon.Lambda.KinesisEvents (github, nuget)
- Amazon.Lambda.DynamoDBEvents (github, nuget)
Alternative solutions
Today, developers can implement support for reporting batch item failures by following the AWS Lambda documentation on "Reporting batch item failures" for each of the following event sources:
Amazon SQS
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
Amazon Kinesis Data Streams
https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html#services-kinesis-batchfailurereporting
Amazon DynamoDB Streams
https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html#services-ddb-batchfailurereporting
Acknowledgment
- This feature request meets Lambda Powertools Tenets
- Should this be considered in other Lambda Powertools languages? i.e. Python, Java, TypeScript