RFC WIP: New batch processor for new native partial response (SQS, DynamoDB, Kinesis, Aurora Stream(?)) #797
Comments
@machafer will start to look at the implementations and the UX.

@machafer, what's the status on this please?

As I recently worked on the code here, I'll pick this up!
Have you seen issue #596, and is it something to consider as part of this RFC?

Regarding the proposition, I'm not sure how you can return this:

```java
public class ProcessSQSMessageBatch implements RequestHandler<SQSEvent, SQSBatchResponse> {
    @Override
    public SQSBatchResponse handleRequest(SQSEvent sqsEvent, Context context) {
        List<SQSBatchResponse.BatchItemFailure> batchItemFailures = new ArrayList<>();
        for (SQSEvent.SQSMessage message : sqsEvent.getRecords()) {
            try {
                // process your message
            } catch (Exception e) {
                // add the failed message identifier to the batchItemFailures list
                batchItemFailures.add(new SQSBatchResponse.BatchItemFailure(message.getMessageId()));
            }
        }
        return new SQSBatchResponse(batchItemFailures);
    }
}
```

Same for Kinesis/DynamoDB.

➡️ I would definitely use the built-in events (I've just seen solution 4.2). But in that case, what will be the developer experience, and what will be the added value of the module? Can you elaborate on this?

Regarding your questions:
Can we also take inspiration from Python and see if we can have a similar "experience"? I'm wondering if we could have something like this (for SQS), using interfaces and default implementations:

```java
public class ProcessSQSMessageBatch implements RequestHandler<SQSEvent, SQSBatchResponse>, BatchProcessor<SQSEvent, SQSBatchResponse> {
    @Override
    public SQSBatchResponse handleRequest(SQSEvent sqsEvent, Context context) {
        // processBatch is implemented as a default method in the interface; it handles
        // exceptions thrown by processItem and adds the failed item to the BatchItemFailure list
        return this.processBatch(sqsEvent); // we may need to pass context too... ?
    }

    // this method comes from the BatchProcessor interface; developers override the appropriate one
    @Override
    public void processItem(SQSMessage message) {
        // do some stuff with this item
    }
}
```

With the interface:

```java
public interface BatchProcessor<I, O> {
    default O processBatch(I input) {
        // depending on the input type (SQS/Kinesis/Dynamo/...), create the appropriate response
        // browse the list of items; for each item:
        try {
            processItem(item);
        } catch (Throwable t) {
            // put the item in the item-failure list
        }
        // ... return the response
    }

    default void processItem(SQSMessage message) {
        System.out.println(message.messageId);
    }

    default void processItem(KinesisEventRecord record) {
        System.out.println(record.eventID);
    }

    default void processItem(DynamodbStreamRecord record) {
        System.out.println(record.eventID);
    }
}
```

With this, we could add new streaming services with interface defaults without breaking anything.
Thanks @jeromevdl for the considered feedback!

Good catch. I think the base class / base builder will have to return

I've linked the examples and existing partial response types into the RFC. I've also updated the section at the top explaining why we should do this. It's important we get this part right.

My feeling is that how to handle this will come out as part of the implementation; we just have to be attentive to it.

I tend to agree. I don't think there is much value added, and there is a lot of extra code to maintain, in mapping to another model. Realistically, the presented use case - "in some cases you can move message handlers between event sources without changing a type in your code" - is pretty tenuous.
Looks like another option - I hadn't thought of using interface defaults. Let me have a play with it - I've started hacking around on a branch to try to get a feel for the ergonomics of the different options. We've already narrowed down the solution space in this discussion; I will mock up some variants on the branch and we can discuss again 💪

I've added a reasonably complete example using extension and a very rough sketch using default impls. I think it would be helpful to jump on a call together in the next week to discuss:

```java
@Override
protected SQSBatchResponse writeResponse(Iterable<MessageProcessingResult<SQSEvent.SQSMessage>> results) {
    // Here we map up the SQS-specific response for the batch, based on the success of the individual messages
    throw new NotImplementedException();
}
```

But we have a good basis to discuss on Thursday.
This is a pretty strong statement - do you have some more details? I think it's important to know why we're discarding this.

I struggled to make the default interface approach work, but if we turn our brains to it we can probably get somewhere. I can't think of a way of extending this that would cause problems with the fairly classic inheritance structure I've used, so a good counter-example - "if we add feature X, it will break the public interface" - would be helpful.
I've got this in both variants now - for SQS, see lines 12 to 14 and lines 17 to 20 in 6241014.

Because of the way the ABC is extended, it's easy to provide a map down from the batch to the per-type records - here's the SQS batch handler: lines 15 to 17 in 6241014.
This may well be the case! My reasoning is: if we want to add a tunable, we should be able to do it without breaking the interface. With the existing SQS batch handling impl you can't, because the whole implementation is a series of public overloads taking a huge list of possible params, e.g. lines 481 to 485 in 9afd274.

I came across this as part of #1183, where I would've liked to add a "use FIFO batch behaviour" switch but could no longer change the interface (in this case we can avoid it because we can infer we are on a FIFO queue, but that's kind of beside the point). I'm not confident about this being the right way and am keen to discuss.

This is a small part of the impl I started to flesh out and not a user-facing thing - the user's code returns nothing or throws, like the examples inline above. I appreciate it's hard to decode intent from a big dump of uncommented PoC code :)
The idempotency library also integrates with the SQS utility; we must retain this functionality, and it gives another example of an extension point.
If we go down the Builder route, we should implement it in a way that the batch handler is created and configured once with a builder, and then invoked during the handleRequest call:

```java
public class SqsExampleWithBuilder implements RequestHandler<SQSEvent, SQSBatchResponse> {

    BatchMessageHandler handler;

    public SqsExampleWithBuilder() {
        handler = new BatchMessageHandler.Builder()
                .withSource(SourceType.SQS)
                .withFailureHandler(msg -> System.out.println("Whoops: " + msg.getMessageId()))
                .withRawMessageHandler(this::processWithIdempotency)
                .build();
    }

    @Override
    public SQSBatchResponse handleRequest(SQSEvent sqsEvent, Context context) {
        return handler.handle(sqsEvent, context);
    }

    @Idempotent
    @SqsLargeMessage
    private void processWithIdempotency(@IdempotencyKey SQSEvent.SQSMessage sqsMessage, Context context) {
    }
}
```
Thanks for the comment @mriccia. It's more verbose than the interfaces, but probably more customizable, I admit... A few questions:

- How do you handle the inner content of a message (body deserialization)?
- FailureHandler is optional, right? (As we need to handle the failure and add items to the partial batch failure list anyway.)
- I'm not sure about the source... can't we guess it instead of asking for it?

Otherwise, I kinda like it... Something like this?

```java
BatchMessageHandler<SQSEvent> handler = new BatchMessageHandler.Builder(SQSEvent.class)
        .withFailureHandler(msg -> System.out.println("Whoops: " + msg.getMessageId()))
        .withDeserializedMessageHandler(this::processDeserialized, Basket.class)
        .build();

private void processDeserialized(Basket message, Context context) {
}
```
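The "can't we guess it?" idea could work by keying off the event's class. Here is a minimal, purely hypothetical sketch - the class names, registry, and `inferSource` method are stand-ins invented for illustration, not part of any proposal above:

```java
import java.util.Map;

public class SourceInferenceSketch {
    // Hypothetical stand-ins for the aws-lambda-java-events classes.
    static class SQSEvent {}
    static class KinesisEvent {}

    // A builder could infer the source from the event's class instead of
    // requiring an explicit withSource(SourceType.SQS) call.
    static final Map<Class<?>, String> SOURCES = Map.of(
            SQSEvent.class, "SQS",
            KinesisEvent.class, "KINESIS");

    static String inferSource(Object event) {
        String source = SOURCES.get(event.getClass());
        if (source == null) {
            throw new IllegalArgumentException("Unsupported event type: " + event.getClass());
        }
        return source;
    }

    public static void main(String[] args) {
        System.out.println(inferSource(new SQSEvent()));
    }
}
```

A registry like this would also keep the supported-sources list in one place, at the cost of failing at runtime rather than compile time for unsupported event types.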
RFC looks good now.

RFC is good now, let's build it!
Key information
Summary
A new generic batch processing utility, which can process records from SQS, Kinesis Data Streams, and DynamoDB streams, and handle reporting batch failures.
Motivation
With the launch of support for partial batch responses for Lambda/SQS, the event source mapping can now natively handle partial failures in a batch, removing the need for calls to the delete API. This support already exists for Kinesis and DynamoDB streams.
The Java SDK for Lambda contains 1/ the incoming message types (both the batch and the nested messages within the batch) and 2/ the partial batch response types. The documentation for each event source (e.g. SQS) contains examples of implementing partial batch responses. Powertools aims to improve on this by:
Proposal
1. Layout
The new utility will be implemented in a new package, powertools-batch. The existing SQS batch processor, powertools-sqs, will be maintained for bug fixes and removed in Powertools for Java v2.

2. Existing SQS Batch Interface Simplifications
Powertools for Java has an existing batch processing mechanism for SQS only, which was written before partial responses existed and uses explicit message deletion instead (documentation, code).
It includes extra tunables that are not present in the Python implementation and make less sense with partial batch responses. We will not support these:
The existing implementation provides, in addition to a utility class, an annotation to handle batch responses. It works like this:
This doesn't make sense in the world of partial batch responses. The response returned to Lambda will be the partial batch response itself, and will therefore be generated by the new batch utility. That means this style of embedding a success message in the middle makes no sense, as the user's own code does not control the return value.
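To make that control flow concrete, here is a minimal sketch with stand-in types - `Message`, `process`, and the handler signature are all hypothetical names for illustration, not the proposed API. The user's handler signals failure by throwing; the utility collects the failed IDs and owns the response sent back to Lambda:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class PartialResponseSketch {
    // Hypothetical stand-in for an SQS message from aws-lambda-java-events.
    static class Message {
        final String messageId;
        final String body;
        Message(String messageId, String body) { this.messageId = messageId; this.body = body; }
    }

    // Runs the user handler per message; a throw marks that single item as failed.
    // The returned IDs are what the utility would wrap into an SQSBatchResponse,
    // so the user's code never builds the return value itself.
    static List<String> process(List<Message> batch, Consumer<Message> userHandler) {
        List<String> failedIds = new ArrayList<>();
        for (Message m : batch) {
            try {
                userHandler.accept(m);
            } catch (RuntimeException e) {
                failedIds.add(m.messageId);
            }
        }
        return failedIds;
    }

    public static void main(String[] args) {
        List<Message> batch = List.of(new Message("1", "ok"), new Message("2", "boom"));
        List<String> failed = process(batch, m -> {
            if (m.body.equals("boom")) throw new RuntimeException("bad message");
        });
        // only the failed items would be retried by Lambda
        System.out.println(failed);
    }
}
```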
3. Features to retain
@SqsLargeMessage
annotation on the Lambda request handler itself. However, in "Feature request: Better failures handling while using both BatchProcessing and LargeMessageHandling" #596, we can see that the current implementation is not optimal when batches and large messages are combined, leading to an entire batch failing when a single message has issues. We will resolve this here by incorporating large message processing into the inner loop of the batch processing, rather than as an aspect that runs before the batch. We will see if this can be done automatically or if it will require the user to hint that they need large message processing enabled when providing their batch handler.

4. User-Facing API
To decide on the approach, let's look at two alternative implementations of the user-facing API. The complete implementation has not been built - just enough of the API of the new library to complete the RFC phase.
We can provide a simple builder interface.
This decouples us completely from the request handler model, which may provide extra flexibility in the future, and gives us a mechanism to extend behaviour later without breaking interfaces, by adding extra parameters to the builder.
By way of example, success and failure hooks are added to SQS, a feature also provided in the Python implementation.
A partial mockup of this implementation can be found here.
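The extensibility argument can be illustrated with a small hypothetical sketch - the builder name and tunables below are invented for illustration, not the proposed API. A new option added to a builder leaves every existing call site compiling unchanged, whereas adding a parameter to an overloaded method forces either yet another overload or a breaking change:

```java
public class BatchHandlerBuilderSketch {
    // Existing tunable.
    private String source = "SQS";
    // New tunable added in a later release; existing callers never
    // invoke it and silently keep the default.
    private boolean fifoBehaviour = false;

    BatchHandlerBuilderSketch withSource(String source) {
        this.source = source;
        return this;
    }

    // Adding this method later breaks no existing call sites.
    BatchHandlerBuilderSketch withFifoBehaviour(boolean on) {
        this.fifoBehaviour = on;
        return this;
    }

    String build() {
        return source + (fifoBehaviour ? ":FIFO" : "");
    }

    public static void main(String[] args) {
        // An old call site, written before withFifoBehaviour existed, still compiles:
        System.out.println(new BatchHandlerBuilderSketch().withSource("SQS").build());
        // A new call site can opt in to the new behaviour:
        System.out.println(new BatchHandlerBuilderSketch().withSource("SQS").withFifoBehaviour(true).build());
    }
}
```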
4.3 RequestHandler base class
A third option was considered: providing a base BatchRequestHandler<...> for the user to extend. This option was discarded because it limits the flexibility of the user's code. The code for this variant can nonetheless be found here.
5. Data Model
The new module will consume batch events from Lambda using the types in aws-lambda-java-events. From there, individual records must be pulled out and passed on to the user-provided handler.
The events library already has nested types for the messages within a batch; we simply pass these through to the user's handler. These types do not share an ABC (abstract base class), so each handler is coupled to the concrete type of the source that is producing messages.
This approach decreases the complexity of the powertools implementation - no additional mapping needs to be done - and would also automatically pass through new fields appearing in the interface with a simple dependency version update.
Questions
Should we deprecate powertools-sqs completely? If so, we'd need to duplicate some other functionality - e.g. the large message aspect - so that users do not need to pull both libraries in.

Drawbacks
This change will introduce redundancy between the existing SQS batch processing utility and this new utility. The old utility will be removed as part of the v2 changes.
This utility adds no additional dependencies. The message types involved are all bundled together in aws-lambda-java-events.

Rationale and alternatives
- A base RequestHandler that can be extended by the user (example code). This was discarded as previous feedback has indicated that, in some cases, it is not practical for users to extend a Powertools class in their RequestHandler.
- Using default interfaces to mix in the implementation without extending a base class. This was discarded because 1/ it still couples the user's code to the RequestHandler to some extent and 2/ it isn't a common pattern in Java.

Unresolved questions