feat(event_source): Add support for S3 batch operations #3572

Merged


sbailliez
Contributor

@sbailliez sbailliez commented Dec 30, 2023

Issue number: #3563

Summary

Changes

This feature adds data class support for S3 Batch Operations. It supports events that use either invocation schema 1.0 or 2.0, and it also provides the response structure that the Lambda function must return.
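To illustrate why version-independent accessors help, here is a minimal sketch using plain dictionaries rather than the data classes in this PR. The helper name `task_bucket` and the sample payload values are hypothetical; the field names `s3BucketArn` (1.0) and `s3Bucket` (2.0) are the ones mentioned in this description.

```python
from typing import Any, Dict


def task_bucket(event: Dict[str, Any]) -> str:
    """Return the bucket name of the first task for schema 1.0 or 2.0."""
    task = event["tasks"][0]
    if event["invocationSchemaVersion"] == "2.0":
        # Schema 2.0 carries the plain bucket name.
        return task["s3Bucket"]
    # Schema 1.0 carries a bucket ARN such as arn:aws:s3:::example-bucket.
    return task["s3BucketArn"].split(":::")[-1]


# Illustrative payloads only; real events contain additional fields.
event_v1 = {
    "invocationSchemaVersion": "1.0",
    "invocationId": "example-invocation-id",
    "tasks": [
        {
            "taskId": "example-task-id",
            "s3Key": "reports/summary.csv",
            "s3BucketArn": "arn:aws:s3:::example-bucket",
        }
    ],
}
event_v2 = {
    "invocationSchemaVersion": "2.0",
    "invocationId": "example-invocation-id",
    "tasks": [
        {
            "taskId": "example-task-id",
            "s3Key": "reports/summary.csv",
            "s3Bucket": "example-bucket",
        }
    ],
}

print(task_bucket(event_v1))
print(task_bucket(event_v2))
```

With the data class, the same lookup is a single `event.task.s3_bucket` call regardless of schema version.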

Notable details:

  • Convenient way to get the first task via event.task in addition to event.tasks[0]

  • Convenient way to get the S3 bucket via event.task.s3_bucket, which works for both 1.0 and 2.0 and avoids version-specific code using s3BucketArn (1.0) or s3Bucket (2.0).

  • Factory methods as_succeeded, as_temporary_failure and as_permanent_failure to build result. For example S3BatchOperationResult.as_temporary_failure(task, 'failure message') is simpler than S3BatchOperationResult(task.task_id, 'TemporaryFailure', 'failure message')

  • S3 key decoding uses unquote_plus, whereas the AWS documentation example uses unquote. Testing against S3 Batch Operations supports using unquote_plus (see the comments in the code and the unit test).
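The difference between the two decoders mentioned in the last bullet can be seen with the standard library alone. This is a small sketch with made-up keys, not the code from this PR: `unquote_plus` additionally turns `+` into a space, while a literal plus sign encoded as `%2B` survives either way.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical encoded keys, chosen only to show the difference.
encoded_key = "reports/annual+summary%202023.csv"
encoded_plus = "math/a%2Bb.txt"

decoded_unquote = unquote(encoded_key)       # "+" is left untouched
decoded_plus = unquote_plus(encoded_key)     # "+" becomes a space
literal_plus = unquote_plus(encoded_plus)    # "%2B" decodes to a literal "+"

print(decoded_unquote)
print(decoded_plus)
print(literal_plus)
```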

A couple of things where I'm asking for a second opinion:

  • Should the names use S3BatchOperationXXX or S3BatchOperationsXXX? The AWS service is named Batch Operations. The singular felt more natural, but I keep going back and forth.

  • Add a factory method S3BatchOperationsResponse.from_event(event), similar to the factory methods for the results. The response depends strongly on the original event, so this reads better than the current, more verbose construction.
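As a rough idea of what the proposed `from_event` factory could look like, here is a standalone sketch. `SketchResponse` is a hypothetical stand-in and not the class in this PR; it takes a raw event dictionary for simplicity, and the response keys mirror the "before" example in this description.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchResponse:
    """Hypothetical stand-in for the response class, for illustration only."""

    invocation_schema_version: str
    invocation_id: str
    treat_missing_keys_as: str = "PermanentFailure"
    results: List[Dict[str, str]] = field(default_factory=list)

    @classmethod
    def from_event(cls, event: Dict[str, Any]) -> "SketchResponse":
        # Copy the two values the response must echo back from the event.
        return cls(event["invocationSchemaVersion"], event["invocationId"])

    def asdict(self) -> Dict[str, Any]:
        return {
            "invocationSchemaVersion": self.invocation_schema_version,
            "treatMissingKeysAs": self.treat_missing_keys_as,
            "invocationId": self.invocation_id,
            "results": self.results,
        }


sample_event = {"invocationSchemaVersion": "2.0", "invocationId": "example-id"}
response = SketchResponse.from_event(sample_event)
print(response.asdict())
```

The factory removes the need to pass `invocation_schema_version` and `invocation_id` by hand at every call site.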

User experience

The data class documentation has been updated with a converted sample provided by AWS in the S3 Batch Operations documentation. A simplified example like the one below gives a good idea of the changes:

before:

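import logging
from urllib.parse import unquote_plus

import boto3
from botocore.exceptions import ClientError

from aws_lambda_powertools.utilities.typing import LambdaContext

log = logging.getLogger(__name__)


def lambda_handler(event, context: LambdaContext):
    invocation_id = event["invocationId"]
    invocation_schema_version = event["invocationSchemaVersion"]

    results = []
    result_code = None
    result_string = None
    # The raw event delivers tasks as a list; each invocation carries one task.
    task = event['tasks'][0]
    task_id = task['taskId']
    src_key: str = unquote_plus(task['s3Key'], encoding='utf-8')
    src_bucket: str = task['s3BucketArn'].split(':::')[-1]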
    
    s3 = boto3.client("s3", region_name='us-east-1')

    try:
        dest_bucket, dest_key = do_some_work(s3, src_bucket, src_key)
        result_code = 'Succeeded'
        result_string = f"s3://{dest_bucket}/{dest_key}"
    except ClientError as e:
        error_code = e.response['Error']['Code']
        error_message = e.response['Error']['Message']
        if error_code == 'RequestTimeout':
            result_code = 'TemporaryFailure'
            result_string = 'Retry request to Amazon S3 due to timeout.'
        else:
            result_code = 'PermanentFailure'
            result_string = f"{error_code}: {error_message}"
    except Exception as e:
        result_code = 'PermanentFailure'
        result_string = str(e)
        log.exception(e)
    finally:
        results.append(
            {
                "taskId": task_id,
                "resultCode": result_code,
                "resultString": result_string,
            }
        )
    
    return {
        "invocationSchemaVersion": invocation_schema_version,
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": invocation_id,
        "results": results,
    }

After:

def lambda_handler(event: S3BatchOperationEvent, context: LambdaContext):
    response = S3BatchOperationResponse(event.invocation_schema_version, event.invocation_id)
    
    result = None
    task = event.task
    src_key: str = task.s3_key
    src_bucket: str = task.s3_bucket    
    
    s3 = boto3.client("s3", region_name='us-east-1')

    try:
        dest_bucket, dest_key = do_some_work(s3, src_bucket, src_key)
        result = S3BatchOperationResult.as_succeeded(task, f"s3://{dest_bucket}/{dest_key}")
    except ClientError as e:
        error_code = e.response['Error']['Code']
        error_message = e.response['Error']['Message']
        if error_code == 'RequestTimeout':
            result = S3BatchOperationResult.as_temporary_failure(task, 'Retry request to Amazon S3 due to timeout.')
        else:
            result = S3BatchOperationResult.as_permanent_failure(task, f"{error_code}: {error_message}")
    except Exception as e:
        result = S3BatchOperationResult.as_permanent_failure(task, str(e))
        log.exception(e)
    finally:
        response.add_result(result)
    
    return response.asdict()

Update 16/01

New DX with some small changes.

import boto3

from botocore.exceptions import ClientError

from aws_lambda_powertools.utilities.data_classes import (
    S3BatchOperationEvent,
    S3BatchOperationResponse,
    event_source
)
from aws_lambda_powertools.utilities.typing import LambdaContext


@event_source(data_class=S3BatchOperationEvent)
def lambda_handler(event: S3BatchOperationEvent, context: LambdaContext):
    response = S3BatchOperationResponse(event.invocation_schema_version, event.invocation_id, "PermanentFailure")

    result = None
    task = event.task
    src_key: str = task.s3_key
    src_bucket: str = task.s3_bucket    
    
    s3 = boto3.client("s3", region_name='us-east-1')

    try:
        dest_bucket, dest_key = do_some_work(s3, src_bucket, src_key)
        result = task.build_task_batch_response("Succeeded", f"s3://{dest_bucket}/{dest_key}")
    except ClientError as e:
        error_code = e.response['Error']['Code']
        error_message = e.response['Error']['Message']
        if error_code == 'RequestTimeout':
            result = task.build_task_batch_response("TemporaryFailure", "Retry request to Amazon S3 due to timeout.")
        else:
            result = task.build_task_batch_response("PermanentFailure", f"{error_code}: {error_message}")
    except Exception as e:
        result = task.build_task_batch_response("PermanentFailure", str(e))
    finally:
        response.add_result(result)
    
    return response.asdict()

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

sbailliez and others added 4 commits December 28, 2023 23:27
…t tests. This seamlessly supports both schema 1.0 and 2.0

A few notes:
- S3BatchOperationXXX or S3BatchOperationsXXX ?
- s3 key is not url-encoded in real life despite what the documentation implies. Need to test with some keys that contain spaces, etc...
- S3BatchOperationResult has some factory methods to simplify building
- S3BatchOperationEvent may need to as it makes initialization needlessly complicated
@sbailliez sbailliez requested a review from a team as a code owner December 30, 2023 01:40
@boring-cyborg boring-cyborg bot added documentation Improvements or additions to documentation tests labels Dec 30, 2023
@pull-request-size pull-request-size bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Dec 30, 2023

boring-cyborg bot commented Dec 30, 2023

Thanks a lot for your first contribution! Please check out our contributing guidelines and don't hesitate to ask whatever you need.
In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

@leandrodamascena leandrodamascena linked an issue Dec 30, 2023 that may be closed by this pull request
2 tasks
@leandrodamascena leandrodamascena changed the title feat: Add support for S3 batch operations as data class event/response feat(event_source): Add support for S3 batch operations Dec 30, 2023
@codecov-commenter

codecov-commenter commented Dec 30, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1409499) 95.51% compared to head (a99594f) 95.54%.
Report is 2 commits behind head on develop.


Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3572      +/-   ##
===========================================
+ Coverage    95.51%   95.54%   +0.02%     
===========================================
  Files          211      213       +2     
  Lines         9860     9961     +101     
  Branches      1802      850     -952     
===========================================
+ Hits          9418     9517      +99     
- Misses         329      330       +1     
- Partials       113      114       +1     


@leandrodamascena
Contributor

Hey @sbailliez! You were quick to send this PR and did an excellent job. The level of detail in the PR description is impressive; you clearly put effort into it!
I'm still reviewing the PR and will share my comments as soon as I finish.

Thanks.

@leandrodamascena leandrodamascena self-requested a review December 30, 2023 15:07
Contributor

@leandrodamascena leandrodamascena left a comment

Hi @sbailliez! I did an initial review and we have some small things to change!
Please let me know what you think about this.

@leandrodamascena
Contributor

Hey @sbailliez! Please let me know if you need any help here!

@leandrodamascena
Contributor

I'm working on this PR to release this support in our next release - 19/01/2024

@leandrodamascena
Contributor

@rubenfonseca pls review this PR.

Contributor

@rubenfonseca rubenfonseca left a comment

Great work! Just left some style comments inline.

Contributor

@rubenfonseca rubenfonseca left a comment

Waiting until all conversations are resolved before the next review

@leandrodamascena
Contributor

Waiting until all conversations are resolved before the next review

Done


Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

2 New issues
0 Security Hotspots
No data about Coverage
0.2% Duplication on New Code

See analysis details on SonarCloud

Contributor

@rubenfonseca rubenfonseca left a comment

APPROVED!

@leandrodamascena leandrodamascena merged commit c466c80 into aws-powertools:develop Jan 17, 2024
16 checks passed

boring-cyborg bot commented Jan 17, 2024

Awesome work, congrats on your first merged pull request and thank you for helping improve everyone's experience!

@sbailliez
Contributor Author

@leandrodamascena Thanks for the review and the fixes. Sorry for the silence; I was OOO and away from my computer.

@Dilski

Dilski commented Oct 9, 2024

Hello. Maybe this is a silly question - but why is there a check that there can be only 1 result?

https://github.com/aws-powertools/powertools-lambda-python/pull/3572/files#diff-effd0fefbf972220e733d15c434037f0226b500564ded07e87ad8b35f3fe0247R132-R133

@leandrodamascena
Contributor

Hello. Maybe this is a silly question - but why is there a check that there can be only 1 result?

#3572 (files)

Hey @Dilski! Because there is a one-to-one relationship between invocation and response. According to the official documentation, S3 invokes the Lambda function once for each object, not for a batch of objects.

When the batch job starts, Amazon S3 invokes the Lambda function synchronously for each object in the manifest. The event parameter includes the names of the bucket and the object.

Do you see a different behavior or something we're missing that needs to be fixed? I'd love to hear it.

PS: Please next time open a discussion or issue and indicate the specific PR/comment you want to discuss. Commenting on closed issues/PRs is sometimes challenging to respond to because we don't have much visibility.
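The one-result constraint discussed in this exchange can be sketched roughly as follows. This is an assumption-laden illustration, not the code at the linked diff: the class name `SingleResultResponse`, the placement of the check in `asdict`, and the use of `ValueError` are all hypothetical.

```python
from typing import Any, Dict, List


class SingleResultResponse:
    """Hypothetical response holder that enforces one result per invocation."""

    def __init__(self, schema_version: str, invocation_id: str) -> None:
        self.schema_version = schema_version
        self.invocation_id = invocation_id
        self.results: List[Dict[str, str]] = []

    def add_result(self, result: Dict[str, str]) -> None:
        self.results.append(result)

    def asdict(self) -> Dict[str, Any]:
        # Each invocation handles one object, so exactly one result is expected.
        if len(self.results) != 1:
            raise ValueError(
                f"Expected exactly one result, found {len(self.results)}"
            )
        return {
            "invocationSchemaVersion": self.schema_version,
            "treatMissingKeysAs": "PermanentFailure",
            "invocationId": self.invocation_id,
            "results": self.results,
        }


response = SingleResultResponse("1.0", "example-invocation-id")
response.add_result(
    {
        "taskId": "example-task-id",
        "resultCode": "Succeeded",
        "resultString": "s3://example-bucket/example-key",
    }
)
print(response.asdict())
```

Failing fast on zero or multiple results surfaces a handler bug locally instead of returning a response that does not match the single task in the invocation.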

Labels
documentation Improvements or additions to documentation size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Add support for S3 Batch Operations event
5 participants