Clean up responses of Lambda-backed CloudFormation Custom Resources #1813

Open
flostadler opened this issue Nov 8, 2024 · 0 comments
Labels
kind/enhancement Improvements or new features

Hello!

  • Vote on this issue by adding a 👍 reaction
  • If you want to implement this feature, comment to let us know (we'll work with you on design, scheduling, etc.)

Issue details

Right now, the responses of CloudFormation Custom Resources are kept in the S3 bucket used for response collection and are expected to be cleaned up manually or with bucket lifecycle rules.

Enhance the response handling to clean up the responses automatically, including old object versions.
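
A minimal sketch of what such a cleanup could look like, assuming a boto3-style S3 client is passed in. `delete_response_versions` is a hypothetical helper and not part of the provider; `list_object_versions` and `delete_objects` are the boto3 S3 client calls it relies on.

```python
from typing import Any, Dict


def delete_response_versions(s3: Any, bucket: str, key: str) -> int:
    """Delete every version and delete marker of one response object.

    `s3` is assumed to be a boto3-style S3 client. Returns the number of
    versions (including delete markers) submitted for deletion. Per-object
    errors reported by `delete_objects` are not inspected in this sketch.
    """
    deleted = 0
    kwargs: Dict[str, str] = {"Bucket": bucket, "Prefix": key}
    while True:
        page = s3.list_object_versions(**kwargs)
        # Prefix matching can return sibling keys, so filter on the exact key.
        entries = [
            {"Key": item["Key"], "VersionId": item["VersionId"]}
            for field in ("Versions", "DeleteMarkers")
            for item in page.get(field, [])
            if item["Key"] == key
        ]
        # DeleteObjects accepts at most 1000 keys per request.
        for start in range(0, len(entries), 1000):
            s3.delete_objects(
                Bucket=bucket,
                Delete={"Objects": entries[start:start + 1000], "Quiet": True},
            )
        deleted += len(entries)
        if not page.get("IsTruncated"):
            return deleted
        kwargs["KeyMarker"] = page["NextKeyMarker"]
        kwargs["VersionIdMarker"] = page["NextVersionIdMarker"]
```

Whether cleanup should run right after the response is read or as a separate sweep is left open here.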

Affected area/feature

flostadler added the kind/enhancement (Improvements or new features) and needs-triage (Needs attention from the triage team) labels and removed the needs-triage label on Nov 8, 2024
flostadler added a commit that referenced this issue Nov 8, 2024
This PR adds support for [CloudFormation Custom
Resource](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources.html)
to the aws-native provider. It implements an emulator that enables
Pulumi programs to interact with Lambda-backed CloudFormation Custom
Resources.

A CloudFormation custom resource is essentially an extension point to
run arbitrary code as part of the CloudFormation lifecycle. It is
similar in concept to the [Pulumi Command
Provider](https://www.pulumi.com/registry/packages/command/), the
difference being that CloudFormation Custom Resources are executed in the
cloud, either through Lambda or SNS.

For the first implementation, we decided to limit the scope to
Lambda-backed Custom Resources, because the SNS variants are not widely used.

## Custom Resource Protocol
The implementation follows the CloudFormation Custom Resource protocol.
I derived the necessary parts by combining information from the
[docs](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/crpg-ref.html),
[CDK's Custom Resource
Framework](https://github.com/aws/aws-cdk/tree/main/packages/%40aws-cdk/custom-resource-handlers/lib/custom-resources-framework),
and trial and error.

Notable aspects of that protocol are:
- Primitive properties need to be string-encoded when they are sent to
Custom Resource handlers. This includes deeply nested properties:
aws-cloudformation/cloudformation-coverage-roadmap#1037
- The Lambda Function is invoked asynchronously. Lambda will retry the
execution if the function fails unexpectedly (e.g. unhandled exception).
- Due to the async invocation, the response is not returned from the
Lambda Function. Instead, it is sent to a `ResponseURL` that needs to be
included in the request payload.
- Similarly to CloudFormation, we decided to implement this using S3
Buckets and presigned URLs.
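
To illustrate the string encoding and the `ResponseURL` handoff described above, here is a minimal sketch. The helper names (`stringify_primitives`, `build_request`) and their parameters are hypothetical and are not the provider's API. The event field names follow the CloudFormation custom resource request format. The exact rendering of booleans and numbers (`"true"`, `"1"`) is an assumption.

```python
import json
import uuid
from typing import Any, Dict, Optional


def stringify_primitives(value: Any) -> Any:
    """Recursively encode primitive leaves as strings, keeping the structure."""
    if isinstance(value, bool):
        # bool must be checked before int because True is an int in Python.
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return json.dumps(value)
    if isinstance(value, dict):
        return {k: stringify_primitives(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify_primitives(v) for v in value]
    return value  # strings and None pass through unchanged


def build_request(
    request_type: str,  # "Create", "Update" or "Delete"
    response_url: str,  # presigned URL the handler uploads its response to
    properties: Dict[str, Any],
    physical_id: Optional[str] = None,
    stack_id: str = "hypothetical-stack-id",
    logical_id: str = "HypotheticalResource",
    resource_type: str = "Custom::Hypothetical",
) -> Dict[str, Any]:
    """Build the event payload for one asynchronous Lambda invocation."""
    event: Dict[str, Any] = {
        "RequestType": request_type,
        "ResponseURL": response_url,
        "StackId": stack_id,
        "RequestId": str(uuid.uuid4()),
        "ResourceType": resource_type,
        "LogicalResourceId": logical_id,
        "ResourceProperties": stringify_primitives(properties),
    }
    if physical_id is not None:
        # Only known for Update and Delete requests.
        event["PhysicalResourceId"] = physical_id
    return event
```

With boto3, a payload like this would be sent asynchronously by passing `InvocationType="Event"` to the Lambda client's `invoke` call.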

### Custom Resource Lifecycle
```mermaid
sequenceDiagram
    participant A as aws-native
    participant S3 as S3 Bucket
    participant L as Lambda
    
    %% Create Flow
    Note over A,L: Create Operation
    A->>S3: Generate presigned URL
    A->>L: Invoke with CREATE event
    activate L
    loop Until response found or timeout
        A->>S3: Poll for response
        L-->>S3: Upload response
    end
    deactivate L
    A->>S3: Fetch response
    alt Success
        A->>A: Store PhysicalId & outputs
    else Failure
        A->>A: Return error
    end

    %% Update Flow
    Note over A,L: Update Operation
    A->>S3: Generate presigned URL
    A->>L: Invoke with UPDATE event
    activate L
    loop Until response found or timeout
        A->>S3: Poll for response
        L-->>S3: Upload response
    end
    deactivate L
    A->>S3: Fetch response
    alt Success
        A->>A: Check PhysicalId
        alt ID Changed
            A->>S3: Generate presigned URL for cleanup
            A->>L: Invoke with DELETE event for old resource
            activate L
            loop Until cleanup response found or timeout
                A->>S3: Poll for cleanup response
                L-->>S3: Upload cleanup response
            end
            deactivate L
            A->>S3: Fetch cleanup response
        end
    else Failure
        A->>A: Return error
    end

    %% Delete Flow
    Note over A,L: Delete Operation
    A->>S3: Generate presigned URL
    A->>L: Invoke with DELETE event
    activate L
    loop Until response found or timeout
        A->>S3: Poll for response
        L-->>S3: Upload response
    end
    deactivate L
    A->>S3: Fetch response
    alt Success
        A->>A: Return success
    else Failure
        A->>A: Return error
    end
```
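
The "poll until response found or timeout" loop in the diagram can be sketched roughly as follows. This is an illustration under assumptions, not the provider's implementation: `fetch` is a hypothetical callable that returns the response body once the object exists in the bucket and `None` before that, and the clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional


def wait_for_response(
    fetch: Callable[[], Optional[bytes]],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Poll `fetch` until it returns a body or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        body = fetch()
        if body is not None:
            return body
        if clock() >= deadline:
            raise TimeoutError(f"no response after {timeout_s} seconds")
        # Never sleep past the deadline, so the final poll happens on time.
        sleep(min(interval_s, max(0.0, deadline - clock())))
```

A fixed polling interval keeps the sketch short; backoff would be a reasonable alternative for long-running handlers.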

## Reviewer Notes

Key areas to review:
1. Error handling in the response collection mechanism
2. Timeout management, especially for the `Update` lifecycle
3. Documentation completeness and accuracy
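
For orientation on the first point, the following sketch shows the kind of validation a response collector might apply. `parse_response`, `CustomResourceError` and `CustomResourceResponse` are hypothetical names and do not come from this PR. The field names (`Status`, `Reason`, `PhysicalResourceId`, `RequestId`, `Data`) are those of the CloudFormation custom resource response format.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


class CustomResourceError(Exception):
    """Raised when the handler reports failure or sends a malformed response."""


@dataclass
class CustomResourceResponse:
    physical_resource_id: str
    data: Dict[str, Any] = field(default_factory=dict)


def parse_response(body: bytes, expected_request_id: str) -> CustomResourceResponse:
    """Validate a raw response body and turn it into a typed result."""
    try:
        doc = json.loads(body)
    except ValueError as exc:  # covers JSON and UTF-8 decoding errors
        raise CustomResourceError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(doc, dict):
        raise CustomResourceError("response must be a JSON object")
    if doc.get("RequestId") != expected_request_id:
        # Guards against a stale response left over from an earlier attempt.
        raise CustomResourceError("response RequestId does not match the request")
    status = doc.get("Status")
    if status == "FAILED":
        raise CustomResourceError(doc.get("Reason") or "handler reported FAILED")
    if status != "SUCCESS":
        raise CustomResourceError(f"unexpected Status: {status!r}")
    physical_id = doc.get("PhysicalResourceId")
    if not physical_id:
        raise CustomResourceError("response is missing PhysicalResourceId")
    return CustomResourceResponse(physical_id, doc.get("Data") or {})
```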

Exposing and schematizing this resource is part of the stacked PR
#1807.
Automatically cleaning up the response objects is not included in this
PR in order to keep its size manageable. Implementing this is tracked
here: #1813.

Please pay special attention to:
- S3 response collection mechanism security
- State management during updates
- Cleanup handling when physical resource IDs change
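
As a reading aid for the last point, the update flow with a changed physical ID can be reduced to the sketch below. `invoke` is a hypothetical callable that sends one request, waits for the response and returns the physical resource ID reported by the handler. Passing the old properties to the cleanup `Delete`, and letting a cleanup failure propagate as an error, are assumptions of this sketch rather than confirmed behavior of the PR.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: (request_type, physical_id, properties) -> physical_id
Invoke = Callable[[str, str, Dict[str, Any]], str]


def update_with_cleanup(
    invoke: Invoke,
    old_physical_id: str,
    old_props: Dict[str, Any],
    new_props: Dict[str, Any],
) -> str:
    """Run an Update and delete the old resource if the handler replaced it."""
    new_physical_id = invoke("Update", old_physical_id, new_props)
    if new_physical_id != old_physical_id:
        # A changed ID signals a replacement, so the old resource is deleted.
        invoke("Delete", old_physical_id, old_props)
    return new_physical_id
```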

## Testing
- Unit tests including error handling tests for various failure
scenarios
- Integration tests with actual Lambda functions are added in this
stacked PR: #1807
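
For readers who want to try the protocol by hand, a handler for such a test could look roughly like the sketch below. It is not the handler used in #1807. `build_response`, `send_response` and `handler` are hypothetical names, and the empty `Content-Type` header is an assumption that only holds if the presigned URL was signed without a content type.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional


def build_response(
    event: Dict[str, Any],
    status: str,  # "SUCCESS" or "FAILED"
    physical_id: str,
    data: Optional[Dict[str, Any]] = None,
    reason: str = "",
) -> Dict[str, Any]:
    """Assemble the response document for one request event."""
    return {
        "Status": status,
        "Reason": reason,
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(url: str, response: Dict[str, Any]) -> None:
    """Upload the response with an HTTP PUT to the presigned URL."""
    body = json.dumps(response).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": ""}
    )
    with urllib.request.urlopen(request) as reply:
        reply.read()


def handler(
    event: Dict[str, Any],
    context: Any,
    send: Callable[[str, Dict[str, Any]], None] = send_response,
) -> None:
    """Echo the received properties back as outputs."""
    # Reuse the existing ID on Update and Delete, and mint one on Create.
    physical_id = event.get("PhysicalResourceId") or f"test-{event['RequestId']}"
    data = {"Echo": event.get("ResourceProperties", {})}
    send(event["ResponseURL"], build_response(event, "SUCCESS", physical_id, data))
```

The `send` parameter exists only so the handler can be exercised locally without network access.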

## Related Issues
- pulumi/pulumi-cdk#109
- #1812
- #1813