[s3] Gracefully handle failed deploys #11687

adam-nielsen · 2020-11-25T01:05:30Z

A failed deploy that creates an S3 bucket should either remove the bucket again (making the deploy appear atomic), or not fail when the stack is redeployed due to the S3 bucket existing from the previous run.

Use Case

If you have a CDK stack that creates an S3 bucket, then the stack must deploy successfully otherwise manual intervention is required.

For example, if you're still working out your permissions for non-S3 services, then the stack will create your S3 bucket, fail on the non-S3 permission problem, then abort - leaving the S3 bucket in place. When you next deploy again, the S3 bucket creation fails because the bucket already exists, causing the whole deploy to fail.

This is a big pain having to remember to delete the same S3 bucket over and over after each failed deploy, until you finally manage to figure out the set of permissions that allow all the resources to be created successfully.

Proposed Solution

If the cdk deploy process created an S3 bucket, and the deploy fails, then the S3 bucket should be deleted again if it is empty and even if it was set with the removal policy of "retain". This is because stack deployments should be atomic, so if the stack hasn't been deployed, the S3 bucket should not have been created.

Alternatively the bucket can be left in place, but a subsequent deploy should recognise that this is the bucket created from a prior run, and take ownership of it rather than complaining that the bucket already exists.

Other

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

The text was updated successfully, but these errors were encountered:

iliapolo · 2020-11-25T11:56:24Z

@adam-nielsen

If the cdk deploy process created an S3 bucket, and the deploy fails, then the S3 bucket should be deleted again if it is empty and even if it was set with the removal policy of "retain"

Im not sure its ok for us to interpret Retain as Retain only if the bucket is not empty, otherwise delete. Especially since this is not the CloudFormation behavior.

CDK makes a conscious choice about defaulting the removal policy of an s3 bucket to Retain because its a stateful resource, I understand this creates some friction during development cycles, but we currently feel this is the safer choice.

You could explicitly set the removal policy to Destroy during your development iterations, if you know that the bucket won't contain any objects.

We also have a PR for automatically deleting objects in the bucket, in which case, a default of Destroy might be more appropriate.

adam-nielsen · 2020-11-26T00:33:52Z

To be clear "Retain only if the bucket is not empty, otherwise delete" isn't technically what I'm after either, it would be "Retain, unless the bucket was created in this stack deploy run and the deploy is being rolled back". For example, if the bucket was created in the first successful run, and it's still empty, and a subsequent stack deploy failed and was rolled back, the bucket should not be deleted because it was not created during the failed run.

I suppose what I'm getting at is that the deploys should be atomic - they either fully succeed or fully fail. Having them partially succeed and then break subsequent deploys because of earlier leftovers now conflicting doesn't seem right to me.

I wonder whether if CloudFormation created all S3 buckets last, that might help address the issue because then S3 buckets would only get created after every non-S3 resource was created successfully, reducing the likelihood of a failure after a bucket creation?

I agree your suggestion of setting the removal policy to "Destroy" during testing is a good workaround, though.

iliapolo · 2020-11-29T10:02:15Z

@adam-nielsen

Retain, unless the bucket was created in this stack deploy run and the deploy is being rolled back

Yeah I understand. It sounds reasonable, but more of a CloudFormation capability I believe. Perhaps it's worth a feature request for adding a new deletion policy type. Implementing this in CDK would require rather involved hooks into the CFN engine.

I wonder whether if CloudFormation created all S3 buckets last, that might help address the issue because then S3 buckets would only get created after every non-S3 resource was created successfully, reducing the likelihood of a failure after a bucket creation?

Resource creation is determined by dependencies, even if CloudFormation would hold back on provisioning all independent buckets to the end, it cannot delay the provisioning of a bucket if another resource depends on it.

I'm going to close this issue out since I believe its more relevant for a discussion on CloudFormation channels.

Let me know if you still feel differently.

Thanks

github-actions · 2020-11-29T10:02:42Z

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

adam-nielsen added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Nov 25, 2020

github-actions bot added the @aws-cdk/aws-s3 Related to Amazon S3 label Nov 25, 2020

github-actions bot assigned iliapolo Nov 25, 2020

github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Nov 26, 2020

iliapolo closed this as completed Nov 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[s3] Gracefully handle failed deploys #11687

[s3] Gracefully handle failed deploys #11687

adam-nielsen commented Nov 25, 2020

iliapolo commented Nov 25, 2020

adam-nielsen commented Nov 26, 2020

iliapolo commented Nov 29, 2020

github-actions bot commented Nov 29, 2020