Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: UserPool, Volume, ElasticSearch, FSx are now RETAIN by default #12920

Merged
merged 47 commits into from
Feb 22, 2021

Conversation

rix0rrr
Copy link
Contributor

@rix0rrr rix0rrr commented Feb 8, 2021

(Gave this change an eye-catching title so it would stand out in the change log)

cfn-lint expects all stateful resources to have a removal policy, but we weren't providing that option at the L2 level yet, and the L1 API is cumbersome.

Add the removalPolicy option to the following resources:

  • Cognito User Pools (default: RETAIN)
  • EC2 Volume (default: RETAIN)
  • ElasticSearch Domain (default: RETAIN)
  • FSx FileSystem (default: RETAIN)
  • SQS Queue (default: DESTROY)
  • Nested Stack (default: DESTROY)

All L2 resources now have an applyRemovalPolicy method, so it can be set even for non-stateful resources.

A mechanism has been added to the codegen so that new L2 authors cannot forget about the removalPolicy when they're writing an L2.

I'm aware that the choice to make most of these RETAIN by default is going to be contentious. There are 2 questions here:

  • Is RETAIN the correct default in general?
  • Can we afford switching from implicit-DESTROY to implicit-RETAIN?

Is RETAIN the correct default?

I would argue "yes", by process of elimination:

Defaulting removalPolicy for all resources to DESTROY if nothing is given (and writing DESTROY to the template) is just going to put us back to the situation in CloudFormation before cfn-lint existed (your stateful resources are going to be destroyed and nothing is going to warn you about it).

Making removalPolicy explicitly required for all stateful resources is a backwards-breaking change, and the point about CDK is that it will have sane defaults.

I can't come up with a better solution than picking RETAIN as default.

Can we afford changing the default here?

I'm sensitive to the arguments that this a breaking change which we can't afford.

It feels like this is the value we would have given the resources had we thought about this earlier, and not being able to correct past mistakes by saying they are "locked in" is going to be a death knell for the project.

We could go for a "future behavior" flag, but the risk here seems minimal: this is a change that REDUCES risk of data loss. It might increase risk of deployment errors (duplicate resource names), but that can be manually managed. I will fast-follow with a change that will introduce warnings for resources that have a physical name set and also a RETAIN policy, to clearly identify you're doing something dangerous in CloudFormation. I'm not sure it's worth the effort to go further than that.

Although please: discussion welcome.

Fixes #12563.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@rix0rrr rix0rrr requested a review from a team February 8, 2021 13:29
@rix0rrr rix0rrr self-assigned this Feb 8, 2021
@gitpod-io
Copy link

gitpod-io bot commented Feb 8, 2021

@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Feb 8, 2021
@PatMyron
Copy link
Contributor

PatMyron commented Feb 8, 2021

but we weren't providing that option at the L2 level yet, and the L1 API is cumbersome

Is it possible to tackle this specifically? I agree none of the options here are clearly better without downsides, but making it more obvious and less cumbersome to set removalPolicy when CDK hasn't set defaults seems strictly better

@eladb
Copy link
Contributor

eladb commented Feb 9, 2021

@rix0rrr do we have some kind of enforcement/linting that future stateful resources also have a removal policy?

@eladb
Copy link
Contributor

eladb commented Feb 9, 2021

@rix0rrr also - see aws/aws-cdk-rfcs#23 - perhaps would be useful to reference.

@rix0rrr
Copy link
Contributor Author

rix0rrr commented Feb 9, 2021

do we have some kind of enforcement/linting that future stateful resources also have a removal policy?

Not at the moment, because it's not clear where that information should live. I suppose in the doc comments of each L2 is as good a place as any... though I'm not quite sure how linting and static analysis is going to help us verify (for example) default behavior.

What we CAN do is say "if a resource has @stateful, then it MUST accept a removalPolicy?: RemovalPolicy as property."

That's good as far as it goes, but just adding:

class Resource {
  /** get/set */
  public removalPolicy: RemovalPolicy;

  /* or this */
  public applyRemovalPolicy(policy) {
    (this.node.defaultChild as CfnResource).applyRemovalPolicy(policy);
  }
}

Is going to unlock the API for ALL resources, stateful or otherwise. Why wouldn't we do that instead? And if we have this, then the annotation and the forced removalPolicy prop is less urgent.

Also, what is the point of the stateful: boolean property you described in that RFC? Not sure how it would be used

@eladb
Copy link
Contributor

eladb commented Feb 9, 2021

Not at the moment, because it's not clear where that information should live.

Where can we obtain a list of stateful resources? Does cfn-lint have that info somewhere?

Also, what is the point of the stateful: boolean property you described in that RFC? Not sure how it would be used

The RFC issue is just a bunch of preliminary ideas. I wouldn't read too much into it. If I recall, the stateful: boolean is intended to allow runtime introspection. I.e. "I want to traverse the tree and apply some policy to all stateful resources."

@rix0rrr
Copy link
Contributor Author

rix0rrr commented Feb 9, 2021

Where can we obtain a list of stateful resources? Does cfn-lint have that info somewhere?

It does. We aren't really set up for this yet, but we could potentially start extending our L1 model with information from external sources such as cfnlint.

The behavior I think would work is that we would mimic the same check cfn-lint would perform during validation: we can emit code for stateful L1s (as marked as such by cfn-lint) to require that they have a removal policy set.

That doesn't force anyhthing about HOW the removal policy is set, but you will encounter this error while developing an L2 (while writing the unit tests for it) and the path of least resistance will be to just add that removalPolicy prop that all the other resources have.

@eladb
Copy link
Contributor

eladb commented Feb 9, 2021

@rix0rrr sounds like a step in the right direction!

Copy link
Contributor

@skinny85 skinny85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only reviewed the RDS part. LGTM.

@rix0rrr rix0rrr changed the title fix: add removal policies to stateful resources fix: User Pools, Volumes, ElasticSeach, FSx are now RETAIN by default Feb 18, 2021
@rix0rrr rix0rrr changed the title fix: User Pools, Volumes, ElasticSeach, FSx are now RETAIN by default fix: UserPools, Volumes, ElasticSearch, FSx are now RETAIN by default Feb 18, 2021
@github-actions github-actions bot added @aws-cdk/aws-elasticsearch Related to Amazon Elasticsearch Service @aws-cdk/aws-fsx Related to Amazon FSx labels Feb 18, 2021
@rix0rrr rix0rrr changed the title fix: UserPools, Volumes, ElasticSearch, FSx are now RETAIN by default fix: UserPool, Volume, ElasticSearch, FSx are now RETAIN by default Feb 18, 2021
Copy link
Contributor

@iliapolo iliapolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • ElasticSearch
  • EKS

Comment on lines +149 to +151
* @default RemovalPolicy.DESTROY
*/
readonly removalPolicy?: RemovalPolicy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're sure this is the CF default, yeah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not. I've chosen to make this the default.

Comment on lines 45 to 46
// eslint-disable-next-line no-console
console.log(outFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stale from local dev?

Comment on lines 65 to 66
// eslint-disable-next-line no-console
console.log(outFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -0,0 +1,30 @@
{
"ResourceTypes": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ow, this is a start of a new pattern.

Do we have guarantees from cfnlint about this file spec staying backwards compat? I'd hate for another round of dealing with breaking changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, but I don't see any way around that.

Comment on lines +84 to +85
// Apply the loaded file as a patch onto the current structure
patch(structure, data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor clarification - this the same patching mechanism as the cfn spec, yeah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

echo >&2 "Downloading from ${url}..."
curl -sSfL "${url}" -o ${intermediate}

for file in StatefulResources; do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: not familiar with this bash-ism: what is StatefulResources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A literal string :D

* ```
*/
private emitConstructValidator(resourceType: genspec.CodeName) {
this.code.openBlock('protected validate(): string[]');
Copy link
Contributor

@nija-at nija-at Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume you're trying to use the validate API on construct compat layer. This is going away with v2 - https://github.com/aws/aws-cdk-rfcs/blob/master/text/0192-remove-constructs-compat.md#08-validation-the-validate-hook-is-now-nodeaddvalidation

Use addValidation() instead, so we don't have to come back to fix this in v2.

Copy link
Contributor Author

@rix0rrr rix0rrr Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did that, the types don't like it :(

@aws-cdk/core: �[96mlib/cloudformation.generated.ts�[0m:�[93m883�[0m:�[93m23�[0m - �[91merror�[0m�[90m TS2339: �[0mProperty 'addValidation' does not exist on type 'ConstructNode'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @eladb

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait. Is this because this doesn't exist on constructs pre-10.x?

Copy link
Contributor

@eladb eladb Feb 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@nija-at nija-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this - #12920 (comment)

Would also like to hear about the cfn-lint guarantee.

Otherwise looks good.

@iliapolo iliapolo removed their assignment Feb 21, 2021
@rix0rrr rix0rrr removed the pr/do-not-merge This PR should not be merged at this time. label Feb 22, 2021
@mergify
Copy link
Contributor

mergify bot commented Feb 22, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: 4a72864
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

mergify bot commented Feb 22, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-elasticsearch Related to Amazon Elasticsearch Service @aws-cdk/aws-fsx Related to Amazon FSx contribution/core This is a PR that came from AWS.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(many): a number of stateful resources don't have a removal policy
8 participants