Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ec2): Internet connectivity not established for private subnets #21495

Merged
merged 103 commits into from
Aug 22, 2022

Conversation

laurelmay
Copy link
Contributor

@laurelmay laurelmay commented Aug 8, 2022

Because private subnets rely on a NAT Gateway for internet connectivity,
it is important that the NAT Gateway have the necessary dependencies for
its own internet connectivity. Otherwise,
internetConnectivityEstablished on a private subnet may not be true
during stack creation and deletion. This is most notable for
CloudFormaton Custom Resources; however, it can result in other
dependency failures during stack deletion, especially if resources
within a private subnet take a long time to delete.

Ensuring that the NAT Gateway depends on its public subnet having
internet connectivity completes the chain of dependencies and ensures
that all resources will correctly have internet connectivity.

Because of the layers of abstraction around subnets and NAT gateways,
unit tests for this feature are challenging (because there isn't a clear
means to get the CloudFormaton Logical ID of the AWS::EC2::Route that
establishes the connectivity); however, NAT Gateways are included in
several integration tests so this dependency can be tested there.

Closes: #21348


All Submissions:

Adding new Unconventional Dependencies:

  • This PR adds new unconventional dependencies following the process described here

New Features

  • Have you added the new feature to an integration test?
    • Did you use yarn integ to deploy the infrastructure and generate the snapshot (i.e. yarn integ without --dry-run)?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

Because private subnets rely on a NAT Gateway for internet connectivity,
it is important that the NAT Gateway have the necessary dependencies for
its own internet connectivity. Otherwise,
`internetConnectivityEstablished` on a private subnet may not be true
during stack creation and deletion. This is most notable for
CloudFormaton Custom Resources; however, it can result in other
dependency failures during stack deletion, especially if resources
within a private subnet take a long time to delete.

Ensuring that the NAT Gateway depends on its public subnet having
internet connectivity completes the chain of dependencies and ensures
that all resources will correctly have internet connectivity.

Because of the layers of abstraction around subnets and NAT gateways,
unit tests for this feature are challenging (because there isn't a clear
means to get the CloudFormaton Logical ID of the AWS::EC2::Route that
establishes the connectivity); however, NAT Gateways are included in
several integration tests so this dependency can be tested there.
@gitpod-io
Copy link

gitpod-io bot commented Aug 8, 2022

@laurelmay
Copy link
Contributor Author

It looks like this ended up failing because of changes in snapshots for integration tests in other packages than @aws-cdk/aws-ec2. Unfortunately, there are several there whose pricing models I am unfamiliar with and I don't really want to spin up a bunch of resources I don't understand to fully integration test them, especially in light of #21496.

I'd be happy to find all the relevant integration tests and run them using --dry-run if that'd be acceptable. But if not, based on snapshot template contents the following packages would need to be yarn build+test && yarn integ --update-on-failed and I'd need support with that:

$ grep -l  "AWS::EC2::NatGateway"  **/*template*.json | cut -d'/' -f1-2 | uniq

@aws-cdk/aws-apigateway
@aws-cdk/aws-apigatewayv2-integrations
@aws-cdk/aws-appmesh
@aws-cdk/aws-apprunner
@aws-cdk/aws-autoscaling
@aws-cdk/aws-batch
@aws-cdk/aws-cloud9
@aws-cdk/aws-cloudfront-origins
@aws-cdk/aws-codebuild
@aws-cdk/aws-codedeploy
@aws-cdk/aws-codepipeline-actions
@aws-cdk/aws-docdb
@aws-cdk/aws-ec2
@aws-cdk/aws-ecs-patterns
@aws-cdk/aws-ecs
@aws-cdk/aws-efs
@aws-cdk/aws-eks-legacy
@aws-cdk/aws-eks
@aws-cdk/aws-elasticloadbalancing
@aws-cdk/aws-elasticloadbalancingv2-actions
@aws-cdk/aws-elasticloadbalancingv2-targets
@aws-cdk/aws-elasticloadbalancingv2
@aws-cdk/aws-elasticsearch
@aws-cdk/aws-events-targets
@aws-cdk/aws-fsx
@aws-cdk/aws-globalaccelerator-endpoints
@aws-cdk/aws-glue
@aws-cdk/aws-lambda-nodejs
@aws-cdk/aws-lambda
@aws-cdk/aws-msk
@aws-cdk/aws-neptune
@aws-cdk/aws-opensearchservice
@aws-cdk/aws-rds
@aws-cdk/aws-redshift
@aws-cdk/aws-route53resolver
@aws-cdk/aws-route53-targets
@aws-cdk/aws-route53
@aws-cdk/aws-s3-deployment
@aws-cdk/aws-servicediscovery
@aws-cdk/aws-stepfunctions-tasks
@aws-cdk/aws-synthetics
@aws-cdk-containers/ecs-service-extensions
@aws-cdk/pipelines

@laurelmay laurelmay marked this pull request as ready for review August 8, 2022 12:05
@TheRealAmazonKendra
Copy link
Contributor

We don't want to just do a dry run because we won't see if the deployment of the new template actually succeeds. I'll go ahead and download this PR and run the tests and upload the changes. I'll then have you take a look and ensure it matches the output you'd expect based off the changes you made. In the meantime, I'll take an initial pass through your code changes this morning.

@TheRealAmazonKendra
Copy link
Contributor

Oh, looks like this is just a one line change. I'll upload the tests. I see there's conversation going on in the issue so I'll let @corymhall or @rix0rrr do the actual review on this one (this is also very much not my area of expertise).

rix0rrr
rix0rrr previously requested changes Aug 9, 2022
Copy link
Contributor

@rix0rrr rix0rrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clever, and you are very correct that this is a missing dependency.

But I think we need a couple dependencies more to make sure we get rid of all race conditions, no?

  • Function to internetConnectivityEstablished
  • internetConnectivityEstablished needs to include the RouteTableAssociation

Would you mind adding those as well, while you're in here anyway?

@TheRealAmazonKendra
Copy link
Contributor

Crying on the inside for finally having finished updating all the integ tests

Just kidding. I'll keep these test updates stashed until the requested changes have been made and then I'll rerun the ones needed. @kylelaker please assign this to me and/or tag me when it's ready for integ test updates and I'll get them uploaded.

…ablished

Internet connectivity is not truly available until the subnet route
table has been properly associated. This needs to be around for the life
of any resources that depend on internet connectivity.
@mergify mergify bot dismissed rix0rrr’s stale review August 10, 2022 03:04

Pull request has been modified.

@laurelmay
Copy link
Contributor Author

Alright! I think I've made the requested changes. Subnet's internetConnectivityEstablished now depends on the Route Table Association. A NAT Gateway now depends on its subnet's internetConnectivityEstablished. And all Functions within a VPC now depend on their subnets' internetConnectivityEstablished.

I also reverted 99bdea4 which added (now incorrect) updated results for integration tests.

I think the list of packages that need to have integration tests run to have the outputs updated is:

@aws-cdk/aws-apigateway
@aws-cdk/aws-apigatewayv2-integrations
@aws-cdk/aws-appmesh
@aws-cdk/aws-apprunner
@aws-cdk/aws-autoscaling
@aws-cdk/aws-batch
@aws-cdk/aws-cloud9
@aws-cdk/aws-cloudfront-origins
@aws-cdk/aws-codebuild
@aws-cdk/aws-codedeploy
@aws-cdk/aws-codepipeline-actions
@aws-cdk/aws-docdb
@aws-cdk/aws-ec2
@aws-cdk/aws-ecs
@aws-cdk/aws-ecs-patterns
@aws-cdk/aws-efs
@aws-cdk/aws-eks
@aws-cdk/aws-eks-legacy
@aws-cdk/aws-elasticloadbalancing
@aws-cdk/aws-elasticloadbalancingv2
@aws-cdk/aws-elasticloadbalancingv2-actions
@aws-cdk/aws-elasticloadbalancingv2-targets
@aws-cdk/aws-elasticsearch
@aws-cdk/aws-events-targets
@aws-cdk/aws-fsx
@aws-cdk/aws-globalaccelerator-endpoints
@aws-cdk/aws-glue
@aws-cdk/aws-lambda
@aws-cdk/aws-lambda-nodejs
@aws-cdk/aws-lambda-python
@aws-cdk/aws-msk
@aws-cdk/aws-neptune
@aws-cdk/aws-opensearchservice
@aws-cdk/aws-rds
@aws-cdk/aws-redshift
@aws-cdk/aws-route53
@aws-cdk/aws-route53resolver
@aws-cdk/aws-route53-targets
@aws-cdk/aws-s3-deployment
@aws-cdk/aws-servicediscovery
@aws-cdk/aws-stepfunctions-tasks
@aws-cdk/aws-synthetics
@aws-cdk-containers/ecs-service-extensions
@aws-cdk/custom-resources
@aws-cdk/pipelines

That list is based off running yarn integ and ignoring failures that were caused by integ not being a command.

@TheRealAmazonKendra
Copy link
Contributor

@rix0rrr Can you review and make sure this looks right to you before I add the test updates? They're quite a lot so I don't want the real code changes to get lost in those.

Copy link
Contributor

@rix0rrr rix0rrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! One more thing?

@@ -1101,6 +1101,8 @@ Environment variables can be marked for removal when used in Lambda@Edge by sett
return undefined;
}

this.node.addDependency(props.vpc.selectSubnets(props.vpcSubnets).internetConnectivityEstablished);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call selectSubnets once, and use the result for both the subnetIds and the dependency?

In other words, add the dependnecy where selectSubnets is already being called?

This also gets rid of the destructing assignment to `subnetIds` to make
it a little more clear what's going on when we're also getting public
subnets and what exactly `internetConnectivityEstablished` means, since
we're now using two properties and there are several subnet lists flying
around.
@mergify mergify bot dismissed rix0rrr’s stale review August 10, 2022 15:23

Pull request has been modified.

rix0rrr
rix0rrr previously approved these changes Aug 10, 2022
@laurelmay
Copy link
Contributor Author

@TheRealAmazonKendra It looks like I am not able to assign to you nor to directly request review; but I think this should be ready for the integration tests. Thank you so much for helping with those!

@TheRealAmazonKendra TheRealAmazonKendra self-assigned this Aug 10, 2022
@TheRealAmazonKendra
Copy link
Contributor

On it!

@mergify mergify bot dismissed rix0rrr’s stale review August 11, 2022 04:56

Pull request has been modified.

@TheRealAmazonKendra
Copy link
Contributor

Good news/bad news is this: we have a lot of broken tests but I don't think they're broken from this change so I'm uploading them in batches as I verify that they're good. If they need a fix that is unrelated to this change, I'll PR that fix separately first so it doesn't get mixed in with these updates.

@mergify
Copy link
Contributor

mergify bot commented Aug 18, 2022

update

✅ Branch has been successfully updated

@TheRealAmazonKendra
Copy link
Contributor

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Aug 19, 2022

update

✅ Branch has been successfully updated

Copy link
Contributor

@TheRealAmazonKendra TheRealAmazonKendra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally!!!

@mergify
Copy link
Contributor

mergify bot commented Aug 22, 2022

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 3ba573b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

mergify bot commented Aug 22, 2022

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 5b1488d into aws:main Aug 22, 2022
josephedward pushed a commit to josephedward/aws-cdk that referenced this pull request Aug 30, 2022
…ws#21495)

Because private subnets rely on a NAT Gateway for internet connectivity,
it is important that the NAT Gateway have the necessary dependencies for
its own internet connectivity. Otherwise,
`internetConnectivityEstablished` on a private subnet may not be true
during stack creation and deletion. This is most notable for
CloudFormaton Custom Resources; however, it can result in other
dependency failures during stack deletion, especially if resources
within a private subnet take a long time to delete.

Ensuring that the NAT Gateway depends on its public subnet having
internet connectivity completes the chain of dependencies and ensures
that all resources will correctly have internet connectivity.

Because of the layers of abstraction around subnets and NAT gateways,
unit tests for this feature are challenging (because there isn't a clear
means to get the CloudFormaton Logical ID of the AWS::EC2::Route that
establishes the connectivity); however, NAT Gateways are included in
several integration tests so this dependency can be tested there.

Closes: aws#21348

----

### All Submissions:

* [X] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [ ] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)?
	* [ ] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@vleandersson
Copy link

vleandersson commented Aug 31, 2022

This is amazing! Nice to come to completed work when finding a bug 🥳

I'm not fully sure about CDKs publishing process, but do we know when this will be available in the npm cdk package @TheRealAmazonKendra ?

@laurelmay
Copy link
Contributor Author

laurelmay commented Aug 31, 2022

@vleandersson It looks like it was included in v2.39.0, according to the last "bug fix" bullet point

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(ec2): default Vpc structure results in broken networking during create/delete
5 participants