(ec2): default Vpc structure results in broken networking during create/delete #21348
Comments
@kylelaker it sounds like this may be an issue specific to CloudFormation custom resources since I think the solution here would be to specify the dependency on

This issue has been classified as p2. That means a workaround is available or it is deemed a nice-to-have feature. Given the amount of work there is to do and the relative priority of this issue, the CDK team is unlikely to address it. That does not mean the issue will never be fixed! If someone from the community submits a PR to fix this issue, and the PR is small and straightforward enough, and meets the quality bars to be reviewed and merged with little effort, we will accept that PR. PRs that do not build or need complex or multiple rounds of reviews are unlikely to be merged and will be closed to keep our backlog manageable.

In the meantime, remember that you can always use the escape hatch (https://docs.aws.amazon.com/cdk/v2/guide/cfn_layer.html) mechanism to have fine control over the CloudFormation output you want.

We will keep the issue open for discoverability, to collect upvotes, and to facilitate discussion around this topic. We use +1s on this issue to help prioritize our work, and are happy to re-evaluate the prioritization of this issue based on community feedback. You can reach out to the cdk.dev community on Slack (https://cdk.dev/) to solicit support for reprioritization.
So I think that's part of it. But there's something even weirder happening when specifying a dependency on

    #!/usr/bin/env node
    import 'source-map-support/register';
    import * as cdk from "aws-cdk-lib";
    import * as ec2 from "aws-cdk-lib/aws-ec2";
    import * as lambda from "aws-cdk-lib/aws-lambda";

    const app = new cdk.App();
    const stack = new cdk.Stack(app, "Ec2PrivateStack");

    // Default VPC layout: public subnets with NAT Gateways plus private subnets.
    const vpc = new ec2.Vpc(stack, "SampleVpc");
    const vpcSubnets = { subnetType: ec2.SubnetType.PRIVATE_WITH_NAT };

    // A trivial function placed in the private subnets.
    const fn = new lambda.SingletonFunction(stack, "Function", {
      code: lambda.Code.fromInline(`module.exports = { handler: function() { console.log("Hello, world"); } }`),
      runtime: lambda.Runtime.NODEJS_16_X,
      handler: "index.handler",
      uuid: "fa5088ab-850a-4106-a657-482d78dccf68",
      lambdaPurpose: "Hello",
      vpc,
      vpcSubnets,
    });

    // Explicitly depend on the private subnets' internet connectivity.
    fn.addDependency(vpc.selectSubnets(vpcSubnets).internetConnectivityEstablished);

So the dependency on the private routes gets set, but it seems like the other dependencies aren't happening in a way that results in success. It works for creation, but then the stack fails to delete. You can see that the public subnet resources start to get deleted immediately. So the private subnet routes and the NAT Gateway may still exist, but they have no routes to the IGW. From there, because the Lambda function takes some time to delete (well, it actually deletes pretty quickly, but CloudFormation takes ~20min to notice; that feels like an unrelated CloudFormation bug), other resources keep deleting in weird orders and the whole process ends up failing with a lingering Internet Gateway and VPC with "dependent resources".

This means the given workaround does not work. And it makes #21357 pretty impossible to implement correctly. I think the fix here would be to make the aws-cdk/packages/@aws-cdk/aws-ec2/lib/vpc.ts Lines 1937 to 1946 in 618c5bc this.internetConnectivityEstablished; otherwise every resource that needs internet connectivity in a PRIVATE_WITH_NAT subnet also has to add a dependency on the public subnets that have NAT Gateways, which is a hard filter to add.
The stack events from building and deleting this stack can be seen at https://gist.github.com/b4ecd79a8be6e57e229a34ccdc34e220. I think I ran that command a little too early, so the final VPC delete failure isn't there, but I think the point is clear. I will open a PR to add the dependency to the NAT Gateway, but let me know if that's not the right approach.
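To make the deletion behavior described above easier to follow, here is a minimal sketch of a toy dependency model. It is not CloudFormation's actual scheduler and does not use the CDK; the resource names, the `DependencyGraph` type, and the `deletionWaves` helper are hypothetical and the graph is heavily simplified. It only assumes that a resource can be deleted once nothing that depends on it remains.

```typescript
// Toy model (assumption): each key depends on the resources listed in its value.
type DependencyGraph = Record<string, string[]>;

// Group resources into deletion "waves": a resource becomes deletable only
// once every resource that depends on it has already been deleted.
function deletionWaves(graph: DependencyGraph): string[][] {
  let remaining = Object.keys(graph);
  const waves: string[][] = [];
  while (remaining.length > 0) {
    // Deletable now: no remaining resource lists the candidate as a dependency.
    const wave = remaining.filter((candidate) =>
      remaining.every((other) => graph[other].indexOf(candidate) === -1),
    );
    if (wave.length === 0) {
      throw new Error("dependency cycle detected");
    }
    remaining = remaining.filter((name) => wave.indexOf(name) === -1);
    waves.push(wave.sort());
  }
  return waves;
}

// Hypothetical, simplified graph: the function depends on the private default
// route, which targets the NAT Gateway, but nothing ties the NAT Gateway to
// the public default route.
const withoutFix: DependencyGraph = {
  InternetGateway: [],
  PublicDefaultRoute: ["InternetGateway"],
  NatGateway: [],
  PrivateDefaultRoute: ["NatGateway"],
  Function: ["PrivateDefaultRoute"],
};

// Same graph with the proposed extra edge: NAT Gateway -> public default route.
const withFix: DependencyGraph = {
  ...withoutFix,
  NatGateway: ["PublicDefaultRoute"],
};

console.log(deletionWaves(withoutFix));
console.log(deletionWaves(withFix));
```

In this toy model the public default route lands in the first deletion wave alongside the function when the extra edge is missing, which mirrors the "public subnet resources start to get deleted immediately" observation. With the extra edge, the route is only deletable after the NAT Gateway is gone.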
I would need to see the dependency structure here to see where we can add a dependency line most conveniently without introducing a cycle. I'm guessing right now
I believe it's this.
I guess the safe solution is to have
I tried to think through this to see if I could identify a better solution than doing this for all But then I realized that theoretically a Lambda function that is used as a Custom Resource (and therefore would get the dependency via And it feels feasible to document "if you use a So I agree that just doing the safer thing by default feels most correct and is closer to what I'd expect as an end user.
…21495)

Because private subnets rely on a NAT Gateway for internet connectivity, it is important that the NAT Gateway have the necessary dependencies for its own internet connectivity. Otherwise, `internetConnectivityEstablished` on a private subnet may not be true during stack creation and deletion. This is most notable for CloudFormation Custom Resources; however, it can result in other dependency failures during stack deletion, especially if resources within a private subnet take a long time to delete.

Ensuring that the NAT Gateway depends on its public subnet having internet connectivity completes the chain of dependencies and ensures that all resources will correctly have internet connectivity.

Because of the layers of abstraction around subnets and NAT gateways, unit tests for this feature are challenging (because there isn't a clear means to get the CloudFormation Logical ID of the AWS::EC2::Route that establishes the connectivity); however, NAT Gateways are included in several integration tests so this dependency can be tested there.

Closes: #21348

----

### All Submissions:

* [X] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [ ] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)?
* [ ] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Describe the bug
When creating a VPC with the default configuration, the NAT Gateway(s) may in some situations be torn down before the private subnets are. This can cause issues for resources that rely on network egress in order to complete successfully (such as a Lambda-backed CloudFormation Custom Resource). Additionally, the private subnets (and resources that depend on them) may be created before the NAT Gateways. This can result in broken initialization logic.
Expected Behavior
It should not be possible for the NAT Gateway to be deleted before the private subnets. The private subnets should depend on the Gateway.
Current Behavior
The NAT Gateway resource, the public subnets, and the internet gateway are deleted.
Reproduction Steps
Building out the necessary infrastructure from here to actually create a custom resource to demonstrate is somewhat non-trivial and would be able to be copied/pasted.
Possible Solution
Whether directly or indirectly, the private subnet should depend on the NAT Gateway. This could also be done by depending on the route. This should be handled already in `configureSubnet`.
aws-cdk/packages/@aws-cdk/aws-ec2/lib/nat.ts, lines 248 to 255 in 2d26916
But the Route Table Association has the references to the Subnet and the Route Table, and the Routes reference the Route Table and the target (NAT Gateway). But there's no reference from the Subnet to any of those resources, so there is no implicit dependency, which breaks network egress for resources in private subnets as the stack is being deleted.
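The reference structure described above can be sketched as a small graph, where an edge means "this resource references that one" and therefore implicitly depends on it. This is an illustrative model only: the resource names and the `dependsOn` helper are hypothetical, the graph omits many real resources (for example the VPC and the public subnets), and nothing here calls the CDK or CloudFormation.

```typescript
// Assumption: an edge A -> B means "A references B", which is what gives A
// an implicit dependency on B.
type ReferenceGraph = Record<string, string[]>;

// Simplified version of the structure described in the issue: the association
// and the route point at the subnet, route table, and NAT Gateway, but the
// subnet itself points at none of them.
const references: ReferenceGraph = {
  PrivateSubnet: [],
  NatGateway: [],
  PrivateRouteTable: [],
  PrivateRouteTableAssociation: ["PrivateSubnet", "PrivateRouteTable"],
  PrivateDefaultRoute: ["PrivateRouteTable", "NatGateway"],
  Function: ["PrivateSubnet"],
};

// Depth-first search: does `from` depend on `to`, directly or transitively?
function dependsOn(graph: ReferenceGraph, from: string, to: string): boolean {
  const seen: string[] = [];
  const stack: string[] = [from];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (current === to) {
      return true;
    }
    if (seen.indexOf(current) !== -1) {
      continue;
    }
    seen.push(current);
    for (const next of graph[current] || []) {
      stack.push(next);
    }
  }
  return false;
}

// A function that only references the subnet never reaches the NAT Gateway.
const implicitOnly = dependsOn(references, "Function", "NatGateway");

// Adding an explicit dependency on the private default route closes the gap.
const withExplicitDependency: ReferenceGraph = {
  ...references,
  Function: ["PrivateSubnet", "PrivateDefaultRoute"],
};
const explicit = dependsOn(withExplicitDependency, "Function", "NatGateway");

console.log({ implicitOnly, explicit });
```

In this model, a resource placed in the private subnet has no path to the NAT Gateway unless it explicitly depends on the route, which is the role an explicit dependency on `internetConnectivityEstablished` is meant to play.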
Maybe this issue is limited to Lambda Functions instead of being a larger problem with EC2? Do more resources need to add a dependency on `Vpc.internetConnectivityEstablished`?
Additional Information/Context
No response
CDK CLI Version
2.33.0
Framework Version
No response
Node.js Version
16
OS
Linux
Language
Typescript
Language Version
No response
Other information
No response