aws_ec2: seems impossible to change subnets in a vpc once deployed #28369
Comments
Yes, the ec2.Vpc class does not offer that kind of granular control, so you will need some customization. We had a similar discussion here for your reference.
@pahud I don't see how the comment in the linked issue helps me. I am not looking for more granular control, I'm just asking for CDK to be able to successfully deploy my changes.
I ended up not needing to add new subnets to my VPC, but later I found I needed to change the CIDR of the VPC to be compatible with other VPCs in the organization (I had previously used the default CIDR). This ends up at the same problem, adding
The problem seems to be centred around the RDS db. For that I have:
rds.DatabaseInstance(
    self,
    id="MySQL",
    engine=rds.DatabaseInstanceEngine.mysql(
        version=rds.MysqlEngineVersion.VER_8_0
    ),
    vpc=vpc,
    vpc_subnets=ec2.SubnetSelection(
        subnet_type=ec2.SubnetType.PRIVATE_ISOLATED
    ),
    ...
)
The error makes it sound like the deployment is creating a new VPC and new subnets, and then maybe there is an implicit SubnetGroup already created for the RDS db, and what fails is trying to put the new subnets in that existing SubnetGroup?
It sounds like the same cause as this issue: hashicorp/terraform-provider-aws#27459 and also hashicorp/terraform-provider-aws#16419
I will see if I can find a workaround along the lines described ... by creating a new SubnetGroup for the new VPC subnets and migrating the RDS db to use that.
First attempt failed:
subnet_group = rds.SubnetGroup(
    self,
    id="MySQL DB Subnet-v1",  # increment this if you need a new subnet group for VPC changes
    description="Manually-defined SubnetGroup to allow VPC modifications.",
    vpc=vpc,
    vpc_subnets=ec2.SubnetSelection(
        subnet_type=ec2.SubnetType.PRIVATE_ISOLATED
    ),
)
rds_instance = rds.DatabaseInstance(
    self,
    id="MySQL",
    engine=rds.DatabaseInstanceEngine.mysql(
        version=rds.MysqlEngineVersion.VER_8_0
    ),
    vpc=vpc,
    subnet_group=subnet_group,
    ...
)
This gives:
Ok right, I also have an RDS Proxy. Updating this to change the id:
rds_proxy = rds_instance.add_proxy(
    "RDS Proxy-v1",  # increment this if you make changes to the VPC (see SubnetGroup)
    secrets=[self.db_admin_credentials],
    vpc=vpc,
    vpc_subnets=ec2.SubnetSelection(
        subnet_type=ec2.SubnetType.PRIVATE_ISOLATED
    ),
    debug_logging=proxy_debug_logging,
    security_groups=[self.db_connection_sg],
)
...gets further and looks promising. But then fails with:
This seems related to
I guessed it might help to give the VPC a new id, to try and force it to generate a new Custom::VpcRestrictDefaultSG instead of apparently re-using the old one, which didn't have permissions against the new VPC (?)
But unfortunately the previous error had left my stack in UPDATE_ROLLBACK_FAILED state, so attempts to proceed further just give:
I think this is true for any resources created using default values for the Vpc construct. If you need to replace a NAT Gateway or change an IP, it is hard to do so because these resources are not exposed. It would be extremely useful to get access to them.
@trobert2 I think in the end I have conflated two related but slightly different issues.
My original problem was adding subnets to an existing VPC, which seems impossible currently. I tried to work around that by creating a second VPC in CDK code and moving my resources into it, but then I hit the second problem, which seems to be around the RDS db construct and subnet groups.
Later I came back and tried to change the CIDR for my VPC... this seems to implicitly create a new VPC and migrate resources to it (which is maybe what CF/CDK should do in the first case too), but that meant it ran into the second problem again, which is what I have described in more detail in the comments above.
Finally I tried to work around the RDS problem (by trying to help/trick CF into doing what it needs to do) but ended up bricking my stack into an UPDATE_ROLLBACK_FAILED state. I tried to recover that via the tips here https://stackoverflow.com/a/72755589/202168 and by manually updating resources to make them "rollback-able", but it didn't seem to be getting anywhere, and in the end I gave up, deleted the stack and redeployed it.
TBH I am kind of anxious about using this IaC tooling in production now. Are there any escape hatches, e.g. for providing a manual deployment plan? My understanding is that even if I manually updated all my resources into the desired state via the AWS web console, CF/CDK would still think they were in a bad state and refuse to deploy. Is there a way to tell CDK "just assume everything's ok now" and reset its state?
IaC tooling seems conceptually similar to database migration tooling. The latter usually provides both auto-generated (from ORM models, akin to CDK code) and manually-defined migration scripts. Is there anything like that for CDK?
I see. Well, in that case the issue isn't really the IaC. Even if you create a VPC manually with a CIDR, give it a subnet and create an instance in it, be that RDS or otherwise, you will have a hard time changing the network stack under the machine. Even if it is possible, there are a lot of operational steps to get something like this done that the tools can't really take care of for the user. I don't think CDK, CloudFormation or any other tool can help with that. Some AWS resources are just more static in nature.
I do feel like CDK could expose more of these resources so that SOME operations can be possible. A lot of those are hidden now. Like in the example with a NAT gateway, it cannot be accessed on the Vpc object.
As for manually changing those resources and then expecting CloudFormation (which lies under CDK) to keep track of it, in a more general sense: you can look at drift detection and see how far the tooling around that helps, but the declarative nature of the system means it expects changes to happen in code first.
You can set the retain policy on the resources you care about and just delete the stack:
I'm not sure that's entirely true. The issues I ran into mostly look like bugs.
E.g. in the first issue CF/CDK said it couldn't do anything, but it seems like it actually needed to set up a new VPC and migrate resources over, which in other cases it is apparently willing to do.
And then when it does do that, there is the second issue, where one of the constructs does not cope with a detail of that migration, when it seems like it could (e.g. tweaking the CDK code to get it to think about the changes differently allows for progress).
I don't mind making these tweaks, or splitting my changes across 2-3 phases. Similar things are required in the case of some database migrations.
The more worrying part is when the rollback didn't work and the stack got into an unrecoverable state. This is essentially a second symptom of the bug which prevented the deployment from succeeding, i.e. some of the resources don't know how to cope with the changes that other resources are making (e.g. they make assumptions they shouldn't about what doesn't change).
And I'm not sure how useful the drift detection is, since when my stack was in UPDATE_ROLLBACK_FAILED state the drift detection said everything was "in sync". It didn't provide any signals about how to get back into a state that can be rolled back or forward.
Instead of deleting the whole stack I could have marked, say, the S3 bucket and RDS db with
But it defeats the point of IaC a bit if I have to manage half the stack manually because it's not safe to let the IaC manage it.
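For anyone weighing the retain-and-delete route discussed above, here is a minimal sketch of a pre-delete sanity check. It assumes you have the synthesized CloudFormation template available as JSON (for example from `cdk synth`) and simply lists stateful resources that do not carry `DeletionPolicy: Retain`. The template fragment, the logical IDs and the `unprotected` helper are hypothetical, made up for illustration; only the `DeletionPolicy` attribute and the resource type names are real CloudFormation. In CDK Python the policy itself is typically set on the construct, e.g. via a `removal_policy=RemovalPolicy.RETAIN` argument.

```python
import json


def unprotected(template, stateful_types):
    """Return logical IDs of stateful resources that lack DeletionPolicy: Retain."""
    return sorted(
        logical_id
        for logical_id, resource in template.get("Resources", {}).items()
        if resource.get("Type") in stateful_types
        and resource.get("DeletionPolicy") != "Retain"
    )


# Hypothetical fragment standing in for real `cdk synth` output.
template = json.loads("""
{
  "Resources": {
    "MySQL":   {"Type": "AWS::RDS::DBInstance", "DeletionPolicy": "Retain"},
    "Uploads": {"Type": "AWS::S3::Bucket", "DeletionPolicy": "Delete"},
    "Vpc":     {"Type": "AWS::EC2::VPC"}
  }
}
""")

# Resource types we would not want removed together with the stack (an assumption).
STATEFUL = {"AWS::RDS::DBInstance", "AWS::S3::Bucket"}

# Anything listed here would be deleted along with the stack.
print(unprotected(template, STATEFUL))
```

This only inspects the template; it does not say anything about how to bring the retained resources back under a new stack afterwards, which is the part that still has to be handled by hand.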
Describe the bug
I started out with a VPC containing only a PRIVATE_ISOLATED subnet
Later I needed to add PRIVATE_WITH_EGRESS and PUBLIC subnets in order to add a NAT gateway
But this fails with:
The CIDR '10.0.2.0/24' conflicts with another subnet
I tried a few things to get around this. Eventually I tried doing it in two stages: 1) setting up a whole second VPC with the three subnets, and 2) moving resources to the second VPC and dropping the original one
But this has problems too:
AWS::RDS::DBSubnetGroup | "The new Subnets are not in the same Vpc as the existing subnet group (Service: Rds, Status Code: 400 ...)
For various reasons I cannot afford to destroy the stack and recreate it
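To illustrate how a conflict like the one on '10.0.2.0/24' can arise, here is a rough model using only Python's `ipaddress` module. It assumes the VPC uses the default 10.0.0.0/16 CIDR, spans three availability zones, and that /24 blocks are handed out sequentially in the order the subnet groups are declared. That allocation order, the group names and the `allocate` / `find_conflicts` helpers are assumptions for the sake of the sketch, not the actual CDK allocator.

```python
import ipaddress


def allocate(vpc_cidr, groups, azs, mask=24):
    """Hand out one /mask block per (group, AZ) pair, in declaration order."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=mask)
    return {(group, az): next(blocks) for group in groups for az in azs}


def find_conflicts(deployed, desired):
    """List blocks a desired subnet needs while a different deployed subnet still holds them."""
    return sorted(
        (str(new_block), new_key, old_key)
        for new_key, new_block in desired.items()
        for old_key, old_block in deployed.items()
        if new_key != old_key and new_block.overlaps(old_block)
    )


AZS = ["a", "b", "c"]  # assumption: three availability zones

# What is deployed today: a single isolated group.
deployed = allocate("10.0.0.0/16", ["isolated"], AZS)

# New groups declared ahead of the existing one shift every block.
prepended = allocate("10.0.0.0/16", ["public", "egress", "isolated"], AZS)

# New groups declared after the existing one leave its blocks where they were.
appended = allocate("10.0.0.0/16", ["isolated", "public", "egress"], AZS)

for cidr, new_key, old_key in find_conflicts(deployed, prepended):
    print(f"{cidr}: wanted by {new_key}, still held by {old_key}")

print("conflicts when appended:", find_conflicts(deployed, appended))
```

In this model the prepended layout asks for blocks that the deployed isolated subnets still occupy, while the appended layout does not. Whether the same holds for a real deployment depends on how the Vpc construct actually assigns CIDRs, so treat it as a way to reason about the error rather than a guaranteed fix.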
Expected Behavior
It should be possible to modify the stack and then deploy the changes
Current Behavior
Stuck in a dead end
Reproduction Steps
and add resources like RDS MySQL db and Lambda functions to it
I tried with and without explicitly adding the cidr_mask=24 specifier to the subnets
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.114.1 (build 02bbb1d)
Framework Version
2.114.1
Node.js Version
v18.18.0
OS
macOS 14.1
Language
Python
Language Version
3.11
Other information
Right now I'd just love suggestions for a workaround, even if the underlying issue can't be fixed any time soon