r/security_group: Add option to forcefully revoke rules before deletion #2074
Conversation
Test results:
Thanks for debugging this! 👍
Just for transparency - as discussed via Slack on Friday, there's no way to identify rules created by EMR, so there's no better way to approach this.
My only two questions:

- Do we really need the customizable timeout? It suggests that to work around the EMR problem the user not only needs to set `revoke_rules_on_delete` but also needs to figure out whether the default timeout is sufficient. I think that for an operation like this Terraform should have a sufficiently high default timeout that nobody needs to tune it. Customizable timeouts are IMO useful for things like larger instance/disk sizes, where it's obvious that the user is doing something unusual (spinning up an unusually big instance) that will naturally take more time than usual.
- Are those VPC sweepers actually going to work at this point? I thought we discussed earlier that the VPC sweeper should come last, as one cannot delete a VPC before deleting all the resources within it (subnets, IGWs, route tables, instances, etc.).
Hey @radeksimko!

Yes. At this point they are scoped so that they would only ever destroy a VPC created in this test, to my knowledge. You are correct that destroying VPCs in the context of sweepers is hard, but I'd like to get it started. You can see here that we depend on sweeping Security Groups, which are also tightly scoped.

My concern is that "at last" will come way too late. I'm starting small, destroying a set of VPCs that are "known" to only have a few conflicting things, in a certain scenario. And even then, maybe not 100%, but I want to start small. I'm OK removing them if you'd like. Also, as I mentioned:

Originally I started with an acceptance test that would fail, and I wanted leaked resources, to fully reproduce what was happening. I wrote sweepers to clean up the mess I left. I misunderstood …
In ec868d3 I removed the
LGTM, assuming green Travis.
@catsby trying to figure out if your change to the documentation here is correct. The … Or at least, all three of my EMR SGs have … Edit: I guess …
Oh, for what it's worth, the issue for me is that Terraform gets stuck deleting the service SG for their ENI in private subnets, which probably means I should have a …
Sorry for the noise, I described my more complete issue in #5413
Add a new `revoke_rules_on_delete` option for `aws_security_group`, which instructs the resource to delete its attached ingress and egress rules before attempting to delete the security group itself. Normally this isn't required, but there are some AWS services that may accept a security group as an input and apply rules to it outside of Terraform's influence. Specifically, the EMR service will automatically apply rules to security groups used as EMR Managed Security Groups, and the service will also re-apply those rules if they are removed via the API, web console, etc. See Amazon EMR–Managed Security Groups for more information about the EMR managed security groups specifically. `revoke_rules_on_delete` is optional, with a default of `false`, so this extra operation is opt-in, as it shouldn't normally be needed. A minimal configuration sketch follows the list below.

This PR contains several things to support this feature:

- a new `revoke_rules_on_delete` attribute, and documentation
- acceptance tests for the `revoke_rules_on_delete` attribute: `TestAccAWSSecurityGroup_forceRevokeRules_true` and `TestAccAWSSecurityGroup_forceRevokeRules_false`
- support for `timeouts` on Security Groups: only `delete` at this time
- updated `emr_cluster` documentation on using `revoke_rules_on_delete` with any security groups used in `emr_cluster.emr_managed_master_security_group` or `emr_cluster.emr_managed_slave_security_group`
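For illustration, here is a minimal sketch of the new attribute; the resource names and interpolations are hypothetical and not taken from this PR:

```hcl
resource "aws_security_group" "example" {
  name   = "example"
  vpc_id = "${aws_vpc.main.id}"

  # Opt in to revoking all attached ingress/egress rules before Terraform
  # attempts to delete the security group itself. Defaults to false.
  revoke_rules_on_delete = true
}
```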
This PR is a patch for issues like #1454, where users cannot destroy an environment that has an EMR cluster in it. The events there are like so:

- `master` and `slave` security groups are created
- an `emr_cluster` is created with `emr_managed_master_security_group` and `emr_managed_slave_security_group` interpolated from the above `master` and `slave` groups, respectively
- the EMR service applies rules to `master` and `slave`, creating a cyclic dependency: `master` depends on `slave` and vice versa. You cannot delete either without first revoking the rules that create the dependency, which Terraform has no authority over, because Terraform sees them as computed attributes of the two respective groups
With `revoke_rules_on_delete` on the `master` and `slave` groups, the EMR cluster destroys successfully, then the rules are revoked and the groups destroy successfully. A configuration sketch of this setup follows at the end of this description.

Couldn't users just specify the necessary rules with `aws_security_group_rule` resources, so Terraform could revoke them?

No; the EMR service applies these rules itself. If a user specifies these rules and they are created before the cluster, the EMR service will likely silently fail to add those rules, as they are already there. When destroying the environment, Terraform revokes the rules and the cluster in parallel (or likely does; there is no guarantee). There is no dependency there: the cluster depends on the groups, and the rules depend on the groups. After the rules are revoked by Terraform, EMR re-applies them. In my testing it takes ~5 minutes to destroy an EMR cluster, and it seems that even after the deletion API call is made, the EMR service is still re-applying those rules. Terraform revokes them, but EMR restores them, and we're stuck in the same situation. In this scenario, with `revoke_rules_on_delete`, the EMR cluster destroys and the EMR service no longer attempts to re-apply those rules if they are removed; however, the rules remain, so `revoke_rules_on_delete` removes them first and then we destroy the groups successfully.
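For reference, here is a trimmed sketch of the scenario above. The resource names are hypothetical and most required `aws_emr_cluster` arguments are omitted; it is meant only to show where `revoke_rules_on_delete` fits, not to be a complete working configuration:

```hcl
resource "aws_security_group" "master" {
  name   = "emr-master"
  vpc_id = "${aws_vpc.main.id}"

  # EMR will add rules referencing the slave group; revoke them at delete time
  # so the cyclic dependency does not block destruction.
  revoke_rules_on_delete = true
}

resource "aws_security_group" "slave" {
  name   = "emr-slave"
  vpc_id = "${aws_vpc.main.id}"

  revoke_rules_on_delete = true
}

resource "aws_emr_cluster" "cluster" {
  name          = "example"
  release_label = "emr-5.9.0"
  service_role  = "${aws_iam_role.emr_service.arn}"

  ec2_attributes {
    subnet_id                         = "${aws_subnet.main.id}"
    instance_profile                  = "${aws_iam_instance_profile.emr_ec2.arn}"
    emr_managed_master_security_group = "${aws_security_group.master.id}"
    emr_managed_slave_security_group  = "${aws_security_group.slave.id}"
  }

  # ... remaining required cluster configuration omitted
}
```

With this arrangement, `terraform destroy` removes the cluster first (it depends on the groups), and the groups can then be deleted because their EMR-applied rules are revoked immediately beforehand.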