
Security group woes launching EMR into a private subnet #5413

Closed
copumpkin opened this issue Aug 1, 2018 · 7 comments
Labels
bug (Addresses a defect in current functionality.) · service/emr (Issues and PRs that pertain to the emr service.) · stale (Old or inactive issues managed by automation; if no further action is taken, these will get closed.)

Comments


copumpkin commented Aug 1, 2018

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform 0.11.7 and AWS provider plugin 1.25.0

Affected Resource(s)

  • aws_emr_cluster
  • aws_security_group

Issue description

Sorry for not providing a full repro here, but it's more of a conceptual problem. Let me describe the scenario:

  1. When launching EMR into a private subnet, EMR requires a third SG (service_access_security_group) so that its own private ENI can reach the master/worker SGs.
  2. You want to create all three SGs in Terraform (as I do).
  3. Creation is fine: you can create all three SGs with no rules in them, because EMR will add the rules itself. It sets up the service SG with symbolic outbound references to the master/worker SGs, and the master/worker SGs with symbolic inbound references to one another and to the service SG.
  4. Then, when you get around to deletion, things break, even with revoke_rules_on_delete = true on all three SGs.

The pain arises because Terraform needs to revoke the rules on all three SGs before it can delete any of them: the groups form a knot of mutual references, and the API won't let us delete the service SG until the master/worker SGs no longer refer to it, but Terraform doesn't know about those mutual references. Furthermore, if I try to help it out with depends_on between the SGs, it complains (correctly) that there's a cycle. But that's because there really is a cycle in the underlying resources, created by AWS.
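For concreteness, here's a minimal sketch of the kind of configuration involved (resource names, instance sizes, and variables are invented for illustration; the SGs deliberately contain no rules so that EMR populates them itself):

```hcl
# Three empty SGs; EMR adds the mutual ingress/egress rules after launch.
resource "aws_security_group" "emr_master" {
  name                   = "emr-master"
  vpc_id                 = "${var.vpc_id}"
  revoke_rules_on_delete = true
}

resource "aws_security_group" "emr_slave" {
  name                   = "emr-slave"
  vpc_id                 = "${var.vpc_id}"
  revoke_rules_on_delete = true
}

resource "aws_security_group" "emr_service_access" {
  name                   = "emr-service-access"
  vpc_id                 = "${var.vpc_id}"
  revoke_rules_on_delete = true
}

# Cluster launched into a private subnet, which is what makes the third
# (service access) SG mandatory.
resource "aws_emr_cluster" "cluster" {
  name          = "private-subnet-cluster"
  release_label = "emr-5.16.0"
  applications  = ["Spark"]
  service_role  = "${var.emr_service_role}"

  ec2_attributes {
    subnet_id                         = "${var.private_subnet_id}"
    instance_profile                  = "${var.emr_instance_profile}"
    emr_managed_master_security_group = "${aws_security_group.emr_master.id}"
    emr_managed_slave_security_group  = "${aws_security_group.emr_slave.id}"
    service_access_security_group     = "${aws_security_group.emr_service_access.id}"
  }

  master_instance_type = "m4.large"
  core_instance_type   = "m4.large"
  core_instance_count  = 1
}
```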

What's a good solution here?

References


copumpkin commented Aug 1, 2018

Here's a thought: when deleting an EMR cluster, Terraform should wait for the cluster to shut down, then revoke all rules from the (managed) SGs associated with it. It's kind of weird, but because the cluster depends on those SGs, Terraform should then be able to proceed and delete them with no issue (the circular rules are gone by that point). Perhaps hide it behind a flag on the cluster, like revoke_rules_on_delete there too, or only delete the rules if the SGs are managed by Terraform, or something like that.
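Purely as a sketch of the interface I have in mind (this flag does not exist on aws_emr_cluster; the name is invented):

```hcl
resource "aws_emr_cluster" "cluster" {
  # ... existing arguments ...

  # Hypothetical flag: on destroy, after the cluster terminates, strip all
  # rules from the EMR-managed SGs referenced in ec2_attributes so those SG
  # resources can then be deleted without hitting a DependencyViolation.
  revoke_managed_security_group_rules_on_delete = true
}
```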

Does that seem too wild?


bflad commented Aug 1, 2018

> when deleting an EMR cluster, terraform should wait for the cluster to shut down

The resource should already be doing this. If it's not, we should update the resource to do so.

@bflad bflad added bug Addresses a defect in current functionality. service/emr Issues and PRs that pertain to the emr service. labels Aug 1, 2018

copumpkin commented Aug 1, 2018

@bflad sorry, I didn't mean to imply that it doesn't. I meant that the TF "delete an EMR cluster" process should wait for the cluster to shut down (it already does this), and then revoke all rules attached to its managed security groups (this is what I'm proposing to add). That's pretty weird for a TF resource to do (usually there's a better separation of concerns, and deleting the SG rules would be up to the SG resources), but I don't see any other clean way to make this work.


copumpkin commented Aug 1, 2018

Another option here might be to expand the meaning of aws_security_group's revoke_rules_on_delete attribute to also mean "delete rules in other security groups that refer to me". By using the ip-permission.group-id and egress.ip-permission.group-id filters on DescribeSecurityGroups, that could be a fairly efficient operation. cc @catsby as the author of revoke_rules_on_delete
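As a sketch of how those filters map onto Terraform, something like the following could enumerate the groups that reference a given SG (this assumes a provider version that ships the aws_security_groups data source, which wasn't available when this issue was opened):

```hcl
# Sketch only: look up security groups whose ingress or egress rules
# reference the EMR service-access SG, via the same EC2
# DescribeSecurityGroups filters mentioned above.
data "aws_security_groups" "referencing_ingress" {
  filter {
    name   = "ip-permission.group-id"
    values = ["${aws_security_group.emr_service_access.id}"]
  }
}

data "aws_security_groups" "referencing_egress" {
  filter {
    name   = "egress.ip-permission.group-id"
    values = ["${aws_security_group.emr_service_access.id}"]
  }
}

output "groups_referencing_service_access" {
  value = "${concat(data.aws_security_groups.referencing_ingress.ids, data.aws_security_groups.referencing_egress.ids)}"
}
```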


merkleary commented Jul 23, 2019

Was this issue ever resolved? Found this thread while experiencing the same thing.

Edit: apparently so. revoke_rules_on_delete = true worked for me.
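For anyone landing here later, the attribute mentioned above is set on each of the EMR-managed security groups; a minimal illustration (names and variables are placeholders):

```hcl
resource "aws_security_group" "emr_service_access" {
  name   = "emr-service-access"
  vpc_id = "${var.vpc_id}"

  # Revoke this group's own ingress/egress rules before Terraform attempts
  # to delete the group itself.
  revoke_rules_on_delete = true
}
```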

@github-actions

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

@github-actions github-actions bot added the stale Old or inactive issues managed by automation, if no further action taken these will get closed. label Jul 17, 2021
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 10, 2022