Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Ensure aws_eks_addons with before_compute flag get destoyed after compute resources #2743

Conversation

bmihalache
Copy link

Description

With the added dependency, the aws_eks_addon.before_compute resources get destroyed after the node groups.

Motivation and Context

Destroying the vpc-cni EKS addon before destroying the node groups may result in leaked EC2 network interfaces.

Current behavior:

module.eks.aws_eks_addon.before_compute["vpc-cni"]: Destruction complete after 2s
module.eks.module.eks_managed_node_group["test"].aws_eks_node_group.this[0]: Destroying...

Expected behavior:

module.eks.module.eks_managed_node_group["default_b"].aws_eks_node_group.this[0]: Destruction complete after 2m37s
module.eks.module.eks_managed_node_group["test"].aws_launch_template.this[0]: Destroying... 
module.eks.module.eks_managed_node_group["test"].aws_launch_template.this[0]: Destruction complete after 0s
module.eks.module.eks_managed_node_group["test"].aws_iam_role_policy_attachment.this["arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"]: Destroying... 
module.eks.module.eks_managed_node_group["test"].aws_iam_role_policy_attachment.this["arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"]: Destroying...
module.eks.module.eks_managed_node_group["test"].aws_iam_role_policy_attachment.this["arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"]: Destroying... 
module.eks.time_sleep.this[0]: Destroying... 
module.eks.time_sleep.this[0]: Destruction complete after 0s
module.eks.aws_eks_addon.before_compute["vpc-cni"]: Destroying... 

Breaking Changes

NA

How Has This Been Tested?

I have manually tested the change by provisioning and then destroying a Terraform module with AWS VPC and AWS EKS cluster with multiple node groups and the vpc-cni configured as a "before_compute" addon.

@bryantbiggs
Copy link
Member

I am familiar with the VPC CNI leaking ENIs, but I don't believe this will solve that (or have the intended effect)

Do you have proof or supporting evidence to support this change?

@bmihalache
Copy link
Author

Do you have proof or supporting evidence to support this change?

We experienced frequent ENI leaks in automated tests right after we enabled managed EKS addon for vpc-cni (a lot more than usual). Using this fix I could not reproduce the issue with 4 or 5 provision/destroy cycles.

The expectation is not to solve the ENI leaking issue entirely, but (considerable) reduce the frequency of its occurrence.

When the vpc-cni addon is destroyed, aws-node pods are terminated on all nodes. Can't this explain the issue, at least partially?

Regardless of ENI leaking issue, I do think the explicit dependency makes sense. It guarantees that "before_compute" EKS addons are provisioned before compute resources and destroyed after.

@bryantbiggs
Copy link
Member

The VPC CNI is known to leak ENIs, I don't know that this will solve that and instead would recommend that you set preserve = true on the VPC CNI addon configuration. When you run terraform destroy this will leave the VPC CNI running on the cluster instead of having Terraform delete it, allowing it to handle the ENI cleanup

@bryantbiggs
Copy link
Member

Note: preserve = true is going to be the default setting for addons on the next v20.x release

@bmihalache
Copy link
Author

set preserve = true on the VPC CNI addon configuration

Thank you for the info. This resolves our issue and makes this PR kind of redundant.

@bmihalache bmihalache closed this Sep 14, 2023
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants