fix(eks): Self managed nodes cannot be added to LoadBalancers created via the LoadBalancer service type #12269

Merged: 6 commits into master from epolon/eks-multiple-owned-sgs, Dec 31, 2020


@iliapolo iliapolo commented Dec 29, 2020

Following this PR, self managed nodes are now attached to the cluster security group. As a result, self managed nodes end up with multiple security groups carrying the "owned" tag. Load balancers then reject these instances, because it is impossible to determine which security group should receive the ingress rules that allow the load balancer to connect to the instances.

The fix is to stop tagging the dedicated ASG security group with this tag. It is no longer necessary, since the cluster security group has that tag by default.
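
As a rough illustration of the ambiguity described above, here is a minimal TypeScript model of picking the security group that should receive the load balancer ingress rules. It is a sketch, not the actual load balancer controller code: the `SecurityGroup` shape, the function names, and the sample IDs are all hypothetical, and the `kubernetes.io/cluster/<name>` tag key is assumed to be the key behind the "owned" tag.

```typescript
// Hypothetical model of a security group and its tags.
interface SecurityGroup {
  id: string;
  tags: Record<string, string>;
}

// Assumed tag key format for resources belonging to a cluster.
function clusterTagKey(clusterName: string): string {
  return `kubernetes.io/cluster/${clusterName}`;
}

// Returns the single "owned" security group of an instance.
// Throws when there is none, or when more than one makes the choice ambiguous.
function pickOwnedSecurityGroup(groups: SecurityGroup[], clusterName: string): SecurityGroup {
  const key = clusterTagKey(clusterName);
  const [first, ...rest] = groups.filter((g) => g.tags[key] === 'owned');
  if (first === undefined) {
    throw new Error('no owned security group attached to the instance');
  }
  if (rest.length > 0) {
    const ids = [first, ...rest].map((g) => g.id).join(', ');
    throw new Error(`multiple owned security groups attached to the instance: ${ids}`);
  }
  return first;
}

// Sample data: the cluster security group is tagged by default.
const clusterSg: SecurityGroup = { id: 'sg-cluster', tags: { [clusterTagKey('demo')]: 'owned' } };
// Before the fix: the dedicated ASG security group is tagged as well.
const asgSgTagged: SecurityGroup = { id: 'sg-asg', tags: { [clusterTagKey('demo')]: 'owned' } };
// After the fix: the dedicated ASG security group no longer carries the tag.
const asgSgUntagged: SecurityGroup = { id: 'sg-asg', tags: {} };
```

With both groups tagged, `pickOwnedSecurityGroup([clusterSg, asgSgTagged], 'demo')` throws. With only the cluster security group tagged, it returns `clusterSg`.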

Fixes #12166

This breakage is unfortunate, but I can't see a way out of it. And it does actually fix a bug.

BREAKING CHANGE: Existing self managed nodes may lose the ability to host additional services of type LoadBalancer. See #12269 (comment) for possible mitigations.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Dec 29, 2020
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Dec 29, 2020
@iliapolo
Contributor Author

iliapolo commented Dec 29, 2020

After applying this fix, the dedicated security group of self managed nodes will no longer be tagged with the "owned" tag.
This will prevent these nodes from being added to load balancers because they will not have any owned security groups attached to them, effectively causing the same phenomenon this PR is meant to solve.

Note: This only applies to self managed nodes launched with a version prior to 1.79.0. Nodes provisioned following that version will have the cluster security group attached to them, which has this tag by default.

While this will not cause any immediate disruption, it does pose an operational risk for future service deployments. The proper setup is for each node to have exactly one owned security group attached.

We suggest one of the following approaches to achieve that:

  • Deploy the fix and replace the existing nodes (either manually or via an instance refresh) (recommended)

  • Temporarily configure the updatePolicy of the auto-scaling group to either UpdatePolicy.replacingUpdate or UpdatePolicy.rollingUpdate and deploy the fix. Wait for the instances to be refreshed, and revert the policy back.

  • Manually attach the cluster security group to the existing instances.
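
The second option above could look roughly like the following CDK (v1) sketch. The stack name, construct IDs, instance type, and Kubernetes version are placeholders chosen for illustration, not values from this PR. It assumes the `@aws-cdk/core`, `@aws-cdk/aws-autoscaling`, `@aws-cdk/aws-ec2`, and `@aws-cdk/aws-eks` packages at a version that includes this fix.

```typescript
import * as autoscaling from '@aws-cdk/aws-autoscaling';
import * as ec2 from '@aws-cdk/aws-ec2';
import * as eks from '@aws-cdk/aws-eks';
import * as cdk from '@aws-cdk/core';

const app = new cdk.App();
// Placeholder stack name.
const stack = new cdk.Stack(app, 'EksStack');

// Placeholder cluster with no default capacity, so only self managed nodes are added.
const cluster = new eks.Cluster(stack, 'Cluster', {
  version: eks.KubernetesVersion.V1_18,
  defaultCapacity: 0,
});

// Temporarily set an update policy so that deploying the fix replaces the
// existing instances. Revert this property once the instances are refreshed.
cluster.addAutoScalingGroupCapacity('SelfManagedNodes', {
  instanceType: new ec2.InstanceType('t3.medium'),
  updatePolicy: autoscaling.UpdatePolicy.rollingUpdate(),
});

app.synth();
```

`UpdatePolicy.replacingUpdate()` can be substituted on the same property if replacing the whole group is acceptable.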

@iliapolo iliapolo requested a review from eladb December 29, 2020 17:54
@iliapolo iliapolo changed the title fix(eks): Self managed nodes cannot host services of type LoadBalancer fix(eks): Self managed nodes cannot be added to LoadBalancers created via the LoadBalancer service type Dec 30, 2020
@iliapolo
Contributor Author

@eladb I failed miserably in trying to get the integ test to deterministically cover this scenario.

To make sure it's tested properly, we need to force the load balancer to connect only to self-managed nodes, since this is what fails and will cause the service not to be deployed.

Supposedly this is done by attaching labels to the nodes (i.e. self-managed=true) and adding an annotation to the service:

'service.beta.kubernetes.io/aws-load-balancer-target-node-labels': 'self-managed=true'

However, this doesn't seem to work. I'm probably missing something, but I figured it's not worth delaying this PR for. I verified the expected behavior manually, and we also have a unit test and the current integ test to make sure we don't attach the unnecessary tag.
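
For reference, the annotation approach described above can be sketched as a plain Service manifest object. The helper name, service name, selector, and ports below are hypothetical. As noted above, this setup did not have the intended effect in the integ test, so treat it as an illustration of the attempt rather than a working recipe.

```typescript
// Hypothetical helper: builds a Service manifest of type LoadBalancer that asks,
// via annotation, for only nodes carrying the given label to be targeted.
function loadBalancerServiceManifest(name: string, nodeLabel: string) {
  return {
    apiVersion: 'v1',
    kind: 'Service',
    metadata: {
      name,
      annotations: {
        'service.beta.kubernetes.io/aws-load-balancer-target-node-labels': nodeLabel,
      },
    },
    spec: {
      type: 'LoadBalancer',
      selector: { app: name }, // placeholder selector
      ports: [{ port: 80, targetPort: 80 }], // placeholder ports
    },
  };
}

// The label must also be set on the self managed nodes themselves.
const manifest = loadBalancerServiceManifest('hello', 'self-managed=true');
```

In a CDK app, an object like this would typically be handed to the cluster's `addManifest` method.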

I feel we can merge this.

Let me know if you think differently.

@iliapolo iliapolo added the pr/do-not-merge This PR should not be merged at this time. label Dec 31, 2020
@iliapolo iliapolo marked this pull request as ready for review December 31, 2020 18:59
@iliapolo iliapolo requested a review from eladb December 31, 2020 18:59
@eladb
Contributor

eladb commented Dec 31, 2020

Yeah sounds reasonable

@iliapolo iliapolo removed the pr/do-not-merge This PR should not be merged at this time. label Dec 31, 2020
@mergify
Contributor

mergify bot commented Dec 31, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 470a881 into master Dec 31, 2020
@mergify mergify bot deleted the epolon/eks-multiple-owned-sgs branch December 31, 2020 20:41
@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: c88d42b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

flochaz pushed a commit to flochaz/aws-cdk that referenced this pull request Jan 5, 2021
… via the `LoadBalancer` service type (aws#12269)

Following this [PR](aws#12042), self managed nodes are now attached to the cluster security group. As a result, self managed nodes end up with multiple security groups carrying the "owned" tag. Load balancers then reject these instances, because it is impossible to determine which security group should receive the ingress rules that allow the load balancer to connect to the instances.

The fix is to stop tagging the dedicated ASG security group with this tag. It is no longer necessary, since the cluster security group has that tag by default.

Fixes aws#12166

This breakage is unfortunate, but I can't see a way out of it. And it does actually fix a bug.

BREAKING CHANGE: Existing self managed nodes may lose the ability to host additional services of type `LoadBalancer`. See aws#12269 (comment) for possible mitigations.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Successfully merging this pull request may close these issues.

(eks): LoadBalancer type of Service failed to start in the EKS cluster created by CDK