[EKS] [request]: EKS managed node group support for ASG target group #709

Open
chingyi-lin opened this issue Jan 21, 2020 · 29 comments
Labels: EKS Managed Nodes, EKS (Amazon Elastic Kubernetes Service)

Comments

@chingyi-lin

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
The ability to attach a load balancer target group to the ASG created by an EKS managed node group, at cluster creation time, with CloudFormation.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We used to create an unmanaged node group with an ASG and a classic load balancer in the same CloudFormation stack. We used !Ref to attach the load balancer to the ASG via TargetGroupARNs (see the sketch below). However, this configuration is not available for EKS managed node groups at cluster creation today.
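
For comparison, here is a minimal sketch of the unmanaged pattern described above, written in Terraform since most workarounds in this thread use it (the CloudFormation equivalent is the TargetGroupARNs property on AWS::AutoScaling::AutoScalingGroup); all names and variables are illustrative:

resource "aws_autoscaling_group" "unmanaged_nodes" {
  name                = "unmanaged-eks-nodes"
  min_size            = 2
  max_size            = 4
  vpc_zone_identifier = var.private_subnet_ids # hypothetical variable
  target_group_arns   = [aws_lb_target_group.nodes.arn] # attached at creation time

  launch_template {
    id      = aws_launch_template.nodes.id
    version = "$Latest"
  }
}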

Are you currently working around this issue?
We need to separate the creation of the cluster and the load balancer into two stacks, even though they have the same lifecycle. In addition, we are not sure whether this modification to the ASG is allowed and supported, since the ASG is managed by EKS.

@tabern
Contributor

tabern commented Mar 2, 2020

@chingyi-lin can you help clarify your use case for this configuration vs. creating a Kubernetes service type=LoadBalancer?

@yann-soubeyrand

@tabern Unless I'm mistaken, one cannot use a single NLB for several K8s services of type LoadBalancer. For example, we want to be able to point ports 80 and 443 at our ingress controller service, but we also want port 22 to go to the SSH service of our GitLab.

We also want to be able to share our NLB between classic EC2 instances and an EKS cluster, to do a zero-downtime migration of a stateless application running on EC2 instances to the same application running on an EKS cluster.

And our last use case is sharing an NLB between two EKS clusters (blue and green) to be able to seamlessly switch from one to the other (when we have big changes to bring to our cluster, we prefer spawning a new cluster and switching to it after having tested that it works as intended).

@dawidmalina

I have a workaround in Terraform (a bit tricky, but it works):

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  autoscaling_group_name = lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")
  alb_target_group_arn   = var.TARGET_GROUP_ARN
}

@guigo2k

guigo2k commented Apr 9, 2020

@dawidmalina your workaround works for adding the autoscaling instances to the load balancer target group; however, the ALB can't reach the node group.

HTTP/2 504 
server: awselb/2.0
date: Thu, 09 Apr 2020 18:53:46 GMT
content-type: text/html
content-length: 148

<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>

@jodem

jodem commented Apr 27, 2020

Another workaround I plan to test is adding postStart and preStop lifecycle hooks to the pod (https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/) with a small command that registers/deregisters the node from the target group using the AWS CLI. You can easily get the instance ID from within the container (wget -q -O - http://169.254.169.254/latest/meta-data/instance-id) and use it with aws elbv2 register-targets. A sketch of the idea follows.
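
An untested sketch of that idea, expressed with Terraform's kubernetes provider to stay in the language used throughout this thread; the image, the TG_ARN variable, and all names are hypothetical, and the node role would need permission to call the elbv2 APIs:

resource "kubernetes_deployment" "app" {
  metadata {
    name = "app"
  }
  spec {
    selector {
      match_labels = { app = "app" }
    }
    template {
      metadata {
        labels = { app = "app" }
      }
      spec {
        container {
          name  = "app"
          image = "example/app:latest" # hypothetical image that ships the AWS CLI
          env {
            name  = "TG_ARN"
            value = var.target_group_arn # hypothetical variable
          }
          lifecycle {
            post_start {
              exec {
                # Register this pod's node with the target group on start.
                command = ["/bin/sh", "-c", "aws elbv2 register-targets --target-group-arn $TG_ARN --targets Id=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)"]
              }
            }
            pre_stop {
              exec {
                # Deregister the node before the pod stops.
                command = ["/bin/sh", "-c", "aws elbv2 deregister-targets --target-group-arn $TG_ARN --targets Id=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)"]
              }
            }
          }
        }
      }
    }
  }
}

One caveat: with several replicas per node, the first pod to stop would deregister the whole node, so this sketch really only fits one-pod-per-node (DaemonSet-like) workloads.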

@mikestef9 added the EKS Managed Nodes label and removed the Proposed (Community submitted issue) label on Jun 11, 2020
@mikestef9
Contributor

mikestef9 commented Sep 16, 2020

Hey all, please take a look at the TargetGroupBinding CRD included in the v2 release candidate of the ALB ingress controller

https://github.com/kubernetes-sigs/aws-alb-ingress-controller/releases/tag/v2.0.0-rc0

We believe this will address the feature request described in this issue, and are looking for feedback.
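
For readers who want to see what that looks like, here is a minimal sketch of a TargetGroupBinding, written with the kubernetes provider's kubernetes_manifest resource to stay in Terraform. Names are hypothetical, and the apiVersion shown is the v1beta1 the controller later settled on (the v2.0.0-rc0 linked above shipped an earlier alpha version):

resource "kubernetes_manifest" "tgb" {
  manifest = {
    apiVersion = "elbv2.k8s.aws/v1beta1"
    kind       = "TargetGroupBinding"
    metadata = {
      name      = "my-service-tgb" # hypothetical
      namespace = "default"
    }
    spec = {
      # Route the target group's traffic to this Service's endpoints.
      serviceRef = {
        name = "my-service" # hypothetical Service
        port = 80
      }
      targetGroupARN = aws_lb_target_group.tg.arn
      targetType     = "instance"
    }
  }
}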

@yann-soubeyrand

Hi @mikestef9, thanks for the update. Unfortunately, this does not address our use cases outlined in this comment #709 (comment).

@adamrbennett

We also need this to support services of type: NodePort.

@M00nF1sh

M00nF1sh commented Oct 5, 2020

@yann-soubeyrand are you trying to use multiple ASGs in a single TargetGroup? Otherwise, TargetGroupBinding should solve it.

@yann-soubeyrand

@M00nF1sh isn't TargetGroupBinding meant for use with the ALB ingress controller only? We use an NLB with the Istio ingress gateway. And yes, we need to put two ASGs in a single target group for certain operations requiring zero downtime.

@M00nF1sh

M00nF1sh commented Oct 5, 2020

@yann-soubeyrand
It supports both ALB and NLB target groups; we'll rename the ALB ingress controller to the AWS Load Balancer Controller soon.
Currently, when using instance-type target groups, it only supports using all nodes in your cluster as backends, but we'll add a nodeSelector in the future, so if your two ASGs are in the same cluster, it will be supported (we won't support two ASGs in different clusters).

@yann-soubeyrand

@M00nF1sh sorry for the late reply. We need to be able to put two ASGs from different clusters in a single target group. This is how we do certain migrations requiring rebuilding a whole cluster.

@lilley2412

A null_resource is working for me. I have validated that aws_eks_node_group does not see the attached target group as a change, and when changes are made it leaves the attachment preserved.

resource "null_resource" "managed_node_asg_nlb_attach" {

  triggers = {
    asg = aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name
  }

  provisioner "local-exec" {
    command = "aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name '${aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name}' --target-group-arns '${aws_lb_target_group.tg.arn}' '${aws_lb_target_group.tg2.arn}'"
  }
}

@netguino

netguino commented Apr 10, 2021

@dawidmalina

I have a workaround in Terraform (a bit tricky, but it works):

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  autoscaling_group_name = lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")
  alb_target_group_arn   = var.TARGET_GROUP_ARN
}

Thank you so much for this workaround. Totally made my week by helping me solve a very annoying problem we've been having for so long!

@ddvdozuki

A null_resource is working for me. I have validated that aws_eks_node_group does not see the attached target group as a change, and when changes are made it leaves the attachment preserved.

resource "null_resource" "managed_node_asg_nlb_attach" {

  triggers = {
    asg = aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name
  }

  provisioner "local-exec" {
    command = "aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name '${aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name}' --target-group-arns '${aws_lb_target_group.tg.arn}' '${aws_lb_target_group.tg2.arn}'"
  }
}

Thank you for this workaround, but it seems to leave behind ENIs and SGs that prevent VPC destruction, because it creates resources outside of Terraform's knowledge. Is there any way to achieve this with an NLB without using a null provisioner? Or some way to have an on-delete provisioner that does the cleanup? (See the sketch below.)
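
One hedged answer to the on-delete question: Terraform does have destroy-time provisioners, which can at least handle the detach half of the cleanup. An untested sketch, reusing the resource names from the null_resource above (destroy provisioners may only reference self, hence the triggers):

resource "null_resource" "managed_node_asg_nlb_detach" {
  triggers = {
    asg = aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name
    tgs = "${aws_lb_target_group.tg.arn} ${aws_lb_target_group.tg2.arn}"
  }

  provisioner "local-exec" {
    when = destroy
    # Detach the target groups before the rest of the teardown proceeds.
    command = "aws autoscaling detach-load-balancer-target-groups --auto-scaling-group-name '${self.triggers.asg}' --target-group-arns ${self.triggers.tgs}"
  }
}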


@nogara

nogara commented Oct 6, 2021

@dawidmalina your workaround works for adding the autoscaling instances to the load balancer target group; however, the ALB can't reach the node group.

HTTP/2 504 
server: awselb/2.0
date: Thu, 09 Apr 2020 18:53:46 GMT
content-type: text/html
content-length: 148

<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>

I used @dawidmalina's answer, and also opened up the NodePort to the ALB's SG using:

resource "aws_security_group_rule" "example" {
  type              = "ingress"
  from_port         = {nodeport}
  to_port           =   {nodeport}
  protocol          = "tcp"
  source_security_group_id = {ALB's security group}
  security_group_id = {target's security group}
}

@otterley

otterley commented Oct 19, 2021

Attaching Load Balancers to Auto Scaling Group instances, as opposed to instance IP addresses and ports, was a design pattern that made a lot of sense back when the instances in ASGs were configured exactly alike -- typically there was an application stack running on each instance that had identical software, listened on the same ports, served the same traffic for each, etc.

But with containers, that pattern generally no longer holds true: each instance could be (and usually is) running completely different applications, listening on different ports or even different interfaces. In the latter design, instances are now heterogeneous. The Auto Scaling Group no longer implies homogeneity; it's now merely a scalable capacity provider for memory, CPUs, GPUs, network interfaces, etc. As a consequence, we no longer think of an instance as a backend (i.e., a load balancer target); today, we consider an IP:port tuple to be a backend instead.

I've heard a few justifications for hanging on to the historical functionality, despite the evolution. So I'm curious: for those of you dealing with this issue, is there a particular reason you're not using DNS to handle migrations of applications between clusters (each with their own ingress LBs) for north-south traffic, and/or using some sort of service mesh (App Mesh, Istio, Linkerd, etc.) to handle migrations for east-west traffic? These are what we prescribe as best practices today.

@ddvdozuki

@otterley Yea, because we are migrating an app off bare metal and onto k8s. We have all those fancy things on the roadmap (service mesh, ingress controllers, DNS, etc.), but we're in the middle of moving a decades-old application and trying the best we can to make it cloud-native, and there's a lot of uncoupling to do. In the meantime we need to leverage the "old ways" to allow us to transition. It's rare to be able to start with a fresh new project and do everything right from the beginning. We rely on ASGs to allow us to continue using k8s with our old VM-in-a-container images.

@otterley

@ddvdozuki Thanks for the insight. Since you're still in transition, might I recommend you use unmanaged node groups instead? That will allow you to retain the functionality you need during your migration. Then, after you have migrated to the next generation of load balancers using the Load Balancer Controller's built-in Ingress support (and cut over DNS), you can attach a new Managed Node Group to your cluster, migrate your pods, and the load balancer will continue to send them traffic. The controller will ensure that the target IP and port follows the pod as it moves. Once all your pods have migrated to Managed Node Groups, you can tear down the unmanaged node groups.

@mwalsher

We have a single DNS entry point (e.g., api.example.com) that points to a single ALB, with a Target Group that points to our Traefik entrypoint. Traefik runs as a DaemonSet on each Node and is then used to route requests to the appropriate service/pod. There may well be a better approach to this, which I'd be curious to hear, but this is working well for us.

@ddvdozuki

@mwalsher It sounds like you might have a redundant layer there. The k8s Service can do most of what Traefik does as far as routing and pod selection. We use the same setup you have, but without any additional layer in between: just an LB pointing at the node port for the service, and the service has selectors for the proper pods.

@mwalsher

@ddvdozuki interesting, thanks for the info. Can we route e.g. api.example.com/contacts to our Contacts microservice and api.example.com/accounts to our Accounts microservice using k8s Service routing?

I took a quick look at the k8s Service docs and don't see anything on path-based routing, but it is probable that my ☕ hasn't kicked in yet.

We are also using some Traefik middleware (StripPrefix and ForwardAuth).

I suppose we could use the ALB for routing to the appropriate TG/Service port. Perhaps that's what you meant? But we'd still need the aforementioned middleware...

@daroga0002

Yes, you need the middleware, but the general practice is to use an ingress controller exposed through a LoadBalancer service. Running such middleware as a DaemonSet is just impractical when you have more nodes, because you waste resources.

@charles-d-burton

There's also our use case, which is more akin to @mwalsher's. We create and destroy namespaces near constantly: every CI branch creates a new namespace with a full (scaled-down) copy of our software stack, which lets our engineers connect their IDE to that running stack and develop against it in isolation from each other. So we have an Nginx ingress controller that can handle that kind of churn, meaning we create and destroy up to dozens of namespaces per day, each with a unique URL and certificate. This is all behind an NLB currently so Cert Manager can provision certs for these namespaces on the fly. Provisioning a load balancer per namespace in that use case is really expensive, both monetarily and in the delay in wiring up our system, not to mention it makes the domains pretty hard to deal with.

@antonmatsiuk

I've heard a few justifications for hanging on to the historical functionality, despite the evolution. So I'm curious: for those of you dealing with this issue, is there a particular reason you're not using DNS to handle migrations of applications between clusters (each with their own ingress LBs) for north-south traffic, and/or using some sort of service mesh (App Mesh, Istio, Linkerd, etc.) to handle migrations for east-west traffic? These are what we prescribe as best practices today.

Another use case for this is a VoIP application on the nodes that handles 20k UDP ports. You can't solve that with "Service: LoadBalancer" at the moment. The only option is to use hostNetwork: true in the application and a network LB in front of the eks_managed_node_group to load-balance the UDP traffic to the app. A rough sketch follows.
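
A minimal sketch of that hostNetwork arrangement, again via Terraform's kubernetes provider for consistency with the rest of this thread (image and names hypothetical); the NLB side is the same ASG/target-group attachment shown earlier, and in practice this would likely be a DaemonSet rather than a bare pod:

resource "kubernetes_pod" "voip" {
  metadata {
    name = "voip"
  }
  spec {
    host_network = true # bind the app's UDP ports directly on the node
    container {
      name  = "voip"
      image = "example/voip:latest" # hypothetical image
    }
  }
}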

@carlosjgp

I have a workaround in Terraform (a bit tricky, but it works):

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  autoscaling_group_name = lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")
  alb_target_group_arn   = var.TARGET_GROUP_ARN
}

Sadly, this workaround only works if you first create the aws_eks_node_group, which dynamically creates the autoscaling group, whose name is not fixed:

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  for_each = {
    for permutation in setproduct(
      # flatten(aws_eks_node_group.node_group.resources[*].autoscaling_groups[*].name)
      [lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")],
      var.target_group_arns,
    ) :
    permutation[0] => permutation[1]
  }
  autoscaling_group_name = each.key
  lb_target_group_arn    = each.value
}

When I add a new node group and attach the target group using this method, I get:

The "for_each" map includes keys derived from resource attributes that cannot be determined until apply, and so Terraform cannot determine the full set of keys that will identify the instances of this resource.

And using the AWS CLI with a null_resource is rather messy and leaves "orphan" resources.

Is the aws_eks_node_group resource designed to work only with the AWS Load Balancer Controller?

We also want to disable the AZRebalance process, which also has to be done through the CLI ☠️

This is the full hack we were considering, but I think we are going to backtrack to ASGs:

resource "null_resource" "nodegroup_asg_hack" {
  triggers = merge(
    var.asg_force_patching_suspended_processes ? {
      timestamp = timestamp()
    } : {},
    {
      asg_suspended_processes = join(",", var.asg_suspended_processes)
      asg_names               = join("", module.eks_managed_node_group.node_group_autoscaling_group_names)
    }
  )

  provisioner "local-exec" {
    interpreter = ["/bin/sh", "-c"]
    environment = {
      AWS_DEFAULT_REGION = local.aws_region
    }
    command = <<EOF
set -e

$(aws sts assume-role --role-arn "${data.aws_iam_session_context.current.issuer_arn}" --role-session-name terraform_asg_no_cap_rebalance --query 'Credentials.[`export#AWS_ACCESS_KEY_ID=`,AccessKeyId,`#AWS_SECRET_ACCESS_KEY=`,SecretAccessKey,`#AWS_SESSION_TOKEN=`,SessionToken]' --output text | sed $'s/\t//g' | sed 's/#/ /g')

for asg_name in ${join(" ", formatlist("'%s'", module.eks_managed_node_group.node_group_autoscaling_group_names))} ; do
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name $${asg_name} \
    --no-capacity-rebalance

  aws autoscaling suspend-processes \
    --auto-scaling-group-name $${asg_name} \
    --scaling-processes ${join(" ", var.asg_suspended_processes)}

%{if length(var.target_group_arns) > 0~}
  aws autoscaling attach-load-balancer-target-groups \
    --auto-scaling-group-name $${asg_name} \
    --target-group-arns ${join(" ", formatlist("'%s'", var.target_group_arns))}
%{endif~}
done
EOF
  }
}

@kr3cj

kr3cj commented Jan 4, 2024

Another workaround I plan to test is to add postStart and preStop lifecycle event...

Did you ever get that working, @jodem?

@jodem

jodem commented Jan 8, 2024

Another workaround I plan to test is to add postStart and preStop lifecycle event...

Did you ever get that working, @jodem?

Hello, I ended up using "aws_autoscaling_attachment" in Terraform:

resource "aws_autoscaling_attachment" "ingress_attach" {
  count = ( var.attachToTargetGroup  ? length(var.targetGroupARNToAssociate) : 0)
  autoscaling_group_name = aws_eks_node_group.multi_tenant_worker_nodegroup.resources[0].autoscaling_groups[0].name
  lb_target_group_arn = var.targetGroupARNToAssociate[count.index]
}
