[EKS] [request]: EKS managed node group support for ASG target group #709

Open
chingyi-lin opened this issue Jan 21, 2020 · 29 comments
Labels: EKS Managed Nodes, EKS (Amazon Elastic Kubernetes Service)

Comments

@chingyi-lin

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
The ability to attach a load balancer target group to the ASG created by an EKS managed node group, at cluster creation time, with CloudFormation.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We used to create an unmanaged node group with an ASG and a classic load balancer in the same CloudFormation stack. We used !Ref to attach the load balancer to the ASG via TargetGroupARNs (see the sketch below). However, this configuration is not available for EKS managed node groups at cluster creation today.
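
For comparison, here is a minimal sketch of the unmanaged pattern described above, written in Terraform since most workarounds in this thread use it (the CloudFormation equivalent is the TargetGroupARNs property on AWS::AutoScaling::AutoScalingGroup); all names and variables are illustrative:

resource "aws_autoscaling_group" "unmanaged_nodes" {
  name                = "unmanaged-eks-nodes"
  min_size            = 2
  max_size            = 4
  vpc_zone_identifier = var.private_subnet_ids # hypothetical variable
  target_group_arns   = [aws_lb_target_group.nodes.arn] # attached at creation time

  launch_template {
    id      = aws_launch_template.nodes.id
    version = "$Latest"
  }
}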

Are you currently working around this issue?
We need to separate the creation of the cluster and the load balancer into two stacks, even though they have the same lifecycle. In addition, we are not sure whether this modification to the ASG is allowed and supported, since the ASG is managed by EKS.

@tabern
Contributor

tabern commented Mar 2, 2020

@chingyi-lin can you help clarify your use case for this configuration vs. creating a Kubernetes service type=LoadBalancer?

@yann-soubeyrand

@tabern Unless I'm mistaken, one cannot use a single NLB for several K8s services of type LoadBalancer. For example, we want to be able to point ports 80 and 443 at our ingress controller service, but we also want port 22 to go to the SSH service of our GitLab.

We also want to be able to share our NLB between classic EC2 instances and an EKS cluster, to do a zero-downtime migration of a stateless application running on EC2 instances to the same application running on an EKS cluster.

And our last use case is sharing an NLB between two EKS clusters (blue and green) to be able to seamlessly switch from one to the other (when we have big changes to bring to our cluster, we prefer spawning a new cluster and switching to it after having tested that it works as intended).

@dawidmalina

I have a workaround in Terraform (a bit tricky, but it works):

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  autoscaling_group_name = lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")
  alb_target_group_arn   = var.TARGET_GROUP_ARN
}

@guigo2k

guigo2k commented Apr 9, 2020

@dawidmalina your workaround works for adding the autoscaling instances to the load balancer target group; however, the ALB can't reach the node group.

HTTP/2 504 
server: awselb/2.0
date: Thu, 09 Apr 2020 18:53:46 GMT
content-type: text/html
content-length: 148

<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>

@jodem

jodem commented Apr 27, 2020

Another workaround I plan to test is adding postStart and preStop lifecycle hooks to the pod (https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/) with a small command that registers/deregisters the node from the target group using the AWS CLI. You can easily get the instance ID from within the container (wget -q -O - http://169.254.169.254/latest/meta-data/instance-id) and use it with aws elbv2 register-targets. A sketch of the idea follows.
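
An untested sketch of that idea, expressed with Terraform's kubernetes provider to stay in the language used throughout this thread; the image, the TG_ARN variable, and all names are hypothetical, and the node role would need permission to call the elbv2 APIs:

resource "kubernetes_deployment" "app" {
  metadata {
    name = "app"
  }
  spec {
    selector {
      match_labels = { app = "app" }
    }
    template {
      metadata {
        labels = { app = "app" }
      }
      spec {
        container {
          name  = "app"
          image = "example/app:latest" # hypothetical image that ships the AWS CLI
          env {
            name  = "TG_ARN"
            value = var.target_group_arn # hypothetical variable
          }
          lifecycle {
            post_start {
              exec {
                # Register this pod's node with the target group on start.
                command = ["/bin/sh", "-c", "aws elbv2 register-targets --target-group-arn $TG_ARN --targets Id=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)"]
              }
            }
            pre_stop {
              exec {
                # Deregister the node before the pod stops.
                command = ["/bin/sh", "-c", "aws elbv2 deregister-targets --target-group-arn $TG_ARN --targets Id=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)"]
              }
            }
          }
        }
      }
    }
  }
}

One caveat: with several replicas per node, the first pod to stop would deregister the whole node, so this sketch really only fits one-pod-per-node (DaemonSet-like) workloads.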

@mikestef9 added the EKS Managed Nodes label and removed the Proposed (Community submitted issue) label on Jun 11, 2020
@mikestef9
Contributor

mikestef9 commented Sep 16, 2020

Hey all, please take a look at the TargetGroupBinding CRD included in the v2 release candidate of the ALB ingress controller

https://github.com/kubernetes-sigs/aws-alb-ingress-controller/releases/tag/v2.0.0-rc0

We believe this will address the feature request described in this issue, and are looking for feedback.
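
For readers who want to see what that looks like, here is a minimal sketch of a TargetGroupBinding, written with the kubernetes provider's kubernetes_manifest resource to stay in Terraform. Names are hypothetical, and the apiVersion shown is the v1beta1 the controller later settled on (the v2.0.0-rc0 linked above shipped an earlier alpha version):

resource "kubernetes_manifest" "tgb" {
  manifest = {
    apiVersion = "elbv2.k8s.aws/v1beta1"
    kind       = "TargetGroupBinding"
    metadata = {
      name      = "my-service-tgb" # hypothetical
      namespace = "default"
    }
    spec = {
      # Route the target group's traffic to this Service's endpoints.
      serviceRef = {
        name = "my-service" # hypothetical Service
        port = 80
      }
      targetGroupARN = aws_lb_target_group.tg.arn
      targetType     = "instance"
    }
  }
}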

@yann-soubeyrand

Hi @mikestef9, thanks for the update. Unfortunately, this does not address our use cases outlined in this comment #709 (comment).

@adamrbennett

We also need this to support services of type: NodePort.

@M00nF1sh

M00nF1sh commented Oct 5, 2020

@yann-soubeyrand are you trying to use multiple ASGs in a single TargetGroup? Otherwise, TargetGroupBinding should solve it.

@yann-soubeyrand

@M00nF1sh isn't TargetGroupBinding meant for use with the ALB ingress controller only? We use an NLB with the Istio ingress gateway. And yes, we need to put two ASGs in a single target group for certain operations requiring zero downtime.

@M00nF1sh

M00nF1sh commented Oct 5, 2020

@yann-soubeyrand
It supports both ALB and NLB target groups; we'll rename the ALB ingress controller to the AWS Load Balancer Controller soon.
Currently, when using instance-type target groups, it only supports using all nodes in your cluster as backends, but we'll add a nodeSelector in the future, so if your two ASGs are in the same cluster, it will be supported (we won't support two ASGs in different clusters).

@yann-soubeyrand

@M00nF1sh sorry for the late reply. We need to be able to put two ASGs from different clusters in a single target group. This is how we do certain migrations requiring rebuilding a whole cluster.

@lilley2412

A null_resource is working for me. I have validated that aws_eks_node_group does not see the attached target group as a change, and when changes are made it leaves the attachment preserved.

resource "null_resource" "managed_node_asg_nlb_attach" {

  triggers = {
    asg = aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name
  }

  provisioner "local-exec" {
    command = "aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name '${aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name}' --target-group-arns '${aws_lb_target_group.tg.arn}' '${aws_lb_target_group.tg2.arn}'"
  }
}

@netguino

netguino commented Apr 10, 2021

@dawidmalina

I have a workaround in Terraform (a bit tricky, but it works):

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  autoscaling_group_name = lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")
  alb_target_group_arn   = var.TARGET_GROUP_ARN
}

Thank you so much for this workaround. Totally made my week by helping me solve a very annoying problem we've been having for so long!

@ddvdozuki

A null_resource is working for me. I have validated that aws_eks_node_group does not see the attached target group as a change, and when changes are made it leaves the attachment preserved.

resource "null_resource" "managed_node_asg_nlb_attach" {

  triggers = {
    asg = aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name
  }

  provisioner "local-exec" {
    command = "aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name '${aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name}' --target-group-arns '${aws_lb_target_group.tg.arn}' '${aws_lb_target_group.tg2.arn}'"
  }
}

Thank you for this workaround, but it seems to leave behind ENIs and SGs that prevent VPC destruction, because it creates resources outside of Terraform's knowledge. Is there any way to achieve this with an NLB without using a null provisioner? Or some way to have an on-delete provisioner that does the cleanup? (See the sketch below.)
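
One hedged answer to the on-delete question: Terraform does have destroy-time provisioners, which can at least handle the detach half of the cleanup. An untested sketch, reusing the resource names from the null_resource above (destroy provisioners may only reference self, hence the triggers):

resource "null_resource" "managed_node_asg_nlb_detach" {
  triggers = {
    asg = aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name
    tgs = "${aws_lb_target_group.tg.arn} ${aws_lb_target_group.tg2.arn}"
  }

  provisioner "local-exec" {
    when = destroy
    # Detach the target groups before the rest of the teardown proceeds.
    command = "aws autoscaling detach-load-balancer-target-groups --auto-scaling-group-name '${self.triggers.asg}' --target-group-arns ${self.triggers.tgs}"
  }
}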


@nogara

nogara commented Oct 6, 2021

@dawidmalina your workaround works for adding the autoscaling instances to the load balancer target group; however, the ALB can't reach the node group.

HTTP/2 504 
server: awselb/2.0
date: Thu, 09 Apr 2020 18:53:46 GMT
content-type: text/html
content-length: 148

<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>

I used @dawidmalina's answer, and also opened up the NodePort to the ALB's SG using:

resource "aws_security_group_rule" "example" {
  type              = "ingress"
  from_port         = {nodeport}
  to_port           =   {nodeport}
  protocol          = "tcp"
  source_security_group_id = {ALB's security group}
  security_group_id = {target's security group}
}

@otterley

otterley commented Oct 19, 2021

Attaching Load Balancers to Auto Scaling Group instances, as opposed to instance IP addresses and ports, was a design pattern that made a lot of sense back when the instances in ASGs were configured exactly alike -- typically there was an application stack running on each instance that had identical software, listened on the same ports, served the same traffic for each, etc.

But with containers, that pattern generally no longer holds true: each instance could be (and usually is) running completely different applications, listening on different ports or even different interfaces. In the latter design, instances are now heterogeneous. The Auto Scaling Group no longer implies homogeneity; it's now merely a scalable capacity provider for memory, CPUs, GPUs, network interfaces, etc. As a consequence, we no longer think of an instance as a backend (i.e., a load balancer target); today, we consider an IP:port tuple to be a backend instead.

I've heard a few justifications for hanging on to the historical functionality, despite the evolution. So I'm curious: for those of you dealing with this issue, is there a particular reason you're not using DNS to handle migrations of applications between clusters (each with their own ingress LBs) for north-south traffic, and/or using some sort of service mesh (App Mesh, Istio, Linkerd, etc.) to handle migrations for east-west traffic? These are what we prescribe as best practices today.

@ddvdozuki

@otterley Yea, because we are migrating an app off bare metal and onto k8s. We have all those fancy things on the roadmap (service mesh, ingress controllers, DNS, etc.), but we're in the middle of moving a decades-old application and trying the best we can to make it cloud-native, and there's a lot of uncoupling to do. In the meantime we need to leverage the "old ways" to allow us to transition. It's rare to be able to start with a fresh new project and do everything right from the beginning. We rely on ASGs to allow us to continue using k8s with our old VM-in-a-container images.

@otterley

@ddvdozuki Thanks for the insight. Since you're still in transition, might I recommend you use unmanaged node groups instead? That will allow you to retain the functionality you need during your migration. Then, after you have migrated to the next generation of load balancers using the Load Balancer Controller's built-in Ingress support (and cut over DNS), you can attach a new Managed Node Group to your cluster, migrate your pods, and the load balancer will continue to send them traffic. The controller will ensure that the target IP and port follows the pod as it moves. Once all your pods have migrated to Managed Node Groups, you can tear down the unmanaged node groups.

@mwalsher

We have a single DNS entry point (e.g., api.example.com) that points to a single ALB, with a Target Group that points to our Traefik entrypoint. Traefik runs as a DaemonSet on each Node and is then used to route requests to the appropriate service/pod. There may well be a better approach to this, which I'd be curious to hear, but this is working well for us.

@ddvdozuki

@mwalsher It sounds like you might have a redundant layer there. The k8s Service can do most of what Traefik does as far as routing and pod selection. We use the same setup you have, but without any additional layer in between: just an LB pointing at the node port for the service, and the service has selectors for the proper pods.

@mwalsher

@ddvdozuki interesting, thanks for the info. Can we route e.g. api.example.com/contacts to our Contacts microservice and api.example.com/accounts to our Accounts microservice using k8s Service routing?

I took a quick look at the k8s Service docs and don't see anything on path-based routing, but it is probable that my ☕ hasn't kicked in yet.

We are also using some Traefik middleware (StripPrefix and ForwardAuth).

I suppose we could use the ALB for routing to the appropriate TG/Service port. Perhaps that's what you meant? But we'd still need the aforementioned middleware...

@daroga0002

Yes, you need the middleware, but the general practice is to use an ingress controller exposed through a LoadBalancer service. Running such middleware as a DaemonSet is just impractical when you have more nodes, because you waste resources.

@charles-d-burton

There's also our use case, which is more akin to @mwalsher's. We create and destroy namespaces near constantly: every CI branch creates a new namespace with a full (scaled-down) copy of our software stack, which lets our engineers connect their IDE to that running stack and develop against it in isolation from each other. So we have an Nginx ingress controller that can handle that kind of churn, meaning we create and destroy up to dozens of namespaces per day, each with a unique URL and certificate. This is all behind an NLB currently so Cert Manager can provision certs for these namespaces on the fly. Provisioning a load balancer per namespace in that use case is really expensive, both monetarily and in the delay in wiring up our system, not to mention it makes the domains pretty hard to deal with.

@antonmatsiuk

I've heard a few justifications for hanging on to the historical functionality, despite the evolution. So I'm curious: for those of you dealing with this issue, is there a particular reason you're not using DNS to handle migrations of applications between clusters (each with their own ingress LBs) for north-south traffic, and/or using some sort of service mesh (App Mesh, Istio, Linkerd, etc.) to handle migrations for east-west traffic? These are what we prescribe as best practices today.

Another use case for this is a VoIP application on the nodes that handles 20k UDP ports. You can't solve that with "Service: LoadBalancer" at the moment. The only option is to use hostNetwork: true in the application and a network LB in front of the eks_managed_node_group to load-balance the UDP traffic to the app. A rough sketch follows.
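
A minimal sketch of that hostNetwork arrangement, again via Terraform's kubernetes provider for consistency with the rest of this thread (image and names hypothetical); the NLB side is the same ASG/target-group attachment shown earlier, and in practice this would likely be a DaemonSet rather than a bare pod:

resource "kubernetes_pod" "voip" {
  metadata {
    name = "voip"
  }
  spec {
    host_network = true # bind the app's UDP ports directly on the node
    container {
      name  = "voip"
      image = "example/voip:latest" # hypothetical image
    }
  }
}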

@carlosjgp

I have a workaround in Terraform (a bit tricky, but it works):

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  autoscaling_group_name = lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")
  alb_target_group_arn   = var.TARGET_GROUP_ARN
}

Sadly, this workaround only works if you first create the aws_eks_node_group, which dynamically creates the autoscaling group, whose name is not fixed:

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  for_each = {
    for permutation in setproduct(
      # flatten(aws_eks_node_group.node_group.resources[*].autoscaling_groups[*].name)
      [lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")],
      var.target_group_arns,
    ) :
    permutation[0] => permutation[1]
  }
  autoscaling_group_name = each.key
  lb_target_group_arn    = each.value
}

When I add a new node group and attach the target group using this method, I get:

The "for_each" map includes keys derived from resource attributes that cannot be determined until apply, and so Terraform cannot determine the full set of keys that will identify the instances of this resource.

And using the AWS CLI with a null_resource is rather messy and leaves "orphan" resources.

Is the aws_eks_node_group resource designed to work only with the AWS Load Balancer Controller?

We also want to disable the AZRebalance process, which also has to be done through the CLI ☠️

This is the full hack we were considering, but I think we are going to backtrack to ASGs:

resource "null_resource" "nodegroup_asg_hack" {
  triggers = merge(
    var.asg_force_patching_suspended_processes ? {
      timestamp = timestamp()
    } : {},
    {
      asg_suspended_processes = join(",", var.asg_suspended_processes)
      asg_names               = join("", module.eks_managed_node_group.node_group_autoscaling_group_names)
    }
  )

  provisioner "local-exec" {
    interpreter = ["/bin/sh", "-c"]
    environment = {
      AWS_DEFAULT_REGION = local.aws_region
    }
    command = <<EOF
set -e

$(aws sts assume-role --role-arn "${data.aws_iam_session_context.current.issuer_arn}" --role-session-name terraform_asg_no_cap_rebalance --query 'Credentials.[`export#AWS_ACCESS_KEY_ID=`,AccessKeyId,`#AWS_SECRET_ACCESS_KEY=`,SecretAccessKey,`#AWS_SESSION_TOKEN=`,SessionToken]' --output text | sed $'s/\t//g' | sed 's/#/ /g')

for asg_name in ${join(" ", formatlist("'%s'", module.eks_managed_node_group.node_group_autoscaling_group_names))} ; do
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name $${asg_name} \
    --no-capacity-rebalance

  aws autoscaling suspend-processes \
    --auto-scaling-group-name $${asg_name} \
    --scaling-processes ${join(" ", var.asg_suspended_processes)}

%{if length(var.target_group_arns) > 0~}
  aws autoscaling attach-load-balancer-target-groups \
    --auto-scaling-group-name $${asg_name} \
    --target-group-arns ${join(" ", formatlist("'%s'", var.target_group_arns))}
%{endif~}
done
EOF
  }
}

@kr3cj

kr3cj commented Jan 4, 2024

Another workaround I plan to test is to add postStart and preStop lifecycle event...

Did you ever get that working, @jodem?

@jodem

jodem commented Jan 8, 2024

Another workaround I plan to test is to add postStart and preStop lifecycle event...

Did you ever get that working, @jodem?

Hello, I ended up using "aws_autoscaling_attachment" in Terraform:

resource "aws_autoscaling_attachment" "ingress_attach" {
  count = ( var.attachToTargetGroup  ? length(var.targetGroupARNToAssociate) : 0)
  autoscaling_group_name = aws_eks_node_group.multi_tenant_worker_nodegroup.resources[0].autoscaling_groups[0].name
  lb_target_group_arn = var.targetGroupARNToAssociate[count.index]
}
