
feat: Create launch template for Managed Node Groups #1138

Merged: 19 commits, Apr 19, 2021

ArchiFleKs
Contributor

@ArchiFleKs ArchiFleKs commented Dec 7, 2020

Signed-off-by: Kevin Lefevre <lefevre.kevin@gmail.com>

PR o'clock

Description

Enable the creation of a default launch template, if needed, for use with managed node pools. This enables the use of kubelet_extra_args and makes it possible to add taints quickly without having to manage a separate launch template Terraform config.

It implements this logic: aws/containers-roadmap#864

I think the default launch template might need some trimming; tell me what you think.

Checklist

@cabrinha
Contributor

cabrinha commented Dec 8, 2020

Did you try creating a cluster using this TF code? I'm getting the following error when using one or more node_groups:

Error: Invalid for_each argument

  on ../../../modules/terraform-aws-eks/modules/node_groups/launchtemplate.tf line 2, in data "template_file" "workers_userdata":
   2:   for_each = { for k, v in local.node_groups_expanded : k => v if v["create_launch_template"] }

The "for_each" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the for_each depends on.
  node_groups = {
    nginx = {
      create_launch_template = true
      desired_capacity = 4
      max_capacity     = 10
      min_capacity     = 3

      instance_type      = "m5.large"
      kubelet_extra_args = "--node-labels=role=nginx,group=nginx"

      additional_tags = {
        group = "nginx"
      }
    }
  }

I was able to get a cluster up by doing the following:

change both for_each = { for k, v in local.node_groups_expanded : k => v if v["create_launch_template"] } statements to:

for_each = local.node_groups_expanded

My next issue was with "disk_size", which didn't get a default value, so I set it to 50.

EC2 instances are also coming up without the "Name" tag being set either. I think you need to add name_prefix to your launch template like: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/launch_templates_with_managed_node_groups/launchtemplate.tf#L21

https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/workers_launch_template.tf#L5

another name_prefix spot: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/workers_launch_template.tf#L235

@ArchiFleKs
Contributor Author

ArchiFleKs commented Dec 8, 2020

@cabrinha thanks for the review. I actually used Terragrunt + Terraform, and I was able to get a cluster running. I haven't tested without disk_size (I also tested with 50), so I think we need to set a default because there is none with a launch template.

If we set the for_each to for_each = local.node_groups_expanded it will create a launch template for every node group
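The error reported above states that for_each depends on values that cannot be determined until apply. A minimal sketch of a filter that reads the flag from plain input values only, so the map keys stay known at plan time. The variable `node_groups`, the local `node_groups_with_lt` and the resource name `workers` are hypothetical and are not the module's actual code:

```hcl
# Hypothetical input: one entry per managed node group.
variable "node_groups" {
  type    = any
  default = {}
}

locals {
  # Keep only the groups that ask for a generated launch template.
  # lookup() supplies a default, so groups without the flag are skipped
  # rather than raising an error.
  node_groups_with_lt = {
    for k, v in var.node_groups :
    k => v if lookup(v, "create_launch_template", false)
  }
}

# One launch template per flagged group; the other groups get none.
resource "aws_launch_template" "workers" {
  for_each    = local.node_groups_with_lt
  name_prefix = "example-${each.key}-"
}
```

Whether this avoids the error in a given setup depends on how the input map is built; values merged from resource attributes can still make the whole map unknown at plan time.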

About the name, I don't think managed node groups set the Name tag on instances; even my "classic" managed node pool doesn't have one.

@ArchiFleKs
Contributor Author

@cabrinha also the plan is passing with the examples/managed_node_groups and:

  node_groups = {
    example = {
      desired_capacity       = 1
      max_capacity           = 10
      min_capacity           = 1
      create_launch_template = true
      kubelet_extra_args     = "--node-labels=role=nginx,group=nginx"

      instance_type = "m5.large"
      k8s_labels = {
        Environment = "test"
        GithubRepo  = "terraform-aws-eks"
        GithubOrg   = "terraform-aws-modules"
      }
      additional_tags = {
        ExtraTag = "example"
      }
    }
  }

@cabrinha
Contributor

cabrinha commented Dec 8, 2020

Could we add a simple create = true/false on each group? I guess thats what the create_launch_template flag would do.

I wish there was a way to add the Name tag to the instances spun up. The EC2 instances list is horrible without any Names 😅

Also, it'd be nice to add capacity_type too, since AWS now supports Managed Node Groups with Spot Instances.

What version of Terraform are you using?

$ terraform version
Terraform v0.12.29
+ provider.aws v3.20.0
+ provider.kubernetes v1.13.3
+ provider.local v2.0.0
+ provider.null v3.0.0
+ provider.random v3.0.0
+ provider.template v2.2.0

I'm on 0.12.29, using the same example code block you are and I'm still getting the error:

Error: Invalid for_each argument

  on ../../../modules/terraform-aws-eks/modules/node_groups/launchtemplate.tf line 2, in data "template_file" "workers_userdata":
   2:   for_each = { for k, v in local.node_groups_expanded : k => v if v["create_launch_template"] }

The "for_each" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the for_each depends on.

- disk_size = lookup(each.value, "disk_size", null)
- instance_types = each.value["launch_template_id"] != null ? [] : [each.value["instance_type"]]
+ disk_size = each.value["launch_template_id"] != null || each.value["create_launch_template"] ? null : lookup(each.value, "disk_size", null)
+ instance_types = each.value["launch_template_id"] != null || each.value["create_launch_template"] ? [] : [each.value["instance_type"]]
Contributor

It'd be cool to add capacity_type = lookup(each.value, "capacity_type", "ON_DEMAND") right above this, so users could choose to use "SPOT"
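A minimal sketch of how the suggested lookup could sit in a node group resource, assuming the AWS provider's `aws_eks_node_group` resource with its `capacity_type` argument. The variables and the resource name `workers` are hypothetical, and the scaling defaults are placeholders:

```hcl
variable "cluster_name" { type = string }
variable "node_role_arn" { type = string }
variable "subnet_ids" { type = list(string) }

variable "node_groups" {
  type    = any
  default = {}
}

resource "aws_eks_node_group" "workers" {
  for_each = var.node_groups

  cluster_name    = var.cluster_name
  node_group_name = each.key
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  # Falls back to on-demand capacity when a group does not set capacity_type;
  # a group can opt in to Spot with capacity_type = "SPOT".
  capacity_type = lookup(each.value, "capacity_type", "ON_DEMAND")

  scaling_config {
    desired_size = lookup(each.value, "desired_capacity", 1)
    max_size     = lookup(each.value, "max_capacity", 3)
    min_size     = lookup(each.value, "min_capacity", 1)
  }
}
```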

Contributor Author

I think this is implemented by #1129

@ArchiFleKs
Contributor Author

@cabrinha Alright for the spot and the name.

But I have to admit I really do not understand the part about create = true/false, because that is what I'm trying to do with for_each = { for k, v in local.node_groups_expanded : k => v if v["create_launch_template"] } which is the same syntax as here

I'm using the latest Terraform, 0.13.5. I have not tried with 0.12; I will test on my end.

@cabrinha
Contributor

cabrinha commented Dec 8, 2020

Supposedly this issue is more likely to happen on a blank tfstate.

@barryib barryib self-assigned this Dec 22, 2020
@barryib barryib changed the title feat: enable default launch template feat: Create launch template for Managed Node Groups Dec 23, 2020
@cabrinha
Contributor

cabrinha commented Dec 29, 2020

Seems these three PRs are all targeting the same goal: #1161 #1129

@binnythomas-1989

Guys, I have a question related to remote access: you won't be able to specify remote access on node groups, as per the AWS docs.

Per our documentation[1] When using a launch template, if any of the following parameters are specified in the node group configuration, your create or update request will fail. Specify these in your launch template:

- Instance type
- Disk size
- Remote access configuration
- EC2 SSH key

I don't see that handled in your launch_template.tf code. It would be great if you could clarify.
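A minimal sketch of the split described by the documentation quoted above: the instance type, disk size and SSH key go into the launch template, and the node group only references it. The resource names, variables, sizes and the `/dev/xvda` root device name are assumptions for illustration, not the PR's code:

```hcl
variable "cluster_name" { type = string }
variable "node_role_arn" { type = string }
variable "subnet_ids" { type = list(string) }
variable "key_name" { type = string }

resource "aws_launch_template" "workers" {
  name_prefix   = "example-workers-"
  instance_type = "m5.large"
  key_name      = var.key_name # takes the place of remote_access.ec2_ssh_key

  # Takes the place of disk_size on the node group.
  block_device_mappings {
    device_name = "/dev/xvda" # assumed root device of the EKS-optimized AMI
    ebs {
      volume_size           = 50
      volume_type           = "gp2"
      delete_on_termination = true
    }
  }
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = var.cluster_name
  node_group_name = "example"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  # instance_types, disk_size and remote_access are deliberately left unset.
  launch_template {
    id      = aws_launch_template.workers.id
    version = aws_launch_template.workers.latest_version
  }

  scaling_config {
    desired_size = 1
    max_size     = 3
    min_size     = 1
  }
}
```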

@ArchiFleKs
Contributor Author

You mean to perform validation on the input ?

@binnythomas-1989

binnythomas-1989 commented Jan 11, 2021

What I mean is if you add a remote access as an input to nodegroups with an associated launch template, you would end up with the below error.

Error: error creating EKS Node Group (nodegroup-test:x86_64-driving-louse): InvalidParameterException: Remote access configuration cannot be specified with a launch template.
{

You would need to handle remote_access in launch_template.tf. If I'm wrong, do correct me.
Sample link
https://github.com/cloudposse/terraform-aws-eks-node-group/blob/master/launch-template.tf

@ArchiFleKs
Contributor Author

@binnythomas-1989 I think I understand: we need to prevent setting remote_access here: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/node_groups/node_groups.tf#L21 if a launch template is used
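A minimal sketch of guarding remote_access with a dynamic block, so that it is emitted only when a key is set and no launch template is involved. The per-group keys `key_name`, `launch_template_id` and `create_launch_template` follow the names used in this thread; the variables and resource name are hypothetical, and the code actually merged may differ:

```hcl
variable "cluster_name" { type = string }
variable "node_role_arn" { type = string }
variable "subnet_ids" { type = list(string) }

variable "node_groups" {
  type    = any
  default = {}
}

resource "aws_eks_node_group" "workers" {
  for_each = var.node_groups

  cluster_name    = var.cluster_name
  node_group_name = each.key
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  dynamic "remote_access" {
    # Yields an empty list (no block at all) as soon as a launch template,
    # supplied or generated, is used for the group.
    for_each = (
      lookup(each.value, "key_name", "") != "" &&
      lookup(each.value, "launch_template_id", null) == null &&
      !lookup(each.value, "create_launch_template", false)
    ) ? [each.value["key_name"]] : []

    content {
      ec2_ssh_key = remote_access.value
    }
  }

  scaling_config {
    desired_size = lookup(each.value, "desired_capacity", 1)
    max_size     = lookup(each.value, "max_capacity", 3)
    min_size     = lookup(each.value, "min_capacity", 1)
  }
}
```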

@binnythomas-1989

binnythomas-1989 commented Jan 11, 2021

You are correct, @ArchiFleKs. You would need to add the key to the launch template too.

@binnythomas-1989

binnythomas-1989 commented Jan 12, 2021

Guys, adding the kubelet argument is great; I just tested it. Sorry to say, though, that Amazon's EKS is really annoying here.

So let me explain what I have figured.
You can add node-taints something like below.

kubelet_extra_args = "--node-labels=eks.amazonaws.com/nodegroup=company-net --register-with-taints=network=company:NoSchedule"

The problem is that once you add taints with NoSchedule, for example, the node won't join the cluster. You would have issues scheduling the coredns pods, since coredns is a Deployment: it won't be able to schedule.

So letting people use this for taints is a bad idea on EKS.

@ArchiFleKs
Contributor Author

@binnythomas-1989 that is true for CoreDNS; aws-node tolerates every taint. You should always have a "default" pool, or a "criticalAddonsOnly" pool if you want.

I agree this is kind of a power-user feature; node join also fails if you use a forbidden kubernetes.io label. But giving users the ability to have a "reserved" node pool (for GPU, etc.) is a must-have feature in my opinion, since taints are not natively supported by the EKS API whereas labels are.
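A minimal sketch of the layout suggested above: one untainted "default" pool so that workloads without tolerations can schedule, plus one tainted "reserved" pool. The group names, sizes, instance types and label values are placeholders; the per-group keys are the ones used elsewhere in this thread:

```hcl
locals {
  node_groups = {
    # Untainted pool: Deployments without tolerations (CoreDNS, for example)
    # can land here.
    default = {
      desired_capacity = 2
      max_capacity     = 3
      min_capacity     = 1
      instance_types   = ["m5.large"]
    }

    # Reserved pool: only pods tolerating dedicated=reserved:NoSchedule are
    # scheduled here. The taint is passed through kubelet_extra_args, which
    # needs the generated launch template.
    reserved = {
      create_launch_template = true
      desired_capacity       = 1
      max_capacity           = 3
      min_capacity           = 1
      instance_types         = ["m5.large"]
      kubelet_extra_args     = "--node-labels=role=reserved --register-with-taints=dedicated=reserved:NoSchedule"
    }
  }
}
```

The map would then be passed to the module as `node_groups = local.node_groups`.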

@ArchiFleKs
Contributor Author

There is an example right now of how to use a launch template, but it is not straightforward. This PR's goal is to enable a simple configuration while still allowing a custom launch template if needed, or a basic node pool.

eksctl also uses a default launch template to enable taints on node groups

@binnythomas-1989

kube-proxy and aws-node are DaemonSets, so that's okay; it's just CoreDNS that is an issue, since it's a Deployment. Anyway, it could be useful to add a note about this condition to the README. I was trying to adopt your solution in the local Terraform module I use. I'm planning to handle this with Terraform using the kubectl provider for now. :-) I don't use eksctl

@ArchiFleKs
Contributor Author

@binnythomas-1989

@binnythomas-1989 this : https://github.com/terraform-aws-modules/terraform-aws-eks/pull/1138/files#diff-7a3fc6c7df17fda0c341e61255461bf1f149256a9ddf14d4a18ab6f020d08136R22 should take care of remote access, could you try ? If not I'll try to test today

This won't work either, because now you need to handle remote access in the launch template, and it would look like this:

  key_name = each.value.ec2_ssh_key_pair

So that bit is being handled on the launch_template.tf

@ArchiFleKs
Contributor Author

I added the key_name lookup

@bobbywatson3

This is exactly what we're looking for. Thank you for the work done on this so far!

Member

@barryib barryib left a comment

I just realized that I didn't submit my review. It was in pending state.

}

# if you want to use a custom AMI
# image_id = var.ami_id
Member

Why can't we allow this ?

Contributor Author

I think it's because, when using a custom image, this does not apply: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/launch_templates_with_managed_node_groups/launchtemplate.tf#L18 I think this leads to different behavior between the EKS AMI and other AMIs, but I have not tested with a custom AMI

Contributor Author

I think we should stay simple, because we still allow users to use a custom launch template if needed. Or maybe this can be added in another PR, as I'm not really sure how to handle cloud-init with a custom AMI

Contributor Author

This is not implemented for now

Contributor

as you partially copied the LT from the examples that got added in my #997 , I might be able to help here:

so I am using LT with a custom AMI and it works just fine.
however, there are indeed subtle but important differences between using an LT w/ or w/o a custom AMI. In the old PR, someone else described it quite well, see #997 (comment)

He also mentions a then required fix to the MIME boundary when using cloudinit

Contributor Author

@philicious do you mean that this part should be added manually when using custom AMI:

set -ex
B64_CLUSTER_CA=LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeE1ESXdNakUyTXpJeU0xb1hEVE14TURFek1URTJNekl5TTFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTkhXCk5vZjgxekorcGIxdEswMXRWVExSNEd0NDBDbkw5TU5vV0hSWGc3WndNVFkzcHVQMm05TlkvSXJ2bEZ2dDNNUVcKejUrb0FRdU8rcHA2RUFQOEZFK0JGaUVSVXpMZTYvbXFscGg2S2hmOEsyQU45QUN2RUYvMWlYNlQvWFlDdlRrRQp5MmhYSk1CUnVGSVF6dGVSaDEwRTFBZG5UWDdxNUY5RlhIY2VzR285TGlPbmRNMVpQRGpPS2lnZ0hMK2xheG4wCnN0bDlxeGZrYWZpMHNzb0ZCcUM3eGU1SGt2OVowYTYvRmxWeVNXazFQQXFCWDZOTlUvc0RjNTA3bXN0OEVMc0oKSU9naWFTcGZLaXVnekZNaTlTS3NQbjRQcm94UDEwRlErOGpSdTZZdm9tQmswMHFnU2NFTGxadng0bG1CVGloSgpCdDdFTlUxMzdvSXdhY1pCUUNFQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFJam1ZWmthV3NuQ1lSNkpQUGw1WmVpcGkzYkYKREpBUzgvM2E4UFVnL3BsWTFVYlhCalU3b0FDb21UUzd2Y2hPUFU5aFNXdC9jNit5RnF5a0FwakMyRjFuSHg4WQpaQUg5NDFWYUNzRyt3VmE3MTJlcFRPTSt1TWxNSENFYVlMVTRKOXEvaUd1aVZtM2NPOGhmMTFoNjVGd3NuekE0CmdqQ0YxUC9Sdi9acnFSSk9XZmJaRE00MzlwajVqQzNYRVAyK1FXVlIzR2tzbW1NcDVISm9NZW5JaDBSTFhnK1oKTVRVNXFsdW0xTWZDdXRNVjkzNGJFQ21BRERJSm4rZVdHSERwRi9QOThnR1RyRU1QclhiUXZMblpwZHBNYldjNQp5LzZldkNtYXozMzllSlUwWkRaM1M0R2YvbEpBUTBZcFZoQkRlS2hXVHEwSXJYb2NWWHU5MDN0OXU5TT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
API_SERVER_URL=https://A62655F81AE9347A761BB172E28A633F.sk1.eu-west-1.eks.amazonaws.com
K8S_CLUSTER_DNS_IP=172.20.0.10
/etc/eks/bootstrap.sh pio-thanos --kubelet-extra-args '--node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,GithubRepo=terraform-aws-eks,eks.amazonaws.com/nodegroup-image=ami-066fad1ae541d1cf9,Environment=test,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=pio-thanos-default-eu-west-1a-expert-bass,eks.amazonaws.com/sourceLaunchTemplateId=lt-079cbc5cf74ace131,GithubOrg=terraform-aws-modules' --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP

About the boundaries, I think this is fixed, and I implemented it here

I'm just not sure how to handle the custom AMI. How do you do it on your end? From what I understand, you need to pass the bootstrap command on your own when using a custom AMI, because it does not get merged

Contributor

you could be right about the boundaries. I remember that someone wanted to update the cloudinit to support custom boundaries. So yes, probably, and if this PR produces a running EKS cluster, then it's proven.

And yes, with a custom AMI you have to supply the entire userdata yourself, as no merging happens with the default one from EKS.
So in the end I use it as I added it in the examples.

I would have to have a closer look and do tests on how to add custom AMI support and satisfy these differences.

it would for sure be great if the module could also handle custom AMI if it already got LT generation added.
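A rough sketch of what custom AMI support might involve, given the point above that no userdata merging happens with a custom image: the launch template sets `image_id` and carries the full bootstrap call itself. The variables, resource names and label value are hypothetical; the bootstrap.sh flags are the ones shown in the generated script quoted earlier in this review thread. This is not tested and is not part of the PR:

```hcl
variable "cluster_name" { type = string }
variable "ami_id" { type = string } # hypothetical custom AMI that ships /etc/eks/bootstrap.sh

# Read the endpoint and CA of the existing cluster.
data "aws_eks_cluster" "this" {
  name = var.cluster_name
}

resource "aws_launch_template" "custom_ami" {
  name_prefix   = "example-custom-ami-"
  image_id      = var.ami_id
  instance_type = "m5.large"

  # With a custom AMI nothing is merged in by EKS, so the whole bootstrap
  # call has to be written out here.
  user_data = base64encode(<<-EOT
    #!/bin/bash
    set -ex
    /etc/eks/bootstrap.sh ${var.cluster_name} \
      --kubelet-extra-args '--node-labels=role=example' \
      --b64-cluster-ca ${data.aws_eks_cluster.this.certificate_authority[0].data} \
      --apiserver-endpoint ${data.aws_eks_cluster.this.endpoint}
  EOT
  )
}
```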

@stevehipwell
Contributor

@ArchiFleKs do you have a rough timeline for this work?

Also, would it be possible to add support for pre_userdata and additional_userdata? We need this for custom certificates, docker credentials and custom labels from AWS metadata.

Signed-off-by: Kevin Lefevre <lefevre.kevin@gmail.com>
@cabrinha
Contributor

@ArchiFleKs looks like we got some conflicts here. Can you fix these up please?

@cabrinha I'm not sure I'll have the time to retest after the merge this weekend, but I'll try when I can. It would be great if you or someone else could

Sure, I can retest this at any time and post my configs here.

@ArchiFleKs
Contributor Author

Alright conflict should be fixed now

@cabrinha
Contributor

Just tested this config:

  node_groups = {
    managed = {
      desired_capacity = 1
      max_capacity     = 5
      min_capacity     = 1

      instance_types = [
        "c3.2xlarge",
        "c4.xlarge",
        "c4.2xlarge",
      ]
      capacity_type      = "SPOT"
      root_volume_type   = "gp2"
      root_volume_size   = 10
      kubelet_extra_args = "--node-labels=node.kubernetes.io/lifecycle=spot,role=worker,node.kubernetes.io/exclude-from-external-load-balancers --register-with-taints=dedicated=managed:NoSchedule"
      k8s_labels = {
        Environment = "test"
        GithubRepo  = "terraform-aws-eks"
        GithubOrg   = "terraform-aws-modules"
      }

      additional_tags = {
        CustomTag = "EKS example"
      }
    }
  }

Seems to be working well

@devy294

devy294 commented Apr 19, 2021

When will this be merged?

@cabrinha
Contributor

cabrinha commented Apr 19, 2021

Great question. @barryib time to review and merge?

@stevehipwell
Contributor

@barryib it looks like the changes in this PR didn't make it into v15.0.0 or v15.1.0, do you have a plan for when they are going to be released?

@cabrinha
Contributor

cabrinha commented Apr 20, 2021

Aren't the changes here? v15.1.0...master 2e1651d

@archoversight

That first link is showing the diff between v15.1.0 and master. Your second link shows that it is only on master and not yet in any tags.

@martin308

I'm not seeing this work with kubelet_extra_args. When this is used the userdata that is provided to the launch template only includes the additions provided to kubelet_extra_args and none of the other required parameters to bootstrap.

Example:

retriever = {
       instance_types    = ["c5.xlarge"]
       desired_capacity  = 1
       max_capacity      = 1
       min_capacity      = 1
       disk_size         = 20
       kubelet_extra_args = "--node-labels=test=extra_args"
       create_launch_template = true
      }

Produces the following userdata for the launch template:

Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0

#!/bin/bash -e

# Allow user supplied pre userdata code


sed -i '/^KUBELET_EXTRA_ARGS=/a KUBELET_EXTRA_ARGS+=" --node-labels=test=extra_args"' /etc/eks/bootstrap.sh

--//--

Which is missing all of the other required parameters provided by the EKS AMI, which are vital for registering the nodes in the cluster.

Am I missing something or does this not work?

@stevehipwell
Contributor

@martin308 this hasn't been released yet, so unless you're using ref=master it won't do anything.

@jfoechsler


@martin308 I'm pretty sure what you are missing/forgetting, is this: https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html#launch-template-user-data about using custom user data snippets while still using official AMI.

You would need full user data in case of custom AMI, but in that case you would also have created the full regular launch template.

@martin308

yup, pulling down the merge commit by ref 👍

source = "github.com/terraform-aws-modules/terraform-aws-eks?ref=2e1651df86bd315000738cf901a4cc0586be1af3"


I'm not using a custom AMI as per my example. I guess I'm just confused as how to use the kubelet_extra_args feature added in this PR.

Are you saying that it is expected that my example above would not work? If so is there an example of how to make use of the kubelet_extra_args feature with the official AMI?

@jfoechsler

No, I'm saying the opposite :) My understanding is that your example, with the resulting LT, should work and be able to join the cluster (due to the merging of user data done outside the control of this Terraform module). I'm interested to hear if your testing confirms that.
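A minimal sketch of the case described above: a launch template without `image_id`, whose userdata is a MIME multi-part document containing only the custom snippet, leaving the bootstrap call to the merge performed on the EKS side. The resource name is hypothetical, and the shell line is the one from the generated userdata shown in the earlier example:

```hcl
resource "aws_launch_template" "eks_ami" {
  name_prefix = "example-eks-ami-"

  # No image_id here: the node group keeps the EKS-optimized AMI, and only
  # this snippet is supplied. It must be in MIME multi-part format.
  user_data = base64encode(<<-EOT
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="//"

    --//
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash -e
    sed -i '/^KUBELET_EXTRA_ARGS=/a KUBELET_EXTRA_ARGS+=" --node-labels=test=extra_args"' /etc/eks/bootstrap.sh

    --//--
  EOT
  )
}
```

The snippet edits bootstrap.sh in place rather than calling it, which is why the generated userdata looks incomplete on its own.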

@ArchiFleKs
Contributor Author

I just tested with master and the following configuration:

  node_groups = {
    "default-${local.aws_region}a" = {
      ami_type = "AL2_ARM_64"
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types    = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[0]]
      disk_size        = 20
    }

    "default-${local.aws_region}b" = {
      ami_type = "AL2_ARM_64"
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types    = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[1]]
      disk_size        = 20
    }

    "default-${local.aws_region}c" = {
      ami_type = "AL2_ARM_64"
      create_launch_template        = true
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types    = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[2]]
      kubelet_extra_args = "--node-labels=role=private --register-with-taints=dedicated=private:NoSchedule"
      disk_size        = 20
    }
  }

It is working as expected

@ArchiFleKs
Contributor Author

I can confirm it works with x86 also:

  node_groups = {
    "default-${local.aws_region}a" = {
      ami_type = "AL2_ARM_64"
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types    = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[0]]
      disk_size        = 20
    }

    "default-${local.aws_region}b" = {
      ami_type = "AL2_ARM_64"
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types    = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[1]]
      disk_size        = 20
    }

    "default-${local.aws_region}c" = {
      ami_type = "AL2_ARM_64"
      create_launch_template        = true
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types    = ["t4g.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[2]]
      kubelet_extra_args = "--node-labels=role=private --register-with-taints=dedicated=private:NoSchedule"
      disk_size        = 20
    }
    "taint-${local.aws_region}c" = {
      create_launch_template        = true
      desired_capacity = 1
      max_capacity     = 3
      min_capacity     = 1
      instance_types    = ["t3a.large"]
      subnets          = [dependency.vpc.outputs.private_subnets[2]]
      kubelet_extra_args = "--node-labels=role=private --register-with-taints=dedicated=private:NoSchedule"
      disk_size        = 20
    }
  }

@ArchiFleKs
Contributor Author

You have to use the create_launch_template flag, which is not enabled by default; otherwise the kubelet_extra_args are not passed to anything.

@stevehipwell
Contributor

@ArchiFleKs I've been looking at further customizing the managed node group bootstrap process, and I'd be interested to know if you tried setting the KUBELET_EXTRA_ARGS environment variable instead of using sed?

@ipleten

ipleten commented Jul 15, 2021

I tried, and it seems cloud-init doesn't preserve exported variables between its parts (without a custom AMI, the user data gets merged with the one provided by AWS). One solution might be to write the variables to a file like /etc/eks/boostrap-vars and modify bootstrap.sh to read them later.

@stevehipwell
Contributor

@ipleten it was a leading question, I've done this exact thing for some of the other env variables by persisting the export. I'll probably open a PR to change this as it's more resilient to AMI changes than the sed solution.

@stevehipwell
Contributor

@ipleten it looks like I already did, #1433.

@github-actions

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 14, 2022