feat: LaunchTemplate support for managed node-groups #997

Merged
merged 4 commits into from
Nov 2, 2020

Conversation

philicious
Contributor

@philicious philicious commented Aug 30, 2020

PR o'clock

Description

fixes #979

Just recently on 17th August, AWS released LaunchTemplate support for managed node-groups. https://aws.amazon.com/blogs/containers/introducing-launch-template-and-custom-ami-support-in-amazon-eks-managed-node-groups/
Furthermore the aws provider also supports it since 3.3.0 hashicorp/terraform-provider-aws#14639

This module didn't support it yet, only LTs for self-managed worker groups.

As the module is already quite complex, I only added support for passing in the id of an LT that you create yourself.
The existing workers_launch_template.tf couldn't easily be reused imho, as it also creates ASGs and other resources for self-managed node-groups. Also, as I noticed, at least the iam_instance_profile must NOT be set on LTs used for managed node-groups; the AWS API will return an error otherwise.
So instead of adding another LT manifest and variables and wiring it all together, I preferred to take care of the LT myself and copied the userdata template.

I now create a LT like

data "template_file" "launch_template_userdata" {
  template = file("${path.module}/templates/userdata.sh.tpl")

  vars = {
    cluster_name        = var.cluster_name
    endpoint            = module.eks.cluster_endpoint
    cluster_auth_base64 = module.eks.cluster_certificate_authority_data

    bootstrap_extra_args = ""
    kubelet_extra_args   = ""
  }
}

// this is mostly the default LT that AWS would create if you don't specify your own
resource "aws_launch_template" "default" {
  name_prefix            = "${var.cluster_name}-"
  description            = "Default Launch-Template for clusters"
  update_default_version = true

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size           = 100
      volume_type           = "gp2"
      delete_on_termination = true
    }
  }

  ebs_optimized = true // some instance types don't support it, so check when changing type

  image_id      = "ami-00341e507eb458a09" //TODO use our custom AMI
  instance_type = var.instance_type

  monitoring {
    enabled = true
  }

  network_interfaces {
    associate_public_ip_address = false
    delete_on_termination       = true
    security_groups             = [module.eks.worker_security_group_id]
  }

  user_data = base64encode(
    data.template_file.launch_template_userdata.rendered,
  )

  lifecycle {
    create_before_destroy = true
  }  
}

and pass it to the eks module

module "eks" {
  source          = "../../../../../terraform-aws-eks/"
  cluster_name    = var.cluster_name
  cluster_version = "1.16"
  subnets         = data.terraform_remote_state.network.outputs.private_subnets

  vpc_id = data.terraform_remote_state.network.outputs.vpc_id

  node_groups = {
    initial_group = {
      desired_capacity = 1
      max_capacity     = 15 // let's cap a cluster at 15 nodes, so ASGs cannot go insane
      min_capacity     = 1

      launch_template_id      = aws_launch_template.default.id
      launch_template_version = aws_launch_template.default.default_version
    }
  }
...
...
}

Noteworthy: I had to wrap the userdata script in a MIME multipart envelope, as the API otherwise complained that it wasn't MIME. So I took the MIME wrapper that AWS generates when you create a node group manually and let AWS create the LT for you:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
set -xe

# Bootstrap and join the cluster
/etc/eks/bootstrap.sh --b64-cluster-ca '${cluster_auth_base64}' --apiserver-endpoint '${endpoint}' ${bootstrap_extra_args} --kubelet-extra-args "${kubelet_extra_args}" '${cluster_name}'

--//--

Checklist

@philicious philicious marked this pull request as draft August 30, 2020 22:35
@philicious philicious marked this pull request as ready for review September 1, 2020 22:48
@philicious
Contributor Author

The PR is ready from my side. Adding another example might help though, showing how to actually use launch_template_id as outlined above.

@calvinbui

I had success using https://registry.terraform.io/providers/hashicorp/cloudinit/latest/docs/data-sources/cloudinit_config to wrap the userdata, but I'd say it shouldn't be part of this module, in the same way that you're keeping the LT itself out of this module.

@calvinbui

calvinbui commented Sep 2, 2020

Though I might suggest some documentation with an example of what the LT should look like.

i.e. instance profile cannot be set, EKS will complain about that

@ghost

ghost commented Sep 8, 2020

One good candidate for an example is installing the SSM agent on startup. See aws/containers-roadmap#593 (comment)

@philicious
Contributor Author

I will add an example and make this PR fully ready towards the weekend.
I am super busy and stressed with a customer's roadmap atm.

@dpiddockcmp
Contributor

Does this actually work? Reading through the documentation they list some node group configurations that are prohibited when using launch templates:

| Prohibited value | Module default | aws provider default |
| --- | --- | --- |
| ami_type (if you specify custom AMI) | unset | AL2_x86_64 |
| instance_type | m4.large | t3.medium |
| disk_size | unset | 20 GB |
| ssh keypair and extra security groups | unset | unset |

Or maybe by "prohibited" they mean ignored from the API values?

@philicious
Contributor Author

Yes, it does. I have a cluster running with a custom AMI this way.

@philicious
Contributor Author

@dpiddockcmp the documentation reads

The following table lists the settings that are prohibited in a managed node group configuration and which similar settings (if any) are required in a launch template.

so it is correct not to set disk_size, ami_type and instance_type in the node_groups config block, but rather in the aws_launch_template, as seen in my example :)
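
As a rough sketch of that mapping, the settings that may not appear on the node group would move into the launch template roughly as follows. The resource name `managed` and the variables `var.custom_ami_id` and `var.ssh_key_name` are hypothetical placeholders, not part of the module or of the example in the PR description.

```hcl
# Sketch only: names and variables below are illustrative assumptions.
resource "aws_launch_template" "managed" {
  name_prefix = "${var.cluster_name}-"

  image_id      = var.custom_ami_id # instead of ami_type on the node group
  instance_type = var.instance_type # instead of instance_type on the node group
  key_name      = var.ssh_key_name  # instead of the node group's SSH key pair setting

  # Instead of disk_size on the node group.
  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size = 100
      volume_type = "gp2"
    }
  }

  # Instead of extra security groups on the node group.
  vpc_security_group_ids = [module.eks.worker_security_group_id]

  # Deliberately no iam_instance_profile block: as noted in this thread,
  # the API rejects launch templates for managed node groups that set one.
}
```

When `image_id` points at a custom AMI, the launch template also needs user data that bootstraps the node, as in the PR description.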

@nxf5025
Contributor

nxf5025 commented Sep 21, 2020

@dpiddockcmp

Does this actually work?

I tested this over the weekend and it worked great. Only addition I needed was to add iam_role_arn to the node_groups map:

  node_groups = {
    initial_group = {
      desired_capacity = 1
      max_capacity     = 15
      min_capacity     = 1
      iam_role_arn     = <arn>

      launch_template_id = aws_launch_template.default.id
      launch_template_version = aws_launch_template.default.default_version 

    }
  }  

@karlderkaefer

@philicious I can confirm that it works as you propose.

@Carles-Figuerola

Made a PR to make the random naming solution work properly when the node group needs to be replaced:
https://github.com/Bahn-X/terraform-aws-eks/pull/1

@myoung34
Contributor

myoung34 commented Oct 1, 2020

Anything holding this back? I'd love to make use of it so we can use security groups and tags

cc @dpiddockcmp

@philicious
Contributor Author

The PR from @Carles-Figuerola looks very good; besides that, only the example is missing. I'm on holiday and had some stressful weeks at my client before that, but I could finish it up next Wednesday!
Already using this in production at the client :)

@barryib
Member

barryib commented Oct 6, 2020

Thanks @philicious for working on this. We have a terraform-aws-modules working session this Friday, where we'll discuss the direction we want to take with this feature. We'll get back to you pretty soon.

@huguesalary
Contributor

If that helps, I have tested this PR as well and it works as advertised, thank you @philicious and the maintainers of this repo for your work!

@davidalger
Contributor

davidalger commented Oct 8, 2020

When using a custom AMI, Amazon EKS doesn't merge any user data. Rather, you are responsible for supplying the required bootstrap commands for nodes to join the cluster. If your nodes fail to join the cluster, the Amazon EKS CreateNodegroup and UpdateNodegroupVersion actions also fail.

Another important bit for documentation. The example in this PR description shows the call to "Bootstrap and join the cluster" as part of the user data. Including this will in fact fail the managed node when NOT using a custom AMI. The above note is from EKS docs. When using the default AMI, the call to bootstrap.sh is merged into the user-data.

We have an eks module wrapping this one, and this is what the LT setup I used looks like:

data "cloudinit_config" "node_group" {
  gzip          = false
  base64_encode = false

  part {
    content_type = "text/x-shellscript"
    content      = <<-EOT
      yum install -y gnupg
      wget https://inspector-agent.amazonaws.com/linux/latest/install
      wget https://d1wk0tztpsntt1.cloudfront.net/linux/latest/inspector.gpg
      gpg --import inspector.gpg
      wget https://inspector-agent.amazonaws.com/linux/latest/install.sig
      gpg --verify install.sig || exit 1
      sudo bash install
    EOT
  }
}

resource "aws_launch_template" "node_group" {
  for_each = var.node_groups

  tags = var.tags
  name = "${var.name}-${each.key}-node-group"

  instance_type = lookup(each.value, "instance_type", var.node_group_instance_type)
  user_data     = base64encode(replace(data.cloudinit_config.node_group.rendered, "MIMEBOUNDARY", "//"))

  dynamic "tag_specifications" {
    for_each = ["instance", "volume"]

    content {
      tags          = var.tags
      resource_type = tag_specifications.value
    }
  }
  update_default_version = true
}

module "eks" {
  # Loading from this PR branch for now until launch template configuration is
  # merged: https://github.com/terraform-aws-modules/terraform-aws-eks/pull/997
  source          = "github.com/bahn-x/terraform-aws-eks.git?ref=ca321509"

  // ...

  node_groups = { for key, value in var.node_groups : key => merge({
      instance_type           = var.node_group_instance_type
      min_capacity            = var.node_group_min_capacity
      max_capacity            = var.node_group_max_capacity
      desired_capacity        = var.node_group_min_capacity
      launch_template_id      = aws_launch_template.node_group[key].id
      launch_template_version = aws_launch_template.node_group[key].default_version
    }, value) }
}

Replacing the correct multi-part boundary string into the user data seemed to be important or I'd get an error about not using multi-part:

user_data     = base64encode(replace(data.cloudinit_config.node_group.rendered, "MIMEBOUNDARY", "//"))
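
A possible alternative to the `replace()` call, sketched here under the assumption that the installed cloudinit provider version supports the `boundary` argument on `cloudinit_config` (check the provider documentation for your version), is to set the boundary on the data source directly. The resource names and the script body are placeholders.

```hcl
# Sketch only: assumes a cloudinit provider version that accepts `boundary`.
data "cloudinit_config" "node_group_boundary" {
  gzip          = false
  base64_encode = false
  boundary      = "//" # same boundary string that replace() produced above

  part {
    content_type = "text/x-shellscript"
    content      = <<-EOT
      #!/bin/bash
      echo "custom node setup goes here"
    EOT
  }
}

resource "aws_launch_template" "node_group_boundary" {
  name_prefix = "example-node-group-"

  # No string replacement needed; the rendered document already uses "//".
  user_data = base64encode(data.cloudinit_config.node_group_boundary.rendered)
}
```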

Others noted the need to use iam_role_arn; wasn't something I ran into. Working without it in my case.

I haven't checked when using a custom AMI, but in my case (where a custom AMI is NOT being used) EKS is creating a new launch template based on the provided one, with the bootstrap script merged into the user data.

Many thanks to @philicious for putting this together! I've got it applied to multiple EKS clusters, where I can confirm it's working, and I hope to see it incorporated in some way on a tagged release soon.

@philicious
Contributor Author

philicious commented Oct 9, 2020

Another thing to note is about using disk/EBS encryption with LaunchTemplates, as asked for in #1023

I left that out of my initial example, but it's worth including in the example too:

If you set

    ebs {
..
      encrypted             = true
      kms_key_id            = var.kms_key_arn
    }

you also need to add a key policy to that KMS key, so that the Auto Scaling service-linked role and the cluster role can use the key to encrypt and decrypt volumes

resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
  description      = "Default Service-Linked Role enables access to AWS Services and Resources used or managed by Auto Scaling"
}

data "aws_caller_identity" "current" {}

// This policy is required for the EKS KMS, so the cluster is allowed to enc/dec/attach encrypted EBS volumes
data "aws_iam_policy_document" "ebs_decryption" {
  // copy of default KMS policy that lets us manage it
  statement {
    sid    = "Enable IAM User Permissions"
    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }

    actions = [
      "kms:*"
    ]

    resources = ["*"]
  }

  // required for EKS
  statement {
    sid    = "Allow service-linked role use of the CMK"
    effect = "Allow"

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling", // required for the ASG to manage encrypted volumes for nodes
        module.eks.cluster_iam_role_arn,                                                                                                            // required for the app cluster / persistentvolume-controller to create encrypted PVCs
        module.obs_eks.cluster_iam_role_arn                                                                                                         // required for the obs cluster / persistentvolume-controller to create encrypted PVCs
      ]
    }

    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey"
    ]

    resources = ["*"]
  }

  statement {
    sid    = "Allow attachment of persistent resources"
    effect = "Allow"

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling", // required for the ASG to manage encrypted volumes for nodes
        module.eks.cluster_iam_role_arn,                                                                                                            // required for the app cluster / persistentvolume-controller to create encrypted PVCs
        module.obs_eks.cluster_iam_role_arn                                                                                                         // required for the obs cluster / persistentvolume-controller to create encrypted PVCs
      ]
    }

    actions = [
      "kms:CreateGrant"
    ]

    resources = ["*"]

    condition {
      test     = "Bool"
      variable = "kms:GrantIsForAWSResource"
      values   = ["true"]
    }

  }
}

@philicious
Contributor Author

@barryib any news from last Friday's working session? I'd like to see this PR merged and I think many others would be happy too. I need to know what I should still add to it so you guys are happy with it :)

@huguesalary

This comment has been minimized.

@philicious
Contributor Author

@huguesalary are you sure you haven't accidentally also set instance_type on the node-group in addition to the launch template?
I can't reproduce the described behavior, i.e. the output of aws eks describe-nodegroup --cluster-name prod-app-cluster --nodegroup-name prod-app-cluster-initial_group-teaching-mackerel

doesn't contain instanceTypes for me!

@huguesalary

This comment has been minimized.

@huguesalary
Contributor

Re-creating the cluster from scratch fixed the issue mentioned in comments #997 (comment) and #997 (comment)

@stibi

stibi commented Oct 15, 2020

I just deployed (actually redeployed just the node groups of) a couple of EKS clusters and everything is fine. Thanks a lot for this pull request!

How can I help here so it can be merged? Maybe provide an example of the code? Anything else?

@vara-bonthu

I am really looking forward to this change being merged. I have tested this branch and it works fine for me with launch templates for multiple node groups.

@barryib
Member

barryib commented Oct 15, 2020

@barryib any news from last Friday's working session? I'd like to see this PR merged and I think many others would be happy too. I need to know what I should still add to it so you guys are happy with it :)

I'm going to do some tests during the weekend so we can merge it. Thanks in advance for your understanding.

@philicious
Contributor Author

@barryib sounds awesome ! I'll allot some weekend time then for adding examples / doc, based on comments in here

@philicious philicious requested a review from barryib October 31, 2020 00:12
@philicious
Contributor Author

@barryib i addressed the change requests. Pls have a look 🙂

@barryib barryib merged commit 127a3a8 into terraform-aws-modules:master Nov 2, 2020
@barryib
Member

barryib commented Nov 2, 2020

Thanks a lot @philicious for your contribution. I'll push a new release during the day.

@barryib
Member

barryib commented Nov 2, 2020

v13.1.0 is now released. Thank you all for your work.

@philicious
Contributor Author

@barryib thanks for doing such a quick release !! ❤️

@yoanisgil

@philicious I just wanted to thank you for your work on this. You're a life saver!

@MBalazs90

MBalazs90 commented Nov 14, 2020

@philicious I tried your launch_templates_with_managed_node_groups with custom userdata installing amazon-ssm-agent but I'm getting this error

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash 
set -xe

sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
sudo systemctl enable amazon-ssm-agent
sudo systemctl start amazon-ssm-agent
sudo systemctl status amazon-ssm-agent

# Bootstrap and join the cluster
/etc/eks/bootstrap.sh --b64-cluster-ca '${cluster_auth_base64}' --apiserver-endpoint '${endpoint}' ${bootstrap_extra_args} --kubelet-extra-args "${kubelet_extra_args}" '${cluster_name}'

--//--

Error: error waiting for EKS Node Group (test-eks-lt-zPGZQU0N:test-eks-lt-zPGZQU0N-example-polished-leopard) creation: NodeCreationFailure: Instances failed to join the kubernetes cluster. Resource IDs: [i-0b13057dca7c45b3e]

I am using the default AMI, custom AMI is not enabled
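
Going by @davidalger's note earlier in this thread (with the default AMI, EKS merges its own bootstrap.sh call into the user data), a variant of this user data without the bootstrap call might look like the sketch below. It is untested here; the resource names are illustrative, and the boundary replacement is copied from that earlier comment.

```hcl
# Sketch only: no /etc/eks/bootstrap.sh call, on the assumption that EKS
# appends its own bootstrap section when the default AMI is used.
data "cloudinit_config" "ssm_agent" {
  gzip          = false
  base64_encode = false

  part {
    content_type = "text/x-shellscript"
    content      = <<-EOT
      #!/bin/bash
      set -x
      yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
      systemctl enable amazon-ssm-agent
      systemctl start amazon-ssm-agent
    EOT
  }
}

resource "aws_launch_template" "ssm_agent" {
  name_prefix            = "${var.cluster_name}-ssm-"
  update_default_version = true

  # Same boundary replacement as in the earlier comment.
  user_data = base64encode(
    replace(data.cloudinit_config.ssm_agent.rendered, "MIMEBOUNDARY", "//")
  )
}
```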

@philicious
Contributor Author

@MBalazs90 I have seen that error myself before, though rarely and for different reasons.

For debugging, ssh to the machine, try to curl the master's URL and also try to run the bootstrap command by hand.
That way you can make sure that your network setup is fine and allows that connection.

@pre

pre commented Nov 17, 2020

@philicious I tried your launch_templates_with_managed_node_groups with custom userdata installing amazon-ssm-agent but I'm getting this error
[..]
Error: error waiting for EKS Node Group (test-eks-lt-zPGZQU0N:test-eks-lt-zPGZQU0N-example-polished-leopard) creation: NodeCreationFailure: Instances failed to join the kubernetes cluster. Resource IDs: [i-0b13057dca7c45b3e]

I am using the default AMI, custom AMI is not enabled

I can replicate this error with custom userdata.

I have noticed that

  • ec2 instances become healthy
  • instances join the Kubernetes cluster
  • instances do not register as healthy to the managed node group (also visible in the AWS Console > EKS > Compute)

I did successfully spin up instances with managed node groups and the default (implicit) launch template. I inspected the default userdata from the AWS Console > EKS > Compute > MNG > Advanced > Userdata and noticed that the following labels are present there (in the configuration which works):

# Node labels in the default launch template which works
# whatever is in `k8s_labels` +
--node-labels=
eks.amazonaws.com/nodegroup-image=ami-06cfd5b2a2d58e09a,
eks.amazonaws.com/capacityType=ON_DEMAND,
eks.amazonaws.com/sourceLaunchTemplateVersion=5,
eks.amazonaws.com/nodegroup=my-nodegroup-name,
eks.amazonaws.com/sourceLaunchTemplateId=lt-04f5758c3c9507b88

So, in order to use managed node groups with a launch template AND custom userdata, you need to MANUALLY fill in the eks.amazonaws.com/sourceLaunchTemplateVersion, eks.amazonaws.com/nodegroup and eks.amazonaws.com/sourceLaunchTemplateId labels for the workers to register with the managed node group.

I played around with this and found out that those three are the minimum needed to get nodes to register properly. However, the default (implicit) launch template also fills in eks.amazonaws.com/nodegroup-image and eks.amazonaws.com/capacityType, so they might serve a real purpose.

The examples/launch_templates_with_managed_node_groups/templates/userdata.sh.tpl provides a reference but it does not work without the labels mentioned above.

@philicious philicious deleted the launch-template branch November 19, 2020 01:05
@philicious
Contributor Author

I personally only tested MNG with an LT together with a custom AMI, which is what the userdata template is meant for. I never tried userdata without a custom AMI.

However see #997 (comment) where @davidalger successfully did that. Maybe he has an idea ?

@pre

pre commented Nov 19, 2020

I investigated deeper into this. The problem I was facing is related to the merge of userdata done by EKS Managed Node Groups (MNG).

My problem is that I need to pass custom K8s node-labels to the kubelet. Normally you'd be able to do this by just passing --kubelet-extra-args '--node-labels xyz=zyz' to /etc/eks/bootstrap.sh. This is also how terraform-aws-eks normally does it, but this does not work with MNG:

/etc/eks/bootstrap.sh --b64-cluster-ca '${cluster_auth_base64}' --apiserver-endpoint '${endpoint}' ${bootstrap_extra_args} --kubelet-extra-args "${kubelet_extra_args}" '${cluster_name}'

The problem is that the Managed Node Group's "merging of userdata" places a /etc/eks/bootstrap.sh command of its own as the last part of the merged userdata. If you call /etc/eks/bootstrap.sh yourself, the script will receive only those parameters which you have provided. In addition, because the userdata is merged, there will be two calls to bootstrap.sh.

The last part of the userdata (provided by MNG) contains the bootstrap.sh line with the eks.amazonaws.com parameters (listed in #997 (comment)), which label the node properly so that the Managed Node Group can determine that the node has successfully joined the cluster.

Without these eks.amazonaws.com labels, the MNG will display status of "Creation failed" and Terraform will fail with Error: error waiting for EKS Node Group (xxx) creation: NodeCreationFailure: Instances failed to join the kubernetes cluster. Resource IDs: [i-xxx]

Userdata - update EKS Managed Node Group EC2 instances to the newest AWS kernel

  • Update to the Amazon-provided Linux 5.5.x kernel; EKS MNG has only 5.4.x at the moment.
  • Reboot after setting the kernel up.
  • Join the cluster with bootstrap.sh after the reboot.
  • This is an unpolished example, but it works.

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
set -xe

# Install newer Amazon supported kernel
amazon-linux-extras install -y kernel-ng
yum install -y amazon-ssm-agent
yum update -y
 
TOKEN="$(curl -X PUT -H "X-aws-ec2-metadata-token-ttl-seconds: 600" "http://169.254.169.254/latest/api/token")"
INSTANCE_LIFECYCLE="$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/meta-data/instance-life-cycle)"
INSTANCE_ID="$(curl -H "X-aws-ec2-metadata-token: $TOKEN" --silent http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .instanceId -r)"
REGION="$(curl -H "X-aws-ec2-metadata-token: $TOKEN" --silent http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region -r)"
# --output text returns the bare tag value instead of a JSON-quoted string
LAUNCH_TEMPLATE_VERSION="$(aws ec2 describe-tags --region "$REGION" --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=tag-key,Values=aws:ec2launchtemplate:version" --query 'Tags[0].Value' --output text)"
LAUNCH_TEMPLATE_ID="$(aws ec2 describe-tags --region "$REGION" --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=tag-key,Values=aws:ec2launchtemplate:id" --query 'Tags[0].Value' --output text)"
NODEGROUP="$(aws ec2 describe-tags --region "$REGION" --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=tag-key,Values=eks:nodegroup-name" --query 'Tags[0].Value' --output text)"

# AMI ID is passed by the default MNG launch template, but node joins the cluster without it also. 
# Also as we have just updated the kernel, ami id would need to be queried from somewhere.
# eks.amazonaws.com/nodegroup-image=ami-05cd1e07212dd719a

# TODO: dynamic eks.amazonaws.com/capacityType=ON_DEMAND from INSTANCE_LIFECYCLE
EKS_MNG_LABELS="eks.amazonaws.com/sourceLaunchTemplateVersion=$LAUNCH_TEMPLATE_VERSION,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/sourceLaunchTemplateId=$LAUNCH_TEMPLATE_ID,eks.amazonaws.com/nodegroup=$NODEGROUP"

# https://github.com/awslabs/amazon-eks-ami/blob/0a96824d7b60d0930c846f5d6841d1c10ff411d2/files/bootstrap.sh#L273
K8S_CLUSTER_DNS_IP=172.20.0.10

# Userdata is only executed at the first boot of an EC2 instance.
# Prepare bootstrap instructions which will be executed at the second boot.
cat >/etc/rc.d/rc.local <<EOF
#!/bin/bash
set -xe

# Bootstrap and join the cluster
# https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh
/etc/eks/bootstrap.sh \
  --b64-cluster-ca '${cluster_auth_base64}' \
  --apiserver-endpoint '${endpoint}' \
  --dns-cluster-ip "$K8S_CLUSTER_DNS_IP" \
  ${bootstrap_extra_args} \
  --kubelet-extra-args '--node-labels=${k8s_labels} --node-labels=$EKS_MNG_LABELS' \
  '${cluster_name}'

touch /var/lock/subsys/local
EOF

chmod +x /etc/rc.d/rc.local
systemctl enable rc-local.service

# Start again with the new kernel
reboot

--//--

Launch template:

  user_data = base64encode(
    data.template_file.launch_template_userdata_osd.rendered
  )

template_file

data "template_file" "launch_template_userdata_osd" {
  template = file("${path.module}/templates/userdata.sh.tpl")

  vars = {
    cluster_name        = var.cluster_name
    endpoint            = module.eks.cluster_endpoint
    cluster_auth_base64 = module.eks.cluster_certificate_authority_data

    bootstrap_extra_args = ""
    k8s_labels           = "node.rdx.net/example-role=example-value"
  }
}

I'm in over my head now and I'm no longer sure whether the MNG was able to recognize the nodes before I introduced the reboot. However, if you want a newer kernel with custom k8s node-labels, the reboot and a custom bootstrap.sh call are required.

@philicious
Contributor Author

Wow @pre, that looks like a lot of hassle to get this working, thanks for sharing!

to me it seems as if using just a custom AMI would be easier: use Packer to build a custom AMI with the kernel update baked in. Then use the simple userdata from the examples and have a smooth experience with managed node groups, without having to worry about the labels yourself.
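
A minimal Packer (HCL2) sketch of that idea follows. The region, instance type, AMI name filter and owner account are assumptions to verify for your own Kubernetes version and region, and the build is untested here.

```hcl
# Sketch only: filter values, owner account and region are assumptions.
locals {
  build_time = formatdate("YYYYMMDDhhmmss", timestamp())
}

source "amazon-ebs" "eks_node" {
  region        = "eu-central-1"
  instance_type = "t3.medium"
  ssh_username  = "ec2-user"
  ami_name      = "custom-eks-node-${local.build_time}"

  # Start from the EKS-optimized Amazon Linux 2 AMI for the cluster version.
  source_ami_filter {
    filters = {
      name                = "amazon-eks-node-1.16-v*"
      virtualization-type = "hvm"
      root-device-type    = "ebs"
    }
    owners      = ["602401143452"] # assumed: Amazon's EKS AMI account in most commercial regions
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.eks_node"]

  # Bake the newer kernel into the image so no reboot is needed at node start.
  provisioner "shell" {
    inline = [
      "sudo amazon-linux-extras install -y kernel-ng",
      "sudo yum update -y",
    ]
  }
}
```

The resulting AMI id would then go into `image_id` of the launch template, together with the bootstrap user data from the PR description.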

@pre

pre commented Nov 24, 2020

to me it seems as if using just a custom AMI would be easier

Are you aware of how to install a newer kernel from Amazon Linux Extras as an AMI? The Amazon FAQ only tells the amazon-linux-extras install way. I tried searching using the AWS AMI ID browser in AWS Console but maybe I had wrong keywords, I didn't find any AMI for Amazon Linux Extras.

My goal was not to build an AMI of my own and I definitely did not want a third party AMI. Even though the setup above was a hassle, it now provides a newer Linux 5.5 kernel which is managed by Amazon.

I tried using the official AWS Ubuntu image, which also provides Linux 5.5, but it was too different from the Amazon Linux to use without many changes elsewhere.

TL;DR Do you know how to find an AMI ID for an image which has the latest Amazon Linux Extras with a Linux 5.5 kernel?

@pre

pre commented Nov 24, 2020

PS. If you want to pass custom Kubernetes Node labels such as --kubelet-extra-args '--node-labels=${k8s_labels}' - you will need custom userdata. Otherwise the nodes are missing the labels which the Managed Node Group expects, and hence, Managed Node Group will fail with "Nodes failed to join the cluster" .
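
If the labels do not have to be passed as kubelet flags, one alternative that may avoid custom userdata is the `k8s_labels` key of the node group configuration, which the default MNG userdata quoted above appears to include. This is an untested sketch; the module source line and the elided cluster settings are placeholders, and whether it works together with a custom AMI has not been verified in this thread.

```hcl
# Sketch only: labels are handed to the managed node group instead of kubelet.
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # cluster settings omitted

  node_groups = {
    initial_group = {
      desired_capacity = 1
      max_capacity     = 15
      min_capacity     = 1

      launch_template_id      = aws_launch_template.default.id
      launch_template_version = aws_launch_template.default.default_version

      # Applied by the managed node group rather than via --kubelet-extra-args.
      k8s_labels = {
        "node.rdx.net/example-role" = "example-value"
      }
    }
  }
}
```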

@daenney

daenney commented Mar 25, 2021

encrypt

For the life of me I can't figure out what to do with this. I assume I need to create an AWS IAM policy with that document, but then what? Do I need to attach it to a role? Do I need to do a KMS grant?

It would be really useful to have a complete example.

@philicious
Contributor Author

philicious commented Mar 25, 2021

@daenney I guess you've seen https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/launch_templates_with_managed_node_groups
and in particular https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/launch_templates_with_managed_node_groups/launchtemplate.tf#L34-L36

So the EBS encryption policy should be added as the key policy of a KMS key, as stated, and that KMS key's ARN then needs to be set in the LT. That way your cluster node disks will be encrypted and EKS will be able to actually encrypt/decrypt them.

resource "aws_kms_key" "this" {
  customer_master_key_spec = "SYMMETRIC_DEFAULT"
  description              = "EKS key"
  enable_key_rotation      = true
  is_enabled               = true
  key_usage                = "ENCRYPT_DECRYPT"

  policy = var.key_policy
}

// optionally give the KMS key a human-readable name
resource "aws_kms_alias" "this" {
  name          = "alias/eks_key"
  target_key_id = aws_kms_key.this.key_id
}
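
To show how the pieces from this thread might fit together, here is a sketch that uses the `ebs_decryption` policy document from the earlier comment as the key policy and references the key from the launch template. The resource names `ebs` and `encrypted` are illustrative.

```hcl
# Sketch only: wires the earlier policy document, a KMS key and the LT together.
resource "aws_kms_key" "ebs" {
  description         = "EKS node volume encryption key"
  enable_key_rotation = true

  # Key policy rendered from the aws_iam_policy_document shown earlier.
  policy = data.aws_iam_policy_document.ebs_decryption.json
}

resource "aws_launch_template" "encrypted" {
  name_prefix = "${var.cluster_name}-"

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size           = 100
      volume_type           = "gp2"
      delete_on_termination = true
      encrypted             = true
      kms_key_id            = aws_kms_key.ebs.arn
    }
  }
}
```

No separate IAM policy attachment or KMS grant is created here; the policy document is only used as the key's own policy.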

@daenney

daenney commented Mar 26, 2021

Got it. Thanks a ton! Somehow I had missed the policy attribute when looking at aws_kms_key.

@github-actions

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022
Development

Successfully merging this pull request may close these issues.

Support launch template & custom ami for managed node groups