Karpenter & External residing CMK #2052

Open · 1 task done

devops-inthe-east opened this issue Dec 2, 2024 · 2 comments

devops-inthe-east commented Dec 2, 2024

  • ✋ I have searched the open/closed issues and my issue is not listed.

Please describe your question here

A fairly simple problem statement has been bugging me lately:

Karpenter is unable to provision nodes from an AMI whose EBS volumes are encrypted with a CMK that resides in an external account.

The nodes get created but are terminated almost instantly with the error message: [Client.InvalidKMSKey.InvalidState]

I followed this AWS document, which walks through adding permissions to the karpenter-worker-nodes role, but I still get the same error.

The role file looks like this:

#IAM role and policy for worker node EC2 instances
resource "aws_iam_role" "eks_worker_node_role" {
  name = "${var.cluster_name}-workernode-role"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "eks_AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_CloudWatchAgentServerPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_AmazonEC2RoleforSSM" {
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_AmazonS3ReadOnlyAccess" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_role_policy_attachment" "eks_AmazonSSMManagedInstanceCore" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  role       = aws_iam_role.eks_worker_node_role.name
}

resource "aws_iam_instance_profile" "worker_node_instances_profile" {
  name = "${var.cluster_name}-instance-profile"
  role = aws_iam_role.eks_worker_node_role.name
}


data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

locals {
  worker_node_role_arn = aws_iam_role.eks_worker_node_role.arn
}

data "aws_iam_policy_document" "karpenter_controller_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"
    condition {
      test     = "StringEquals"
      variable = "${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}:sub"
      values   = ["system:serviceaccount:karpenter:karpenter"]
    }  
    condition {
      test     = "StringEquals"
      variable = "${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}:aud"
      values   = ["sts.amazonaws.com"]
    }
    principals {
      identifiers = ["arn:aws:iam::${var.tenantaccount}:oidc-provider/${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}"]
      type        = "Federated"
    }
  }
}

resource "aws_iam_role" "karpenter_controller" {
  assume_role_policy = data.aws_iam_policy_document.karpenter_controller_assume_role_policy.json
  name               = "${var.cluster_name}-KarpenterRole"
}

resource "aws_iam_policy" "karpenter_controller" {
  name        = "${var.cluster_name}-KarpenterPolicy"
  description = "Karpenter controller policy for autoscaling"
  policy = <<EOF
{
    "Statement": [
        {
            "Action": [
                "ssm:GetParameter",
                "ec2:DescribeImages",
                "ec2:RunInstances",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeInstanceTypeOfferings",
                "ec2:DescribeAvailabilityZones",
                "ec2:DeleteLaunchTemplate",
                "ec2:CreateTags",
                "ec2:CreateLaunchTemplate",
                "ec2:CreateFleet",
                "ec2:DescribeSpotPriceHistory",
                "pricing:GetProducts",
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
                
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "Karpenter"
        },
        {
            "Action": "ec2:TerminateInstances",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/karpenter.sh/nodepool": "*"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "ConditionalEC2Termination"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "${local.worker_node_role_arn}",
            "Sid": "PassNodeIAMRole"
        },
        {
            "Effect": "Allow",
            "Action": "eks:DescribeCluster",
            "Resource": "arn:aws:eks:${var.region}:${var.tenantaccount}:cluster/${var.cluster_name}",
            "Sid": "EKSClusterEndpointLookup"
        },
        {
            "Sid": "AllowScopedInstanceProfileCreationActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
            "iam:CreateInstanceProfile"
            ],
            "Condition": {
            "StringEquals": {
                "aws:RequestTag/kubernetes.io/cluster/${var.cluster_name}": "owned",
                "aws:RequestTag/topology.kubernetes.io/region": "${var.region}"
            },
            "StringLike": {
                "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
            }
            }
        },
        {
            "Sid": "AllowScopedInstanceProfileTagActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
            "iam:TagInstanceProfile"
            ],
            "Condition": {
            "StringEquals": {
                "aws:ResourceTag/kubernetes.io/cluster/${var.cluster_name}": "owned",
                "aws:ResourceTag/topology.kubernetes.io/region": "${var.region}",
                "aws:RequestTag/kubernetes.io/cluster/${var.cluster_name}": "owned",
                "aws:RequestTag/topology.kubernetes.io/region": "${var.region}"
            },
            "StringLike": {
                "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*",
                "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
            }
            }
        },
        {
            "Sid": "AllowScopedInstanceProfileActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
            "iam:AddRoleToInstanceProfile",
            "iam:RemoveRoleFromInstanceProfile",
            "iam:DeleteInstanceProfile",
            "iam:CreateServiceLinkedRole",
            "iam:ListRoles",
            "iam:ListInstanceProfiles"
            ],
            "Condition": {
            "StringEquals": {
                "aws:ResourceTag/kubernetes.io/cluster/${var.cluster_name}": "owned",
                "aws:ResourceTag/topology.kubernetes.io/region": "${var.region}"
            },
            "StringLike": {
                "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
            }
            }
        },
        {
            "Sid": "AllowInstanceProfileReadActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": "iam:GetInstanceProfile"
        }
    ],
    "Version": "2012-10-17"
  }
  EOF
}

resource "aws_iam_role_policy_attachment" "karpenter_controller_attach" {
depends_on = [aws_iam_policy.karpenter_controller, aws_iam_role.karpenter_controller]
role = aws_iam_role.karpenter_controller.name
policy_arn = aws_iam_policy.karpenter_controller.arn
}



A few questions.

I currently use the following sequence to provision my node group:

  1. The EKS control plane is provisioned by Terraform.
  2. Karpenter pods running on a Fargate profile provision the nodes, referencing the AMI.
  3. The node group comes up with CoreDNS as the first pod to be placed.

A grant has been created in the key-owner account, and the cross-account KMS setup is in place on the consumer side.

I was curious to know whether any other pieces of infrastructure need to be integrated so that Karpenter can create node groups from AMIs whose snapshots are encrypted with a CMK residing in an external account.
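
For reference, the cross-account piece in the key-owner account is expressed as key-policy statements roughly like the following (a sketch only; the policy-document name and the consumer account ID 111122223333 are placeholders, not taken from my actual setup):

# Sketch: key-policy statements in the key-owner account that let the
# consumer account use the CMK and create grants for AWS services.
data "aws_iam_policy_document" "cmk_cross_account" {
  # Allow the consumer account to use the key for data-key and encrypt/decrypt operations.
  statement {
    sid    = "AllowConsumerAccountUseOfTheKey"
    effect = "Allow"
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::111122223333:root"]
    }
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey",
    ]
    resources = ["*"]
  }

  # Allow the consumer account to create grants, but only for AWS resources
  # (EBS, Auto Scaling, etc.) that act on the account's behalf.
  statement {
    sid    = "AllowConsumerAccountGrantsForAWSResources"
    effect = "Allow"
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::111122223333:root"]
    }
    actions   = ["kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant"]
    resources = ["*"]
    condition {
      test     = "Bool"
      variable = "kms:GrantIsForAWSResource"
      values   = ["true"]
    }
  }
}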

@devops-inthe-east (Author)

One thing I missed: I was able to get it to work by creating a new grant, for the autoscaling role, in the account that consumes the KMS key.

Support engineers indicate that this should be a one-time activity whenever a new autoscaling role is created.


devops-inthe-east commented Dec 3, 2024

The updated iam.tf has the following permission statement to add the kms:CreateGrant action:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCreationOfGrantForTheKMSKeyinExternalAccount444455556666",
      "Effect": "Allow",
      "Action": "kms:CreateGrant",
      "Resource": "arn:aws:kms:us-west-2:444455556666:key/1a2b3c4d-5e6f-1a2b-3c4d-5e6f1a2b3c4d"
    }
  ]
}
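
In Terraform, that statement could be attached along the lines of the following sketch (attaching it to the Karpenter controller role is my assumption here; attach it to whichever role actually issues the CreateGrant call):

resource "aws_iam_role_policy" "karpenter_kms_create_grant" {
  # Inline policy scoped to creating grants against the external CMK only.
  name = "${var.cluster_name}-KarpenterKmsCreateGrant"
  role = aws_iam_role.karpenter_controller.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "AllowCreationOfGrantForTheKMSKeyinExternalAccount444455556666"
      Effect   = "Allow"
      Action   = "kms:CreateGrant"
      Resource = "arn:aws:kms:us-west-2:444455556666:key/1a2b3c4d-5e6f-1a2b-3c4d-5e6f1a2b3c4d"
    }]
  })
}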

The awscli command:


aws kms create-grant \
  --region us-west-2 \
  --key-id arn:aws:kms:us-west-2:444455556666:key/1a2b3c4d-5e6f-1a2b-3c4d-5e6f1a2b3c4d \
  --grantee-principal arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling \
  --operations "Encrypt" "Decrypt" "ReEncryptFrom" "ReEncryptTo" "GenerateDataKey" "GenerateDataKeyWithoutPlaintext" "DescribeKey" "CreateGrant" 



What still bugs me is that the CloudTrail logs show my full-admin role as the 'username' making the CreateGrant API call, when it should be the autoscaling service role triggering it.
