
Specifying iam_path causes node access issues in the aws-auth config map #1595

Closed

leanrobot opened this issue Sep 22, 2021 · 7 comments

@leanrobot

Description

I am setting up an EKS cluster at my company. I noticed that if I specify the iam_path input variable for the eks module, the first apply succeeds, but any subsequent apply removes the node's IAM role mapping from the aws-auth config map and replaces it with one whose ARN does not include the proper IAM path.

This causes the node group's health to become degraded in the EKS console.

When iam_path is removed from the input parameters, the module behaves as expected on the first and all subsequent applies.
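
For illustration, the drift looks roughly like this in the mapRoles entry that the module manages (the account ID, IAM path, and role name below are hypothetical):

# Entry written on the first apply (nodes healthy, path preserved):
#   rolearn = "arn:aws:iam::111111111111:role/test-cluster/eks/test-cluster-node-role"
#
# Entry rewritten on subsequent applies (path stripped, node group degrades):
#   rolearn = "arn:aws:iam::111111111111:role/test-cluster-node-role"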

Versions

  • Terraform:
Terraform v1.0.7
on darwin_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.11.3
+ provider registry.terraform.io/hashicorp/aws v3.59.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/helm v2.3.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.5.0
+ provider registry.terraform.io/hashicorp/local v2.1.0
+ provider registry.terraform.io/terraform-aws-modules/http v2.4.1
  • Module: terraform-aws-modules/eks/aws 17.20.0

Reproduction

Steps to reproduce the behavior:
Workspace: default
Cleared Cache: yes

  • Wrote the module configuration and applied it.
  • The cluster works correctly after the first apply.
  • Every subsequent apply rewrites the aws-auth map_roles entry for the worker node role.

Code Snippet to Reproduce

// based on: https://github.com/hashicorp/learn-terraform-provision-eks-cluster
// docs: https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
module "eks" {
  version = ">= 17.0.0, < 18.0.0"

  source           = "terraform-aws-modules/eks/aws"
  cluster_name     = "test-cluster"
  cluster_version  = "1.20"
  write_kubeconfig = false

  vpc_id  = aws_vpc.main.id
  subnets = [for zone, subnet in aws_subnet.list : subnet.id]

  cluster_endpoint_public_access  = true
  cluster_endpoint_public_access_cidrs = var.endpoint_ip_cidrs

  cluster_endpoint_private_access = true
  cluster_endpoint_private_access_cidrs = [ for zone, cidr in local.subnet_data: cidr ]

  # iam_path  = "/${local.cluster_name}/eks/" # adding causes issue
  map_roles = concat(
    var.iam_role_mapping,
    [
      # { # adding this fixes the issue, but is not ideal, as it can only be added after the cluster's first apply (i.e., from the second apply onward).
      #   groups = [
      #     "system:bootstrappers",
      #     "system:nodes",
      #   ]
      #   rolearn = data.aws_iam_role.nodes.arn
      #   username = "system:node:{{EC2PrivateDNSName}}"
      # },
      {
        rolearn  = "arn:aws:iam::xxxxxx:role/JC-User" # censored account id
        groups   = [ "system:masters" ]
        username = "JC-User"
      },
    ],
  )

  map_users = var.iam_user_mapping

  node_groups_defaults = {
    root_volume_type = "gp2"
    key_name         = aws_key_pair.main.id
    additional_tags  = {
      "Name" = "${local.resource_prefix}-nodes"
    }
  }

  node_groups = {
    primary = {
      name                          = "${local.resource_prefix}-nodes"
      instance_type                 = var.eks_worker_instance_type
      asg_min_size                  = var.eks_asg_min_size
      asg_max_size                  = var.eks_asg_max_size
      asg_desired_capacity          = var.eks_asg_desired_size

      tags = {
        "Name" = "${local.cluster_name}-nodes"
      }
    }
  }
}

data "aws_iam_role" "nodes" { # used for workaround until I discovered removing IAM path fixed.
  name = module.eks.worker_iam_role_name
}

# EC2.tf =======================================================================
// allow access to worker nodes.
resource "aws_security_group_rule" "private_ingress" {
  type              = "ingress"
  from_port         = -1
  to_port           = -1
  protocol          = "ALL"
  cidr_blocks       = var.endpoint_ip_cidrs
  security_group_id = module.eks.cluster_primary_security_group_id
}

resource "aws_key_pair" "main" {
  key_name = "${local.cluster_name}"
  public_key = var.node_ssh_public_key
}



# vpc.tf =======================================================================
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = local.resource_prefix
  }
}

resource "aws_subnet" "list" {
  for_each = local.subnet_data

  availability_zone = each.key
  cidr_block        = each.value
  vpc_id            = aws_vpc.main.id

  map_public_ip_on_launch = true

  tags = {
    Name = "${local.resource_prefix}-${each.key}"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${local.resource_prefix}-gw"
  }
}

resource "aws_route_table" "main" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${local.resource_prefix}-rt"
  }
}

resource "aws_route_table_association" "eks" {
  for_each = local.subnet_data

  subnet_id = aws_subnet.list[each.key].id
  route_table_id = aws_route_table.main.id
}

# locals.tf ====================================================================
locals {
  base_name = var.cluster_name_prefix
  cluster_name = "${local.base_name}-cluster"
  resource_prefix = "${local.base_name}-eks"

  subnet_data = {
    for index, zone in var.availability_zones: zone => "10.0.${0 + index}.0/24"
  }
}

# variables.tf =================================================================
# EC2
variable "region" {
  description = "The AWS region to create the cluster in."
  type        = string
}

variable "availability_zones" {
  description = "EC2 availability zones where K8S worker nodes will be launched."
  type        = list(string)
}

# SECURITY/PERMISSIONS
variable "node_ssh_public_key" {
  type = string
  description = "public key for ssh access into K8S nodes."
}

variable "endpoint_ip_cidrs" {
  type = list(string)
  description = "list of allow IP CIDRs for full network access, including EKS API endpoints and SSH for nodes."
}

variable "public_ip_cidrs" {
  type = list(string)
  description = "IP CIDRs allowed to access public endpoints and ports for the cluster."
}

# EKS NODES CONFIG
variable "eks_worker_instance_type" {
  type = string
}

# auto scaling group settings
variable "eks_asg_min_size" {
  type = number
}
variable "eks_asg_max_size" {
  type = number
}
variable "eks_asg_desired_size" {
  type = number
}

# providers.tf =================================================================
provider "aws" {
  profile = var.aws_profile
  region = var.aws_region
  allowed_account_ids = []
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args = [
      "eks",
      "get-token",
      "--cluster-name",
      module.eks.cluster_id,
      "--profile",
      var.aws_profile,
      "--region",
      var.aws_region,
    ]
  }
}

# versions.tf ==================================================================
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = ">= 3.0, < 4.0"
    }
  }

  # TERRAFORM REQUIRED VERSION, use https://github.com/tfutils/tfenv to manage installations
  required_version = "~> 1.0"
}

Expected behavior

I expected the correct IAM role -> cluster role mapping to be set up so that the control plane and the node group can communicate.

Actual behavior

The nodes enter a degraded state unless I do either of the following:

  • Remove the iam_path input to the module, or
  • Explicitly specify the IAM role -> cluster role mapping in the map_roles input (see the sketch below).
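
A minimal sketch of the second workaround, using the data.aws_iam_role.nodes lookup from the reproduction code (it mirrors the entry that is commented out in the snippet above):

map_roles = [
  {
    # Role ARN resolved via the data source, so the cluster and worker role must already exist.
    rolearn  = data.aws_iam_role.nodes.arn
    username = "system:node:{{EC2PrivateDNSName}}"
    groups   = ["system:bootstrappers", "system:nodes"]
  },
]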

Terminal Output Screenshot(s)

(Screenshot: Screen Shot 2021-09-21 at 4.44.46 PM)

@daroga0002 (Contributor)

It looks like this comes from

# Work around https://github.com/kubernetes-sigs/aws-iam-authenticator/issues/153
# Strip the leading slash off so that Terraform doesn't think it's a regex
rolearn = replace(role["worker_role_arn"], replace(var.iam_path, "/^//", ""), "")

and there is a comment there referring to kubernetes-sigs/aws-iam-authenticator#153.

I don't know whether this is still relevant or not. We will need to investigate this more deeply.
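
For context, a minimal sketch of what that expression does, with a hypothetical iam_path and worker role ARN:

locals {
  # Hypothetical inputs, for illustration only.
  iam_path        = "/test-cluster/eks/"
  worker_role_arn = "arn:aws:iam::111111111111:role/test-cluster/eks/test-cluster-node-role"

  # The inner replace() drops the leading slash ("/^//" is a regex) so the outer
  # replace() sees a plain substring, "test-cluster/eks/", and removes it from the
  # ARN, yielding "arn:aws:iam::111111111111:role/test-cluster-node-role" - the
  # path-less ARN that ends up in aws-auth.
  stripped_rolearn = replace(
    local.worker_role_arn,
    replace(local.iam_path, "/^//", ""),
    ""
  )
}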

@antonbabenko (Member)

@daroga0002 Please use labels to reflect the status; someone else from the community may be able to help if they see it, e.g. needs triage or help wanted.

@joanayma commented Oct 1, 2021

@leanrobot can you test whether #1524 also fixes your issue? I think it's the same problem.

@leanrobot (Author)

@joanayma Hi Joan, I read through #1524, but it was unclear to me how I should test it myself to see whether it addresses my issue.
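
One common way to try an unmerged module change (a sketch, with a placeholder git ref) is to point the module source at the PR branch instead of the registry release and re-run a plan:

module "eks" {
  # Git source instead of the registry release; <branch-or-sha> is a placeholder.
  # Note: the version argument must be removed when a git source is used.
  source = "git::https://github.com/terraform-aws-modules/terraform-aws-eks.git?ref=<branch-or-sha>"

  # ... same inputs as in the reproduction snippet, including iam_path ...
}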

@github-actions (bot)

This issue has been automatically marked as stale because it has been open for 30 days
with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.

@github-actions github-actions bot added the stale label Nov 17, 2021
@github-actions (bot)

This issue was automatically closed because it remained stale for 10 days.

@github-actions (bot)

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022