
Error deleting security group: DependencyViolation #2048

Closed

kaykhancheckpoint opened this issue Apr 29, 2022 · 5 comments

kaykhancheckpoint commented Apr 29, 2022

Description

I've been unable to cleanly destroy the Kubernetes cluster created by this module. Having created and destroyed the cluster multiple times, I've always hit the same issue: three security groups are not deleted because of a dependency violation.

  • Module version [Required]: latest

  • Terraform version:

Terraform v1.1.9

  • Provider version(s):

+ provider registry.terraform.io/hashicorp/aws v4.12.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.11.0
+ provider registry.terraform.io/hashicorp/tls v3.3.0

Reproduction Code [Required]

locals {
  workspace       = terraform.workspace
  name            = "${var.org_name}-${terraform.workspace}-k8s"
  cluster_version = "1.22"
  region          = var.region

  tags = {
    Terraform   = "true"
    Environment = local.workspace
    ClusterName = local.name
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--profile", "terraformtest"]
  }
}

module "eks" {
  source                          = "terraform-aws-modules/eks/aws"
  cluster_name                    = local.name
  cluster_version                 = local.cluster_version
  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true
  cluster_enabled_log_types       = ["audit", "api", "authenticator", "controllerManager", "scheduler"]
  cluster_ip_family               = "ipv4"

  cluster_addons = {
    coredns = {
      resolve_conflicts = "OVERWRITE"
    }
    kube-proxy = {}
    vpc-cni = {
      resolve_conflicts        = "OVERWRITE"
      service_account_role_arn = module.vpc_cni_irsa.iam_role_arn
    }
    aws-ebs-csi-driver = {
      resolve_conflicts = "OVERWRITE"
    }
  }

  vpc_id                    = data.terraform_remote_state.networking.outputs.vpc_id
  subnet_ids                = concat(data.terraform_remote_state.networking.outputs.public_subnets, data.terraform_remote_state.networking.outputs.private_subnets)
  manage_aws_auth_configmap = true

  # Extend cluster security group rules
  cluster_security_group_additional_rules = {
    egress_nodes_ephemeral_ports_tcp = {
      description                = "To node 1025-65535"
      protocol                   = "tcp"
      from_port                  = 1025
      to_port                    = 65535
      type                       = "egress"
      source_node_security_group = true
    }
  }
  cluster_additional_security_group_ids = []

  # Extend node-to-node security group rules
  node_security_group_additional_rules = {
    ingress_self_all = {
      description = "Node to node all ports/protocols"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "ingress"
      self        = true
    }
    egress_all = {
      description = "Node all egress"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "egress"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }

  node_security_group_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

  eks_managed_node_group_defaults = {
    iam_role_attach_cni_policy            = true
    attach_cluster_primary_security_group = true

    key_name               = data.terraform_remote_state.networking.outputs.geeiq_key_pair_key_name
    vpc_security_group_ids = [data.terraform_remote_state.networking.outputs.remote_access_sg_id]

    block_device_mappings = {
      xvda = {
        device_name = "/dev/xvda"
        ebs = {
          volume_size           = 250
          volume_type           = "gp3"
          delete_on_termination = true
        }
      }
    }
  }

  eks_managed_node_groups = {
    worker_group = {
      ami_type       = "AL2_x86_64"
      name           = "worker-group"
      min_size       = 1
      max_size       = 10
      desired_size   = 1
      instance_types = ["m5.large"]
      capacity_type  = "ON_DEMAND"
      ebs_optimized  = true
      labels = {
        "geeiq/node-type" = "worker"
        Environment       = local.workspace
      }

      tags = local.tags
    },
    ops_group = {
      ami_type       = "AL2_x86_64"
      name           = "ops-group"
      min_size       = 1
      max_size       = 10
      desired_size   = 1
      instance_types = ["m5.large"]
      capacity_type  = "ON_DEMAND"
      ebs_optimized  = true
      labels = {
        "geeiq/node-type" = "ops"
        Environment       = local.workspace
      }

      tags = local.tags
    },
    cron_group = {
      ami_type       = "AL2_x86_64"
      name           = "cron-group"
      min_size       = 1
      max_size       = 10
      desired_size   = 1
      instance_types = ["m5.large"]
      capacity_type  = "ON_DEMAND"
      ebs_optimized  = true
      labels = {
        "geeiq/node-type" = "cron"
        Environment       = local.workspace
      }

      tags = local.tags
    },
  }

  aws_auth_roles = []

  aws_auth_users = []

  aws_auth_accounts = ["<redacted>"]
}

module "vpc_cni_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 4.12"

  role_name_prefix      = "VPC-CNI-IRSA"
  attach_vpc_cni_policy = true
  vpc_cni_enable_ipv4   = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:aws-node"]
    }
  }

  tags = local.tags
}

Steps to reproduce the behavior:

terraform workspace new test
terraform apply
terraform destroy

Expected behavior

I expect the cluster and all related resources created by this module to be destroyed.

Actual behavior

The terraform destroy eventually fails with the following errors:

│ Error: Error deleting security group: DependencyViolation: resource sg-065ad5c1c3e5581d0 has a dependent object
│   status code: 400, request id: 269fdc4c-cf59-4f0e-8eb8-6a8c2b4a5bce
│
│
╵
╷
│ Error: Error deleting security group: DependencyViolation: resource sg-0e3585c15decb2ae4 has a dependent object
│   status code: 400, request id: e0162a95-d896-48da-bc68-17726a5faf85
│
│
╵
╷
│ Error: Error deleting security group: DependencyViolation: resource sg-07cf0d71470046ce7 has a dependent object
│   status code: 400, request id: 33eed905-3ac1-4b55-8b84-3064da4db630

Terminal Output Screenshot(s)

[screenshot: terminal output of the failed terraform destroy]

Additional context

[screenshots: the three remaining security groups ("sgs") in the AWS console]

If I attempt to delete one of the security groups manually in the AWS console, I get an error telling me there is a network interface still associated with it (see the screenshots below, and the lookup sketch after them).

[screenshots: AWS console error showing the security group is still associated with a network interface]
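For reference, this is how I check which network interfaces are still holding a security group. A minimal sketch using the AWS CLI; sg-0e3585c15decb2ae4 is one of the three groups from the errors above:

# List any ENIs that still reference the stuck security group.
aws ec2 describe-network-interfaces \
  --filters Name=group-id,Values=sg-0e3585c15decb2ae4 \
  --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Status:Status,Description:Description}' \
  --output table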

If I attempt to run terraform destroy again, you can see it is again trying to remove the same three security groups that failed, and it keeps failing with the same problem:

terraform destroy -var-file="test.tfvars"
module.eks.module.eks_managed_node_group["cron_group"].aws_security_group.this[0]: Refreshing state... [id=sg-065ad5c1c3e5581d0]
module.eks.module.eks_managed_node_group["worker_group"].aws_security_group.this[0]: Refreshing state... [id=sg-07cf0d71470046ce7]
module.eks.aws_security_group.node[0]: Refreshing state... [id=sg-0e3585c15decb2ae4]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  - destroy

Terraform will perform the following actions:

  # module.eks.aws_security_group.node[0] will be destroyed
  - resource "aws_security_group" "node" {
      - arn                    = "arn:aws:ec2:eu-west-2:<redacted>:security-group/sg-0e3585c15decb2ae4" -> null
      - description            = "EKS node shared security group" -> null
      - egress                 = [] -> null
      - id                     = "sg-0e3585c15decb2ae4" -> null
      - ingress                = [] -> null
      - name                   = "geeiq-test-k8s-node-20220429141745238200000007" -> null
      - name_prefix            = "geeiq-test-k8s-node-" -> null
      - owner_id               = "<redacted>" -> null
      - revoke_rules_on_delete = false -> null
      - tags                   = {
          - "Name" = "geeiq-test-k8s-node"
        } -> null
      - tags_all               = {
          - "Name" = "geeiq-test-k8s-node"
        } -> null
      - vpc_id                 = "vpc-0b713bee7978b870b" -> null
    }

  # module.eks.module.eks_managed_node_group["cron_group"].aws_security_group.this[0] will be destroyed
  - resource "aws_security_group" "this" {
      - arn                    = "arn:aws:ec2:eu-west-2:<redacted>:security-group/sg-065ad5c1c3e5581d0" -> null
      - description            = "EKS managed node group security group" -> null
      - egress                 = [] -> null
      - id                     = "sg-065ad5c1c3e5581d0" -> null
      - ingress                = [] -> null
      - name                   = "cron-group-eks-node-group-20220429141745238100000006" -> null
      - name_prefix            = "cron-group-eks-node-group-" -> null
      - owner_id               = "<redacted>" -> null
      - revoke_rules_on_delete = false -> null
      - tags                   = {
          - "ClusterName" = "geeiq-test-k8s"
          - "Environment" = "test"
          - "Name"        = "cron-group-eks-node-group"
          - "Terraform"   = "true"
        } -> null
      - tags_all               = {
          - "ClusterName" = "geeiq-test-k8s"
          - "Environment" = "test"
          - "Name"        = "cron-group-eks-node-group"
          - "Terraform"   = "true"
        } -> null
      - vpc_id                 = "vpc-0b713bee7978b870b" -> null
    }

  # module.eks.module.eks_managed_node_group["worker_group"].aws_security_group.this[0] will be destroyed
  - resource "aws_security_group" "this" {
      - arn                    = "arn:aws:ec2:eu-west-2:<redacted>:security-group/sg-07cf0d71470046ce7" -> null
      - description            = "EKS managed node group security group" -> null
      - egress                 = [] -> null
      - id                     = "sg-07cf0d71470046ce7" -> null
      - ingress                = [] -> null
      - name                   = "worker-group-eks-node-group-20220429141745237900000004" -> null
      - name_prefix            = "worker-group-eks-node-group-" -> null
      - owner_id               = "<redacted>" -> null
      - revoke_rules_on_delete = false -> null
      - tags                   = {
          - "ClusterName" = "geeiq-test-k8s"
          - "Environment" = "test"
          - "Name"        = "worker-group-eks-node-group"
          - "Terraform"   = "true"
        } -> null
      - tags_all               = {
          - "ClusterName" = "geeiq-test-k8s"
          - "Environment" = "test"
          - "Name"        = "worker-group-eks-node-group"
          - "Terraform"   = "true"
        } -> null
      - vpc_id                 = "vpc-0b713bee7978b870b" -> null
    }

Plan: 0 to add, 0 to change, 3 to destroy.

Changes to Outputs:
  - cloudwatch_log_group_name                        = "/aws/eks/geeiq-test-k8s/cluster" -> null
  - cluster_identity_providers                       = {} -> null
  - fargate_profiles                                 = {} -> null
  - node_security_group_arn                          = "arn:aws:ec2:eu-west-2:<redacted>:security-group/sg-0e3585c15decb2ae4" -> null
  - node_security_group_id                           = "sg-0e3585c15decb2ae4" -> null
  - self_managed_node_groups                         = {} -> null
  - self_managed_node_groups_autoscaling_group_names = [] -> null

Do you really want to destroy all resources in workspace "test"?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

Of course, I've been able to clean this up by deleting the network interface manually (a sketch of that below). However, I don't understand why Terraform isn't able to do that; this cluster is always created in a clean environment, in a new region, with a clean workspace. Is there something unusual about my config in particular that might cause this?
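For completeness, the manual cleanup amounts to deleting the leaked interface. A sketch only; eni-0123456789abcdef0 is a hypothetical ID standing in for whatever the lookup above returns:

# Delete the leaked ENI; the interface must be detached
# (status "available") before this succeeds.
aws ec2 delete-network-interface --network-interface-id eni-0123456789abcdef0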

@bryantbiggs (Member)

You are most likely facing aws/amazon-vpc-cni-k8s#1223

@kaykhancheckpoint (Author)

Okay, so it looks like we have to wait until they release 1.11.1 for this to be resolved?

@bryantbiggs (Member)

I don't know when it will be fixed, but it is a known issue. Your best bet is to ensure all cluster workloads are removed first, EXCEPT for the VPC CNI, before tearing down the cluster (see the sketch below).
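A rough sketch of that teardown order; this is not module behavior, and my-app-namespace is a hypothetical namespace name:

# Remove application workloads first so their pods (and the ENIs backing them)
# go away while kube-system, and with it the VPC CNI (aws-node), stays running
# to release the ENIs as the pods disappear.
kubectl delete namespace my-app-namespace

# Once the workload ENIs are gone, tear down the cluster as usual.
terraform destroy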

@bryantbiggs (Member)

Closing for now since this is not something the module can solve.

@github-actions (bot)

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2022