
[aws-eks] EKS - 1.18.0 - Configuration changes to Cluster which require replacement create cluster with random name causing CF Stack to be inconsistent #5259

Closed
febus982 opened this issue Nov 29, 2019 · 1 comment · Fixed by #5540
Labels
@aws-cdk/aws-eks · bug · language/python · p1

Comments


febus982 commented Nov 29, 2019

Replacing a cluster that has an explicitly specified name causes the new cluster to be created with a random name. Trying to delete the stack afterwards fails with a ResourceNotFoundException.

Reproduction Steps

before:

        Cluster(
            self,
            cluster_name,
            cluster_name="dev-EKS-Cluster",
            vpc=vpc,
            version=kubernetes_version,
            default_capacity=0,
            masters_role=cluster_admin_role,
            vpc_subnets=[SubnetSelection(subnet_type=SubnetType.PRIVATE)],
        )

after:

        Cluster(
            self,
            cluster_name,
            cluster_name="dev-EKS-Cluster",
            vpc=vpc,
            version=kubernetes_version,
            default_capacity=0,
            masters_role=cluster_admin_role,
            vpc_subnets=[SubnetSelection(subnet_type=SubnetType.PRIVATE), SubnetSelection(subnet_type=SubnetType.PUBLIC)],
        )

Error Log

The update applies successfully, but the replacement cluster is created with a name of the form cluster-RandomAlphaNumericString instead of dev-EKS-Cluster.

Trying to delete the stack causes this:

  32 | 3:06:09 PM | DELETE_FAILED        | Custom::AWSCDK-EKS-Cluster            | dev-EKS-Cluster/Resource/Resource/Default (devEKSCluster5CC73604) Failed to delete resource. An error occurred (ResourceNotFoundException) when calling the DeleteCluster operation: No cluster found for name: dev-EKS-Cluster.
	at new CustomResource (/tmp/jsii-kernel-MirCRQ/node_modules/@aws-cdk/aws-cloudformation/lib/custom-resource.js:56:25)
	at new ClusterResource (/tmp/jsii-kernel-MirCRQ/node_modules/@aws-cdk/aws-eks/lib/cluster-resource.js:46:26)
	at new Cluster (/tmp/jsii-kernel-MirCRQ/node_modules/@aws-cdk/aws-eks/lib/cluster.js:73:24)
	at /home/circleci/.local/share/virtualenvs/cdk_app--h4ksco0/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7749:49
	at Kernel._wrapSandboxCode (/home/circleci/.local/share/virtualenvs/cdk_app--h4ksco0/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:8202:20)
	at Kernel._create (/home/circleci/.local/share/virtualenvs/cdk_app--h4ksco0/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7749:26)
	at Kernel.create (/home/circleci/.local/share/virtualenvs/cdk_app--h4ksco0/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7503:21)
	at KernelHost.processRequest (/home/circleci/.local/share/virtualenvs/cdk_app--h4ksco0/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7293:28)
	at KernelHost.run (/home/circleci/.local/share/virtualenvs/cdk_app--h4ksco0/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7233:14)
	at Immediate._onImmediate (/home/circleci/.local/share/virtualenvs/cdk_app--h4ksco0/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7236:37)

Environment

  • CLI Version: 1.18.0
  • Framework Version: 1.18.0
  • OS: Linux
  • Language: Python

Other


This is a 🐛 Bug Report

febus982 added the bug and needs-triage labels Nov 29, 2019
SomayaB added the @aws-cdk/aws-eks and language/python labels Nov 29, 2019
eladb (Contributor) commented Nov 30, 2019

This is indeed a bug. Updates that require replacement should not be allowed for CloudFormation resources that have explicit physical names. This is because it is impossible to create a new resource with the same name before deleting the old resource (which is how CloudFormation implements replacements).

If you want to use the same explicit physical name for the new cluster, you will have to first rename the old cluster and then create a new cluster with the updated configuration.
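For comparison, a minimal sketch of the safer pattern (the construct id "dev-eks" below is illustrative): omit cluster_name so that CloudFormation generates a unique physical name, which lets it create the replacement cluster before deleting the old one.

        # Sketch: no explicit cluster_name, so CloudFormation generates a
        # unique physical name and create-before-delete replacement works.
        Cluster(
            self,
            "dev-eks",  # construct id only; the physical name is generated
            vpc=vpc,
            version=kubernetes_version,
            default_capacity=0,
            masters_role=cluster_admin_role,
            vpc_subnets=[SubnetSelection(subnet_type=SubnetType.PRIVATE)],
        )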

SomayaB removed the needs-triage label Dec 2, 2019
eladb added the p1 label Dec 11, 2019
eladb pushed a commit that referenced this issue Dec 30, 2019
There were two causes of timeouts during EKS cluster creation: a creation time longer than the AWS Lambda timeout (15 minutes), and a lack of retries when applying kubectl manifests after the cluster has been created.

The change fixes the first issue by leveraging the custom resource provider framework to implement the cluster resource as an async resource. The custom resource providers are now bundled as nested stacks so they don't take up too many resources from users, and they are reused by multiple clusters within the same stack. This required that the creation role not be the same as the Lambda role, so we define this role separately and assume it within the providers.

The second issue is fixed by adding 3 retries to "kubectl apply".
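(Illustration only, not the provider's actual handler code: a sketch of what 3 retries around "kubectl apply" amounts to; the helper name and parameters are hypothetical.)

        import subprocess
        import time

        def apply_with_retries(manifest_path: str, retries: int = 3, delay: int = 5) -> None:
            # Run "kubectl apply" up to `retries` times, sleeping between
            # attempts; re-raise the last failure if every attempt fails.
            for attempt in range(1, retries + 1):
                try:
                    subprocess.run(["kubectl", "apply", "-f", manifest_path], check=True)
                    return
                except subprocess.CalledProcessError:
                    if attempt == retries:
                        raise
                    time.sleep(delay)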

**Backwards compatibility**: as described in #5544, since the resource provider handler of `Cluster` and `KubernetesResource` has been changed, this change requires a replacement of existing clusters (deployment fails with "service token cannot be changed" error). Since this can be disruptive to users, this change includes an exact copy of the previous version under a new module called `@aws-cdk/aws-eks-legacy`, which can be used as a drop-in replacement until users decide to upgrade to the new version. Using the legacy cluster will emit a synthesis warning that this module will no longer be released as part of the CDK starting March 1st, 2020.
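(In Python, assuming the standard jsii package naming for @aws-cdk/aws-eks-legacy, the drop-in switch should be a one-line import change; the rest of the cluster definition stays the same.)

        # Drop-in replacement: keep the previous provider behavior.
        from aws_cdk.aws_eks_legacy import Cluster  # instead of aws_cdk.aws_eks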

- Fixes #4087
- Fixes #4695
- Fixes #5259
- Fixes #5501

---

BREAKING CHANGE: (in experimental module) the providers behind the AWS EKS module have been rewritten to address multiple stability issues. Since this change requires cluster replacement, the old version of this module is available under `@aws-cdk/aws-eks-legacy`. Please read #5544 carefully for upgrade instructions.
mergify bot closed this as completed in #5540 Dec 30, 2019
mergify bot added a commit that referenced this issue Dec 30, 2019
(Same commit message as above.)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
iliapolo changed the title from "EKS - 1.18.0 - Configuration changes to Cluster which require replacement create cluster with random name causing CF Stack to be inconsistent" to "[aws-eks] EKS - 1.18.0 - Configuration changes to Cluster which require replacement create cluster with random name causing CF Stack to be inconsistent" Aug 16, 2020