fix(eks): in place updates for EKS security group and Subnets #30114

Merged
merged 7 commits into from
May 10, 2024

Conversation

mrlikl
Contributor

@mrlikl mrlikl commented May 8, 2024

Issue # (if applicable)

Closes #28584

Reason for this change

Enable in-place updates of EKS clusters when subnet or security group (SG) values change, rather than replacing the cluster.

Description of changes

Removed the replaceVpc logic and introduced updateVpc to track subnet and security group changes. An error is now thrown when multiple update types are requested in one go.
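As a rough illustration of the change tracking described above, the sketch below compares old and new cluster properties and derives update flags. It is not the handler code from this PR: the `ClusterProps` and `UpdateMap` shapes, the `sameIds` helper, and the `analyzeUpdate` name are assumptions made for illustration, and only a subset of the flags seen in the logged `updates` object is modeled.

```typescript
// Hypothetical, simplified shapes; the real handler's types are not shown in this PR.
interface VpcConfig {
  subnetIds?: string[];
  securityGroupIds?: string[];
  endpointPublicAccess?: boolean;
  endpointPrivateAccess?: boolean;
}

interface ClusterProps {
  name?: string;
  roleArn?: string;
  version?: string;
  resourcesVpcConfig?: VpcConfig;
}

// Flag names mirror a subset of the "updates" object logged in test case 2.
interface UpdateMap {
  replaceName: boolean;
  updateVpc: boolean;
  updateAccess: boolean;
  replaceRole: boolean;
  updateVersion: boolean;
}

// Order-insensitive comparison of two ID lists (assumed helper).
function sameIds(a: string[] = [], b: string[] = []): boolean {
  if (a.length !== b.length) return false;
  const sortedA = a.slice().sort();
  const sortedB = b.slice().sort();
  return sortedA.every((id, i) => id === sortedB[i]);
}

// A subnet or security group change sets updateVpc instead of forcing a replacement.
function analyzeUpdate(oldProps: ClusterProps, newProps: ClusterProps): UpdateMap {
  const oldVpc = oldProps.resourcesVpcConfig ?? {};
  const newVpc = newProps.resourcesVpcConfig ?? {};
  return {
    replaceName: newProps.name !== oldProps.name,
    updateVpc:
      !sameIds(oldVpc.subnetIds, newVpc.subnetIds) ||
      !sameIds(oldVpc.securityGroupIds, newVpc.securityGroupIds),
    updateAccess:
      oldVpc.endpointPublicAccess !== newVpc.endpointPublicAccess ||
      oldVpc.endpointPrivateAccess !== newVpc.endpointPrivateAccess,
    replaceRole: newProps.roleArn !== oldProps.roleArn,
    updateVersion: newProps.version !== oldProps.version,
  };
}
```

Comparing sorted copies means that merely reordering the same subnet IDs does not count as a change in this sketch.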

Description of how you validated changes

Tested the changes by first deploying a cluster with the config below:

const vpc = ec2.Vpc.fromLookup(stack, 'Vpc', { isDefault: true });
new eks.Cluster(stack, 'Cluster', {
  vpc,
  ...getClusterVersionConfig(stack, eks.KubernetesVersion.V1_24),
  defaultCapacity: 0,
});

Test case 1: update both subnets and endpoint access at the same time

new eks.Cluster(stack, 'Cluster', {
  vpc,
  ...getClusterVersionConfig(stack, eks.KubernetesVersion.V1_29),
  defaultCapacity: 0,
  tags: {
    foo: 'bar',
  },
  endpointAccess: eks.EndpointAccess.PUBLIC,
  vpcSubnets: [{ subnetType: ec2.SubnetType.PUBLIC }],
});

The Cluster custom resource throws the error below:

{
    "errorType": "Error",
    "errorMessage": "Only one type of update - VpcConfigUpdate, LoggingUpdate or EndpointAccessUpdate can be allowed",
    "stack": [
        "Error: Only one type of update - VpcConfigUpdate, LoggingUpdate or EndpointAccessUpdate can be allowed",
        "    at Pi.onUpdate (/var/task/index.js:55:651127)",
        "    at Pi.onEvent (/var/task/index.js:55:647590)",
        "    at Runtime.yR [as handler] (/var/task/index.js:55:657995)",
        "    at Runtime.handleOnceNonStreaming (file:///var/runtime/index.mjs:1173:29)"
    ]
}
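The error above suggests a guard that rejects requests combining more than one of these update types. A minimal sketch of such a guard follows; the `UpdateFlags` shape and the `assertSingleConfigUpdate` name are hypothetical, and only the error message is taken from the output above.

```typescript
// Hypothetical subset of the update flags computed by the handler.
interface UpdateFlags {
  updateVpc: boolean;
  updateAccess: boolean;
  updateLogging: boolean;
}

// Throws when more than one of the mutually exclusive update types is requested.
function assertSingleConfigUpdate(updates: UpdateFlags): void {
  const requested = [updates.updateVpc, updates.updateAccess, updates.updateLogging]
    .filter(Boolean).length;
  if (requested > 1) {
    throw new Error(
      'Only one type of update - VpcConfigUpdate, LoggingUpdate or EndpointAccessUpdate can be allowed',
    );
  }
}
```

In test case 1, both the subnets and the endpoint access changed, so a guard like this would fail the update instead of applying only part of it.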

Test case 2: update subnets to public

new eks.Cluster(stack, 'Cluster', {
  vpc,
  ...getClusterVersionConfig(stack, eks.KubernetesVersion.V1_29),
  defaultCapacity: 0,
  vpcSubnets: [{ subnetType: ec2.SubnetType.PUBLIC }],
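});

The Cluster custom resource handler then logs the computed updates, followed by the resulting UpdateClusterConfigCommand call:

{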
    "updates": {
        "replaceName": false,
        "updateVpc": true,
        "updateAccess": false,
        "replaceRole": false,
        "updateVersion": false,
        "updateEncryption": false,
        "updateLogging": false
    }
}
{
  clientName: 'EKSClient',
  commandName: 'UpdateClusterConfigCommand',
  input: {
    name: 'Cluster9EE0221C-0b6f58b0698348aea43866b93a62b2c9',
    resourcesVpcConfig: { subnetIds: [Array], securityGroupIds: [Array] }
  },
  output: {
    update: {
      createdAt: 2024-05-08T20:55:00.013Z,
      errors: [],
      id: '7d5cd243-5536-3f52-b5ca-4c6e6c044529',
      params: [Array],
      status: 'InProgress',
      type: 'VpcConfigUpdate'
    }
  },
  metadata: {}
}
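The logged input carries only `name` and a `resourcesVpcConfig` with `subnetIds` and `securityGroupIds`. A sketch of how such an input might be assembled is shown below; `buildVpcUpdateInput` and `VpcUpdateInput` are hypothetical names, and passing the result to `UpdateClusterConfigCommand` from `@aws-sdk/client-eks` is assumed rather than shown.

```typescript
// Hypothetical shape matching the fields visible in the logged command input.
interface VpcUpdateInput {
  name: string;
  resourcesVpcConfig: {
    subnetIds: string[];
    securityGroupIds: string[];
  };
}

// Builds an input limited to subnets and security groups. Endpoint access
// fields are left out on purpose so the request stays a pure VPC config update.
function buildVpcUpdateInput(
  clusterName: string,
  subnetIds: string[] = [],
  securityGroupIds: string[] = [],
): VpcUpdateInput {
  return {
    name: clusterName,
    resourcesVpcConfig: {
      subnetIds: subnetIds.slice(),
      securityGroupIds: securityGroupIds.slice(),
    },
  };
}
```

Keeping the endpoint access settings out of the request matches the logged update type of VpcConfigUpdate.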

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation aws-cdk-automation requested a review from a team May 8, 2024 21:50
@github-actions github-actions bot added the beginning-contributor, bug, effort/medium, and p1 labels May 8, 2024
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@mrlikl mrlikl changed the title from "fix(eks): In place updates for EKS security group and Subnets" to "fix(eks): in place updates for EKS security group and Subnets" May 8, 2024
@aws-cdk-automation aws-cdk-automation dismissed their stale review May 8, 2024 22:49

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@pahud
Contributor

pahud commented May 8, 2024

Hey, this PR makes sense to me. Thanks for your ownership.

I guess we probably need to add some unit tests at packages/@aws-cdk/custom-resource-handlers/test/aws-eks/cluster-resource-provider.test.ts though. What do you think?

@mrlikl
Contributor Author

mrlikl commented May 8, 2024

@pahud yes, I am checking the same

@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review label May 8, 2024
Contributor

@GavinZZ GavinZZ left a comment

Thanks for the PR. I have a question regarding this change. I notice that you removed replaceVpc and replaced it with updateVpc when the subnet IDs or security group IDs change. This looks valid to me.

However, I'm curious: could there be a case where the actual VPC is replaced, so that we would still want to keep replaceVpc?


const app = new App();

const stack = new EksClusterStack(app, 'aws-cdk-eks-cluster-stack');
Contributor

@GavinZZ GavinZZ May 9, 2024

How does this test cover subnet updates? It only gets deployed once, right? I want to make sure you've tested the update flow manually.

Contributor Author

Yes, the flow was tested manually and then converted into this integ test.

@mrlikl
Contributor Author

mrlikl commented May 10, 2024

Thanks for the PR. I have a question regarding this change. I notice that you removed replaceVpc and replaced it with updateVpc when the subnet IDs or security group IDs change. This looks valid to me.

However, I'm curious: could there be a case where the actual VPC is replaced, so that we would still want to keep replaceVpc?

@GavinZZ At the moment, EKS does not support moving a cluster's subnets to a different VPC after creation. In CFN, specifying subnets from a different VPC does not trigger a replacement-type update for the resource. Instead, this error is thrown:

"Subnets specified must belong to the VPC: vpc-xxxxx (Service: Eks, Status Code: 400, Request ID: 47f6ddc6-e957-471b-b97f-58c837054879)" (RequestToken: 31954395-3f14-3635-1c33-69987d6e21c4, HandlerErrorCode: InvalidRequest)

The same applies to a security group that belongs to a different VPC:

"Security group(s) [sg-xxxx] are not in the same VPC as the subnets. Please specify a security group that is associated with the VPC: vpc-xxxxx (Service: Eks, Status Code: 400, Request ID: 48021c3d-b2af-4375-b7ad-6ca6d4341b78)" (RequestToken: 4f63b78c-a19a-1271-2c1b-2d4303c0e126, HandlerErrorCode: InvalidRequest)

So, to match the CFN behavior of not replacing the cluster, we can drop replaceVpc here.

@mrlikl mrlikl requested a review from GavinZZ May 10, 2024 17:09
Contributor

@GavinZZ GavinZZ left a comment

Thanks for the explanation! LGTM

Contributor

mergify bot commented May 10, 2024

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review label May 10, 2024
@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 2ee56ab
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit eb39d9e into aws:main May 10, 2024
12 checks passed
@mrlikl mrlikl deleted the eks-vpc branch May 10, 2024 18:26
@aws-cdk-automation
Collaborator

Comments on closed issues and PRs are hard for our team to see. If you need help, please open a new issue that references this one.

@aws aws locked as resolved and limited conversation to collaborators Jul 25, 2024
Labels
beginning-contributor, bug, effort/medium, p1
Development

Successfully merging this pull request may close these issues.

eks: Changing securityGroup deletes entire cluster
4 participants