
[Bug] Update Bottlerocket nodes to latest AMI 1.29 #7627

Closed
BogdanRS opened this issue Feb 29, 2024 · 18 comments · Fixed by #7666

@BogdanRS

BogdanRS commented Feb 29, 2024

I am trying to upgrade my managed nodegroups (eksctl version v0.172) with:

eksctl upgrade nodegroup --name=mo-2vcpu-16gb-spot-v2 --kubernetes-version 1.29 --cluster=my-cluster

I expect the nodes to be upgraded to the latest AMI for Kubernetes version 1.29, but they get reverted to 1.27.

eksctl get nodegroup --cluster=my-cluster --region eu-west-1 -o yaml 

- AutoScalingGroupName: eks-mo-2vcpu-16gb-spot-v2-9ac30932-e47d-4be4-aaab-052df2e57413
  Cluster: my-cluster
  CreationTime: "2023-02-02T13:51:48.277Z"
  DesiredCapacity: 2
  ImageID: BOTTLEROCKET_x86_64
  InstanceType: r5n.large,r5b.large,r5a.large
  MaxSize: 5
  MinSize: 2
  Name: mo-2vcpu-16gb-spot-v2
  NodeInstanceRoleARN: <>
  StackName: <>
  Status: ACTIVE
  Type: managed
  Version: "1.27"

I also experienced this when I upgraded the nodes to version 1.28; I thought it was a bug that would eventually get fixed, so I upgraded manually from the AWS EKS console.

Contributor

Hello BogdanRS 👋 Thank you for opening an issue in the eksctl project. The team will review the issue and aim to respond within 1-5 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website.

@BogdanRS
Author

BogdanRS commented Mar 8, 2024

Any updates on this?

@yuxiang-zhang
Member

Hey @BogdanRS, could you please share the cluster config you are using?

@BogdanRS
Author

BogdanRS commented Mar 14, 2024

If this helps, sure, but I don't see why this is relevant; I am not using the cluster config to upgrade my clusters.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
    name: my-cluster
    region: eu-west-1
    version: "1.29"
vpc:
    id: "vpc-xxxxx"
    subnets:
      public:
        eu-west-1a:
            id: "subnet-xxxx"
        eu-west-1b:
            id: "subnet-xxxx"
      private:
        eu-west-1a:
            id: "subnet-xxxx"
        eu-west-1b:
            id: "subnet-xxxx"
    clusterEndpoints:
      publicAccess: true
      privateAccess: true
    publicAccessCIDRs: ["x.x.x.x/32"]
iam:
  withOIDC: true  #enables the IAM OIDC provider as well as IRSA for the Amazon CNI plugin
managedNodeGroups:
    - name: gp-2vcpu-8gb-ondemand-v2
      amiFamily: Bottlerocket
      minSize: 2
      maxSize: 3
      desiredCapacity: 2
      volumeSize: 50
      volumeType: gp3
      volumeEncrypted: true
      ssh:
        allow: false
      instanceTypes: ["m6a.large", "m6i.large", "m5a.large"]
      labels:
        lifecycle: OnDemand
      privateNetworking: true
    - name: mo-2vcpu-16gb-spot-v2
      amiFamily: Bottlerocket
      minSize: 2
      maxSize: 5
      desiredCapacity: 2
      volumeSize: 50
      volumeType: gp3
      volumeEncrypted: true
      ssh:
        allow: false
      instanceTypes: ["r5n.large", "r5b.large", "r5a.large"]
      spot: true
      labels:
        lifecycle: Ec2Spot
      privateNetworking: true
secretsEncryption:
  keyARN: arn:aws:kms:eu-west-1:xxxxx
cloudWatch:
      clusterLogging:
        enableTypes: ["*"]

@yuxiang-zhang
Member

yuxiang-zhang commented Mar 18, 2024

Indeed, it seems that upgrading Bottlerocket nodes doesn't currently work. I'm able to reproduce the issue, and it's not specific to a particular configuration.

To reproduce this issue:

  1. eksctl create cluster with Kubernetes 1.27 and a 1.27 Bottlerocket nodegroup
  2. eksctl upgrade cluster --version 1.28
  3. eksctl upgrade nodegroup --kubernetes-version 1.28 (command succeeds but nodegroup stays on 1.27)

Support for upgrading Bottlerocket nodes seems to have been added recently via #6766, but it is still unclear to me how Bottlerocket upgrades differ from AL2 node upgrades.

@yuxiang-zhang
Member

yuxiang-zhang commented Mar 18, 2024

The upgrade fails because the changeset only contains the following changes:

[
  {
    "type": "Resource",
    "resourceChange": {
      "action": "Modify",
      "logicalResourceId": "ManagedNodeGroup",
      "physicalResourceId": "bot/ng1",
      "resourceType": "AWS::EKS::Nodegroup",
      "replacement": "False",
      "scope": [
        "Properties"
      ],
      "details": [
        {
          "target": {
            "attribute": "Properties",
            "name": "ForceUpdateEnabled",
            "requiresRecreation": "Never"
          },
          "evaluation": "Static",
          "changeSource": "DirectModification"
        }
      ]
    }
  }
]

and when I removed the changes from #6923, the changeset fails again, although this time it includes the Version update among other updated fields.

#4423 seems relevant here.

From #4666, it seems the idea was (when we)

upgrade non-al2 nodegroups we update the Version field in the template to the correct kubernetes versions

and MakeManagedSSMParameterName has to return empty to let the Version field populate instead. However, #6923 changed that behaviour, allowing MakeManagedSSMParameterName to populate latestReleaseVersion:

latestReleaseVersion, err := m.getLatestReleaseVersion(ctx, kubernetesVersion, nodegroup)
if err != nil {
	return err
}
if latestReleaseVersion != "" {
	if err := m.updateReleaseVersion(latestReleaseVersion, options.LaunchTemplateVersion, nodegroup, ngResource); err != nil {
		return err
	}
} else {
	ngResource.Version = gfnt.NewString(kubernetesVersion)
}

As a result, Version never gets populated. Because latestReleaseVersion for Bottlerocket doesn't change with a higher Kubernetes version, Bottlerocket nodegroup upgrades do nothing right now.
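
For illustration only, here is a minimal sketch of how that branch could be narrowed so that non-AL2 families like Bottlerocket fall through to the Version update; this is not the actual change from the fix PR, and usesSSMReleaseVersion is a hypothetical helper:

// Only let the SSM-derived release version drive the upgrade for AMI families
// where that parameter tracks the Kubernetes version (AL2). For Bottlerocket
// the SSM value is an OS release such as 1.19.2-29cc92cc, so the nodegroup's
// Version field has to be bumped explicitly.
latestReleaseVersion, err := m.getLatestReleaseVersion(ctx, kubernetesVersion, nodegroup)
if err != nil {
	return err
}
if latestReleaseVersion != "" && usesSSMReleaseVersion(nodegroup) {
	if err := m.updateReleaseVersion(latestReleaseVersion, options.LaunchTemplateVersion, nodegroup, ngResource); err != nil {
		return err
	}
} else {
	// Bottlerocket (and other non-AL2 families) get the plain Kubernetes
	// version so CloudFormation produces a real Version change.
	ngResource.Version = gfnt.NewString(kubernetesVersion)
}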

@yuxiang-zhang
Member

yuxiang-zhang commented Mar 19, 2024

and when I removed the changes from #6923, the changeset fails again, although this time it includes the Version update among other updated fields.

I compared the new CFN template in the changeset against the old CFN template: there is no change except the ManagedNodeGroup.Version field, yet the changeset still lists all of the following as Changes:

  • LaunchTemplate
    • Tags
  • ManagedNodeGroup
    • Tags
    • NodeRole
    • Version
  • NodeInstanceRole
    • Tags

@BogdanRS
Author

Thank you, @yuxiang-zhang. So, as I understand it, there should also be some changes made on the Bottlerocket side, right? Is it possible for you guys to talk with them about this?

@yuxiang-zhang
Member

@BogdanRS I opened a PR and tested it myself; it works for me. Would you mind doing a review and testing whether the fix works for you?

@BogdanRS
Author

Is it OK if you make a patch release with this one? We have an automated process that only uses the eksctl binary from releases.

@yuxiang-zhang
Member

Sure, you can expect a release this week!

@TiberiuGC
Collaborator

@BogdanRS - please find the release that contains the fix here.

@BogdanRS
Author

BogdanRS commented Mar 22, 2024

@yuxiang-zhang it doesn't seem to work for me. I have the latest version of eksctl (0.175), and when I upgrade my nodegroups, they still get reverted to EKS 1.27...

This is what I can see in the CloudFormation stack of one of the managed nodegroups (after upgrade):

        "NodegroupName": "gp-2vcpu-8gb-ondemand-aza-v2",
        "ReleaseVersion": "1.19.2-29cc92cc",
        "ScalingConfig": {
          "DesiredSize": 1,
          "MaxSize": 3,
          "MinSize": 1
        },
        "Subnets": [
          "subnet-00002137f5bb9effa"
        ],
        "Tags": {
          "alpha.eksctl.io/nodegroup-name": "gp-2vcpu-8gb-ondemand-aza-v2",
          "alpha.eksctl.io/nodegroup-type": "managed"
        },
        "Version": "1.27"
      }
    },

@yuxiang-zhang
Member

Is your cluster on 1.29? How did you upgrade your cluster?

@yuxiang-zhang reopened this Mar 22, 2024
@BogdanRS
Author

Yes, my cluster is on 1.29, upgraded using eksctl upgrade cluster (never had any issues with control plane upgrades, only with nodes).

@yuxiang-zhang
Member

yuxiang-zhang commented Mar 22, 2024

I couldn't reproduce what you're seeing. I created a 1.27 cluster, upgraded it to 1.29, and then upgraded the nodegroup from 1.27 to 1.29.

I think to mitigate the issue you are seeing, you could just manually create a changeset that changes the Version to 1.29.

This is the command I used:

eksctl upgrade nodegroup --name ng1 --kubernetes-version 1.29 --cluster bot
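
As for the manual change-set mitigation mentioned above, a rough sketch using the AWS SDK for Go v2 could look like the following; the stack name (assumed to follow eksctl's usual eksctl-<cluster>-nodegroup-<name> convention), the change-set name, and the crude string replacement are all assumptions to adapt to the actual nodegroup stack:

package main

import (
	"context"
	"log"
	"strings"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/cloudformation"
	"github.com/aws/aws-sdk-go-v2/service/cloudformation/types"
)

func main() {
	ctx := context.Background()
	// Placeholder names; adjust to your cluster and nodegroup stack.
	stackName := "eksctl-my-cluster-nodegroup-mo-2vcpu-16gb-spot-v2"
	changeSetName := "bump-nodegroup-version-1-29"

	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("eu-west-1"))
	if err != nil {
		log.Fatal(err)
	}
	cfn := cloudformation.NewFromConfig(cfg)

	// Fetch the current template of the nodegroup stack.
	tpl, err := cfn.GetTemplate(ctx, &cloudformation.GetTemplateInput{StackName: aws.String(stackName)})
	if err != nil {
		log.Fatal(err)
	}
	// Crude edit that assumes the template's exact JSON formatting; inspect the
	// template first and adjust the replacement accordingly.
	body := strings.Replace(*tpl.TemplateBody, `"Version": "1.27"`, `"Version": "1.29"`, 1)

	// Create a change set that only carries the Version bump.
	if _, err := cfn.CreateChangeSet(ctx, &cloudformation.CreateChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(changeSetName),
		ChangeSetType: types.ChangeSetTypeUpdate,
		TemplateBody:  aws.String(body),
		Capabilities:  []types.Capability{types.CapabilityCapabilityIam, types.CapabilityCapabilityNamedIam},
	}); err != nil {
		log.Fatal(err)
	}

	// Wait for the change set to be ready, review it in the console if you
	// like, then execute it.
	waiter := cloudformation.NewChangeSetCreateCompleteWaiter(cfn)
	if err := waiter.Wait(ctx, &cloudformation.DescribeChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(changeSetName),
	}, 5*time.Minute); err != nil {
		log.Fatal(err)
	}
	if _, err := cfn.ExecuteChangeSet(ctx, &cloudformation.ExecuteChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(changeSetName),
	}); err != nil {
		log.Fatal(err)
	}
}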

@BogdanRS
Author

BogdanRS commented Mar 25, 2024

Well, I see it is the same command as the one I used in the initial post, but for some reason it doesn't work for me and still shows the same behavior. Maybe it's because the nodegroups were last created/upgraded with a version of eksctl that didn't contain your changes (the ones in 0.175)? For example, on the nodegroups of the clusters that haven't been upgraded yet, the CF stack has alpha.eksctl.io/eksctl-version - 0.151.0.

As a test, you could also try creating a cluster with that version of eksctl, then upgrading eksctl, then upgrading the nodes. That's the only thing that comes to mind.

@BogdanRS
Author

BogdanRS commented Apr 2, 2024

The Version field was stuck on 1.27 in the CloudFormation stack, so the solution was either to remove that manually or just recreate the nodegroups from scratch.

@BogdanRS closed this as completed Apr 2, 2024