
clusterctl move not compatible with AWSMachinePools #3624

Closed

AverageMarcus opened this issue Jul 27, 2022 · 2 comments · Fixed by #3798
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
  • needs-priority
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@AverageMarcus (Member)

/kind bug

What steps did you take and what happened:

  1. Spin up a bootstrap cluster in Kind
  2. Create a new target cluster with at least one AWSMachinePool defined
  3. Wait for the target cluster to be created and ready
  4. Perform a pivot using clusterctl move so the target cluster becomes self-managing (a minimal command sketch follows the error output below)
  5. The following error will be reported:
Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Moving Cluster API objects ClusterClasses=0
Creating objects in the target cluster
Error: [action failed after 10 attempts: error creating "infrastructure.cluster.x-k8s.io/v1beta1, Kind=AWSMachinePool" default/golem-def00a: admission webhook "validation.awsmachinepool.infrastructure.cluster.x-k8s.io" denied the request: AWSMachinePool.infrastructure.cluster.x-k8s.io "golem-def00a" is invalid: spec.awsLaunchTemplate.rootVolume.deviceName: Forbidden: root volume shouldn't have device name, action failed after 10 attempts: error creating "infrastructure.cluster.x-k8s.io/v1beta1, Kind=AWSMachinePool" default/golem-def00b: admission webhook "validation.awsmachinepool.infrastructure.cluster.x-k8s.io" denied the request: AWSMachinePool.infrastructure.cluster.x-k8s.io "golem-def00b" is invalid: spec.awsLaunchTemplate.rootVolume.deviceName: Forbidden: root volume shouldn't have device name, action failed after 10 attempts: error creating "infrastructure.cluster.x-k8s.io/v1beta1, Kind=AWSMachinePool" default/golem-def00c: admission webhook "validation.awsmachinepool.infrastructure.cluster.x-k8s.io" denied the request: AWSMachinePool.infrastructure.cluster.x-k8s.io "golem-def00c" is invalid: spec.awsLaunchTemplate.rootVolume.deviceName: Forbidden: root volume shouldn't have device name]
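
A minimal sketch of the pivot step itself (step 4 above); the kubeconfig path is illustrative, and --to-kubeconfig is the standard clusterctl flag for selecting the target cluster:

    # Run against the bootstrap (Kind) cluster; moves the Cluster API
    # objects into the cluster described by target.kubeconfig.
    clusterctl move --to-kubeconfig=target.kubeconfig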

What did you expect to happen:
All resources moved to the target cluster successfully.

Anything else you would like to add:
The rootVolume.deviceName is initially not provided when first creating the cluster resources in the bootstrap cluster. Once the AWS Launch Template has been created, the details of the root volume are retrieved and the deviceName value is populated on the AWSMachinePool resource(s). When it comes to moving to the new cluster, the property remains populated and is then blocked by the admission webhook, preventing the move from completing.

This value only seems to be used during the initial setup of the Launch Template and, as far as I can see, is never referenced by anything else after that. Manually removing the deviceName property from the AWSMachinePool resources allows the move to be performed, but the value is never re-populated, as it is only fetched when the Launch Template is initially created.
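
A sketch of that manual removal as a JSON patch (the resource name golem-def00a comes from the error above; the field path assumes spec.awsLaunchTemplate.rootVolume.deviceName, as in the webhook message):

    # Repeat for each AWSMachinePool named in the move error.
    kubectl patch awsmachinepool golem-def00a --type=json \
      -p='[{"op": "remove", "path": "/spec/awsLaunchTemplate/rootVolume/deviceName"}]'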

Also discussed on Slack: https://kubernetes.slack.com/archives/CD6U2V71N/p1658902259480619

Environment:

  • Cluster-api-provider-aws version: v1.4.1
  • Cluster-api version: v1.1.5
  • clusterctl version: v1.2.0
  • Kubernetes version (use kubectl version): v1.21.1
  • OS (e.g. from /etc/os-release): Ubuntu
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 27, 2022
@sedefsavas (Contributor)

Thanks for reporting this issue!

This is happening because the deviceName field under the rootVolume section is not allowed to be non-nil at creation time, but it is set by the controllers afterwards. During clusterctl move, with that field already set, creation in the target cluster fails.
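
A minimal sketch of the shape of that create-time check, assuming the webhook uses the standard apimachinery field helpers (the Volume type below is a stand-in, not the actual CAPA type; this is illustrative, not the exact source):

    // Illustrative sketch of the create-time validation described above.
    package webhook

    import (
        "k8s.io/apimachinery/pkg/util/validation/field"
    )

    // Volume is a simplified stand-in for the launch template's root volume.
    type Volume struct {
        DeviceName string
        Size       int64
    }

    // validateRootVolume rejects a root volume that carries a device name,
    // mirroring the webhook error quoted in the issue.
    func validateRootVolume(rootVolume *Volume) field.ErrorList {
        var allErrs field.ErrorList
        if rootVolume != nil && rootVolume.DeviceName != "" {
            allErrs = append(allErrs, field.Forbidden(
                field.NewPath("spec", "awsLaunchTemplate", "rootVolume", "deviceName"),
                "root volume shouldn't have device name",
            ))
        }
        return allErrs
    }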

For the proper fix we need to wait for the v1beta2 release, as it requires webhook/field changes.
As a workaround, if users manually delete the deviceName before the move, it won't get re-added by the controllers and the move succeeds.

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 27, 2022
@sedefsavas mentioned this issue Jul 27, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 25, 2022