rds: NodePool CRD is producing endless loop if autoUpgrade enabled #249

Closed
stoetti opened this issue Mar 8, 2023 · 6 comments · Fixed by #456

stoetti commented Mar 8, 2023

What happened?

I created a GKE Cluster using the releaseChannel REGULAR and attached a NodePool with management.autoUpgrade set to true. The creation of the node pool in GCP works great. After the first auto-upgrade was initiated, we noticed that the node pool gets downgraded again, which leads to an endless loop of node upgrades and downgrades.
After some investigation I noticed that the provider writes the node version into spec.forProvider.version upon creation. This field is then used in subsequent reconciliation cycles, which leads to the endless loop of upgrades and downgrades described above.
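
For illustration, this is roughly what the managed resource ends up looking like once the provider writes the observed version back into the spec. This is a sketch only, assuming the upbound container.gcp.upbound.io/v1beta1 NodePool linked later in this thread; field shapes may differ across provider versions, and the names and version string are hypothetical:

  apiVersion: container.gcp.upbound.io/v1beta1
  kind: NodePool
  metadata:
    name: example-pool              # hypothetical name
  spec:
    forProvider:
      cluster: example-cluster      # hypothetical cluster reference
      management:
        - autoUpgrade: true
      # Written back by the provider at creation time. Once GKE auto-upgrades
      # the pool past this value, each reconcile reverts it, producing the
      # upgrade/downgrade loop described above.
      version: 1.24.9-gke.3200      # hypothetical creation-time version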

How can we reproduce it?

  1. Create a cluster with releaseChannel set to REGULAR
  2. Attach a node pool with management.autoUpgrade set to true (see the manifest sketch after this list)
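
A sketch of a Cluster manifest that could be used to reproduce this, again assuming the container.gcp.upbound.io/v1beta1 CRDs (name and location are hypothetical; the attached NodePool would look like the sketch in the issue description above):

  apiVersion: container.gcp.upbound.io/v1beta1
  kind: Cluster
  metadata:
    name: example-cluster           # hypothetical name
  spec:
    forProvider:
      location: europe-west1        # hypothetical location
      initialNodeCount: 1
      removeDefaultNodePool: true
      # The REGULAR release channel lets GKE auto-upgrade the control plane,
      # and auto-upgraded node pools follow it.
      releaseChannel:
        - channel: REGULAR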

What environment did it happen in?

  • Universal Crossplane Version: v1.11.1
  • Provider Version: v0.28.0
stoetti added the bug label on Mar 8, 2023

ytsarev (Collaborator) commented Mar 8, 2023

@stoetti thanks a lot for this great catch.
Please see the related proposal crossplane/crossplane#3822

@lsviben looks like this case is great material for easy e2e testing of the ignore-changes functionality in the future

@ferpizza

+1 to the reported issue. I experienced the same thing and had to disable autoUpgrade to be able to keep working with Crossplane and provider-gcp.

In addition, I'm experiencing another similar issue with NodePools. In this case, we create a NodePool WITHOUT nodeCount in it, as we have autoScaling enabled.

After a few cluster scale-up/down events (triggered by GKE), the NodePool manifest gets updated with nodeCount: X, and from then on GKE autoscaling and Crossplane start a game of cat and mouse, scaling the cluster up and down (GKE trying to get it to its needed size and Crossplane forcing it to remain at X).

To make things worse, GKE has what I believe to be a bug in its API: when Crossplane or anyone else updates the nodeCount value to Y via an API call, the API also sets the initialNodeCount property on the NodePool to Y.

This can break any further syncing of that NodePool, as initialNodeCount is a static value that cannot be updated through the API; getting it to the desired value would force the deletion and re-creation of that node pool (which is blocked by default) ... so the NodePool falls out of sync without any chance of recovering on its own.
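
A sketch of the kind of NodePool spec involved, with field shapes approximated from the same container.gcp.upbound.io/v1beta1 NodePool CRD (names and autoscaling bounds are hypothetical):

  apiVersion: container.gcp.upbound.io/v1beta1
  kind: NodePool
  metadata:
    name: autoscaled-pool           # hypothetical name
  spec:
    forProvider:
      cluster: example-cluster      # hypothetical cluster reference
      autoscaling:
        - minNodeCount: 1           # hypothetical bounds
          maxNodeCount: 5
      # nodeCount is deliberately omitted so the GKE autoscaler owns it.
      # Per the report above, after a few scale events the provider
      # late-initializes nodeCount to the observed size X and starts
      # fighting the autoscaler; updating nodeCount via the API then also
      # sets the immutable initialNodeCount.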

Both of these situations could be solved by crossplane/crossplane#3822 ... looking forward to its development.

vladfr commented Nov 27, 2023

Similar issue in #340

ferpizza commented Nov 28, 2023

@vladfr, you should be able to work around these issues with the new managementPolicies field on the provider's resources, e.g. GCP NodePools:

https://marketplace.upbound.io/providers/upbound/provider-gcp-container/v0.38.1/resources/container.gcp.upbound.io/NodePool/v1beta1#doc:spec-managementPolicies

The description of this field includes links to documentation on how to use it.
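
A sketch of what that could look like on a NodePool, assuming management policies are enabled for the provider (they were an alpha feature at the time of these comments). Leaving out LateInitialize keeps the provider from writing observed values such as version or nodeCount back into spec.forProvider:

  apiVersion: container.gcp.upbound.io/v1beta1
  kind: NodePool
  metadata:
    name: example-pool              # hypothetical name
  spec:
    # Everything except LateInitialize; the default is ["*"].
    managementPolicies:
      - Observe
      - Create
      - Update
      - Delete
    forProvider:
      cluster: example-cluster      # hypothetical cluster reference
      management:
        - autoUpgrade: true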

vladfr commented Nov 28, 2023 via email

@Demonsthere (Contributor)

Also running into this. The node pool is in an upgrade/downgrade loop even though we set:

  Management Policies:
    Create
    Update
    Observe

This happens on Crossplane v1.14.5 and provider v0.41.0. Any suggestions on how to proceed?
