Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No late initialize for container NodePool's node_count and initial_node_count #600

Merged

Conversation

JonathanO
Copy link
Contributor

@JonathanO JonathanO commented Aug 9, 2024

Description of your changes

In the gcp terraform provider there are two ways to configure the size of NodePools:

  • initial_node_count specifies the initial size of the node pool, allowing the pool size to be controlled by the cluster autoscaler (or other tooling);
  • node_count explicitly configures the size of the node pool, allowing TF to manage the size of the pool.

You cannot specify both when creating a node pool, but this is not (currently) validated by the TF provider when updating an existing node pool.

At present provider-gcp late initializes both properties, resulting in a spec with both values set. Not only would this not be valid for creating a new node pool, but in the case where the user explicitly set only the initial size the late initialization causes the provider to attempt to manage the ongoing size of the node pool as if the user had set node_count too, resulting in fights with the GKE cluster autoscaler, etc.

While this can be worked around by disabling LateInitialize in the managementPolicy, the current behaviour is undesirable and surprising to users as it effectively removes the distinction between the two ways to specify pool size.

The TF docs also warn that the initial_node_count reported can change if node pools are resized, and suggests using a lifecycle block to ignore changes.

This PR removes both node_count and intial_node_count from late initialization, and ignores changes to the initial_node_count field.

I have:

  • Read and followed Crossplane's [contribution process].
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

Note: The TF provider calculates current node_count as int(instances / AZs), so when testing it's necessary to resize the node pool by at least the number of AZs or the provider will not detect the change!

  • Create new node pool with initial_node_count, resize via the console, provider does nothing.
  • Create new node pool with node_count, resize via the console, provider resets to specified size.

Signed-off-by: Jonathan Oddy <jonathan@woaf.net>
The calculated initial_node_count can change if the node pool is resized.
The TF docs suggest using a lifecycle block to ignore changes to this
attribute.

Signed-off-by: Jonathan Oddy <jonathan@woaf.net>
Signed-off-by: Jonathan Oddy <jonathan@woaf.net>
@jeanduplessis
Copy link
Collaborator

/test-examples="examples/container/v1beta2/nodepool.yaml"

@jeanduplessis
Copy link
Collaborator

We can ignore the failed Uptest- job on this PR. It was due to a typo in the comment I made (now deleted).

Copy link
Collaborator

@turkenf turkenf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your detailed explanation and efforts @JonathanO, LGTM.

@turkenf turkenf merged commit a3ff591 into crossplane-contrib:main Aug 21, 2024
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants