
Add support for version upgrades in google_container_node_pool #633

Closed · davidquarles opened this issue Oct 26, 2017 · 10 comments · Fixed by #1266

Comments

@davidquarles (Contributor)

Right now node_version is only accessible as a cluster-level field. What I'd like (aka need) is the ability to trigger upgrades at the node pool level.

Furthermore, according to the current Go docs, ClusterUpdate.DesiredNodePoolId is...

    // mandatory if "desired_node_version", "desired_image_family" or
    // "desired_node_pool_autoscaling" is specified and there is more than
    // one node pool on the cluster.

Given the existing logic, I think this is not just missing granularity but broken functionality for clusters with multiple node pools:

if d.HasChange("node_version") {
    desiredNodeVersion := d.Get("node_version").(string)

    req := &container.UpdateClusterRequest{
        Update: &container.ClusterUpdate{
            // Note: DesiredNodePoolId is never set, so this request
            // fails on clusters with more than one node pool.
            DesiredNodeVersion: desiredNodeVersion,
        },
    }
    // ... req is then sent via Projects.Zones.Clusters.Update ...
}
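
If the update stays at the cluster level, the fix would presumably be to name the pool explicitly, per the doc comment above. A minimal sketch (the pool ID is just a placeholder):

    if d.HasChange("node_version") {
        req := &container.UpdateClusterRequest{
            Update: &container.ClusterUpdate{
                DesiredNodeVersion: d.Get("node_version").(string),
                // Mandatory when the cluster has more than one node
                // pool; "default-pool" is just a placeholder here.
                DesiredNodePoolId: "default-pool",
            },
        }
        // ... issue the update as before ...
    }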

I think this would subsume #397.

@davidquarles (Contributor, Author)

@danawillow Is this an accurate assessment? Do you have any thoughts / insight / design considerations, if so? I'm still working through IP Aliasing, but I'm happy to help if this is in limbo when I'm finished.

@danawillow (Contributor)

Yup, that sounds accurate. Unfortunately, the original design of this resource seems to have oversimplified the API a bit (or maybe the API's changed a bunch since then, who knows).

My initial guess would be that the node_version within a cluster should apply to every node pool in the cluster, and that a version attribute should be added to node pool, but I haven't thought much about it yet.
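
For reference, the container API already exposes a per-node-pool update endpoint that such a version attribute could drive. A rough sketch against the google.golang.org/api/container/v1 client (all identifiers are placeholders, not a final design):

    package main

    import (
        "context"
        "log"

        "golang.org/x/oauth2/google"
        container "google.golang.org/api/container/v1"
    )

    func main() {
        ctx := context.Background()

        hc, err := google.DefaultClient(ctx, container.CloudPlatformScope)
        if err != nil {
            log.Fatal(err)
        }
        svc, err := container.New(hc)
        if err != nil {
            log.Fatal(err)
        }

        // Request a version upgrade for a single node pool; every
        // identifier below is a placeholder.
        req := &container.UpdateNodePoolRequest{NodeVersion: "1.8.3-gke.0"}
        op, err := svc.Projects.Zones.Clusters.NodePools.Update(
            "my-project", "us-central1-a", "my-cluster", "my-pool", req).Do()
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("upgrade started: operation %s", op.Name)
    }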

@davidquarles (Contributor, Author)

davidquarles commented Oct 26, 2017

The API is definitely evolving pretty rapidly, which is rad! What you're suggesting sounds good to me. My main question, then -- at both layers -- is how strict that attribute should be.

More verbosely:

  • Should we trigger downgrades if/when the node pools have been externally upgraded?
  • What happens when the world moves on and the given node_version is no longer supported?

I do really like the existing min_master_version mechanism, if that abstraction makes sense here.

@danawillow (Contributor)

Generally, we try to make Terraform config files the source of truth whenever possible. Things become really complicated if we want Terraform to only be the source of truth some of the time. In practice, this means that if an attribute can be changed outside of Terraform without the user's involvement, we need a way to work around that; but if a user decides to change something that's already managed in Terraform, they should understand that Terraform will try to clobber those changes.

With regard to node pools, it looks like they can be managed without the user's involvement, but only if the user explicitly opts in by setting autoUpgrade (not currently an option in Terraform, just because we haven't gotten around to it yet). I'm not sure whether there's a use case where a user would want to set that to true but also be able to supply an explicit version.

Since autoUpgrade is opt-in, we probably don't need a min_master_version-like abstraction, unless we want users to be able to have autoUpgrade set and also explicitly set a minimum version. I think it would be simpler to just have a version attribute (and set it to Computed, so removing it from the config wouldn't actually try to change it - this covers the case where somebody had previously set a version and then decided to use autoUpgrade). This would trigger downgrades if the user externally upgraded the pool.
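
A sketch of what that schema entry might look like (the attribute name and placement are assumptions, not the merged implementation):

    import "github.com/hashicorp/terraform/helper/schema"

    // Hypothetical per-pool version attribute for google_container_node_pool.
    var nodePoolVersionSchema = map[string]*schema.Schema{
        "version": {
            Type:     schema.TypeString,
            Optional: true,
            // Computed, so dropping the attribute from config leaves the
            // live version alone (e.g. after switching to autoUpgrade)
            // instead of forcing a change back.
            Computed: true,
        },
    }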

As for a given version no longer being supported, do you know what happens in GKE? Would it just let the node run at that version with no "support" (whatever that means) or would it upgrade the node anyway? If the latter, then most of what I've said above no longer applies. If the former, then we're fine, since Terraform will only call the API to set the version on changes.

@davidquarles (Contributor, Author)

I think GKE will let you hang out on stale versions. You're right -- that is probably the most valid concern of the bunch. I can do a bit of research and circle back.

I can speak to auto-upgrade with some certainty, though. That setting is configurable on each node pool and can be toggled after creation. Our current strategy is to use auto-upgrade for most of our node pools. For our most mission-critical workloads, we'll disable auto-upgrade and carefully babysit manual upgrades, using dedicated node pools. We also want to canary and test node pool upgrades ahead of the auto-upgrade cycle, as soon as the master is upgraded. This is totally possible; auto-upgrades and manual upgrades aren't mutually exclusive upstream, and auto-upgrade doesn't specifically track the most recent version.

That said, I think we can accomplish all of our goals with explicit version and auto-upgrade attributes at the node pool layer, provided the two settings aren't mutually exclusive. It looks like there's a WIP implementation for auto-upgrade / auto-repair, but it doesn't expose either attribute on the node pool or manage the post-creation lifecycle. I'm not sure if that merits a separate issue/PR or can be achieved in that change. I'll ask over there.
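
For reference, the post-creation management toggle maps to a dedicated API call. A rough sketch of flipping it on an existing pool, reusing the svc client from the sketch above (identifiers are placeholders):

    // Flip auto-upgrade/auto-repair on an existing pool without touching
    // its version; identifiers are placeholders.
    mgmt := &container.SetManagementRequest{
        Management: &container.NodeManagement{
            AutoUpgrade: true,
            AutoRepair:  true,
        },
    }
    op, err := svc.Projects.Zones.Clusters.NodePools.SetManagement(
        "my-project", "us-central1-a", "my-cluster", "my-pool", mgmt).Do()
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("management update started: operation %s", op.Name)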

@michaelbannister (Contributor)

michaelbannister commented Jan 3, 2018

Hi - I just hit the same problem in #397 and I'm happy to look into this if others don't have time.
Or do you think a direct fix to #397 is a reasonable first step (i.e. apply node_version from google_container_cluster to all node pools), without going into the per-node-pool config described in this issue?

@danawillow (Contributor)

I'd rather see this one fixed, personally. If you have some time, go for it!

@dhawal55

dhawal55 commented Jan 3, 2018

+1 for node_version attribute in google_container_node_pool resource

@michaelbannister (Contributor)

Sorry I haven't made a start on this yet. At work I'm busy with our first migration to GCP, while at home I'm preparing for my wedding at the end of February! If it's still open by March I'll be sure to pick it up, but meanwhile, if anyone else fancies it… go ahead. :)

@ghost

ghost commented Nov 19, 2018

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉, please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost locked and limited conversation to collaborators Nov 19, 2018