Please reconsider deprecating google_container_node_pool -> initial_node_count #1160

Closed
james-masson opened this issue Mar 7, 2018 · 6 comments · Fixed by #1176
Labels: enhancement, forward/review, service/container

Comments

@james-masson

Terraform Version

Terraform v0.11.3
google-provider 1.6.0

Affected Resource(s)

google_container_node_pool

Terraform Configuration Files

resource "google_container_node_pool" "k8s" {
  name_prefix         = "${var.nodepool_name}-"
  zone                = "${var.region}-${element( split(",", lookup(var.zones_lookup, var.region)), 0 )}"
  cluster             = "${var.cluster_name}"
  initial_node_count  = "${var.initial_nodes_per_zone}"
.....

  autoscaling {
    min_node_count = "${var.min_nodes_per_zone}"
    max_node_count = "${var.max_nodes_per_zone}"
  }

  lifecycle {
    create_before_destroy = true
  }
}

Problem

I'm trying to provide seamless nodepool upgrade/replacement with Terraform - always maintaining enough nodes to run all services on the cluster during the upgrade.
The cluster workload is highly elastic, and uses node autoscaling heavily.

I use blue/green nodepools with create_before_destroy, and rely on initial_node_count to ensure there's enough free capacity for this seamless migration.

I cannot use node_count, as every subsequent Terraform run will show that my actual number of autoscaled nodes does not match the pre-configured amount, and offer to destroy the nodepool.

Not specifying either node_count or initial_node_count results in zero nodes in the nodepool initially, which doesn't provide the capacity to run all services during the upgrade. Node autoscaling catches up to workload demand too slowly during the nodepool replacement cycle to be useful here.

initial_node_count provides this "on-creation" boost to the nodepool, giving it capacity for the seamless migration before autoscaling brings the node count back down to the required minimum.
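
To make the intent concrete, here's a sketch of how the variables behind the config above could be set (illustrative values only; the point is that initial_nodes_per_zone sits well above the autoscaling minimum and close to the maximum):

variable "min_nodes_per_zone" {
  default = 1
}

variable "max_nodes_per_zone" {
  default = 10
}

variable "initial_nodes_per_zone" {
  # illustrative value only: close to max_nodes_per_zone so the replacement
  # pool can absorb the whole workload before autoscaling trims it back down
  default = 8
}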

@paultyng

paultyng commented Mar 7, 2018

@james-masson I'm not sure it would work in this scenario, but have you experimented with ignore_changes?
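
Something along these lines, untested, just to illustrate the idea (using node_count as the example attribute):

resource "google_container_node_pool" "k8s" {
  # ... existing arguments ...

  lifecycle {
    create_before_destroy = true

    # Terraform 0.11 syntax: drift in node_count caused by the autoscaler
    # would no longer show up as a planned change
    ignore_changes = ["node_count"]
  }
}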

@danawillow
Contributor

There's a bit of discussion around this in #844. Can you take a look at #844 (comment) and see if solution #2 would work for your use case?

@james-masson
Author

Apologies for not finding #844 - it is a similar use case.

Unfortunately, the solution you mentioned wouldn't work for me. I want the initial_node_count to be significantly higher than min_node_count and close to the max_node_count.

This initial capacity boost, combined with multiple nodepools, is the only way I've found to make the upgrade process close to seamless.

The reason I have blue/green nodepools is that the Google API for creating nodepools (and hence Terraform) returns success too soon. It's quite normal for the API to report success while there are still zero nodes available to host services for quite a while afterwards. I've also seen cases where no nodes are ever created successfully, due to bugs or misconfiguration.

create_before_destroy gives the new nodes a chance to be up and ready before the old ones are removed, although this isn't guaranteed. Multiple nodepools give some protection against bugs and against losing the create/destroy race.

What I'm trying to say is that the large initial_node_count and everything else is a workaround, because I can't trust Google (and hence Terraform) to replace nodepools in a reliable way.

I presume you're waiting for "RUNNING" from https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters.nodePools#status ? I wonder why there's a discrepancy between the API state and what Kubernetes sees...

@paultyng

paultyng commented Mar 8, 2018

So you are saying something similar to how ASGs work on the AWS provider (https://www.terraform.io/docs/providers/aws/r/autoscaling_group.html#wait_for_elb_capacity) is what you are looking for here? The ability to wait for a certain number of healthy nodes before moving on to dependencies?
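
For reference, the AWS equivalent looks roughly like this (sketch only; it assumes an ELB and launch configuration defined elsewhere, and the numbers are illustrative):

resource "aws_autoscaling_group" "example" {
  availability_zones   = ["us-east-1a"]
  launch_configuration = "${aws_launch_configuration.example.name}"
  load_balancers       = ["${aws_elb.example.name}"]

  min_size         = 1
  max_size         = 10
  desired_capacity = 8

  # terraform apply blocks until this many instances pass the ELB
  # health checks, instead of returning as soon as the ASG exists
  wait_for_elb_capacity = 8
}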

@james-masson
Author

Exactly.

modular-magician added a commit to modular-magician/terraform-provider-google that referenced this issue Sep 27, 2019
Signed-off-by: Modular Magician <magic-modules@google.com>
@ghost

ghost commented Mar 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 29, 2020
@github-actions github-actions bot added the service/container and forward/review labels Jan 15, 2025