Terraform gke scale out fix #711

Merged: 4 commits merged into pingcap:master on Jul 31, 2019

@jlerche (Contributor) commented Jul 30, 2019

What problem does this PR solve?

Closes #708

What is changed and how does it work?

This incorporates changes from @aylei's commit aylei@47be0a7 in his fork, which prevent the node pools from being recreated.

In #673 the initial_node_count argument of the cluster resource was changed to depend on the tidb/monitor/tikv/pd count variables. Because initial_node_count cannot be changed in place, scaling out by changing a count variable makes Terraform destroy and recreate the whole cluster. To avoid this, initial_node_count is now set to a literal value of 5. According to https://kubernetes.io/docs/setup/best-practices/cluster-large/#size-of-master-and-master-components, this causes a master to be provisioned that can accommodate up to 100 nodes.
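A minimal Terraform sketch of the pattern described above, assuming a regional GKE cluster whose default node pool is removed and whose workloads run on separately managed node pools. The resource names, location, machine type, the use of `remove_default_node_pool`, and the `tikv_count` variable as declared here are illustrative assumptions, not the PR's actual diff. The point is that `initial_node_count` stays constant, so scaling out only changes the node pool's `node_count`.

```hcl
# Hypothetical stand-in for the module's per-component count variables.
variable "tikv_count" {
  description = "TiKV nodes per zone (hypothetical)"
  default     = 1
}

resource "google_container_cluster" "cluster" {
  name     = "tidb-cluster" # hypothetical name
  location = "us-west1"     # a region, so node pools span its zones

  # Literal value: changing initial_node_count forces Terraform to replace
  # the cluster, so it must not be derived from the count variables.
  initial_node_count = 5

  # Assumption: the default pool is only needed at creation time and the
  # workloads run on the dedicated pools below.
  remove_default_node_pool = true
}

resource "google_container_node_pool" "tikv" {
  name     = "tikv-pool" # hypothetical name
  cluster  = google_container_cluster.cluster.name
  location = google_container_cluster.cluster.location

  # Scaling out edits this value; the pool is resized in place and the
  # cluster resource above is left untouched.
  node_count = var.tikv_count

  node_config {
    machine_type = "n1-highmem-4" # hypothetical machine type
  }
}
```

In a regional cluster `node_count` is applied per zone, which is why a count of 1 yields one node in each of the region's zones.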

Additional ordering is added to resource creation so that the tidb-cluster Helm release is installed or upgraded only after the node pools are created or updated. This prevents pods from being scheduled on old nodes.
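This kind of ordering can be expressed with Terraform's `depends_on` meta-argument. A minimal sketch, assuming node pool resources named `pd`, `tikv`, and `tidb` are defined elsewhere in the same configuration; those names, the chart reference, and the namespace are hypothetical and not taken from the PR.

```hcl
resource "helm_release" "tidb_cluster" {
  name      = "tidb-cluster"
  chart     = "pingcap/tidb-cluster" # hypothetical chart reference
  namespace = "tidb"                 # hypothetical namespace

  # Terraform finishes creating or updating every listed node pool before
  # it installs or upgrades this release, so the release's pods are not
  # created before the new nodes exist.
  depends_on = [
    google_container_node_pool.pd,
    google_container_node_pool.tikv,
    google_container_node_pool.tidb,
  ]
}
```

An explicit `depends_on` is used here because the release does not otherwise reference any node pool attribute, so Terraform would have no implicit dependency from which to infer the order.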

Check List

Tests

  • Manual test:
    1. terraform apply
    2. wait for it to finish
    3. check that the master node is not upgrading
    4. increase tikv_count and tikv_replica_count (node pools span the availability zones, so tikv_count=1 gives 3 nodes, 2 gives 6, etc.)
    5. terraform apply
    6. check the plan to make sure the cluster is not recreated
    7. type yes to confirm
    8. check that the cluster is not recreated, the tidb node pool is created, and then the tidb-cluster release is upgraded
    9. check that the tidb pods are running properly
    10. terraform destroy
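For the scale-out step, the change can be made in a `terraform.tfvars` file. A sketch with hypothetical values, assuming three zones (which the 1 -> 3, 2 -> 6 arithmetic in the test steps implies) and assuming `tikv_replica_count` sets the number of TiKV pods in the tidb-cluster release:

```hcl
# terraform.tfvars (hypothetical values for the scale-out step)

# Nodes per zone: with three zones, 2 per zone gives 6 TiKV nodes.
tikv_count = 2

# Assumption: number of TiKV pods in the tidb-cluster release, raised so
# the additional nodes are actually used.
tikv_replica_count = 6
```

In the resulting plan, the cluster resource should not carry the `-/+` marker, which Terraform uses for a resource that will be destroyed and recreated; the node pool should show an in-place update (`~`).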

Code changes

  • Terraform changes

Side effects

  • Breaking backward compatibility

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Does this PR introduce a user-facing change?

NONE

@jlerche requested review from @gregwebs and @aylei on July 30, 2019 17:41

@aylei (Contributor) left a comment

LGTM

@tennix (Member) left a comment

LGTM

@gregwebs merged commit 5b152a9 into pingcap:master on Jul 31, 2019

@sre-bot (Contributor) commented Jul 31, 2019

cherry pick to release-1.0 in PR #714

Linked issue closed by this PR:

GCP Terraform scripts destroy and create a new one when increasing the cluster size
6 participants