Gke terraform and guide (#493)
* Initial progress

* Adds infrastructure

* Some changes to main.tf and adds manifests and charts

* Final touches, everything works

* Adds readme

* Cleans up README a little

* Adds firewall rule to allow ssh from bastion to GKE nodes

* Fixes name of ssh from bastion firewall rule

* Adds tweaks and formatting

* Adds reclaim policy to delete disks

* Updates readme to reflect disk reclaim policy

* Adds on-destroy to change persistent volume claimpolicy to delete

* Removes superfluous command changing reclaim policy

* Adds monitor node count variable

* Adds startup script daemonset to properly change open fd on pd and tikv nodes

* Streamlines startup daemonset and adds Linux Guest Environment installation

* Adds note about default set up exceeding GCP default cpu quota

* Adds link to GCP quota documentation page

* Fixes formatting

* Adds clarifications to README, adds tidb and monitor port to outputs

* Adds section on how to delete nodes that are automatically replicated across zones in a regional cluster
jlerche authored and tennix committed May 23, 2019
1 parent 8917fa2 commit b9889ec
Showing 14 changed files with 1,205 additions and 0 deletions.
4 changes: 4 additions & 0 deletions deploy/gcp/.gitignore
@@ -0,0 +1,4 @@
.terraform
*.tfstate*
credentials
rendered
126 changes: 126 additions & 0 deletions deploy/gcp/README.md
@@ -0,0 +1,126 @@
# Deploy TiDB Operator and TiDB cluster on GCP GKE

## Requirements:
* [gcloud](https://cloud.google.com/sdk/install)
* [terraform](https://www.terraform.io/downloads.html)
* [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl) >= 1.11
* [helm](https://github.com/helm/helm/blob/master/docs/install.md#installing-the-helm-client) >= 2.9.0
* [jq](https://stedolan.github.io/jq/download/)

## Configure gcloud

See https://cloud.google.com/sdk/docs/initializing for instructions on how to initialize and configure `gcloud`.

## Setup

The default setup will create a new VPC, two subnetworks, and an f1-micro instance as a bastion machine. The GKE cluster is created with the following instance types as worker nodes:

* 3 n1-standard-4 instances for PD
* 3 n1-highmem-8 instances for TiKV
* 3 n1-standard-16 instances for TiDB
* 3 n1-standard-2 instances for monitor

> *NOTE*: The number of nodes created depends on how many availability zones there are in the chosen region. Most regions have 3 zones, but us-central1 has 4. See https://cloud.google.com/compute/docs/regions-zones/ for more information. Refer to the `Customizing node pools` section below for information on how to customize node pools in a regional cluster.

> *NOTE*: The default setup listed above exceeds the default CPU quota of a GCP project: the worker nodes alone account for 90 vCPUs (3 × 4 + 3 × 8 + 3 × 16 + 3 × 2), so at least 91 CPUs are required, and more if you need to scale out. To increase your project's quota, follow the instructions [here](https://cloud.google.com/compute/quotas).

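You can check the CPU quota and current usage for your chosen region before running Terraform. A quick check using `gcloud` (replace `us-west1` with your region; the `grep` simply picks the `CPUS` entry out of the region's quota list):

```bash
# Show the CPUS quota limit and usage for the region; the limit should be at least 91.
gcloud compute regions describe us-west1 | grep -B 1 -A 1 "metric: CPUS$"
```
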
The Terraform script expects three environment variables. You can let Terraform prompt you for them, or `export` them ahead of time, as shown in the example after the list. If you choose to export them, they are:

* `TF_VAR_GCP_CREDENTIALS_PATH`: Path to a valid GCP credentials file. It is generally a good idea to create a dedicated service account for Terraform. See [this page](https://cloud.google.com/iam/docs/creating-managing-service-accounts) for more information on how to manage service accounts, and [this page](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) for creating and managing service account keys; the downloaded key file is the credentials file Terraform needs.
* `TF_VAR_GCP_REGION`: The region to create the resources in, for example: `us-west1`
* `TF_VAR_GCP_PROJECT`: The name of the GCP project
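
A minimal sketch of exporting the three variables before running Terraform (the path, region, and project name below are placeholders; substitute your own values):

```bash
# Placeholder path to the downloaded service account key file.
export TF_VAR_GCP_CREDENTIALS_PATH="$HOME/.gcp/terraform-key.json"
# Region to create the resources in.
export TF_VAR_GCP_REGION="us-west1"
# Placeholder GCP project name.
export TF_VAR_GCP_PROJECT="my-gcp-project"
```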



The service account should have sufficient permissions to create resources in the project. The `Project Editor` primitive role is sufficient for this.

If the GCP project is new, make sure the relevant APIs are enabled:

```bash
gcloud services enable cloudresourcemanager.googleapis.com && \
gcloud services enable cloudbilling.googleapis.com && \
gcloud services enable iam.googleapis.com && \
gcloud services enable compute.googleapis.com && \
gcloud services enable container.googleapis.com
```

Now we can launch the script:

```bash
git clone --depth=1 https://github.com/pingcap/tidb-operator
cd tidb-operator/deploy/gcp
terraform init
terraform apply
```

After `terraform apply` is successful, the TiDB cluster can be accessed by SSHing into the bastion machine and connecting via MySQL:
```bash
gcloud compute ssh bastion --zone <zone>
mysql -h <tidb_ilb_ip> -P 4000 -u root
```
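
The `<tidb_ilb_ip>` and TiDB port values are also exposed as Terraform outputs (this commit adds the TiDB and monitor ports to the outputs), so they can be printed from the `deploy/gcp` directory:

```bash
# Print all Terraform outputs, including the TiDB internal load balancer IP and port.
terraform output
```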

It is possible to interact with the cluster using `kubectl` and `helm` with the kubeconfig file `credentials/kubeconfig_<cluster_name>`. The default `cluster_name` is `my-cluster`; it can be changed in `variables.tf`:
```bash
# By specifying --kubeconfig argument
kubectl --kubeconfig credentials/kubeconfig_<cluster_name> get po -n tidb
helm --kubeconfig credentials/kubeconfig_<cluster_name> ls

# Or setting KUBECONFIG environment variable
export KUBECONFIG=$PWD/credentials/kubeconfig_<cluster_name>
kubectl get po -n tidb
helm ls
```

When done, the infrastructure can be torn down by running `terraform destroy`.


## Upgrade TiDB cluster

To upgrade the TiDB cluster, modify the `tidb_version` variable in `variables.tf` to a later version and run `terraform apply`.
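
Alternatively, a variable can be overridden on the command line instead of editing `variables.tf`; the version string below is only an example:

```bash
# Override tidb_version for this run; the cluster is upgraded by re-applying.
terraform apply -var tidb_version=v2.1.8
```

Note that values passed with `-var` are not persisted, so they must be supplied again on subsequent runs; editing `variables.tf` is the more durable option.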

> *Note*: The upgrade does not finish immediately. You can watch its progress with `watch kubectl --kubeconfig credentials/kubeconfig_<cluster_name> get po -n tidb`.

## Scale TiDB cluster

To scale the TiDB cluster, modify `tikv_count`, `tikv_replica_count`, `tidb_count`, and `tidb_replica_count` to your desired counts, and then run `terraform apply`.
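
As with upgrades, the counts can be overridden on the command line for a one-off change; the values below are placeholders:

```bash
# Scale out TiKV to 5 nodes and 5 replicas (example values).
terraform apply -var tikv_count=5 -var tikv_replica_count=5
```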

> *Note*: Currently, scaling in is not supported, since we cannot determine which node to remove. Scaling out takes a few minutes to complete; you can watch it with `watch kubectl --kubeconfig credentials/kubeconfig_<cluster_name> get po -n tidb`.

> *Note*: Incrementing the node count creates a node in each GCP availability zone.

## Customize

### Customize GCP resources

GCP allows attaching a local SSD to any instance type that is `n1-standard-1` or greater, which leaves plenty of room for customizing the worker nodes.
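
As an illustration only (the node pools in this deployment are managed by Terraform, so changes made directly with `gcloud` are not reflected in the Terraform state), a standalone node pool with a local SSD could be added like this; the pool name, machine type, and sizes are placeholders:

```bash
# Add a node pool whose nodes each get one local SSD (example values).
gcloud container node-pools create example-pool \
  --cluster my-cluster \
  --region us-west1 \
  --machine-type n1-standard-4 \
  --local-ssd-count 1 \
  --num-nodes 1
```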

### Customize TiDB Parameters

Currently, only a few parameters are exposed for customization. However, you can modify `templates/tidb-cluster-values.yaml.tpl` before deploying. If you modify it after the cluster is created and then run `terraform apply`, the change will not take effect unless the affected pods are manually deleted.
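
For example, to force a pod to pick up changed values after re-running `terraform apply`, it can be deleted and its controller will recreate it with the updated configuration; the pod name below is a placeholder:

```bash
# List the pods, then delete the one that should pick up the new configuration (example pod name).
kubectl --kubeconfig credentials/kubeconfig_<cluster_name> get po -n tidb
kubectl --kubeconfig credentials/kubeconfig_<cluster_name> delete po -n tidb tidb-cluster-tidb-0
```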

### Customizing node pools

The cluster is created as a regional cluster, as opposed to a zonal one. This means that GKE replicates the node pools across each availability zone in the region. This is desirable for high availability; however, for monitoring services such as Grafana, the extra replicas are potentially unnecessary. If desired, nodes can be removed manually via `gcloud`.

> *NOTE*: GKE node pools are managed instance groups, so a node deleted with `gcloud compute instances delete` will be automatically recreated and added back to the cluster.

Suppose we wish to delete a node from the monitor pool. We can run:
```bash
$ gcloud compute instance-groups managed list | grep monitor
```
The result will look something like this:
```bash
gke-my-cluster-monitor-pool-08578e18-grp us-west1-b zone gke-my-cluster-monitor-pool-08578e18 0 0 gke-my-cluster-monitor-pool-08578e18 no
gke-my-cluster-monitor-pool-7e31100f-grp us-west1-c zone gke-my-cluster-monitor-pool-7e31100f 1 1 gke-my-cluster-monitor-pool-7e31100f no
gke-my-cluster-monitor-pool-78a961e5-grp us-west1-a zone gke-my-cluster-monitor-pool-78a961e5 1 1 gke-my-cluster-monitor-pool-78a961e5 no
```
The first column is the name of the managed instance group, and the second column is the availability zone it was created in. We also need the name of an instance in that group, which we can get as follows:
```bash
$ gcloud compute instance-groups managed list-instances gke-my-cluster-monitor-pool-08578e18-grp --zone us-west1-b
NAME ZONE STATUS ACTION INSTANCE_TEMPLATE VERSION_NAME LAST_ERROR
gke-my-cluster-monitor-pool-08578e18-c7vd us-west1-b RUNNING NONE gke-my-cluster-monitor-pool-08578e18
```
Now we can delete the instance:
```bash
$ gcloud compute instance-groups managed delete-instances gke-my-cluster-monitor-pool-08578e18-grp --instances=gke-my-cluster-monitor-pool-08578e18-c7vd --zone us-west1-b
```
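
To confirm the node has been removed from the Kubernetes cluster as well, list the remaining monitor-pool nodes (a quick check, using the same kubeconfig as above):

```bash
# The deleted instance should no longer appear among the nodes.
kubectl --kubeconfig credentials/kubeconfig_<cluster_name> get nodes | grep monitor-pool
```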
1 change: 1 addition & 0 deletions deploy/gcp/charts/tidb-cluster
1 change: 1 addition & 0 deletions deploy/gcp/charts/tidb-operator
31 changes: 31 additions & 0 deletions deploy/gcp/data.tf
@@ -0,0 +1,31 @@
data "template_file" "tidb_cluster_values" {
template = "${file("${path.module}/templates/tidb-cluster-values.yaml.tpl")}"

vars {
cluster_version = "${var.tidb_version}"
pd_replicas = "${var.pd_replica_count}"
tikv_replicas = "${var.tikv_replica_count}"
tidb_replicas = "${var.tidb_replica_count}"
operator_version = "${var.tidb_operator_version}"
}
}

data "external" "tidb_ilb_ip" {
depends_on = ["null_resource.deploy-tidb-cluster"]
program = ["bash", "-c", "kubectl --kubeconfig ${local.kubeconfig} get svc -n tidb tidb-cluster-tidb -o json | jq '.status.loadBalancer.ingress[0]'"]
}

data "external" "monitor_ilb_ip" {
depends_on = ["null_resource.deploy-tidb-cluster"]
program = ["bash", "-c", "kubectl --kubeconfig ${local.kubeconfig} get svc -n tidb tidb-cluster-grafana -o json | jq '.status.loadBalancer.ingress[0]'"]
}

data "external" "tidb_port" {
depends_on = ["null_resource.deploy-tidb-cluster"]
program = ["bash", "-c", "kubectl --kubeconfig ${local.kubeconfig} get svc -n tidb tidb-cluster-tidb -o json | jq '.spec.ports | .[] | select( .name == \"mysql-client\") | {port: .port|tostring}'"]
}

data "external" "monitor_port" {
depends_on = ["null_resource.deploy-tidb-cluster"]
program = ["bash", "-c", "kubectl --kubeconfig ${local.kubeconfig} get svc -n tidb tidb-cluster-grafana -o json | jq '.spec.ports | .[] | select( .name == \"grafana\") | {port: .port|tostring}'"]
}