GKE terraform #585

Merged · 11 commits · Jun 18, 2019
2 changes: 2 additions & 0 deletions deploy/gcp/.gitignore
@@ -2,3 +2,5 @@
*.tfstate*
credentials
rendered
terraform-key.json
credentials.auto.tfvars
70 changes: 41 additions & 29 deletions deploy/gcp/README.md
@@ -34,30 +34,44 @@ gcloud services enable container.googleapis.com

### Configure Terraform

The terraform script expects three environment variables. You can let Terraform prompt you for them, or `export` them in the `~/.bash_profile` file ahead of time. The required environment variables are:
The terraform script expects three variables to be set.

* `TF_VAR_GCP_CREDENTIALS_PATH`: Path to a valid GCP credentials file.
- It is recommended to create a new service account to be used by Terraform. See [this page](https://cloud.google.com/iam/docs/creating-managing-service-accounts) to create a service account and grant `Project Editor` role to it.
- See [this page](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) to create service account keys, and choose `JSON` key type during creation. The downloaded `JSON` file that contains the private key is the credentials file you need.
* `TF_VAR_GCP_REGION`: The region to create the resources in, for example: `us-west1`.
* `TF_VAR_GCP_PROJECT`: The name of the GCP project.
* `TF_VAR_GCP_CREDENTIALS_PATH`: Path to a valid GCP credentials file.
- It is recommended to create a new service account to be used by Terraform, as shown in the example below.

Below we will set these variables. The region and project go into `terraform.tfvars`, and `create-service-account.sh` writes the credentials path into `credentials.auto.tfvars`:

```bash
# Replace us-west1 below with your GCP region.
echo 'GCP_REGION = "us-west1"' >> terraform.tfvars
# First make sure you are connected to the correct project: gcloud config set project $PROJECT
echo "GCP_PROJECT = \"$(gcloud config get-value project)\"" >> terraform.tfvars
# Create a service account for Terraform with restricted permissions and set the credentials path
./create-service-account.sh
```
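If the script succeeds, the variable files should look roughly like this (a sketch; the project name and absolute path below are placeholders for your own values):

```bash
cat terraform.tfvars
# GCP_REGION = "us-west1"
# GCP_PROJECT = "my-project"
cat credentials.auto.tfvars
# GCP_CREDENTIALS_PATH = "/path/to/tidb-operator/deploy/gcp/credentials/terraform-key.json"
```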

## Deploy

> *Note*: The service account must have sufficient permissions to create resources in the project. The `Project Editor` primitive role will accomplish this.
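If you grant roles manually instead of using `create-service-account.sh`, something like the following should work (the project ID and service account email are placeholders):

```bash
# Grant the Editor primitive role to the Terraform service account.
gcloud projects add-iam-policy-binding my-project \
  --member "serviceAccount:terraform@my-project.iam.gserviceaccount.com" \
  --role roles/editor
```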

To set the three environment variables, for example, you can enter in your terminal:
Now that you have configured gcloud access, make sure you have a copy of the repo:

```bash
# Replace the values with the path to the JSON file you have downloaded, the GCP region and your GCP project name.
export TF_VAR_GCP_CREDENTIALS_PATH="/Path/to/my-project.json"
export TF_VAR_GCP_REGION="us-west1"
export TF_VAR_GCP_PROJECT="my-project"
git clone --depth=1 https://github.com/pingcap/tidb-operator
cd tidb-operator/deploy/gcp
```

You can also append them in your `~/.bash_profile` so they will be exported automatically next time.
You need to decide on instance types. If you just want to get a feel for a TiDB deployment and lower your cost, you can use the small settings:

## Deploy
```bash
cat small.tfvars >> terraform.tfvars
```

If you want to benchmark a production deployment, run:

The default setup creates a new VPC, two subnetworks, and an f1-micro instance as a bastion machine. The GKE cluster is created with the following instance types as worker nodes:
```bash
cat prod.tfvars >> terraform.tfvars
```

Terraform creates a new VPC, two subnetworks, and an f1-micro instance as a bastion machine.
The production setup uses the following instance types:

* 3 n1-standard-4 instances for PD
* 3 n1-highmem-8 instances for TiKV
@@ -66,13 +80,11 @@ The default setup creates a new VPC, two subnetworks, and an f1-micro instance a

> *Note*: The number of nodes created depends on how many availability zones there are in the chosen region. Most have 3 zones, but us-central1 has 4. See [Regions and Zones](https://cloud.google.com/compute/docs/regions-zones/) for more information and see the [Customize](#customize) section on how to customize node pools in a regional cluster.
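To see how many zones your chosen region actually has before deploying, one option is (the region name is just an example):

```bash
# Print the zones of the region; the node count per pool scales with the number of zones.
gcloud compute regions describe us-west1 --format="value(zones)"
```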

The default setup, as listed above, requires at least 91 CPUs which exceed the default CPU quota of a GCP project. To increase your project's quota, follow the instructions [here](https://cloud.google.com/compute/quotas). You need more CPUs if you need to scale out.
The production setup, as listed above, requires at least 91 CPUs, which exceeds the default CPU quota of a GCP project. To increase your project's quota, follow the instructions [here](https://cloud.google.com/compute/quotas). You will need more CPUs if you later scale out.
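You can check the current CPU quota and usage for your region with `gcloud`; the grep pattern below assumes the default YAML-style output:

```bash
# Show the CPUS quota entry (limit, metric, usage) for the region.
gcloud compute regions describe us-west1 | grep -B 1 -A 1 "metric: CPUS"
```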

Now that you have configured everything needed, you can launch the script to deploy the TiDB cluster:
Once you have chosen your instance types, you can install your TiDB cluster with:

```bash
git clone --depth=1 https://github.com/pingcap/tidb-operator
cd tidb-operator/deploy/gcp
terraform init
terraform apply
```
@@ -86,11 +98,11 @@ Apply complete! Resources: 17 added, 0 changed, 0 destroyed.

Outputs:

cluster_id = my-cluster
cluster_name = my-cluster
cluster_id = tidb
cluster_name = tidb
how_to_connect_to_mysql_from_bastion = mysql -h 172.31.252.20 -P 4000 -u root
how_to_ssh_to_bastion = gcloud compute ssh bastion --zone us-west1-b
kubeconfig_file = ./credentials/kubeconfig_my-cluster
kubeconfig_file = ./credentials/kubeconfig_tidb
monitor_ilb_ip = 35.227.134.146
monitor_port = 3000
region = us-west1
@@ -113,7 +125,7 @@ mysql -h <tidb_ilb_ip> -P 4000 -u root

## Interact with the cluster

You can interact with the cluster using `kubectl` and `helm` with the kubeconfig file `credentials/kubeconfig_<cluster_name>` as follows. The default `cluster_name` is `my-cluster`, which can be changed in `variables.tf`.
You can interact with the cluster using `kubectl` and `helm` with the kubeconfig file `credentials/kubeconfig_<cluster_name>` as follows. The default `cluster_name` is `tidb`, which can be changed in `variables.tf`.

```bash
# By specifying --kubeconfig argument.
@@ -178,7 +190,7 @@ You can change default values in `variables.tf` (such as the cluster name and th

### Customize GCP resources

GCP allows attaching a local SSD to any instance type that is `n1-standard-1` or greater. This allows for good customizability.
GCP allows attaching a local SSD to any instance type that is `n1-standard-1` or greater.
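If you want different machine types from those in `small.tfvars`/`prod.tfvars`, you can override the corresponding variables yourself; for example (the machine type below is only an illustration):

```bash
# Use a larger machine type for TiKV nodes, then re-apply.
echo 'tikv_instance_type = "n1-highmem-16"' >> terraform.tfvars
terraform apply
```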

### Customize TiDB parameters

@@ -199,9 +211,9 @@ gcloud compute instance-groups managed list | grep monitor
And the result will be something like this:

```bash
gke-my-cluster-monitor-pool-08578e18-grp us-west1-b zone gke-my-cluster-monitor-pool-08578e18 0 0 gke-my-cluster-monitor-pool-08578e18 no
gke-my-cluster-monitor-pool-7e31100f-grp us-west1-c zone gke-my-cluster-monitor-pool-7e31100f 1 1 gke-my-cluster-monitor-pool-7e31100f no
gke-my-cluster-monitor-pool-78a961e5-grp us-west1-a zone gke-my-cluster-monitor-pool-78a961e5 1 1 gke-my-cluster-monitor-pool-78a961e5 no
gke-tidb-monitor-pool-08578e18-grp us-west1-b zone gke-tidb-monitor-pool-08578e18 0 0 gke-tidb-monitor-pool-08578e18 no
gke-tidb-monitor-pool-7e31100f-grp us-west1-c zone gke-tidb-monitor-pool-7e31100f 1 1 gke-tidb-monitor-pool-7e31100f no
gke-tidb-monitor-pool-78a961e5-grp us-west1-a zone gke-tidb-monitor-pool-78a961e5 1 1 gke-tidb-monitor-pool-78a961e5 no
```

The first column is the name of the managed instance group, and the second column is the zone in which it was created. You also need the name of the instance in that group, and you can get it by running:
@@ -213,16 +225,16 @@ gcloud compute instance-groups managed list-instances <the-name-of-the-managed-i
For example:

```bash
$ gcloud compute instance-groups managed list-instances gke-my-cluster-monitor-pool-08578e18-grp --zone us-west1-b
$ gcloud compute instance-groups managed list-instances gke-tidb-monitor-pool-08578e18-grp --zone us-west1-b

NAME ZONE STATUS ACTION INSTANCE_TEMPLATE VERSION_NAME LAST_ERROR
gke-my-cluster-monitor-pool-08578e18-c7vd us-west1-b RUNNING NONE gke-my-cluster-monitor-pool-08578e18
gke-tidb-monitor-pool-08578e18-c7vd us-west1-b RUNNING NONE gke-tidb-monitor-pool-08578e18
```

Now you can delete the instance by specifying the name of the managed instance group and the name of the instance, for example:

```bash
gcloud compute instance-groups managed delete-instances gke-my-cluster-monitor-pool-08578e18-grp --instances=gke-my-cluster-monitor-pool-08578e18-c7vd --zone us-west1-b
gcloud compute instance-groups managed delete-instances gke-tidb-monitor-pool-08578e18-grp --instances=gke-tidb-monitor-pool-08578e18-c7vd --zone us-west1-b
```

## Destroy
@@ -235,7 +247,7 @@ terraform destroy

You have to manually delete disks in the Google Cloud Console, or with `gcloud` after running `terraform destroy` if you do not need the data anymore.
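For example (the disk name and zone below are placeholders):

```bash
# List remaining persistent disks, then delete the ones you no longer need.
gcloud compute disks list
gcloud compute disks delete <disk-name> --zone <zone>
```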

> *Note*: When `terraform destroy` is running, an error with the following message might occur: `Error reading Container Cluster "my-cluster": Cluster "my-cluster" has status "RECONCILING" with message""`. This happens when GCP is upgrading the kubernetes master node, which it does automatically at times. While this is happening, it is not possible to delete the cluster. When it is done, run `terraform destroy` again.
> *Note*: When `terraform destroy` is running, an error with the following message might occur: `Error reading Container Cluster "tidb": Cluster "tidb" has status "RECONCILING" with message""`. This happens when GCP is upgrading the Kubernetes master node, which it does automatically at times. While this is happening, it is not possible to delete the cluster. When it is done, run `terraform destroy` again.


## More information
27 changes: 27 additions & 0 deletions deploy/gcp/create-service-account.sh
@@ -0,0 +1,27 @@
#!/usr/bin/env bash

set -euo pipefail
cd "$(dirname "$0")"
PROJECT="${TF_VAR_GCP_PROJECT:-$(cat terraform.tfvars | awk -F '=' '/GCP_PROJECT/ {print $2}' | cut -d '"' -f 2)}"
echo "$PROJECT"

cred_file=credentials.auto.tfvars
if test -f "$cred_file" ; then
if cat "$cred_file" | awk -F'=' '/GCP_CREDENTIALS/ {print $2}' >/dev/null ; then
echo "GCP_CREDENTAILS_PATH already set in $cred_file"
exit 1
fi
fi

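# Create a dedicated service account for Terraform and grant it only the roles needed to manage the GKE cluster, networking, and compute instances.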
gcloud iam service-accounts create --display-name terraform terraform
email="terraform@${PROJECT}.iam.gserviceaccount.com"
gcloud projects add-iam-policy-binding "$PROJECT" --member "serviceAccount:$email" --role roles/container.clusterAdmin
gcloud projects add-iam-policy-binding "$PROJECT" --member "serviceAccount:$email" --role roles/compute.networkAdmin
gcloud projects add-iam-policy-binding "$PROJECT" --member "serviceAccount:$email" --role roles/compute.viewer
gcloud projects add-iam-policy-binding "$PROJECT" --member "serviceAccount:$email" --role roles/compute.securityAdmin
gcloud projects add-iam-policy-binding "$PROJECT" --member "serviceAccount:$email" --role roles/iam.serviceAccountUser
gcloud projects add-iam-policy-binding "$PROJECT" --member "serviceAccount:$email" --role roles/compute.instanceAdmin.v1

mkdir -p credentials
gcloud iam service-accounts keys create credentials/terraform-key.json --iam-account "$email"
echo "GCP_CREDENTIALS_PATH = \"$(pwd)/credentials/terraform-key.json\"" > "$cred_file"
6 changes: 2 additions & 4 deletions deploy/gcp/data.tf
@@ -7,13 +7,11 @@ data "template_file" "tidb_cluster_values" {
tikv_replicas = var.tikv_replica_count
tidb_replicas = var.tidb_replica_count
operator_version = var.tidb_operator_version
tidb_operator_registry = var.tidb_operator_registry
}
}

data external "available_zones_in_region" {
depends_on = [null_resource.prepare-dir]
program = ["bash", "-c", "gcloud compute regions describe ${var.GCP_REGION} --format=json | jq '{zone: .zones|.[0]|match(\"[^/]*$\"; \"g\")|.string}'"]
}
data "google_compute_zones" "available" { }

data "external" "tidb_ilb_ip" {
depends_on = [null_resource.deploy-tidb-cluster]
85 changes: 60 additions & 25 deletions deploy/gcp/main.tf
@@ -118,6 +118,11 @@ resource "google_container_node_pool" "pd_pool" {
name = "pd-pool"
initial_node_count = var.pd_count

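# Auto repair and auto upgrade are disabled so that GKE does not recreate or drain these nodes outside of Terraform's control.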
management {
auto_repair = false
auto_upgrade = false
}

node_config {
machine_type = var.pd_instance_type
local_ssd_count = 0
@@ -146,6 +151,11 @@ resource "google_container_node_pool" "tikv_pool" {
name = "tikv-pool"
initial_node_count = var.tikv_count

management {
auto_repair = false
auto_upgrade = false
}

node_config {
machine_type = var.tikv_instance_type
image_type = "UBUNTU"
@@ -177,6 +187,11 @@ resource "google_container_node_pool" "tidb_pool" {
name = "tidb-pool"
initial_node_count = var.tidb_count

management {
auto_repair = false
auto_upgrade = false
}

node_config {
machine_type = var.tidb_instance_type

@@ -203,6 +218,11 @@ resource "google_container_node_pool" "monitor_pool" {
name = "monitor-pool"
initial_node_count = var.monitor_count

management {
auto_repair = false
auto_upgrade = false
}

node_config {
machine_type = var.monitor_instance_type
tags = ["monitor"]
@@ -254,7 +274,7 @@ resource "google_compute_firewall" "allow_ssh_from_bastion" {

resource "google_compute_instance" "bastion" {
project = var.GCP_PROJECT
zone = data.external.available_zones_in_region.result["zone"]
zone = data.google_compute_zones.available.names[0]
machine_type = var.bastion_instance_type
name = "bastion"

@@ -308,62 +328,77 @@ resource "null_resource" "setup-env" {
depends_on = [
google_container_cluster.cluster,
null_resource.get-credentials,
var.tidb_operator_registry,
var.tidb_operator_version,
]

provisioner "local-exec" {
working_dir = path.module
interpreter = ["bash", "-c"]

command = <<EOS
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account)
kubectl create serviceaccount --namespace kube-system tiller
set -euo pipefail

if ! kubectl get clusterrolebinding cluster-admin-binding 2>/dev/null; then
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account)
fi

if ! kubectl get serviceaccount -n kube-system tiller 2>/dev/null ; then
kubectl create serviceaccount --namespace kube-system tiller
fi

kubectl apply -f manifests/crd.yaml
kubectl apply -k manifests/local-ssd
kubectl apply -f manifests/gke/persistent-disk.yaml
kubectl apply -f manifests/tiller-rbac.yaml

helm init --service-account tiller --upgrade --wait
until helm ls; do
echo "Wait until tiller is ready"
done
helm install --namespace tidb-admin --name tidb-operator ${path.module}/charts/tidb-operator
helm upgrade --install tidb-operator --namespace tidb-admin ${path.module}/charts/tidb-operator --set operatorImage=${var.tidb_operator_registry}/tidb-operator:${var.tidb_operator_version}
EOS


environment = {
KUBECONFIG = local.kubeconfig
}
}
environment = {
KUBECONFIG = local.kubeconfig
}
}
}

resource "null_resource" "deploy-tidb-cluster" {
depends_on = [
null_resource.setup-env,
local_file.tidb-cluster-values,
google_container_node_pool.pd_pool,
google_container_node_pool.tikv_pool,
google_container_node_pool.tidb_pool,
]

triggers = {
values = data.template_file.tidb_cluster_values.rendered
}
depends_on = [
null_resource.setup-env,
local_file.tidb-cluster-values,
google_container_node_pool.pd_pool,
google_container_node_pool.tikv_pool,
google_container_node_pool.tidb_pool,
]

provisioner "local-exec" {
triggers = {
values = data.template_file.tidb_cluster_values.rendered
}

provisioner "local-exec" {
interpreter = ["bash", "-c"]
command = <<EOS
# Review note: a maintainer suggested using EOT instead of EOS so that Terraform 0.12 formats this heredoc correctly; the author replied that any word will work here.
set -euo pipefail

helm upgrade --install tidb-cluster ${path.module}/charts/tidb-cluster --namespace=tidb -f ${local.tidb_cluster_values_path}
until kubectl get po -n tidb -lapp.kubernetes.io/component=tidb | grep Running; do
echo "Wait for TiDB pod running"
sleep 5
done

until kubectl get svc -n tidb tidb-cluster-tidb -o json | jq '.status.loadBalancer.ingress[0]' | grep ip; do
echo "Wait for TiDB internal loadbalancer IP"
sleep 5
done
EOS


environment = {
KUBECONFIG = local.kubeconfig
}
}
environment = {
KUBECONFIG = local.kubeconfig
}
}
}

3 changes: 3 additions & 0 deletions deploy/gcp/prod.tfvars
@@ -0,0 +1,3 @@
pd_instance_type = "n1-standard-4"
tikv_instance_type = "n1-highmem-8"
tidb_instance_type = "n1-standard-16"
3 changes: 3 additions & 0 deletions deploy/gcp/small.tfvars
@@ -0,0 +1,3 @@
pd_instance_type = "n1-standard-2"
tikv_instance_type = "n1-highmem-4"
tidb_instance_type = "n1-standard-8"
2 changes: 1 addition & 1 deletion deploy/gcp/templates/tidb-cluster-values.yaml.tpl
@@ -30,7 +30,7 @@ services:
type: ClusterIP

discovery:
image: pingcap/tidb-operator:${operator_version}
image: ${tidb_operator_registry}/tidb-operator:${operator_version}
imagePullPolicy: IfNotPresent
resources:
limits: