We use a similar approach to Aerospike: watch for GCP maintenance (live migration) events on the TiDB/TiKV/PD nodes and take the appropriate action for each role (a sketch of the watch loop follows the list):
- TiDB: Take the TiDB instance offline by cordoning its node and deleting the TiDB pod (the node pool hosting TiDB instances MUST have autoscaling enabled; the cordoned node is expected to be reclaimed by the autoscaler).
- TiKV: Evict the Region leaders from the TiKV store during maintenance.
- PD: Resign leadership if the current PD instance is the PD leader.
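The watcher is conceptually a blocking poll on the GCE metadata server's maintenance-event key. The sketch below is not the actual /main.py, only an illustration of the mechanism; the handle_maintenance dispatch is a placeholder:

```python
# Minimal sketch of the maintenance watcher (illustrative, not the real /main.py).
import os
import requests

# GCE metadata key that flips to MIGRATE_ON_HOST_MAINTENANCE when a live migration starts.
METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/maintenance-event")
HEADERS = {"Metadata-Flavor": "Google"}


def handle_maintenance(role):
    # Placeholder: dispatch to the per-role action described above
    # (tidb: cordon node + delete pod, tikv: evict leaders, pd: resign leader).
    print(f"maintenance event received, role={role}")


def watch():
    last_etag = "NONE"
    while True:
        try:
            # wait_for_change=true makes the request hang until the value changes.
            resp = requests.get(
                METADATA_URL,
                params={"wait_for_change": "true", "last_etag": last_etag},
                headers=HEADERS,
                timeout=120,
            )
        except requests.exceptions.Timeout:
            continue  # no change within the client timeout; keep waiting
        resp.raise_for_status()
        last_etag = resp.headers.get("etag", last_etag)
        if resp.text.strip() == "MIGRATE_ON_HOST_MAINTENANCE":
            handle_maintenance(os.environ.get("ROLE", ""))


if __name__ == "__main__":
    watch()
```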
An additional sidecar container is added to each TiDB/TiKV/PD pod to run the maintenance-watching script.
Use the public sidecar image pingcap/tidb-gcp-live-migration:${TIDB_VERSION} (e.g. pingcap/tidb-gcp-live-migration:v7.1.0), or build the image yourself:

```bash
TIDB_VERSION=v7.1.0 IMAGE=${YOUR_IMAGE}/tidb-gcp-live-migration make image-release
```
If TLS is enabled between the cluster components, configure the sidecars as follows.

For TiDB, first create the RBAC resources the sidecar needs:

```bash
# replace ${SERVICEACCOUNT}, ${NAMESPACE} and ${CLUSTER_NAME} in rbac.yaml, then run:
kubectl apply -f rbac.yaml
```

Then add the content below to spec.tidb (replace ${CLUSTER_NAME}):
```yaml
additionalContainers:
  - command:
      - python3
      - /main.py
    env:
      - name: TLS
        value: "true"
      - name: CLUSTER_NAME
        value: ${CLUSTER_NAME}
      - name: ROLE
        value: tidb
      - name: NODENAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
    image: pingcap/tidb-gcp-live-migration:v7.1.0
    name: gcp-maintenance-script
```
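The NODENAME environment variable (injected via the downward API) tells the sidecar which node to cordon, and rbac.yaml presumably grants the service account permission to patch nodes and delete pods. A rough sketch of the tidb-role action, using the kubernetes Python client (not the actual script; pod and namespace discovery are assumptions):

```python
# Sketch of the tidb-role action: cordon the node, then delete this TiDB pod
# so it is rescheduled onto another node (assumes in-cluster RBAC allows both).
import os
from kubernetes import client, config


def cordon_and_evict():
    config.load_incluster_config()
    core = client.CoreV1Api()

    node_name = os.environ["NODENAME"]   # injected via the downward API (spec.nodeName)
    pod_name = os.environ["HOSTNAME"]    # the pod name is the default hostname
    namespace = open(
        "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
    ).read().strip()

    # Cordon: mark the node unschedulable; the autoscaler is expected to reclaim it later.
    core.patch_node(node_name, {"spec": {"unschedulable": True}})

    # Delete the TiDB pod; the controller recreates it on a schedulable node.
    core.delete_namespaced_pod(pod_name, namespace)


if __name__ == "__main__":
    cordon_and_evict()
```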
For TiKV, add the content below to spec.tikv (replace ${CLUSTER_NAME}):
```yaml
additionalVolumes:
  - name: pd-tls
    secret:
      secretName: ${CLUSTER_NAME}-pd-cluster-secret
additionalContainers:
  - command:
      - python3
      - /main.py
    env:
      - name: TLS
        value: "true"
      - name: CLUSTER_NAME
        value: ${CLUSTER_NAME}
      - name: ROLE
        value: tikv
    image: pingcap/tidb-gcp-live-migration:v7.1.0
    name: gcp-maintenance-script
    volumeMounts:
      - name: pd-tls
        mountPath: /var/lib/pd-tls
      - name: tikv-tls
        mountPath: /var/lib/tikv-tls
```
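The mounted secrets give the sidecar client certificates for calling the PD API over TLS. Conceptually, the tikv-role action adds an evict-leader scheduler for this store, roughly as in the sketch below (the PD service name, certificate file names, and store lookup are illustrative assumptions, not the actual script):

```python
# Sketch of the tikv-role action: evict Region leaders from this store by adding
# an evict-leader scheduler through the PD API.
import os
import socket
import requests

CLUSTER = os.environ["CLUSTER_NAME"]
TLS = os.environ.get("TLS", "false").lower() == "true"

if TLS:
    PD = f"https://{CLUSTER}-pd:2379"
    # Client cert/key and CA from the mounted secrets (file names assumed).
    KW = dict(cert=("/var/lib/tikv-tls/tls.crt", "/var/lib/tikv-tls/tls.key"),
              verify="/var/lib/pd-tls/ca.crt")
else:
    PD = f"http://{CLUSTER}-pd:2379"
    KW = {}


def evict_leaders():
    # Find this TiKV instance's store id: the store address starts with the pod name.
    stores = requests.get(f"{PD}/pd/api/v1/stores", **KW).json()["stores"]
    pod_name = socket.gethostname()
    store_id = next(s["store"]["id"] for s in stores
                    if s["store"]["address"].split(".")[0] == pod_name)

    # Adding an evict-leader scheduler moves all Region leaders off this store.
    requests.post(f"{PD}/pd/api/v1/schedulers",
                  json={"name": "evict-leader-scheduler", "store_id": store_id},
                  **KW)


if __name__ == "__main__":
    evict_leaders()
```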
For PD, add the content below to spec.pd (replace ${CLUSTER_NAME}):
```yaml
additionalContainers:
  - command:
      - python3
      - /main.py
    env:
      - name: TLS
        value: "true"
      - name: CLUSTER_NAME
        value: ${CLUSTER_NAME}
      - name: ROLE
        value: PD
    image: pingcap/tidb-gcp-live-migration:v7.1.0
    name: gcp-maintenance-script
    volumeMounts:
      - name: pd-tls
        mountPath: /var/lib/pd-tls
```
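Conceptually, the pd-role action checks whether the local member currently holds PD leadership and, if so, asks it to resign so another member takes over before the migration. A sketch against the PD HTTP API (the local endpoint, certificate file names, and name matching are illustrative assumptions):

```python
# Sketch of the pd-role action: resign PD leadership if this pod is the current leader.
import os
import socket
import requests

TLS = os.environ.get("TLS", "false").lower() == "true"
SCHEME = "https" if TLS else "http"
# The sidecar shares the pod network namespace with PD, so it can use the local port.
PD = f"{SCHEME}://127.0.0.1:2379"
KW = (dict(cert=("/var/lib/pd-tls/tls.crt", "/var/lib/pd-tls/tls.key"),
           verify="/var/lib/pd-tls/ca.crt")
      if TLS else {})


def resign_if_leader():
    leader = requests.get(f"{PD}/pd/api/v1/leader", **KW).json()
    # PD member names follow the pod name, e.g. ${CLUSTER_NAME}-pd-0.
    if leader.get("name") == socket.gethostname():
        # Step down; another PD member is elected leader.
        requests.post(f"{PD}/pd/api/v1/leader/resign", **KW)


if __name__ == "__main__":
    resign_if_leader()
```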
If TLS is not enabled between the cluster components, use the same layout with TLS set to "false" and without the TLS volumes.

For TiDB, first create the RBAC resources the sidecar needs:

```bash
# replace ${SERVICEACCOUNT}, ${NAMESPACE} and ${CLUSTER_NAME} in rbac.yaml, then run:
kubectl apply -f rbac.yaml
```

Then add the content below to spec.tidb (replace ${CLUSTER_NAME}):
```yaml
additionalContainers:
  - command:
      - python3
      - /main.py
    env:
      - name: TLS
        value: "false"
      - name: CLUSTER_NAME
        value: ${CLUSTER_NAME}
      - name: ROLE
        value: tidb
      - name: NODENAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
    image: pingcap/tidb-gcp-live-migration:v7.1.0
    name: gcp-maintenance-script
```
For TiKV, add the content below to spec.tikv (replace ${CLUSTER_NAME}):
```yaml
additionalContainers:
  - command:
      - python3
      - /main.py
    env:
      - name: TLS
        value: "false"
      - name: CLUSTER_NAME
        value: ${CLUSTER_NAME}
      - name: ROLE
        value: tikv
    image: pingcap/tidb-gcp-live-migration:v7.1.0
    name: gcp-maintenance-script
```
For PD, add the content below to spec.pd (replace ${CLUSTER_NAME}):
```yaml
additionalContainers:
  - command:
      - python3
      - /main.py
    env:
      - name: TLS
        value: "false"
      - name: CLUSTER_NAME
        value: ${CLUSTER_NAME}
      - name: ROLE
        value: PD
    image: pingcap/tidb-gcp-live-migration:v7.1.0
    name: gcp-maintenance-script
```
After the cluster is deployed, increase the PD leader-schedule limit through SQL so that leader transfers triggered during maintenance complete quickly:

```sql
SET CONFIG pd `leader-schedule-limit` = 100;
```
To test the setup, see https://cloud.google.com/compute/docs/instances/simulating-host-maintenance for how to simulate a host maintenance event, and use a script to simulate the effect of a live migration (a sketch follows).
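For example, a small wrapper around the gcloud command from the page above can trigger a simulated live migration on the GCE instance backing a GKE node (instance name and zone are placeholders you supply):

```python
# Sketch: simulate a host-maintenance (live migration) event on a GCE instance
# by wrapping the gcloud command documented at the link above.
import subprocess
import sys


def simulate_maintenance(instance, zone):
    subprocess.run(
        ["gcloud", "compute", "instances", "simulate-maintenance-event",
         instance, "--zone", zone],
        check=True,
    )


if __name__ == "__main__":
    # Usage: python3 simulate.py <gke-node-instance-name> <zone>
    simulate_maintenance(sys.argv[1], sys.argv[2])
```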