en: update docs by comment
Signed-off-by: WangLe1321 <wangle1321@163.com>
WangLe1321 committed Aug 23, 2023
1 parent f304104 commit 4462641
Showing 3 changed files with 19 additions and 14 deletions.
16 changes: 8 additions & 8 deletions en/backup-by-ebs-snapshot-across-multiple-kubernetes.md
@@ -9,14 +9,6 @@ This document describes how to back up the data of a TiDB cluster deployed acros

The backup method described in this document is implemented based on CustomResourceDefinition (CRD) in [BR Federation](br-federation-architecture.md#br-federation-architecture-and-processes) and TiDB Operator. [BR](https://docs.pingcap.com/tidb/stable/backup-and-restore-overview) (Backup & Restore) is a command-line tool for distributed backup and recovery of the TiDB cluster data. For the underlying implementation, BR gets the backup data of the TiDB cluster, and then sends the data to the AWS storage.

> **Note:**
>
> > storage blocks on volumes that were created from snapshots must be initialized (pulled down from Amazon S3 and written to the volume) before you can access the block. This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed. Volume performance is achieved after all blocks have been downloaded and written to the volume.
>
> As described in the AWS documentation above, an EBS volume restored from a snapshot might have high latency before it is initialized, which can cause a significant performance hit to the restored TiDB cluster. For details, see [create a volume from a snapshot](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-volume.html#ebs-create-volume-from-snapshot).
>
> To initialize the restored volumes more efficiently, **separate the WAL and Raft log into a dedicated small volume apart from the TiKV data volume**. You can then improve the write performance of the restored TiDB cluster by fully initializing the small WAL and Raft log volume.
## Usage scenarios

If you have the following requirements when backing up TiDB cluster data, you can use TiDB Operator to back up the data using volume snapshots and metadata to Amazon S3:
@@ -26,6 +18,14 @@ If you have the following requirements when backing up TiDB cluster data, you ca

If you have any other requirements, refer to [Backup and Restore Overview](backup-restore-overview.md) and select an appropriate backup method.

## Prerequisites

> storage blocks on volumes that were created from snapshots must be initialized (pulled down from Amazon S3 and written to the volume) before you can access the block. This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed. Volume performance is achieved after all blocks have been downloaded and written to the volume.

As described in the AWS documentation above, an EBS volume restored from a snapshot might have high latency before it is initialized, which can cause a significant performance hit to the restored TiDB cluster. For details, see [create a volume from a snapshot](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-volume.html#ebs-create-volume-from-snapshot).

To initialize the restored volumes more efficiently, **separate the WAL and Raft log into a dedicated small volume apart from the TiKV data volume**. You can then improve the write performance of the restored TiDB cluster by fully initializing the small WAL and Raft log volume.
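
The following is a minimal sketch of how such a layout might be expressed in the `TidbCluster` spec, assuming you use the TiKV `storageVolumes` field to add a small dedicated volume and point the Raft Engine directory at it; the volume name, size, and paths are illustrative only and not prescribed by this document:

```yaml
spec:
  tikv:
    requests:
      # Main TiKV data volume (KV data).
      storage: "500Gi"
    # Hypothetical small volume dedicated to the WAL / Raft log.
    storageVolumes:
    - name: raft-log
      storageSize: "20Gi"
      mountPath: /var/lib/raft
    config: |
      [raft-engine]
      # Keep the Raft log on the dedicated small volume so that it can be
      # fully initialized quickly after a snapshot restore.
      dir = "/var/lib/raft/raft-engine"
```

With a layout like this, only the small Raft log volume needs to be fully initialized to recover write latency after a restore.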

## Limitations

- Snapshot backup is applicable to TiDB Operator v1.5.1 or later versions, and TiDB v6.5.4 or later versions.
12 changes: 6 additions & 6 deletions en/br-federation-architecture.md
@@ -25,11 +25,11 @@ BR Federation coordinates `Backup` and `Restore` Custom Resources (CRs) in the d

The backup process in the data plane consists of three phases:

1. **Phase One:** Request PD to pause region scheduling and Garbage Collection (GC). As each TiKV instance might take snapshots at different times, pausing scheduling and GC can avoid data inconsistencies between TiKV instances during snapshot taking. Since the TiDB components are interconnected across multiple Kubernetes clusters, executing this operation in one Kubernetes cluster affects the entire TiDB cluster.
1. **Phase One:** TiDB Operator schedules a backup pod to request PD to pause region scheduling and Garbage Collection (GC). As each TiKV instance might take snapshots at different times, pausing scheduling and GC can avoid data inconsistencies between TiKV instances during snapshot taking. Since the TiDB components are interconnected across multiple Kubernetes clusters, executing this operation in one Kubernetes cluster affects the entire TiDB cluster.

2. **Phase Two:** Collect meta information such as the `TidbCluster` CR and EBS volumes, and then request the AWS API to create EBS snapshots. This phase must be executed in each Kubernetes cluster.
2. **Phase Two:** TiDB Operator collects meta information such as the `TidbCluster` CR and EBS volumes, and then schedules another backup pod to request the AWS API to create EBS snapshots. This phase must be executed in each Kubernetes cluster.

3. **Phase Three:** After EBS snapshots are completed, resume region scheduling and GC for the TiDB cluster. This operation is required only in the Kubernetes cluster where Phase One was executed.
3. **Phase Three:** After EBS snapshots are completed, TiDB Operator deletes the first backup pod to resume region scheduling and GC for the TiDB cluster. This operation is required only in the Kubernetes cluster where Phase One was executed.

![backup process in data plane](/media/volume-backup-process-data-plane.png)
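
As a concrete illustration of what the control plane submits to drive these phases, the following is a hedged sketch of a federation `VolumeBackup` CR; the exact field names (`clusters`, `k8sClusterName`, `template`, and the S3 settings) are assumptions based on the companion multi-Kubernetes backup document rather than something defined on this page:

```yaml
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeBackup
metadata:
  name: example-volume-backup
spec:
  # One entry per data plane (Kubernetes cluster) hosting part of the TiDB cluster.
  clusters:
  - k8sClusterName: k8s-cluster-1   # hypothetical data plane name
    tcName: tidb-cluster-1
    tcNamespace: tidb
  - k8sClusterName: k8s-cluster-2
    tcName: tidb-cluster-2
    tcNamespace: tidb
  template:
    br:
      sendCredToTikv: false
    s3:
      provider: aws
      secretName: s3-secret         # hypothetical Secret holding AWS credentials
      region: us-west-2
      bucket: my-backup-bucket      # hypothetical bucket
      prefix: volume-backup-demo
```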

@@ -45,11 +45,11 @@ The orchestration process of `Backup` from the control plane to the data plane i

The restore process in the data plane consists of three phases:

1. **Phase One:** Call the AWS API to restore the EBS volumes using EBS snapshots based on the backup information. The volumes are then mounted onto the TiKV nodes, and TiKV instances are started in recovery mode. This phase must be executed in each Kubernetes cluster.
1. **Phase One:** TiDB Operator schedules a restore pod to request the AWS API to restore the EBS volumes using EBS snapshots based on the backup information. The volumes are then mounted onto the TiKV nodes, and TiKV instances are started in recovery mode. This phase must be executed in each Kubernetes cluster.

2. **Phase Two:** Use BR to restore all raft logs and KV data in TiKV instances to a consistent state, and then instruct TiKV instances to exit recovery mode. As TiKV instances are interconnected across multiple Kubernetes clusters, this operation can restore all TiKV data and only needs to be executed in one Kubernetes cluster.
2. **Phase Two:** TiDB Operator schedules another restore pod to restore all raft logs and KV data in TiKV instances to a consistent state, and then instructs TiKV instances to exit recovery mode. As TiKV instances are interconnected across multiple Kubernetes clusters, this operation can restore all TiKV data and only needs to be executed in one Kubernetes cluster.

3. **Phase Three:** Restart all TiKV instances to run in normal mode, and finally start TiDB. This phase must be executed in each Kubernetes cluster.
3. **Phase Three:** TiDB Operator restarts all TiKV instances to run in normal mode, and finally starts TiDB. This phase must be executed in each Kubernetes cluster.

![restore process in data plane](/media/volume-restore-process-data-plane.png)
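
Similarly, a federation `VolumeRestore` CR drives these restore phases from the control plane. The sketch below is an assumption modeled on the companion restore document, and the per-cluster backup reference layout (`backup.unstructured.s3`) in particular should be checked against that document:

```yaml
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeRestore
metadata:
  name: example-volume-restore
spec:
  clusters:
  - k8sClusterName: k8s-cluster-1   # hypothetical data plane name
    tcName: tidb-cluster-1
    tcNamespace: tidb
    # Where this data plane's volume backup was stored (layout assumed).
    backup:
      unstructured:
        s3:
          provider: aws
          secretName: s3-secret
          region: us-west-2
          bucket: my-backup-bucket  # hypothetical bucket
          prefix: volume-backup-demo/k8s-cluster-1
  template:
    br:
      sendCredToTikv: false
```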

5 changes: 5 additions & 0 deletions en/restore-from-ebs-snapshot-across-multiple-kubernetes.md
@@ -28,6 +28,11 @@ Before restoring a TiDB cluster across multiple Kubernetes clusters from EBS vol
- Deploy a TiDB cluster across multiple Kubernetes clusters that you want to restore data to. For detailed steps, refer to [Deploy a TiDB Cluster across Multiple Kubernetes Clusters](deploy-tidb-cluster-across-multiple-kubernetes.md).
- When deploying the TiDB cluster, add the `recoveryMode: true` field to the spec of `TidbCluster`.

> **Note:**
>
> An EBS volume restored from a snapshot might have high latency before it is initialized, which can cause a significant performance hit to the restored TiDB cluster. For details, see [create a volume from a snapshot](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-volume.html#ebs-create-volume-from-snapshot).
> Therefore, it is recommended that you configure `spec.template.warmup: sync` to initialize the restored TiKV volumes automatically during the restore process.
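
The two prerequisites above might look like the following excerpts; only `recoveryMode: true` and `spec.template.warmup: sync` come from this document, while the surrounding structure and names are illustrative assumptions:

```yaml
# Excerpt of the TidbCluster being restored into (deployed in each data plane).
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: restored-cluster            # hypothetical name
spec:
  recoveryMode: true                # required before running the volume restore
  # ... rest of the cluster spec ...
---
# Excerpt of the federation VolumeRestore CR submitted in the control plane.
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeRestore
metadata:
  name: restore-example             # hypothetical name
spec:
  template:
    warmup: sync                    # initialize restored TiKV volumes automatically
```
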
## Restore process

### Step 1. Set up the environment for EBS volume snapshot restore in every data plane
