
ticdc: add 4 docs for data replication scenarios #10276

Merged 34 commits on Jul 7, 2022. Changes shown from 6 commits.

Commits:
bd7634f
ticdc: add 4 docs for data replication scenarios
shichun-0415 Jun 27, 2022
ddac622
fix jenkins
shichun-0415 Jun 27, 2022
628a6de
Apply suggestions from code review
shichun-0415 Jun 28, 2022
078f286
fix typo
shichun-0415 Jun 28, 2022
d190313
migrete from tidb to mysql
Jun 28, 2022
2362f28
Merge branch 'integrate-data' of github.com:shichun-0415/docs-cn into…
Jun 28, 2022
7f3c794
Apply suggestions from code review
Jun 29, 2022
f5a6aa7
update file names and remove integration tool.md
shichun-0415 Jun 30, 2022
820e9c5
unify title and fix code format
shichun-0415 Jun 30, 2022
25d074e
refine wording and format
shichun-0415 Jun 30, 2022
cbd6129
remove copyable
shichun-0415 Jun 30, 2022
b4f892e
fix CI and jenkins
shichun-0415 Jun 30, 2022
4b456af
fix three links
shichun-0415 Jul 1, 2022
7408138
add a missing aliases
shichun-0415 Jul 1, 2022
bc459ed
Update wording and format
lilin90 Jul 1, 2022
30f701e
update-overview
Jul 1, 2022
4c42648
Merge branch 'integrate-data' of github.com:shichun-0415/docs-cn into…
Jul 1, 2022
1741a73
refine wording and fix code language
shichun-0415 Jul 4, 2022
26d0c4d
fix upper case
shichun-0415 Jul 4, 2022
205f8fc
Apply suggestions from code review
shichun-0415 Jul 4, 2022
57e968c
remain the original files and remove shell for conf files
shichun-0415 Jul 4, 2022
be4e095
remove two new docs of data integration
shichun-0415 Jul 4, 2022
cfd3e83
fix ci
shichun-0415 Jul 5, 2022
ce909c6
Merge remote-tracking branch 'upstream/master' into integrate-data
shichun-0415 Jul 6, 2022
12e51f5
avoid usage of we
shichun-0415 Jul 6, 2022
b830a47
Update wording
lilin90 Jul 6, 2022
8348ddf
Merge branch 'integrate-data' of https://github.com/shichun-0415/docs…
lilin90 Jul 6, 2022
14b037b
Update description
lilin90 Jul 6, 2022
0291787
Remove extra space in metadata
lilin90 Jul 7, 2022
837fef0
Make wording consistent
lilin90 Jul 7, 2022
bb45fe8
*: update wording and fix format
lilin90 Jul 7, 2022
8f049ad
Add inline code format
lilin90 Jul 7, 2022
f36fb54
Merge remote-tracking branch 'upstream/master' into integrate-data
shichun-0415 Jul 7, 2022
b04a904
fix format and make wording consistent
shichun-0415 Jul 7, 2022
8 changes: 7 additions & 1 deletion TOC.md
@@ -108,12 +108,18 @@
- [Migrate Data from CSV Files to TiDB](/migrate-from-csv-files-to-tidb.md)
- [Migrate Data from SQL Files to TiDB](/migrate-from-sql-files-to-tidb.md)
- [Migrate Data from One TiDB Cluster to Another TiDB Cluster](/migrate-from-tidb-to-tidb.md)
- [Replicate Data from TiDB to Apache Kafka](/replicate-data-to-kafka.md)
- [Migrate Data from TiDB to MySQL-compatible Databases](/replicate-from-tidb-to-mysql.md)
- Advanced Migration Scenarios
- [Continuous Replication Scenarios with pt/gh-ost in the Upstream](/migrate-with-pt-ghost.md)
- [Migration Scenarios with More Columns in the Downstream](/migrate-with-more-columns-downstream.md)
- [Filter Binlog Events by Type or DDL Content](/filter-binlog-event.md)
- [Filter DML Binlog Events Using SQL Expressions](/filter-dml-event.md)
- Data Integration
- [Data Integration Overview](/replication-overview.md)
- [Replication Tools](/replication-tools.md)
- Data Integration Scenarios
- [Integrate Data with Confluent Cloud](/replicate-from-tidb-to-confluent.md)
- [Integrate Data with Apache Kafka and Apache Flink](/replicate-from-tidb-to-kafka-flink.md)
- Maintenance Operations
- Upgrade TiDB
- [Upgrade Using TiUP (Recommended)](/upgrade-tidb-using-tiup.md)
Binary file added media/integrate/add-snowflake-sink-connector.png
Binary file added media/integrate/authentication.png
Binary file added media/integrate/configuration.png
Binary file added media/integrate/confluent-topics.PNG
Binary file added media/integrate/credentials.png
Binary file added media/integrate/data-preview.png
Binary file added media/integrate/results.png
Binary file added media/integrate/select-from-orders.png
Binary file added media/integrate/sql-query-result.png
Binary file added media/integrate/topic-selection.png
216 changes: 216 additions & 0 deletions migrate-from-tidb-to-mysql.md
@@ -0,0 +1,216 @@
---
title: Migrate Data from TiDB to MySQL-compatible Databases
summary: Learn how to migrate data from a TiDB cluster to MySQL-compatible databases.
aliases: ['/zh/tidb/dev/incremental-replication-between-clusters/']
---

# Migrate Data from TiDB to MySQL-compatible Databases

This document describes how to migrate data from a TiDB cluster to MySQL-compatible databases, such as Aurora, MySQL, and MariaDB. The whole migration process is simulated in this document and consists of the following four steps:

1. Set up the environment
2. Migrate full data
3. Migrate incremental data
4. Switch business traffic smoothly

## Step 1. Set up the environment

1. Deploy the upstream TiDB cluster.

Use tiup playground to quickly deploy a test cluster. For more deployment information, see the [TiUP documentation](/tiup/tiup-cluster.md).

{{< copyable "shell-regular" >}}

```shell
# Create the upstream cluster
tiup playground --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# Check the cluster status
tiup status
```

2. Deploy the downstream MySQL instance.

In a lab environment, you can use Docker to quickly deploy a MySQL instance by running the following command:

```shell
docker run --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -p 3306:3306 -d mysql
```

In a production environment, you can deploy a MySQL instance by following the instructions in [Installing MySQL](https://dev.mysql.com/doc/refman/8.0/en/installing.html).

3. Simulate business workload.

In the test environment, you can use go-tpc to write data to the upstream TiDB cluster, so that TiDB generates change events. Run the following commands to create a database named tpcc in the upstream cluster, and then use TiUP bench to write data to this database:

```shell
tiup bench tpcc -H 127.0.0.1 -P 4000 -D tpcc --warehouses 4 prepare
tiup bench tpcc -H 127.0.0.1 -P 4000 -D tpcc --warehouses 4 run --time 300s
```

For more details about go-tpc, see [How to Run TPC-C Test on TiDB](/benchmark/benchmark-tidb-using-tpcc.md).


## Step 2. Migrate full data

After setting up the test environment, you can use [Dumpling](/dumpling-overview.md) to export the full data of the upstream cluster.

1. Disable GC.

To ensure that newly written data is not lost during the incremental migration, disable the garbage collection (GC) mechanism of the upstream cluster before the full export, so that the system no longer cleans up history data.

{{< copyable "sql" >}}

```sql
MySQL [test]> SET GLOBAL tidb_gc_enable=FALSE;
Query OK, 0 rows affected (0.01 sec)
MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
| 0 |
+-------------------------+
1 row in set (0.00 sec)
```

> **Note:**
>
> In a production cluster, disabling GC and performing the backup reduce the read performance of the cluster to some extent. It is recommended to back up data during off-peak hours and set an appropriate RATE_LIMIT to limit the impact of the backup on online business.

2. Back up data.

Use Dumpling to export data in SQL format:

```shell
tiup dumpling -u root -P 4000 -h 127.0.0.1 --filetype sql -t 8 -o ./dumpling_output -r 200000 -F 256MiB
```

After the export completes, run the following command to check the metadata of the exported data. `Pos` in the metadata file is the TSO of the export snapshot. Record it as BackupTS:

```
[root@test ~]# cat dumpling_output/metadata
Started dump at: 2022-06-28 17:49:54
SHOW MASTER STATUS:
Log: tidb-binlog
Pos: 434217889191428107
GTID:

Finished dump at: 2022-06-28 17:49:57
```
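Rather than reading `Pos` out of the metadata file by eye, you can extract the TSO with a short script. The following is a minimal sketch; to keep it self-contained, it recreates a sample metadata file (mirroring the output above) under an illustrative `/tmp` path — in a real run, point `awk` at `./dumpling_output/metadata` instead:

```shell
# Recreate a sample metadata file mirroring the Dumpling output shown above.
# In a real run, read ./dumpling_output/metadata instead.
cat > /tmp/dumpling_metadata <<'EOF'
Started dump at: 2022-06-28 17:49:54
SHOW MASTER STATUS:
Log: tidb-binlog
Pos: 434217889191428107
GTID:

Finished dump at: 2022-06-28 17:49:57
EOF

# Extract the "Pos:" value, which is the BackupTS.
backup_ts=$(awk '$1 == "Pos:" {print $2}' /tmp/dumpling_metadata)
echo "BackupTS=${backup_ts}"
```

This value is needed twice later: as `snapshot` in the sync-diff-inspector configuration, and as `--start-ts` when creating the changefeed.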

3. Restore data.

Use the open-source tool MyLoader to import data to the downstream MySQL instance. For how to install and use MyLoader, see [MyDumper/MyLoader](https://github.com/mydumper/mydumper). Run the following command to import the full data exported by Dumpling to the downstream MySQL instance:

```shell
myloader -h 127.0.0.1 -P 3306 -d ./dumpling_output/
```

4. (Optional) Validate data.

You can use [sync-diff-inspector](/sync-diff-inspector/sync-diff-inspector-overview.md) to check data consistency between the upstream and the downstream at a certain point in time.

{{< copyable "shell-regular" >}}

```shell
sync_diff_inspector -C ./config.yaml
```

For how to configure sync-diff-inspector, see [Description of the configuration file](/sync-diff-inspector/sync-diff-inspector-overview.md#配置文件说明). In this document, the corresponding configuration is as follows:

```toml
# Diff Configuration.
######################### Datasource config #########################
[data-sources]
[data-sources.upstream]
host = "127.0.0.1" # Replace the value with the IP address of the actual upstream cluster
port = 4000
user = "root"
password = ""
snapshot = "434217889191428107" # Set it to the actual backup time point (the BackupTS in the "Back up data" section)
[data-sources.downstream]
host = "127.0.0.1" # Replace the value with the IP address of the actual downstream cluster
port = 3306
user = "root"
password = ""

######################### Task config #########################
[task]
output-dir = "./output"
source-instances = ["upstream"]
target-instance = "downstream"
target-check-tables = ["*.*"]
```

## Step 3. Migrate incremental data

1. Deploy TiCDC.

After the full data migration is complete, deploy and configure a TiCDC cluster to replicate incremental data. In a production cluster, deploy TiCDC by referring to [Deploy TiCDC](/ticdc/deploy-ticdc.md). In this document, a TiCDC node has already been started when the test cluster was created, so you can proceed to configure a changefeed directly.

2. Create a changefeed.

In the upstream cluster, run the following command to create a changefeed from the upstream to the downstream cluster:

{{< copyable "shell-regular" >}}

```shell
tiup ctl:v6.1.0 cdc changefeed create --pd=http://127.0.0.1:2379 --sink-uri="mysql://root:@127.0.0.1:3306" --changefeed-id="upstream-to-downstream" --start-ts="434217889191428107"
```

In the command above:

- `--pd`: the address of the actual upstream cluster
- `--sink-uri`: the URI of the downstream of the replication task
- `--changefeed-id`: the ID of the replication task, which must match the regular expression `^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$`
- `--start-ts`: the starting TSO of the TiCDC replication, which must be set to the actual backup time point (the BackupTS mentioned in the "Back up data" section)

For more changefeed configurations, see [Task configuration file](/ticdc/manage-ticdc.md#同步任务配置文件描述).
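Before running the create command, you can check a candidate changefeed ID against that regular expression locally. A minimal self-contained sketch (the ID value is the one used in this document):

```shell
# Validate a changefeed ID against the pattern required by TiCDC:
# ^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$
changefeed_id="upstream-to-downstream"
if printf '%s' "${changefeed_id}" | grep -qE '^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$'; then
  echo "valid changefeed ID"
else
  echo "invalid changefeed ID"
fi
```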

3. Enable GC again.

TiCDC ensures that GC only collects history data that has already been replicated. Therefore, after creating the changefeed from the upstream to the downstream cluster, you can run the following command to re-enable garbage collection for the cluster. For details, see [What is the complete behavior of the TiCDC GC safepoint](/ticdc/troubleshoot-ticdc.md#ticdc-gc-safepoint-的完整行为是什么).

{{< copyable "sql" >}}

```sql
MySQL [test]> SET GLOBAL tidb_gc_enable=TRUE;
Query OK, 0 rows affected (0.01 sec)
MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
| 1 |
+-------------------------+
1 row in set (0.00 sec)
```

## Step 4. Switch business traffic smoothly

After the replication link between the upstream and the downstream is created with TiCDC, data written to the original cluster is replicated to the new cluster with very low latency. At this point, you can gradually migrate read traffic to the new cluster. Observe for a period. If the new cluster is stable, you can switch write traffic to the new cluster as well, in the following steps:

1. Stop the write business in the upstream cluster. After confirming that all upstream data has been replicated to the downstream, stop the changefeed from the upstream to the downstream cluster.

{{< copyable "shell-regular" >}}

```shell
# Stop the changefeed from the old cluster to the new cluster
tiup cdc cli changefeed pause -c "upstream-to-downstream" --pd=http://172.16.6.122:2379

# Check the changefeed status
tiup cdc cli changefeed list
[
  {
    "id": "upstream-to-downstream",
    "summary": {
      "state": "stopped",  # Confirm that the state is stopped
      "tso": 434218657561968641,
      "checkpoint": "2022-06-28 18:38:45.685",  # Confirm that this time is later than the time when writes stopped
      "error": null
    }
  }
]
```
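The check that the checkpoint is later than the stop-write time can also be scripted. A minimal self-contained sketch — both timestamp values are illustrative; in practice the checkpoint would come from the `tiup cdc cli changefeed list` output:

```shell
# Timestamps in "YYYY-MM-DD HH:MM:SS" form compare correctly as strings,
# so a lexicographic comparison is enough here.
stop_write_time="2022-06-28 18:30:00"       # when the business stopped writing (illustrative)
checkpoint_time="2022-06-28 18:38:45.685"   # from the changefeed list output
if [ "$(printf '%s\n%s\n' "${stop_write_time}" "${checkpoint_time}" | sort | tail -n 1)" = "${checkpoint_time}" ]; then
  echo "checkpoint covers all writes"
else
  echo "checkpoint is behind the stop-write time"
fi
```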

2. Migrate the write business to the downstream cluster. After observing for a period, if the new cluster is stable, you can discard the original cluster.
106 changes: 0 additions & 106 deletions replicate-data-to-kafka.md

This file was deleted.
