ticdc: add data integration docs (#9692) #10079

Merged
8 changes: 6 additions & 2 deletions TOC.md
@@ -117,12 +117,17 @@
- [Migrate from CSV Files](/migrate-from-csv-files-to-tidb.md)
- [Migrate from SQL Files](/migrate-from-sql-files-to-tidb.md)
- [Migrate from One TiDB Cluster to Another TiDB Cluster](/migrate-from-tidb-to-tidb.md)
- [Replicate Data from TiDB to Kafka](/replicate-data-to-kafka.md)
- [Migrate from TiDB to MySQL-compatible Databases](/migrate-from-tidb-to-mysql.md)
- Advanced Migration
- [Continuous Replication with gh-ost or pt-osc](/migrate-with-pt-ghost.md)
- [Migrate to a Downstream Table with More Columns](/migrate-with-more-columns-downstream.md)
- [Filter Binlog Events](/filter-binlog-event.md)
- [Filter DML Events Using SQL Expressions](/filter-dml-event.md)
- Integrate
- [Overview](/integration-overview.md)
- Integration Scenarios
- [Integrate with Confluent Cloud and Snowflake](/ticdc/integrate-confluent-using-ticdc.md)
- [Integrate with Apache Kafka and Apache Flink](/replicate-data-to-kafka.md)
- Maintain
- Upgrade
- [Use TiUP (Recommended)](/upgrade-tidb-using-tiup.md)
@@ -499,7 +504,6 @@
- [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md)
- [TiCDC Avro Protocol](/ticdc/ticdc-avro-protocol.md)
- [TiCDC Canal-JSON Protocol](/ticdc/ticdc-canal-json.md)
- [Integrate TiDB with Confluent and Snowflake](/ticdc/integrate-confluent-using-ticdc.md)
- [FAQs](/ticdc/ticdc-faq.md)
- [Glossary](/ticdc/ticdc-glossary.md)
- [Dumpling](/dumpling-overview.md)
16 changes: 16 additions & 0 deletions integration-overview.md
@@ -0,0 +1,16 @@
---
title: Data Integration Overview
summary: Learn the overview of data integration scenarios.
---

# Data Integration Overview

Data integration is the flow, transfer, and consolidation of data among various data sources. As data volume grows exponentially and the value of data is explored more deeply, data integration has become increasingly popular and urgent. To prevent TiDB from becoming a data silo and to integrate data with other platforms, TiCDC can replicate TiDB incremental data change logs to other data platforms. This document describes the data integration scenarios that use TiCDC. You can choose an integration solution that suits your business scenario.

## Integrate with Confluent Cloud

You can use TiCDC to replicate incremental data from TiDB to Confluent Cloud, and replicate the data to ksqlDB, Snowflake, and SQL Server via Confluent Cloud. For details, see [Integrate with Confluent Cloud](/ticdc/integrate-confluent-using-ticdc.md).

## Integrate with Apache Kafka and Apache Flink

You can use TiCDC to replicate incremental data from TiDB to Apache Kafka, and consume the data using Apache Flink. For details, see [Integrate with Apache Kafka and Apache Flink](/replicate-data-to-kafka.md).
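As a rough illustration of what this scenario looks like in practice, the following sketch creates a changefeed that writes TiDB change logs to a Kafka topic that Flink can then consume. The broker address, topic name, and Kafka version below are placeholders; see the linked document for the full walkthrough.

```shell
# Illustrative only: replicate TiDB incremental data to a Kafka topic
tiup cdc cli changefeed create \
    --pd="http://127.0.0.1:2379" \
    --sink-uri="kafka://127.0.0.1:9092/tidb-cdc-topic?protocol=canal-json&kafka-version=2.4.0&partition-num=3"
```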
Binary file added media/integrate/sql-query-result.png
2 changes: 1 addition & 1 deletion migrate-aurora-to-tidb.md
@@ -149,7 +149,7 @@ If you need to enable TLS in the TiDB cluster, refer to [TiDB Lightning Configur
- Check progress in [the monitoring dashboard](/tidb-lightning/monitor-tidb-lightning.md).
- Check progress in [the TiDB Lightning web interface](/tidb-lightning/tidb-lightning-web-interface.md).

4. After TiDB Lightning completes the import, it exits automatically. If you find the last 5 lines of its log print `the whole procedure completed`, the import is successful.
4. After TiDB Lightning completes the import, it exits automatically. Check whether the last lines of `tidb-lightning.log` contain `the whole procedure completed`. If they do, the import is successful. If they do not, the import encountered an error; address the error as instructed in the error message.
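    For example, a quick way to perform this check from the command line (illustrative; assumes `tidb-lightning.log` is in the current working directory):

    ```shell
    # Print the last lines of the TiDB Lightning log and look for the success message
    tail -n 5 tidb-lightning.log | grep "the whole procedure completed"
    ```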

> **Note:**
>
2 changes: 1 addition & 1 deletion migrate-from-csv-files-to-tidb.md
@@ -127,7 +127,7 @@ After the import starts, you can check the progress of the import by either of t
- Check progress in [the monitoring dashboard](/tidb-lightning/monitor-tidb-lightning.md).
- Check progress in [the TiDB Lightning web interface](/tidb-lightning/tidb-lightning-web-interface.md).

After TiDB Lightning completes the import, it exits automatically. If you find the last 5 lines of its log print `the whole procedure completed`, the import is successful.
After TiDB Lightning completes the import, it exits automatically. Check whether the last lines of `tidb-lightning.log` contain `the whole procedure completed`. If they do, the import is successful. If they do not, the import encountered an error; address the error as instructed in the error message.

> **Note:**
>
2 changes: 1 addition & 1 deletion migrate-from-sql-files-to-tidb.md
@@ -89,7 +89,7 @@ After the import is started, you can check the progress in one of the following
- Use the Grafana dashboard. For details, see [TiDB Lightning Monitoring](/tidb-lightning/monitor-tidb-lightning.md).
- Use web interface. For details, see [TiDB Lightning Web Interface](/tidb-lightning/tidb-lightning-web-interface.md).

After the import is completed, TiDB Lightning automatically exits. If `the whole procedure completed` is in the last 5 lines of the log, it means that the import is successfully completed.
After the import is completed, TiDB Lightning automatically exits. Check whether the last lines of `tidb-lightning.log` contain `the whole procedure completed`. If they do, the import is successful. If they do not, the import encountered an error; address the error as instructed in the error message.

> **Note:**
>
229 changes: 229 additions & 0 deletions migrate-from-tidb-to-mysql.md
@@ -0,0 +1,229 @@
---
title: Migrate Data from TiDB to MySQL-compatible Databases
summary: Learn how to migrate data from TiDB to MySQL-compatible databases.
---

# Migrate Data from TiDB to MySQL-compatible Databases

This document describes how to migrate data from TiDB clusters to MySQL-compatible databases, such as Aurora, MySQL, and MariaDB. The whole process contains four steps:

1. Set up the environment.
2. Migrate full data.
3. Migrate incremental data.
4. Switch services to the downstream MySQL-compatible database.

## Step 1. Set up the environment

1. Deploy a TiDB cluster upstream.

Deploy a TiDB cluster by using TiUP Playground. For more information, refer to [Deploy and Maintain an Online TiDB Cluster Using TiUP](/tiup/tiup-cluster.md).

```shell
# Create a TiDB cluster
tiup playground --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# View cluster status
tiup status
```

2. Deploy a MySQL instance downstream.

- In a lab environment, you can use Docker to quickly deploy a MySQL instance by running the following command:

```shell
docker run --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -p 3306:3306 -d mysql
```

- In a production environment, you can deploy a MySQL instance by following instructions in [Installing MySQL](https://dev.mysql.com/doc/refman/8.0/en/installing.html).

3. Simulate service workload.

In the lab environment, you can use `go-tpc` to write data to the upstream TiDB cluster, which generates change events in the cluster. Run the following commands to create a database named `tpcc` in the TiDB cluster, and then use TiUP bench to write data to this database.

```shell
tiup bench tpcc -H 127.0.0.1 -P 4000 -D tpcc --warehouses 4 prepare
tiup bench tpcc -H 127.0.0.1 -P 4000 -D tpcc --warehouses 4 run --time 300s
```

For more details about `go-tpc`, refer to [How to Run TPC-C Test on TiDB](/benchmark/benchmark-tidb-using-tpcc.md).
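After the workload runs, you can optionally confirm that data has landed in the upstream cluster. The following is an illustrative check using the MySQL client; the `warehouse` table is one of the tables that the TPC-C workload creates:

```shell
# Optional sanity check (illustrative): the warehouse table should contain 4 rows, matching --warehouses 4
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT COUNT(*) FROM tpcc.warehouse;"
```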

## Step 2. Migrate full data

After setting up the environment, you can use [Dumpling](/dumpling-overview.md) to export the full data from the upstream TiDB cluster.

> **Note:**
>
> In production clusters, performing a backup with GC disabled might affect cluster performance. It is recommended that you complete this step in off-peak hours.

1. Disable Garbage Collection (GC).

To ensure that newly written data is not deleted during the incremental migration, disable GC for the upstream cluster before exporting full data. In this way, historical data is not deleted.

Run the following command to disable GC:

```sql
MySQL [test]> SET GLOBAL tidb_gc_enable=FALSE;
```

```
Query OK, 0 rows affected (0.01 sec)
```

To verify that the change takes effect, query the value of `tidb_gc_enable`:

```sql
MySQL [test]> SELECT @@global.tidb_gc_enable;
```

```
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
| 0 |
+-------------------------+
1 row in set (0.00 sec)
```

2. Back up data.

1. Export data in SQL format using Dumpling:

```shell
tiup dumpling -u root -P 4000 -h 127.0.0.1 --filetype sql -t 8 -o ./dumpling_output -r 200000 -F256MiB
```

2. After the export is completed, run the following command to check the metadata. The `Pos` field in the metadata is the TSO of the export snapshot, which you can record as the BackupTS.

```shell
cat dumpling_output/metadata
```

```
Started dump at: 2022-06-28 17:49:54
SHOW MASTER STATUS:
Log: tidb-binlog
Pos: 434217889191428107
GTID:
Finished dump at: 2022-06-28 17:49:57
```

3. Restore data.

Use MyLoader (an open-source tool) to import data to the downstream MySQL instance. For details about how to install and use MyLoader, see [MyDumper/MyLoader](https://github.com/mydumper/mydumper). Run the following command to import the full data exported by Dumpling to MySQL:

```shell
myloader -h 127.0.0.1 -P 3306 -d ./dumpling_output/
```

4. (Optional) Validate data.

You can use [sync-diff-inspector](/sync-diff-inspector/sync-diff-inspector-overview.md) to check data consistency between upstream and downstream at a certain time.

```shell
sync_diff_inspector -C ./config.yaml
```

For details about how to configure the sync-diff-inspector, see [Configuration file description](/sync-diff-inspector/sync-diff-inspector-overview.md#configuration-file-description). In this document, the configuration is as follows:

```toml
# Diff Configuration.
######################### Datasource config #########################
[data-sources]
[data-sources.upstream]
host = "127.0.0.1" # Replace the value with the IP address of your upstream cluster
port = 4000
user = "root"
password = ""
snapshot = "434217889191428107" # Set snapshot to the actual backup time (BackupTS in the "Back up data" section in [Step 2. Migrate full data](#step-2-migrate-full-data))
[data-sources.downstream]
host = "127.0.0.1" # Replace the value with the IP address of your downstream cluster
port = 3306
user = "root"
password = ""
######################### Task config #########################
[task]
output-dir = "./output"
source-instances = ["upstream"]
target-instance = "downstream"
target-check-tables = ["*.*"]
```
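The `snapshot` value above is the BackupTS recorded earlier, which is a TiDB TSO. If you want to sanity-check that it corresponds to the export time, you can decode its physical part. The following is an illustrative conversion; the upper bits of a TSO encode a Unix timestamp in milliseconds, and the lower 18 bits are a logical counter:

```shell
# Illustrative: decode the physical time embedded in a TSO
tso=434217889191428107
echo $(( tso >> 18 ))                  # milliseconds since the Unix epoch
date -d @$(( (tso >> 18) / 1000 ))     # human-readable time (GNU date)
```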

## Step 3. Migrate incremental data

1. Deploy TiCDC.

After finishing the full data migration, deploy and configure a TiCDC cluster to replicate incremental data. In production environments, deploy TiCDC as instructed in [Deploy TiCDC](/ticdc/deploy-ticdc.md). In this document, a TiCDC node was already started when the test cluster was created (`--ticdc 1`), so you can skip the deployment step and proceed to create a changefeed.

2. Create a changefeed.

In the upstream cluster, run the following command to create a changefeed from the upstream to the downstream clusters:

```shell
tiup ctl:v6.1.0 cdc changefeed create --pd=http://127.0.0.1:2379 --sink-uri="mysql://root:@127.0.0.1:3306" --changefeed-id="upstream-to-downstream" --start-ts="434217889191428107"
```

In this command, the parameters are as follows:

- `--pd`: PD address of the upstream cluster
- `--sink-uri`: URI of the downstream cluster
- `--changefeed-id`: the ID of the changefeed, which must match the regular expression `^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$`
- `--start-ts`: the start timestamp of the changefeed, which must be the backup time (that is, the BackupTS recorded in the "Back up data" section of [Step 2. Migrate full data](#step-2-migrate-full-data))

For more information about the changefeed configurations, see [Task configuration file](/ticdc/manage-ticdc.md#task-configuration-file). A minimal example of such a configuration file is sketched after this list.

3. Enable GC.

When you use TiCDC for incremental migration, GC only removes historical data that has already been replicated. Therefore, after creating a changefeed, you need to run the following command to enable GC. For details, see [What is the complete behavior of TiCDC garbage collection (GC) safepoint](/ticdc/ticdc-faq.md#what-is-the-complete-behavior-of-ticdc-garbage-collection-gc-safepoint).

To enable GC, run the following command:

```sql
MySQL [test]> SET GLOBAL tidb_gc_enable=TRUE;
```

```
Query OK, 0 rows affected (0.01 sec)
```

To verify that the change takes effect, query the value of `tidb_gc_enable`:

```sql
MySQL [test]> SELECT @@global.tidb_gc_enable;
```

```
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
| 1 |
+-------------------------+
1 row in set (0.00 sec)
```
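As mentioned in the changefeed creation step above, you can also pass a task configuration file to `changefeed create` using the `--config` flag. The following is a minimal sketch that assumes you only want to replicate the `tpcc` database generated in Step 1; adjust the filter rules to your own schemas:

```toml
# changefeed.toml: a minimal task configuration file (illustrative)
case-sensitive = true

[filter]
# Replicate only the tables in the tpcc database
rules = ['tpcc.*']
```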

## Step 4. Switch services

After the changefeed is created, data written to the upstream cluster is replicated to the downstream cluster with low latency. You can gradually migrate the read traffic to the downstream cluster. Observe it for a period. If the downstream cluster is stable, switch the write traffic to the downstream cluster as well by taking the following steps:

1. Stop write services in the upstream cluster. Make sure that all upstream data is replicated to the downstream cluster before you stop the changefeed.

```shell
# Stop the changefeed from the upstream cluster to the downstream cluster
tiup cdc cli changefeed pause -c "upstream-to-downstream" --pd=http://172.16.6.122:2379
# View the changefeed status
tiup cdc cli changefeed list
```

```
[
{
"id": "upstream-to-downstream",
"summary": {
"state": "stopped", # Ensure that the status is stopped
"tso": 434218657561968641,
"checkpoint": "2022-06-28 18:38:45.685", # This time should be later than the time of stopping writing
"error": null
}
}
]
```

2. After migrating write services to the downstream cluster, observe for a period. If the downstream cluster is stable, you can take the upstream cluster offline.
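Once the switchover is complete and the upstream cluster is no longer needed, you can optionally remove the paused changefeed. The following is an illustrative cleanup sketch; replace the PD address with that of your upstream cluster:

```shell
# Optional cleanup (illustrative): remove the changefeed after the switchover
tiup cdc cli changefeed remove -c "upstream-to-downstream" --pd=http://127.0.0.1:2379
# Confirm that the changefeed no longer appears in the list
tiup cdc cli changefeed list --pd=http://127.0.0.1:2379
```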