[ISSUE-1827][Mysql] Mysql connector support parallel snapshot when there are no primary key for tables #2046
Thanks for the contribution. There are still some places that are not handled yet.
@@ -403,4 +403,58 @@ public boolean isRunning() {
        return currentTaskRunning;
    }
}

    private boolean containsPrimaryKey() {
        return !CollectionUtil.isNullOrEmpty(currentSnapshotSplit.getSplitKeyType().getFields());
It is better not to judge in this way. The chunk key will be set when we have to handle the big table in some cases to provide the at-least-once guarantee.
We could init the SnapshotRecordCollector when record.key() returns null.
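For illustration, a minimal sketch of that idea follows; SnapshotRecordCollector and the selector class are hypothetical names from this discussion, not the connector's actual classes.

    import org.apache.kafka.connect.source.SourceRecord;

    // Hypothetical collector abstraction discussed above; not the real connector API.
    interface SnapshotRecordCollector {
        void collect(SourceRecord record, boolean reachBinlogStart);
    }

    final class CollectorSelector {
        private final SnapshotRecordCollector keyedCollector;   // upsert-style, for tables with a key
        private final SnapshotRecordCollector keylessCollector; // append-only, for tables without a key

        CollectorSelector(SnapshotRecordCollector keyedCollector, SnapshotRecordCollector keylessCollector) {
            this.keyedCollector = keyedCollector;
            this.keylessCollector = keylessCollector;
        }

        /** Pick the collector from the first record instead of inspecting the split key type. */
        SnapshotRecordCollector forRecord(SourceRecord record) {
            // Tables without a primary key emit records whose key() is null.
            return record.key() == null ? keylessCollector : keyedCollector;
        }
    }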
> It is better not to judge in this way. The chunk key will be set when we have to handle the big table in some cases to provide the at-least-once guarantee.

Thanks for your review. You mean that the chunk key may be set even though the table has no primary keys?
Yes. If the table contains too much data, there is a performance problem with this solution.
In that case, we may provide at-least-once semantics instead of exactly-once semantics.
Is it possible that exactly-once semantics are required in some scenarios? How about providing an option so users can choose between better performance and exactly-once semantics?
I think we should control this with the experimental option scan.incremental.snapshot.chunk.key-column.
By default, we will return a single split. If users set scan.incremental.snapshot.chunk.key-column, we will split the table into multiple splits.
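As a rough sketch of that behaviour (the helper name and types here are illustrative, not the connector's real classes), the decision could look like this:

    import java.util.Optional;

    // Illustrative only: decides which column to chunk by for a table,
    // based on scan.incremental.snapshot.chunk.key-column.
    final class ChunkKeyResolver {

        /**
         * Returns the chunk key column, or empty when the table should be read as a
         * single snapshot split (the default for tables without a primary key).
         */
        static Optional<String> resolveChunkKey(
                Optional<String> primaryKeyColumn, Optional<String> chunkKeyColumnOption) {
            if (primaryKeyColumn.isPresent()) {
                // Tables with a primary key keep the existing chunking behaviour.
                return primaryKeyColumn;
            }
            // No primary key: chunk only when the user explicitly sets
            // scan.incremental.snapshot.chunk.key-column; otherwise use a single split.
            return chunkKeyColumnOption;
        }
    }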
    }

    /** Collect records need to be sent, except low/high watermark. */
    interface SnapshotRecords {
SnapshotRecords -> SnapshotRecordCollector
                minMaxOfSplitColumn = queryMinMax(jdbcConnection, tableId, splitColumn.name());
            } else {
                minMaxOfSplitColumn = new Object[2];
            }
The chunk splitter should immediately return a single split covering the whole table when the table does not have a primary key. This part should not be changed.
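For clarity, here is a simplified sketch of that early return; the SnapshotSplit model and generateSplits helper below are stand-ins for the real splitter internals, not the connector's code.

    import java.util.Collections;
    import java.util.List;

    // Simplified stand-in for the splitter logic.
    final class WholeTableSplitSketch {

        /** Minimal split model: null bounds mean "covers the whole table". */
        static final class SnapshotSplit {
            final String tableId;
            final Object[] splitStart;
            final Object[] splitEnd;

            SnapshotSplit(String tableId, Object[] splitStart, Object[] splitEnd) {
                this.tableId = tableId;
                this.splitStart = splitStart;
                this.splitEnd = splitEnd;
            }
        }

        static List<SnapshotSplit> generateSplits(String tableId, boolean hasSplitColumn) {
            if (!hasSplitColumn) {
                // No primary key and no configured chunk key: return one split that
                // covers the whole table and never run the min/max query.
                return Collections.singletonList(new SnapshotSplit(tableId, null, null));
            }
            // Otherwise query min/max of the split column and chunk the range
            // (omitted in this sketch).
            throw new UnsupportedOperationException("range chunking omitted in this sketch");
        }
    }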
    public static RowType getChunkKeyColumnType(@Nullable Column chunkKeyColumn) {
        if (chunkKeyColumn == null) {
            return (RowType) ROW().getLogicalType();
        }
This should not be changed like this. I think we could use the first column as the chunk key if the chunk key is not set in the table options.
You mean using the first column as the chunk key to split the table if it has no primary key?
I replied about this part in the first comment.
        @Override
        public void collect(SourceRecord record, boolean reachBinlogStart) {
            if (!reachBinlogStart) {
                snapshotRecords.add(record);
How do you handle the delete and update events?
IMHO, the snapshot stage for a table without a primary key only reads snapshot data; binlog data would be read only in the incremental stage. So there are no delete or update events in SnapshotSplitReader if the table has no primary key. Please correct me if I am wrong.
If I understand right, you want to skip the snapshot backfill task.
It is a good idea to skip the snapshot backfill task when there is a single split. I think you should set this in the source, but I do not see the relevant changes.
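A rough sketch of what skipping the backfill might look like; the helper below is hypothetical and only illustrates the decision, not the connector's actual reader API.

    // Hypothetical helper; not the real connector API.
    final class BackfillDecision {

        /**
         * A split with no start and no end bound is the single split that covers
         * the whole table (the case for tables without a primary key discussed here).
         */
        static boolean isSingleWholeTableSplit(Object[] splitStart, Object[] splitEnd) {
            return splitStart == null && splitEnd == null;
        }

        /**
         * The backfill phase normally re-reads the binlog between the low and high
         * watermarks to correct the snapshot chunk. For the single whole-table split,
         * binlog data is read only in the incremental stage, so the backfill read can
         * be skipped.
         */
        static boolean shouldSkipBackfill(Object[] splitStart, Object[] splitEnd) {
            return isSingleWholeTableSplit(splitStart, splitEnd);
        }
    }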
Resolve #1827