[ISSUE-1827][Mysql] Mysql connector support parallel snapshot when there are no primary key for tables #2046
Thanks for the contribution. There are still some places that are not handled yet.
@@ -403,4 +403,58 @@ public boolean isRunning() {
        return currentTaskRunning;
    }
}

    private boolean containsPrimaryKey() {
        return !CollectionUtil.isNullOrEmpty(currentSnapshotSplit.getSplitKeyType().getFields());
It is better not to judge in this way. The chunk key will be set when we have to handle the big table in some cases to provide the at-least-once guarantee.
We could init the SnapshotRecordCollector when record.key() returns null.
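For illustration, a minimal sketch of that idea follows; SnapshotRecordCollector and the selector class are hypothetical names from this discussion, not the connector's actual classes.

    import org.apache.kafka.connect.source.SourceRecord;

    // Hypothetical collector abstraction discussed above; not the real connector API.
    interface SnapshotRecordCollector {
        void collect(SourceRecord record, boolean reachBinlogStart);
    }

    final class CollectorSelector {
        private final SnapshotRecordCollector keyedCollector;   // upsert-style, for tables with a key
        private final SnapshotRecordCollector keylessCollector; // append-only, for tables without a key

        CollectorSelector(SnapshotRecordCollector keyedCollector, SnapshotRecordCollector keylessCollector) {
            this.keyedCollector = keyedCollector;
            this.keylessCollector = keylessCollector;
        }

        /** Pick the collector from the first record instead of inspecting the split key type. */
        SnapshotRecordCollector forRecord(SourceRecord record) {
            // Tables without a primary key emit records whose key() is null.
            return record.key() == null ? keylessCollector : keyedCollector;
        }
    }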
> It is better not to judge in this way. The chunk key will be set when we have to handle the big table in some cases to provide the at-least-once guarantee.

Thanks for your review. You mean that the chunk key may be set even though the table has no primary keys?
Yes. If the table contains too much data, there is a performance problem with this solution.
In that case, we may provide at-least-once semantics instead of exactly-once semantics.
Is it possible that exactly-once semantics are required in some scenarios? How about providing an option so users can choose between better performance and exactly-once semantics?
I think we should control this with the experimental option scan.incremental.snapshot.chunk.key-column.
By default, we will return a single split. If users set scan.incremental.snapshot.chunk.key-column, we will split the table into multiple splits.
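As a rough sketch of that behaviour (the helper name and types here are illustrative, not the connector's real classes), the decision could look like this:

    import java.util.Optional;

    // Illustrative only: decides which column to chunk by for a table,
    // based on scan.incremental.snapshot.chunk.key-column.
    final class ChunkKeyResolver {

        /**
         * Returns the chunk key column, or empty when the table should be read as a
         * single snapshot split (the default for tables without a primary key).
         */
        static Optional<String> resolveChunkKey(
                Optional<String> primaryKeyColumn, Optional<String> chunkKeyColumnOption) {
            if (primaryKeyColumn.isPresent()) {
                // Tables with a primary key keep the existing chunking behaviour.
                return primaryKeyColumn;
            }
            // No primary key: chunk only when the user explicitly sets
            // scan.incremental.snapshot.chunk.key-column; otherwise use a single split.
            return chunkKeyColumnOption;
        }
    }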
    }

    /** Collect records need to be sent, except low/high watermark. */
    interface SnapshotRecords {
SnapshotRecords -> SnapshotRecordCollector
                minMaxOfSplitColumn = queryMinMax(jdbcConnection, tableId, splitColumn.name());
            } else {
                minMaxOfSplitColumn = new Object[2];
            }
The chunk splitter should immediately return a single split covering the whole table when the table does not have a primary key. This part should not be changed.
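For clarity, here is a simplified sketch of that early return; the SnapshotSplit model and generateSplits helper below are stand-ins for the real splitter internals, not the connector's code.

    import java.util.Collections;
    import java.util.List;

    // Simplified stand-in for the splitter logic.
    final class WholeTableSplitSketch {

        /** Minimal split model: null bounds mean "covers the whole table". */
        static final class SnapshotSplit {
            final String tableId;
            final Object[] splitStart;
            final Object[] splitEnd;

            SnapshotSplit(String tableId, Object[] splitStart, Object[] splitEnd) {
                this.tableId = tableId;
                this.splitStart = splitStart;
                this.splitEnd = splitEnd;
            }
        }

        static List<SnapshotSplit> generateSplits(String tableId, boolean hasSplitColumn) {
            if (!hasSplitColumn) {
                // No primary key and no configured chunk key: return one split that
                // covers the whole table and never run the min/max query.
                return Collections.singletonList(new SnapshotSplit(tableId, null, null));
            }
            // Otherwise query min/max of the split column and chunk the range
            // (omitted in this sketch).
            throw new UnsupportedOperationException("range chunking omitted in this sketch");
        }
    }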
    public static RowType getChunkKeyColumnType(@Nullable Column chunkKeyColumn) {
        if (chunkKeyColumn == null) {
            return (RowType) ROW().getLogicalType();
        }
This should not be changed like this. I think we could use the first column as the chunk key if the chunk key is not set in the table options.
You mean using the first column as the chunk key to split the table if it has no primary key?
I replied about this part in the first comment.
        @Override
        public void collect(SourceRecord record, boolean reachBinlogStart) {
            if (!reachBinlogStart) {
                snapshotRecords.add(record);
How do you handle the delete and update events?
IMHO, the snapshot stage for a table without a primary key only reads snapshot data; binlog data would be read only in the incremental stage. So there are no delete or update events in SnapshotSplitReader if the table has no primary key. Please correct me if I am wrong.
If I understand right, you want to skip the snapshot backfill task.
It is a good idea to skip the snapshot backfill task when there is a single split. I think you should set this in the source, but I do not see the relevant changes.
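A rough sketch of what skipping the backfill might look like; the helper below is hypothetical and only illustrates the decision, not the connector's actual reader API.

    // Hypothetical helper; not the real connector API.
    final class BackfillDecision {

        /**
         * A split with no start and no end bound is the single split that covers
         * the whole table (the case for tables without a primary key discussed here).
         */
        static boolean isSingleWholeTableSplit(Object[] splitStart, Object[] splitEnd) {
            return splitStart == null && splitEnd == null;
        }

        /**
         * The backfill phase normally re-reads the binlog between the low and high
         * watermarks to correct the snapshot chunk. For the single whole-table split,
         * binlog data is read only in the incremental stage, so the backfill read can
         * be skipped.
         */
        static boolean shouldSkipBackfill(Object[] splitStart, Object[] splitEnd) {
            return isSingleWholeTableSplit(splitStart, splitEnd);
        }
    }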
Resolve #1827