Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-34688][cdc-connector] CDC framework split snapshot chunks asynchronously #3510

Merged
merged 4 commits into from
Dec 17, 2024

Conversation

loserwang1024
Copy link
Contributor

As shown in https://issues.apache.org/jira/browse/FLINK-34688 :
In Mysql CDC, MysqlSnapshotSplitAssigner splits snapshot chunks asynchronously(#931). But CDC framework lacks it.
If table is too big to split, the enumerator will be stuck, and checkpoint will be influenced( sometime will checkpoint timeout occurs).

@loserwang1024
Copy link
Contributor Author

@leonardBang leonardBang requested a review from GOODBOY008 August 6, 2024 07:23
@leonardBang
Copy link
Contributor

Thanks @loserwang1024 for the improvement, @GOODBOY008 would you like to help review this PR when you have time>?

Comment on lines 30 to +63
public interface ChunkSplitter {

/**
* Called to open the chunk splitter to acquire any resources, like threads or jdbc connections.
*/
void open();

/** Generates all snapshot splits (chunks) for the give data collection. */
Collection<SnapshotSplit> generateSplits(TableId tableId);
Collection<SnapshotSplit> generateSplits(TableId tableId) throws Exception;

/** Get whether the splitter has more chunks for current table. */
boolean hasNextChunk();

/**
* Creates a snapshot of the state of this chunk splitter, to be stored in a checkpoint.
*
* <p>This method takes the ID of the checkpoint for which the state is snapshotted. Most
* implementations should be able to ignore this parameter, because for the contents of the
* snapshot, it doesn't matter for which checkpoint it gets created. This parameter can be
* interesting for source connectors with external systems where those systems are themselves
* aware of checkpoints; for example in cases where the enumerator notifies that system about a
* specific checkpoint being triggered.
*
* @param checkpointId The ID of the checkpoint for which the snapshot is created.
* @return an object containing the state of the split enumerator.
*/
ChunkSplitterState snapshotState(long checkpointId);

TableId getCurrentSplittingTableId();

/**
* Called to open the chunk splitter to release any resources, like threads or jdbc connections.
*/
void close() throws Exception;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @loserwang1024 for this great work.

I have a small suggestion for the ChunkSplitter interface.
The methods open, hasNextChunk, close, getCurrentSplittingTableId, close are more like those of an Iterator or Cursor. Perhaps ChunkSplitter.generateSplits could be designed to open a Cursor, which can unify the iteration logic and support both one-time splitting and partial splitting. By using cursors, we might be able to support simultaneous spliting of multiple tables.

However, maintaining the state might become complex, as we need to keep track of the state of all open cursors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, It seem more flexible to support simultaneous splitting of multiple tables..However, we should maintain each table's spitter progress information in state. It may be rather than heavy.

@GOODBOY008
Copy link
Member

@loserwang1024 Please rebase to master and solove confilcts.

@loserwang1024
Copy link
Contributor Author

@loserwang1024 Please rebase to master and solove confilcts.

@GOODBOY008 done it, and all the tests are passed.

@leonardBang
Copy link
Contributor

@loserwang1024 Could you kindly rebase your PR to latest master branch to resolve potential conflicts?

@loserwang1024
Copy link
Contributor Author

@loserwang1024 Could you kindly rebase your PR to latest master branch to resolve potential conflicts?

done it

} else if (!remainingTables.isEmpty()) {
try {
// wait for the asynchronous split to complete
lock.wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lock has already used the synchronized keyword,wait/notify doesn't seem necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use wait to release this lock. Then the lock can be gotten by other thread.

Copy link
Contributor

@ruanhang1993 ruanhang1993 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ruanhang1993 ruanhang1993 merged commit 12cf22f into apache:master Dec 17, 2024
24 checks passed
ChaomingZhangCN pushed a commit to ChaomingZhangCN/flink-cdc that referenced this pull request Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants