-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cdc-connector][mongodb] Avoid mongodb source to read data after high_watermark in backfill phase #2893
Conversation
@Jiabao-Sun , @Shawn-Hx , CC |
private boolean shouldEmit(ChangeStreamOffset currentOffset) { | ||
return !isBoundedRead() || currentOffset.isAtOrBefore(streamSplit.getEndingOffset()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code don't be called at other place. Maybe it shoud be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, I first use it to determine whether to emit, then think a unbounded stream read currentOffset is a waste, But I forget to remove it.
if (currentOffset.isAtOrBefore(streamSplit.getEndingOffset())) { | ||
queue.enqueue(new DataChangeEvent(changeRecord)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe currentOffset.isAtOrBefore
shoud be currentOffset.isBefore
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in another PR? Now other cdc source read also includes high_watermark, this PR just fix bug, not to change the common logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in another PR? Now other cdc source read also includes high_watermark, this PR just fix bug, not to change the common logic?
Could it cause read same data twice if use includes high_watermark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fix in #2885 for all connectors.But it is a big influence, still need to discuss more, for example, Whether to read skip high_watermark in backfill phase, or skip low_watermark in streaming phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fix in #2885 for all connectors.But it is a big influence, still need to discuss more, for example, Whether to read skip high_watermark, or streaming phase skip low_watermark.
ok
…_watermark in backfill phase
b8b5c24
to
d56945d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @loserwang1024 for this contribution and @gong for the review.
LGTM.
…_watermark in backfill phase (apache#2893)
…_watermark in backfill phase (apache#2893)
Fix #2892