[HUDI-5095] Flink: Stores a special watermark(flag) to identify the current progress of writing data #8753
Can you elaborate a little more what is the end-2-end user story here? |
@danny0405, in the incremental data warehouse scenario, whether a partition has finished writing depends on the write progress. That progress is used to commit the partition and trigger the downstream task that reads the committed partition data. The watermark identifies the current write progress, so the committed partition can be inferred from it. |
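As a rough illustration of the inference described in this comment, here is a minimal sketch. It assumes one partition per UTC calendar day and treats a partition as complete once the stored watermark has reached the start of the next day. The class and method names are hypothetical and are not part of this PR.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper, not part of Hudi: decides whether a daily partition
// can be considered fully written, given the watermark read from a commit.
class PartitionInference {

    // Assumption: one partition per UTC calendar day, watermark in epoch millis.
    static boolean isComplete(LocalDate partitionDay, long watermarkMillis) {
        // First instant that no longer belongs to the partition.
        long partitionEndMillis = partitionDay.plusDays(1)
            .atStartOfDay(ZoneOffset.UTC)
            .toInstant()
            .toEpochMilli();
        // Conservative rule: complete only when the watermark has reached
        // the end of the partition's day.
        return watermarkMillis >= partitionEndMillis;
    }
}
```

With this rule, a watermark of 2023-05-02T00:00:00Z marks partition 2023-05-01 as complete, while partition 2023-05-02 is still in progress. A downstream job could poll the latest commit and trigger reads only for partitions that pass this check.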
Can you sketch the inference details a little more? |
@danny0405,
BTW, the main purpose was also mentioned in #7099. The difference is in the implementation: the current implementation follows Flink's watermark mechanism. |
@danny0405, could you help to review this implementation? |
Change Logs
AbstractStreamWriteFunction can process the watermark from StreamWriteOperator and send a WriteMetadataEvent carrying the write function's watermark to StreamWriteOperatorCoordinator.
StreamWriteOperatorCoordinator stores the min watermark of the WriteMetadataEvents from subtasks into the extra metadata of HoodieCommitMetadata. Meanwhile, StreamWriteOperatorCoordinator advances the min watermark for subtasks in which no data has been written, and does not advance the watermark for a commit on an empty batch.
Impact
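A minimal sketch of the coordinator-side aggregation from the change log, written as a simplified stand-in rather than the actual Hudi classes. It reads "advances the min watermark for subtasks in which no data has been written" as "idle subtasks do not hold back the minimum", which is an interpretation. The metadata key "watermark", the SubtaskEvent type, and the method names are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the coordinator logic; not the Hudi implementation.
class WatermarkAggregator {
    // Assumed key name; the key used by the PR is not shown in this description.
    static final String WATERMARK_KEY = "watermark";

    // Simplified stand-in for the watermark carried by a WriteMetadataEvent.
    static final class SubtaskEvent {
        final long watermark;
        final boolean wroteData;

        SubtaskEvent(long watermark, boolean wroteData) {
            this.watermark = watermark;
            this.wroteData = wroteData;
        }
    }

    private long lastWatermark = Long.MIN_VALUE;

    // Builds the extra metadata to attach to the commit for one batch of events.
    Map<String, String> extraMetadataForCommit(List<SubtaskEvent> events) {
        long min = Long.MAX_VALUE;
        boolean anyData = false;
        for (SubtaskEvent e : events) {
            // Subtasks that wrote nothing are skipped, so they cannot hold the minimum back.
            if (e.wroteData) {
                anyData = true;
                min = Math.min(min, e.watermark);
            }
        }
        // Empty batch: keep the previous watermark. Otherwise advance, never moving backwards.
        if (anyData) {
            lastWatermark = Math.max(lastWatermark, min);
        }
        Map<String, String> extra = new HashMap<>();
        if (lastWatermark != Long.MIN_VALUE) {
            extra.put(WATERMARK_KEY, Long.toString(lastWatermark));
        }
        return extra;
    }
}
```

Taking the minimum across writing subtasks keeps the stored value conservative: a reader never sees a watermark ahead of the slowest subtask that contributed data to the commit.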
StreamWriteOperatorCoordinator stores the min watermark of the write function into the extra metadata of HoodieCommitMetadata to identify the current progress of writing.
Risk level (write none, low medium or high below)
None.
Documentation Update
None.