[Improvement]: Clarify the concept of watermark and optimize its implementation #1805
Comments
Before starting the improvement, I would like to collect some information about the table watermark.
What is a table watermark? A watermark is a time-based property of a table that indicates the table's freshness: data older than this time has already been written to the table.
What is the table watermark used for?
How is a table watermark generated? The watermark changes as data is written, so it is usually generated and recorded by the table ingestion job. It is generally generated only in streaming ingestion jobs, because it is difficult to guarantee the semantics of a watermark in batch write jobs. Here are some implementation methods for different table formats:
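Independently of any particular table format, the general idea of generating a watermark inside a streaming ingestion job can be sketched as below. This is only an illustration under stated assumptions, not the implementation used by any of the formats discussed here: `WatermarkTracker`, `next_watermark`, and the `allowed_lateness_ms` parameter are hypothetical names, and deriving the watermark as "maximum observed event time minus an allowed lateness" is one common convention rather than something this issue prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatermarkTracker:
    """Hypothetical helper that tracks event times seen by one ingestion job."""

    # Assumption: records may arrive at most this many milliseconds out of order.
    allowed_lateness_ms: int = 0
    max_event_time_ms: Optional[int] = None

    def observe(self, event_time_ms: int) -> None:
        """Record the event time of a row that is about to be written."""
        if self.max_event_time_ms is None or event_time_ms > self.max_event_time_ms:
            self.max_event_time_ms = event_time_ms

    def watermark_ms(self) -> Optional[int]:
        """Return the candidate watermark, or None if no row has been observed."""
        if self.max_event_time_ms is None:
            return None
        return self.max_event_time_ms - self.allowed_lateness_ms


def next_watermark(previous_ms: Optional[int], tracker: WatermarkTracker) -> Optional[int]:
    """Pick the watermark to record with a commit; it never moves backwards."""
    candidate = tracker.watermark_ms()
    if candidate is None:
        return previous_ms
    if previous_ms is None:
        return candidate
    return max(previous_ms, candidate)
```

In this sketch the job would call `observe` for every row it writes and `next_watermark` once per commit, storing the result alongside the commit metadata. Keeping the value monotonic is what lets readers treat it as a freshness bound; a batch job that backfills old data has no comparable ordering guarantee, which is consistent with the point above about batch writes.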
What should we improve?
It seems that the Iceberg community has discussed watermarks in considerably more depth. Here are my conclusions:
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Search before asking
What would you like to be improved?
We have implemented watermark calculation and display for Mixed format tables.
As #944 discusses, the current implementation has some issues that should be fixed.
In addition, as we support more table formats, we should consider better ways to support the watermark on other format tables.
How should we improve?
Are you willing to submit PR?
Subtasks
No response
Code of Conduct