RFC: Decouple iceberg sink commit from risingwave checkpoint #78
![Write parquet every checkpoint](images/0078-iceberg-sink-decouple-checkpoint/write_parquet_per_cp.svg)
While this method is simple enough, we may still run into the small-file problem if checkpoints are frequent. We can further decouple flushing parquet files from checkpointing: instead of flushing a parquet file at every checkpoint, we flush a parquet row group. The following diagram illustrates the process:
This can work only if the row group boundary can be completely controlled by ourselves. In other words, we must be able to close a row group immediately when the iceberg sink receives the checkpoint barrier. Is this already supported in icelake?
![Write row group every checkpoint](images/0078-iceberg-sink-decouple-checkpoint/write_row_group_per_cp.svg)
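The row-group-per-checkpoint idea can be sketched as below. This is a minimal, in-memory illustration only: `ParquetFileSketch`, `RowGroupPerCheckpointWriter`, and their methods are hypothetical names, not icelake or RisingWave APIs, and a "row group" here is just a Python list rather than real parquet bytes. It assumes the writer fully controls where a row group ends.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParquetFileSketch:
    """Toy stand-in for a parquet file: a list of closed row groups."""
    path: str
    row_groups: List[List[dict]] = field(default_factory=list)
    closed: bool = False


class RowGroupPerCheckpointWriter:
    """Hypothetical writer: one row group per checkpoint, one file per commit."""

    def __init__(self) -> None:
        self._file_seq = 0
        self._current = self._open_file()
        self._buffer: List[dict] = []  # rows not yet in a closed row group

    def _open_file(self) -> ParquetFileSketch:
        self._file_seq += 1
        return ParquetFileSketch(path=f"data-{self._file_seq:05d}.parquet")

    def write(self, row: dict) -> None:
        self._buffer.append(row)

    def on_checkpoint_barrier(self) -> None:
        # Close the current row group; the file itself stays open.
        if self._buffer:
            self._current.row_groups.append(self._buffer)
            self._buffer = []

    def on_commit(self) -> List[ParquetFileSketch]:
        # Close the last row group, then the file, and hand it to the commit.
        self.on_checkpoint_barrier()
        if not self._current.row_groups:
            return []  # nothing was written since the last commit
        finished = self._current
        finished.closed = True
        self._current = self._open_file()
        return [finished]


writer = RowGroupPerCheckpointWriter()
for checkpoint in range(3):
    for i in range(2):
        writer.write({"checkpoint": checkpoint, "i": i})
    writer.on_checkpoint_barrier()
files = writer.on_commit()
print(len(files), len(files[0].row_groups))
```

Three checkpoints produce a single file with three row groups instead of three small files, which is the point of decoupling the file flush from the checkpoint.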
There is a chance we don't even need to flush a row group. For example, we could save the record sequence id of the current row in the log store and skip flushing the row group, but I would rather not introduce too much dependency on such characteristics of the log store, since that makes things more complicated. One row group only adds one record to parquet's `FileMetaData`, and it has no impact on other readers of the parquet file.
> save the flushed file paths into state table.

I think we can record the current write position in the state table instead of in the log store.
Yes, we will commit to meta store.
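The suggestion in this thread (record the flushed file paths and the current write position in the state table, made durable with the checkpoint) could look roughly like the sketch below. It is an assumption-laden toy: `SinkProgress`, `StateTableSketch`, and `rows_to_replay` are hypothetical names, the "state table" is an in-memory object, and the write position is modeled as a plain row offset into the log.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SinkProgress:
    """What the sink would persist: flushed files plus the write position."""
    flushed_files: Tuple[str, ...] = ()
    write_position: int = 0  # number of log rows already written out


class StateTableSketch:
    """Toy state table: staged progress becomes durable only at checkpoint."""

    def __init__(self) -> None:
        self._committed = SinkProgress()
        self._staged: Optional[SinkProgress] = None

    def stage(self, progress: SinkProgress) -> None:
        self._staged = progress

    def checkpoint(self) -> None:
        if self._staged is not None:
            self._committed = self._staged
            self._staged = None

    def recover(self) -> SinkProgress:
        # Anything staged after the last checkpoint is lost on failure.
        self._staged = None
        return self._committed


def rows_to_replay(log: list, progress: SinkProgress) -> list:
    """Rows the sink must write again after recovery."""
    return log[progress.write_position:]


table = StateTableSketch()
log = list(range(10))
table.stage(SinkProgress(("data-00001.parquet",), 4))
table.checkpoint()
table.stage(SinkProgress(("data-00001.parquet",), 7))  # never checkpointed
progress = table.recover()
print(progress.write_position, rows_to_replay(log, progress))
```

After recovery the sink resumes from the last checkpointed position (4 here), not from the staged but never checkpointed position 7.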
1. This will increase failure recovery time for the iceberg sink. For example, when the commit interval is set to 30 minutes and the sink fails in the 29th minute, we will need to replay all data from the first 29 minutes.
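The arithmetic behind this example can be written down as a small helper. It is a simplified model, assuming commits land exactly at multiples of the commit interval and that everything since the last commit must be replayed; `replay_window_minutes` is a hypothetical name used only for illustration.

```python
def replay_window_minutes(commit_interval_min: int, failure_minute: int) -> int:
    """Minutes of data to replay if the sink fails at `failure_minute`.

    Assumes a commit happens at every multiple of `commit_interval_min`,
    so only data written since the most recent commit is replayed.
    """
    if commit_interval_min <= 0:
        raise ValueError("commit interval must be positive")
    if failure_minute < 0:
        raise ValueError("failure minute must be non-negative")
    return failure_minute % commit_interval_min


print(replay_window_minutes(30, 29))  # failure just before the first commit
print(replay_window_minutes(30, 45))  # failure halfway through the second interval
```

Under this model the worst case approaches one full commit interval, so the commit interval directly bounds the replay cost.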
#### Approach 2
I feel this approach will become much more complicated if we want to support updates/deletes in the Iceberg sink. Say a deleted row is in a previous row group rather than in a previous iceberg version: how do we handle it?
It's a little more complicated, but not that much. We can further discuss this later when we want to take approach 2.
In this case, to avoid small files, maybe we don't need to flush data into parquet files at every checkpoint. 🤔
Pros of this approach:

1. Easier to implement.
This is a key benefit of this solution. I suggest adding fewer optimizations at this point. We need to test stability against real-life workloads first, so please don't over-design.
Agree, let's solve the most urgent problem first, and refine it when necessary.
Conclusion: we will take approach 1.