
RFC: Decouple iceberg sink commit from risingwave checkpoint #78

Merged · 5 commits · May 21, 2024

Conversation

liurenjie1024 (Contributor) commented Nov 21, 2023

@liurenjie1024 liurenjie1024 marked this pull request as draft November 21, 2023 10:51
@liurenjie1024 liurenjie1024 marked this pull request as ready for review November 23, 2023 03:09

![Write parquet every checkpoint](images/0078-iceberg-sink-decouple-checkpoint/write_parquet_per_cp.svg)

While this method is simple enough, we may still experience small-file problems if we checkpoint frequently. We can further decouple flushing parquet files from checkpoints: instead of flushing whole parquet files at every checkpoint, we flush parquet row groups. The following diagram illustrates the process:

![Write row group every checkpoint](images/0078-iceberg-sink-decouple-checkpoint/write_row_group_per_cp.svg)
A reviewer commented:
This can work only if the row group boundary can be completely controlled by ourselves. In other words, we must be able to close a row group immediately when the iceberg sink receives the checkpoint barrier. Is this already supported in icelake?

liurenjie1024 (author) replied:



There are cases where we don't even need to flush a row group. For example, we could save the record sequence id of the current row in the log store and skip flushing the row group entirely, but I don't want to introduce too much dependency on such characteristics of the log store and complicate things. An extra row group only adds a record to parquet's `FileMetaData`, and it has no impact on other readers of the parquet file.
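For concreteness, here is a minimal sketch of forcing a row-group boundary at a checkpoint barrier with the Rust `parquet` crate's `ArrowWriter` (a generic illustration, not icelake's actual API; `flush()` closes the in-progress row group without closing the file):

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::Int64Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int64, false)]));
    let file = File::create("data.parquet")?;
    let mut writer = ArrowWriter::try_new(file, schema.clone(), None)?;

    // Between checkpoints: writes are buffered in the current row group.
    let batch = RecordBatch::try_new(
        schema,
        vec![Arc::new(Int64Array::from(vec![1, 2, 3]))],
    )?;
    writer.write(&batch)?;

    // On a checkpoint barrier: close the current row group without
    // closing the file. Each flush only adds one row-group entry to
    // the file's FileMetaData.
    writer.flush()?;

    // ... more batches and more checkpoint-triggered flushes ...

    // Once the file is large enough, close it to write the footer and
    // record its path for the next iceberg commit.
    writer.close()?;
    Ok(())
}
```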
A reviewer commented, quoting the RFC:

> save the flushed file paths into state table.

I think we can record the current write position in the state table instead of the log store.

liurenjie1024 (author) replied:

Yes, we will commit it to the meta store.
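As a sketch of what that persisted state could hold (field names are hypothetical, not RisingWave's actual schema), the sink would record the flushed-but-uncommitted file paths atomically with each checkpoint:

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical per-sink state, persisted atomically with each
/// RisingWave checkpoint (e.g. via the state table / meta store).
#[derive(Serialize, Deserialize)]
struct IcebergSinkState {
    /// Parquet files flushed to storage but not yet committed to the
    /// iceberg table; replayed into the next iceberg commit on recovery.
    pending_data_files: Vec<String>,
    /// Epoch of the last checkpoint covered by `pending_data_files`.
    last_checkpointed_epoch: u64,
    /// Epoch of the last successful iceberg commit; data up to this
    /// point is already visible to iceberg readers.
    last_committed_epoch: u64,
}
```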


1. This will increase failure recovery time for the iceberg sink. For example, if the commit interval is set to 30 minutes and the sink fails at the 29th minute, we will need to replay all data from the first 29 minutes.

#### Approach 2
A reviewer (member) commented:

I feel this approach will be much more complicated if we want to support updates/deletes on the iceberg sink. Say a deleted row is in a previous row group instead of a previous iceberg version; how do we handle that?

liurenjie1024 (author) replied:

It's a little more complicated, but not by much. We can discuss this further when we decide to take approach 2.

ZENOTME commented Nov 28, 2023, quoting the RFC:

> In this approach we don't modify the sink log writer, but modify the iceberg sink. Instead of committing all data files to the iceberg table at every checkpoint, we flush data into parquet files and save the flushed file paths into the state table. The following graph illustrates the case:

In this case, maybe we don't need to flush data into parquet files at every checkpoint, to avoid small files. 🤔
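For illustration, a hedged sketch of the barrier-handling loop for the quoted approach (all names hypothetical; `flush_parquet_file` and `commit_to_iceberg` stand in for the real write and commit paths, not icelake's actual API): the sink flushes at every checkpoint but commits to the iceberg table only once per interval:

```rust
/// Hypothetical committer for approach 1.
struct IcebergSink {
    pending_files: Vec<String>,
    checkpoints_since_commit: u32,
    commit_every_n_checkpoints: u32,
}

impl IcebergSink {
    fn on_checkpoint_barrier(&mut self, epoch: u64) -> anyhow::Result<()> {
        // 1. Make buffered data durable and remember the file path.
        //    (The path list is persisted in the state table as part of
        //    this same checkpoint, so recovery can re-commit it.)
        let path = self.flush_parquet_file(epoch)?;
        self.pending_files.push(path);
        self.checkpoints_since_commit += 1;

        // 2. Commit to iceberg only once per interval, producing far
        //    fewer snapshots than one per checkpoint.
        if self.checkpoints_since_commit >= self.commit_every_n_checkpoints {
            let files = std::mem::take(&mut self.pending_files);
            self.commit_to_iceberg(&files)?;
            self.checkpoints_since_commit = 0;
        }
        Ok(())
    }

    fn flush_parquet_file(&mut self, _epoch: u64) -> anyhow::Result<String> {
        unimplemented!("upload buffered rows as a parquet file; return its path")
    }

    fn commit_to_iceberg(&self, _files: &[String]) -> anyhow::Result<()> {
        unimplemented!("append the data files to the iceberg table in one transaction")
    }
}
```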


Pros of this approach:

1. Easier to implement.

A reviewer commented:

This is a key benefit of this solution. I suggest adding fewer optimizations at this point. We need to test the stability against real-life workloads, so please don't over-design.

liurenjie1024 (author) replied:

Agree, let's solve the most urgent problem first, and refine it when necessary.

liurenjie1024 (author) commented:

Conclusion: we will take approach 1.

fuyufjh merged commit e90b4f9 into main on May 21, 2024.