Flink: Refactor to replace Row type with RowData type in write path. #1305

@openinx

We have upgraded Flink to 1.11, and Flink 1.11 has changed its Row data type to RowData. The Parquet/Avro readers and writers we developed earlier were based on the Row type. Since then, @JingsongLi has contributed the RowData Avro reader and writer (#1232), @chenjunjiedada is contributing the RowData Parquet reader (#1266) and writer (#1272), and I've pushed a RowData ORC reader and writer (#1255) for review.

IMO, we should replace Row with RowData in the flink module as soon as possible, so that we can unify the code paths and focus all resources (both development and review) on the RowData path. My plan is:

  1. The patch for the Flink IcebergStreamWriter (Implement the flink stream writer to accept the row data and emit the complete data files event to downstream #1145) has been reviewed and is ready to merge, so we get it into the master branch first.
  2. The Flink TaskWriter unit tests currently run against a Row partition key, so before switching to RowData we need to implement a RowData partition key. I've prepared the RowDataWrapper patch for this (Flink: support to RowData partition. #1299). Getting this patch merged is the second step.
  3. We then need a separate patch that does the refactoring to replace every use of the Row type with RowData (I have implemented one in my own branch, 2af37c5) and makes sure all unit tests pass. From that point on, all Flink development and unit tests will use RowData.
  4. The upcoming RowData Parquet/ORC readers and writers will be added to the TaskWriter tests.
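
To illustrate the idea behind step 2, here is a rough sketch of the wrapper pattern: a single reusable adapter exposes a row's fields by position so that a partition key can be computed from it. This is not the code from #1299. `PositionalStruct`, `ArrayRowWrapper`, and `PartitionKeySketch` are hypothetical names, the interface is a simplified stand-in rather than Iceberg's real struct interface, and the row is modeled as a plain `Object[]` instead of Flink's RowData so that the example compiles without any dependencies.

```java
import java.util.Objects;

// Hypothetical stand-in for a positional struct accessor; not Iceberg's real interface.
interface PositionalStruct {
  int size();

  <T> T get(int pos, Class<T> javaClass);
}

// Reusable wrapper: one instance is re-pointed at each incoming row,
// so reading partition fields does not allocate a new adapter per record.
final class ArrayRowWrapper implements PositionalStruct {
  private Object[] row; // stand-in for a Flink RowData instance (assumption)

  ArrayRowWrapper wrap(Object[] newRow) {
    this.row = Objects.requireNonNull(newRow, "row");
    return this;
  }

  @Override
  public int size() {
    return row.length;
  }

  @Override
  public <T> T get(int pos, Class<T> javaClass) {
    // Class.cast returns null for null fields and throws on a type mismatch.
    return javaClass.cast(row[pos]);
  }
}

final class PartitionKeySketch {
  private PartitionKeySketch() {
  }

  // Builds an identity-partition path segment such as "dt=2020-08-07" from one field.
  static String identityPartition(PositionalStruct struct, String name, int pos) {
    return name + "=" + struct.get(pos, Object.class);
  }
}
```

In this sketch a writer would hold one `ArrayRowWrapper` and call `wrap(row)` before reading partition fields, which keeps the per-record path free of extra adapter objects.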
