Contributor

@luoyuxia luoyuxia commented May 12, 2025

Purpose

Linked issue: close #438

Brief change log

  • Implement PaimonLakeTieringFactory to create PaimonLakeWriter and PaimonLakeCommitter
  • PaimonLakeWriter/PaimonLakeCommitter use the Paimon SDK to write and commit to Paimon

Note:

  • This PR uses the Paimon SDK to write row by row. Converting Arrow to Parquet is left to a future PR, where we can call TableWrite#writeBundle(BinaryRow partition, int bucket, BundleRecords bundle) to write Fluss Arrow batches to Paimon directly.
  • This PR only considers a single partition. Support for multiple partitions is tracked in "Paimon tiering factory should support multiple partitions" (#953).
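
The write path described above (a writer that takes rows one at a time, and a committer that turns the collected write results into a commit) can be sketched roughly as follows. This is a minimal in-memory sketch, not the PR's implementation: `RowWriter`, `Committer`, `BufferingWriter`, and `InMemoryCommitter` are hypothetical names, rows are plain strings, and no Paimon SDK call is made.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: hypothetical, simplified stand-ins for the writer/committer pair. */
final class TieringFlowSketch {

    /** Hypothetical writer contract: rows arrive one at a time. */
    interface RowWriter<R> {
        void write(String row);

        R complete();
    }

    /** Hypothetical committer contract: commit returns the ID of the new snapshot. */
    interface Committer<R, C> {
        C toCommittable(List<R> writeResults);

        long commit(C committable);
    }

    /** Buffers rows in memory; a real writer would hand each row to the lake SDK. */
    static final class BufferingWriter implements RowWriter<List<String>> {
        private final List<String> buffer = new ArrayList<>();

        @Override
        public void write(String row) {
            buffer.add(row);
        }

        @Override
        public List<String> complete() {
            return new ArrayList<>(buffer);
        }
    }

    /** Appends committed rows to an in-memory "table" and counts snapshots from 1. */
    static final class InMemoryCommitter implements Committer<List<String>, List<String>> {
        private final List<String> table = new ArrayList<>();
        private long lastSnapshotId = 0;

        @Override
        public List<String> toCommittable(List<List<String>> writeResults) {
            List<String> merged = new ArrayList<>();
            for (List<String> result : writeResults) {
                merged.addAll(result);
            }
            return merged;
        }

        @Override
        public long commit(List<String> committable) {
            table.addAll(committable);
            return ++lastSnapshotId;
        }

        List<String> committedRows() {
            return new ArrayList<>(table);
        }
    }

    /** Writes every row through one writer, then commits the single write result. */
    static long tier(List<String> rows, InMemoryCommitter committer) {
        BufferingWriter writer = new BufferingWriter();
        for (String row : rows) {
            writer.write(row);
        }
        List<List<String>> results = new ArrayList<>();
        results.add(writer.complete());
        return committer.commit(committer.toCommittable(results));
    }
}
```

A bundle-based writer, as mentioned in the note, would presumably replace the per-row `write` loop with one call per batch while keeping the same complete-then-commit shape.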

Tests

PaimonTieringTest to verify the logic of PaimonLakeWriter/PaimonLakeCommiter

API and Format

Documentation

@luoyuxia luoyuxia force-pushed the support-paimon-write-final branch from 0a35f68 to cc2c383 Compare May 12, 2025 07:40
@luoyuxia luoyuxia requested a review from Copilot May 12, 2025 08:43

Copilot AI left a comment

Pull Request Overview

This PR introduces support for Paimon tiering in Fluss, providing a new tiering factory implementation along with writers, committers, serializers, and corresponding tests.

  • Added generic type support to the LakeTieringFactory interface and its usage in PaimonLakeStorage.
  • Implemented classes for writing and committing Paimon data (e.g., PaimonLakeWriter, MergeTreeWriter, AppendOnlyWriter) along with serializers (PaimonWriteResultSerializer, PaimonCommittableSerializer).
  • Added comprehensive tests for Paimon record conversions and tiering functionality (PaimonTieringTest, FlussRecordAsPaimonRowTest).

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| fluss-server/src/test/java/com/alibaba/fluss/server/lakehouse/TestingPaimonStoragePlugin.java | Updated method signature to return a generic LakeTieringFactory. |
| fluss-lake/fluss-lake-paimon/src/test/java/com/alibaba/fluss/lake/paimon/tiering/PaimonTieringTest.java | Added tests for writing and committing records with Paimon tiering. |
| fluss-lake/fluss-lake-paimon/src/test/java/com/alibaba/fluss/lake/paimon/tiering/FlussRecordAsPaimonRowTest.java | Added tests for verifying the conversion of Fluss records to Paimon rows. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/utils/PaimonConversions.java | Introduced new conversion utilities for mapping between Fluss and Paimon constructs. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/mergetree/MergeTreeWriter.java | Implemented a record writer for Paimon's primary-key table using merge tree strategy. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/append/AppendOnlyWriter.java | Implemented a record writer for Paimon's append-only table. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/RecordWriter.java | Defined a common abstract base for writing records to Paimon. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/PaimonWriteResultSerializer.java | Added serializer for write results. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/PaimonWriteResult.java | Introduced a write result class to convey commit information. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/PaimonLakeWriter.java | Provided an implementation of LakeWriter for Paimon tiering. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/PaimonLakeTieringFactory.java | Added the Paimon tiering factory implementation. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/PaimonLakeCommitter.java | Implemented the LakeCommitter for Paimon with commit logic. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/PaimonCommittableSerializer.java | Added serializer for committable objects. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/PaimonCommittable.java | Introduced a committable class to encapsulate manifest commit information. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/PaimonCatalogProvider.java | Added a provider for creating Paimon catalogs. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/tiering/FlussRecordAsPaimonRow.java | Provided an adapter to wrap Fluss records as Paimon InternalRow instances. |
| fluss-lake/fluss-lake-paimon/src/main/java/com/alibaba/fluss/lake/paimon/PaimonLakeStorage.java | Updated to return a proper LakeTieringFactory for Paimon. |
| fluss-common/src/main/java/com/alibaba/fluss/lakehouse/writer/LakeTieringFactory.java | Extended the interface to support generic types and include CommitterInitContext. |
| fluss-common/src/main/java/com/alibaba/fluss/lakehouse/lakestorage/LakeStorage.java | Updated LakeStorage to use a generic LakeTieringFactory. |
| fluss-common/src/main/java/com/alibaba/fluss/lakehouse/committer/CommitterInitContext.java | Introduced context for committer initialization. |
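
The file summary lists serializers for write results and committables. As a rough illustration of what a versioned serializer of this kind does, here is a minimal round-trip sketch. Everything in it is an assumption for illustration: the `WriteResult` payload (a bucket and a file name), the version constant, and the method shapes are hypothetical and are not taken from PaimonWriteResultSerializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Sketch only: a hypothetical versioned serializer for a simplified write result. */
final class VersionedSerializerSketch {

    /** Hypothetical payload; the real write result conveys Paimon commit information. */
    static final class WriteResult {
        final int bucket;
        final String fileName;

        WriteResult(int bucket, String fileName) {
            this.bucket = bucket;
            this.fileName = fileName;
        }
    }

    /** Version written alongside the bytes so that old payloads can still be recognized. */
    static final int CURRENT_VERSION = 1;

    static byte[] serialize(WriteResult result) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(result.bucket);
            out.writeUTF(result.fileName);
        }
        return bytes.toByteArray();
    }

    static WriteResult deserialize(int version, byte[] data) throws IOException {
        // Reject versions this sketch does not know how to read.
        if (version != CURRENT_VERSION) {
            throw new IOException("Unsupported version: " + version);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new WriteResult(in.readInt(), in.readUTF());
        }
    }
}
```

Passing the version to `deserialize` separately from the payload lets the format change later without breaking readers of older bytes.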

@luoyuxia luoyuxia force-pushed the support-paimon-write-final branch from cc2c383 to c0bad6c Compare May 12, 2025 08:47
@luoyuxia luoyuxia marked this pull request as ready for review May 12, 2025 08:50
@luoyuxia luoyuxia requested review from leonardBang and wuchong May 12, 2025 08:59
@luoyuxia luoyuxia force-pushed the support-paimon-write-final branch 3 times, most recently from ef3db1d to 0eaa752 Compare May 30, 2025 08:00
@luoyuxia luoyuxia force-pushed the support-paimon-write-final branch from 0eaa752 to 4f06f60 Compare May 30, 2025 08:25
 * @throws IOException if an I/O error occurs
 */
- void commit(CommittableT committable) throws IOException;
+ long commit(CommittableT committable) throws IOException;
Contributor

Do all lake formats use a long type for the snapshot ID? Otherwise, a generic type is recommended.

Contributor Author

Yes. In the previous investigation, all known lake formats (Paimon, Iceberg, Delta, Hudi) can use a long type to represent a snapshot or a similar concept, such as the timeline in Hudi.
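
The reply above can be illustrated with a small sketch. It assumes, hypothetically, that a format with a counter-like snapshot ID passes it through unchanged, and that a timeline-style instant is a digit-only timestamp string that fits in a long. Neither helper is part of Fluss or of any lake format's API.

```java
/** Sketch only: hypothetical helpers showing how different identifiers can map to a long. */
final class SnapshotIdSketch {

    /** Counter-like snapshot IDs or table versions are already long values. */
    static long fromCounter(long snapshotId) {
        return snapshotId;
    }

    /**
     * Assumption: a timeline instant is a digit-only timestamp string such as
     * "20250512084700123" (17 digits), which is well within the range of a long.
     */
    static long fromInstant(String instant) {
        return Long.parseLong(instant);
    }
}
```

Under these assumptions a single `long` return type on `commit` covers both cases, which is the trade-off the reply makes against a generic type parameter.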

}

@Override
public long commit(PaimonCommittable committable) throws IOException {
Contributor

For successful commits, I didn't find the logic that notifies the Fluss coordinator. Could you clarify this?

Contributor Author

A follow-up PR will introduce a committer operator for Flink, which will notify the Fluss coordinator after calling the commit method.
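
A minimal sketch of the commit-then-notify ordering described in this reply, under stated assumptions: `LakeCommitter` here is a simplified one-method stand-in for the real interface, and `CoordinatorNotifier`, `reportCommitted`, and `commitAndNotify` are hypothetical names. The actual committer operator belongs to a later PR and may look quite different.

```java
import java.io.IOException;

/** Sketch only: notify the coordinator after, and only after, a successful commit. */
final class CommitThenNotifySketch {

    /** Simplified stand-in: commit returns the committed snapshot ID as a long. */
    interface LakeCommitter<C> {
        long commit(C committable) throws IOException;
    }

    /** Hypothetical callback used to report a committed snapshot to the coordinator. */
    interface CoordinatorNotifier {
        void reportCommitted(long tableId, long snapshotId);
    }

    static <C> long commitAndNotify(
            long tableId,
            C committable,
            LakeCommitter<C> committer,
            CoordinatorNotifier notifier) throws IOException {
        // If commit throws, the notifier is never called, so the coordinator
        // only ever hears about snapshots that were actually committed.
        long snapshotId = committer.commit(committable);
        notifier.reportCommitted(tableId, snapshotId);
        return snapshotId;
    }
}
```

Returning the snapshot ID from `commit` is what makes this ordering simple: the caller receives exactly the value it needs to report.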

Contributor

@leonardBang leonardBang left a comment

Thanks @luoyuxia for the clarification, +1

@leonardBang leonardBang merged commit 1afdcb0 into apache:main Jun 6, 2025
5 of 6 checks passed
polyzos pushed a commit to polyzos/fluss that referenced this pull request Jun 6, 2025
luoyuxia added a commit to luoyuxia/fluss that referenced this pull request Jun 11, 2025
ZmmBigdata pushed a commit to ZmmBigdata/fluss that referenced this pull request Jun 20, 2025
polyzos pushed a commit to polyzos/fluss that referenced this pull request Aug 30, 2025
polyzos pushed a commit to Alibaba-HZY/fluss that referenced this pull request Aug 31, 2025

Development

Successfully merging this pull request may close these issues.

Implement Paimon writer related interface

2 participants