
[flink] Small changelog files can now be compacted into big files #4255

Open · wants to merge 1 commit into base: master
Conversation

@tsreaper (Contributor)

Purpose

Currently, changelog files are not compacted. If Flink's checkpoint interval is short (for example, 30 seconds) and the number of buckets is large, each snapshot may produce lots of small changelog files. Too many files may put a burden on the distributed storage cluster.

This PR introduces a new feature to compact small changelog files into large ones.

Tests

IT cases.

API and Format

Introduces a special file format for compacted changelogs.

Documentation

Documentation is also added.

@herefree (Contributor)

Will the compacted changelog be used when querying?

@tsreaper (Contributor, Author)

> Will the compacted changelog be used when querying?

Yes. Flink can still read these changelogs.

* Operator to compact several changelog files from the same partition into one file, in order to
* reduce the number of small files.
*/
public class ChangelogCompactOperator extends AbstractStreamOperator<Committable>
Contributor:

Why choose "add a compact operator" instead of "compact changelog files in the write operator"?

Contributor (Author):

How can you compact changelog files from all buckets into one file inside the write operator? Please suggest your solution.

Contributor:

Sorry, at first I guessed that changelogs are compacted per bucket, but after reading the code I discovered that they are compacted per partition. I forgot to delete this comment before submitting.

private void copyFile(
Path path, BinaryRow partition, int bucket, boolean isCompactResult, DataFileMeta meta)
throws Exception {
if (!outputStreams.containsKey(partition)) {
Contributor:

Could this be changed to outputStreams.computeIfAbsent?
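
A minimal sketch of that rewrite, reusing the outputPath construction from the surrounding code. Note that newOutputStream throws IOException, which a computeIfAbsent mapping function cannot throw directly, so it has to be wrapped:

// Hypothetical sketch: replacing the containsKey/put pair with computeIfAbsent.
OutputStream stream =
        outputStreams.computeIfAbsent(
                partition,
                p -> {
                    Path outputPath =
                            new Path(path.getParent(), "tmp-compacted-changelog-" + UUID.randomUUID());
                    try {
                        return new OutputStream(
                                outputPath, table.fileIO().newOutputStream(outputPath, false));
                    } catch (IOException e) {
                        // computeIfAbsent lambdas cannot propagate checked exceptions.
                        throw new UncheckedIOException(e);
                    }
                });

The checked exception wrapping is one reason the existing containsKey/put form may actually be the simpler choice here.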

outputStreams.put(
partition,
new OutputStream(
outputPath, table.fileIO().newOutputStream(outputPath, false)));
Contributor:

Why is overwrite true? If the file already exists, should an exception be thrown?

Contributor (Author):

There is no true in this line of code. What do you mean?

Contributor:

Why is overwrite false? If the file already exists, should an exception be thrown?

offset,
outputStream.out.getPos() - offset));

if (outputStream.out.getPos() >= table.coreOptions().targetFileSize(false)) {
Contributor:

Only primary key tables produce changelogs, so why is targetFileSize called with false here?

Contributor (Author):

Changelog files are not LSM tree data files. They're just a bunch of records, and they don't care what the primary keys are.

Contributor:

OK, you are right.

+ CompactedChangelogReadOnlyFormat.getIdentifier(
baseResult.meta.fileFormat())));

Map<Integer, List<Result>> grouped = new HashMap<>();
Contributor:

This map could be initialized with a capacity of bucketNum.
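
A minimal sketch of the suggestion, assuming the bucket count is available as bucketNum at this point. HashMap rehashes once its size exceeds capacity × load factor (0.75 by default), so holding bucketNum entries without rehashing needs a slightly larger initial capacity:

// Hypothetical sketch: pre-sizing the map to avoid rehashing.
Map<Integer, List<Result>> grouped = new HashMap<>((int) (bucketNum / 0.75f) + 1);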

}
}

CommitMessageImpl newMessage =
Contributor:

Like line 113, a comment could be added here:
// send commit message only with changelog files

throws Exception {
if (!outputStreams.containsKey(partition)) {
Path outputPath =
new Path(path.getParent(), "tmp-compacted-changelog-" + UUID.randomUUID());
Contributor:

private final String xxx = "tmp-compacted-changelog-";
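
A minimal sketch of the suggestion (the constant name is an assumption):

// Hypothetical sketch: extracting the magic prefix into a named constant.
private static final String TMP_CHANGELOG_PREFIX = "tmp-compacted-changelog-";

Path outputPath = new Path(path.getParent(), TMP_CHANGELOG_PREFIX + UUID.randomUUID());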

public static class OrcFactory extends AbstractFactory {

public OrcFactory() {
super("orc");
Contributor:

Use "OrcFileFormat.IDENTIFIER" instead.
Also parquet and avro.
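
That is, assuming each wrapped format class exposes such an identifier constant:

// Hypothetical sketch: referencing the format's identifier constant instead of a string literal.
public OrcFactory() {
    super(OrcFileFormat.IDENTIFIER);
}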

return "cc-" + wrappedFormat;
}

static class AbstractFactory implements FileFormatFactory {
Contributor:

This class could be declared private.


import java.util.List;

/** {@link FileFormat} for compacted changelog. */
Contributor:

Suggested wording: "for compacted changelog, which wraps a real FileFormat (orc / parquet / avro)".

Larger parallelism makes file copying faster, but it also produces more resulting files. As file copying is fast on most storage systems, we suggest starting with `'changelog.compact.parallelism' = '1'` and increasing the value only if needed.
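
A hedged usage sketch (the table name t is an assumption; ALTER TABLE ... SET is the usual way to change table options from Flink SQL):

// Hypothetical sketch: setting the option on an existing table from a Flink TableEnvironment.
TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
tEnv.executeSql("ALTER TABLE t SET ('changelog.compact.parallelism' = '1')");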
Contributor:

My idea is to have only one switch: changelog.precommit-compact = true.

We can add a coordinator node to this pipeline that decides how to concatenate changelog files into result files of the target file size, which may be one or multiple files.
