Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit #10523

grantatspothero · 2024-06-17T16:12:15Z

Skips FastAppend manifest cleanup after successful commit if no retries have occurred, as no orphaned manifests could exist if no retries have occurred. This speeds up the happy path of commits by removing 2 unnecessary reads:

table metadata READ
manifest list READ

Context from slack thread: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1718381807647999

We are ingesting streaming data using a Java service that does Iceberg FastAppend.
We noticed that about 20% (YMMV) of the FastAppend commit time for our use case is spent on unnecessary cleanup operations, specifically this bit, which FastAppend inherits from SnapshotProducer:
https://github.com/apache/iceberg/blob/apache-iceberg-1.5.2/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L422-L439

Testing:

I manually tested this by running TestFastAppend.testRecoveryWithManifestList and verifying the cleanup bits are only run when a retry occurs.

Notes:

we do not skip cleanup operations on commit failures (see: cleanAll())
diff best viewed with ?w=1: https://github.com/apache/iceberg/pull/10523/files?w=1

core/src/main/java/org/apache/iceberg/SnapshotProducer.java

grantatspothero · 2024-06-17T23:55:58Z

Found a problem with the approach.

The following assumption does not hold for every SnapshotProducer:

no orphaned manifests could exist if no retries have occurred.

This is incorrect for MergingSnapshotProducer which merges manifests by writing the unmerged manifests, creating a new merged manifest, and marking the unmerged manifests for deletion. You can see this in MergingSnapshotProducer.apply()

See the new commit for the new approach. It requires SnapshotProducer subclasses to opt in to this optimization through a method like protected boolean canSkipCleanupAfterCommitSuccess(), which defaults to false. The suggestion is that only latency-sensitive operations like FastAppend override this method.
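
The opt-in described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Iceberg code: SketchProducer, SketchFastAppend, SketchMergeAppend, and the log field are hypothetical stand-ins, and only the method name canSkipCleanupAfterCommitSuccess() is taken from this comment.

```java
import java.util.ArrayList;
import java.util.List;

class OptInDemo {
  public static void main(String[] args) {
    SketchProducer fast = new SketchFastAppend();
    SketchProducer merge = new SketchMergeAppend();
    fast.commit();
    merge.commit();
    System.out.println(fast.log);  // prints [commit]
    System.out.println(merge.log); // prints [commit, cleanup]
  }
}

// Hypothetical stand-in for SnapshotProducer; not the real Iceberg class.
abstract class SketchProducer {
  final List<String> log = new ArrayList<>();

  // Defaults to false so every producer keeps cleaning up unless it opts in.
  protected boolean canSkipCleanupAfterCommitSuccess() {
    return false;
  }

  // Placeholder for the post-commit cleanup that costs the extra reads.
  protected void cleanUncommitted() {
    log.add("cleanup");
  }

  void commit() {
    log.add("commit");
    if (!canSkipCleanupAfterCommitSuccess()) {
      cleanUncommitted();
    }
  }
}

// Latency-sensitive producer that opts in to skipping the cleanup.
class SketchFastAppend extends SketchProducer {
  @Override
  protected boolean canSkipCleanupAfterCommitSuccess() {
    return true;
  }
}

// Merging producer keeps the default, since merged-away manifests must be deleted.
class SketchMergeAppend extends SketchProducer {}
```

Defaulting to false keeps the existing behavior for every producer that does not explicitly override the hook.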

findepi

LGTM, thanks!
Please remember to squash commits.

amogh-jahagirdar

This is incorrect for MergingSnapshotProducer which merges manifests by writing the unmerged manifests, creating a new merged manifest, and marking the unmerged manifests for deletion. You can see this in MergingSnapshotProducer.apply()

@grantatspothero Thank you for finding this, that's a good catch, I forgot about the merging manifest case. Did we have tests which fail with the original assumption implementation (due to not cleaning up) or is this something you got from reading the code? If we don't have tests which assert that the old manifests are removed after the merging of manifests, I'd advocate for adding tests for that case.

Also adding @rdblue for his input since he's had thoughts on eager cleanups, to make sure we're not missing anything here.

core/src/main/java/org/apache/iceberg/FastAppend.java

core/src/main/java/org/apache/iceberg/SnapshotProducer.java

grantatspothero · 2024-06-18T17:13:19Z

This is incorrect for MergingSnapshotProducer which merges manifests by writing the unmerged manifests, creating a new merged manifest, and marking the unmerged manifests for deletion. You can see this in MergingSnapshotProducer.apply()

@grantatspothero Thank you for finding this, that's a good catch, I forgot about the merging manifest case. Did we have tests which fail with the original assumption implementation (due to not cleaning up) or is this something you got from reading the code? If we don't have tests which assert that the old manifests are removed after the merging of manifests, I'd advocate for adding tests for that case.

Also adding @rdblue for his input since he's had thoughts on eager cleanups, to make sure we're not missing anything here.

Tests caught this issue, was helpful 🙌 (one example: TestHadoopCommits.testMergeAppend)

amogh-jahagirdar

Thank you @grantatspothero this looks great! I will wait before merging in case @rdblue has any concerns

rdblue · 2024-06-21T00:02:40Z

core/src/main/java/org/apache/iceberg/FastAppend.java

@@ -192,6 +192,11 @@ protected void cleanUncommitted(Set<ManifestFile> committed) {
    }
  }

+  @Override
+  protected boolean cleanupAfterCommit() {


While I think this is probably safe today, it seems like a change that is going to make the code more brittle because it isn't obvious when this should be overridden or how the attempts interact with the manifests that are written. For instance, if the behavior of writeNewManifests changes and creates intermediate work to clean up, how would someone know that this also needs to change?

I wonder if there's a better check here than number of attempts. What about checking whether the current set of manifests is valid and would be committed? return !hasNewFiles && newManifests != null?

cleanupAfterCommit runs after commit. At that point hasNewFiles=false and newManifests != null, so this check would not actually skip any work.

Looking into this further, I do not think any cleanup after commit is needed for FastAppend:

writeNewManifests already does the work of cleaning up newManifests. If no new files have been appended, it reuses the already written manifests. If new files have been appended, it deletes the old manifests and creates new ones.

appendManifests never needs to be cleaned up

rewrittenAppendManifests never needs to be cleaned up because it is copied during appendManifest not during apply where it could potentially fail and retry

I added tests to ensure writeNewManifests is behaving as expected during retries (either due to commit failure, or due to multiple apply() calls)

cleanUncommitted must still exist in FastAppend because it runs after both successful and unsuccessful commits. After a successful commit, no work is needed. After an unsuccessful commit that has exhausted its retries, every staged file is deleted.
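
The reuse-or-rewrite behavior of writeNewManifests described in this comment might look roughly like the sketch below. It is a simplified illustration under stated assumptions: SketchAppend, its storage set (a stand-in for files in object storage), and the manifest naming are hypothetical, while hasNewFiles and newManifests mirror the field names mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ManifestReuseDemo {
  public static void main(String[] args) {
    SketchAppend append = new SketchAppend();
    append.appendFile("a.parquet");
    List<String> first = append.writeNewManifests();
    List<String> retry = append.writeNewManifests(); // retry without new files
    System.out.println(first.equals(retry));         // prints true
    append.appendFile("b.parquet");
    append.writeNewManifests();                      // new files force a rewrite
    System.out.println(append.storage);              // prints [manifest-1.avro]
  }
}

// Hypothetical simplification of FastAppend's manifest bookkeeping.
class SketchAppend {
  final Set<String> storage = new HashSet<>(); // simulated manifest files in storage
  private final List<String> newFiles = new ArrayList<>();
  private List<String> newManifests = null;
  private boolean hasNewFiles = false;
  private int manifestCount = 0;

  void appendFile(String dataFile) {
    newFiles.add(dataFile);
    hasNewFiles = true;
  }

  List<String> writeNewManifests() {
    if (hasNewFiles && newManifests != null) {
      // Manifests from an earlier attempt are stale; delete them here,
      // so nothing is left over for a post-commit cleanup.
      newManifests.forEach(storage::remove);
      newManifests = null;
    }
    if (newManifests == null && !newFiles.isEmpty()) {
      String manifest = "manifest-" + (manifestCount++) + ".avro";
      storage.add(manifest);
      newManifests = List.of(manifest);
      hasNewFiles = false;
    }
    return newManifests == null ? List.of() : newManifests;
  }
}
```

In this sketch a retry that adds no files returns the cached manifest, and a retry after new files deletes the stale manifest before writing a replacement.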

I was slightly wrong above: rewrittenAppendManifests do need to be cleaned up.

For example:
appendManifest is called and the commit fails; we should then clean up the rewrittenAppendManifests.

If the commit fails, when is appendManifests cleaned up? I thought that writeNewManifests would replace the append manifests, but if no commit succeeds then what cleans up?

appendManifests shouldn't be cleaned up because they are not written by FastAppend.

see existing comment in cleanUncommitted method

// clean up only rewrittenAppendManifests as they are always owned by the table
// don't clean up appendManifests as they are added to the manifest list and are not compacted

Ah, sorry. I was mistaken. Thanks for clarifying!
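
The ownership rule settled in this thread can be illustrated with a small sketch. It assumes manifests are modeled as plain path strings and storage as an in-memory set; SketchCleanup and its constructor are hypothetical, and only the names cleanUncommitted, appendManifests, and rewrittenAppendManifests come from the discussion.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CleanupOwnershipDemo {
  public static void main(String[] args) {
    Set<String> storage = new HashSet<>(Set.of("external-1.avro", "rewritten-1.avro"));
    SketchCleanup cleanup =
        new SketchCleanup(storage, List.of("external-1.avro"), List.of("rewritten-1.avro"));
    // The commit failed, so the committed set is empty.
    cleanup.cleanUncommitted(Set.of());
    System.out.println(storage); // prints [external-1.avro]
  }
}

// Hypothetical simplification; manifests are modeled as path strings.
class SketchCleanup {
  private final Set<String> storage;
  private final List<String> appendManifests;          // supplied by the caller, not owned
  private final List<String> rewrittenAppendManifests; // copies written by this operation

  SketchCleanup(Set<String> storage, List<String> appendManifests, List<String> rewritten) {
    this.storage = storage;
    this.appendManifests = appendManifests;
    this.rewrittenAppendManifests = rewritten;
  }

  void cleanUncommitted(Set<String> committed) {
    // Only copies created by this operation are deleted when they did not make it
    // into the committed snapshot.
    for (String manifest : rewrittenAppendManifests) {
      if (!committed.contains(manifest)) {
        storage.remove(manifest);
      }
    }
    // appendManifests are intentionally left alone: this operation did not write them.
  }
}
```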

rdblue

Overall, I like the idea here, but I think the FastAppend implementation could be tied more directly to the state of that class rather than checking attempts, which are assumed to have side effects on the class's state.

core/src/test/java/org/apache/iceberg/TestFastAppend.java

amogh-jahagirdar · 2024-07-12T15:40:44Z

Sorry for the delay on reviewing this @grantatspothero I'm taking a look with fresh eyes on the latest updates since the approach is different now

core/src/main/java/org/apache/iceberg/SnapshotProducer.java

core/src/test/java/org/apache/iceberg/TestFastAppend.java

core/src/main/java/org/apache/iceberg/SnapshotProducer.java

amogh-jahagirdar · 2024-07-12T18:02:46Z

core/src/main/java/org/apache/iceberg/FastAppend.java

+    // appendManifests are not rewritten, never need cleanup
+    // rewrittenAppendManifests are rewritten in appendManifest, never need cleanup
+    // newManifests are cleaned up in writeNewManifests


Nit: I'd make this a method level comment instead of inlining all this.

/**
 * Cleanup after committing is disabled for FastAppend for the following reasons:
 * 1.) Appended manifests are never rewritten
 * 2.) Manifests which are written out as part of appendFile are already cleaned up between
 *     commit attempts in writeNewManifests
 */

rewrittenAppendManifests are rewritten in appendManifest, never need cleanup

Actually, I'll need to take a second pass on the rewrittenAppendManifests case; I'm not 100% sure yet that this is the case. In the worst case, though, if we miss an opportunity to clean up, orphan file removal would pick it up anyway, so I don't really consider it a blocker.

You are correct, changed the condition

amogh-jahagirdar · 2024-07-12T18:06:45Z

Thanks @grantatspothero, the overall approach makes sense. This time it depends closely on the internal state of FastAppend, which, combined with the new tests, should make it less brittle: if someone goes ahead and changes the logic, tests would fail. I just had some cleanups I think we should get in.

core/src/main/java/org/apache/iceberg/SnapshotProducer.java

rehevkor5 · 2024-07-23T16:30:39Z

A question:

I'm not exactly sure how the "table metadata READ" and "manifest list READ" translate into calls to the underlying object store. Does "manifest list READ" result in a request to list all objects matching a particular prefix? And if so, could having many manifest files result in this operation becoming slow? In that case, I assume that rewriting manifests could help such a situation?

If not, what makes those operations slow? And how slow - approximately - are they, in absolute terms?

grantatspothero · 2024-07-25T21:22:29Z

A question:

I'm not exactly sure how the "table metadata READ" and "manifest list READ" translate into calls to the underlying object store. Does "manifest list READ" result in a request to list all objects matching a particular prefix? And if so, could having many manifest files result in this operation becoming slow? In that case, I assume that rewriting manifests could help such a situation?

If not, what makes those operations slow? And how slow - approximately - are they, in absolute terms?

"table metadata READ" and "manifest list READ" are both single object storage GETs, so that is 2 extra network requests that are not needed to actually perform the commit.

Regarding how that affects total runtime, see the PR description:

We are ingesting streaming data using a Java service that does Iceberg FastAppend.
We noticed that about 20% (YMMV) of the FastAppend commit time for our use case is spent on unnecessary cleanup operations, specifically this bit, which FastAppend inherits from SnapshotProducer:

The extra network requests are definitely noticeable for fast appends of small files. For our use case, the Iceberg metadata files are large because there are lots of unexpired snapshots, so fetching a large metadata file from S3 is slow, which exacerbates the problem.

But if you have small metadata files and are not using FastAppend then you probably do not care much about this optimization.

amogh-jahagirdar

Thanks @grantatspothero! Looks like @rdblue comments got addressed.

amogh-jahagirdar · 2024-07-27T01:07:49Z

core/src/main/java/org/apache/iceberg/SnapshotProducer.java

-        } else {
-          // saved may not be present if the latest metadata couldn't be loaded due to eventual
-          // consistency problems in refresh. in that case, don't clean up.
-          LOG.warn("Failed to load committed snapshot, skipping manifest clean-up");
        }


It's nice with this refactoring we're able to remove this
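
The refactoring referred to here (tracking the just-committed metadata in memory instead of reading it back) could be sketched as below. This is an illustration only: SketchTable, SketchCommitter, and the read counter are hypothetical, and metadata is modeled as a version string. Because the tracked value is set by the commit itself, there is no "failed to load" case left to handle.

```java
import java.util.concurrent.atomic.AtomicReference;

class TrackCommittedDemo {
  public static void main(String[] args) {
    SketchTable a = new SketchTable();
    SketchTable b = new SketchTable();
    // prints v1 reads=1
    System.out.println(SketchCommitter.commitThenRefresh(a, "v1") + " reads=" + a.metadataReads);
    // prints v1 reads=0
    System.out.println(SketchCommitter.commitAndTrack(b, "v1") + " reads=" + b.metadataReads);
  }
}

// Hypothetical table whose refresh() stands in for an object storage GET.
class SketchTable {
  int metadataReads = 0;
  private String current = "v0";

  String refresh() {
    metadataReads++;
    return current;
  }

  void swap(String updated) {
    current = updated; // simulates the atomic commit
  }
}

class SketchCommitter {
  // Earlier shape: commit, then read the table metadata back to find what was committed.
  static String commitThenRefresh(SketchTable table, String updated) {
    table.swap(updated);
    return table.refresh();
  }

  // Refactored shape: remember what was committed inside the commit attempt itself.
  static String commitAndTrack(SketchTable table, String updated) {
    AtomicReference<String> committed = new AtomicReference<>();
    Runnable attempt = () -> {
      table.swap(updated);
      committed.set(updated); // captured in memory, no extra read needed
    };
    attempt.run();
    return committed.get();
  }
}
```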

rdblue · 2024-08-01T19:31:26Z

Thanks, @grantatspothero!

github-actions bot added the core label Jun 17, 2024

findepi reviewed Jun 17, 2024

findepi reviewed Jun 17, 2024

findepi approved these changes Jun 18, 2024

grantatspothero force-pushed the gn/skipManifestCleanup branch from 5ea0141 to 6c66412 Compare June 18, 2024 14:25

amogh-jahagirdar changed the title ~~Core: Skip uncommitted manifest cleanup if no retries have occurred~~ Core: Skip uncommitted manifest cleanup if no retries have occurred for FastAppend Jun 18, 2024

amogh-jahagirdar requested changes Jun 18, 2024

amogh-jahagirdar requested a review from rdblue June 18, 2024 16:43

grantatspothero force-pushed the gn/skipManifestCleanup branch from 6c66412 to e510b3a Compare June 18, 2024 17:15

grantatspothero requested a review from amogh-jahagirdar June 18, 2024 17:21

amogh-jahagirdar approved these changes Jun 20, 2024

rdblue reviewed Jun 21, 2024

rdblue requested changes Jun 21, 2024

grantatspothero force-pushed the gn/skipManifestCleanup branch 7 times, most recently from fb61daf to addd33d Compare June 25, 2024 19:38

grantatspothero requested review from findepi, amogh-jahagirdar and rdblue June 25, 2024 19:40

grantatspothero force-pushed the gn/skipManifestCleanup branch 5 times, most recently from 86d3941 to 8a9d1ea Compare June 25, 2024 21:31

grantatspothero commented Jun 25, 2024

grantatspothero changed the title ~~Core: Skip uncommitted manifest cleanup if no retries have occurred for FastAppend~~ Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit Jun 26, 2024

grantatspothero force-pushed the gn/skipManifestCleanup branch from 8a9d1ea to e39f915 Compare July 11, 2024 18:34

amogh-jahagirdar requested changes Jul 12, 2024

grantatspothero force-pushed the gn/skipManifestCleanup branch 2 times, most recently from 4afa9d1 to b75b40a Compare July 12, 2024 19:37

grantatspothero requested a review from amogh-jahagirdar July 15, 2024 19:55

rdblue reviewed Jul 16, 2024

grantatspothero added 2 commits July 26, 2024 13:23

Remove read of just committed metadata in SnapshotProducer

0306fe0

instead track the last committed metadata in memory

Only run cleanUncommitted if SnapshotProducer subclass enables it

35bc97e

Allows FastAppend to skip cleanup

grantatspothero force-pushed the gn/skipManifestCleanup branch from b75b40a to 35bc97e Compare July 26, 2024 18:26

amogh-jahagirdar approved these changes Jul 27, 2024

amogh-jahagirdar requested a review from rdblue July 27, 2024 01:12

rdblue approved these changes Aug 1, 2024

rdblue merged commit 39373d0 into apache:main Aug 1, 2024
60 checks passed

grantatspothero mentioned this pull request Dec 19, 2024

Remove unneeded metadata read during update event generation #11829

Open

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024

Core: Allow SnapshotProducer to skip uncommitted manifest cleanup aft…

ba55878

…er commit (apache#10523)
