Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit #10523

Merged
merged 2 commits into from
Aug 1, 2024

Conversation

grantatspothero
Copy link
Contributor

@grantatspothero grantatspothero commented Jun 17, 2024

Skips FastAppend manifest cleanup after successful commit if no retries have occurred, as no orphaned manifests could exist if no retries have occurred. This speeds up the happy path of commits by removing 2 unnecessary reads:

  • table metadata READ
  • manifest list READ

Context from slack thread: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1718381807647999

We are ingesting streaming data using a java service that does iceberg FastAppend
We noticed about ~20% (YMMV) of the fastappend commit time for our usecase is spent on nonrequired cleanup operations, specifically this bit which FastAppend inherits from SnapshotProducer:
https://github.com/apache/iceberg/blob/apache-iceberg-1.5.2/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L422-L439

Testing:

  • I manually tested this by running TestFastAppend.testRecoveryWithManifestList and verifying the cleanup bits are only run when a retry occurs.

Notes:

@github-actions github-actions bot added the core label Jun 17, 2024
@grantatspothero
Copy link
Contributor Author

grantatspothero commented Jun 17, 2024

Found a problem with the approach.

This assumption is incorrect for all SnapshotProducer

no orphaned manifests could exist if no retries have occurred.

This is incorrect for MergingSnapshotProducer which merges manifests by writing the unmerged manifests, creating a new merged manifest, and marking the unmerged manifests for deletion. You can see this in MergingSnapshotProducer.apply()

See new commit for new approach. It requires SnapshotProducer to opt-in to this optimization with a method like protected boolean canSkipCleanupAfterCommitSuccess() which defaults to false. Suggestion is only latency sensitive operations like FastAppend would override this method.

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!
Please remember to squash commits.

@grantatspothero grantatspothero force-pushed the gn/skipManifestCleanup branch from 5ea0141 to 6c66412 Compare June 18, 2024 14:25
@amogh-jahagirdar amogh-jahagirdar changed the title Core: Skip uncommitted manifest cleanup if no retries have occurred Core: Skip uncommitted manifest cleanup if no retries have occurred for FastAppend Jun 18, 2024
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect for MergingSnapshotProducer which merges manifests by writing the unmerged manifests, creating a new merged manifest, and marking the unmerged manifests for deletion. You can see this in MergingSnapshotProducer.apply()

@grantatspothero Thank you for finding this, that's a good catch, I forgot about the merging manifest case. Did we have tests which fail with the original assumption implementation (due to not cleaning up) or is this something you got from reading the code? If we don't have tests which assert that the old manifests are removed after the merging of manifests, I'd advocate for adding tests for that case.

Also adding @rdblue for his input since he's had thoughts on eager cleanups, to make sure we're not missing anything here.

@amogh-jahagirdar amogh-jahagirdar requested a review from rdblue June 18, 2024 16:43
@grantatspothero
Copy link
Contributor Author

This is incorrect for MergingSnapshotProducer which merges manifests by writing the unmerged manifests, creating a new merged manifest, and marking the unmerged manifests for deletion. You can see this in MergingSnapshotProducer.apply()

@grantatspothero Thank you for finding this, that's a good catch, I forgot about the merging manifest case. Did we have tests which fail with the original assumption implementation (due to not cleaning up) or is this something you got from reading the code? If we don't have tests which assert that the old manifests are removed after the merging of manifests, I'd advocate for adding tests for that case.

Also adding @rdblue for his input since he's had thoughts on eager cleanups, to make sure we're not missing anything here.

Tests caught this issue, was helpful 🙌 (one example: TestHadoopCommits.testMergeAppend)

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @grantatspothero this looks great! I will wait before merging in case @rdblue has any concerns

@@ -192,6 +192,11 @@ protected void cleanUncommitted(Set<ManifestFile> committed) {
}
}

@Override
protected boolean cleanupAfterCommit() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think this is probably safe today, it seems like a change that is going to make the code more brittle because it isn't obvious when this should be overridden or how the attempts interact with the manifests that are written. For instance, if the behavior of writeNewManifests changes and creates intermediate work to clean up, how would someone know that this also needs to change?

I wonder if there's a better check here than number of attempts. What about checking whether the current set of manifests is valid and would be committed? return !hasNewFiles && newManifests != null?

Copy link
Contributor Author

@grantatspothero grantatspothero Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanupAfterCommit runs after commit, at that point hasNewFiles=false and newManifests != null so this check would not actually skip any work.

Looking into this further, I do not think any cleanup after commit is needed for FastAppend:

  • writeNewManifests already does the work of cleaning up newManifests. if no new files have been appended, reuse the already written manifests. if new files have been appended, delete the old manifests and create new manifests.
  • appendManifests never needs to be cleaned up
  • rewrittenAppendManifests never needs to be cleaned up because it is copied during appendManifest not during apply where it could potentially fail and retry

I added tests to ensure writeNewManifests is behaving as expected during retries (either due to commit failure, or due to multiple apply() calls)

Copy link
Contributor Author

@grantatspothero grantatspothero Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why cleanUncommitted must exist in FastAppend is because cleanUncommitted runs after both successful commit and unsuccessful commit. For successful commit, no work is needed. For unsuccessful commits that have exhausted retries, every staged file is deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was slightly wrong above, rewrittenAppendManifests do need to be cleaned up.

For example:
appendManifest is called, commit fails, we should cleanup the rewrittenAppendManifests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the commit fails, when is appendManifests cleaned up? I thought that writeNewManifests would replace the append manifests, but if no commit succeeds then what cleans up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

appendManifests shouldn't be cleaned up because they are not written by FastAppend.

see existing comment in cleanUncommitted method

// clean up only rewrittenAppendManifests as they are always owned by the table
// don't clean up appendManifests as they are added to the manifest list and are not compacted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry. I was mistaken. Thanks for clarifying!

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like the idea here, but I think the FastCommit implementation could be more directly tied to the state of that class rather than checking attempts, which is assumed to have side-effects in the class's state.

@grantatspothero grantatspothero force-pushed the gn/skipManifestCleanup branch 7 times, most recently from fb61daf to addd33d Compare June 25, 2024 19:38
@grantatspothero grantatspothero force-pushed the gn/skipManifestCleanup branch 5 times, most recently from 86d3941 to 8a9d1ea Compare June 25, 2024 21:31
@grantatspothero grantatspothero changed the title Core: Skip uncommitted manifest cleanup if no retries have occurred for FastAppend Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit Jun 26, 2024
@grantatspothero grantatspothero force-pushed the gn/skipManifestCleanup branch from 8a9d1ea to e39f915 Compare July 11, 2024 18:34
@amogh-jahagirdar
Copy link
Contributor

Sorry for the delay on reviewing this @grantatspothero I'm taking a look with fresh eyes on the latest updates since the approach is different now

core/src/test/java/org/apache/iceberg/TestFastAppend.java Outdated Show resolved Hide resolved
core/src/test/java/org/apache/iceberg/TestFastAppend.java Outdated Show resolved Hide resolved
Comment on lines 203 to 205
// appendManifests are not rewritten, never need cleanup
// rewrittenAppendManifests are rewritten in appendManifest, never need cleanup
// newManifests are cleaned up in writeNewManifests
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd make this a method level comment instead of inlining all this.

/**
Cleanup after committing is disabled for FastAppend for the following reasons:

1.) Appended manifests are never rewritten
2.) Manifests which are written out as part of appendFile are already cleaned up between commit attempts 
in writeNewManifests
*/

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrittenAppendManifests are rewritten in appendManifest, never need cleanup

Actually, I'll need to take a double pass on the rewrittenAppendManifests case, I'm not 100% sure yet this is the case. In the worst case though if we miss an opportunity to cleanup, orphan file removal would always pick it up anyways so I don't really consider it a blocker.

Copy link
Contributor Author

@grantatspothero grantatspothero Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, changed the condition

@amogh-jahagirdar
Copy link
Contributor

amogh-jahagirdar commented Jul 12, 2024

Thanks @grantatspothero the overall approach makes sense and this time it is closely dependent on the internal state of FastAppend which combined with the new tests should make it less brittle; if someone goes ahead and changes the logic, tests would fail. Just had some cleanups I think we should get in.

@grantatspothero grantatspothero force-pushed the gn/skipManifestCleanup branch 2 times, most recently from 4afa9d1 to b75b40a Compare July 12, 2024 19:37
@rehevkor5
Copy link

A question:

I'm not exactly sure how the "table metadata READ" and "manifest list READ" translate into calls to the underlying object store. Does "manifest list READ" result in a request to list all objects matching a particular prefix? And if so, could having many manifest files result in this operation becoming slow? In that case, I assume that rewriting manifests could help such a situation?

If not, what makes those operations slow? And how slow - approximately - are they, in absolute terms?

@grantatspothero
Copy link
Contributor Author

grantatspothero commented Jul 25, 2024

A question:

I'm not exactly sure how the "table metadata READ" and "manifest list READ" translate into calls to the underlying object store. Does "manifest list READ" result in a request to list all objects matching a particular prefix? And if so, could having many manifest files result in this operation becoming slow? In that case, I assume that rewriting manifests could help such a situation?

If not, what makes those operations slow? And how slow - approximately - are they, in absolute terms?

"table metadata READ" and "manifest list READ" are both single object storage GETs. so 2 extra network requests that are not needed to actually perform the commit.

Regarding how that affects total runtime, see the PR description:

We are ingesting streaming data using a java service that does iceberg FastAppend
We noticed about ~20% (YMMV) of the fastappend commit time for our usecase is spent on nonrequired cleanup operations, specifically this bit which FastAppend inherits from SnapshotProducer:

The extra network requests are definitely noticeable for fast appends of small files. For our usecase, the iceberg metadata files are large because there are lots of unexpired snapshots so fetching a large metadata file from s3 is slow and that exacerbates the problem.

But if you have small metadata files and are not using FastAppend then you probably do not care much about this optimization.

@grantatspothero grantatspothero force-pushed the gn/skipManifestCleanup branch from b75b40a to 35bc97e Compare July 26, 2024 18:26
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @grantatspothero! Looks like @rdblue comments got addressed.

} else {
// saved may not be present if the latest metadata couldn't be loaded due to eventual
// consistency problems in refresh. in that case, don't clean up.
LOG.warn("Failed to load committed snapshot, skipping manifest clean-up");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice with this refactoring we're able to remove this

@amogh-jahagirdar amogh-jahagirdar requested a review from rdblue July 27, 2024 01:12
@rdblue rdblue merged commit 39373d0 into apache:main Aug 1, 2024
60 checks passed
@rdblue
Copy link
Contributor

rdblue commented Aug 1, 2024

Thanks, @grantatspothero!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants