-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit #10523
Conversation
Found a problem with the approach. This assumption is incorrect for all
This is incorrect for See new commit for new approach. It requires |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Please remember to squash commits.
5ea0141
to
6c66412
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect for MergingSnapshotProducer which merges manifests by writing the unmerged manifests, creating a new merged manifest, and marking the unmerged manifests for deletion. You can see this in MergingSnapshotProducer.apply()
@grantatspothero Thank you for finding this, that's a good catch, I forgot about the merging manifest case. Did we have tests which fail with the original assumption implementation (due to not cleaning up) or is this something you got from reading the code? If we don't have tests which assert that the old manifests are removed after the merging of manifests, I'd advocate for adding tests for that case.
Also adding @rdblue for his input since he's had thoughts on eager cleanups, to make sure we're not missing anything here.
Tests caught this issue, was helpful 🙌 (one example: TestHadoopCommits.testMergeAppend) |
6c66412
to
e510b3a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @grantatspothero this looks great! I will wait before merging in case @rdblue has any concerns
@@ -192,6 +192,11 @@ protected void cleanUncommitted(Set<ManifestFile> committed) { | |||
} | |||
} | |||
|
|||
@Override | |||
protected boolean cleanupAfterCommit() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I think this is probably safe today, it seems like a change that is going to make the code more brittle because it isn't obvious when this should be overridden or how the attempts interact with the manifests that are written. For instance, if the behavior of writeNewManifests
changes and creates intermediate work to clean up, how would someone know that this also needs to change?
I wonder if there's a better check here than number of attempts. What about checking whether the current set of manifests is valid and would be committed? return !hasNewFiles && newManifests != null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleanupAfterCommit
runs after commit, at that point hasNewFiles=false and newManifests != null
so this check would not actually skip any work.
Looking into this further, I do not think any cleanup after commit is needed for FastAppend:
writeNewManifests
already does the work of cleaning upnewManifests
. if no new files have been appended, reuse the already written manifests. if new files have been appended, delete the old manifests and create new manifests.appendManifests
never needs to be cleaned uprewrittenAppendManifests
never needs to be cleaned up because it is copied duringappendManifest
not duringapply
where it could potentially fail and retry
I added tests to ensure writeNewManifests
is behaving as expected during retries (either due to commit failure, or due to multiple apply() calls)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why cleanUncommitted
must exist in FastAppend
is because cleanUncommitted
runs after both successful commit and unsuccessful commit. For successful commit, no work is needed. For unsuccessful commits that have exhausted retries, every staged file is deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was slightly wrong above, rewrittenAppendManifests
do need to be cleaned up.
For example:
appendManifest
is called, commit fails, we should cleanup the rewrittenAppendManifests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the commit fails, when is appendManifests
cleaned up? I thought that writeNewManifests
would replace the append manifests, but if no commit succeeds then what cleans up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appendManifests
shouldn't be cleaned up because they are not written by FastAppend
.
see existing comment in cleanUncommitted
method
// clean up only rewrittenAppendManifests as they are always owned by the table
// don't clean up appendManifests as they are added to the manifest list and are not compacted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry. I was mistaken. Thanks for clarifying!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I like the idea here, but I think the FastCommit
implementation could be more directly tied to the state of that class rather than checking attempts, which is assumed to have side-effects in the class's state.
fb61daf
to
addd33d
Compare
86d3941
to
8a9d1ea
Compare
8a9d1ea
to
e39f915
Compare
Sorry for the delay on reviewing this @grantatspothero I'm taking a look with fresh eyes on the latest updates since the approach is different now |
// appendManifests are not rewritten, never need cleanup | ||
// rewrittenAppendManifests are rewritten in appendManifest, never need cleanup | ||
// newManifests are cleaned up in writeNewManifests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd make this a method level comment instead of inlining all this.
/**
Cleanup after committing is disabled for FastAppend for the following reasons:
1.) Appended manifests are never rewritten
2.) Manifests which are written out as part of appendFile are already cleaned up between commit attempts
in writeNewManifests
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrittenAppendManifests are rewritten in appendManifest, never need cleanup
Actually, I'll need to take a double pass on the rewrittenAppendManifests
case, I'm not 100% sure yet this is the case. In the worst case though if we miss an opportunity to cleanup, orphan file removal would always pick it up anyways so I don't really consider it a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, changed the condition
Thanks @grantatspothero the overall approach makes sense and this time it is closely dependent on the internal state of |
4afa9d1
to
b75b40a
Compare
A question: I'm not exactly sure how the "table metadata READ" and "manifest list READ" translate into calls to the underlying object store. Does "manifest list READ" result in a request to list all objects matching a particular prefix? And if so, could having many manifest files result in this operation becoming slow? In that case, I assume that rewriting manifests could help such a situation? If not, what makes those operations slow? And how slow - approximately - are they, in absolute terms? |
"table metadata READ" and "manifest list READ" are both single object storage GETs. so 2 extra network requests that are not needed to actually perform the commit. Regarding how that affects total runtime, see the PR description:
The extra network requests are definitely noticeable for fast appends of small files. For our usecase, the iceberg metadata files are large because there are lots of unexpired snapshots so fetching a large metadata file from s3 is slow and that exacerbates the problem. But if you have small metadata files and are not using FastAppend then you probably do not care much about this optimization. |
instead track the last committed metadata in memory
Allows FastAppend to skip cleanup
b75b40a
to
35bc97e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @grantatspothero! Looks like @rdblue comments got addressed.
} else { | ||
// saved may not be present if the latest metadata couldn't be loaded due to eventual | ||
// consistency problems in refresh. in that case, don't clean up. | ||
LOG.warn("Failed to load committed snapshot, skipping manifest clean-up"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice with this refactoring we're able to remove this
Thanks, @grantatspothero! |
Skips FastAppend manifest cleanup after successful commit if no retries have occurred, as no orphaned manifests could exist if no retries have occurred. This speeds up the happy path of commits by removing 2 unnecessary reads:
Context from slack thread: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1718381807647999
Testing:
TestFastAppend.testRecoveryWithManifestList
and verifying the cleanup bits are only run when a retry occurs.Notes:
cleanAll()
)?w=1
: https://github.com/apache/iceberg/pull/10523/files?w=1