-
Notifications
You must be signed in to change notification settings - Fork 928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Cleaning snapshots or deleting tags should skip those referenced by branches. #3787
[core] Cleaning snapshots or deleting tags should skip those referenced by branches. #3787
Conversation
11fc94b
to
f935b71
Compare
If the snapshot is not deleted here, when will it be cleared? |
@JingsongLi @LinMingQiang Maybe we both need to merge our PRs. |
paimon-core/src/main/java/org/apache/paimon/operation/FileDeletionBase.java
Outdated
Show resolved
Hide resolved
15113ad
to
6d14799
Compare
Please rebase master. |
6d14799
to
c2f9eda
Compare
done. |
9cdfc20
to
c4557ba
Compare
} | ||
|
||
public static List<Snapshot> mergeTreeSetToList( | ||
Collection<Snapshot> set1, Collection<Snapshot> set2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the tree set?
sortToList?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change method name to 'mergeSetToSortedList'
FileStoreTableFactory.create( | ||
fileIO, new Path(branchPath(tablePath, branchName))); | ||
SortedMap<Snapshot, List<String>> snapshotTags = branchTable.tagManager().tags(); | ||
Snapshot earliestSnapshot = branchTable.snapshotManager().earliestSnapshot(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is difference betweend branchCreateSnapshot and this earliestSnapshot?
List<String> tags = snapshotTags.get(snapshot); | ||
checkArgument(tags.size() == 1); | ||
sortedSnapshots | ||
.computeIfAbsent(snapshot, s -> new ArrayList<>()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why get snapshot from tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After spending a lot of time looking at this PR, I found that the current implementation is too complicated.
@LinMingQiang Can we remove this feature? Branches can only be created through tags, or completely new branch (without inheriting data from the main branch). |
Purpose
Next part for #3808
The purpose of this pr :
The purpose of the branch is to asynchronously recover data and overwrite the main branch
paimon branch . If the recovery operation takes too long, the snapshot referenced by the branch may be expired and deleted, which will make the branch unavailable.
Let's say we created tag1 and branch1 based on snapshot-1, when snapshot-1 expires then the related
manifest file
will be deleted. However ,if themanifest file
is referenced by the some tag, it will not be cleaned up.But if the
manifest file
is still referenced by branch-1, it will still be cleaned, this led to query branch1 will throwmanifets file
not found exception.Solution :
1: Skip cleanup : Merge snapshots referenced by branch or tag in the code of
ExpireSnapshotsImpl#expireUntil
then skip the files associated with these snapshots.2: When to clean : Referring to the logic of
delete tag
, when we delete the branch, we also clean up the expired snapshot files (manifest files and data files).3: delete tag/branch : Now, when deleting a tag or branch, we need to consider whether the snapshot involved is still referenced by other tags/branches.
Linked issue: close #xxx
Tests
API and Format
TagManager#deleteTag
: Add logical , whether the snapshot is referenced by other branch, if so, thedoClean
of the snapshot should be skipped.BranchManager#deleteBranch
: Add logical , If the snapshot has expired and is not referenced by another branch or tag, triggerdoClean
to delete manifest&data file.Documentation