Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Cleaning snapshots or deleting tags should skip those referenced by branches. #3787

Conversation

LinMingQiang
Copy link
Contributor

@LinMingQiang LinMingQiang commented Jul 20, 2024

Purpose

Next part for #3808

The purpose of this pr :

The purpose of the branch is to asynchronously recover data and overwrite the main branch
paimon branch . If the recovery operation takes too long, the snapshot referenced by the branch may be expired and deleted, which will make the branch unavailable.
Let's say we created tag1 and branch1 based on snapshot-1, when snapshot-1 expires then the related manifest file will be deleted. However ,if the manifest file is referenced by the some tag, it will not be cleaned up.
But if the manifest file is still referenced by branch-1, it will still be cleaned, this led to query branch1 will throw manifets file not found exception.

Solution :

1: Skip cleanup : Merge snapshots referenced by branch or tag in the code of ExpireSnapshotsImpl#expireUntil then skip the files associated with these snapshots.
2: When to clean : Referring to the logic of delete tag, when we delete the branch, we also clean up the expired snapshot files (manifest files and data files).
3: delete tag/branch : Now, when deleting a tag or branch, we need to consider whether the snapshot involved is still referenced by other tags/branches.

Linked issue: close #xxx

Tests

API and Format

TagManager#deleteTag : Add logical , whether the snapshot is referenced by other branch, if so, the doClean of the snapshot should be skipped.

BranchManager#deleteBranch : Add logical , If the snapshot has expired and is not referenced by another branch or tag, trigger doClean to delete manifest&data file.

Documentation

@LinMingQiang LinMingQiang force-pushed the master-fix-skip-branch-snp-expire branch from 11fc94b to f935b71 Compare July 21, 2024 03:54
@JingsongLi
Copy link
Contributor

If the snapshot is not deleted here, when will it be cleared?

@BsoBird
Copy link
Contributor

BsoBird commented Jul 24, 2024

If the snapshot is not deleted here, when will it be cleared?

@JingsongLi @LinMingQiang Maybe we both need to merge our PRs.

@LinMingQiang LinMingQiang changed the title [core] Expire snapshot should skip branches create snapshot. [core] Cleaning snapshots or delete tags should skip parts referenced by branches. Jul 26, 2024
@LinMingQiang LinMingQiang changed the title [core] Cleaning snapshots or delete tags should skip parts referenced by branches. [core] Cleaning snapshots or deleting tags should skip those referenced by branches. Jul 26, 2024
@LinMingQiang LinMingQiang force-pushed the master-fix-skip-branch-snp-expire branch 7 times, most recently from 15113ad to 6d14799 Compare July 30, 2024 14:07
@JingsongLi
Copy link
Contributor

Please rebase master.

@LinMingQiang LinMingQiang force-pushed the master-fix-skip-branch-snp-expire branch from 6d14799 to c2f9eda Compare August 1, 2024 06:06
@LinMingQiang
Copy link
Contributor Author

Please rebase master.

done.

}

public static List<Snapshot> mergeTreeSetToList(
Collection<Snapshot> set1, Collection<Snapshot> set2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the tree set?

sortToList?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change method name to 'mergeSetToSortedList'

FileStoreTableFactory.create(
fileIO, new Path(branchPath(tablePath, branchName)));
SortedMap<Snapshot, List<String>> snapshotTags = branchTable.tagManager().tags();
Snapshot earliestSnapshot = branchTable.snapshotManager().earliestSnapshot();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is difference betweend branchCreateSnapshot and this earliestSnapshot?

List<String> tags = snapshotTags.get(snapshot);
checkArgument(tags.size() == 1);
sortedSnapshots
.computeIfAbsent(snapshot, s -> new ArrayList<>())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why get snapshot from tags?

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After spending a lot of time looking at this PR, I found that the current implementation is too complicated.

@JingsongLi
Copy link
Contributor

@LinMingQiang Can we remove this feature? Branches can only be created through tags, or completely new branch (without inheriting data from the main branch).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants