HDDS-9802. Tool to fix corrupted snapshot chain #6386
Thanks @hemantk-12 for the patch. Found some typos, and suggested some minor improvements.
Thanks for this tool @hemantk-12. Really helpful in fixing snapshot chain corruption. LGTM+1
Thanks, @adoroszlai and @aswinshakil for the review.
(cherry picked from commit e907316)
Change-Id: I67950ede21d47449ce433b629e1a8f24a1d88a19
What changes were proposed in this pull request?
In the past, we have seen many instances (HDDS-8530, HDDS-8832, and recently HDDS-10524) where SnapshotChain corruption caused OM to get stuck on restart. Many bugs related to snapshot chain corruption have been fixed, but we still see occasional occurrences (HDDS-10524 is a very recent one).
This change implements a tool that can be used to repair snapshot chain corruption once a corrupted entry has been identified.
The idea for this tool comes from HDDS-8100. This change also provides a base setup for HDDS-8100 and can be extended to support other cases such as HDDS-8101, HDDS-8824, and HDDS-10295.
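To illustrate the kind of repair described above, here is a minimal, self-contained sketch. It is not the code in SnapshotRepair.java: the class `SnapshotChainSketch`, the `Entry` type, and the methods `findBrokenLinks` and `repair` are hypothetical names invented for this example. It assumes a simplified model in which each snapshot entry stores only its own ID and the ID of its predecessor, and in which "corruption" means a predecessor link that points to a snapshot that does not exist.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of a snapshot chain; not the Ozone implementation. */
public class SnapshotChainSketch {

  /** One snapshot entry: its ID and the ID of the snapshot before it. */
  static final class Entry {
    final String id;
    String previousId; // null for the first snapshot in the chain

    Entry(String id, String previousId) {
      this.id = id;
      this.previousId = previousId;
    }
  }

  /** Returns the IDs of entries whose previousId refers to a missing snapshot. */
  static List<String> findBrokenLinks(Map<String, Entry> table) {
    List<String> broken = new ArrayList<>();
    for (Entry e : table.values()) {
      if (e.previousId != null && !table.containsKey(e.previousId)) {
        broken.add(e.id);
      }
    }
    return broken;
  }

  /** Repoints the given entry at a new predecessor after validating both IDs. */
  static void repair(Map<String, Entry> table, String id, String newPreviousId) {
    Entry entry = table.get(id);
    if (entry == null) {
      throw new IllegalArgumentException("Unknown snapshot: " + id);
    }
    if (id.equals(newPreviousId)) {
      throw new IllegalArgumentException("A snapshot cannot precede itself: " + id);
    }
    if (newPreviousId != null && !table.containsKey(newPreviousId)) {
      throw new IllegalArgumentException("Unknown predecessor: " + newPreviousId);
    }
    entry.previousId = newPreviousId;
  }

  public static void main(String[] args) {
    Map<String, Entry> table = new LinkedHashMap<>();
    table.put("s1", new Entry("s1", null));
    table.put("s2", new Entry("s2", "s1"));
    table.put("s3", new Entry("s3", "missing")); // corrupted link

    System.out.println("Broken before repair: " + findBrokenLinks(table));
    repair(table, "s3", "s2");
    System.out.println("Broken after repair: " + findBrokenLinks(table));
  }
}
```

The sketch validates the requested predecessor before writing anything, so a mistyped ID cannot introduce a second broken link. Detection and repair are kept separate, which mirrors the description above: the corrupted entry is identified first and then fixed explicitly.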
What is the link to the Apache JIRA?
HDDS-9802
How was this patch tested?
Tested it in Docker.