-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-26187 Write straight into the store directory when Splitting an… #3574
Conversation
…d Merging Change-Id: I5d0bbd99d27a6b3ecdd0a069dc8741239b32edfa
void cleanupSplitsDir() throws IOException { | ||
deleteDir(getSplitsDir()); | ||
} | ||
|
||
/** | ||
* Clean up any split detritus that may have been left around from previous | ||
* split attempts. | ||
* Call this method on initial region deploy. | ||
* @throws IOException | ||
*/ | ||
void cleanupAnySplitDetritus() throws IOException { | ||
Path splitdir = this.getSplitsDir(); | ||
if (!fs.exists(splitdir)) return; | ||
// Look at the splitdir. It could have the encoded names of the daughter | ||
// regions we tried to make. See if the daughter regions actually got made | ||
// out under the tabledir. If here under splitdir still, then the split did | ||
// not complete. Try and do cleanup. This code WILL NOT catch the case | ||
// where we successfully created daughter a but regionserver crashed during | ||
// the creation of region b. In this case, there'll be an orphan daughter | ||
// dir in the filesystem. TOOD: Fix. | ||
FileStatus[] daughters = CommonFSUtils.listStatus(fs, splitdir, new FSUtils.DirFilter(fs)); | ||
if (daughters != null) { | ||
for (FileStatus daughter: daughters) { | ||
Path daughterDir = new Path(getTableDir(), daughter.getPath().getName()); | ||
if (fs.exists(daughterDir) && !deleteDir(daughterDir)) { | ||
throw new IOException("Failed delete of " + daughterDir); | ||
} | ||
} | ||
} | ||
cleanupSplitsDir(); | ||
LOG.info("Cleaned up old failed split transaction detritus: " + splitdir); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we won't be using temp dirs anymore, these methods would have no use. Any potential failed split/merge dir would be cleaned by CatalogJanitor, once it finds no related entry in meta for these regions.
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@@ -572,51 +572,15 @@ public void removeStoreFiles(String familyName, Collection<HStoreFile> storeFile | |||
// =========================================================================== | |||
// Splits Helpers | |||
// =========================================================================== | |||
/** @return {@link Path} to the temp directory used during split operations */ | |||
/** @return {@link Path} to the table directory for split daughters */ | |||
Path getSplitsDir() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, let me remove it.
if (!deleteDir(splitdir)) { | ||
throw new IOException("Failed deletion of " + splitdir + " before creating them again."); | ||
} | ||
} | ||
// splitDir doesn't exists now. No need to do an exists() call for it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is incorrect now?
@@ -380,4 +382,190 @@ public void testTempAndCommit() throws IOException { | |||
|
|||
fs.delete(rootDir, true); | |||
} | |||
|
|||
@Test | |||
public void testSplitStoreDir() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better place the below methods to a different test class? And is it possible to only start mini cluster once? Now every test method needs to start and then stop the mini cluster, which is time consuming...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let me move these to a separate class and do the mini cluster init only once.
// splitDir doesn't exists now. No need to do an exists() call for it. | ||
if (!createDir(splitdir)) { | ||
throw new IOException("Failed create of " + splitdir); | ||
if (!fs.exists(splitdir)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the split directory is now the table directory, why we still need to check existence here? And IMO here we should not use 'splitDir' and 'mergeDir' in code any more, as now there are only region directories...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the check, not needed now that we are creating the resulting split region on the table dir itself.
Change-Id: I8e5a207000513dd3ecab5ca807e6b85c2e39aba6
Change-Id: Iaf396b20cf742485a5bb705365712a0cdd94a915
One thought I had now is that we probably need to back off CatalogJanitor from running whilst a split/merge is going on. As we are creating the dirs under table directory, I guess there's a risk CatalogJanitor wipe those dirs off before the proc reaches the point to add the region in meta. Let me check on this further. |
Ah, yes, I also noticed this when review this PR as you mentioned CatalogJanitor. But even we do not write directly to region dir, there could still be an interval between we move the region into place and add it in meta table? I guess we should have already found the way to deal with this, otherwise it could cause data loss... |
🎊 +1 overall
This message was automatically generated. |
After reviewing the code for CatalogJanitor, I do not think it will clean the partial regions on FileSystem, as I haven't seen any related code about scanning the table directory on FileSystem, it just scans the meta table. So in general, I think we should change the rollback code in SplitTableRegionProcedure and MergeTableRegionsProcedure, to remove the paritial regions when rolling back. And I think we should also have a tool in HBCK2 to remove the partial regions. Thanks. |
🎊 +1 overall
This message was automatically generated. |
Change-Id: I044bab8c9b1a4c8595a0ce6e01d73d93aa04c5bc
Yep, had just found the same. Had added the cleanup logic on the rollback. I will work on hbck2 on a separate issue. |
🎊 +1 overall
This message was automatically generated. |
...erver/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Change-Id: I6ff9b9526c08c49d361460989aa0b592a2ca2abf
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No big concerns from me. Will give a +1 after the final small nits are resolved.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java
Outdated
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionFileSystem.java
Outdated
Show resolved
Hide resolved
Change-Id: Ifed765ffb52dde013977efa1315d29f40fef7d8f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
+1.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
Change-Id: If229fc474e22852e106568dc6d978f428cfd9884
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
apache#3574) Change-Id: I574d110d6885de5eb3c8141eb6d28bb295299b3a Signed-off-by: Duo Zhang <zhangduo@apache.org>
apache#3574) Signed-off-by: Duo Zhang <zhangduo@apache.org> Change-Id: I738cfe24ac599a01c76a8c2ee29bb94bd021151d
…d Merging
Change-Id: I5d0bbd99d27a6b3ecdd0a069dc8741239b32edfa