HBASE-26707: Reduce number of renames during bulkload #4066
move files directly to the store dir when requireWritingToTmpDirFirst is false
fix failedBulkLoad to work on second call
change existing tests to run without tmp folder too
add tests for SecureBulkLoadListener
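As a rough illustration of the first item, the destination choice could be sketched as below. This is not the HBase implementation: the class `BulkLoadTarget`, the method `resolve`, and the `.tmp` directory layout are hypothetical, and `java.nio.file.Path` stands in for Hadoop's `Path`. Only the flag name `requireWritingToTmpDirFirst` comes from the description above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration only; not part of HBase. */
final class BulkLoadTarget {
  private BulkLoadTarget() {}

  /**
   * Picks where a bulk-loaded file should land.
   * The ".tmp" layout below is an assumption made for this sketch.
   */
  static Path resolve(Path regionDir, String family, String fileName,
      boolean requireWritingToTmpDirFirst) {
    if (requireWritingToTmpDirFirst) {
      // Two-step flow: land in a temporary dir first, commit later with another rename.
      return regionDir.resolve(".tmp").resolve(family).resolve(fileName);
    }
    // Reduced-rename flow: land directly in the store (column family) dir.
    return regionDir.resolve(family).resolve(fileName);
  }

  public static void main(String[] args) {
    Path region = Paths.get("/hbase/data/default/t1/region1");
    System.out.println(resolve(region, "cf", "hfile1", true));
    System.out.println(resolve(region, "cf", "hfile1", false));
  }
}
```

With the flag set to false, the file needs one move fewer before it becomes visible in the store dir, which is the saving the PR title refers to.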
🎊 +1 overall
This message was automatically generated.
@Apache9, @wchevreuil, @joshelser Could you please take a look?
Overall, looks good, just some nits, and two additional questions.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SecureBulkLoadManager.java
// Target filesystem
private final FileSystem fs;
private final String stagingDir;
private final Configuration conf;
// Source filesystem
private FileSystem srcFs = null;
private Map<String, FsPermission> origPermissions = null;
private Map<String, String> origlSources = null;
Could be <String,Path> and save the need for path conversion again in failedBulkLoad?
nit: origSources instead of origlSources?
I left it like that because origPermissions uses String as well, so a conversion would be done anyway. And while I could switch both of those to use Paths, log messages will need a string too.
> I left it like that because origPermissions uses String as well so a conversion would be done anyway. And while I could switch both of those to use Paths, log messages will need a string too.
Ok, then let's just follow the variable naming pattern.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SecureBulkLoadManager.java
@@ -390,11 +400,16 @@ public String prepareBulkLoad(final byte[] family, final String srcPath, boolean
LOG.debug("Moving " + p + " to " + stageP);
FileStatus origFileStatus = fs.getFileStatus(p);
origPermissions.put(srcPath, origFileStatus.getPermission());
origlSources.put(stageP.toString(), srcPath);
One idea for FILE SFT: could we always force the copy option above, instead of this one that relies on rename?
We could. My only concern is that it might be impractical over a certain data size, especially if there is no config or param that would allow the user to use rename if they need to.
What do you think?
True. My main concern is how safe this rename would be if we don't have HBOSS in the picture. Could concurrent bulkloads be a problem in such cases? Maybe a workaround for such deployments would be to always pass a custom staging dir?
I do not think concurrency is an issue. SecureBulkLoadListener keeps track of the moved files, and a separate listener is created for each region in each bulkLoad. So even with parallel bulkLoads they cannot touch each other's files. Using the same source folder can be an issue, but it always was an issue.
I'm not sure I understand your comment about "always pass a custom staging dir". The staging dir by default is different for each bulkLoad process. I break this by introducing the "custom staging dir", which always points to the live data folder, as a workaround to skip moving hfiles to an actual staging dir without losing the existing error handling. We can't change it and decrease the number of moves at the same time.
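To make the tracking argument concrete, here is a minimal sketch of a per-load listener that remembers where each staged file came from and moves it back on failure. It is a simplified stand-in, not the actual SecureBulkLoadListener: the class `TrackingListener` and its methods `prepare` and `failed` are hypothetical, and `java.nio.file` replaces the Hadoop FileSystem API so the example runs on its own. The map plays the role of `origlSources` from the diff above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a per-region bulk load listener. */
final class TrackingListener {
  // staged location -> original source, similar in role to origlSources in the diff
  private final Map<String, String> stagedToSource = new HashMap<>();
  private final Path stagingDir;

  TrackingListener(Path stagingDir) {
    this.stagingDir = stagingDir;
  }

  /** Moves src into the staging dir and records where it came from. */
  Path prepare(Path src) throws IOException {
    Path staged = stagingDir.resolve(src.getFileName());
    Files.move(src, staged);
    stagedToSource.put(staged.toString(), src.toString());
    return staged;
  }

  /** Moves a staged file back to its source; a repeated call is a no-op. */
  void failed(Path staged) throws IOException {
    String src = stagedToSource.get(staged.toString());
    if (src == null || !Files.exists(staged)) {
      return; // not tracked by this listener, or already restored
    }
    Files.move(staged, Paths.get(src));
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("bulkload");
    Path srcDir = Files.createDirectory(root.resolve("src"));
    Path stage = Files.createDirectory(root.resolve("stage"));
    Path hfile = Files.write(srcDir.resolve("hfile1"), new byte[] {1, 2, 3});

    TrackingListener listener = new TrackingListener(stage);
    Path staged = listener.prepare(hfile);
    listener.failed(staged);
    listener.failed(staged); // second call does nothing
    System.out.println("restored: " + Files.exists(hfile));
  }
}
```

Because each listener only restores files recorded in its own map, two listeners working in parallel would not undo each other's moves, provided they do not share source files.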
Ah, yeah, then we would need to rename from the custom stage to the actual dir anyways. I think we can leave it this way.
For S3 with SFT, I think the best practice would be to pass the copy option, for consistency. It shouldn't make much difference, as S3 renames are basically copies.
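The difference between the two staging modes can be sketched as follows. This is an illustration under assumptions, not HBase code: the class `StagingSketch`, the method `stage`, and the parameter name `copyFile` are hypothetical here, and `java.nio.file` is used in place of the Hadoop FileSystem API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch contrasting copy-based and rename-based staging. */
final class StagingSketch {
  private StagingSketch() {}

  /** Stages src under stagingDir, either by copying or by moving it. */
  static Path stage(Path src, Path stagingDir, boolean copyFile) throws IOException {
    Path staged = stagingDir.resolve(src.getFileName());
    if (copyFile) {
      // Copy: the source stays in place, so a failed load leaves it untouched.
      Files.copy(src, staged);
    } else {
      // Rename: the source disappears, so failure handling has to move it back.
      Files.move(src, staged);
    }
    return staged;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("stage-sketch");
    Path stagingDir = Files.createDirectory(root.resolve("staging"));
    Path hfile = Files.write(root.resolve("hfile1"), new byte[] {1, 2, 3});

    Path staged = stage(hfile, stagingDir, true);
    System.out.println("source present after copy: " + Files.exists(hfile));
    System.out.println("staged file present: " + Files.exists(staged));
  }
}
```

On an object store where a rename is implemented as a copy plus delete, the copy branch costs about the same and keeps the source available for a retry.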
LGTM, just a small nit on variable naming pending, but am good to go without it.
Thanks a lot @wchevreuil for your feedback and merging this!
Signed-off-by: Wellington Ramos Chevreuil <wchevreuil@apache.org>
…pache#4122) Signed-off-by: Wellington Ramos Chevreuil <wchevreuil@apache.org> Conflicts: hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBulkloadBase.java
…others) to branch-2.5
Previous cherry picks:
commit 6aaef89 HBASE-26064 Introduce a StoreFileTracker to abstract the store file tracking logic
commit 43b40e9 HBASE-25988 Store the store file list by a file apache#3578)
commit 6e05376 HBASE-26079 Use StoreFileTracker when splitting and merging apache#3617)
commit 090b2fe HBASE-26224 HBASE-26224 Introduce a MigrationStoreFileTracker to support migratin… apache#3656)
commit 0ee1689 HBASE-26246 Persist the StoreFileTracker configurations to TableDescriptor when creating table apache#3666)
commit 2052e80 HBASE-26248 Should find a suitable way to let users specify the store… apache#3665)
commit 5ff0f98 HBASE-26264 Add more checks to prevent misconfiguration on store file… apache#3681)
commit fc4f6d1 HBASE-26280 HBASE-26280 Use store file tracker when snapshoting apache#3685)
commit 06db852 HBASE-26326 CreateTableProcedure fails when FileBasedStoreFileTracker… apache#3721)
commit e4e7cf8 HBASE-26386 Refactor StoreFileTracker implementations to expose the s… apache#3774)
commit 08d1171 HBASE-26328 Clone snapshot doesn't load reference files into FILE SFT impl apache#3749)
commit 8bec26e HBASE-26263 [Rolling Upgrading] Persist the StoreFileTracker configur… apache#3700)
commit a288365 HBASE-26271: Cleanup the broken store files under data directory apache#3786)
commit d00b5fa HBASE-26454 CreateTableProcedure still relies on temp dir and renames… apache#3845)
commit 771e552 HBASE-26286: Add support for specifying store file tracker when restoring or cloning snapshot
commit f16b7b1 HBASE-26265 Update ref guide to mention the new store file tracker im… apache#3942)
commit 755b3b4 HBASE-26585 Add SFT configuration to META table descriptor when creating META apache#3998)
commit 39c42c7 HBASE-26639 The implementation of TestMergesSplitsAddToTracker is pro… apache#4010)
commit 6e1f5b7 HBASE-26586 Should not rely on the global config when setting SFT implementation for a table while upgrading apache#4006)
commit f1dd865 HBASE-26654 ModifyTableDescriptorProcedure shoud load TableDescriptor… apache#4034)
commit 8fbc9a2 HBASE-26674 Should modify filesCompacting under storeWriteLock apache#4040)
commit 5aa0fd2 HBASE-26675 Data race on Compactor.writer apache#4035)
commit 3021c58 HBASE-26700 The way we bypass broken track file is not enough in Stor… apache#4055)
commit a8b68c9 HBASE-26690 Modify FSTableDescriptors to not rely on renaming when wr… apache#4054)
commit dffeb8e HBASE-26587 Introduce a new Admin API to change SFT implementation (#… apache#4080)
commit b265fe5 HBASE-26673 Implement a shell command for change SFT implementation apache#4113)
commit 4cdb380 HBASE-26640 Reimplement master local region initialization to better … apache#4111)
commit 77bb153 HBASE-26707: Reduce number of renames during bulkload (apache#4066) apache#4122)
commit a4b192e HBASE-26611 Changing SFT implementation on disabled table is dangerous apache#4082)
commit d3629bb HBASE-26837 Set SFT config when creating TableDescriptor in TestClone… apache#4226)
commit 541d748 HBASE-26881 Backport HBASE-25368 to branch-2 (apache#4267)
Fixups for precommit error prone, checkstyle, and javadoc warnings after applying cherry picks.
Signed-off-by: Josh Elser <elserj@apache.org>
Reviewed-by: Wellington Ramos Chevreuil <wchevreuil@apache.org>
…pache#4122) Signed-off-by: Wellington Ramos Chevreuil <wchevreuil@apache.org> Conflicts: hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBulkloadBase.java Change-Id: I7bc468b99b4641673e42a8fb0e887d6a6088d08e