Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-26707: Reduce number of renames during bulkload #4066
HBASE-26707: Reduce number of renames during bulkload #4066
Changes from 1 commit
5ec17ea
6c0c369
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be <String,Path> and save need for path conversion again in
failedBulkLoad
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
origSources
instead oforiglSources
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it like that because
origPermissions
uses String as well so a conversion would be done anyway. And while I could switch both of those to use Paths, log messages will need a string too.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's just then follow var naming pattern, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One idea for FILE SFT, could we always force the
copy
option above, instead of this one that relies on rename?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could. My only concern is that it might be impractical over a certain data size especially if there is no config or param that would allow the user to use rename if they need to.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. My main concern is how safe this rename would be, if we don't have hboss in the picture. Could concurrent bulkloads be a problem in such cases? Maybe a workaround for such deployments would be to always pass a custom staging dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think concurrency is an issue.
SecureBulkLoadListener
keeps track of the moved files and a separate listener is created for each region in each bulkLoad. So even with parallel bulkLoads they can not touch each other's files. Using the same source folder can be an issue but it always was an issue.I'm not sure I understand your comment about "always pass a custom staging dir". The staging dir by default is different for each bulkLoad process. I break this by introducing the "custom staging dir" which always points to the live data folder as a workaround to skip moving hfiles to an actual staging dir without loosing the existing error handling. We can't change it and decrease the number of moves at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, then we would need to rename from the custom stage to the actual dir anyways. I think we can leave it this way.
For S3 with SFT, I think the best practice would be to pass the copy option, for consistency. Shouldn't have much difference, as s3 renames are basically copies.