-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-28483 backup merge fails on bulkloaded hfiles #5795
HBASE-28483 backup merge fails on bulkloaded hfiles #5795
Conversation
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so the problem here is that the Merge job sets the input dirs to table dirs:
backupIT/backup_1712319710829/default/table-false
backupIT/backup_1712319719106/default/table-false
Typically HFileInputFormat expects the input dirs to be region dirs like:
backupIT/backup_1712319710829/default/table-false/<regionname>
backupIT/backup_1712319719106/default/table-false/<regionname>
... etc, for each region in each table
In which case a region dir contains a column dir, and so it needed to handle that and does 1 level of "recursion".
But in the case of merge, the input dir is the table dir and it needs to do 2 levels of recursion.
I'm a little on the fence about whether this fix should be in HFileInputFormat or actually in MapReduceHFileSplitterJob. In the latter case we could list the backup dirs and add the regions within to the inputs, so that we honor the existing contract of HFileInputFormat.
That said I think for now I'm leaning towards this being the ok path, so we can leave it. There might be other benefits as well.
@@ -155,16 +154,23 @@ protected List<FileStatus> listStatus(JobContext job) throws IOException { | |||
// since HFiles are written to directories where the | |||
// directory name is the column name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment here is out of date relative to the implementation now. It would be good to update it about the case where sometimes it might need to recurse further, such as when pointed at a table dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I updated the comment to hint at both current cases for the input dirs: and (but without specifically mentioning MapReduceBackupMergeJob
)
Also please run |
a828ccf
to
9714577
Compare
Issue with |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
What I was referring to about spotless was this error in the first build: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5795/1/artifact/yetus-general-check/output/patch-spotless.txt Spotless helps maintain our formatting guidelines. Where did you see it complaining about the hamcrest dependency? I see the formatting issues have been fixed, but I'm not sure we should change the pom.xml unless there was a reason |
I investigated and it does seem like a true duplicate. I don't think this was related, but it's ok. More importantly, the spotless formatting issues have been fixed. Going to merge this |
Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
…apache#5795) Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
No description provided.