-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-27238 Backport backup restore to 2.x #4770
HBASE-27238 Backport backup restore to 2.x #4770
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
7dba1c5
to
7350663
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@rda3mon I know this is still just a draft, but can you address the error-prone findings in the pre-commit? Shows up under javac: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4770/3/artifact/yetus-general-check/output/diff-compile-javac-root.txt Can you do it in a separate commit so that I can review it separately from the initial backport? I re-ran the pre-commit hooks and all the tests are passing, the above is the only automated failure test remaining. The spotbugs error is a known issue which I think will be fixed in https://issues.apache.org/jira/browse/HBASE-27363 |
@bbeaudreault These errors if I fix, would differ from master branch. right? |
@bbeaudreault Is there a way to run these checks locally? Pushing and checking might be tedious job to do for fixing non obvious ones. |
@rda3mon I find that the violations are usually pretty straightforward. If there are any that you have a question about, feel free to post here. I do think it's possible to run locally, maybe with But actually, you could try including 2c3abae in your back port here. Seems like Andrew already handled these in master. This has me wondering if there are other patches we need to include though. |
Highly unlikely. I did go into each of the commits from backup history to verify if anything is missed out. Looks like this pr was merged after that. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@rda3mon looks like that cherry-pick resolved most of the error prone issues. Looks like there are 3 left: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4770/5/artifact/yetus-general-check/output/diff-compile-javac-root.txt They look pretty simple/javadoc related. would you mind pushing 1 more commit to fix those. otherwise the build looks clean; the test failure looks unrelated. once you push that fix, i can re-run the tests if needed. Gonna start reviewing the code this week hopefully. Can you link me to specific areas of this PR that had merge conflicts or net-new code, if possible. I know in slack you mentioned WALPlayer; I'll take a look at that. Any others would be helpful |
9d89e43
to
5f92b64
Compare
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
4ccda54
to
db06805
Compare
🎊 +1 overall
This message was automatically generated. |
db06805
to
7155d53
Compare
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Nice! I think this looks good. Just waiting on clean pre-commit checks. |
There are some javac (errorprone) warnings. They are all in the net-new code, and I checked them and master has the same issues. Seems like we skipped cleaning up errorprone in backup/restore prior to now. Let's ignore those failures, I think I will create a separate jira to do errorprone cleanup of edit: tracked in https://issues.apache.org/jira/browse/HBASE-27582 |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@bbeaudreault How to re run the CI pipeline? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry @rda3mon -- a couple more small changes. We really are so close here, i just missed the nuance in WALPlayer here. But this is the last class with modifications, then we can merge.
byte[] outKey = multiTableSupport | ||
? Bytes.add(table.getName(), Bytes.toBytes(tableSeparator), | ||
CellUtil.cloneRow(KeyValueUtil.ensureKeyValue(cell))) | ||
: CellUtil.cloneRow(KeyValueUtil.ensureKeyValue(cell)); | ||
context.write(new ImmutableBytesWritable(outKey), KeyValueUtil.ensureKeyValue(cell)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear this block can cause a performance regression. Taking a look at KeyValueUtil.ensureKeyValue, depending on the type of cell
, ensureKeyValue
can result in the creation of a new KeyValue object. Unnecessary object creation in hadoop jobs can be expensive over a long job.
Can you please update this to only call KeyValueUtil.ensureKeyValue
once for each cell
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. A miss from my side.
// This relies on Hadoop Configuration to handle warning about deprecated configs and | ||
// to set the correct non-deprecated configs when an old one shows up. | ||
static { | ||
Configuration.addDeprecation("hlog.bulk.output", BULK_OUTPUT_CONF_KEY); | ||
Configuration.addDeprecation("hlog.input.tables", TABLES_KEY); | ||
Configuration.addDeprecation("hlog.input.tablesmap", TABLE_MAP_KEY); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a reason to add these? They aren't an issue per-se, but I noticed these deprecations are already in HBaseConfiguration class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like unintended changes. I will remove these. Must have come as part of some commit cherry-picking. And they are already present as part of HbaseConfiguration as mentioned.
try (Connection conn = ConnectionFactory.createConnection(conf);) { | ||
List<TableInfo> tableInfoList = new ArrayList<>(); | ||
for (TableName tableName : tableNames) { | ||
Table table = conn.getTable(tableName); | ||
RegionLocator regionLocator = conn.getRegionLocator(tableName); | ||
tableInfoList.add(new TableInfo(table.getDescriptor(), regionLocator)); | ||
} | ||
MultiTableHFileOutputFormat.configureIncrementalLoad(job, tableInfoList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so I think we might want to make one more compatibility fix here:
Previously, if a user called this method with more than 1 table it would throw an exception.
Now, those tables will be collected and passed into MultiTableHFileOutputFormat.
That is fine I think. However, if you call this method with just 1 table (as most users would do today since > 1 throws an error), you also will go to MultiTableHFileOutputFormat.
I think on line 374 here we should just be defensive and do this:
if (tableInfoList.size() > 1) {
MultiTableHFileOutputFormat.configureIncrementalLoad(job, tableInfoList);
} else {
TableInfo tableInfo = tableInfoList.get(0);
HFileOutputFormat2.configureIncrementalLoad(job, tableInfo.getTableDescriptor(), tableInfo.getRegionLocator());
}
This is important because if you look at HFileOutputFormat2.configureIncrementalLoad there is this:
boolean writeMultipleTables = false;
if (MultiTableHFileOutputFormat.class.equals(cls)) {
writeMultipleTables = true;
conf.setBoolean(MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY, true);
}
So the PR as we have it here would cause all users of WALPlayer to automatically pick up the writeMultipleTables
behavior, even if they just use 1 table. The writeMultipleTables behavior involves a bunch of differences from partitioner, splits, output directory, etc.
My suggestion above would ensure that you only go into the writeMultipleTable mode if you're actually writing to multiple tables as indicated by tableInfoList.size() > 1
.
Only committers can do it. Actually I was going to merge anyway, cuz we got 3 +1's which is all we really need. The pipeline failed on the last reporting step, unrelated. But now just the few more comments above |
Signed-off-by: Duo Zhang <zhangduo@apache.org> Signed-off-by: Viraj Jasani <vjasani@apache.org>
7155d53
to
fc898a7
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
fc898a7
to
2bf4f6a
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes look good. We've retained backwards compatibility with 2.x releases, no expected changes for existing users from normal behavior. They can enable hbase.hfileoutputformat.tablename.namespace.inclusive
to trigger change in directory structure to disambiguate tables in different namespaces, which will be default in 3.0. This is enabled by default for backups, as is wal.multi.tables.support
which enables MultiTableHFileOutputFormat for WALPlayer bulkload mode. Otherwise, users can manually enable wal.multi.tables.support
in WALPlayer if they want that behavior themselves.
All tests are passing, all net-new files confirmed equivalent with master. Note one small difference from master in IncrementalTableBackupClient.java, which adds handling of our compatiblity configs.
Final build step failed due to no space left on the jenkins host, but we got 3 +1's indicating success for this PR.
Thanks @rda3mon!
@bbeaudreault Thanks for taking time out to review this lengthy PR. |
…4770) Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
No description provided.