-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29133: Implement "pitr" Command for Point-in-Time Restore #6717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
606ff9e to
d43fc8c
Compare
5a303ac to
00feaf1
Compare
00feaf1 to
dc20fbe
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| } else { | ||
| // This server is no longer active (e.g., RS moved or removed); skip | ||
| if (LOG.isDebugEnabled()) { | ||
| LOG.debug("Skipping replication marker timestamp for invalid server: {}", server); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is theLOG.isDebugEnabled() check necessary here? Shouldn't log4j just not log this message if the log level is set to INFO or higher?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, it's unnecessary here. We typically use the conditional check when building a complex string or performing expensive operations, which isn't the case in this line.
anmolnar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall lgtm, but surprising how low the level of unit tests' granularity is: for instance BackupAdminImpl is such a big class and there's no corresponding test class. It has a lot of private methods which could be tested as "units" if they were package private and we had a BackupAdminTest class. Please consider that.
| return request; | ||
| } | ||
|
|
||
| public static PointInTimeRestoreRequest createPointInTimeRestoreRequest(String backupRootDir, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Since you have Builder implemented for PointInTimeRestoreRequest, this method is redundant. Creating a new request is already a one-liner.
| } else { | ||
| // This server is no longer active (e.g., RS moved or removed); skip | ||
| if (LOG.isDebugEnabled()) { | ||
| LOG.debug("Skipping replication marker timestamp for invalid server: {}", server); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"inactive" maybe instead of "invalid" - could be slightly less misleading, these servers are probably not invalid
|
@vinayakphegde You also need to rebase the patch. Some flaky tests have already been fixed on the feature branch. |
True, there isn’t a separate test class specifically for That said, I’ll still take a closer look at the class and add tests if I find any gaps. |
f0c0a7f to
8063bbe
Compare
Could you please provide examples of that? |
Yes, for example, mergeBackups() is already covered by tests in TestBackupMerge, TestIncrementalBackupMergeWithBulkLoad, and IntegrationTestBackupRestore. The checkIfValidForMerge() method is private and is only called from mergeBackups(), which is the case for most of the methods in this class. That said, we can definitely create a dedicated test class for this and add the remaining/uncovered test cases. I’ll open a separate Jira for that. Does that sound good? |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Yeah, these examples are all integration tests as the same suggests in some cases: they create "real"/mini HBase cluster, sets up a connection, get reference to Admin and call methods which are publicly accessible. Looks like this is historically the main approach for HBase testing, but this is not unit testing. What I'm talking about is instantiating BackupAdminImpl class directly with a mocked connection and test the methods individually. In order to do this you have to make methods accessible from unit tests, for instance, by changing visibility to package private. I don't insist doing this strictly, just leaving here as something to think about. Let me approve this patch regardless. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. I just have some minor comments, most of which are for AbstractRestoreDriver.java.
| boolean overwrite = cmd.hasOption(OPTION_OVERWRITE); | ||
| if (overwrite) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - I see in Line 78 you're just using the hasOption() method and not assigning its output to a variable. Same with Lines 94, 101, and 107. Maybe this should be like that for consistency?
| boolean overwrite = cmd.hasOption(OPTION_OVERWRITE); | |
| if (overwrite) { | |
| if (cmd.hasOption(OPTION_OVERWRITE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I am using the value of overwrite in another place (line 152).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I missed that.
| boolean check = cmd.hasOption(OPTION_CHECK); | ||
| if (check) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the previous comment
| boolean check = cmd.hasOption(OPTION_CHECK); | |
| if (check) { | |
| if (cmd.hasOption(OPTION_CHECK)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed that you're using this at the end of the function as well, so no change is needed here.
| LOG.debug("Found -overwrite option in restore command, " | ||
| + "will overwrite to existing table if any in the restore target"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - I see you have -o, but from what I can tell -overwrite is not an available alternative option. It may be more clear to have the actual option in parenthesis.
| LOG.debug("Found -overwrite option in restore command, " | |
| + "will overwrite to existing table if any in the restore target"); | |
| LOG.debug("Found overwrite option (-{}) in restore command, " | |
| + "will overwrite to existing table if any in the restore target", OPTION_OVERWRITE); |
| LOG.debug( | ||
| "Found -check option in restore command, " + "will check and verify the dependencies"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - Same as the previous comment
| LOG.debug( | |
| "Found -check option in restore command, " + "will check and verify the dependencies"); | |
| LOG.debug( | |
| "Found check option (-{}) in restore command, will check and verify the dependencies", | |
| OPTION_CHECK); |
| System.err.println( | ||
| "Options -s and -t are mutually exclusive," + " you can not specify both of them."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a little info to quickly remind the user what -s and -t do?
| System.err.println( | |
| "Options -s and -t are mutually exclusive," + " you can not specify both of them."); | |
| System.err.printf( | |
| "Set name (-%s) and table list (-%s) are mutually exclusive, you can not specify both " | |
| + "of them.%n", OPTION_SET, OPTION_TABLE); |
If you decide this isn't necessary, then please remove the unnecessary +:
| System.err.println( | |
| "Options -s and -t are mutually exclusive," + " you can not specify both of them."); | |
| System.err.println( | |
| "Options -s and -t are mutually exclusive, you can not specify both of them."); |
| } | ||
| public class RestoreDriver extends AbstractRestoreDriver { | ||
| private static final String USAGE_STRING = """ | ||
| Usage: hbase restore <backup_path> <backup_id> [options] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should <table(s)> be in this usage line as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is already included in the options list. 3 line below.
| long dirEndTime = dirStartTime + ONE_DAY_IN_MILLISECONDS - 1; // End time of the day | ||
| // (23:59:59) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a little odd to have an inline comment on two lines. It's probably because of mvn spotless:apply. You can probably get it all on one line if you trim the comment down:
| long dirEndTime = dirStartTime + ONE_DAY_IN_MILLISECONDS - 1; // End time of the day | |
| // (23:59:59) | |
| long dirEndTime = dirStartTime + ONE_DAY_IN_MILLISECONDS - 1; // End time of day (23:59:59) |
| // Capture the timestamp of the last WAL entry processed. This is used as the replication | ||
| // checkpoint | ||
| // so that point-in-time restores know the latest consistent time up to which replication has | ||
| // occurred. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - It looks like there is an accidental newline after checkpoint.
| // Capture the timestamp of the last WAL entry processed. This is used as the replication | |
| // checkpoint | |
| // so that point-in-time restores know the latest consistent time up to which replication has | |
| // occurred. | |
| // Capture the timestamp of the last WAL entry processed. This is used as the replication | |
| // checkpoint so that point-in-time restores know the latest consistent time up to which | |
| // replication has occurred. |
| * backups with or without continuous backup enabled. 4. Ensuring replication is complete before | ||
| * proceeding. | ||
| */ | ||
| private static void setUpBackupUps() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method name a typo? Should it say setUpBackups()?
| String targetTableNames = | ||
| Arrays.stream(targetTables).map(TableName::getNameAsString).collect(Collectors.joining(",")); | ||
|
|
||
| return new String[] { "-t", sourceTableNames, "-m", targetTableNames, "-" + OPTION_TO_DATETIME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be safe, you can use "-" + OPTION_TABLE instead of -t in case the option is ever changed. Similar for -m and the args in buildBackupArgs().
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
Signed-off-by: Andor Molnar <andor@apache.org> Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
…he#6717) Signed-off-by: Andor Molnar <andor@apache.org> Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
…he#6717) Signed-off-by: Andor Molnar <andor@apache.org> Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
Signed-off-by: Andor Molnar <andor@apache.org> Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
Signed-off-by: Andor Molnar <andor@apache.org> Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
This PR introduces the
pitrcommand to restore tables from the most recent backup before--to-datetimeand replay WAL logs for changes made after the last backup.Command Usage:
hbase pitr [-t <table_name[,table_name]>] [-s <backup_set_name>] [-q <name>] [-c] [-m <target_tables>] [-o] [--to-datetime <end_time>]