-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29521: Update Restore Command to Handle Bulkloaded Files #7300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances Point-in-Time Restore (PITR) functionality to include restoration of bulk-loaded HFiles, ensuring complete data coverage beyond just WAL replay.
- Adds
BulkFilesCollectorutility to discover bulk-load files from WAL directories using a new MapReduce job - Extends PITR flow to re-bulkload discovered HFiles after WAL replay using the existing
RestoreJob - Moves
BackupFileSystemManagerto util package for broader use across backup components
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| WALInputFormat.java | Makes WALSplit class public for MapReduce job access |
| BulkLoadProcessor.java | Moves to util package and adds single-entry API for WAL processing |
| BackupFileSystemManager.java | Relocates to util package with enhanced path resolution methods |
| BulkFilesCollector.java | New utility to run MapReduce bulk-load collection jobs |
| BulkLoadCollectorJob.java | New MapReduce job to discover bulk-load files from WALs |
| AbstractPitrRestoreHandler.java | Integrates bulk-load file restoration into PITR flow |
| PointInTimeRestoreRequest.java | Adds fields for restore directory and split preservation |
| TestPointInTimeRestore.java | Updates test to include bulk-load scenarios |
| Multiple test files | Comprehensive test coverage for new functionality |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| protected void setup(Context context) throws IOException { | ||
| String[] tableMap = context.getConfiguration().getStrings(TABLES_KEY); | ||
| String[] tablesToUse = context.getConfiguration().getStrings(TABLES_KEY); | ||
| if (tableMap == null) { | ||
| tableMap = tablesToUse; | ||
| } |
Copilot
AI
Sep 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 154 retrieves the same configuration key as line 153, making the assignment redundant. Line 154 should retrieve TABLE_MAP_KEY instead of TABLES_KEY to get the table mappings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a bug ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @taklwu. This looks like a bug to me. Maybe you meant to use TABLE_MAP_KEY for tableMap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a bug. I will fix it.
|
|
||
| Path full; // Build final path: | ||
| // <prefixBeforeWALs>/bulk-load-files/<dateSegment>/<relativeBulkPath> | ||
| if (prefixBeforeWALs == null || prefixBeforeWALs.toString().isEmpty()) { |
Copilot
AI
Sep 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling toString() on a potentially null prefixBeforeWALs will throw a NullPointerException. The null check should be performed first, and the empty string check should be separate.
| if (prefixBeforeWALs == null || prefixBeforeWALs.toString().isEmpty()) { | |
| if (prefixBeforeWALs == null || (prefixBeforeWALs != null && prefixBeforeWALs.toString().isEmpty())) { |
| } | ||
| } catch (ParseException e) { | ||
| LOG.warn("Skipping invalid directory name: " + dirName, e); | ||
| LOG.warn("Skipping invalid directory name: {}", dirName, e); |
Copilot
AI
Sep 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] This change from concatenated string to parameterized logging is good, but the comment on line 440 should be updated to reflect the corrected logging style for consistency.
taklwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have another look later today
| PointInTimeRestoreRequest pointInTimeRestoreRequest = | ||
| new PointInTimeRestoreRequest.Builder().withBackupRootDir(backupRootDir).withCheck(check) | ||
| .withFromTables(fromTables).withToTables(toTables).withOverwrite(isOverwrite) | ||
| .withToDateTime(endTime).withKeepOriginalSplits(false).withRestoreRootDir( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: withKeepOriginalSplits should this be configurable ? in what case it should be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be configurable. But it is hardcoded in the already existing code as well. so, we need to fix in both cases. and I think it should be a separate ticket. But for the moment I added this comment
// TODO: Currently hardcoding keepOriginalSplits=false and restoreRootDir via tmp dir.
// These should come from user input (same issue exists in normal restore).
// Expose them as configurable options in future.
taklwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few more minor comments
| Path p = new Path(d); | ||
| try { | ||
| FileSystem fsForPath = p.getFileSystem(conf); | ||
| if (fsForPath.exists(p)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is not atomic, after you check exist, there could be chance that someone manipulates the checked WAL directories, then this is not true for its existence.
I assumed you're just verifying before the actual execution, so let's keep this check for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. This is simply a sanity check before we send the list to the MR job. If the MR job can't find the file, it will raise an exception and cause the process to fail.
| LOG.error("Re-bulkload failed for {}: {}", targetTable, e.getMessage(), e); | ||
| throw new IOException("Re-bulkload failed for " + targetTable + ": " + e.getMessage(), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: reuse the same error message for both log and exception
| LOG.error("Re-bulkload failed for {}: {}", targetTable, e.getMessage(), e); | |
| throw new IOException("Re-bulkload failed for " + targetTable + ": " + e.getMessage(), e); | |
| String errorMessage = String.format("Re-bulkload failed for %s: %s", targetTable, e.getMessage()); | |
| LOG.error(errorMessage, e); | |
| throw new IOException(errorMessage, e); |
| collectBulkFiles(sourceTable, targetTable, startTime, endTime, new Path(restoreRootDir)); | ||
|
|
||
| if (bulkloadFiles.isEmpty()) { | ||
| LOG.info("No bulk-load files found for {} in range {}-{}. Skipping bulkload restore.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| LOG.info("No bulk-load files found for {} in range {}-{}. Skipping bulkload restore.", | |
| LOG.info("No bulk-load files found for {} in time range {}-{}. Skipping bulkload restore.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one comment on a potential bug that may require a change and one nit comment. LGTM otherwise.
| try { | ||
| jobDriver.createSubmittableJob(new String[] { "file:/only/one/arg" }); | ||
| fail("Expected IOException for insufficient args"); | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, you should only be catching IOException here since that is what you're expecting. Otherwise, the test catches every exception and always passes.
You can also add @Test(expected = IOException.class) to the top of your unit test and eliminate the try/catch block and fail() methods completely.
| protected void setup(Context context) throws IOException { | ||
| String[] tableMap = context.getConfiguration().getStrings(TABLES_KEY); | ||
| String[] tablesToUse = context.getConfiguration().getStrings(TABLES_KEY); | ||
| if (tableMap == null) { | ||
| tableMap = tablesToUse; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @taklwu. This looks like a bug to me. Maybe you meant to use TABLE_MAP_KEY for tableMap?
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
taklwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Please check unit test failure. If unrelated, let's ship it.
kgeisz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
@anmolnar , this test case doesn't appear to be connected to our change. It's a completely unrelated test, but it keeps failing consistently. I'm not sure what might be causing this. |
|
Looks good to me |
Kota-SH
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org> Signed-off-by: Andor Molnár andor@apache.org Reviewed by: Kevin Geiszler <kevin.j.geiszler@gmail.com> Reviewed by: Kota-SH <shanmukhaharipriya@gmail.com>
Point-in-Time Restore (PITR) with Bulkload File Support
Overview
This change enhances Point-in-Time Restore (PITR) to include restoration of bulk-loaded HFiles.
Previously, PITR only replayed WAL edits; bulk-loaded files referenced in WAL markers were ignored.
With this update, PITR now restores both WAL edits and bulkloaded files, ensuring full data coverage.
Key Changes
New Utility:
BulkFilesCollectorBulkLoadCollectorJobover WAL directories.Paths for PITR consumption.PITR Restore Flow
reBulkloadFiles(...).RestoreJobMapReduce job (originally for full/incremental restores) to perform HFile bulkload.Integration Tests
Updated
TestPointInTimeRestore:Logging
RestoreJobis now used not only for full/incremental backup restores but also for PITR bulkload restore.