
Conversation

@vinayakphegde (Contributor) commented Sep 14, 2025

Point-in-Time Restore (PITR) with Bulkload File Support

Overview

This change enhances Point-in-Time Restore (PITR) to include restoration of bulk-loaded HFiles.
Previously, PITR only replayed WAL edits; bulk-loaded files referenced in WAL markers were ignored.
With this update, PITR restores both WAL edits and bulk-loaded files, ensuring full data coverage.

Key Changes

  • New Utility: BulkFilesCollector

    • Runs BulkLoadCollectorJob over WAL directories.
    • Collects and deduplicates bulkload file paths.
    • Returns discovered Paths for PITR consumption.
  • PITR Restore Flow (see the sketch after this list)

    • After replaying WALs, PITR calls reBulkloadFiles(...).
    • Uses the existing RestoreJob MapReduce job (originally for full/incremental restores) to perform the HFile bulkload.
  • Integration Tests

    • Updated TestPointInTimeRestore:

      • Generates and bulk-loads HFiles during backup setup.
      • Verifies PITR restores both WAL edits and HFiles.
  • Logging

    • Clarified that RestoreJob is now used not only for full/incremental backup restores but also for PITR bulkload restore.
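
The overall ordering can be sketched as follows. This is illustrative only: the interfaces and method signatures below are assumptions built around the names mentioned in this PR (collectBulkFiles, reBulkloadFiles), not the actual HBase backup APIs.

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Illustrative sketch of the PITR ordering described above; the interfaces
    // and method signatures are assumptions, not the real HBase backup APIs.
    public final class PitrFlowSketch {

      interface BulkFileCollector {
        // Runs BulkLoadCollectorJob over WAL dirs and returns deduplicated HFile paths.
        List<Path> collectBulkFiles(Configuration conf, List<Path> walDirs) throws Exception;
      }

      interface BulkLoader {
        // Re-bulkloads discovered HFiles via the existing RestoreJob MapReduce job.
        void reBulkloadFiles(Configuration conf, List<Path> hfiles) throws Exception;
      }

      static void restore(Configuration conf, List<Path> walDirs, Runnable replayWals,
          BulkFileCollector collector, BulkLoader loader) throws Exception {
        // 1. Replay WAL edits up to the requested point in time.
        replayWals.run();
        // 2. Discover bulk-loaded HFiles referenced by WAL bulk-load markers.
        List<Path> hfiles = collector.collectBulkFiles(conf, walDirs);
        // 3. Re-bulkload them so PITR covers bulk-loaded data, not just WAL edits.
        if (!hfiles.isEmpty()) {
          loader.reBulkloadFiles(conf, hfiles);
        }
      }
    }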


@vinayakphegde marked this pull request as ready for review September 19, 2025 10:12

@taklwu requested a review from Copilot September 23, 2025 17:22
Copilot AI left a comment

Pull Request Overview

This PR enhances Point-in-Time Restore (PITR) functionality to include restoration of bulk-loaded HFiles, ensuring complete data coverage beyond just WAL replay.

  • Adds BulkFilesCollector utility to discover bulk-load files from WAL directories using a new MapReduce job
  • Extends PITR flow to re-bulkload discovered HFiles after WAL replay using the existing RestoreJob
  • Moves BackupFileSystemManager to util package for broader use across backup components

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Summary per file:

  • WALInputFormat.java: makes the WALSplit class public for MapReduce job access
  • BulkLoadProcessor.java: moves to the util package and adds a single-entry API for WAL processing
  • BackupFileSystemManager.java: relocates to the util package with enhanced path resolution methods
  • BulkFilesCollector.java: new utility to run MapReduce bulk-load collection jobs
  • BulkLoadCollectorJob.java: new MapReduce job to discover bulk-load files from WALs
  • AbstractPitrRestoreHandler.java: integrates bulk-load file restoration into the PITR flow
  • PointInTimeRestoreRequest.java: adds fields for restore directory and split preservation
  • TestPointInTimeRestore.java: updates the test to include bulk-load scenarios
  • Multiple test files: comprehensive test coverage for the new functionality


Comment on lines 152 to 157
    protected void setup(Context context) throws IOException {
      String[] tableMap = context.getConfiguration().getStrings(TABLES_KEY);
      String[] tablesToUse = context.getConfiguration().getStrings(TABLES_KEY);
      if (tableMap == null) {
        tableMap = tablesToUse;
      }
Copilot AI commented Sep 23, 2025

Line 154 retrieves the same configuration key as line 153, making the assignment redundant. Line 154 should retrieve TABLE_MAP_KEY instead of TABLES_KEY to get the table mappings.
@taklwu (Contributor) commented:

is this a bug ?

@kgeisz (Contributor) commented:

I agree with @taklwu. This looks like a bug to me. Maybe you meant to use TABLE_MAP_KEY for tableMap?

@vinayakphegde (Contributor, Author) commented:

Yes, this is a bug. I will fix it.
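
For reference, a minimal sketch of the fix the thread converges on, assuming the second lookup was meant to read TABLE_MAP_KEY rather than TABLES_KEY again:

    // Sketch of the agreed fix: read the table mapping from TABLE_MAP_KEY and
    // fall back to the source table list when no explicit mapping is configured.
    String[] tablesToUse = context.getConfiguration().getStrings(TABLES_KEY);
    String[] tableMap = context.getConfiguration().getStrings(TABLE_MAP_KEY);
    if (tableMap == null) {
      tableMap = tablesToUse;
    }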


    Path full; // Build final path:
    // <prefixBeforeWALs>/bulk-load-files/<dateSegment>/<relativeBulkPath>
    if (prefixBeforeWALs == null || prefixBeforeWALs.toString().isEmpty()) {
Copilot AI commented Sep 23, 2025

Calling toString() on a potentially null prefixBeforeWALs will throw a NullPointerException. The null check should be performed first, and the empty string check should be separate.

Suggested change:

    - if (prefixBeforeWALs == null || prefixBeforeWALs.toString().isEmpty()) {
    + if (prefixBeforeWALs == null || (prefixBeforeWALs != null && prefixBeforeWALs.toString().isEmpty())) {
      }
    } catch (ParseException e) {
    -  LOG.warn("Skipping invalid directory name: " + dirName, e);
    +  LOG.warn("Skipping invalid directory name: {}", dirName, e);
Copilot AI commented Sep 23, 2025

[nitpick] This change from concatenated string to parameterized logging is good, but the comment on line 440 should be updated to reflect the corrected logging style for consistency.
@taklwu (Contributor) left a comment

I will have another look later today

    PointInTimeRestoreRequest pointInTimeRestoreRequest =
      new PointInTimeRestoreRequest.Builder().withBackupRootDir(backupRootDir).withCheck(check)
        .withFromTables(fromTables).withToTables(toTables).withOverwrite(isOverwrite)
        .withToDateTime(endTime).withKeepOriginalSplits(false).withRestoreRootDir(
@taklwu (Contributor) commented:

nit: should withKeepOriginalSplits be configurable? In what case should it be true?

@vinayakphegde (Contributor, Author) commented:

Yes, this should be configurable, but it is hardcoded in the existing code as well, so we need to fix it in both cases. I think that should be a separate ticket; for the moment I added this comment:

    // TODO: Currently hardcoding keepOriginalSplits=false and restoreRootDir via tmp dir.
    // These should come from user input (same issue exists in normal restore).
    // Expose them as configurable options in future.
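
A hypothetical way such an option could be exposed; the property name below is invented for illustration and is not an existing HBase configuration key:

    // Hypothetical property name, not an existing HBase configuration key.
    boolean keepOriginalSplits =
      conf.getBoolean("hbase.backup.pitr.keep.original.splits", false);
    PointInTimeRestoreRequest request = new PointInTimeRestoreRequest.Builder()
      .withKeepOriginalSplits(keepOriginalSplits) // user-driven instead of hardcoded
      // ... other builder options as in the snippet above ...
      .build(); // assumes the Builder exposes a build() method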

@taklwu (Contributor) left a comment

few more minor comments

    Path p = new Path(d);
    try {
      FileSystem fsForPath = p.getFileSystem(conf);
      if (fsForPath.exists(p)) {
@taklwu (Contributor) commented Sep 23, 2025

nit: this is not atomic; after the existence check, someone could manipulate the checked WAL directories, so the check may no longer hold when the job runs.

I assume you're just verifying before the actual execution, so let's keep this check for now.

@vinayakphegde (Contributor, Author) commented:

Yeah, this is simply a sanity check before we send the list to the MR job. If the MR job can't find a file, it will raise an exception and cause the process to fail.
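
A sketch of that pre-submission sanity check, assuming the surrounding walDirs, conf, and LOG variables; a directory deleted after this loop will still surface as a failure inside the MR job itself:

    // Pre-flight filter only; not atomic. A WAL dir removed after this check
    // will make the MR job fail with its own exception, which is acceptable here.
    List<Path> inputs = new ArrayList<>();
    for (String d : walDirs) {
      Path p = new Path(d);
      FileSystem fsForPath = p.getFileSystem(conf);
      if (fsForPath.exists(p)) {
        inputs.add(p);
      } else {
        LOG.warn("Skipping non-existent WAL directory: {}", p);
      }
    }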

Comment on lines 364 to 365:

    LOG.error("Re-bulkload failed for {}: {}", targetTable, e.getMessage(), e);
    throw new IOException("Re-bulkload failed for " + targetTable + ": " + e.getMessage(), e);

@taklwu (Contributor) commented:

nit: reuse the same error message for both the log and the exception.

Suggested change:

    - LOG.error("Re-bulkload failed for {}: {}", targetTable, e.getMessage(), e);
    - throw new IOException("Re-bulkload failed for " + targetTable + ": " + e.getMessage(), e);
    + String errorMessage = String.format("Re-bulkload failed for %s: %s", targetTable, e.getMessage());
    + LOG.error(errorMessage, e);
    + throw new IOException(errorMessage, e);

    collectBulkFiles(sourceTable, targetTable, startTime, endTime, new Path(restoreRootDir));

    if (bulkloadFiles.isEmpty()) {
      LOG.info("No bulk-load files found for {} in range {}-{}. Skipping bulkload restore.",

@taklwu (Contributor) commented:

Suggested change:

    - LOG.info("No bulk-load files found for {} in range {}-{}. Skipping bulkload restore.",
    + LOG.info("No bulk-load files found for {} in time range {}-{}. Skipping bulkload restore.",

@kgeisz (Contributor) left a comment

I have one comment on a potential bug that may require a change and one nit comment. LGTM otherwise.

    try {
      jobDriver.createSubmittableJob(new String[] { "file:/only/one/arg" });
      fail("Expected IOException for insufficient args");
    } catch (Exception e) {
@kgeisz (Contributor) commented:

IMO, you should only be catching IOException here, since that is what you're expecting. Otherwise, the test catches every exception and always passes.

You can also add @Test(expected = IOException.class) to the top of your unit test and eliminate the try/catch block and fail() calls completely.
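
Both patterns, sketched with JUnit 4 (assuming JUnit 4.13+ for assertThrows; the jobDriver field is a placeholder for however the test constructs the driver under test):

    import static org.junit.Assert.assertThrows;

    import java.io.IOException;
    import org.junit.Test;

    public class TestBulkLoadCollectorJobArgs {

      // Placeholder for however the test constructs the job driver.
      private final BulkLoadCollectorJob jobDriver = new BulkLoadCollectorJob();

      // Option 1: declare the expected exception on the annotation.
      @Test(expected = IOException.class)
      public void testInsufficientArgs() throws Exception {
        jobDriver.createSubmittableJob(new String[] { "file:/only/one/arg" });
      }

      // Option 2: assertThrows keeps the assertion local and allows
      // inspecting the thrown exception if needed.
      @Test
      public void testInsufficientArgsWithAssertThrows() {
        assertThrows(IOException.class,
          () -> jobDriver.createSubmittableJob(new String[] { "file:/only/one/arg" }));
      }
    }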



@taklwu (Contributor) left a comment

LGTM

@anmolnar (Contributor) left a comment

lgtm. Please check unit test failure. If unrelated, let's ship it.

@kgeisz (Contributor) left a comment

LGTM

@Apache-HBase commented:

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ HBASE-28957 Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for branch
+1 💚 mvninstall 3m 22s HBASE-28957 passed
+1 💚 compile 1m 13s HBASE-28957 passed
-0 ⚠️ checkstyle 0m 10s /buildtool-branch-checkstyle-hbase-backup.txt The patch fails to run checkstyle in hbase-backup
+1 💚 spotbugs 1m 1s HBASE-28957 passed
+1 💚 spotless 0m 51s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 7s the patch passed
+1 💚 compile 1m 9s the patch passed
-0 ⚠️ javac 0m 33s /results-compile-javac-hbase-backup.txt hbase-backup generated 4 new + 138 unchanged - 0 fixed = 142 total (was 138)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 9s /buildtool-patch-checkstyle-hbase-backup.txt The patch fails to run checkstyle in hbase-backup
+1 💚 spotbugs 1m 14s the patch passed
+1 💚 hadoopcheck 12m 10s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 45s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 18s The patch does not generate ASF License warnings.
34m 43s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7300/8/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7300
JIRA Issue HBASE-29521
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 2e28d8651ca3 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / ca010c1
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-mapreduce hbase-backup U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7300/8/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase commented:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ HBASE-28957 Compile Tests _
+0 🆗 mvndep 0m 30s Maven dependency ordering for branch
+1 💚 mvninstall 3m 18s HBASE-28957 passed
+1 💚 compile 0m 40s HBASE-28957 passed
+1 💚 javadoc 0m 29s HBASE-28957 passed
+1 💚 shadedjars 6m 4s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for patch
+1 💚 mvninstall 2m 56s the patch passed
+1 💚 compile 0m 40s the patch passed
+1 💚 javac 0m 40s the patch passed
-0 ⚠️ javadoc 0m 14s /results-javadoc-javadoc-hbase-backup.txt hbase-backup generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+1 💚 shadedjars 6m 4s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 31m 42s /patch-unit-hbase-mapreduce.txt hbase-mapreduce in the patch failed.
+1 💚 unit 18m 51s hbase-backup in the patch passed.
73m 51s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7300/8/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7300
JIRA Issue HBASE-29521
Optional Tests javac javadoc unit compile shadedjars
uname Linux ff28021d7371 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / ca010c1
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7300/8/testReport/
Max. process+thread count 3700 (vs. ulimit of 30000)
modules C: hbase-mapreduce hbase-backup U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7300/8/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@vinayakphegde (Contributor, Author) commented:

@anmolnar, this test case doesn't appear to be connected to our change. It's a completely unrelated test, but it keeps failing consistently. I'm not sure what might be causing this.

@ankitsol commented:

Looks good to me

@Kota-SH (Contributor) left a comment

LGTM

@taklwu merged commit 6e9561e into apache:HBASE-28957 on Sep 25, 2025 (1 check failed).
anmolnar pushed a commit that referenced this pull request Nov 6, 2025
Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
Signed-off-by: Andor Molnár <andor@apache.org>
Reviewed by: Kevin Geiszler <kevin.j.geiszler@gmail.com>
Reviewed by: Kota-SH <shanmukhaharipriya@gmail.com>