Conversation

@vinayakphegde
Contributor

No description provided.


Contributor

@kgeisz kgeisz left a comment

This looks good to me overall, aside from one nit comment.

One more thing: do you think a lot of these System.err/out.println() statements can be replaced with LOG.info/error()? I know we want to give some feedback to the user via the terminal, but it seems like a lot of these messages should go to the log (like the messages in BackupCommands.updateBackupTableStartTimes(), BackupCommands.deleteOldWALFiles(), etc.)

Configuration conf = getConf() != null ? getConf() : HBaseConfiguration.create();
String backupWalDir = conf.get(CONF_CONTINUOUS_BACKUP_WAL_DIR);

if (backupWalDir == null || backupWalDir.isEmpty()) {
Contributor

nit - You can use Strings.isNullOrEmpty() from org.apache.hbase.thirdparty.com.google.common.base

Suggested change
if (backupWalDir == null || backupWalDir.isEmpty()) {
if (Strings.isNullOrEmpty(backupWalDir)) {


@vinayakphegde
Contributor Author

vinayakphegde commented Jun 3, 2025

One more thing: do you think a lot of these System.err/out.println() statements can be replaced with LOG.info/error()? I know we want to give some feedback to the user via the terminal, but it seems like a lot of these messages should go to the log (like the messages in BackupCommands.updateBackupTableStartTimes(), BackupCommands.deleteOldWALFiles(), etc.)

Good point. We have a lot of println calls everywhere in the backup and restore code. Let me create a new Jira to address this issue.
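As a rough sketch of the kind of change that Jira would cover (class and method names here are hypothetical, and java.util.logging stands in for the project's actual logging facade purely to keep the example self-contained):

```java
import java.util.logging.Logger;

public class WalCleanupLogging {
  private static final Logger LOG = Logger.getLogger(WalCleanupLogging.class.getName());

  // Build the message once, send the operational detail to the log,
  // and let the caller decide whether to also print a short user-facing summary.
  static String describeDeletion(String dirPath) {
    String msg = "Deleting outdated WAL directory: " + dirPath;
    LOG.info(msg); // replaces System.out.println(msg)
    return msg;
  }

  public static void main(String[] args) {
    System.out.println(describeDeletion("/backup/WALs/1748908800000"));
  }
}
```

The point is the split: progress and bookkeeping messages go through the logger, while the terminal only sees whatever summary the command deliberately prints.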

return;
}

try (Connection conn = ConnectionFactory.createConnection(conf);
Contributor

NIT: avoid using a generic name like conn; use something specific like masterConn.

return;
}

try (Connection conn = ConnectionFactory.createConnection(conf);
Contributor

@abhradeepkundu abhradeepkundu Jun 4, 2025

I feel this connection creation is unnecessary. The super class already has a connection open; please verify whether you can reuse it.

Contributor Author

True, we'll reuse that!

// If WAL files of that day are older than cutoff time, delete them
if (dayStart + ONE_DAY_IN_MILLISECONDS - 1 < cutoffTime) {
System.out.println("Deleting outdated WAL directory: " + dirPath);
fs.delete(dirPath, true);
Contributor

If there is an API to delete in batches, we should use it. Also, depending on the number of files you are deleting, this method can take a lot of time; maybe we can go asynchronous here. Please give it a thought.

Contributor Author

If there is an API to delete in batches, we should use it.

Yeah, I checked but couldn’t find any API that supports batch deletion.

Also, depending on the number of files you are deleting, this method can take a lot of time; maybe we can go asynchronous here. Please give it a thought.

About going async — it’s a good idea, but it might add some complexity. We’d need to track if the delete actually finished, retry on failure, and maybe notify the user when it’s done.

So we should probably think about whether the added complexity is worth the gain. Also, right now, all our backup and restore commands (like full backup, incremental, restore) are synchronous anyway, and those can take hours.

I think async is definitely a good direction — just that it probably makes sense to build a proper framework around it first, so we can handle retries, tracking, and notifications across the board. What do you think?

Contributor

Let's build a job co-ordinator framework with ZooKeeper. We should build that outside the scope of this ticket, of course.

Contributor Author

Sure, let me create a Jira for that.

Contributor

Good point guys, but before going down this rabbit hole, please do some performance tests for justification. Try to delete 100, 10,000 and 1 million files in a single directory and share how much time it takes synchronously. Delete/unlink operations should be relatively quick in any filesystem, but let's see how it works with S3.
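For reference, the retention condition under discussion in this thread reduces to a pure check, sketched here with hypothetical class and constant names:

```java
public class WalRetention {
  static final long ONE_DAY_IN_MILLISECONDS = 24L * 60 * 60 * 1000;

  // Mirrors the quoted condition: a day bucket is safe to delete only when
  // its last millisecond (dayStart + one day - 1) falls before the cutoff.
  static boolean isDayBucketExpired(long dayStart, long cutoffTime) {
    return dayStart + ONE_DAY_IN_MILLISECONDS - 1 < cutoffTime;
  }

  public static void main(String[] args) {
    // A bucket starting at epoch 0 expires only once the cutoff passes the end of that day.
    System.out.println(isDayBucketExpired(0L, ONE_DAY_IN_MILLISECONDS));     // true
    System.out.println(isDayBucketExpired(0L, ONE_DAY_IN_MILLISECONDS - 1)); // false
  }
}
```

Factoring the check out like this would also make it unit-testable without standing up a cluster, which ties into the testing feedback elsewhere in this review.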

Contributor

@abhradeepkundu abhradeepkundu left a comment

One discussion point. One change request.


*/
public void updateContinuousBackupTableSet(Set<TableName> tablesToUpdate, long newStartTimestamp)
throws IOException {
try (Table table = connection.getTable(tableName)) {
Contributor

NIT: Add a null check for tablesToUpdate

Contributor

@abhradeepkundu abhradeepkundu left a comment

One more minor comment, but overall LGTM.


Contributor

@anmolnar anmolnar left a comment

Thanks @vinayakphegde, patch looks good to me. However I have the same criticism that I mentioned previously: unit tests are missing.

Since all of your helper methods are private you cannot test them individually, so you need to set up an entire starship in your test case, call the command and verify the output. This is end-to-end testing: you get a yes/no answer to the question of whether your function is working. If the answer is yes, we're fine, but if it's no, you'll have no idea where the problem is and you have to debug.

Unit testing individual methods gives more detail about what's working and what's not.


Comment on lines 947 to 952
/**
* Updates the start time for continuous backups if older than cutoff timestamp.
* @param sysTable Backup system table
* @param cutoffTimestamp Timestamp before which WALs are no longer needed
*/
private void updateBackupTableStartTimes(BackupSystemTable sysTable, long cutoffTimestamp)
Contributor

Hey @vinayakphegde, this is the function that led me to ask for clarification on why we need to update the start times of the continuous backups. Maybe you could add another line or two to the docstring here that elaborates on why we need to do this? That may make it more clear to others in the future.

@vinayakphegde
Contributor Author

Thanks @vinayakphegde, patch looks good to me. However I have the same criticism that I mentioned previously: unit tests are missing.

Since all of your helper methods are private you cannot test them individually, so you need to set up an entire starship in your test case, call the command and verify the output. This is end-to-end testing: you get a yes/no answer to the question of whether your function is working. If the answer is yes, we're fine, but if it's no, you'll have no idea where the problem is and you have to debug.

Unit testing individual methods gives more detail about what's working and what's not.

Sure @anmolnar I will try to add more unit tests to cover these methods.


<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-inline</artifactId>
<version>4.11.0</version>
Contributor

You don't need to set version here, because it's already in the top level pom.
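In other words, the declaration can drop the version tag and inherit it from the parent build (a sketch, assuming mockito-inline is listed in the top-level pom's dependencyManagement as the comment indicates):

```xml
<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-inline</artifactId>
  <!-- version inherited from dependencyManagement in the top-level pom -->
</dependency>
```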

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ HBASE-28957 Compile Tests _
+1 💚 mvninstall 3m 3s HBASE-28957 passed
+1 💚 compile 0m 29s HBASE-28957 passed
-0 ⚠️ checkstyle 0m 9s /buildtool-branch-checkstyle-hbase-backup.txt The patch fails to run checkstyle in hbase-backup
+1 💚 spotbugs 0m 28s HBASE-28957 passed
+1 💚 spotless 0m 43s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 1s the patch passed
+1 💚 compile 0m 29s the patch passed
-0 ⚠️ javac 0m 29s /results-compile-javac-hbase-backup.txt hbase-backup generated 5 new + 109 unchanged - 0 fixed = 114 total (was 109)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 9s /buildtool-patch-checkstyle-hbase-backup.txt The patch fails to run checkstyle in hbase-backup
+1 💚 xmllint 0m 0s No new issues.
+1 💚 spotbugs 0m 36s the patch passed
+1 💚 hadoopcheck 11m 45s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 44s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
29m 51s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7007/7/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7007
JIRA Issue HBASE-29255
Optional Tests dupname asflicense javac codespell detsecrets xmllint hadoopcheck spotless compile spotbugs checkstyle hbaseanti
uname Linux 09aa76675c8b 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / e3c24f0
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7007/7/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3 xmllint=20913
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 43s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ HBASE-28957 Compile Tests _
+1 💚 mvninstall 3m 23s HBASE-28957 passed
+1 💚 compile 0m 20s HBASE-28957 passed
+1 💚 javadoc 0m 14s HBASE-28957 passed
+1 💚 shadedjars 6m 1s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 10s the patch passed
+1 💚 compile 0m 20s the patch passed
+1 💚 javac 0m 20s the patch passed
+1 💚 javadoc 0m 13s the patch passed
+1 💚 shadedjars 5m 54s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 22m 20s /patch-unit-hbase-backup.txt hbase-backup in the patch failed.
43m 46s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7007/7/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7007
JIRA Issue HBASE-29255
Optional Tests javac javadoc unit shadedjars compile
uname Linux 4e665ef9b5fe 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / e3c24f0
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7007/7/testReport/
Max. process+thread count 3293 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7007/7/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@anmolnar anmolnar left a comment

lgtm.
@vinayakphegde The test failures don't look good; you might want to keep an eye on them. They're all incremental backup related, but they might have nothing to do with this patch.

@vinayakphegde
Contributor Author

@anmolnar all the failing tests are incremental backup related. We are trying to fix those tests separately.

@anmolnar anmolnar merged commit 530feba into apache:HBASE-28957 Jun 11, 2025
1 check failed
anmolnar pushed a commit that referenced this pull request Jul 28, 2025
…nd (#7007)

* Store bulkload files in daywise bucket as well

* Integrate backup WAL cleanup logic with the delete command

* address the review comments

* address the review comments

* address the review comments

* add more unit tests to cover all cases

* address the review comments
vinayakphegde added a commit to vinayakphegde/hbase that referenced this pull request Jul 29, 2025

vinayakphegde added a commit to vinayakphegde/hbase that referenced this pull request Jul 29, 2025

anmolnar pushed a commit that referenced this pull request Sep 11, 2025

kgeisz pushed a commit to kgeisz/hbase that referenced this pull request Sep 15, 2025

kgeisz pushed a commit to kgeisz/hbase that referenced this pull request Sep 30, 2025

anmolnar pushed a commit that referenced this pull request Nov 6, 2025