
HBASE-28680 BackupLogCleaner should clean up archived HMaster logs #6006

Merged
merged 1 commit into apache:master on Jun 21, 2024

Conversation

Contributor

@rmdmattingly rmdmattingly commented Jun 20, 2024

See https://issues.apache.org/jira/browse/HBASE-28680

We have been trying to set up daily incremental backups for hundreds of clusters at my day job. Recently we discovered that old WALs were piling up across many clusters, in line with when we began running incremental backups.

This led to the realization that the BackupLogCleaner will always skip archived HMaster WALs. This is a problem because, if a cleaner is skipping a given file, then the CleanerChore will never delete it.

At first this seemed like a misunderstanding of what it means to "skip" a WAL in a BaseLogCleanerDelegate, but after working through this code I now believe it is a more ordinary bug: the logic simply never took archived HMaster WALs into account.
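
For anyone less familiar with the cleaner plumbing: the CleanerChore only deletes a file once every configured cleaner delegate reports it as deletable, so a delegate that "skips" a file effectively pins it in oldWALs forever. A minimal sketch of that semantics (illustrative only, not the actual CleanerChore code; the class and names here are made up):

import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch of the CleanerChore contract: a file is removed only if
// every delegate agrees it is deletable, so one delegate that never returns
// true for archived HMaster WALs keeps them on disk indefinitely.
final class CleanerChoreSketch {
  private final List<Predicate<String>> delegates;

  CleanerChoreSketch(List<Predicate<String>> delegates) {
    this.delegates = delegates;
  }

  boolean shouldDelete(String walPath) {
    return delegates.stream().allMatch(delegate -> delegate.test(walPath));
  }
}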

@ndimiduk @charlesconnell @hgromer

Comment on lines -126 to -127
String fn = file.getPath().getName();
if (fn.startsWith(WALProcedureStore.LOG_PREFIX)) {
Contributor Author

Logic was already in place for marking fresh HMaster WALs as deletable — I believe we just forgot to consider archived ones too
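
As a rough sketch of what "considering archived ones too" can look like, keyed off the file name rather than the directory (the "$masterlocalwal$" suffix and the helper name below are inferred from the new test paths in this PR for illustration, not quoted from an HBase constant):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore;

// Hypothetical helper: treat both fresh procedure-store WALs and archived
// HMaster WALs as always deletable, since neither contains table data that an
// incremental backup would need. The "$masterlocalwal$" suffix is an
// assumption based on the test paths added in this PR.
final class HMasterWalCheckSketch {
  static boolean isHMasterWal(Path path) {
    String name = path.getName();
    return name.startsWith(WALProcedureStore.LOG_PREFIX)
      || name.endsWith("$masterlocalwal$");
  }
}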

Comment on lines -132 to 130
try {
Address walServerAddress =
Address.fromString(BackupUtils.parseHostNameFromLogFile(file.getPath()));
long walTimestamp = AbstractFSWALProvider.getTimestamp(file.getPath().getName());

if (
!addressToLastBackupMap.containsKey(walServerAddress)
|| addressToLastBackupMap.get(walServerAddress) >= walTimestamp
) {
filteredFiles.add(file);
}
} catch (Exception ex) {
LOG.warn(
"Error occurred while filtering file: {} with error: {}. Ignoring cleanup of this log",
file.getPath(), ex.getMessage());
}
Contributor Author

I moved this logic to a separate method to improve testability
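
Something like the shape below, where getDeletableFiles reduces to a filter over a static, side-effect-free predicate (canDeleteFile in this PR) that a unit test can call directly. This is only a sketch of the pattern, not the patch itself:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.net.Address;

// Sketch: all hostname/timestamp parsing and error handling lives in the
// predicate, so the collection handling stays trivial and the predicate can be
// tested without a cleaner chore or a mini-cluster.
final class DeletableFilterSketch {
  static List<FileStatus> filterDeletable(Iterable<FileStatus> files,
    Map<Address, Long> addressToLastBackupMap,
    BiPredicate<Map<Address, Long>, Path> canDeleteFile) {
    List<FileStatus> filteredFiles = new ArrayList<>();
    for (FileStatus file : files) {
      if (canDeleteFile.test(addressToLastBackupMap, file.getPath())) {
        filteredFiles.add(file);
      }
    }
    return filteredFiles;
  }
}

In the actual change the predicate is BackupLogCleaner.canDeleteFile, which is what the new test below exercises.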

Comment on lines +168 to +188
try {
String hostname = BackupUtils.parseHostNameFromLogFile(path);
if (hostname == null) {
LOG.warn(
"Cannot parse hostname from RegionServer WAL file: {}. Ignoring cleanup of this log",
path);
return false;
}
Address walServerAddress = Address.fromString(hostname);
long walTimestamp = AbstractFSWALProvider.getTimestamp(path.getName());

if (
!addressToLastBackupMap.containsKey(walServerAddress)
|| addressToLastBackupMap.get(walServerAddress) >= walTimestamp
) {
return true;
}
} catch (Exception ex) {
LOG.warn("Error occurred while filtering file: {}. Ignoring cleanup of this log", path, ex);
return false;
}
Contributor Author

This logic is largely the same as what was removed above, with two key differences:

  1. Of course, it will now return archived HMaster WALs as deletable too
  2. Better handling of the possible null from BackupUtils#parseHostNameFromLogFile. Previously this surfaced only as an NPE when constructing the Address, which made the error difficult to reason about from the user's perspective until we patched our fork.
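
For clarity, the rule the helper applies to RegionServer WALs boils down to a timestamp comparison against the last successful backup recorded for that server. A stripped-down restatement of just that rule (simplified types, not the real method):

import java.util.Map;

// Simplified restatement of the deletability rule: a RegionServer WAL is safe
// to delete if its server has no recorded backup at all, or if the most recent
// successful backup for that server is at least as new as the WAL's timestamp.
final class WalDeletabilityRuleSketch {
  static boolean deletable(Map<String, Long> lastBackupByServer, String server,
    long walTimestamp) {
    Long lastBackup = lastBackupByServer.get(server);
    return lastBackup == null || lastBackup >= walTimestamp;
  }
}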

Comment on lines +146 to +152
Path normalPath =
new Path("/hbase/oldWALs/hmaster%2C60000%2C1716224062663.1716247552189$masterlocalwal$");
assertTrue(BackupLogCleaner.canDeleteFile(Collections.emptyMap(), normalPath));

Path masterPath = new Path(
"/hbase/MasterData/oldWALs/hmaster%2C60000%2C1716224062663.1716247552189$masterlocalwal$");
assertTrue(BackupLogCleaner.canDeleteFile(Collections.emptyMap(), masterPath));
Contributor Author

I believe it is necessary to check both paths for backwards compatibility. I'm not totally sure when the directory changed, but the book suggests it was likely around 2.3. There's no additional business logic necessary to flag both paths, but I thought it was worth explicitly testing both.
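
If it helps, the same assertion can be written as a loop over both archived-WAL directories to make the backwards-compatibility intent explicit. This mirrors the test added in this PR; the BackupLogCleaner import path and the accessibility of canDeleteFile from the test's package are assumed:

import static org.junit.Assert.assertTrue;

import java.util.Collections;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.backup.master.BackupLogCleaner;
import org.junit.Test;

// Sketch only: same file name under both archived-WAL directories; both should
// be reported as deletable even with no backups recorded.
public class TestMasterWalDirectoriesSketch {
  @Test
  public void masterWalsAreDeletableRegardlessOfDirectory() {
    String walName = "hmaster%2C60000%2C1716224062663.1716247552189$masterlocalwal$";
    for (String dir : new String[] { "/hbase/oldWALs/", "/hbase/MasterData/oldWALs/" }) {
      assertTrue(BackupLogCleaner.canDeleteFile(Collections.emptyMap(), new Path(dir + walName)));
    }
  }
}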

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 15s master passed
+1 💚 compile 0m 24s master passed
+1 💚 checkstyle 0m 9s master passed
+1 💚 spotless 0m 44s branch has no errors when running spotless:check.
+1 💚 spotbugs 0m 29s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 52s the patch passed
+1 💚 compile 0m 22s the patch passed
+1 💚 javac 0m 22s the patch passed
+1 💚 checkstyle 0m 8s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 5m 29s Patch does not cause any errors with Hadoop 3.3.6.
+1 💚 spotless 0m 41s patch has no errors when running spotless:check.
+1 💚 spotbugs 0m 33s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
22m 35s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6006
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 539aed121e6d 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 62e7fe8
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 79 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 26s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 57s master passed
+1 💚 compile 0m 15s master passed
+1 💚 shadedjars 5m 17s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 13s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 41s the patch passed
+1 💚 compile 0m 17s the patch passed
+1 💚 javac 0m 17s the patch passed
+1 💚 shadedjars 5m 22s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 11s the patch passed
_ Other Tests _
+1 💚 unit 10m 44s hbase-backup in the patch passed.
29m 24s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6006
Optional Tests javac javadoc unit shadedjars compile
uname Linux b4731e9521e7 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 62e7fe8
Default Java Eclipse Adoptium-17.0.10+7
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/testReport/
Max. process+thread count 3850 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 38s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
-1 ❌ mvninstall 3m 32s root in master failed.
+1 💚 compile 0m 22s master passed
+1 💚 shadedjars 7m 4s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 19s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 13s the patch passed
+1 💚 compile 0m 19s the patch passed
+1 💚 javac 0m 19s the patch passed
+1 💚 shadedjars 6m 22s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 14s the patch passed
_ Other Tests _
+1 💚 unit 13m 2s hbase-backup in the patch passed.
36m 4s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #6006
Optional Tests javac javadoc unit shadedjars compile
uname Linux b8b1eaccfa44 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 62e7fe8
Default Java Temurin-1.8.0_352-b08
mvninstall https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/artifact/yetus-jdk8-hadoop3-check/output/branch-mvninstall-root.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/testReport/
Max. process+thread count 3727 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 49s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 47s master passed
+1 💚 compile 0m 22s master passed
+1 💚 shadedjars 6m 31s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 16s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 34s the patch passed
+1 💚 compile 0m 23s the patch passed
+1 💚 javac 0m 23s the patch passed
+1 💚 shadedjars 6m 32s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 14s the patch passed
_ Other Tests _
+1 💚 unit 14m 25s hbase-backup in the patch passed.
37m 55s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #6006
Optional Tests javac javadoc unit shadedjars compile
uname Linux b8a48366f93c 5.4.0-177-generic #197-Ubuntu SMP Thu Mar 28 22:45:47 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 62e7fe8
Default Java Eclipse Adoptium-11.0.17+8
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/testReport/
Max. process+thread count 3308 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@rmdmattingly rmdmattingly changed the title from "BackupLogCleaner should clean up archived HMaster logs" to "HBASE-28680 BackupLogCleaner should clean up archived HMaster logs" on Jun 20, 2024
@rmdmattingly
Contributor Author

Pretty sure the build failure is just noise. I'm going to force push to rebuild.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 28s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 19s master passed
+1 💚 compile 0m 25s master passed
+1 💚 checkstyle 0m 11s master passed
+1 💚 spotless 0m 45s branch has no errors when running spotless:check.
+1 💚 spotbugs 0m 31s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 57s the patch passed
+1 💚 compile 0m 23s the patch passed
+1 💚 javac 0m 23s the patch passed
+1 💚 checkstyle 0m 8s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 5m 38s Patch does not cause any errors with Hadoop 3.3.6.
+1 💚 spotless 0m 43s patch has no errors when running spotless:check.
+1 💚 spotbugs 0m 36s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
23m 12s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6006
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 61c6ea98ead5 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 62e7fe8
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 79 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 14s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 55s master passed
+1 💚 compile 0m 19s master passed
+1 💚 shadedjars 5m 15s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 16s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 46s the patch passed
+1 💚 compile 0m 18s the patch passed
+1 💚 javac 0m 18s the patch passed
+1 💚 shadedjars 5m 17s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 14s the patch passed
_ Other Tests _
+1 💚 unit 11m 17s hbase-backup in the patch passed.
29m 57s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #6006
Optional Tests javac javadoc unit shadedjars compile
uname Linux 14d3b81a8e0b 5.4.0-182-generic #202-Ubuntu SMP Fri Apr 26 12:29:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 62e7fe8
Default Java Eclipse Adoptium-11.0.17+8
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/testReport/
Max. process+thread count 3827 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 43s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 41s master passed
+1 💚 compile 0m 17s master passed
+1 💚 shadedjars 5m 9s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 15s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 27s the patch passed
+1 💚 compile 0m 17s the patch passed
+1 💚 javac 0m 17s the patch passed
+1 💚 shadedjars 5m 11s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 13s the patch passed
_ Other Tests _
+1 💚 unit 12m 33s hbase-backup in the patch passed.
31m 2s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #6006
Optional Tests javac javadoc unit shadedjars compile
uname Linux eb97a599ab6f 5.4.0-174-generic #193-Ubuntu SMP Thu Mar 7 14:29:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 62e7fe8
Default Java Temurin-1.8.0_352-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/testReport/
Max. process+thread count 3737 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 46s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 40s master passed
+1 💚 compile 0m 25s master passed
+1 💚 shadedjars 7m 26s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 19s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 10s the patch passed
+1 💚 compile 0m 30s the patch passed
+1 💚 javac 0m 30s the patch passed
+1 💚 shadedjars 7m 39s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 17s the patch passed
_ Other Tests _
+1 💚 unit 12m 59s hbase-backup in the patch passed.
40m 25s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6006
Optional Tests javac javadoc unit shadedjars compile
uname Linux b20465729dce 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 62e7fe8
Default Java Eclipse Adoptium-17.0.10+7
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/testReport/
Max. process+thread count 3429 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6006/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@ndimiduk ndimiduk merged commit f8aa3e2 into apache:master Jun 21, 2024
1 check passed
@ndimiduk ndimiduk deleted the HBASE-28680 branch June 21, 2024 13:18
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Jun 21, 2024
…ely (apache#6006)

We have been trying to set up daily incremental backups for hundreds of clusters at my day
job. Recently we discovered that old WALs were piling up across many clusters, in line with when we
began running incremental backups.

This led to the realization that the BackupLogCleaner will always skip archived HMaster WALs. This
is a problem because, if a cleaner is skipping a given file, then the CleanerChore will never
delete it.

This seems like a misunderstanding of what it means to "skip" a WAL in a BaseLogCleanerDelegate,
and, instead, we should always return these HMaster WALs as deletable from the perspective of the
BackupLogCleaner. We could subject them to the same scrutiny as RegionServer WALs: are they older
than the most recent successful backup? But, if I understand correctly, HMaster WALs do not
contain any data relevant to table backups, so that would be unnecessary.

Co-authored-by: Ray Mattingly <rmattingly@hubspot.com>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Jun 21, 2024
…ely (apache#6006)

ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Jun 21, 2024
…ely (apache#6006)

ndimiduk pushed a commit that referenced this pull request Jun 21, 2024
…ely (#6006)

ndimiduk pushed a commit that referenced this pull request Jun 21, 2024
…ely (#6006)

ndimiduk pushed a commit that referenced this pull request Jun 21, 2024
…ely (#6006)

rmdmattingly added a commit to HubSpot/hbase that referenced this pull request Jun 24, 2024
…ely (apache#6006)

rmdmattingly added a commit to HubSpot/hbase that referenced this pull request Jun 24, 2024
…ely (apache#6006)

rmdmattingly added a commit to HubSpot/hbase that referenced this pull request Jul 8, 2024
… pile up indefinitely (apache#6006) (#101)

rmdmattingly added a commit to HubSpot/hbase that referenced this pull request Jul 8, 2024
… pile up indefinitely (apache#6006) (#100)
