HBASE-28810 Improve BackupLogCleaner naming, debug logging #6195

Merged
merged 3 commits into from
Sep 9, 2024

rmdmattingly
Contributor

https://issues.apache.org/jira/browse/HBASE-28810

While implementing HBase's incremental backups across a few hundred clusters, we continue to step on some rakes. Now and again, we find old WALs piling up due to a poorly cleaned-up BackupInfo, a bug in the BackupLogCleaner, etc.

The BackupLogCleaner is difficult to debug for a couple of reasons:

This small refactor improves the logging and naming

cc @ndimiduk

@@ -80,8 +81,13 @@ public void init(Map<String, Object> params) {
     }
   }

-  private Map<Address, Long> getServersToOldestBackupMapping(List<BackupInfo> backups)
-    throws IOException {
+  private Map<Address, Long> getServerToLastBackupTs(List<BackupInfo> backups) throws IOException {
Member

imho this method name is still confusing. What's the relationship between "last" and "oldest"? Can backups not be chronologically ordered? How about getServerToOldestBackupTS, and the log message can talk about cleaning WALs that are older than the oldest backups.

Contributor

"Last" could indeed be interpreted as "most recent". I also prefer "Oldest" (or "Earliest")

Contributor Author

Good suggestions, ty both

Contributor Author

@rmdmattingly rmdmattingly Sep 5, 2024

Oh, to clarify, we don't want to clean WALs that are older than the oldest backup. We want to clean WALs that are older than the newest backup. It's confusing because this was literally called the opposite, getServersToOldestBackupMapping, but I believe newest backup makes sense and the implementation clearly fetches the newest, despite what the name suggests: https://github.com/apache/hbase/blob/master/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java#L83-L102

  private Map<Address, Long> getServersToOldestBackupMapping(List<BackupInfo> backups)
    throws IOException {
    Map<Address, Long> serverAddressToLastBackupMap = new HashMap<>();

    Map<TableName, Long> tableNameBackupInfoMap = new HashMap<>();
    for (BackupInfo backupInfo : backups) {
      for (TableName table : backupInfo.getTables()) {
        tableNameBackupInfoMap.putIfAbsent(table, backupInfo.getStartTs());
        if (tableNameBackupInfoMap.get(table) <= backupInfo.getStartTs()) {
          // ie, if backup is *newer* than what we've already mapped, then overwrite
          tableNameBackupInfoMap.put(table, backupInfo.getStartTs());
          for (Map.Entry<String, Long> entry : backupInfo.getTableSetTimestampMap().get(table)
            .entrySet()) {
            serverAddressToLastBackupMap.put(Address.fromString(entry.getKey()), entry.getValue());
          }
        }
      }
    }

    return serverAddressToLastBackupMap;
  }

So, by last, I meant most recent. And it was still confusing! I'm going to refactor to newest?
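
To make the "newest wins" behavior concrete, here is a minimal, self-contained sketch that mirrors the loop above using plain strings and longs. The `Backup` class, the method names, and the `canDelete` rule are hypothetical stand-ins for illustration, not HBase APIs; in particular, treating a WAL as deletable once its timestamp is at or before the newest backup's timestamp for its server is an assumption about how such a map would be consumed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NewestBackupSketch {

  /** Hypothetical stand-in for BackupInfo: one table, a start time, per-server timestamps. */
  static final class Backup {
    final String table;
    final long startTs;
    final Map<String, Long> serverToTs;

    Backup(String table, long startTs, Map<String, Long> serverToTs) {
      this.table = table;
      this.startTs = startTs;
      this.serverToTs = serverToTs;
    }
  }

  /** For each server, keep the timestamp recorded by the newest backup seen for a table. */
  static Map<String, Long> newestBackupTsPerServer(List<Backup> backups) {
    Map<String, Long> newestStartTsPerTable = new HashMap<>();
    Map<String, Long> serverToNewestBackupTs = new HashMap<>();
    for (Backup backup : backups) {
      Long seen = newestStartTsPerTable.get(backup.table);
      // Overwrite only when this backup is at least as new as the one already mapped.
      if (seen == null || seen <= backup.startTs) {
        newestStartTsPerTable.put(backup.table, backup.startTs);
        serverToNewestBackupTs.putAll(backup.serverToTs);
      }
    }
    return serverToNewestBackupTs;
  }

  /** Assumed rule: unknown servers are kept; known servers may drop WALs at or before the boundary. */
  static boolean canDelete(Map<String, Long> serverToNewestBackupTs, String server, long walTs) {
    Long boundary = serverToNewestBackupTs.get(server);
    return boundary != null && walTs <= boundary;
  }

  public static void main(String[] args) {
    List<Backup> backups = List.of(
      new Backup("t1", 100L, Map.of("rs1:16020", 90L)),
      new Backup("t1", 200L, Map.of("rs1:16020", 190L)));
    Map<String, Long> boundaries = newestBackupTsPerServer(backups);
    System.out.println(canDelete(boundaries, "rs1:16020", 150L)); // true
    System.out.println(canDelete(boundaries, "rs1:16020", 195L)); // false
  }
}
```

With these inputs the boundary for `rs1:16020` comes from the backup started at 200, which is why "newest" describes the map more accurately than "oldest" or "last".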

Contributor

I'm pro naming this newest.

Realizing now that this tracks the newest backup, it means this logic also contains a bug: it does not keep track of multi-root backups. I.e., if there are 2 backup roots R1 and R2, where R1 was backed up 1 week ago and R2 today, the WALs needed for R1 could be deleted (= data loss).
Perhaps log that as a separate issue?
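
The multi-root scenario can be sketched as follows. This is not the fix adopted in HBase (that work is tracked separately and may look quite different); it is one possible approach under the assumption that every backup carries a root identifier. The `Backup` class, its `root` field, and `boundaryAcrossRoots` are hypothetical names. The idea: find the newest backup per root, then take the minimum timestamp per server across roots, so the WALs still needed by the stalest root are preserved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiRootSketch {

  /** Hypothetical stand-in for BackupInfo, reduced to a root, a start time, per-server timestamps. */
  static final class Backup {
    final String root;
    final long startTs;
    final Map<String, Long> serverToTs;

    Backup(String root, long startTs, Map<String, Long> serverToTs) {
      this.root = root;
      this.startTs = startTs;
      this.serverToTs = serverToTs;
    }
  }

  /** Newest backup per root, then the smallest timestamp per server across those roots. */
  static Map<String, Long> boundaryAcrossRoots(List<Backup> backups) {
    Map<String, Backup> newestPerRoot = new HashMap<>();
    for (Backup b : backups) {
      // Keep whichever backup of this root started later.
      newestPerRoot.merge(b.root, b, (old, cur) -> cur.startTs >= old.startTs ? cur : old);
    }
    Map<String, Long> serverToBoundary = new HashMap<>();
    for (Backup b : newestPerRoot.values()) {
      for (Map.Entry<String, Long> e : b.serverToTs.entrySet()) {
        // The stalest root decides how far back WALs must be retained.
        serverToBoundary.merge(e.getKey(), e.getValue(), Long::min);
      }
    }
    return serverToBoundary;
  }

  public static void main(String[] args) {
    List<Backup> backups = List.of(
      new Backup("R1", 100L, Map.of("rs1:16020", 100L)),  // backed up a week ago
      new Backup("R2", 700L, Map.of("rs1:16020", 700L))); // backed up today
    System.out.println(boundaryAcrossRoots(backups).get("rs1:16020")); // 100
  }
}
```

A root-unaware "newest backup" map would report 700 for `rs1:16020` here, which would allow WALs written between 100 and 700 to be removed even though R1's next incremental backup still needs them.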

Contributor

It's also a good idea to use "newest" or "most recent" in the logs produced in this class, rather than "latest" or "last".

Contributor Author

> Realizing now that this tracks the newest backup, it means this logic also contains a bug: it does not keep track of multi-root backups. I.e., if there are 2 backup roots R1 and R2, where R1 was backed up 1 week ago and R2 today, the WALs needed for R1 could be deleted (= data loss).
> Perhaps log that as a separate issue?

Agreed this sounds like a separate, but very real, issue. This multi-root case seems to be a gift that keeps giving...

Contributor

Hadn't seen an issue for this passing by yet, so logged it as https://issues.apache.org/jira/browse/HBASE-28833

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 28s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 14s | | master passed |
| +1 💚 | compile | 0m 29s | | master passed |
| +1 💚 | checkstyle | 0m 9s | | master passed |
| +1 💚 | spotbugs | 0m 28s | | master passed |
| +1 💚 | spotless | 0m 45s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 2s | | the patch passed |
| +1 💚 | compile | 0m 28s | | the patch passed |
| +1 💚 | javac | 0m 28s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 9s | | the patch passed |
| +1 💚 | spotbugs | 0m 34s | | the patch passed |
| +1 💚 | hadoopcheck | 11m 33s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 44s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 9s | | The patch does not generate ASF License warnings. |
| | | 29m 22s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6195/3/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6195 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 2e1f2f593781 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / a756f53 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 84 (vs. ulimit of 30000) |
| modules | C: hbase-backup U: hbase-backup |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6195/3/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 42s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 40s | | master passed |
| +1 💚 | compile | 0m 24s | | master passed |
| +1 💚 | javadoc | 0m 17s | | master passed |
| +1 💚 | shadedjars | 5m 45s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 2s | | the patch passed |
| +1 💚 | compile | 0m 20s | | the patch passed |
| +1 💚 | javac | 0m 20s | | the patch passed |
| +1 💚 | javadoc | 0m 14s | | the patch passed |
| +1 💚 | shadedjars | 5m 28s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 11m 1s | | hbase-backup in the patch passed. |
| | | 32m 0s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6195/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6195 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 7554148176b5 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / a756f53 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6195/3/testReport/ |
| Max. process+thread count | 3438 (vs. ulimit of 30000) |
| modules | C: hbase-backup U: hbase-backup |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6195/3/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

Member

@ndimiduk ndimiduk left a comment

I love it.

@rmdmattingly friendly reminder to be consistent with Jira issue titles and git commit summaries. These need to match, as per the comment in https://hbase.apache.org/book.html#committing.patches

@ndimiduk ndimiduk merged commit 53ca883 into apache:master Sep 9, 2024
1 check passed
@ndimiduk ndimiduk deleted the HBASE-28810 branch September 9, 2024 09:51
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Sep 9, 2024
Co-authored-by: Ray Mattingly <rmattingly@hubspot.com>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
ndimiduk pushed a commit that referenced this pull request Sep 9, 2024
Co-authored-by: Ray Mattingly <rmattingly@hubspot.com>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
rmdmattingly added a commit to HubSpot/hbase that referenced this pull request Sep 17, 2024
Co-authored-by: Ray Mattingly <rmattingly@hubspot.com>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
rmdmattingly added a commit to HubSpot/hbase that referenced this pull request Oct 8, 2024
…) (#113)

Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Co-authored-by: Ray Mattingly <rmattingly@hubspot.com>
5 participants