HBASE-28810 Improve BackupLogCleaner naming, debug logging #6195
Conversation
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java
@@ -80,8 +81,13 @@ public void init(Map<String, Object> params) {
     }
   }

-  private Map<Address, Long> getServersToOldestBackupMapping(List<BackupInfo> backups)
-    throws IOException {
+  private Map<Address, Long> getServerToLastBackupTs(List<BackupInfo> backups) throws IOException {
imho this method name is still confusing. What's the relationship between "last" and "oldest"? Can backups not be chronologically ordered? How about getServerToOldestBackupTS, and the log message can talk about cleaning WALs that are older than the oldest backups.
"Last" could indeed be interpreted as "most recent". I also prefer "Oldest" (or "Earliest")
Good suggestions, ty both
Oh, to clarify, we don't want to clean WALs that are older than the oldest backup. We want to clean WALs that are older than the newest backup. It's confusing because this was literally called the opposite, getServersToOldestBackupMapping, but I believe "newest backup" makes sense, and the implementation clearly fetches the newest, despite what the name suggests: https://github.com/apache/hbase/blob/master/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java#L83-L102
private Map<Address, Long> getServersToOldestBackupMapping(List<BackupInfo> backups)
  throws IOException {
  Map<Address, Long> serverAddressToLastBackupMap = new HashMap<>();
  Map<TableName, Long> tableNameBackupInfoMap = new HashMap<>();
  for (BackupInfo backupInfo : backups) {
    for (TableName table : backupInfo.getTables()) {
      tableNameBackupInfoMap.putIfAbsent(table, backupInfo.getStartTs());
      if (tableNameBackupInfoMap.get(table) <= backupInfo.getStartTs()) {
        // ie, if backup is *newer* than what we've already mapped, then overwrite
        tableNameBackupInfoMap.put(table, backupInfo.getStartTs());
        for (Map.Entry<String, Long> entry : backupInfo.getTableSetTimestampMap().get(table)
          .entrySet()) {
          serverAddressToLastBackupMap.put(Address.fromString(entry.getKey()), entry.getValue());
        }
      }
    }
  }
  return serverAddressToLastBackupMap;
}
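To make the confusion concrete, here is a minimal, self-contained sketch of the putIfAbsent-plus-`<=` pattern above, using hypothetical stand-in types (a simple record instead of BackupInfo, table names as plain strings) rather than the actual HBase classes. It shows that the comparison keeps the *newest* (largest) start timestamp per table, despite the "Oldest" in the original method name:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NewestBackupDemo {
  // Hypothetical stand-in for BackupInfo: a table name plus a backup start timestamp.
  record Backup(String table, long startTs) {}

  // Mirrors the comparison in the snippet above: putIfAbsent followed by a <=
  // check overwrites whenever the incoming backup is newer, so the map ends up
  // holding the newest start timestamp per table.
  static Map<String, Long> newestBackupTsPerTable(List<Backup> backups) {
    Map<String, Long> newest = new HashMap<>();
    for (Backup b : backups) {
      newest.putIfAbsent(b.table(), b.startTs());
      if (newest.get(b.table()) <= b.startTs()) {
        newest.put(b.table(), b.startTs());
      }
    }
    return newest;
  }

  public static void main(String[] args) {
    Map<String, Long> result = newestBackupTsPerTable(List.of(
      new Backup("t1", 100L), new Backup("t1", 300L), new Backup("t1", 200L)));
    System.out.println(result.get("t1")); // prints 300, the newest timestamp
  }
}
```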
So, by "last", I meant "most recent". And it was still confusing! I'm going to refactor to "newest"?
I'm pro naming this "newest".
Realizing now that this tracks the newest backup, this logic also contains a bug: it does not keep track of multi-root backups. I.e., if there are two backup roots R1 and R2, where R1 was backed up a week ago and R2 today, the WALs needed for R1 could be deleted (= data loss).
Perhaps log that as a separate issue?
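The multi-root scenario can be sketched as follows. This is a hypothetical illustration with made-up timestamps and helper names, not the actual cleaner code: a single global newest-backup cutoff marks a WAL that R1 still needs as deletable, while a per-root minimum cutoff (one possible mitigation, not necessarily the eventual fix) would retain it:

```java
import java.util.Map;

public class MultiRootDemo {
  // Hypothetical timestamps: R1 was last backed up a week ago, R2 today,
  // and a WAL was written three days ago (so R1 still needs it).
  static final long WEEK_AGO = 1_000L, THREE_DAYS_AGO = 5_000L, TODAY = 9_000L;

  // A single cutoff taken across all roots, as the logic discussed above
  // effectively does: anything older than the newest backup is deletable.
  static boolean deletableGlobal(long walTs, Map<String, Long> newestPerRoot) {
    long cutoff = newestPerRoot.values().stream().max(Long::compare).orElse(0L);
    return walTs < cutoff;
  }

  // A per-root cutoff would take the *minimum* of the newest-backup timestamps
  // across roots, so a WAL still needed by any root is retained.
  static boolean deletablePerRoot(long walTs, Map<String, Long> newestPerRoot) {
    long cutoff = newestPerRoot.values().stream().min(Long::compare).orElse(0L);
    return walTs < cutoff;
  }

  public static void main(String[] args) {
    Map<String, Long> newest = Map.of("R1", WEEK_AGO, "R2", TODAY);
    // The WAL from three days ago is newer than R1's last backup, so R1 still
    // needs it, yet the global cutoff (R2's backup) marks it deletable: data loss.
    System.out.println(deletableGlobal(THREE_DAYS_AGO, newest));  // true
    System.out.println(deletablePerRoot(THREE_DAYS_AGO, newest)); // false
  }
}
```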
It's also a good idea to use "newest" or "most recent" in the logs produced in this class, rather than "latest" or "last".
> Realizing now that this tracks the newest backup, this logic also contains a bug: it does not keep track of multi-root backups. I.e., if there are two backup roots R1 and R2, where R1 was backed up a week ago and R2 today, the WALs needed for R1 could be deleted (= data loss).
> Perhaps log that as a separate issue?

Agreed, this sounds like a separate, but very real, issue. This multi-root case seems to be a gift that keeps giving...
Hadn't seen an issue for this passing by yet, so logged it as https://issues.apache.org/jira/browse/HBASE-28833
de53276 to a756f53
🎊 +1 overall
This message was automatically generated.
I love it.
@rmdmattingly friendly reminder to be consistent with Jira issue titles and git commit summaries. These need to match, as per the comment in https://hbase.apache.org/book.html#committing.patches
Co-authored-by: Ray Mattingly <rmattingly@hubspot.com> Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
…) (#113) Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Co-authored-by: Ray Mattingly <rmattingly@hubspot.com>
https://issues.apache.org/jira/browse/HBASE-28810
While implementing HBase's incremental backups across a few hundred clusters, we continue to step on some rakes. Now and again, we find old WALs piling up due to a poorly cleaned up BackupInfo, or a bug in the BackupLogCleaner, etc.
The BackupLogCleaner is difficult to debug for a couple of reasons:
This small refactor improves the logging and naming.
cc @ndimiduk