-
Notifications
You must be signed in to change notification settings - Fork 3.4k
[HBASE-29520] Utilize Backed-up Bulkloaded Files in Incremental Backup #7246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
taklwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only have a minor comment for the warning message , otherwise lgtm
| LOG.debug("Backup bulkload file found {}", fullBulkLoadBackupPath); | ||
| p = fullBulkLoadBackupPath; | ||
| } else { | ||
| LOG.warn("Backup bulkload file not found {}", fullBulkLoadBackupPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed we're hitting the following
1/ it does not have any bulkload previously,
2/ something is wrong?
so, should we ask user to check if this is expected in the warning message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In non-continuous incremental backup approach, bulkload files are copied directly from the source cluster to the backup location.
In continuous backup approach, these files are instead copied from the bulkload backup location. I’ve added a warning here because if the required bulkload backup files are missing, they would be copied from the source cluster (as a fallback mechanism).
Source cluster files are only deleted after a successful full or incremental backup (in both non-continuous and continuous)
Ideally, with continuous backups, once a bulkload file has been backed up by the replication endpoint, it could be safely deleted from the source cluster. However, doing so would significantly complicate the checkpoint logic, since it would then depend on both WAL flushes and bulkload backups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In continuous backup approach, these files are instead copied from the bulkload backup location. I’ve added a warning here because if the required bulkload backup files are missing, they would be copied from the source cluster (as a fallback mechanism).
[nit] right, my concerns are the action after the WARN message, and current message does not clearly tell what the user should do when he see this, do you think we should add some suggestion in this message ?
we can improve it as a follow up JIRA or in the documentation, if user see this message , should they care about it? if so, what manual operation should they perform to get rid of it ?
|
|
||
| // For continuous backup: bulkload files are copied from backup directory defined by | ||
| // CONF_CONTINUOUS_BACKUP_WAL_DIR instead of source cluster. | ||
| String backupRootDir = conf.get(CONF_CONTINUOUS_BACKUP_WAL_DIR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] somehow, hbase.backup.continuous.wal.dir is become the root now for both WALs and Bulkloads, shouldn't we call it as hbase.backup.continuous.dir/hbase.backup.continuous.root.dir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense. I will create a new JIRA to update this config from everywhere it is referenced
#7246) Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
#7246) Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
Utilize Backed-up Bulkloaded Files in Incremental Backup
Bulkload Files are backed up in ContinuousBackupReplicationEndpoint if bulkload replication is enabled.
When user trigger incremental backup, it takes backup (copy) of bulkload files too along with regular Hfiles of tables to be backed up. Through this change, in incremental backup, backed up bulkload hfiles are copied, instead of these files from source cluster
Jira: https://issues.apache.org/jira/browse/HBASE-29520