HBASE-26974 Introduce a LogRollProcedure #5408
base: master
🎊 +1 overall
This message was automatically generated.
💔 -1 overall
💔 -1 overall
🎊 +1 overall
💔 -1 overall
🎊 +1 overall
The failed UT looks unrelated.
Hi Duo, would you mind taking a look when you have time? This is the last zk-based procedure and also the last sub-task of HBASE-21488, so I'd like to help move it forward. @Apache9
The PR is big. I started reviewing it a few days ago but haven't finished yet...
Thanks for the review! I briefly wrote down the main changes at the beginning of the PR; I hope that helps with the review :)
 * @param backupRoot root directory path to backup
 * @throws IOException exception
 */
public Long getRegionServerLastLogRollResult(String server, String backupRoot)
Why not return long? Seems the return value can never be null?
Ok. I'll address it. Thanks Duo.
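To illustrate the reviewer's point, here is a minimal sketch contrasting a boxed and a primitive return type. The class `LastRollLookupSketch`, its map-backed storage, and the default of `0L` for unknown servers are all assumptions made for illustration; they are not taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a lookup of last log roll timestamps per server.
class LastRollLookupSketch {
  private final Map<String, Long> lastRollByServer = new HashMap<>();

  void record(String server, long timestamp) {
    lastRollByServer.put(server, timestamp);
  }

  // Boxed return type: callers cannot tell from the signature whether null
  // is possible, so they tend to add defensive null checks.
  Long getBoxed(String server) {
    return lastRollByServer.getOrDefault(server, 0L);
  }

  // Primitive return type: the signature itself rules out null.
  // The 0L default for unknown servers is an assumption of this sketch.
  long getPrimitive(String server) {
    return lastRollByServer.getOrDefault(server, 0L);
  }
}
```

When the value can never be null, the primitive form documents that fact and avoids unboxing at call sites.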
  return Flow.HAS_MORE_STATE;
case LOG_ROLL_ROLL_LOG_ON_EACH_RS:
  final List<ServerName> onlineServers =
    env.getMasterServices().getServerManager().getOnlineServersList();
Is it possible that we have race here and miss some region servers?
Yes, we'd better access it under lock protection. I didn't add a lock for two reasons:
a. It's acceptable to miss some newly registered servers. If a server is new, we are unlikely to have assigned regions to it yet, so there is no data loss.
b. In our code base, the other calls to this method are not locked either.
We need to make sure there is no problem. Usually this is not fixed by locking, but by something like fencing. For example, if we have done some preparation before rolling, then even if we miss some new region servers while rolling, it does not cause any problems.
    table.readRegionServerLastLogRollResult(backupRoot);
final long now = EnvironmentEdgeManager.currentTime();
for (ServerName server : onlineServers) {
  long lastLogRollResult =
Is the value the time of the last roll? Why name it lastRollResult?
Ok. I'll address it.
@Override
public TableName getTableName() {
  return BackupSystemTable.getTableName(conf);
So here we just make this procedure a table procedure? Seems a bit strange...
Because we will do some BackupSystemTable-related operations, such as creating the backup namespace and the BackupSystemTable.
Anyway, I think it is okay to declare it as either a table procedure or a server procedure because, as I mentioned at the beginning of this PR, the LogRollProcedure itself does not need to acquire any lock.
IIRC we have talked about this before; maybe we need to discuss how to change the ProcedureScheduler first...
Yes. We talked about this in HBASE-27905, and I added a new commit to address your comments. It's still at the POC stage and needs polishing and more test cases.
@@ -762,4 +773,30 @@ public static String findMostRecentBackupId(String[] backupIds) {
  return BackupRestoreConstants.BACKUPID_PREFIX + recentTimestamp;
}

public static void rollWALWriters(Admin admin, Map<String, String> props) throws IOException {
  byte[] ret =
We do not want to introduce a new admin method for this?
a. It is not general enough for Admin. This call not only makes all rs roll their WAL writers, but also does some backup-related operations, such as reading and writing the BackupSystemTable.
b. This operation is a bit too lightweight to introduce in BackupAdmin, since it's only a small subprocedure of the whole backup job.
So I think maybe a static utility method is enough?
The execProcedure call is for zk-based procedures. Do we still have other procedures besides the log roll one?
No, this is the last one.
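For readers following this thread, the submit-then-poll flow that the PR description attributes to BackupUtils#rollWALWriters can be sketched roughly as below. This is a sketch under stated assumptions, not the PR's code: `ProcedureClient` is a hypothetical interface standing in for the `admin.execProcedureWithReturn` and `admin.isProcedureFinished` calls, whose real signatures take different arguments, and the timeout and poll interval parameters are made up for illustration.

```java
import java.util.concurrent.TimeoutException;

class RollWalPollingSketch {
  // Hypothetical stand-in for the two Admin calls named in the PR description.
  interface ProcedureClient {
    // Returns the procedure id when the master uses proc-v2, or null otherwise.
    Long submitLogRoll();

    boolean isProcedureFinished(long procId);
  }

  // Submits the log roll request, then polls until the procedure finishes
  // or the (assumed) timeout elapses.
  static void rollAndWait(ProcedureClient client, long timeoutMs, long pollIntervalMs)
      throws InterruptedException, TimeoutException {
    Long procId = client.submitLogRoll();
    if (procId == null) {
      // No id came back (proc-v1 path), so this sketch has nothing to poll.
      return;
    }
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!client.isProcedureFinished(procId)) {
      if (System.currentTimeMillis() >= deadline) {
        throw new TimeoutException("log roll procedure " + procId + " did not finish in time");
      }
      Thread.sleep(pollIntervalMs);
    }
  }
}
```

Keeping this in a static utility, as proposed above, means the polling logic stays inside the backup module rather than widening the Admin interface.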
Any updates here? Thanks.
Will push the newest code as soon as possible. Thanks Duo!
A new commit has been added. Let's wait for the UT result.
🎊 +1 overall
🎊 +1 overall
💔 -1 overall
💔 -1 overall
The failed UT looks unrelated.
This PR reimplements the log-roll procedure with proc-v2.
It modifies the following things.
Client side:
- When requesting all rs to roll WAL writers, instead of calling admin.execProcedure(), we now call admin.execProcedureWithReturn, and the returned value depends on the server-side configuration. If the master is configured to use proc-v2, the value is the procedure id; otherwise nothing is returned. We then keep asking the master whether the procedure has finished by calling admin.isProcedureFinished until it finishes, fails, or times out. This is implemented in BackupUtils#rollWALWriters.
Server side:
- Enhanced LogRollMasterProcedureManager to support both proc-v1 and proc-v2.
- Introduced 3 new procedures.
LogRollProcedure
The LogRollProcedure is used to roll the WAL for all rs in the cluster. It does not acquire any lock and has 3 states:
- LOG_ROLL_PRE_CHECK_NAMESPACE: create the backup namespace if it does not exist
- LOG_ROLL_PRE_CHECK_TABLES: create the backup system table and the backup system bulkload table if they do not exist
- LOG_ROLL_ROLL_LOG_ON_EACH_RS: roll all rs WAL writers
RSLogRollProcedure
The RSLogRollProcedure is used to schedule an RSLogRollRemoteProcedure for each regionserver. When the subprocedure returns, the RSLogRollProcedure checks the log-rolling result in the backup system table. If it failed, the RSLogRollProcedure schedules a new RSLogRollRemoteProcedure to retry.
RSLogRollRemoteProcedure
The RSLogRollRemoteProcedure is used to send the log roll request to the remote server.
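The three LogRollProcedure states listed above can be pictured as a simple state machine. The sketch below only models the state ordering; the state names come from this description, while the class `LogRollStateSketch`, the `trace` list, and the use of `null` to signal completion are assumptions for illustration and do not reflect the actual proc-v2 framework API.

```java
import java.util.ArrayList;
import java.util.List;

class LogRollStateSketch {
  // State names as listed in the PR description.
  enum State {
    LOG_ROLL_PRE_CHECK_NAMESPACE,
    LOG_ROLL_PRE_CHECK_TABLES,
    LOG_ROLL_ROLL_LOG_ON_EACH_RS
  }

  // Runs one state and returns the next one, or null when there are no more states.
  static State executeFromState(State state, List<String> trace) {
    switch (state) {
      case LOG_ROLL_PRE_CHECK_NAMESPACE:
        trace.add("ensure backup namespace exists");
        return State.LOG_ROLL_PRE_CHECK_TABLES;
      case LOG_ROLL_PRE_CHECK_TABLES:
        trace.add("ensure backup system tables exist");
        return State.LOG_ROLL_ROLL_LOG_ON_EACH_RS;
      case LOG_ROLL_ROLL_LOG_ON_EACH_RS:
        trace.add("schedule one roll subprocedure per region server");
        return null;
      default:
        throw new IllegalStateException("unhandled state " + state);
    }
  }

  // Drives the state machine from the first state to completion.
  static List<String> run() {
    List<String> trace = new ArrayList<>();
    State state = State.LOG_ROLL_PRE_CHECK_NAMESPACE;
    while (state != null) {
      state = executeFromState(state, trace);
    }
    return trace;
  }
}
```

In the real procedure the last state would hand off to one RSLogRollProcedure per region server, each of which retries its remote subprocedure on failure; that part is omitted here.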
Any suggestions and feedback are appreciated.