-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-28522 UNASSIGN proc indefinitely stuck on dead rs #5995
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
64b5f71
to
5cbe25d
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks nice, so far i don't see any issues. @gvprathyusha6 do you want to take a look?
return true; | ||
// We will test holdLock when releaseLock, if we are still at the first state, it means we fail | ||
// the SCP check and want to retry later, so we should release the lock | ||
return getCurrentState() != DisableTableState.DISABLE_TABLE_PREPARE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would hold the lock at DISABLE_TABLE_PRE_OPERATION still right? what if the SCP gets created (and creates ASSIGN procs also) at that time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not get your point but I found that I made a mistake, I used to thought we will set DISABLING state in PREPARE but actually we have a separated step for setting it...
Setting DISABLING state is the fencing way here,so draining in the PREPARE state does not help.
But if we release locks in DISABLING state, it may break some assumption here as DisableTableProcedure can be executed at the same time with other procedures like ModifyTableProcedure...
Then I think first we need to introduce another mechanism to only allow one Enable/Modify/DisableTableProcedure at the same time, then we can be back here to change the implementation for DisableTableProcedure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More or less was pointing the same only, in current approach we are waiting for the SCP to finish but the table state is still not DISABLING yet,
so I was saying in DISABLE_TABLE_PRE_OPERATION (which is next state) since we are not waiting here, SCP could be again created at this time and we can again see this issue.
Then I think first we need to introduce another mechanism to only allow one Enable/Modify/DisableTableProcedure at the same time, then we can be back here to change the i
for DisableTableProcedure.
But then again, instead of releasing locks or handling locks for these Tablelevel procedures separately, if we share the lock with SCP, that also works well know?
Like if we see, everytime only SCP can interrupt any ongoing procedure,
Which ever procedure is waiting for an existing procedure to complete, it is kind of dependent on it and makes sense to share the lock of resources it holds, (since its waiting on that procedure anyway).
And we push the current retry strategy at procedure framework level @Apache9
Filed HBASE-28633. I think we need to get that done first... |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
The new PR is ready. PTAL. Thanks. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
LOG.info( | ||
"There are still {} region(s) in transition for closing regions of table {}" | ||
+ " when executing {}, suspend {}secs and try again later", | ||
inTransitionCount, tableName, getClass().getSimpleName(), backoffMillis / 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this will be helpful
Signed-off-by: Viraj Jasani <vjasani@apache.org> Reviewed-by: Ray Mattingly <rmdmattingly@gmail.com> (cherry picked from commit 634b200)
No description provided.