-
Notifications
You must be signed in to change notification settings - Fork 514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-9627. Reset RaftPeer priorities after transfer leadership #5549
Conversation
@Xushaohong @szetszwo Could you help take a look if you have time? Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivandika3 The change looks good. Just a few comment.
} finally { | ||
// Raft peers priorities need to be reset regardless of the result | ||
// of transfer leadership | ||
resetPriorities(conf, raftGroup, tlsConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a good practice to execute resetPriorities
in finally
? If the leadership transfer fails but resetPriorities
succeeds, could this lead to unexpected issues? maybe we can execute resetPriorities
only when the leadership transfer is successful."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reset priorities only if setConf success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
" Ratis reply: {}", resetPeers, reply); | ||
} | ||
} catch (IOException e) { | ||
LOG.error("Exception thrown when trying to reset priorities for " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to prompt User that resetPriorities
failed by command line output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review. I was thinking about this as well, but I think it's enough to reset priorities to be best effort and just log the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivandika3 , thanks a lot for working on this! Please see the comments inlined and also https://issues.apache.org/jira/secure/attachment/13064314/5549_review.patch
try (RaftClient raftClient = newRaftClient(SupportedRpcType.GRPC, null, | ||
null, raftGroup, createRetryPolicy(conf), tlsConfig, conf)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reuse the existing client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated based on the patch diff.
} finally { | ||
// Raft peers priorities need to be reset regardless of the result | ||
// of transfer leadership | ||
resetPriorities(conf, raftGroup, tlsConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reset priorities only if setConf success.
There are test failure on the Edit: Update the priority numbers (higher -> 1, neutral -> 0) Edit 2: I updated the reset logic to explicitly set to neutral priority instead of reverting to original to handle OM/SCM HA deployments that did transfer leadership without the priority reset (this) patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 the change looks good.
LGTM +1 |
Thanks @ivandika3 for the patch, @szetszwo, @xichen01 for the review. |
Thank you @szetszwo @xichen01 for the reviews and @adoroszlai for the merge. |
…rship (apache#5549) (cherry picked from commit 2f71e29) Change-Id: Ifd0748375b2b25e808764d545f80fc166c1bb6c9
What changes were proposed in this pull request?
Currently, Ratis transfer leadership uses priority as a way for the previous leader to yield its leadership to the target peer with higher priority.
The implementation in Ozone uses setConfiguration API to set the target peer's priority to 2 (higher priority) while the rest is set to 1 (neutral priority).
However, we need to reset the priorities back to the same priorities (1 in this case) after transfer leadership (regardless of the transfer leadership result) to avoid cases where the same Raft peer will always be favored to be the leader (even if the peer is slow).
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9627
How was this patch tested?
Integration testing. Add Raft peers priority assertions to
TestTransferLeadershipShell
after transfer leadership is finished.Clean CI run: https://github.com/ivandika3/ozone/actions/runs/6760662018