Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-9627. Reset RaftPeer priorities after transfer leadership #5549

Merged
merged 4 commits into from
Nov 12, 2023

Conversation

ivandika3
Copy link
Contributor

@ivandika3 ivandika3 commented Nov 5, 2023

What changes were proposed in this pull request?

Currently, Ratis transfer leadership uses priority as a way for the previous leader to yield its leadership to the target peer with higher priority.

The implementation in Ozone uses setConfiguration API to set the target peer's priority to 2 (higher priority) while the rest is set to 1 (neutral priority).

However, we need to reset the priorities back to the same priorities (1 in this case) after transfer leadership (regardless of the transfer leadership result) to avoid cases where the same Raft peer will always be favored to be the leader (even if the peer is slow).

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9627

How was this patch tested?

Integration testing. Add Raft peers priority assertions to TestTransferLeadershipShell after transfer leadership is finished.

Clean CI run: https://github.com/ivandika3/ozone/actions/runs/6760662018

@ivandika3 ivandika3 marked this pull request as draft November 5, 2023 03:33
@ivandika3 ivandika3 marked this pull request as ready for review November 5, 2023 10:17
@ivandika3
Copy link
Contributor Author

ivandika3 commented Nov 5, 2023

@Xushaohong @szetszwo Could you help take a look if you have time? Thank you.

Copy link
Contributor

@xichen01 xichen01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivandika3 The change looks good. Just a few comment.

} finally {
// Raft peers priorities need to be reset regardless of the result
// of transfer leadership
resetPriorities(conf, raftGroup, tlsConfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a good practice to execute resetPriorities in finally? If the leadership transfer fails but resetPriorities succeeds, could this lead to unexpected issues? maybe we can execute resetPriorities only when the leadership transfer is successful."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset priorities only if setConf success.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review @xichen01 @szetszwo. Yes, I agree. I have updated it based on the review patch diff.

" Ratis reply: {}", resetPeers, reply);
}
} catch (IOException e) {
LOG.error("Exception thrown when trying to reset priorities for " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to prompt User that resetPriorities failed by command line output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review. I was thinking about this as well, but I think it's enough to reset priorities to be best effort and just log the error.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivandika3 , thanks a lot for working on this! Please see the comments inlined and also https://issues.apache.org/jira/secure/attachment/13064314/5549_review.patch

Comment on lines 573 to 574
try (RaftClient raftClient = newRaftClient(SupportedRpcType.GRPC, null,
null, raftGroup, createRetryPolicy(conf), tlsConfig, conf)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reuse the existing client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated based on the patch diff.

} finally {
// Raft peers priorities need to be reset regardless of the result
// of transfer leadership
resetPriorities(conf, raftGroup, tlsConfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset priorities only if setConf success.

@ivandika3
Copy link
Contributor Author

ivandika3 commented Nov 11, 2023

There are test failure on the TestTransferLeadershipShell, let me address it first.

Edit: Update the priority numbers (higher -> 1, neutral -> 0)

Edit 2: I updated the reset logic to explicitly set to neutral priority instead of reverting to original to handle OM/SCM HA deployments that did transfer leadership without the priority reset (this) patch.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

@xichen01
Copy link
Contributor

LGTM +1

@adoroszlai adoroszlai merged commit 2f71e29 into apache:master Nov 12, 2023
33 checks passed
@adoroszlai
Copy link
Contributor

Thanks @ivandika3 for the patch, @szetszwo, @xichen01 for the review.

@ivandika3
Copy link
Contributor Author

Thank you @szetszwo @xichen01 for the reviews and @adoroszlai for the merge.

ibrusentsev pushed a commit to ibrusentsev/ozone that referenced this pull request Nov 14, 2023
@ivandika3 ivandika3 deleted the HDDS-9627 branch January 28, 2024 05:31
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Feb 1, 2024
…rship (apache#5549)

(cherry picked from commit 2f71e29)
Change-Id: Ifd0748375b2b25e808764d545f80fc166c1bb6c9
@ivandika3 ivandika3 self-assigned this Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants