Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix deadlock during prevote migration process #42

Merged

Conversation

csmoe
Copy link
Contributor

@csmoe csmoe commented Mar 12, 2018

Fixes #20

// n1.state == Leader
// n2.state == Follower
// n3.state == Candidate
if nt.peers[&1].state != StateRole::Leader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use assert_eq here

// n1.Term == 2
// n2.Term == 2
// n3.Term == 4
if nt.peers[&1].term != 2 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

// n2 is follower with term 2
// n3 is pre-candidate with term 4, and less log

// simulate leader down
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please format your code with rustfmt nightly version 0.3.4

src/raft.rs Outdated
@@ -955,6 +955,27 @@ impl<T: Storage> Raft<T> {
// will not create disruptive term increases
let to_send = new_message(m.get_from(), MessageType::MsgAppendResponse, None);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you'd like to see?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe need to copy from etcd?

n2.become_follower(1, INVALID_ID);
n3.become_follower(1, INVALID_ID);

// We intentionally do not enable pre_vote for n3, this is done so in order
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should be put to L3854.

let mut e = Entry::new();
e.set_data(b"some data".to_vec());
nt.send(vec![
new_message_with_entries(1, 1, MessageType::MsgPropose, vec![e])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use new_message(1, 1, MessageType::MsgPropose, 1).

// n2.state == Follower
// n3.state == Candidate
assert_eq!(nt.peers[&1].state,
StateRole::Leader, "node 1 state: {:?}, want {:?}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to customize the message as assert_eq did it for you already.

// version cluster with replicas with pre_vote enabled, and replicas without.
let mut nt = Network::new(vec![Some(n1), Some(n2), Some(n3)]);

nt.peers.get_mut(&2).unwrap().pre_vote = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use new_test_raft_with_pre_vote.

fn test_prevote_migration_can_complete_election() {
let mut nt = new_prevote_migration_cluster();

// n1 is leader with term 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put the comment to L3923.


// Do we have a leader?
// assert_eq!(nt.peers[&2].state, StateRole::Leader )
if nt.peers[&2].state != StateRole::Leader && nt.peers[&3].state != StateRole::Follower {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use assert! instead.

src/raft.rs Outdated
@@ -955,6 +955,27 @@ impl<T: Storage> Raft<T> {
// will not create disruptive term increases
let to_send = new_message(m.get_from(), MessageType::MsgAppendResponse, None);
self.send(to_send);
} else if m.get_msg_type() == MessageType::MsgRequestPreVote {
// Before pre_vote enable, there may have candidate with higher term,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is confusing. I thought candidate here refers to the one who sends prevote request. Turns out it refers to the receiver. Could you make it better?

@Hoverbear
Copy link
Contributor

Hi @csmoe, would you like any help with getting this PR merged? I see @BusyJay left a few comments. If you're busy I can wrap things up for you.

@csmoe
Copy link
Contributor Author

csmoe commented Apr 12, 2018

@Hoverbear sorry for a delayed commit, I totally forgot this thread.
Feel free to do anything, thank you.

@Hoverbear Hoverbear force-pushed the fix_deadlock_during_prevote_migration_process branch from 9f9cfb4 to 2c24c57 Compare April 22, 2018 05:40
@Hoverbear Hoverbear requested a review from BusyJay April 22, 2018 05:40
@Hoverbear
Copy link
Contributor

@BusyJay Can you give this another look regarding the feedback you gave?

@siddontang
Copy link
Contributor

PTAL @hicqu

nt.send(vec![new_message(2, 2, MessageType::MsgHup, 0)]);

// Do we have a leader?
assert_eq!(nt.peers[&2].state, StateRole::Leader);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we guarantee the leader can always be elected? It is different from https://github.com/coreos/etcd/pull/8525/files#diff-ad12a9c9b66e1507ca04edc74b4b1e8aR3386

@siddontang
Copy link
Contributor

Rest LGTM

@siddontang
Copy link
Contributor

any update @csmoe?

nt.send(vec![new_message(2, 2, MessageType::MsgHup, 0)]);

// Do we have a leader?
assert_eq!((nt.peers[&2].state, nt.peers[&3].state), (StateRole::Leader, StateRole::Follower));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@csmoe csmoe May 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I don't know how to fix this since I thought this tuple comparsion might work like && here. could you give me some tips?
Will simulating leader down(nt.solate(1)) before new election guarantee that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in your test, you require 2 must be the leader, and 3 must be the follower. But in the etcd test, if 2 is the leader, 3 can be not the follower.

Copy link
Contributor Author

@csmoe csmoe May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, thank you, I made a logic mistake here.

Copy link
Member

@BusyJay BusyJay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

src/raft.rs Outdated
@@ -224,7 +224,7 @@ pub struct Raft<T: Storage> {
heartbeat_elapsed: usize,

pub check_quorum: bool,
pre_vote: bool,
pub pre_vote: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a doc hidden attribute.

nt.isolate(1);

// Call for elections from both n2 and n3.
nt.send(vec![new_message(2, 2, MessageType::MsgHup, 0)]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order is important, although it won't affect the test result (yet).

nt.send(vec![new_message(3, 3, MessageType::MsgHup, 0)]);

assert_eq!(nt.peers[&1].state,
StateRole::Leader, "node 1 state: {:?}, want {:?}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug message is unnecessary.

siddontang
siddontang previously approved these changes May 10, 2018
Copy link
Contributor

@siddontang siddontang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@siddontang
Copy link
Contributor

PTAL @Hoverbear

Hoverbear
Hoverbear previously approved these changes May 11, 2018
Copy link
Contributor

@Hoverbear Hoverbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 👍

@Hoverbear
Copy link
Contributor

Please run cargo fmt and we can merge this. :) We just merged #55 which runs code on the CI through rustfmt.

@csmoe csmoe dismissed stale reviews from Hoverbear and siddontang via c74e0af May 11, 2018 03:27
@csmoe csmoe force-pushed the fix_deadlock_during_prevote_migration_process branch from fa74996 to c74e0af Compare May 11, 2018 03:27
@csmoe
Copy link
Contributor Author

csmoe commented May 11, 2018

@Hoverbear The current rustfmt in CI is stable-ver, but:

Warning: can't set `use_try_shorthand = true`, unstable features are only available in nightly channel.

So rustfmt config might need changes.

@siddontang
Copy link
Contributor

Ping @Hoverbear

@Hoverbear
Copy link
Contributor

Opened #62 to reflect the cargo fmt warnings.

@Hoverbear Hoverbear merged commit 444e99e into tikv:master May 14, 2018
@csmoe csmoe deleted the fix_deadlock_during_prevote_migration_process branch May 14, 2018 23:47
@Hoverbear Hoverbear added this to the 0.3.0 milestone Jul 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants