Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bumps default Raft protocol to version 3. #3477

Merged
merged 12 commits into from
Sep 25, 2017
Merged

Bumps default Raft protocol to version 3. #3477

merged 12 commits into from
Sep 25, 2017

Conversation

slackpad
Copy link
Contributor

Closes #3327.

There were a few changes and cleanups that were exposed by this change, so commits have been broken out for clarity.

r.Check(wantPeers(s2, 4))
r.Check(wantPeers(s3, 4))
r.Check(wantPeers(s4, 4))
r.Check(wantRaft([]*Server{s1, s2, s3, s4}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there more places in the tests where it is more correct to call wantRaft instead of wantPeers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got them all but I'll make one more pass through and look over the remaining wantPeers uses before I merge. Maybe we can get rid of wantPeers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look through the rest and the other uses of wantPeers are justified to actually look for voters, given what they are testing.

This should have been there before, but it's more obvious that this
is incorrect now that we default the Raft protocol to 3, which puts
new servers in a read-only state while Autopilot waits for them to
become healthy.
Relaxed the check for a given number of voter peers and instead do
a thorough check that all servers see each other in their Raft
configurations.
These now just check for Raft replication to be set up, and don't
care about the number of voter peers.
@slackpad slackpad merged commit 45646ac into master Sep 25, 2017
@slackpad slackpad deleted the raft-protocol-3 branch September 25, 2017 22:27
slackpad added a commit that referenced this pull request Oct 26, 2017
When we defaulted the Raft protocol version to 3 in #3477 we made
the numPeers() routine more strict to only count voters (this is
more conservative and more correct). This had the side effect of
breaking rolling updates becuase it's at odds with the Autopilot
non-voter promotion logic.

That logic used to wait to only promote to maintain an odd quorum
of servers. During a rolling update (add one new server, wait, and
then kill an old server) the dead server cleanup would still count
the old server as a peer, which is conservative and the right thing
to do, and no longer count the non-voter. This would wait to promote,
so you could get into a stalemate. It is safer to promote early than
remove early, so by promoting as soon as possible we have chosen
that as the solution here.

Fixes #3611
slackpad added a commit that referenced this pull request Oct 27, 2017
When we defaulted the Raft protocol version to 3 in #3477 we made
the numPeers() routine more strict to only count voters (this is
more conservative and more correct). This had the side effect of
breaking rolling updates because it's at odds with the Autopilot
non-voter promotion logic.

That logic used to wait to only promote to maintain an odd quorum
of servers. During a rolling update (add one new server, wait, and
then kill an old server) the dead server cleanup would still count
the old server as a peer, which is conservative and the right thing
to do, and no longer count the non-voter. This would wait to promote,
so you could get into a stalemate. It is safer to promote early than
remove early, so by promoting as soon as possible we have chosen
that as the solution here.

Fixes #3611
slackpad added a commit that referenced this pull request Oct 31, 2017
* Relaxes Autopilot promotion logic.

When we defaulted the Raft protocol version to 3 in #3477 we made
the numPeers() routine more strict to only count voters (this is
more conservative and more correct). This had the side effect of
breaking rolling updates because it's at odds with the Autopilot
non-voter promotion logic.

That logic used to wait to only promote to maintain an odd quorum
of servers. During a rolling update (add one new server, wait, and
then kill an old server) the dead server cleanup would still count
the old server as a peer, which is conservative and the right thing
to do, and no longer count the non-voter. This would wait to promote,
so you could get into a stalemate. It is safer to promote early than
remove early, so by promoting as soon as possible we have chosen
that as the solution here.

Fixes #3611

* Gets rid of unnecessary extra not-a-voter check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants