-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: stop pretending to deal with Raft leadership when draining #55619
kvserver: stop pretending to deal with Raft leadership when draining #55619
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion either way on this.
The draining code was pretending to deal with transferring Raft leadership, besides leases, but it was all an illusion. This patch stops the pretending, simplifying the code and preventing mis-interpretation (which I've suffered from). There are a few cases to discuss: 1) There is no lease. The code really looked like it will try to do something to the leadership, but ended up not doing anything because maybeTransferRaftLeadership() is a no-op when there's no lease. 2) The leases needs to be moved. In this case, the code was first trying to move the lease and then, if that *failed, was trying to move the leadership. This was a no-op since maybeTransferRaftLeadership() only moves to the leadership to the leaseholder; if the leaseholder hasn't moved, it's a no-op. 3) The lease doesn't need to be moved, but the leadership does. In this case the code could theoretically do the leadership transfer, but this case is very rare - the leadership moves as lease transfers apply, and if that fails, we'll continue attempting to move the leadership every Raft tick (every 200ms). Release note: None
maybeTransferRaftLeadership -> maybeTransferRaftLeadershipToLeaseholder to make it a lot more suggestive. Release note: None
389f8e3
to
9b554b6
Compare
This has changed because the code previously mislead me. Please read again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 3 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)
Why is this very rare? Just because we expect the new leaseholder to keep calling an election every 200ms? Are you at all concerned that an election is less efficient than a leadership transfer? The change itself LGTM and I appreciate the simplification. Is this a backport target? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lease doesn't need to be moved, but the leadership does. In this
case the code could theoretically do the leadership transfer, but
this case is very rare
The case where (at the start of the drain) we are a leader without a lease is rare (because we try to correct this state continuously, not just in the background). But I'm a bit more concerned that this code was already mishandling an aspect of case 2: if we transfer a lease, the leadership will follow, but if the former leader doesn't remain alive long enough, the range will have to wait for an election timeout. We need to make sure that as we're transferring leases away we don't shut down immediately when the last lease is gone without waiting for the leadership to move too. (I'm not sure if the entire drain process might make up for this, but the if leaseTransferred { needsRaftTransfer = false }
line looks wrong to me)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
bors r+
Is this a backport target?
I wasn't planning on it...
Why is this very rare? Just because we expect the new leaseholder to keep calling an election every 200ms? Are you at all concerned that an election is less efficient than a leadership transfer?
I was not particularly aware of the perf differences of elections vs transfers, but I can't say I'm very concerned... Because we attempt the leadership transfer in leasePostApply
and then periodically which I think is quite enough to make it rare that we get here.
The case where (at the start of the drain) we are a leader without a lease is rare (because we try to correct this state continuously, not just in the background).
Ben what did you mean here ("continuously" vs "in the background")? Did you mean there's another called to maybeTransferRaftLedership
besides leasePostApply
and tick
? Cause I don't see it...
But I'm a bit more concerned that this code was already mishandling an aspect of case 2: if we transfer a lease, the leadership will follow, but if the former leader doesn't remain alive long enough, the range will have to wait for an election timeout. We need to make sure that as we're transferring leases away we don't shut down immediately when the last lease is gone without waiting for the leadership to move too. (I'm not sure if the entire drain process might make up for this, but the if leaseTransferred { needsRaftTransfer = false } line looks wrong to me)
Luckily #55460 introduces a sleep after draining which should allow time for the leaderships to follow the transferred leases.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)
Build succeeded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not particularly aware of the perf differences of elections vs transfers
The difference is mainly in availability instead of efficiency - an election implies 5-10 seconds of disruption while a transfer is just a few round trips. So this is a big deal, but I agree with you that it's rare enough that we'll be in this state when we get here.
Ben what did you mean here ("continuously" vs "in the background")?
Oops that's not what I meant to say at all. What I meant was "continuously and in the background" as opposed to "only during the drain process".
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
The draining code was pretending to deal with transferring Raft
leadership, besides leases, but it was all an illusion. This patch stops
the pretending, simplifying the code and preventing mis-interpretation
(which I've suffered from).
There are a few cases to discuss:
something to the leadership, but ended up not doing anything because
maybeTransferRaftLeadership() is a no-op when there's no lease.
to move the lease and then, if that *failed, was trying to move the
leadership. This was a no-op since maybeTransferRaftLeadership() only
moves to the leadership to the leaseholder; if the leaseholder hasn't
moved, it's a no-op.
case the code could theoretically do the leadership transfer, but
this case is very rare - the leadership moves as lease transfers
apply, and if that fails, we'll continue attempting to move the
leadership every Raft tick (every 200ms).
Release note: None