Raft: use a larger initial heartbeat/election timeout #15042
…lier=1 and don't yet have the newer raft lib allowing us to reload heartbeat/election timeout config.
…t multiplier (3x), and making use of the new raft lib's ability to reload heartbeat/election timeout config after coming up.
lgtm!
leader := state.Leader
deadline := time.Now().Add(2 * time.Minute)
for time.Now().Before(deadline) {
I'm not sure I understand how this test circles back to the addition and use of the reloadable config. Could you clarify a little on this please?
This test was failing originally (see first commit). So it's basically just validating that restarting (or sealing/unsealing) a node doesn't provoke new elections, by virtue of the change to the initial timeout. We're using a perf multiplier of 5, and the default timeout is 1s, so by sleeping for 10s between seal/unseal we're hoping to trigger (or not, now that we have the fix) an election if the bug is still present.
…onfig can trigger elections, and if all nodes trigger an election at once that can delay things, causing test failures due to timeouts.
…o as high as 48s if you include the random factor, timeouts elsewhere must be increased.
Here we leave initialTimeoutMultiplier=1 and don't yet have the newer raft lib allowing us to reload heartbeat/election timeout config.