Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NRG (2.11): Don't run catchup when behind on applies #6216

Merged
merged 1 commit into from
Dec 12, 2024

Conversation

MauriceVanVeen
Copy link
Member

When a server restarts and is behind enough it will require to be caught up from a snapshot. If after receiving a snapshot from the leader the leader itself shuts down, the remaining server (in a R3 scenario) will become leader.

If this new leader is behind on applies it should not fulfill the catchup request. Messages that would be returned as part of the catchup might be deleted as part of the unapplied append entries. And sending these messages over to the follower would mean the follower wouldn't be able to remove them as part of the append entries if they were meant to be deleted.

Either way, the new leader is temporarily unable to fulfill the catchup request and must wait for its applies to reach the minimum required for the catchup response to be valid.

Signed-off-by: Maurice van Veen github@mauricevanveen.com

Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekcollison
Copy link
Member

@MauriceVanVeen let me know when we want this merged..

@derekcollison
Copy link
Member

Has a conflict now that needs to be resolved..

@MauriceVanVeen MauriceVanVeen force-pushed the maurice/stall-catchup-behind-on-applies branch 2 times, most recently from a4711ee to bb7ebfe Compare December 9, 2024 10:16
@@ -1654,7 +1654,7 @@ func (n *raft) State() RaftState {
func (n *raft) Progress() (index, commit, applied uint64) {
n.RLock()
defer n.RUnlock()
return n.pindex + 1, n.commit, n.applied
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an off-by-one, but this value was not used within the code. Now we started to use its value.

n.pindex == n.commit == n.appplied means we have a log where every entry is applied.
If we have n.pindex 1 greater than n.commit that means we have one uncommitted entry. If we have n.pindex 1 greater than n.applied but equal to n.commit, that means we have a committed entry that's not applied yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chalk up another ;)

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/stall-catchup-behind-on-applies branch from bb7ebfe to 3392957 Compare December 12, 2024 16:07
@MauriceVanVeen MauriceVanVeen marked this pull request as ready for review December 12, 2024 20:29
@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner December 12, 2024 20:29
@MauriceVanVeen
Copy link
Member Author

@derekcollison, this is ready now.

Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekcollison derekcollison merged commit e79e505 into main Dec 12, 2024
5 checks passed
@derekcollison derekcollison deleted the maurice/stall-catchup-behind-on-applies branch December 12, 2024 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants