
NRG (2.11): Don't revert term to pterm on AE mismatch #5684

Merged (1 commit), Jul 23, 2024


@neilalexander (Member) commented Jul 22, 2024

Previously, when we started a catchup after an AppendEntry mismatch, we reverted `term` back to `pterm`. We can never safely move the term backwards, and the catchup itself does not rely on this behaviour to work (the catchup entries are matched only on `pterm`/`pindex`), so don't revert it.

We saw this behaviour in Antithesis, where a catchup could take us back a term.
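To illustrate the invariant, here is a minimal Go sketch (nats-server is written in Go) of a follower handling an AppendEntry mismatch. The `follower` and `appendEntry` types and the `handleAppendEntry` method are hypothetical and greatly simplified; they are not the NRG code. They only show that a mismatch can start a catchup anchored at the follower's own `pterm`/`pindex` without touching `term`.

```go
package main

import "fmt"

// follower is a hypothetical, heavily simplified Raft follower. The field
// names echo the "term:4 p:1/19" fields in the NRG debug output, but this is
// an illustration only, not the nats-server implementation.
type follower struct {
	term   uint64 // current term: must never move backwards
	pterm  uint64 // term of our last log entry
	pindex uint64 // index of our last log entry

	catchingUp    bool
	catchupPterm  uint64 // where we asked the leader to resume from
	catchupPindex uint64
}

// appendEntry is a hypothetical AppendEntry message: the leader's term plus
// the term/index of the entry that precedes the one being sent.
type appendEntry struct {
	term, pterm, pindex uint64
}

// handleAppendEntry returns true if the entry was appended to our log.
func (f *follower) handleAppendEntry(ae appendEntry) bool {
	if ae.term < f.term {
		return false // stale leader, ignore
	}
	// Terms only ever move forwards.
	if ae.term > f.term {
		f.term = ae.term
	}
	if ae.pterm != f.pterm || ae.pindex != f.pindex {
		// Mismatch: ask for a catchup starting from our own last entry.
		// Matching later entries only needs pterm/pindex, so f.term is left
		// alone. The old behaviour amounted to also doing f.term = f.pterm
		// here, which can rewind the term (the example log in this thread
		// shows term:4 becoming term:1).
		f.catchingUp = true
		f.catchupPterm, f.catchupPindex = f.pterm, f.pindex
		return false
	}
	// Match: the new entry becomes our last entry.
	f.pterm, f.pindex = ae.term, ae.pindex+1
	return true
}

func main() {
	// Roughly the state in the example log: we are at term 4 with last entry
	// 1/19, and the leader sends an entry preceded by 1/22.
	f := &follower{term: 4, pterm: 1, pindex: 19}
	ok := f.handleAppendEntry(appendEntry{term: 4, pterm: 1, pindex: 22})
	fmt.Println(ok, f.term, f.catchupPterm, f.catchupPindex) // false 4 1 19
}
```

The design point is that `term` changes in exactly one place, and only upwards; the catchup bookkeeping lives in separate fields, so a mismatch never needs to rewrite it.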

Signed-off-by: Neil Twigg <neil@nats.io>

@neilalexander requested a review from a team as a code owner July 22, 2024 17:14
Beforehand when we were trying to run a catchup, we were reverting the
`term` back to `pterm`. We can't ever move the term backwards safely and
the catchup itself does not rely on this behaviour in order to work (as
the catchup entries are matched only on `pindex`), so don't revert it.

Signed-off-by: Neil Twigg <neil@nats.io>
@neilalexander (Member, Author)

Would like @ReubenMathew's approval before we proceed.

@ReubenMathew (Contributor)

Example log of this happening:

[        46.773] [      service_nats-0] [inf] [1] 2024/07/18 18:43:37.702207 [DBG] RAFT [S1Nunr6R - S-R3F-41d81bAI - term:4 p:1/19 sm:18/18] AppendEntry updating leader to "cnrtt3eg"
[        46.773] [      service_nats-0] [inf] [1] 2024/07/18 18:43:37.702213 [DBG] RAFT [S1Nunr6R - S-R3F-41d81bAI - term:4 p:1/19 sm:18/18] AppendEntry did not match 1 22 with 1 19
[        47.457] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.386003 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Being asked to catch up follower: "S1Nunr6R"
[        47.457] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.386011 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Need to send snapshot to follower
[        47.457] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.386165 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Snapshot sent, reset first catchup entry to 20
[        47.458] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.386752 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Our first entry [1:20] does not match request from follower [1:19]
[        47.458] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.387053 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Running catchup for "S1Nunr6R"
[        47.459] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.387644 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Finished catching up
[        49.926] [      service_nats-0] [inf] [1] 2024/07/18 18:43:40.854758 [DBG] RAFT [S1Nunr6R - S-R3F-41d81bAI - term:1 p:1/19 sm:18/18] Catchup may be stalled, will request again

@derekcollison (Member) left a comment

LGTM

@derekcollison merged commit c316803 into main Jul 23, 2024
5 checks passed
@derekcollison deleted the neil/nrgaemismatch branch July 23, 2024 03:14
neilalexander added a commit that referenced this pull request Nov 25, 2024
Includes the following:

- #5661
- #5666
- #5671
- #5344
- #5684
- #5689
- #5691
- #5714
- #5717
- #5707
- #5792
- #5912
- #5957
- #5700
- #5975
- #5991
- #5987
- #6027
- #6038
- #6053
- #5848
- #6055
- #6056
- #6060
- #6061
- #6072
- #5832
- #6073
- #6107

Signed-off-by: Neil Twigg <neil@nats.io>