Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NRG (2.11): Signal on initial applied messages, ensure 'expected per subject' consistency #6194

Merged
merged 1 commit into from
Dec 5, 2024

Conversation

MauriceVanVeen
Copy link
Member

This PR ensures consistent 'expected per subject' updates by introducing two new errors:

  • expected last sequence per subject temporarily unavailable
    (when we have entries in our log after becoming leader, don't accept updates before we've applied all of them, because we could have pending updates in our log still)
  • subject for expected last sequence is in process
    (don't allow multiple updates for the same subject until the first has been fully processed, essentially de-duping)

Without these it would mean multiple updates for the same subject could go through. For example having 2 updates for the same subject. The leader would accept the first, and fail on the second, upping mset.clfs/failed. In certain scenarios we could get the leader to do that, and have a follower accept both messages, leading to desync.

By introducing these two new errors, we ensure only one update per subject goes through at a time. And we can't desync based on this anymore.


For a full explanation of how it would go wrong before:

  1. Send message foo.foo, gets accepted by the leader.
  2. Send message foo.bar with expected subject sequence of 0, gets accepted by the leader.
  3. Send message foo.foo, gets accepted by the leader.
  4. The leader makes a snapshot now.
  5. Send message foo.bar with expected subject sequence of 0, fails on the leader. This duplicate would not be blocked before if these messages were sent quickly enough.
  6. Send message foo.foo, gets accepted by the leader.
  7. Send another message for foo.bar with subject sequence of 1, gets accepted by the leader.

If we now restart a follower in a way where it requires the snapshot that was made at step 4, the following would happen:

  1. Snapshot is installed, and upper layer catchup gives foo.foo at seq 1, no message at seq 2 (because foo.bar was updated in step 7), another foo.foo at seq 3.
  2. Now we replay step 5, message foo.bar with expected subject sequence of 0. This failed on the leader.. but because the replica must now make an assumption what the leader did and we have no message for this sequence. It assumes this message must be applied, resulting in the first step of the stream desyncing.
  3. Step 6 gets applied, but message foo.foo will have a sequence one higher than on the leader.
  4. Step 7 gets executed, but fails on the follower because the subject sequence is not 1, but 4 due to the follower's step 2.
  5. Now the stream has desynced in a way that could not be detected (if bytes were also the same). If more updates based on sequence would be done, the streams would continue desyncing, essentially a snowball effect.

Because we now never allow multiple updates for the same subject to go through, this can't happen anymore.

Signed-off-by: Maurice van Veen github@mauricevanveen.com

@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner December 2, 2024 10:24
server/jetstream_cluster.go Outdated Show resolved Hide resolved
server/jetstream_cluster.go Outdated Show resolved Hide resolved
server/raft.go Outdated Show resolved Hide resolved
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/block-expected-lseq-per-subject branch from 9861c16 to 7c93468 Compare December 3, 2024 08:44
server/raft.go Outdated Show resolved Hide resolved
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/block-expected-lseq-per-subject branch from 7c93468 to 5ebe522 Compare December 3, 2024 15:13
@derekcollison derekcollison self-requested a review December 3, 2024 15:17
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekcollison
Copy link
Member

Question I have is of the error message when we have some messages in flight. We know the sequence if it is accepted, etc. Should we just do normal processing with what we think the sequence will be?

@MauriceVanVeen
Copy link
Member Author

We know the sequence if it is accepted, etc. Should we just do normal processing with what we think the sequence will be?

Not sure what you mean? We can't know what the sequence will be until it has fully gone through proposals and is persisted into the stream.

If we get two messages on the same subject with a expected subject sequence of 10, we must allow one to be persisted and we must reject one. If we don't reject the second one before proposing we could run into the issue described in the description.

Having errors before proposing and with the guarantee the one inflight message will be persisted is sadly the only way to guard against this issue.
(because a stream snapshot doesn't contain the messages themselves, and messages are fully deleted and not marked as deleted, there's no way to know as a follower after processing a snapshot what to do with a message if the message it's pointing to is deleted)

@derekcollison
Copy link
Member

We do know the proposed sequence of all items proposed to the NRTG layer. So we could say that we track that and assume it will succeed so we can early error on the next message if it does not align with the expected sequence.

@MauriceVanVeen
Copy link
Member Author

So we could say that we track that and assume it will succeed so we can early error on the next message if it does not align with the expected sequence.

We indeed know the proposed sequence. But until it's fully propagated and persisted that sequence will only be just that, an assumption. It's very likely to become the correct sequence, but that guarantee depends on which stream settings you use whether it's very likely or less likely to be right (never 100%).

So it's still dangerous as we shouldn't rely on an assumption, it should be 100% guaranteed for there to never be stream desync (otherwise we can't fix the desync). Back to what's mentioned in the description; If you have a snapshot and any chance of mset.clfs needing to be upped because that assumption was incorrect, then the stream will desync.

For more context about the validity of this fix. With a customer's repro that results in KV desyncing, and running the changes from this PR, #6213 and #6216 combined. I have been running that repro locally for more than 3 hours and it doesn't desync anymore. (Whereas it used to desync very often just within 15-30 minutes)

@derekcollison
Copy link
Member

I still think an error saying wrong last sequence when rejecting a message would be more meaningful than the temporary status error, which would result in a retry of which 99.999% of the time would return wrong sequence error.

Hence why I am proposing we just say wrong sequence right away based on knowledge we have about inflight.

…' consistency

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/block-expected-lseq-per-subject branch from 5ebe522 to 469f834 Compare December 4, 2024 20:40
@MauriceVanVeen
Copy link
Member Author

I still think an error saying wrong last sequence when rejecting a message would be more meaningful than the temporary status error, which would result in a retry of which 99.999% of the time would return wrong sequence error.

Agree, that sounds better. Have changed the error message to be just wrong last sequence.

@derekcollison derekcollison self-requested a review December 5, 2024 06:28
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekcollison derekcollison merged commit 436340a into main Dec 5, 2024
5 checks passed
@derekcollison derekcollison deleted the maurice/block-expected-lseq-per-subject branch December 5, 2024 06:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants