NRG (2.11): Signal on initial applied messages, ensure 'expected per subject' consistency #6194

MauriceVanVeen · 2024-12-02T10:24:34Z

This PR ensures consistent 'expected per subject' updates by introducing two new errors:

expected last sequence per subject temporarily unavailable
(when we have entries in our log after becoming leader, don't accept updates before we've applied all of them, because we could have pending updates in our log still)
subject for expected last sequence is in process
(don't allow multiple updates for the same subject until the first has been fully processed, essentially de-duping)

Without these it would mean multiple updates for the same subject could go through. For example having 2 updates for the same subject. The leader would accept the first, and fail on the second, upping mset.clfs/failed. In certain scenarios we could get the leader to do that, and have a follower accept both messages, leading to desync.

By introducing these two new errors, we ensure only one update per subject goes through at a time. And we can't desync based on this anymore.

For a full explanation of how it would go wrong before:

Send message foo.foo, gets accepted by the leader.
Send message foo.bar with expected subject sequence of 0, gets accepted by the leader.
Send message foo.foo, gets accepted by the leader.
The leader makes a snapshot now.
Send message foo.bar with expected subject sequence of 0, fails on the leader. This duplicate would not be blocked before if these messages were sent quickly enough.
Send message foo.foo, gets accepted by the leader.
Send another message for foo.bar with subject sequence of 1, gets accepted by the leader.

If we now restart a follower in a way where it requires the snapshot that was made at step 4, the following would happen:

Snapshot is installed, and upper layer catchup gives foo.foo at seq 1, no message at seq 2 (because foo.bar was updated in step 7), another foo.foo at seq 3.
Now we replay step 5, message foo.bar with expected subject sequence of 0. This failed on the leader.. but because the replica must now make an assumption what the leader did and we have no message for this sequence. It assumes this message must be applied, resulting in the first step of the stream desyncing.
Step 6 gets applied, but message foo.foo will have a sequence one higher than on the leader.
Step 7 gets executed, but fails on the follower because the subject sequence is not 1, but 4 due to the follower's step 2.
Now the stream has desynced in a way that could not be detected (if bytes were also the same). If more updates based on sequence would be done, the streams would continue desyncing, essentially a snowball effect.

Because we now never allow multiple updates for the same subject to go through, this can't happen anymore.

Signed-off-by: Maurice van Veen github@mauricevanveen.com

server/jetstream_cluster.go

server/raft.go

derekcollison

LGTM

derekcollison · 2024-12-03T15:35:52Z

Question I have is of the error message when we have some messages in flight. We know the sequence if it is accepted, etc. Should we just do normal processing with what we think the sequence will be?

MauriceVanVeen · 2024-12-03T15:59:55Z

We know the sequence if it is accepted, etc. Should we just do normal processing with what we think the sequence will be?

Not sure what you mean? We can't know what the sequence will be until it has fully gone through proposals and is persisted into the stream.

If we get two messages on the same subject with a expected subject sequence of 10, we must allow one to be persisted and we must reject one. If we don't reject the second one before proposing we could run into the issue described in the description.

Having errors before proposing and with the guarantee the one inflight message will be persisted is sadly the only way to guard against this issue.
(because a stream snapshot doesn't contain the messages themselves, and messages are fully deleted and not marked as deleted, there's no way to know as a follower after processing a snapshot what to do with a message if the message it's pointing to is deleted)

derekcollison · 2024-12-03T16:44:21Z

We do know the proposed sequence of all items proposed to the NRTG layer. So we could say that we track that and assume it will succeed so we can early error on the next message if it does not align with the expected sequence.

MauriceVanVeen · 2024-12-04T16:06:40Z

So we could say that we track that and assume it will succeed so we can early error on the next message if it does not align with the expected sequence.

We indeed know the proposed sequence. But until it's fully propagated and persisted that sequence will only be just that, an assumption. It's very likely to become the correct sequence, but that guarantee depends on which stream settings you use whether it's very likely or less likely to be right (never 100%).

So it's still dangerous as we shouldn't rely on an assumption, it should be 100% guaranteed for there to never be stream desync (otherwise we can't fix the desync). Back to what's mentioned in the description; If you have a snapshot and any chance of mset.clfs needing to be upped because that assumption was incorrect, then the stream will desync.

For more context about the validity of this fix. With a customer's repro that results in KV desyncing, and running the changes from this PR, #6213 and #6216 combined. I have been running that repro locally for more than 3 hours and it doesn't desync anymore. (Whereas it used to desync very often just within 15-30 minutes)

derekcollison · 2024-12-04T20:19:08Z

I still think an error saying wrong last sequence when rejecting a message would be more meaningful than the temporary status error, which would result in a retry of which 99.999% of the time would return wrong sequence error.

Hence why I am proposing we just say wrong sequence right away based on knowledge we have about inflight.

…' consistency Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

MauriceVanVeen · 2024-12-04T20:42:19Z

I still think an error saying wrong last sequence when rejecting a message would be more meaningful than the temporary status error, which would result in a retry of which 99.999% of the time would return wrong sequence error.

Agree, that sounds better. Have changed the error message to be just wrong last sequence.

derekcollison

LGTM

MauriceVanVeen requested a review from a team as a code owner December 2, 2024 10:24

derekcollison requested changes Dec 2, 2024

View reviewed changes

server/jetstream_cluster.go Outdated Show resolved Hide resolved

server/jetstream_cluster.go Outdated Show resolved Hide resolved

server/raft.go Outdated Show resolved Hide resolved

MauriceVanVeen force-pushed the maurice/block-expected-lseq-per-subject branch from 9861c16 to 7c93468 Compare December 3, 2024 08:44

derekcollison reviewed Dec 3, 2024

View reviewed changes

server/raft.go Outdated Show resolved Hide resolved

MauriceVanVeen force-pushed the maurice/block-expected-lseq-per-subject branch from 7c93468 to 5ebe522 Compare December 3, 2024 15:13

derekcollison self-requested a review December 3, 2024 15:17

derekcollison approved these changes Dec 3, 2024

View reviewed changes

NRG: Signal on initial applied messages, ensure 'expected per subject…

469f834

…' consistency Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

MauriceVanVeen force-pushed the maurice/block-expected-lseq-per-subject branch from 5ebe522 to 469f834 Compare December 4, 2024 20:40

derekcollison self-requested a review December 5, 2024 06:28

derekcollison approved these changes Dec 5, 2024

View reviewed changes

derekcollison merged commit 436340a into main Dec 5, 2024
5 checks passed

derekcollison deleted the maurice/block-expected-lseq-per-subject branch December 5, 2024 06:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NRG (2.11): Signal on initial applied messages, ensure 'expected per subject' consistency #6194

NRG (2.11): Signal on initial applied messages, ensure 'expected per subject' consistency #6194

MauriceVanVeen commented Dec 2, 2024

derekcollison left a comment

derekcollison commented Dec 3, 2024

MauriceVanVeen commented Dec 3, 2024

derekcollison commented Dec 3, 2024

MauriceVanVeen commented Dec 4, 2024

derekcollison commented Dec 4, 2024

MauriceVanVeen commented Dec 4, 2024

derekcollison left a comment

NRG (2.11): Signal on initial applied messages, ensure 'expected per subject' consistency #6194

NRG (2.11): Signal on initial applied messages, ensure 'expected per subject' consistency #6194

Conversation

MauriceVanVeen commented Dec 2, 2024

derekcollison left a comment

Choose a reason for hiding this comment

derekcollison commented Dec 3, 2024

MauriceVanVeen commented Dec 3, 2024

derekcollison commented Dec 3, 2024

MauriceVanVeen commented Dec 4, 2024

derekcollison commented Dec 4, 2024

MauriceVanVeen commented Dec 4, 2024

derekcollison left a comment

Choose a reason for hiding this comment