Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIXED] Improve per-subject state performance & keep ss.Last up-to-date #6235

Merged
merged 3 commits into from
Dec 11, 2024

Conversation

MauriceVanVeen
Copy link
Member

@MauriceVanVeen MauriceVanVeen commented Dec 10, 2024

Improve performance by only setting ss.Last during recalculation, so only when it's needed.

Also fixes a bug where ss.Last was not kept up-to-date if ss.Msgs > 1, by introducing a ss.lastNeedsUpdate just like ss.firstNeedsUpdate.

Signed-off-by: Maurice van Veen github@mauricevanveen.com

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner December 10, 2024 08:57
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the original code doing here before the changes that we are now deleting?
What was the bug we were trying to fix?

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen
Copy link
Member Author

MauriceVanVeen commented Dec 10, 2024

The original bug was here: https://github.com/nats-io/nats-server/pull/6226/files#diff-384c189826934c9a6fc3554dafc63dab2076245010e3d6fce5c71a93e15e9877L7946-L7949

If ss.Msgs == 1 we can't assume either ss.First or ss.Last to be correct. We need to recalculate to get the right value for both.

However, doing that after doing ss.Msgs-- means it would be done even if it wasn't needed. That introduces a performance regression. So now only updating ss.Last (and ss.First) when we actually are doing the recalculation.

@derekcollison
Copy link
Member

ss.Last should always be correct, only ss.First could be wrong..

So this original code would only need to check if firstNeedsUpdate if the seq being deleted is not the last.

	// Only one left.
	if ss.Msgs == 1 {
		if seq == ss.Last {
			if ss.firstNeedsUpdate {
				mb.recalculateFirstForSubj(subj, ss.First, ss)
			}
			ss.Last = ss.First
		} else {
			ss.First = ss.Last
			ss.firstNeedsUpdate = false
		}
	}

I think the code above would do the trick.

@MauriceVanVeen
Copy link
Member Author

ss.Last should always be correct, only ss.First could be wrong..

No, ss.Last can be incorrect, and that's actually what I exploit in TestStoreSubjectStateConsistency.

If we delete a message where seq == ss.Last, we never update ss.Last. That makes both ss.First and ss.Last lazy. So we can't trust it to be correct.

For example with ss.Msgs=3, ss.First=1, ss.Last=3. If we remove the message at sequence 3, we'd end up with ss.Msgs=2, ss.First=1, ss.Last=3 (equal, but ss.Msgs--). The only places where we update ss.Last is if we get a new message, or if ss.Msgs=1 but we need my changes for ss.Last to be correct if we did delete the message at that sequence.

@derekcollison
Copy link
Member

Ah gotcha, when ss.Msgs != 1 and we delete last and ss.Msgs after delete is also != 1.

@MauriceVanVeen
Copy link
Member Author

Yeah, and turns out we can just defer "recalculating" ss.Last to when we're recalculating ss.First and ss.Msgs==1.
That allows us to not need to do anything in removeSeqPerSubject, allowing to remove that code and fix the performance regression in recalculating there.

@derekcollison
Copy link
Member

Do we need ss.Last to be correct for looking up ast by subject or do we use PSIM?

What happens when we asked for SubjectState? If this is wrong does it effect that?

@MauriceVanVeen
Copy link
Member Author

Turns out yes. But that's as simple as just deleting the last message, and then doing a lookup of the last message for that subject for it to not work anymore.

So we'll also need to mark ss.Last for recalculation when needed. Want me to do that in this PR or a followup?

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen
Copy link
Member Author

Have added a fix to this PR 🙂

@MauriceVanVeen MauriceVanVeen changed the title Improve per-subject state performance [FIXED] Improve per-subject state performance & keep ss.Last up-to-date Dec 10, 2024
@derekcollison derekcollison self-requested a review December 10, 2024 20:36
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, could we just hook RemoveMsg() and check once we have the block if the seq matches first or last and if so re-calculate there? Unless ss.Msgs now equals 1 and if we always keep them updated that should be a simple assignment?

@MauriceVanVeen
Copy link
Member Author

Question, could we just hook RemoveMsg() and check once we have the block if the seq matches first or last and if so re-calculate there?

Functionally that would work but performance-wise I think it would regress just like when I added the recalc earlier in #6226.
RemoveMsg eventually calls into removeSeqPerSubject which uses the ss.firstNeedsUpdate etc. to not need to update while we're in RemoveMsg. Doing a recalc there would likely result in the same performance regression.
So I think the xNeedsUpdate flags are the way to go.

@derekcollison
Copy link
Member

The majority of use cases will be expiring the first and removing it. For WQs and interest streams they can be removed out of order yes.

@MauriceVanVeen
Copy link
Member Author

MauriceVanVeen commented Dec 11, 2024

Question, could we just hook RemoveMsg() and check once we have the block if the seq matches first or last and if so re-calculate there?

Have tried this out by removing ss.firstNeedsUpdate and immediately recalculating ss.First and ss.Last in removeSeqPerSubject, and ran some benchmarks. Confirmed the performance regressions remained and in some sections worsened (did see some improvements in some places, but overall decreased). It also made CI way less reliable for some suites.

So I'd say (at least for now) we need to keep ss.First and ss.Last as lazy, and need ss.firstNeedsUpdate and ss.lastNeedsUpdate to both only update when needed as well as ensuring both are up-to-date when they are used.

@MauriceVanVeen
Copy link
Member Author

Will experiment some more, have a hunch where we could improve to possibly not regress. I'll report back later.

@MauriceVanVeen
Copy link
Member Author

Conclusion; it's not easy to make ss.First and ss.Last not lazy.

Making them always be consistent means we can remove ss.firstNeedsUpdate and remove that part of memory from SimpleState. By reversing the direction of some loops we could get a quick win since ss.First would not be repeatedly recalculated if we went down from ss.Last instead of up from ss.First.

But.. quickly that strategy falls down. PurgeEx has a loop that can't be reversed and still would repeatedly recalc first. Then you'd need a workaround to skip updating ss.First and ss.Last in removeSeqPerSubject. That works for PurgeEx, but then you'd need the same for file store, the same for when MaxMsgsPerSubject is changed, the same yet again for deletes based on JetStream catchup, etc. etc. (and likely other places where messages are removed in a loop).

It quickly snowballs and would end up requiring more code and "spaghetti" to be able to influence from a high level whether removeSeqPerSubject should or should not do something on a lower level.

Would propose to leave the logic of having ss.First and ss.Last be lazy as-is for now. At least until we change out SimpleState's range for an AVL tree, which was discussed before I believe. Or we'd need a refactor that doesn't need to reach into removeSeqPerSubject's implementation.

@derekcollison , would it be okay to merge this PR first, to fix the performance regression and bug on main, and look into improving further later?

@derekcollison derekcollison self-requested a review December 11, 2024 23:27
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekcollison derekcollison merged commit 2565d45 into main Dec 11, 2024
5 checks passed
@derekcollison derekcollison deleted the maurice/less-recalc-first branch December 11, 2024 23:30
neilalexander added a commit that referenced this pull request Dec 13, 2024
Includes the following:

- #6226
- #6232
- #6235
- #6064
- #6244
- #6246
- #6247
- #6248
- #6250

Signed-off-by: Neil Twigg <neil@nats.io>
wallyqs added a commit that referenced this pull request Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants