[FIXED] Fix request/reply performance when using allow_responses perms #6064

Merged
3 commits merged into nats-io:main on Dec 11, 2024

Conversation

jack7803m
Contributor

Fixes the performance issues noted in #6058. The reply map is now pruned only once every replyPermLimit messages, or when more than replyPruneTime has elapsed since the last prune.

Resolves #6058

Signed-off-by: Jack Morris jack@jackmorris.me

@jack7803m jack7803m requested a review from a team as a code owner October 31, 2024 18:21
@jack7803m
Contributor Author

Unsure how those failing tests could be affected by the minimal changes I made.

Still need to add some sort of solution for the infinite expiry, though I'm not sure exactly which direction to take (i.e. return an error or fall back to a default), so I'll leave that decision up to the maintainers.
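
For context, a minimal sketch of what the "fall back to a default" option mentioned above could look like. The constant, the package name, and the helper are invented for illustration and are not part of this PR or the server's code:

```go
package sketch

import "time"

// Assumed default value; the PR does not pick one, it only raises the question.
const defaultReplyExpiry = 2 * time.Minute

// effectiveReplyExpiry is a hypothetical helper: if the configured
// allow_responses expiry is unset (zero), fall back to a bounded default
// instead of letting reply entries live in the map forever.
func effectiveReplyExpiry(configured time.Duration) time.Duration {
	if configured <= 0 {
		return defaultReplyExpiry
	}
	return configured
}
```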

@@ -3636,7 +3640,8 @@ func (c *client) deliverMsg(prodIsMQTT bool, sub *subscription, acc *Account, su
 	// do that accounting here. We only look at client.replies which will be non-nil.
 	if client.replies != nil && len(reply) > 0 {
 		client.replies[string(reply)] = &resp{time.Now(), 0}
-		if len(client.replies) > replyPermLimit {
+		client.repliesSincePrune++
+		if client.repliesSincePrune > replyPermLimit || time.Since(client.lastReplyPrune) > replyPruneTime {
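
To make the two new fields in this hunk concrete, here is a small, self-contained sketch of how a prune pass might reset them. The real function is client.pruneReplyPerms(), mentioned later in this thread; the constant values, the cut-down struct, and the prune body below are simplified assumptions, not the server's actual implementation:

```go
package sketch

import "time"

// Placeholder values; the real constants live in the server and may differ.
const (
	replyPermLimit = 4096
	replyPruneTime = 2 * time.Second
)

// resp mirrors the shape used in the hunk above: when the reply was recorded
// and how many responses have been counted against it.
type resp struct {
	t time.Time
	n int
}

// client is a cut-down stand-in holding only the fields this PR touches.
type client struct {
	replies           map[string]*resp
	repliesSincePrune int
	lastReplyPrune    time.Time
}

// pruneReplies drops entries older than replyPruneTime and resets the two new
// fields, so the next prune is again gated by either the message counter or
// the elapsed time rather than running on every delivered message.
func (c *client) pruneReplies(now time.Time) {
	for subj, r := range c.replies {
		if now.Sub(r.t) > replyPruneTime {
			delete(c.replies, subj)
		}
	}
	c.repliesSincePrune = 0
	c.lastReplyPrune = now
}
```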
Member

Do we have a sense of how much more memory this will hold onto under heavy load?

Contributor Author

The original issue was that it held onto too much memory and looped through all of it on every message.

I just added some debug statements and found that if the reply subject is already allowed (e.g. "pub": ">"), the reply counter never actually goes up, so that subject can never be pruned out until it expires by time. Just noting this because I'm going to look at a fix for that too, but I'm not sure whether it will have broader effects (hopefully not).

Assuming subjects are pruned as they're replied to, in the worst case the replies map can only grow by an extra replyPermLimit entries beyond what was previously possible. Even for that to happen, it would have to fill the map with subjects, attempt a prune, and then have them all expire by the next message. In that case, the old behavior (pruning on every message once the map exceeds replyPermLimit) would prune immediately, whereas this solution would hold onto that memory for the next replyPermLimit messages, making the map size at most replyPermLimit * 2.

Under normal heavy load it shouldn't make a significant difference, since the current behavior should typically only run the prune once every replyPermLimit messages anyway when things are configured properly.
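
To illustrate the bypass described in the comment above, here is a toy sketch. The type, field, and method names are invented for illustration and are not the server's API; it only models the observation that a wildcard publish permission skips the response-counting path:

```go
package sketch

import "time"

// replyEntry stands in for the tracked reply state: when it was recorded and
// how many responses have been counted against it.
type replyEntry struct {
	added   time.Time
	answers int
}

type responder struct {
	pubAllowsAll bool // models a wide-open permission such as `pub: ">"`
	replies      map[string]*replyEntry
}

// onReply models the observation above: with a wildcard publish permission the
// response is delivered through the ordinary pub-allowed path, the counter is
// never incremented, and the entry can only leave the map by aging out.
func (r *responder) onReply(subject string) {
	e, ok := r.replies[subject]
	if !ok {
		return
	}
	if r.pubAllowsAll {
		return // counter untouched; entry waits for the time-based prune
	}
	e.answers++ // counted replies become prunable once the allowed max is reached
}
```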

Member
@MauriceVanVeen MauriceVanVeen left a comment

LGTM

@jack7803m, the PR's title mentions it's WIP, but is this ready/good to merge?

@neilalexander, would you maybe also want to review, given your comment here? #6058 (comment).
Think this PR would at least relieve some pressure, by not calling client.pruneReplyPerms() every single time just because the map is large enough.

@jack7803m jack7803m changed the title from "[WIP][FIXED] Fix request/reply performance when using allow_responses perms" to "[FIXED] Fix request/reply performance when using allow_responses perms" on Dec 10, 2024
@jack7803m
Contributor Author

Forgot to change the title - this should be good to merge!

@derekcollison
Member

Let's have @neilalexander take a look as well real quick, but then we can get this merged.

@neilalexander neilalexander self-requested a review December 11, 2024 09:37
Member
@neilalexander neilalexander left a comment

I think we probably want to address the non-expiring replies long-term but for now I'm OK with this to improve performance, LGTM.

@derekcollison derekcollison merged commit 88ab06b into nats-io:main Dec 11, 2024
1 check passed
neilalexander added a commit that referenced this pull request Dec 13, 2024
Includes the following:

- #6226
- #6232
- #6235
- #6064
- #6244
- #6246
- #6247
- #6248
- #6250

Signed-off-by: Neil Twigg <neil@nats.io>

Successfully merging this pull request may close these issues.

Severe request/reply performance hit when using allow_responses map [v2.10.20]