Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return from throttler goroutine if context is cancelled to prevent goroutine leaks #8489

Merged

Conversation

rohit-nayak-ps
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps commented Jul 17, 2021

Description

goroutine profile: total 5614
5423 @ 0x43b3c5 0x4059fa 0x405755 0x1346ceb 0x4728e1
#	0x1346cea	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*vstreamer).parseEvents.func2+0xaa	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:279

There are reports of these leaking goroutines causing OOMs. I have not been able to reproduce it. I think this can only happen if vstreamer is failing to parse events resulting in the context being cancelled, with some events still present in the throttledEvents channel.
In the normal course of events, including errors that I can think of, we should not see this since the binlog streamer will close its channel causing the goroutine to return. So presumably there is an issue with this vreplication workflow causing these errors.

I have updated the goroutine to look for context cancellations so these goroutines don't leak, hoping that this will fix this issue. I would be happier if we could have reproduced it to ensure the fix works. For now we can try it in the failing workflow.

Signed-off-by: Rohit Nayak rohit@planetscale.com

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

…routine leaks

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes make sense. I will add another check for ctx.Done() right before continue, so that we don't spend an infinite loop waiting for throttler in case of lag, when the context is cancelled.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes look to be safe, and the code is better than before.

@shlomi-noach
Copy link
Contributor

I've added an extra check for ctx.Done()

@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review July 17, 2021 12:23
@rohit-nayak-ps rohit-nayak-ps merged commit 08aa289 into vitessio:main Jul 17, 2021
@rohit-nayak-ps rohit-nayak-ps deleted the rn-vstreamer-throttler-goroutine-leak branch July 17, 2021 12:24
rohit-nayak-ps added a commit that referenced this pull request Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants