Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd: handle repeated calls to Exit() #1331

Merged
merged 1 commit into from
Apr 4, 2021

Conversation

jehiah
Copy link
Member

@jehiah jehiah commented Mar 23, 2021

This extracts handling from #1305 to handle repeated calls to nsqd.Exit(). This regression was introduced in #1319

panic: close of closed channel"

Fixes #1326

@jehiah jehiah added the bug label Mar 23, 2021
@jehiah jehiah self-assigned this Mar 23, 2021
@jehiah jehiah requested review from mreiferson and ploxiln March 23, 2021 14:09
Copy link
Member

@ploxiln ploxiln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work fine. I'm not sure if a better refactoring of the whole go-svc signal-handling thing is in order, but that would be a lot more work to review, and this is rather small. Seems like a good fix for now. 👍

Copy link
Member

@mreiferson mreiferson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed up an alternative approach here (requires landing something like the following in go-svc judwhite/go-svc@main...mreiferson:signal-handler).

Edit: I realize this is an interface change to go-svc but, implementation aside, it feels like the right thing to expose to the application (the ability to act on signals without hacks).

@ploxiln
Copy link
Member

ploxiln commented Mar 28, 2021

I had to remind myself what this all was for ... it's --sigterm-mode=drain in #1305

What's awkward is the n.ctxCancel() at the end of NSQD.Exit(). With mreiferson's proposal, if the signal Handle() does not return ErrStop, then later the drain code calls NSQD.Exit(), that cancel-context will finally get some use and cause go-svc to finish its waiting, and it will call NSQD.Exit() again.

With the code in current master, the "bug" is arguably this (which mreiferson removes):

// This is a noop after first call
func (n *NSQD) TermSignal() {
	n.Exit()
}

because it seems like program.Stop() must be the only caller of NSQD.Exit() (and it even has protection against it being called twice, but it doesn't work for other locations in nsqd):

func (p *program) Stop() error {
	p.once.Do(func() {
		p.nsqd.Exit()
	})
	return nil
}

So you probably figured this out already, I'm just reminding myself ... and also pointing out that mreiferson's solution moves the problem from the exit-signal case to the drain-signal case, which doesn't exist yet, but will have to change it yet again. Maybe the n.ctxCancel() shouldn't be at the end of NSQD.Exit()? I dunno.

Anyway I do prefer the Handle() method for customizing signal behavior.

@mreiferson
Copy link
Member

because it seems like program.Stop() must be the only caller of NSQD.Exit() (and it even has protection against it being called twice, but it doesn't work for other locations in nsqd):

Yes, that is the current expectation. I'm fine with a change as part of #1305 to ensure that Exit() only executes once, but if we need to do that let's use sync.Once rather than an atomic variable (unless we need a variable to represent the exiting state).

@jehiah
Copy link
Member Author

jehiah commented Mar 30, 2021

Anyway I do prefer the Handle() method for customizing signal behavior.

I agree that comes out cleanly; @mreiferson do you want to work on upstreaming that then we can switch to that pattern when it's available upstream?

but if we need to do that let's use sync.Once rather than an atomic variable (unless we need a variable to represent the exiting state).

If Exit can only be called once this feels like something that feels like something to be implemented in nsqd/... not apps/nsqd and the atomic.CompareAndSwapInt32 is a pattern we use widely in that package (it's the preferred pattern when the provider implements a duplicate call restriction instead of the caller).

We can also fix this interim bug by switching the term signal handler from p.nsqd.TermSignal()to p.Stop() - it just defeated the point of making apps/nsqd/... code use the forward looking stub for term signal handling. If the interim TermSignal->Exit needs to be ok, i think the compare and swap is the right way to handle duplicate calls even if the "isExiting" state isn't needed.

@mreiferson
Copy link
Member

I agree that comes out cleanly; @mreiferson do you want to work on upstreaming that then we can switch to that pattern when it's available upstream?

Will do.

If Exit can only be called once this feels like something that feels like something to be implemented in nsqd/... not apps/nsqd

Agreed, and I'm fine with however way we want to implement it (atomic var or sync.Once).

We can also fix this interim bug by switching the term signal handler from p.nsqd.TermSignal()to p.Stop() - it just defeated the point of making apps/nsqd/... code use the forward looking stub for term signal handling. If the interim TermSignal->Exit needs to be ok, i think the compare and swap is the right way to handle duplicate calls even if the "isExiting" state isn't needed.

Somewhat related to the previous point, I don't think a method called TermSignal makes sense in nsqd/.... Signal handling is the responsibility of apps/nsqd and in #1305 we can call nsqd.StartDraining directly when it receives a SIGTERM.

@jehiah
Copy link
Member Author

jehiah commented Apr 4, 2021

and in #1305 we can call nsqd.StartDraining directly when it receives a SIGTERM.

@mreiferson 🤕 Yes, that's a much better name.

@jehiah jehiah force-pushed the exit_closed_channel_1331 branch from 954e32b to 9f16b60 Compare April 4, 2021 02:06
@jehiah
Copy link
Member Author

jehiah commented Apr 4, 2021

PTAL @mreiferson I've updated to remove the TermSignal stub for now (will add w/ better naming when it provides different functionality) and while this uses the sync.Once in apps/nsqd it also implements a atomic.CompareAndSwapInt32 w/ the expectation that we'll likely need that soon (and as it's an exit path we don't need there isn't a performance concern w/ having both sides guard a double call)

Copy link
Member

@mreiferson mreiferson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nsqd: panic when nsqd exits
3 participants