Bug: Graceful shutdown issues #2968

Closed
burdiyan opened this issue Sep 19, 2024 · 12 comments

Comments

@burdiyan
Contributor

burdiyan commented Sep 19, 2024

We recently started facing issues with graceful shutdown in our app. After receiving a termination signal, the app hangs and never exits until it is forcefully shut down.

After spending some time debugging, I've found out that this place in libp2p never returns:

https://github.com/libp2p/go-libp2p/blob/v0.36.3/config/host.go#L28

To clarify, we are using libp2p with AutoRelay, HolePunching, DHT, and other things. The node needs to run for a while before this problem occurs. I suspect that it could be AutoRelay that's causing this, because the problem starts occurring after AutoRelay starts doing periodic relay finding.

So, closableRoutedHost.Close() gets called, but the underlying fx.App's Stop method never returns.

@burdiyan
Contributor Author

It's not easy for me to provide a clean reproduction for this, but you could clone this repo: https://github.com/seed-hypermedia/seed and run go run ./backend/cmd/seed-daemon. Leave it running for a while (until the periodic AutoRelay logs appear), then press ctrl+c: the logs show that shutdown started, but it gets stuck.

After some very tedious manual debugging, I figured out that it gets stuck in the place I shared previously.

@sukunrt
Member

sukunrt commented Sep 19, 2024

Can you check if setting the environment variable GODEBUG="asynctimerchan=1" fixes the issue? It's probably because of golang/go#69312

Alternatively, you can change your go version in your go.mod to go1.22.
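
When testing the environment-variable workaround, it can help to confirm at startup that the process actually sees it. The following is a minimal sketch, assuming a hypothetical helper named godebugValue that parses a GODEBUG-style "k1=v1,k2=v2" string. It only inspects the environment variable and does not detect settings applied by any other means.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// godebugValue returns the value of key in a GODEBUG-style string
// ("k1=v1,k2=v2"). If key appears more than once, this helper returns
// the last occurrence.
func godebugValue(godebug, key string) (string, bool) {
	val, found := "", false
	for _, kv := range strings.Split(godebug, ",") {
		k, v, ok := strings.Cut(kv, "=")
		if ok && k == key {
			val, found = v, true
		}
	}
	return val, found
}

func main() {
	v, ok := godebugValue(os.Getenv("GODEBUG"), "asynctimerchan")
	if ok && v == "1" {
		fmt.Println("asynctimerchan=1 is set in the environment")
	} else {
		fmt.Println("asynctimerchan=1 is not set in the environment")
	}
}
```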

@vyzo
Contributor

vyzo commented Sep 19, 2024

I think I found a solution for the timer problem (I'll make a PR for pubsub as well):

// Stop the timer; if it had already fired, drain the channel without blocking.
if !timer.Stop() {
  select {
  case <-timer.C:
  default:
  }
}

@burdiyan
Contributor Author

@sukunrt Oooh, I see. Unfortunately I can't use Go 1.22 at this point, because I'm already using iterators in some places :)

I think the solution @vyzo proposes could work. I remember doing something similar in my own code at some point.

@marten-seemann
Contributor

@vyzo I'd advise against making any changes to production code. This was a Go bug and is going to get fixed in Go 1.23.2. Just use the GODEBUG setting @sukunrt mentioned for now.

@burdiyan
Contributor Author

@sukunrt can you point me to the exact timer that could be causing the shutdown issues?

@burdiyan
Contributor Author

Confirming that running with GODEBUG="asynctimerchan=1" fixes the problem for me.

@sukunrt
Member

sukunrt commented Sep 19, 2024

@vyzo that solution is racy for versions <= go1.22.

if !timer.Stop() {
  select {
  case <-timer.C:
  default:
  }
}

When timer.Stop returns false, it doesn't mean the value has been pushed to the channel. It only means that Stop didn't stop the timer from executing; the value may already be available in the channel, or it may be pushed soon.

@vyzo
Contributor

vyzo commented Sep 19, 2024

ok, fair enough; let's wait for the upstream fix then.

@sukunrt sukunrt closed this as completed Sep 19, 2024
@sukunrt sukunrt reopened this Sep 19, 2024
@sukunrt
Member

sukunrt commented Sep 19, 2024

@sukunrt can you point me to the exact timer that could be causing the shutdown issues?

One is in quic-go: see quic-go/quic-go#4659
One is in autonat: https://github.com/libp2p/go-libp2p/blob/master/p2p/host/autonat/autonat.go#L221

I'm sure there are some others in go-libp2p and the dependencies.

I'm keeping this issue open. I'll add some text in the next patch release regarding this, and close the issue.

@vyzo
Contributor

vyzo commented Sep 19, 2024

there is one in pubsub too

@sukunrt
Member

sukunrt commented Sep 26, 2024

fixed by v0.36.4


4 participants