Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network worker not shutting down #5775

Open
nflaig opened this issue Jul 18, 2023 · 5 comments · Fixed by #5946
Open

Network worker not shutting down #5775

nflaig opened this issue Jul 18, 2023 · 5 comments · Fixed by #5946
Assignees
Labels
meta-bug Issues that identify a bug and require a fix. meta-investigate Issues found that require further investigation and may not have a specific resolution/fix prio-medium Resolve this some time soon (tm).

Comments

@nflaig
Copy link
Member

nflaig commented Jul 18, 2023

Describe the bug

The network worker thread is not shutting down in some cases.

Force closing (kill) does not help either because worker does receive signal.

Jul-17 17:08:54.000[]                 info: Synced - slot: 6092144 - head: 0xe00e…5e94 - exec-block: valid(9361504 0x8772…) - finalized: 0x7710…765c:190377 - peers: 51
^CJul-17 17:09:01.307[]                 info: Stopping gracefully, use Ctrl+C again to force process exit
Jul-17 17:09:06.176[]                 info: Synced - slot: 6092145 - head: (slot -1) 0xe00e…5e94 - exec-block: valid(9361504 0x8772…) - finalized: 0x7710…765c:190377 - peers: 50
^CJul-17 17:09:10.028[]                 info: Forcing process exit
^C^C^C^C^C./lodestar: line 7: 2357120 Killed                  node --trace-deprecation --max-old-space-size=4096 ./packages/cli/bin/lodestar.js "$@

It is pretty clear based on observed logs that the issue is the network worker.

"terminating network worker" is logged

this.modules.logger.debug("terminating network worker");

but "terminated network worker" is not

this.modules.logger.debug("terminated network worker");

I checked the log files, there is nothing that peaks out, likely just the same issue we see on the main thread with libp2p

Can likely be resolved by calling process.exit on worker explicitly or force closing it using another approach.

Expected behavior

Network worker should shut down cleanly and not hang the main process.

Steps to reproduce

Run Lodestar beacon node with --network.useWorker true flag.

Additional context

The problem seems to be with libp2p which in the end must take care of closing all connections / removing tcp listeners. There are several closed but also open issues regarding connections not being closed properly.

This comment libp2p/js-libp2p#436 (comment) summarizes open tasks but there was no progress in a while.

Operating system

Linux

Lodestar version or commit hash

ec81531

@nflaig
Copy link
Member Author

nflaig commented Oct 17, 2023

Reopening until we can confirm this is fixed by upgrading libp2p, see #5642 (comment).

@nflaig nflaig reopened this Oct 17, 2023
@philknows philknows added meta-investigate Issues found that require further investigation and may not have a specific resolution/fix prio-low This is nice to have. labels Nov 7, 2023
@nflaig nflaig added prio-medium Resolve this some time soon (tm). and removed prio-low This is nice to have. labels Mar 27, 2024
@nflaig
Copy link
Member Author

nflaig commented Mar 27, 2024

This is still an issue in the latest release as confirmed by seamonkey

@nflaig
Copy link
Member Author

nflaig commented Apr 15, 2024

Might be resolved in the next node release, see fix nodejs/node#51526 and another related issue tinylibs/tinypool#54

@wemeetagain
Copy link
Member

@nflaig I haven't heard of this happening lately, stale?

@nflaig
Copy link
Member Author

nflaig commented Oct 15, 2024

This is kinda related to #6053 as well, I haven't seen this myself in a while which is good because hanging work was a pain to deal with.

Should be observed more closely and can probably close it

we also wanna consider removing the timeout if we don't see it anymore

// In some cases, `libp2p.stop` never resolves, it is required
// to wrap the call with a timeout to allow for a timely shutdown
// See https://github.com/ChainSafe/lodestar/issues/6053
await withTimeout(async () => this.libp2p.stop(), 5000);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta-bug Issues that identify a bug and require a fix. meta-investigate Issues found that require further investigation and may not have a specific resolution/fix prio-medium Resolve this some time soon (tm).
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants