Beacon node exits uncontrolled (too early) during graceful shutdown #5317

nflaig · 2023-03-29T16:07:08Z

Problem

Lodestar BN does not wait for all jobs to be finished during graceful shutdown.
This is problematic as some operations are not executed such as writing BLSToExecutionChanges to disk (see discord discussion)

Root cause

Lodestar uses threads package which installs SIGTERM signal listeners that exit the main process once all workers are terminated, see terminateWorkersAndMaster.

Related

g11tech · 2023-03-29T17:22:55Z

yes i confirm this to be issue as pointed by @nflaig : https://github.com/andywer/threads.js/blob/0d1a882f09273cb8482902ae2601fe4ffed6fbab/src/master/implementation.node.ts#L142

as soon as i comment out the lines in the local nodemodules, node close follow all close steps (but obviously workers keep going on still), else it seems to be processing this event on the first true async await call (await network.close)

So we might need to export the terminateWorkerAndMaster and call it explicity and remove these

process.on("SIGINT", () => terminateWorkersAndMaster())
  process.on("SIGTERM", () => terminateWorkersAndMaster())

``

nflaig · 2023-04-04T17:29:40Z

As discussed in Apr 4 standup we want to push a fix for this before the Shapella hardfork to address the BLSToExecutionChanges issue and other possible issues related to not shutting down properly.

For the quickfix we will have to

release a new version of @chainsafe/threads @wemeetagain
explicitly call process.exit after node.close succeeds (code: 0) or fails (code: 1) @nflaig
retest that BLSToExecutionChanges are correctly persistet, best if @g11tech can confirm this once patch is implemented

In #5330 a cleaner solution will be implemented to make sure all active handlers are actually terminated and we don't need to rely on process.exit. I documented how to work with the branch in case anyone is bored and wants to go on an active handles hunt

philknows · 2023-04-15T05:26:06Z

Due to not being able to make the timeline for the Shapella hard fork (wasn't super urgent since other clients were able to handle the load as well), we can wait for #5330 and fix this optimally rather than implementing a workaround.

nflaig added the prio-high Resolve issues as soon as possible. label Mar 29, 2023

g11tech mentioned this issue Mar 30, 2023

fix: remove process signals handling ChainSafe/threads.js#1

Merged

philknows added this to the v1.8.0 milestone Mar 31, 2023

nflaig mentioned this issue Apr 1, 2023

fix(beacon-node): cleanly exit process on graceful shutdown #5330

Merged

7 tasks

nflaig modified the milestones: v1.8.0, v1.9.0 May 6, 2023

wemeetagain closed this as completed in #5330 May 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Beacon node exits uncontrolled (too early) during graceful shutdown #5317

Beacon node exits uncontrolled (too early) during graceful shutdown #5317

nflaig commented Mar 29, 2023 •

edited

Loading

g11tech commented Mar 29, 2023

nflaig commented Apr 4, 2023 •

edited

Loading

philknows commented Apr 15, 2023

Beacon node exits uncontrolled (too early) during graceful shutdown #5317

Beacon node exits uncontrolled (too early) during graceful shutdown #5317

Comments

nflaig commented Mar 29, 2023 • edited Loading

g11tech commented Mar 29, 2023

nflaig commented Apr 4, 2023 • edited Loading

philknows commented Apr 15, 2023

nflaig commented Mar 29, 2023 •

edited

Loading

nflaig commented Apr 4, 2023 •

edited

Loading