Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

A global overseer shutdown handle #6979

Closed
s0me0ne-unkn0wn opened this issue Mar 29, 2023 · 6 comments · Fixed by #6981
Closed

A global overseer shutdown handle #6979

s0me0ne-unkn0wn opened this issue Mar 29, 2023 · 6 comments · Fixed by #6981

Comments

@s0me0ne-unkn0wn
Copy link
Contributor

In #6861, a hotfix was implemented to shut down the node from an external process if things are going terribly wrong to prevent raising unnecessary disputes. The current implementation sends SIGKILL to its parent node process, forcing it to terminate.

A much better solution would be a node-wide handle allowing the node to shut down gracefully in an emergency, for example, a SIGHUP or SIGUSR1 handler that sends a shutdown signal to the overseer. To implement it, we need to figure out how/if it is possible to signal the overseer shutdown from a system signal handler.

@bkchr
Copy link
Member

bkchr commented Mar 30, 2023

We already have signal handlers for SIGTERM and SIGINT which lead to a graceful shutdown: https://github.com/paritytech/substrate/blob/53f5bef318c563f3aa27b55e8c46a82e8e3d961f/client/cli/src/signals.rs#L41-L53

This should be fixed immediately as otherwise we risk data loss.

@mrcnski
Copy link
Contributor

mrcnski commented Mar 30, 2023

We should handle every termination signal that can be handled, like SIGQUIT (I didn't know about this one). See https://www.gnu.org/software/libc/manual/html_node/Termination-Signals.html. But SIGKILL can't be handled.

Edit: actually I'm not sure now after reading "Certain kinds of cleanups are best omitted in handling SIGQUIT."

This should be fixed immediately as otherwise we risk data loss.

Out of curiosity, what kind of data loss can it lead to? I would think critical things should be done in an ACID way.

@s0me0ne-unkn0wn
Copy link
Contributor Author

We already have signal handlers for SIGTERM and SIGINT which lead to a graceful shutdown

Wow. So just changing SIGKILL to SIGTERM would be enough? Cool.

Of course, I grepped code for SIGTERM in the first place, but not for SignalKind::terminate 😅

We should handle every termination signal that can be handled, like SIGQUIT

Well, we probably should, but did you ever see the SIGQUIT handler executed in your life? 😁
SIGTERM is sent on the system shutdown by init scripts (or systemd in modern systems), and SIGHUP is sent by Ctrl-C from the console, those are common use-cases, and it's usually enough to handle them. Handling every termination signal would involve much more rarely used signals like SIGUSR2 etc.

@mrcnski
Copy link
Contributor

mrcnski commented Mar 30, 2023

Yeah, I'm not sure that SIGQUIT should be handled. 🤷‍♂️ And also not sure if we need to handle any more than what's mentioned in the "Termination Signals" manual, like SIGUSR2, which seems like a convention only. https://www.gnu.org/software/libc/manual/html_node/Miscellaneous-Signals.html

SIGHUP is sent by Ctrl-C from the console, those are common use-cases

So we should handle this one for sure, looks like we don't right now.

Of course, I grepped code for SIGTERM in the first place, but not for SignalKind::terminate 😅

We could have a code comment there that mentions the signals, for clarity and greppability.

@vstakhov
Copy link
Contributor

and SIGHUP is sent by Ctrl-C

Nit: it is SIGINT not SIGHUP.

@s0me0ne-unkn0wn
Copy link
Contributor Author

So we should handle this one for sure, looks like we don't right now.

Yeah, mixed up with SIGINT, which is handled.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants