-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Attempt to graceful shutdown in case of panics #8999
Conversation
Do we have any Drop handlers that are essential for maintaining consistent state? Leaking memory on shutdown is fine, since OS will take care of reclaiming resources. What about DB flush? |
A basic search of
I think what might matter most are database waits. If we don't shutdown cleanly and wait for locks then it's possible to break db consistency (like all the "block/transaction not found" panics). |
There are also generic Drop impls:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good except for the nits.
parity/main.rs
Outdated
/// Status used to exit or restart the program. | ||
struct ExitStatus { | ||
/// Whether the program panicked. | ||
pub panicking: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: these fields don't have to be public.
parity/main.rs
Outdated
spec_name_override: None | ||
}), Condvar::new())); | ||
|
||
let exiting = Arc::new(AtomicBool::new(false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should document here what you said in the comment of the PR.
It's not obvious why this exists and is not part of ExitStatus
.
I'm not totally sure that trying to exit gracefully may not lead to deadlocks. |
@tomaka Not sure about that either. In shutdown sequence we have |
…rp_sync_on_light_client * 'master' of https://github.com/paritytech/parity: Attempt to graceful shutdown in case of panics (openethereum#8999) simplify kvdb error types (openethereum#8924) Add option for user to set max size limit for RPC requests (openethereum#9010) bump ntp to 0.5.0 (openethereum#9009) Removed duplicate dependency (openethereum#9021) Minimal effective gas price in the queue (openethereum#8934)
Panics in our code indicates logic errors. However, it still happens. And if it does, the current behavior causes all Drop handlers not being called and possibly lead to resources not being freed or shutdown. This can lead to a "slippery slop", where if panic happens to an user, it can lead to other panics in the future (with a totally unrelated reason that won't help us to debug the issue). This possibly resulted in the "transaction/block not found in the database; qed" and other panic errors we occasionally see in some issue reports.
This PR makes it so that it still aborts on panic initially when the program starts. After it has started, at the first occasion when we can, switch to graceful shutdown.
std::panic::set_hook
is thread-safe as it internally uses Mutex.should_exit
boolean is added toExitStatus
, because we may have a situation where a panic happens afterpanic_hook::set_with
, but before we can wait onexit.1
.exiting
atomic boolean is used because double panic can happen, and as we lockExitStatus
after the main thread is notified, it cannot be locked again.