Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Use SIGTERM instead of SIGKILL on PVF worker version mismatch #6981

Merged
merged 1 commit into from
Mar 30, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions node/core/pvf/src/worker_common.rs
Original file line number Diff line number Diff line change
Expand Up @@ -416,7 +416,7 @@ pub async fn framed_recv(r: &mut (impl AsyncRead + Unpin)) -> io::Result<Vec<u8>
Ok(buf)
}

/// In case of node and worker version mismatch (as a result of in-place upgrade), send `SIGKILL`
/// In case of node and worker version mismatch (as a result of in-place upgrade), send `SIGTERM`
/// to the node to tear it down and prevent it from raising disputes on valid candidates. Node
/// restart should be handled by the node owner. As node exits, unix sockets opened to workers
/// get closed by the OS and other workers receive error on socket read and also exit. Preparation
Expand All @@ -428,7 +428,7 @@ pub(crate) fn kill_parent_node_in_emergency() {
// some corner cases, which is checked. `kill()` never fails.
let ppid = libc::getppid();
if ppid > 1 {
libc::kill(ppid, libc::SIGKILL);
libc::kill(ppid, libc::SIGTERM);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is safer to try SIGTERM first and wait for the node to exit. If for some weird reason or bug it doesn't exit we could force terminate with SIGKILL .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the node continues to run, nothing bad should happen? The node will also take up to 60 seconds if there is a future stalled.

}
}
}