-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cancel external triggers if one of multiple fails on start #2248
Cancel external triggers if one of multiple fails on start #2248
Conversation
f1ac4a3
to
6f1bcd5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooof. It certainly isn't pretty. I think it's fine for now. I wonder if there's some sort of supervisor tree architecture we can adopt that makes some of this book keeping easier, but that's really hard when you have to jump the process boundary... Hmm....
|
||
#[cfg(windows)] | ||
fn get_pids(_trigger_processes: &[tokio::process::Child]) -> Vec<usize> { | ||
vec![] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused why this would be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chuckle It's not "correct". This PR tracks the original Ctrl+C propagation code, which was cfg-ed out on Windows (because the signals stuff doesn't have an equivalent); so although I need to have a function here (and could indeed get OS-level process handles) I can't do anything meaningful with them, at least not without a lot more research than "preserve what the old code did".
I can look at implementing Ctrl+C propagation on Windows for sure. But I'd be inclined to look at a slightly different design for that, with processes starting suspended and being placed in a job object. That's more than I want to bite off in this PR though...!
#[cfg(windows)] | ||
fn set_kill_on_ctrl_c(_child: &tokio::process::Child) {} | ||
|
||
#[cfg(not(windows))] | ||
fn set_kill_on_ctrl_c(child: &tokio::process::Child) { | ||
if let Some(pid) = child.id().map(|id| nix::unistd::Pid::from_raw(id as i32)) { | ||
_ = ctrlc::set_handler(move || { | ||
_ = nix::sys::signal::kill(pid, nix::sys::signal::SIGTERM); | ||
}); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option 🤷
#[cfg(windows)] | |
fn set_kill_on_ctrl_c(_child: &tokio::process::Child) {} | |
#[cfg(not(windows))] | |
fn set_kill_on_ctrl_c(child: &tokio::process::Child) { | |
if let Some(pid) = child.id().map(|id| nix::unistd::Pid::from_raw(id as i32)) { | |
_ = ctrlc::set_handler(move || { | |
_ = nix::sys::signal::kill(pid, nix::sys::signal::SIGTERM); | |
}); | |
} | |
} | |
fn set_kill_on_ctrl_c(child: &tokio::process::Child) { | |
if cfg!(windows) { | |
return; | |
} | |
if let Some(pid) = child.id().map(|id| nix::unistd::Pid::from_raw(id as i32)) { | |
_ = ctrlc::set_handler(move || { | |
_ = nix::sys::signal::kill(pid, nix::sys::signal::SIGTERM); | |
}); | |
} | |
} |
// this because it kills the Spin process but that doesn't cascade to the | ||
// child plugin trigger process.) So add a hopefully insignificant delay | ||
// between them to reduce the chance of this happening. | ||
const MULTI_TRIGGER_START_OFFSET: tokio::time::Duration = tokio::time::Duration::from_millis(20); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tokio::time::Duration
is an alias for std::time::Duration
.
I'm...not really sure why they did that...
// Re-export for convenience
🤔
if is_multi { | ||
// Allow time for the child `spin` process to launch the trigger | ||
// and hook up its cancellation. Mitigates the race condition | ||
// noted on the constant (see there for more info). | ||
tokio::time::sleep(MULTI_TRIGGER_START_OFFSET).await; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like this should be necessary both here and above?
src/commands/up.rs
Outdated
#[cfg(windows)] | ||
fn get_pids(_trigger_processes: &[tokio::process::Child]) -> Vec<usize> { | ||
vec![] | ||
} | ||
|
||
#[cfg(not(windows))] | ||
fn get_pids(trigger_processes: &[tokio::process::Child]) -> Vec<nix::unistd::Pid> { | ||
use itertools::Itertools; | ||
// https://github.com/nix-rust/nix/issues/656 | ||
let pids = trigger_processes | ||
trigger_processes | ||
.iter() | ||
.flat_map(|child| child.id().map(|id| nix::unistd::Pid::from_raw(id as i32))) | ||
.collect_vec(); | ||
ctrlc::set_handler(move || { | ||
for pid in &pids { | ||
if let Err(err) = nix::sys::signal::kill(*pid, nix::sys::signal::SIGTERM) { | ||
tracing::warn!("Failed to kill trigger handler process: {:?}", err) | ||
} | ||
.collect_vec() | ||
} | ||
|
||
#[cfg(windows)] | ||
fn kill_them_all_god_will_know_his_own(_pids: &[usize]) {} | ||
|
||
#[cfg(not(windows))] | ||
fn kill_them_all_god_will_know_his_own(pids: &[nix::unistd::Pid]) { | ||
// https://github.com/nix-rust/nix/issues/656 | ||
for pid in pids { | ||
// println!("killing {pid}"); | ||
if let Err(err) = nix::sys::signal::kill(*pid, nix::sys::signal::SIGTERM) { | ||
tracing::warn!("Failed to kill trigger handler process: {:?}", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be simplified a bit by passing the u32
s from Child::id
into kill_...
and doing the cast to Pid
s in there.
Signed-off-by: itowlson <ivan.towlson@fermyon.com>
6f1bcd5
to
9ad3591
Compare
Fixes #2245.
There may still be a nasty if one of the child triggers fails to launch, or if a trigger unexpectedly exits without reporting an error. But this does fix, albeit uglily, the "typical" case of a trigger failing (and specifically a non-multi-aware trigger crashing during startup).
I'm concerned about using timers here, because they're obviously quite fragile; but I can't think of another way to get at "all
spin trigger-foo
process have had the chance to start their childtrigger-foo
processes and hook them up to Ctrl+C." Open to ideas!