Cancel external triggers if one of multiple fails on start #2248

itowlson · 2024-01-26T01:26:17Z

There may still be a nasty edge case if one of the child triggers fails to launch, or if a trigger unexpectedly exits without reporting an error. But this does fix, albeit uglily, the "typical" case of a trigger failing (and specifically a non-multi-aware trigger crashing during startup).

I'm concerned about using timers here, because they're obviously quite fragile; but I can't think of another way to get at "all spin trigger-foo processes have had the chance to start their child trigger-foo processes and hook them up to Ctrl+C." Open to ideas!

rylev

ooof. It certainly isn't pretty. I think it's fine for now. I wonder if there's some sort of supervisor tree architecture we can adopt that makes some of this bookkeeping easier, but that's really hard when you have to jump the process boundary... Hmm....

rylev · 2024-01-29T13:48:15Z

src/commands/up.rs

+
+#[cfg(windows)]
+fn get_pids(_trigger_processes: &[tokio::process::Child]) -> Vec<usize> {
+    vec![]


I'm confused why this would be correct.

itowlson

*chuckle* It's not "correct". This PR tracks the original Ctrl+C propagation code, which was cfg-ed out on Windows (because the signals stuff doesn't have an equivalent); so although I need to have a function here (and could indeed get OS-level process handles), I can't do anything meaningful with them, at least not without a lot more research than "preserve what the old code did".

I can look at implementing Ctrl+C propagation on Windows for sure. But I'd be inclined to look at a slightly different design for that, with processes starting suspended and being placed in a job object. That's more than I want to bite off in this PR though...!

lann · 2024-01-29T13:55:58Z

src/commands/external.rs

+#[cfg(windows)]
+fn set_kill_on_ctrl_c(_child: &tokio::process::Child) {}
+
+#[cfg(not(windows))]
+fn set_kill_on_ctrl_c(child: &tokio::process::Child) {
+    if let Some(pid) = child.id().map(|id| nix::unistd::Pid::from_raw(id as i32)) {
+        _ = ctrlc::set_handler(move || {
+            _ = nix::sys::signal::kill(pid, nix::sys::signal::SIGTERM);
+        });
+    }
+}


Another option 🤷

Suggested change

-#[cfg(windows)]
-fn set_kill_on_ctrl_c(_child: &tokio::process::Child) {}
-
-#[cfg(not(windows))]
-fn set_kill_on_ctrl_c(child: &tokio::process::Child) {
-    if let Some(pid) = child.id().map(|id| nix::unistd::Pid::from_raw(id as i32)) {
-        _ = ctrlc::set_handler(move || {
-            _ = nix::sys::signal::kill(pid, nix::sys::signal::SIGTERM);
-        });
-    }
-}
+fn set_kill_on_ctrl_c(child: &tokio::process::Child) {
+    if cfg!(windows) {
+        return;
+    }
+    if let Some(pid) = child.id().map(|id| nix::unistd::Pid::from_raw(id as i32)) {
+        _ = ctrlc::set_handler(move || {
+            _ = nix::sys::signal::kill(pid, nix::sys::signal::SIGTERM);
+        });
+    }
+}
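
One thing that may be worth weighing when choosing between the two forms: `#[cfg(...)]` removes an item from the build entirely, while `cfg!(...)` expands to a plain `true` or `false`, so both branches are still compiled and type-checked on every target. If nix is only a dependency on Unix targets (an assumption about the manifest, not something shown in this thread), the `cfg!` form would still have to resolve the `nix::` paths on Windows. A minimal std-only sketch of the difference, with hypothetical function names:

```rust
/// Attribute form: only one of these two definitions exists on any target,
/// so each body may freely use platform-only items.
#[cfg(windows)]
fn platform_via_attribute() -> &'static str {
    "windows"
}

#[cfg(not(windows))]
fn platform_via_attribute() -> &'static str {
    "not windows"
}

/// Macro form: `cfg!(windows)` is just a boolean constant, so the early
/// return and the code after it are both compiled on every target.
fn platform_via_macro() -> &'static str {
    if cfg!(windows) {
        return "windows";
    }
    "not windows"
}
```

Both functions return the same string on a given platform; the difference is only in what the compiler has to see.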

lann · 2024-01-29T13:59:51Z

src/commands/up.rs

+// this because it kills the Spin process but that doesn't cascade to the
+// child plugin trigger process.) So add a hopefully insignificant delay
+// between them to reduce the chance of this happening.
+const MULTI_TRIGGER_START_OFFSET: tokio::time::Duration = tokio::time::Duration::from_millis(20);


tokio::time::Duration is an alias for std::time::Duration.

I'm...not really sure why they did that...

https://github.com/tokio-rs/tokio/blob/131e7b4e49c8849298ba54b4e0c99f4b81d869e3/tokio/src/time/mod.rs#L107-L109

// Re-export for convenience

🤔
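
Since the two names refer to the same type, the constant could presumably be spelled with the std path instead. A minimal sketch, assuming nothing else about the surrounding file:

```rust
use std::time::Duration;

// Same 20 ms value as the constant in the diff, written with the std type.
// Because tokio re-exports this exact type, a value like this can be passed
// to `tokio::time::sleep` unchanged.
const MULTI_TRIGGER_START_OFFSET: Duration = Duration::from_millis(20);
```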

lann · 2024-01-29T14:41:24Z

src/commands/up.rs

+            if is_multi {
+                // Allow time for the child `spin` process to launch the trigger
+                // and hook up its cancellation. Mitigates the race condition
+                // noted on the constant (see there for more info).
+                tokio::time::sleep(MULTI_TRIGGER_START_OFFSET).await;
+            }


It doesn't seem like this should be necessary both here and above?

lann · 2024-01-29T14:45:20Z

src/commands/up.rs

+#[cfg(windows)]
+fn get_pids(_trigger_processes: &[tokio::process::Child]) -> Vec<usize> {
+    vec![]
+}
+
+#[cfg(not(windows))]
+fn get_pids(trigger_processes: &[tokio::process::Child]) -> Vec<nix::unistd::Pid> {
+    use itertools::Itertools;
    // https://github.com/nix-rust/nix/issues/656
-    let pids = trigger_processes
+    trigger_processes
        .iter()
        .flat_map(|child| child.id().map(|id| nix::unistd::Pid::from_raw(id as i32)))
-        .collect_vec();
-    ctrlc::set_handler(move || {
-        for pid in &pids {
-            if let Err(err) = nix::sys::signal::kill(*pid, nix::sys::signal::SIGTERM) {
-                tracing::warn!("Failed to kill trigger handler process: {:?}", err)
-            }
+        .collect_vec()
+}
+
+#[cfg(windows)]
+fn kill_them_all_god_will_know_his_own(_pids: &[usize]) {}
+
+#[cfg(not(windows))]
+fn kill_them_all_god_will_know_his_own(pids: &[nix::unistd::Pid]) {
+    // https://github.com/nix-rust/nix/issues/656
+    for pid in pids {
+        // println!("killing {pid}");
+        if let Err(err) = nix::sys::signal::kill(*pid, nix::sys::signal::SIGTERM) {
+            tracing::warn!("Failed to kill trigger handler process: {:?}", err)


I think this could be simplified a bit by passing the u32s from Child::id into kill_... and doing the cast to Pids in there.
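
A rough sketch of what that shape might look like, not the PR's actual code. To keep it std-only, the signal call is injected as a closure: `send_term` is a hypothetical stand-in for `nix::sys::signal::kill(Pid::from_raw(raw), SIGTERM)`, and `kill_all` is a hypothetical name. The point is that `get_pids` has one signature on every platform and the cast to the signed PID type happens in a single place:

```rust
/// Collects raw process IDs, skipping children that no longer have one.
/// `ids` stands in for `trigger_processes.iter().map(|child| child.id())`,
/// where tokio's `Child::id` returns `Option<u32>`.
fn get_pids(ids: impl IntoIterator<Item = Option<u32>>) -> Vec<u32> {
    ids.into_iter().flatten().collect()
}

/// Asks every process in `pids` to terminate via `send_term` (hypothetical
/// stand-in for the nix `kill` call). Returns the PIDs that could not be
/// signalled so the caller can log them.
fn kill_all(pids: &[u32], mut send_term: impl FnMut(i32) -> Result<(), String>) -> Vec<u32> {
    let mut failed = Vec::new();
    for &pid in pids {
        // The u32 -> i32 conversion lives here, checked rather than `as`.
        let outcome = i32::try_from(pid)
            .map_err(|e| e.to_string())
            .and_then(|raw| send_term(raw));
        if outcome.is_err() {
            failed.push(pid);
        }
    }
    failed
}
```

With this split, only the body of the kill function would need a `#[cfg]`, and the Windows variant could ignore the slice as the current stub does.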

Signed-off-by: itowlson <ivan.towlson@fermyon.com>

itowlson force-pushed the terminate-child-triggers-on-trigger-exit branch from f1ac4a3 to 6f1bcd5 Compare January 26, 2024 01:58

itowlson marked this pull request as ready for review January 28, 2024 23:07

rylev approved these changes Jan 29, 2024

lann reviewed Jan 29, 2024

Cancel external triggers if one of multiple fails on start

9ad3591

Signed-off-by: itowlson <ivan.towlson@fermyon.com>

itowlson force-pushed the terminate-child-triggers-on-trigger-exit branch from 6f1bcd5 to 9ad3591 Compare January 29, 2024 19:44

itowlson merged commit 04aa3bb into spinframework:main Jan 29, 2024
11 checks passed
