
Make it possible to kill guest execution when running a host function. #192

@danbugs

Description


Currently, guest execution cannot be interrupted or cancelled while the guest is calling a host function. If that host function hangs, the call never returns and cannot be cancelled. The failure surfaces as this error variant:

```rust
HyperlightError::GuestExecutionHungOnHostFunctionCall() => {}
```
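To make the failure mode concrete, here is a minimal sketch using only the standard library. `call_with_timeout` and `CallError` are hypothetical names, not Hyperlight's API. The caller can stop waiting once a deadline passes, but the worker thread running the hung function keeps going, because `std::thread` offers no way to forcibly stop a thread:

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical error type; not Hyperlight's `HyperlightError`.
#[derive(Debug, PartialEq)]
enum CallError {
    /// The deadline passed while the host function was still running.
    HungOnHostFunctionCall,
    /// The host function panicked before producing a result.
    HostFunctionPanicked,
}

/// Runs `host_fn` on a worker thread and waits at most `timeout` for its result.
/// On timeout the caller gets an error, but the worker thread is NOT stopped:
/// it stays blocked inside `host_fn` until that function returns on its own.
fn call_with_timeout<F>(host_fn: F, timeout: Duration) -> Result<i32, CallError>
where
    F: FnOnce() -> i32 + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller has already given up, the send fails; ignore that.
        let _ = tx.send(host_fn());
    });
    match rx.recv_timeout(timeout) {
        Ok(val) => Ok(val),
        Err(RecvTimeoutError::Timeout) => Err(CallError::HungOnHostFunctionCall),
        // The sender was dropped without sending, i.e. `host_fn` panicked.
        Err(RecvTimeoutError::Disconnected) => Err(CallError::HostFunctionPanicked),
    }
}
```

Calling it with a function that sleeps for an hour and a 50 ms timeout returns `Err(CallError::HungOnHostFunctionCall)` after about 50 ms, while the sleeping thread lives on until the process exits.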

One possible solution

When the `seccomp` feature is enabled, each host function call already runs on its own worker thread:

```rust
let join_handle = std::thread::Builder::new()
    .name(format!("Host Function Worker Thread for: {:?}", name_cloned))
    .spawn(move || {
        // We have a `catch_unwind` here because, if a disallowed syscall is issued,
        // we handle it by panicking. This is to avoid returning execution to the
        // offending host function, for two reasons: (1) if a host function is issuing
        // disallowed syscalls, it could be unsafe to return to, and (2) returning
        // execution after trapping the disallowed syscall can lead to UB (e.g., try
        // running a host function that attempts to sleep without `SYS_clock_nanosleep`:
        // you'll block the syscall but panic in the aftermath).
        match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
            call_func(&host_funcs_cloned, &name_cloned, args_cloned)
        })) {
            Ok(val) => val,
            Err(err) => {
                if let Some(crate::HyperlightError::DisallowedSyscall) =
                    err.downcast_ref::<crate::HyperlightError>()
                {
                    return Err(crate::HyperlightError::DisallowedSyscall);
                }
                crate::log_then_return!("Host function {} panicked", name_cloned);
            }
        }
    })?;
```
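The worker-thread pattern above can be reproduced in isolation. In this sketch, `HostError` and `run_host_fn` are hypothetical stand-ins for `crate::HyperlightError` and the wrapper around `call_func`. A seccomp handler is assumed to panic with a typed payload (via `std::panic::panic_any`), which is what lets the worker tell a disallowed syscall apart from an ordinary panic:

```rust
use std::panic;
use std::thread;

/// Hypothetical error type standing in for `crate::HyperlightError`.
#[derive(Debug, Clone, PartialEq)]
enum HostError {
    DisallowedSyscall,
    Panicked(String),
}

/// Runs `host_fn` on a named worker thread and converts panics into errors.
fn run_host_fn<F>(name: &str, host_fn: F) -> Result<u64, HostError>
where
    F: FnOnce() -> u64 + Send + 'static,
{
    let fn_name = name.to_string();
    let handle = thread::Builder::new()
        .name(format!("Host Function Worker Thread for: {name}"))
        .spawn(move || {
            match panic::catch_unwind(panic::AssertUnwindSafe(host_fn)) {
                Ok(val) => Ok(val),
                Err(payload) => {
                    // A typed payload (raised with `panic_any`) marks a trapped syscall.
                    if let Some(HostError::DisallowedSyscall) =
                        payload.downcast_ref::<HostError>()
                    {
                        return Err(HostError::DisallowedSyscall);
                    }
                    // Any other panic payload is reported as a generic failure.
                    Err(HostError::Panicked(fn_name))
                }
            }
        })
        .map_err(|e| HostError::Panicked(e.to_string()))?;
    // The closure catches its own panics, so `join` should not normally fail.
    handle
        .join()
        .unwrap_or_else(|_| Err(HostError::Panicked(name.to_string())))
}
```

Note that the default panic hook still prints each caught panic to stderr; `catch_unwind` only stops the unwinding from crossing the thread boundary.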

These worker threads could be used to cancel a hung host function call the same way guest execution is cancelled today, by repeatedly signalling the target thread:

```rust
let thread_id = self.execution_variables.get_thread_id()?;
if thread_id == u64::MAX {
    log_then_return!("Failed to get thread id to signal thread");
}
let mut count: u128 = 0;
// We need to send the signal multiple times in case the thread was between checking
// whether it should be cancelled and entering the run loop.
// We cannot do this forever (if the thread is calling a host function that never
// returns, we would sit here forever), so use `max_wait_for_cancellation` to limit
// the number of iterations.
let number_of_iterations =
    self.configuration.max_wait_for_cancellation.as_micros() / 500;
while !self.execution_variables.run_cancelled.load() {
    count += 1;
    if count > number_of_iterations {
        break;
    }
    info!("Sending signal to thread {} iteration: {}", thread_id, count);
    let ret = unsafe { pthread_kill(thread_id, SIGRTMIN()) };
    // pthread_kill returns 0 on success or a positive error number on failure.
    // We may get ESRCH if we try to signal a thread that has already exited.
    if ret != 0 && ret != ESRCH {
        log_then_return!("error {} calling pthread_kill", ret);
    }
    std::thread::sleep(Duration::from_micros(500));
}
```
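The bounded retry loop can be isolated from Hyperlight's internals for testing. In this sketch, `signal_until_cancelled` is a hypothetical helper, the `send_signal` closure stands in for `pthread_kill(thread_id, SIGRTMIN())` (which needs the `libc` crate), and an `AtomicBool` stands in for `run_cancelled`:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// Calls `send_signal` every `interval` until `cancelled` becomes true or
/// `max_wait / interval` signals have been sent. Returns whether cancellation
/// was observed.
fn signal_until_cancelled<F: FnMut()>(
    cancelled: &AtomicBool,
    max_wait: Duration,
    interval: Duration,
    mut send_signal: F,
) -> bool {
    // Same bound as `max_wait_for_cancellation.as_micros() / 500` above;
    // `.max(1)` guards against division by zero for a zero interval.
    let max_iterations = max_wait.as_micros() / interval.as_micros().max(1);
    let mut sent: u128 = 0;
    while !cancelled.load(Ordering::Acquire) && sent < max_iterations {
        // Re-send in case the target was between its cancellation check
        // and entering the run loop when the previous signal arrived.
        send_signal();
        sent += 1;
        thread::sleep(interval);
    }
    cancelled.load(Ordering::Acquire)
}
```

Returning the final flag value lets the caller decide whether to report a timeout instead of a successful cancellation.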

However, this would require always wrapping host function calls in an extra thread, and spawning a thread on every call may carry a noticeable performance cost.
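One way the per-call cost might be reduced is to spawn a single long-lived worker per sandbox and send host calls to it over a channel, so thread creation is paid once rather than on every call. The sketch below assumes a hypothetical `HostWorker` type and an `i64` return type; it only addresses spawn cost and does not, by itself, make a hung call cancellable:

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// A boxed host call producing an `i64` (hypothetical signature).
type Job = Box<dyn FnOnce() -> i64 + Send>;

/// One long-lived worker thread that runs host calls received over a channel.
struct HostWorker {
    jobs: Option<Sender<(Job, Sender<i64>)>>,
    handle: Option<JoinHandle<()>>,
}

impl HostWorker {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<(Job, Sender<i64>)>();
        let handle = thread::Builder::new()
            .name("host-function-worker".into())
            .spawn(move || {
                // Runs each job in order until every job sender is dropped.
                for (job, reply) in rx {
                    let _ = reply.send(job());
                }
            })
            .expect("failed to spawn worker thread");
        HostWorker { jobs: Some(tx), handle: Some(handle) }
    }

    /// Sends `job` to the worker and blocks until its result comes back.
    fn call(&self, job: Job) -> Option<i64> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.jobs.as_ref()?.send((job, reply_tx)).ok()?;
        reply_rx.recv().ok()
    }
}

impl Drop for HostWorker {
    fn drop(&mut self) {
        // Dropping the sender ends the worker's loop; then wait for it to exit.
        drop(self.jobs.take());
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}
```

A trade-off of this design: a hung call also blocks every later call queued on the same worker, so it would still need the signal-based cancellation described above.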

Metadata

Assignees: No one assigned
Labels: lifecycle/confirmed (Bug is verified or proposal seems reasonable)
Projects (Status): No status
Relationships: None yet
Development: No branches or pull requests
