Description
Currently, it is not possible to interrupt or cancel execution while the guest is calling a host function. If the host function hangs, the call never returns and cannot be cancelled. This gets surfaced like so:
    HyperlightError::GuestExecutionHungOnHostFunctionCall() => {}
One possible solution
When running with the seccomp feature on, host functions are wrapped in their own thread like so:
hyperlight/src/hyperlight_host/src/sandbox/host_funcs.rs
Lines 208 to 228 in b9c67fb
    let join_handle = std::thread::Builder::new()
        .name(format!("Host Function Worker Thread for: {:?}", name_cloned))
        .spawn(move || {
            // We have a `catch_unwind` here because, if a disallowed syscall is issued,
            // we handle it by panicking. This is to avoid returning execution to the
            // offending host function—for two reasons: (1) if a host function is issuing
            // disallowed syscalls, it could be unsafe to return to, and (2) returning
            // execution after trapping the disallowed syscall can lead to UB (e.g., try
            // running a host function that attempts to sleep without `SYS_clock_nanosleep`,
            // you'll block the syscall but panic in the aftermath).
            match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| call_func(&host_funcs_cloned, &name_cloned, args_cloned))) {
                Ok(val) => val,
                Err(err) => {
                    if let Some(crate::HyperlightError::DisallowedSyscall) = err.downcast_ref::<crate::HyperlightError>() {
                        return Err(crate::HyperlightError::DisallowedSyscall)
                    }
                    crate::log_then_return!("Host function {} panicked", name_cloned);
                }
            }
        })?;
You could leverage these threads to cancel execution in the same way we cancel execution in the guest:
hyperlight/src/hyperlight_host/src/hypervisor/hypervisor_handler.rs
Lines 729 to 762 in b9c67fb
    let thread_id = self.execution_variables.get_thread_id()?;
    if thread_id == u64::MAX {
        log_then_return!("Failed to get thread id to signal thread");
    }
    let mut count: i32 = 0;
    // We need to send the signal multiple times in case the thread was between checking if it
    // should be cancelled and entering the run loop
    // We cannot do this forever (if the thread is calling a host function that never
    // returns we will sit here forever), so use the timeout_wait_to_cancel to limit the number
    // of iterations
    let number_of_iterations =
        self.configuration.max_wait_for_cancellation.as_micros() / 500;
    while !self.execution_variables.run_cancelled.load() {
        count += 1;
        if count > number_of_iterations.try_into().unwrap() {
            break;
        }
        info!(
            "Sending signal to thread {} iteration: {}",
            thread_id, count
        );
        let ret = unsafe { pthread_kill(thread_id, SIGRTMIN()) };
        // We may get ESRCH if we try to signal a thread that has already exited
        if ret < 0 && ret != ESRCH {
            log_then_return!("error {} calling pthread_kill", ret);
        }
        std::thread::sleep(Duration::from_micros(500));
    }
However, this would mean always wrapping host function calls in an extra thread, even when the seccomp feature is off, and the cost of spawning a thread per call might be too high in terms of performance.
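To make the worker-thread idea more concrete, here is a minimal sketch using only the standard library. It is not Hyperlight code: `call_with_timeout` and `HostCallError` are hypothetical names made up for illustration. It also uses a simpler technique than the signal-based cancellation quoted above: the caller stops waiting after a timeout, but the worker thread is only detached, not interrupted. Actually stopping a hung host function would still need something like the `pthread_kill` loop.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for this sketch; not Hyperlight's `HyperlightError`.
#[derive(Debug, PartialEq)]
pub enum HostCallError {
    /// The host function did not finish within the allowed time.
    TimedOut,
    /// The worker ended without sending a result (for example, it panicked).
    WorkerFailed,
    /// The worker thread could not be spawned.
    SpawnFailed,
}

/// Runs `func` on its own worker thread and waits at most `timeout` for the result.
pub fn call_with_timeout<T, F>(func: F, timeout: Duration) -> Result<T, HostCallError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();

    // Dropping the returned JoinHandle detaches the worker thread.
    thread::Builder::new()
        .name("host-function-worker".to_string())
        .spawn(move || {
            // Ignore send errors: the receiver is gone if the caller already timed out.
            let _ = tx.send(func());
        })
        .map_err(|_| HostCallError::SpawnFailed)?;

    match rx.recv_timeout(timeout) {
        Ok(value) => Ok(value),
        // The worker is still running; it is left detached here, not terminated.
        Err(mpsc::RecvTimeoutError::Timeout) => Err(HostCallError::TimedOut),
        // The sender was dropped without sending, so the worker panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(HostCallError::WorkerFailed),
    }
}
```

With this shape, a call such as `call_with_timeout(|| slow_host_function(), Duration::from_millis(100))` (where `slow_host_function` is a placeholder) returns `Err(HostCallError::TimedOut)` instead of blocking forever. The trade-off raised above still applies: every call pays for a thread spawn and a channel, so a real implementation might prefer a reusable worker thread.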