-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for async
call hooks
#3876
Add support for async
call hooks
#3876
Conversation
Subscribe to Label Actioncc @peterhuene
This issue or pull request has been labeled: "wasmtime:api"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me, thanks! I mostly just commented below how lots of fallible error handling I think can be asserting instead since it's indicative of a bug in Wasmtime if we hit the None
case for the AsyncCx
in most locations (it's just the one on CallHook
that matters).
Additionally though can you be sure to add some tests for this, also specifically for a case such as where a hook is registered and the async invocation is interrupted?
@@ -1251,11 +1310,15 @@ impl StoreOpaque { | |||
|
|||
#[cfg(feature = "async")] | |||
#[inline] | |||
pub fn async_cx(&self) -> AsyncCx { | |||
pub fn async_cx(&self) -> Option<AsyncCx> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some documentation to this function as to why Option
is the return value? (notably pointing out as well that this panics if async_support
is not enabled, whereas the signature might naively appear to return None
if async support is not enabled)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason not to just make the check be for both conditions (i.e., we have async_support
and we're not being torn down), rather than panicking if async_support
is not enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I've added the basic docs, with the panic sill allowed, in 09d62a4)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is that the decision about whether to do something async or not is a higher-level concern than this function, so something shouldn't unconditionally call this and then act on the return value, instead it should have already made some prior decision to call this function and then it acts on the return value as necessary. (plus having two different None
states would mean we'd probably have to return some sort of custom enum which is a bit of a pain)
pub trait CallHookHandler<T>: Send { | ||
/// A callback to run when wasmtime is about to enter a host call, or when about to | ||
/// exit the hostcall. | ||
async fn handle_call_event(&self, t: &mut T, ch: CallHook) -> Result<(), crate::Trap>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One worry here: async_trait
means this will be using a Box
under the hood, which means we incur an allocation for every hook invocation, i.e. two times per hostcall.
A different factoring might allow us to allocate once up front (for the hook itself), basically by making this a poll
-style function rather than an async fn
.
How worried are we about these costs? AIUI we care a lot about host/guest boundary crossing being cheap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's actually a good point! I haven't ever done benchmarking myself about the costs with call hooks enabled, but I suspect that even as-is it's already at least an order of magnitude slower than "raw" syscalls. That being said though even when wiggle is layered on top I suspect it's even slower as we haven't done much work to optimize wiggle bits (I know there's a Mutex
somewhere right now frobbed on all entrances and exits, but the mutex is always un-contended). In that sense it might be good to benchmark this locally in something like Viceroy perhaps and see if the overhead is meaningful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Seems like it'd be interesting data to have around -- comparing no call hook, sync call hook, and async call hook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To assist in poking around with performance bits I wrote up #3883 which should be a good spot we can insert an async call hook and see the performance. My expectation is that given how slow most async things already are an extra allocation here isn't going to really be all that expensive. For comparison calling into wasm itself is on the order of microseconds instead of nanoseconds because of how our stack management currently works. Calling into the host already allocates a Box
'd future per call so allocating 2 more probably isn't the end of the world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the benchmark added there the host-calling-wasm case has negligible overhead with this design of async hooks because the cost of that transition is on the order of a microsecond where the allocations here are more like nanoseconds. I think setting up the stack and unconditionally doing a context switch is probably the expensive part.
For wasm-calling-host this imposes a ~40ns overhead. If the host function is defined via Func::wrap
("typed" and "synchronous") then the sync hook vs async hook is 7ns vs 46ns. For functiosn defined via Func::wrapN_async
("typed" and async) then the sync hook vs async hook is 29ns vs 69ns.
Now whether those specific numbers make sense in a broader context I'm not sure, but I would suspect that most of the cost is indeed the allocation of the returned future from this trait.
The goal of this new benchmark, `call`, is to help us measure the overhead of both calling into WebAssembly from the host as well as calling the host from WebAssembly. There's lots of various ways to measure this so this benchmark is a bit large but should hopefully be pretty thorough. It's expected that this benchmark will rarely be run in its entirety but rather only a subset of the benchmarks will be run at any one given time. Some metrics measured here are: * Typed vs Untyped vs Unchecked - testing the cost of both calling wasm with these various methods as well as having wasm call the host where the host function is defined with these various methods. * With and without `call_hook` - helps to measure the overhead of the `Store::call_hook` API. * Synchronous and Asynchronous - measures the overhead of calling into WebAssembly asynchronously (with and without the pooling allocator) in addition to defining host APIs in various methods when wasm is called asynchronously. Currently all the numbers are as expected, notably: * Host calling WebAssembly is ~25ns of overhead * WebAssembly calling the host is ~3ns of overhead * "Unchecked" is a bit slower than "typed", and "Untyped" is slower than unchecked. * Asynchronous wasm calling a synchronous host function has ~3ns of overhead (nothing more than usual). * Asynchronous calls are much slower, on the order of 2-3us due to `madvise`. Lots of other fiddly bits that can be measured here, but this will hopefully help establish a benchmark through which we can measure in the future in addition to measuring changes such as bytecodealliance#3876
The goal of this new benchmark, `call`, is to help us measure the overhead of both calling into WebAssembly from the host as well as calling the host from WebAssembly. There's lots of various ways to measure this so this benchmark is a bit large but should hopefully be pretty thorough. It's expected that this benchmark will rarely be run in its entirety but rather only a subset of the benchmarks will be run at any one given time. Some metrics measured here are: * Typed vs Untyped vs Unchecked - testing the cost of both calling wasm with these various methods as well as having wasm call the host where the host function is defined with these various methods. * With and without `call_hook` - helps to measure the overhead of the `Store::call_hook` API. * Synchronous and Asynchronous - measures the overhead of calling into WebAssembly asynchronously (with and without the pooling allocator) in addition to defining host APIs in various methods when wasm is called asynchronously. Currently all the numbers are as expected, notably: * Host calling WebAssembly is ~25ns of overhead * WebAssembly calling the host is ~3ns of overhead * "Unchecked" is a bit slower than "typed", and "Untyped" is slower than unchecked. * Asynchronous wasm calling a synchronous host function has ~3ns of overhead (nothing more than usual). * Asynchronous calls are much slower, on the order of 2-3us due to `madvise`. Lots of other fiddly bits that can be measured here, but this will hopefully help establish a benchmark through which we can measure in the future in addition to measuring changes such as #3876
…e on a dying fiber. This situation should never occur in the existing code base, but can be triggered if support for running outside async code in a call hook.
…dying. This should never happen in the existing code base, but is a nice forward-looking guard. The current implementations simply lift the trap that would eventually be produced by such an operation into a `Trap` (or similar) at the invocation of `async_cx()`.
This retains the ability to do non-async hooks. Hooks end up being implemented as an async trait with a handler call, to get around some issues passing around async closures. This change requires some of the prior changes to handle picking up blocked tasks during fiber shutdown, to avoid some panics during timeouts and other such events.
…em into expect()/unwrap(). The justification for this revert is that (a) these events shouldn't happen, and (b) they wouldn't be catchable by wasm anyways.
…heck. This also moves the checks inside of their respective Async variants, meaning that if you're using an async-enabled version of wasmtime but using the synchronous versions of the callbacks, you won't pay any penalty for validating the async context.
In the prior version, we only checked that the box had not been cleared, but had not ensured that there was an actual context for us to use. This updates the check to validate both, returning None if the inner context is missing. This allows us to skip a validation check inside `block_on`, since all callers will have run through the `async_cx` check prior to arrival.
207f2e6
to
48e7f6e
Compare
Apologies for the rebase 😞, but this should be good to go. The one thing I haven't figured out is a test case that triggers |
Should help exercise that the check for `None` is properly handled in a few more locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! I pushed up what I was thinking for a test which looks like it panics if the one case of .ok_or()
switches to a unwrap()
which is mainly what I wanted to make sure we tested.
The present runtime supports the ability to add aspect-like hooks to wasm entrance/exit via the
call_hook
method. In some cases, it can be useful to have this hook do asynchronous operations; to acquire a lock, for example.This patch series provides an alternate way for users to provide a call hook that mirrors the resource limiter callbacks. To get around some trickiness with Rust types, closures, and Futures,
call_hook_async
takes an object on which a simple trait is implemented. This hook is then executed using the existingblock_on
mechanism.Beyond just adding this capability, this patch makes two other changes to the code base to avoid an error condition in which one of these tasks becomes scheduled right as an async fiber is being dropped. The first changes the
async_cx
method so that it returnsOption<AsyncCx>
instead of justAsyncCx
; this can be an early detection mechanism for the fiber being dropped. The more key one is a late check inblock_on
, which turns a former-panic into being a catchable an error condition.Many thanks to @alexcrichton, who figured out the "resuming a task on a dying thread" issue.