Don't re-capture backtraces when propagating traps through host frames #5049
Conversation
This fixes some accidentally quadratic code where we would re-capture a Wasm stack trace (takes `O(n)` time) every time we propagated a trap through a host frame back to Wasm (which can happen `O(n)` times). And `O(n) * O(n) = O(n^2)`, of course. Whoops. After this commit, trapping with a call stack that is `n` frames deep of Wasm-to-host-to-Wasm calls captures just a single backtrace and is therefore a proper `O(n)` time operation, as intended.

Now we explicitly track whether we need to capture a Wasm backtrace when raising a trap. This unfortunately isn't as straightforward as one might hope, because of the split between `wasmtime::Trap` and `wasmtime_runtime::Trap`. We need to decide whether to capture a Wasm backtrace inside `wasmtime_runtime`, but to make that decision we need to reflect on the `anyhow::Error` and see whether it is a `wasmtime::Trap` that already has a backtrace. That can't be done the straightforward way because it would introduce a cyclic dependency between the `wasmtime` and `wasmtime-runtime` crates. We can't merge those two `Trap` types -- at least not without effectively merging the whole `wasmtime` and `wasmtime-runtime` crates together, which would be a good idea in a perfect world but would be a *ton* of ocean boiling from where we currently are -- because `wasmtime::Trap` does symbolication of stack traces, which relies on module registration data that resides inside the `wasmtime` crate and therefore can't be moved into `wasmtime-runtime`.

We resolve this problem by adding a boolean to `wasmtime_runtime::raise_user_trap` that controls whether we should capture a Wasm backtrace, and then determining its value at each of that function's call sites, which are in `wasmtime` and can therefore do the reflection to check whether the user trap already has a backtrace. Phew!

Fixes bytecodealliance#5037
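The resulting shape can be sketched as follows. This is a hypothetical, heavily simplified mock (the `Trap`, `raise_user_trap`, and `propagate` definitions here are illustrative stand-ins, not the actual wasmtime code): the runtime layer takes an explicit flag, and the API layer computes that flag by reflecting on the error, which sidesteps the cyclic crate dependency.

```rust
use std::any::Any;

// Stand-in for `wasmtime::Trap` (simplified): it may or may not
// already carry a captured backtrace.
struct Trap {
    backtrace: Option<Vec<String>>, // captured frames, if any
}

// Stand-in for the runtime layer (`wasmtime_runtime`): it cannot name
// the API-level `Trap` type, so the caller passes an explicit flag.
// Returns how many O(n) stack captures happened, for illustration.
fn raise_user_trap(_error: Box<dyn Any>, needs_backtrace: bool) -> usize {
    if needs_backtrace {
        1 // would capture the Wasm backtrace here: an O(n) stack walk
    } else {
        0 // the trap already has a backtrace; skip the re-capture
    }
}

// Stand-in for the API layer (`wasmtime`): it *can* reflect on the
// error to see whether a backtrace was already captured.
fn propagate(error: Box<dyn Any>) -> usize {
    let needs_backtrace = match error.downcast_ref::<Trap>() {
        Some(trap) => trap.backtrace.is_none(),
        None => true, // not a Trap: a backtrace must be captured
    };
    raise_user_trap(error, needs_backtrace)
}

fn main() {
    // A trap that already has a backtrace: no re-capture.
    let with_bt = Box::new(Trap { backtrace: Some(vec!["f".into()]) });
    assert_eq!(propagate(with_bt), 0);

    // A fresh trap: exactly one capture.
    let fresh = Box::new(Trap { backtrace: None });
    assert_eq!(propagate(fresh), 1);
    println!("ok");
}
```

With this split, no matter how many host frames a trap passes through, only the first raise pays the `O(n)` capture cost; every subsequent propagation sees an existing backtrace and skips it.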
Difference in benchmark results after the third commit:
Nice!
@Stebalien would you be up for testing this against the reproducer from #5037 to see if it resolves the issue? The quadratic behavior should be gone, I think, but there may be lurking inefficiencies still to handle in the linear work of capturing a backtrace, so it'd be good to confirm one way or another.
Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the
Thanks for looking into this! Unfortunately, no difference. I tested it here: filecoin-project/ref-fvm@4242e72 (looks like I got the patches correct). I'll put a reproducer together by next week (have a few deadlines this week).
Unfortunately we can't do `debug_assert_eq!(needs_backtrace, trap.inner.backtrace.get().is_some());` because `needs_backtrace` doesn't consider whether Wasm backtraces have been disabled via config.
@Stebalien to be clear, enabling backtraces will always have some overhead that is proportional to how many frames are on the stack (capturing the stack is inherently linear in its depth).

Are you seeing identical performance with this patch applied vs. without? If so, how is the trap being triggered? Is it an

(A reproducer would still be appreciated, once you have the cycles.)
Within the margin of error, yes. Basically, this test is 300-400x slower with backtraces enabled.
The host is returning a
In other words:
And so on... Two steps backward, one step forward, two steps backward, all the way to the top.
I'll do a bit of profiling myself as well.
To clarify, does this mean that for a stack of depth N, there are N traps? The "quadratic slowdown with backtraces" behavior is then completely expected from first principles, I think: you're taking N traps, and they cost 1, 2, 3, ..., N frames/levels deep. I don't think there's anything we could do to make this not be quadratic, short of some sort of magical lazily-expanding constant-time stack capture (which we can't do because we don't keep the stack around after the trap fires).
There's one trap at each level. If these stack traces cover the entire Rust stack, not just the wasm module handling the trap, that would make sense. In that case, would the wasmtime team consider keeping the ability to disable backtraces?
I think it would make sense to add an optional max number of frames captured in a backtrace, which can be set to zero to completely disable backtraces. Then we can limit the cost of a backtrace from O(frames) to some user-chosen acceptable constant overhead. Going to go ahead and merge this PR and file a new issue for that config option. I'll cc you on it, @Stebalien.
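The proposed knob could look something like the following. This is a hypothetical sketch of the idea, not the actual wasmtime API: capping the number of frames captured per backtrace bounds the cost to a constant, and a cap of zero disables capture entirely.

```rust
// Hypothetical config knob: cap the frames captured per backtrace.
// A cap of 0 disables backtrace capture entirely; any cap bounds the
// per-trap cost to O(max_frames) instead of O(stack depth).
fn capture_backtrace(stack: &[&str], max_frames: usize) -> Vec<String> {
    stack.iter().take(max_frames).map(|f| f.to_string()).collect()
}

fn main() {
    let stack = ["wasm[0]", "host", "wasm[1]", "host", "wasm[2]"];

    // Bounded capture: at most 2 frames, regardless of stack depth.
    assert_eq!(capture_backtrace(&stack, 2).len(), 2);

    // Cap of zero: backtraces effectively disabled.
    assert!(capture_backtrace(&stack, 0).is_empty());
    println!("ok");
}
```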