Don't re-capture backtraces when propagating traps through host frames #5049
Conversation
This fixes some accidentally quadratic code where we would re-capture a Wasm stack trace (takes `O(n)` time) every time we propagated a trap through a host frame back to Wasm (which can happen `O(n)` times). And `O(n) * O(n) = O(n^2)`, of course. Whoops. After this commit, trapping with a call stack that is `n` frames deep of Wasm-to-host-to-Wasm calls captures just a single backtrace and is therefore a proper `O(n)` time operation, as intended.

Now we explicitly track whether we need to capture a Wasm backtrace when raising a trap. This unfortunately isn't as straightforward as one might hope, because of the split between `wasmtime::Trap` and `wasmtime_runtime::Trap`. We need to decide whether to capture a Wasm backtrace inside `wasmtime_runtime`, but to make that decision we need to reflect on the `anyhow::Error` and see whether it is a `wasmtime::Trap` that already has a backtrace. That can't be done the straightforward way because it would introduce a cyclic dependency between the `wasmtime` and `wasmtime-runtime` crates. We can't merge those two `Trap` types -- at least not without effectively merging the whole `wasmtime` and `wasmtime-runtime` crates together, which would be a good idea in a perfect world but would be a *ton* of ocean boiling from where we currently are -- because `wasmtime::Trap` does symbolication of stack traces, which relies on module registration data that resides inside the `wasmtime` crate and therefore can't be moved into `wasmtime-runtime`.

We resolve this problem by adding a boolean to `wasmtime_runtime::raise_user_trap` that controls whether we should capture a Wasm backtrace, and then determining its value at each of that function's call sites, which are in `wasmtime` and can therefore do the reflection to check whether the user trap already has a backtrace. Phew!

Fixes bytecodealliance#5037
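The resulting shape can be sketched as follows. This is a hypothetical, heavily simplified mock (the `Trap`, `raise_user_trap`, and `propagate` definitions here are illustrative stand-ins, not the actual wasmtime code): the runtime layer takes an explicit flag, and the API layer computes that flag by reflecting on the error, which sidesteps the cyclic crate dependency.

```rust
use std::any::Any;

// Stand-in for `wasmtime::Trap` (simplified): it may or may not
// already carry a captured backtrace.
struct Trap {
    backtrace: Option<Vec<String>>, // captured frames, if any
}

// Stand-in for the runtime layer (`wasmtime_runtime`): it cannot name
// the API-level `Trap` type, so the caller passes an explicit flag.
// Returns how many O(n) stack captures happened, for illustration.
fn raise_user_trap(_error: Box<dyn Any>, needs_backtrace: bool) -> usize {
    if needs_backtrace {
        1 // would capture the Wasm backtrace here: an O(n) stack walk
    } else {
        0 // the trap already has a backtrace; skip the re-capture
    }
}

// Stand-in for the API layer (`wasmtime`): it *can* reflect on the
// error to see whether a backtrace was already captured.
fn propagate(error: Box<dyn Any>) -> usize {
    let needs_backtrace = match error.downcast_ref::<Trap>() {
        Some(trap) => trap.backtrace.is_none(),
        None => true, // not a Trap: a backtrace must be captured
    };
    raise_user_trap(error, needs_backtrace)
}

fn main() {
    // A trap that already has a backtrace: no re-capture.
    let with_bt = Box::new(Trap { backtrace: Some(vec!["f".into()]) });
    assert_eq!(propagate(with_bt), 0);

    // A fresh trap: exactly one capture.
    let fresh = Box::new(Trap { backtrace: None });
    assert_eq!(propagate(fresh), 1);
    println!("ok");
}
```

With this split, no matter how many host frames a trap passes through, only the first raise pays the `O(n)` capture cost; every subsequent propagation sees an existing backtrace and skips it.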
Difference in benchmark results after the third commit:
Nice!
@Stebalien would you be up for testing this against the reproducer from #5037 to see if it resolves the issue? The quadratic behavior should be gone, I think, but there may be lurking inefficiencies still to handle in the linear work of capturing a backtrace, so it'd be good to confirm one way or another.
Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the
Thanks for looking into this! Unfortunately, no difference. I tested it here: filecoin-project/ref-fvm@4242e72 (looks like I got the patches correct). I'll put a reproducer together by next week (have a few deadlines this week).
Unfortunately we can't do `debug_assert_eq!(needs_backtrace, trap.inner.backtrace.get().is_some());` because `needs_backtrace` doesn't consider whether Wasm backtraces have been disabled via config.
@Stebalien to be clear, enabling backtraces will always have some overhead that is proportional to how many frames are on the stack (capturing the stack is inherently linear in its depth).

Are you seeing identical performance with this patch applied vs. without? If so, how is the trap being triggered? Is it an

(A reproducer would still be appreciated, once you have the cycles.)
Within the margin of error, yes. Basically, this test is 300-400x slower with backtraces enabled.
The host is returning a
In other words:
And so on... Two steps backward, one step forward, two steps backward, all the way to the top.
I'll do a bit of profiling myself as well.
To clarify, does this mean that for a stack of depth N, there are N traps? The "quadratic slowdown with backtraces" behavior is then completely expected from first principles, I think: you're taking N traps, and they cost 1, 2, 3, ..., N frames/levels deep. I don't think there's anything we could do to make this not be quadratic, short of some sort of magical lazily-expanding constant-time stack capture (which we can't do because we don't keep the stack around after the trap fires).
There's one trap at each level. If these stack traces cover the entire Rust stack, not just the wasm module handling the trap, that would make sense. In that case, would the wasmtime team consider keeping the ability to disable backtraces?
I think it would make sense to add an optional max number of frames captured in a backtrace, which can be set to zero to completely disable backtraces. Then we can limit the cost of a backtrace from O(frames) to some user-chosen acceptable constant overhead. Going to go ahead and merge this PR and file a new issue for that config option. I'll cc you on it, @Stebalien.
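The proposed knob could look something like the following. This is a hypothetical sketch of the idea, not the actual wasmtime API: capping the number of frames captured per backtrace bounds the cost to a constant, and a cap of zero disables capture entirely.

```rust
// Hypothetical config knob: cap the frames captured per backtrace.
// A cap of 0 disables backtrace capture entirely; any cap bounds the
// per-trap cost to O(max_frames) instead of O(stack depth).
fn capture_backtrace(stack: &[&str], max_frames: usize) -> Vec<String> {
    stack.iter().take(max_frames).map(|f| f.to_string()).collect()
}

fn main() {
    let stack = ["wasm[0]", "host", "wasm[1]", "host", "wasm[2]"];

    // Bounded capture: at most 2 frames, regardless of stack depth.
    assert_eq!(capture_backtrace(&stack, 2).len(), 2);

    // Cap of zero: backtraces effectively disabled.
    assert!(capture_backtrace(&stack, 0).is_empty());
    println!("ok");
}
```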