Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More optimizations for calling into WebAssembly #2759

Merged
merged 4 commits into from
Mar 24, 2021

Conversation

alexcrichton
Copy link
Member

This continues #2757 a bit more by optimizing calls into WebAssembly. The first commit should affect all platforms, but is only a minor speedup. The second commit is a macOS-specific improvement but is a very large improvement for macOS. I believe after this all platforms should be again performing roughly on-par with one another.

This commit is an extension of bytecodealliance#2757 where the goal is to optimize entry
into WebAssembly. Currently wasmtime has two stack-based cleanups when
entering wasm, one for the externref activation table and another for
stack limits getting reset. This commit fuses these two cleanups
together into one and moves some code around which enables less captures
for fewer closures and such to speed up calls in to wasm a bit more.
Overall this drops the execution time from 88ns to 80ns locally for me.

This also updates the atomic orderings when updating the stack limit
from `SeqCst` to `Relaxed`. While `SeqCst` is a reasonable starting
point the usage here should be safe to use `Relaxed` since we're not
using the atomics to actually protect any memory, it's simply receiving
signals from other threads.
The macOS implementation of traps recently changed to using mach ports
for handlers instead of signal handlers. This means that a previously
relied upon invariant, each thread fixes its own trap, was broken. The
macOS implementation worked around this by maintaining a global map from
thread id to thread local information, however, to solve the problem.

This global map is quite slow though. It involves taking a lock and
updating a hash map on all calls into WebAssembly. In my local testing
this accounts for >70% of the overhead of calling into WebAssembly on
macOS. Naturally it'd be great to remove this!

This commit fixes this issue and removes the global lock/map that is
updated on all calls into WebAssembly. The fix is to maintain a global
map of wasm modules and their trap addresses in the `wasmtime` crate.
Doing so is relatively simple since we're already tracking this
information at the `Store` level.

Once we've got a global map then the macOS implementation can use this
from a foreign thread and everything works out.

Locally this brings the overhead, on macOS specifically, of calling into
wasm from 80ns to ~20ns.
@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Mar 23, 2021
@github-actions
Copy link

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice wins! I'm not very familiar with the first commit's code, but it looks sensible.

});

debug_assert!(!jmp_buf.is_null());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a check in the unix implementation that the jmp_buf could be set to 1 (meaning a custom signal handler has run and the trap handler should return early). Is it not necessary anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah I realized that it was no longer necessary since we don't run the custom trap handler on macOS, it's only run on Unix and Windows now.

crates/wasmtime/src/frame_info.rs Outdated Show resolved Hide resolved
crates/wasmtime/src/frame_info.rs Outdated Show resolved Hide resolved
crates/runtime/src/externref.rs Outdated Show resolved Hide resolved
crates/runtime/src/externref.rs Outdated Show resolved Hide resolved
@alexcrichton alexcrichton merged commit d4b54ee into bytecodealliance:main Mar 24, 2021
@alexcrichton alexcrichton deleted the fast-call2 branch March 24, 2021 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants