Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fully support multiple returns in Wasmtime #2806

Merged
merged 16 commits into from
Apr 7, 2021

Conversation

alexcrichton
Copy link
Member

For quite some time now Wasmtime has "supported" multiple return values,
but only in the mose bare bones ways. Up until recently you couldn't get
a typed version of functions with multiple return values, and never have
you been able to use Func::wrap with functions that return multiple
values. Even recently where Func::typed can call functions that return
multiple values it uses a double-indirection by calling a trampoline
which calls the real function.

The underlying reason for this lack of support is that cranelift's ABI
for returning multiple values is not possible to write in Rust. For
example if a wasm function returns two i32 values there is no Rust (or
C!) function you can write to correspond to that. This commit, however
fixes that.

This commit adds two new ABIs to Cranelift: WasmtimeSystemV and
WasmtimeFastcall. The intention is that these Wasmtime-specific ABIs
match their corresponding ABI (e.g. SystemV or WindowsFastcall) for
everything except how multiple values are returned. For multiple
return values we simply define our own version of the ABI which Wasmtime
implements, which is that for N return values the first is returned as
if the function only returned that and the latter N-1 return values are
returned via an out-ptr that's the last parameter to the function.

These custom ABIs provides the ability for Wasmtime to bind these in
Rust meaning that Func::wrap can now wrap functions that return
multiple values and Func::typed no longer uses trampolines when
calling functions that return multiple values. Although there's lots of
internal changes there's no actual changes in the API surface area of
Wasmtime, just a few more impls of more public traits which means that
more types are supported in more places!

Another change made with this PR is a consolidation of how the ABI of
each function in a wasm module is selected. The native SystemV ABI,
for example, is more efficient at returning multiple values than the
wasmtime version of the ABI (since more things are in more registers).
To continue to take advantage of this Wasmtime will now classify some
functions in a wasm module with the "fast" ABI. Only functions that are
not reachable externally from the module are classified with the fast
ABI (e.g. those not exported, used in tables, or used with ref.func).
This should enable purely internal functions of modules to have a faster
calling convention than those which might be exposed to Wasmtime itself.

Closes #1178

For quite some time now Wasmtime has "supported" multiple return values,
but only in the mose bare bones ways. Up until recently you couldn't get
a typed version of functions with multiple return values, and never have
you been able to use `Func::wrap` with functions that return multiple
values. Even recently where `Func::typed` can call functions that return
multiple values it uses a double-indirection by calling a trampoline
which calls the real function.

The underlying reason for this lack of support is that cranelift's ABI
for returning multiple values is not possible to write in Rust. For
example if a wasm function returns two `i32` values there is no Rust (or
C!) function you can write to correspond to that. This commit, however
fixes that.

This commit adds two new ABIs to Cranelift: `WasmtimeSystemV` and
`WasmtimeFastcall`. The intention is that these Wasmtime-specific ABIs
match their corresponding ABI (e.g. `SystemV` or `WindowsFastcall`) for
everything *except* how multiple values are returned. For multiple
return values we simply define our own version of the ABI which Wasmtime
implements, which is that for N return values the first is returned as
if the function only returned that and the latter N-1 return values are
returned via an out-ptr that's the last parameter to the function.

These custom ABIs provides the ability for Wasmtime to bind these in
Rust meaning that `Func::wrap` can now wrap functions that return
multiple values and `Func::typed` no longer uses trampolines when
calling functions that return multiple values. Although there's lots of
internal changes there's no actual changes in the API surface area of
Wasmtime, just a few more impls of more public traits which means that
more types are supported in more places!

Another change made with this PR is a consolidation of how the ABI of
each function in a wasm module is selected. The native `SystemV` ABI,
for example, is more efficient at returning multiple values than the
wasmtime version of the ABI (since more things are in more registers).
To continue to take advantage of this Wasmtime will now classify some
functions in a wasm module with the "fast" ABI. Only functions that are
not reachable externally from the module are classified with the fast
ABI (e.g. those not exported, used in tables, or used with `ref.func`).
This should enable purely internal functions of modules to have a faster
calling convention than those which might be exposed to Wasmtime itself.

Closes bytecodealliance#1178
@sunfishcode
Copy link
Member

If the wasm C ABI adopts multiple return values as a way to return small-ish structs, and if a hypothetical future Rust supports this when returning repr(C) structs to extern "C" functions, then we could optimize this feature without user-visible API changes, right?

Ah, and as a minor bikeshed, rather than "WasmtimeSystemV" and "WasmtimeFastcall", what would you think of a more explicit name, like "SingleRetSystemV" and "SingleRetFastcall" or so, to help keep it clear what its purpose is?

@alexcrichton
Copy link
Member Author

Can you clarify what you mean about the wasm C ABI? The current proposed wasm-c-api wasm.h header file provides no way to get a natively-callable function pointer which exposes ABI details, that's all internal to the engine. Currently we have no support to do so in the wasmtime extensions either afaik.

For the naming bits, I would personally think that we should probably keep "wasmtime" in the name to give ourselves the liberty to easily change it in the future. For example baldrdash has a whole bunch of custom conventions and such throughout cranelift, and I could imagine us wanting to add something like that for wasmtime in the future to optimize various bits and pieces here and there.

@sunfishcode
Copy link
Member

Ah, I meant the toolchain C ABI used by clang, where we've talked about adding multi-value for small structs once it's sufficiently supported in engines.

If WasmtimeSystemV etc. are meant to adapt to other Wasmtime needs in the future, should we document that they aren't ABI-stable, similar to Fast and Cold?

@alexcrichton
Copy link
Member Author

Ah I see! I think, yes, we're able to optimize this in the future without any user-visible changes. We don't expose anything about the ABI details from the wasmtime crate today (intentionally). We're limited in that we have to be able to define all wasm signatures as mapping to a particular Rust signature (which we previously weren't able to do so), but beyond that we can do whatever we want to the internals of the ABI.

And sure, I can document that these are unstable ABIs.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:wasm lightbeam Issues related to the Lightbeam compiler wasmtime:api Related to the API of the `wasmtime` crate itself labels Apr 5, 2021
@github-actions
Copy link

github-actions bot commented Apr 5, 2021

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "cranelift:wasm", "lightbeam", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

This was overwriting upper bits when 32-bit registers were being stored
into return values, so fix the code inline to do a sized store instead
of one-size-fits-all store.
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I spent the most time looking over the Cranelift ABI changes; I didn't look as closely at the trampoline/function-registry/translation bits, but I can do a deeper review of those parts too if you'd like.

Once thing that would be useful to test more thoroughly, I think, is the robustness of calls between Wasmtime-ABI and Fast-ABI functions within a Wasm module. So far we have mostly exercised same-ABI-to-same-ABI calls; there are some interesting corner cases in the code to handle calling a function of a different ABI (mostly due to differing clobber sets actually, but in general it is a different case).

So given that, perhaps we could add some filetests with the two interesting combos (wasmtime -> fast, fast -> wasmtime) and varying numbers of args and rets, and just assert a golden compilation result? We could also add some run tests that pass multiple args/rets, with different alignment requirements, etc.

crates/lightbeam/wasmtime/src/lib.rs Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Member Author

Ok I think I got everything working now! Since it was a very recent change, @cfallin can you confirm you saw 95f86be? That was necessary to fix some segfaults that were cropping up on windows since the 4-byte return values were getting stored as 8-byte results. The comment there makes me suspicious though...

I made a stab at getting this implemented in the old backend but I ended up giving up. The old backend implements multi-value returns very differently than the new backend. The old backend spills everything to the stack if it doesn't all fit in registers, which isn't the behavior that we want here (one in registers everything else on the stack). I couldn't figure out how to update the old backend to do this but I figured it wasn't really that necessary. I just removed some trait impls for the old backend in the wasmtime crate so wasmtime is still usable but won't have some abilities. (technically not a valid use of Cargo features but who's enabling the feature anyway?)

I'll tag @peterhuene for the review of the wasmtime bits, thanks for looking at the Cranelift bits @cfallin!

@alexcrichton alexcrichton requested a review from peterhuene April 6, 2021 20:25
@cfallin
Copy link
Member

cfallin commented Apr 6, 2021

Ok I think I got everything working now! Since it was a very recent change, @cfallin can you confirm you saw 95f86be? That was necessary to fix some segfaults that were cropping up on windows since the 4-byte return values were getting stored as 8-byte results. The comment there makes me suspicious though...

That was indeed a suspicious comment. I am not sure exactly what the original intent was but I double-checked just now that this store helper is not being used to codegen stores; those directly produce correctly-sized instructions. I suspect this originally came from spill/reload helpers in which case "always spill/reload the full reg" is a reasonable conservative approach but even then we have the invariant that upper bits (beyond a type's width) in a register are undefined and extended when needed by an op so this should be fine I think. Thanks for calling it out!

Copy link
Member

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these changes look great and I really like how TypedFunc works now.

Initially I had some concerns over emitting Windows unwind information for CallConv::Fast functions, but I can't think of an incompatibility that might ensue in the current form.

Windows unwind information should be able to represent the systemv prologues that CallConv::Fast uses and the epilogues are the same as Windows' calling convention (that's actually more important because epilogues on Windows, at least for x64, must be of a particular form).

I guess my remaining concern over the mixing of calling conventions is if CallConv::Fast changes in a subtle way that creates invalid Windows unwind information, thereby making it such that we can't unwind through that frame properly.

Perhaps we should add a Windows test where we trap with several "internal" CallConv::Fast frames on the trace and we can properly get the trap's backtrace?

@alexcrichton
Copy link
Member Author

I was a little worried about the unwind info myself (and kept switching it back and forth from Fast to the default to see if it ever fixed failing tests, it never did). In the traps tests though we do have a number of tests already with "internal" wasm function frames on the stack (such as this one) so I think we're at least covering the basics of representing unwind information for varying abis. I'm not sure if this is guaranteed to work in the limit, though.

@cfallin also were you thinking of more than the tests I added in 5962d4c? I tried to break things in various ways but I think it all seems to be working pretty well so far at least.

@cfallin
Copy link
Member

cfallin commented Apr 7, 2021

@cfallin also were you thinking of more than the tests I added in 5962d4c? I tried to break things in various ways but I think it all seems to be working pretty well so far at least.

@alexcrichton That's very thorough, thanks!

I may be missing something but I don't see any wasmtime_system_v or wasmtime_fastcall signatures -- my main concern was the interleaving of those with externally-usable ABIs. I think it may be enough to just do a wasmtime_system_v + system_v set of tests instead of the SysV/Fastcall combos you have in that file now. Thanks!

@alexcrichton
Copy link
Member Author

Ok I've managed to get all tests green and I think I've got all the various tests here in place. It's of course always easy though to use the same abi everywhere if any issues arise, but I'm gonna go ahead and merge this. If there are more tests desired though I'm happy to add some!

@alexcrichton alexcrichton merged commit 195bf0e into bytecodealliance:main Apr 7, 2021
@alexcrichton alexcrichton deleted the multi-return branch April 7, 2021 17:34
@bnjbvr
Copy link
Member

bnjbvr commented Apr 15, 2021

Just a note for future reference that this commit has also caused another regression on our side, causing stack overflows on Windows in async tests. I unfortunately don't have more to share, because debugging doesn't work reliably: the called wasm function panics (which is the subject of the test, so it's supposed to do so). When running in the VSCode debugger I hit SIGILL and then the program aborts, while running outside of a debugger there's a stack overflow somewhere. I haven't managed to make an isolated test case either, since I don't have any clues about what's causing the failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift:wasm cranelift Issues related to the Cranelift code generator lightbeam Issues related to the Lightbeam compiler wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable Rust/C functions to "hook up" to multi-value functions
6 participants