Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to build a Wasm component from the Wasm core module with "non-component" imports #334

Closed
greenhat opened this issue Sep 25, 2024 · 8 comments · Fixed by #329
Closed
Labels
blocker This issue is one of our top priorities

Comments

@greenhat
Copy link
Contributor

TLDR; Using core Wasm module imports for our intrinsics and Miden SDK functions is not playing well with the Wasm component model.

What

When building a Wasm component from a Wasm core module compiled from the basic-wallet with Rust SDK example, the build fails with the following error:

Caused by:
    0: module was not valid
    1: module requires an import interface named `miden::account`

Why

The reason for this error is that the Wasm core module contains imports that wit-component(used by cargo-component to build the Wasm component) fails to recognize as component imports, i.e. imports that are defined in the WIT file. In this case, it's miden::account::add_asset tx kernel function from the Miden SDK. Effectively, all our intrinsics and Miden SDK functions are triggering this error.

This happens at https://github.com/bytecodealliance/wasm-tools/blob/51ade657f6d6874b1bdfb87a47299dbe7e61d5c2/crates/wit-component/src/validation.rs#L411-L491, where the module name of the core module import is expected to be found in the interface imports in the WIT file.

How to reproduce

In #329 branch run:

RUST_LOG=debug cargo test rust_sdk_basic

To check out the Wasm core module, use wasm-tools print PATH_TO_CORE_WASM with the path reported in logs at componentizing WebAssembly module ...

How to fix

The most promising approach IMO would be to not use Wasm core module imports for our intrinsics and Miden SDK functions. The reason why wit-component treats them as errors is fair, and I think it'd be difficult if at all possible to alter that behavior in the upstream.

So instead of defining our Miden SDK functions and intrinsics as extern

#[link(wasm_import_module = "miden::account")]
extern "C" {
#[link_name = "get_id<0x0000000000000000000000000000000000000000000000000000000000000000>"]
pub fn extern_account_get_id() -> AccountId;
#[link_name = "add_asset<0x0000000000000000000000000000000000000000000000000000000000000000>"]
pub fn extern_account_add_asset(_: Felt, _: Felt, _: Felt, _: Felt, ptr: *mut CoreAsset);
#[link_name = "remove_asset<0x0000000000000000000000000000000000000000000000000000000000000000>"]
pub fn extern_account_remove_asset(_: Felt, _: Felt, _: Felt, _: Felt, ptr: *mut CoreAsset);
}

we could define them as functions with unreachable!() bodies and recognize them by function names and signatures in the frontend:

#[inline(never)]
#[no_mangle]
fn miden_account_add_asset(_: Felt, _: Felt, _: Felt, _: Felt, ptr: *mut CoreAsset) {
    unreachable!()
}

Although, #[inline(never)] is only a hint to the compiler, with a side effect in the body we should be safe from inlining.

@greenhat greenhat added the blocker This issue is one of our top priorities label Sep 25, 2024
@greenhat
Copy link
Contributor Author

@bitwalker That's a blocker for me. Please check it out ASAP.

@bitwalker
Copy link
Contributor

I'm not sure I understand why non-component imports are being treated as component imports here, though I haven't dug into it just yet. The entire premise of not making much/all of the Miden SDK a component, was so that the code from the SDK would be considered part of the component being defined (i.e. the one importing the SDK as a utility), or part of the host at the very least. It sounds like wit-component is just treating anything that is an unresolved import as a component import, is that right?

If so, one option would be to provide to wit-component the set of components that are available, and treat anything that isn't an import from one of those components as a host function. AFAIK there isn't a requirement that host functions be represented as components (which wouldn't make much sense anyway), so there must be some way to represent such things.

we could define them as functions with unreachable!() bodies and recognize them by function names and signatures in the frontend:

#[inline(never)]
#[no_mangle]
fn miden_account_add_asset(_: Felt, _: Felt, _: Felt, _: Felt, ptr: *mut CoreAsset) {
   unreachable!()
}

Although, #[inline(never)] is only a hint to the compiler, with a side effect in the body we should be safe from inlining.

We can't rely on #[inline(never)] to prevent inlining, particularly when the function body is so simple, LLVM simply makes no guarantees about what it will do with those attributes. We could check in LLVM to see what it currently does, and maybe that would be sufficient. It would probably be good to do some tests to see what happens with real programs too, but boy would relying on that be fragile to compiler changes in both rustc and LLVM. Not to mention other potential tools in the toolchain like Binaryen that are used to post-process Wasm modules in order to optimize them further and/or trim out "fat" from the modules produced by other compilers. If we absolutely had to do this, it might be better to do something like:

#[inline(never)]
#[no_mangle]
pub extern "C-unwind" fn foo(...) {
    core::hint::black_box(unreachable!())
}

This ensures that callers treat calls to foo according to the C ABI, and that the body of the function is maximally pessimized from Rust's (and maybe LLVM's) perspective. Still, the trick boils down to what happens when compiling real projects that use this structure.

Even with those tweaks, I think we are still forced to assume that nothing is guaranteed with regards to inlining. The major problem with having no guarantees of course, is that even if we do know when it happens (and it isn't happening on some code path that is rarely called, and thus not encountered until an inopportune time), there isn't anything we can actually do to prevent it. I really don't want that kind of uncertainty.

So the way I see it is we have few possible options:

  1. Figure out how we're supposed to represent host functions to wit-component, and work within those constraints if possible. It must be the case that such functions can do things "component functions" cannot, otherwise there would be extremely onerous limits on what you could do in the real world with the component model. For example, it must be the case that you can pass a Wasm address to a host function, otherwise there would be no way to ask the host to perform some operation on data at that address. IMO virtually all of the use cases we want non-component imports for can be considered some variant of this.
  2. Constrain the workings of wit-component such that our expected use case can be supported. Necessarily this involves us maintaining our own fork, which is not ideal. In this scenario, we do something like I mentioned up at the top, by requiring wit-component to treat imports that can't be resolved to a known component, as host functions, not component imports.
  3. Figure out how to represent the SDK as a "real" component. This would require us to re-evaluate, amongst other things, how to distinguish functions that should be call'd vs exec'd. The bigger problem though would simply be that the component model ABI does not permit things like pointers, which would be a real issue with some of the low-level SDK functions.

We should probably have a call to discuss this tomorrow, ping me via IM if you have a particular time that works best, but I'll assume roughly sometime around our usual compiler standup time. We can try and hash through the gritty details together.

@bitwalker
Copy link
Contributor

@greenhat Sorry, I just realized I forgot to check, but are you building the Rust crate with --target wasm32-wasip2? AFAIK rustc has native support for emitting components now, using that target. I'll have to confirm that fact, but that is why it exists at the moment. By relying on rustc to emit the component, we may sidestep the problem here.

I suppose the problem might be that we need WIT in the loop, and AFAIK there is no support for expressing component definitions via rustc natively yet. Presumably it makes its own choices there, but I haven't played around with it to find out.

If I can find time tonight, I'll try to do some experimenting with all of this.

@greenhat
Copy link
Contributor Author

I'm not sure I understand why non-component imports are being treated as component imports here, though I haven't dug into it just yet. The entire premise of not making much/all of the Miden SDK a component, was so that the code from the SDK would be considered part of the component being defined (i.e. the one importing the SDK as a utility), or part of the host at the very least. It sounds like wit-component is just treating anything that is an unresolved import as a component import, is that right?

Yes, that's my understanding. All unresolved imports are treated as component imports.

If so, one option would be to provide to wit-component the set of components that are available, and treat anything that isn't an import from one of those components as a host function. AFAIK there isn't a requirement that host functions be represented as components (which wouldn't make much sense anyway), so there must be some way to represent such things.

I'll keep digging into wit-component and see if I can find a way to do this.

We can't rely on #[inline(never)] to prevent inlining, particularly when the function body is so simple, LLVM simply makes no guarantees about what it will do with those attributes. We could check in LLVM to see what it currently does, and maybe that would be sufficient. It would probably be good to do some tests to see what happens with real programs too, but boy would relying on that be fragile to compiler changes in both rustc and LLVM. Not to mention other potential tools in the toolchain like Binaryen that are used to post-process Wasm modules in order to optimize them further and/or trim out "fat" from the modules produced by other compilers. If we absolutely had to do this, it might be better to do something like:

#[inline(never)]
#[no_mangle]
pub extern "C-unwind" fn foo(...) {
    core::hint::black_box(unreachable!())
}

This ensures that callers treat calls to foo according to the C ABI, and that the body of the function is maximally pessimized from Rust's (and maybe LLVM's) perspective. Still, the trick boils down to what happens when compiling real projects that use this structure.

Even with those tweaks, I think we are still forced to assume that nothing is guaranteed with regards to inlining. The major problem with having no guarantees of course, is that even if we do know when it happens (and it isn't happening on some code path that is rarely called, and thus not encountered until an inopportune time), there isn't anything we can actually do to prevent it. I really don't want that kind of uncertainty.

I agree. It's fragile and ugly. I've given up on imports too early. I'll keep digging.

So the way I see it is we have few possible options:

  1. Figure out how we're supposed to represent host functions to wit-component, and work within those constraints if possible. It must be the case that such functions can do things "component functions" cannot, otherwise there would be extremely onerous limits on what you could do in the real world with the component model. For example, it must be the case that you can pass a Wasm address to a host function, otherwise there would be no way to ask the host to perform some operation on data at that address. IMO virtually all of the use cases we want non-component imports for can be considered some variant of this.
  2. Constrain the workings of wit-component such that our expected use case can be supported. Necessarily this involves us maintaining our own fork, which is not ideal. In this scenario, we do something like I mentioned up at the top, by requiring wit-component to treat imports that can't be resolved to a known component, as host functions, not component imports.
  3. Figure out how to represent the SDK as a "real" component. This would require us to re-evaluate, amongst other things, how to distinguish functions that should be call'd vs exec'd. The bigger problem though would simply be that the component model ABI does not permit things like pointers, which would be a real issue with some of the low-level SDK functions.

I thought about the third option, and it's my least favorite. The second option is feasible, but would require an effort to maintain a fork of wit-component which is being actively developed.

We should probably have a call to discuss this tomorrow, ping me via IM if you have a particular time that works best, but I'll assume roughly sometime around our usual compiler standup time. We can try and hash through the gritty details together.

Sure, the usual time works for me.

@greenhat
Copy link
Contributor Author

@greenhat Sorry, I just realized I forgot to check, but are you building the Rust crate with --target wasm32-wasip2? AFAIK rustc has native support for emitting components now, using that target. I'll have to confirm that fact, but that is why it exists at the moment. By relying on rustc to emit the component, we may sidestep the problem here.

I suppose the problem might be that we need WIT in the loop, and AFAIK there is no support for expressing component definitions via rustc natively yet. Presumably it makes its own choices there, but I haven't played around with it to find out.

No, I'm using wasm32-wasip1. I'll try wasip2.

@greenhat
Copy link
Contributor Author

@bitwalker Be aware, that the relevant parts of wit-component were re-written a few days ago. cargo-component is not using them yet. Actually, I want to try them out in cargo-component.

@bitwalker
Copy link
Contributor

bitwalker commented Sep 26, 2024

I did some experimenting, and it seems that, at least at the moment, the following code does not inline the call to foo, while also not resulting in any optimization across the call boundary:

#[no_mangle]
pub fn run(num: i32) -> i32 {
    foo(core::ptr::null(), num as usize);
    1
}

#[inline(never)]
#[no_mangle]
pub extern "C-unwind" fn foo(arg1: *const (), len: usize) {
    core::hint::black_box(());
}

The resulting code seems to be compiled as if foo is well-behaved, foo is kept in the output, and the call to foo is not inlined, even at the highest optimization levels:

run:
        push    rax
        call    qword ptr [rip + foo@GOTPCREL]
        mov     eax, 1
        pop     rcx
        ret

foo:
        ret

However, there is a caveat, which I'll get to in a moment. Approaches that just don't work correctly at all:

  • if we remove core::hint::black_box, the call is inlined, and the caller is optimized based on what's in the body of foo
  • If we keep the black box, but replace () with unreachable!(), foo gets marked noreturn in LLVM, and the caller is optimized to assume that the call to foo never returns (because it aborts). This breaks the code in the caller.
  • If we replace unreachable!() with unsafe { core::hint::unreachable_unchecked() }, foo gets marked noreturn, but the call to foo gets erased entirely at any optimization level > 0, as calling foo has undefined behavior, so LLVM is allowed to elide the call entirely since it doesn't do anything.

So the bottom line appears to be that LLVM will still optimize based on the definition, even if it is not inlined. Inlining would result in some pretty aggressive optimizations, but as we can see above, even without inlining, some attributes of the function can still result in radical assumptions about calls to it. In particular:

  • Rust/LLVM will just straight up ignore #[inline(never)] without core::hint::black_box present. However, even when it is present, it is not a perfect black box, as things like panics/aborts/unreachable code will still be detected via static analysis, and cause the function to be marked as noreturn even though it should ostensibly be a black box.
  • With the black box present, LLVM might not inline the function body, but when the body of the function implies static attributes about the function itself, that information will be made available to callers, and used to optimize call sites. Thus any panicking/unreachable code in the stub will cause LLVM to treat the call itself as a panic/unreachable.
  • An empty function body with core::hint::black_box(()) seems to be correct at first glance, but because the body does not do anything, the LLVM IR of the function is marked memory(none), i.e. it is implied to have no memory effects (no reads, no writes). This means that the LLVM optimizer is free to treat it as a pure function, reordering it, eliding calls and reusing results of a previous call if the parameters are the same, etc.

So IMO, it is just simply not an option to define stubs in Rust that we then replace. The program will be optimized around the stub implementation, regardless of what we do, and this will result in unsoundness if we then swap out that implementation with a new one that has different behavior. We might be able to come up with a stub that is sufficiently pessimized that it works, but slight changes to how Rust or LLVM compiles the code could cause it to break at any time.

As discussed on our call, coming at this from the angle of finding a way to represent these functions in a component interface seems like our best bet for now, these findings just solidify that as our path forward IMO. I'll follow up if I come up with any other options though.

@greenhat
Copy link
Contributor Author

@bitwalker That's a comprehensive answer! I agree, stubs are a dead end.

I dug into the Wasm CM repo and found a discussion which addresses our case of a Wasm component with unsatisfied imports in the core module - WebAssembly/component-model#275
Our question was raised in this comment and the answer is that if such component is built, it'd be outside the whole Wasm CM ecosystem. Which concurs my feelings when I bent wit-component and built such a component only to discover that wasmparser deems it invalid.
BTW, native Wasm CM support in browsers is at least a few years away, according to this comment. Meanwhile, it seems jco is the answer (transpiling the Wasm components to a core Wasm module + JS bindings).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This issue is one of our top priorities
Projects
None yet
2 participants