Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a first-class way of accessing caller's memory #1237

Closed

Conversation

alexcrichton
Copy link
Member

This commit adds a first-class way to access a caller's memory when
defining a Func implemented by the host. This is a very common usage
scenario where the wasm module calls a host function with pointers to
its own wasm memory, and the host function needs to read that
information.

Currently, however, it's pretty roundabout to implement this
functionality in wasm. Typically you'll have to set up an
Rc<RefCell<Option<Memory>>>, close over that in your function imports,
and then fill it in once the instance is created. The goal of this PR is
to make it more ergonomic to implement this.

Note that eventually this is intended to be supplanted by interface
types which automatically and safely convert from wasm memory to host
objects. Interface types though is still aways out and it seemed like it
would be prudent to have this functionality for users today.

Currently the shape of this functionality is to allow adding an argument
to Func::wrap* closures of Option<Memory>. If available this option
will be filled in with Some of the caller's memory. This ends up
being None in a few places:

  • The calling module may not actually have a memory
  • There may not be a calling module in the case that you directly call
    Func::call or similar.

Note that this is also at least violating the spirit of the JS API,
since there's no equivalent to this functionality there whatsoever.
Additionally this breaks concepts of encapsulation because it means your
memory can be accessed regardless of whether it's exported or not. To
make matters even worse it has no way to cope with multi-memory, it
simply returns None if there's more than one memory defined.

Overall this is intended to be a convenience or an escape hatch. As a
host you sort of get the power to do whatever you want with the module.
Empirically there's at least one major use case of this today, WASI.
There's also been mentions on Zulip of other needs for something like
this as well. This feels like it's basically useful enough that we
should provide it today, with an eye towards deprecating it once
interface types are implemented.

This commit adds a first-class way to access a caller's memory when
defining a `Func` implemented by the host. This is a very common usage
scenario where the wasm module calls a host function with pointers to
its own wasm memory, and the host function needs to read that
information.

Currently, however, it's pretty roundabout to implement this
functionality in wasm. Typically you'll have to set up an
`Rc<RefCell<Option<Memory>>>`, close over that in your function imports,
and then fill it in once the instance is created. The goal of this PR is
to make it more ergonomic to implement this.

Note that eventually this is intended to be supplanted by interface
types which automatically and safely convert from wasm memory to host
objects. Interface types though is still aways out and it seemed like it
would be prudent to have this functionality for users today.

Currently the shape of this functionality is to allow adding an argument
to `Func::wrap*` closures of `Option<Memory>`. If available this option
will be filled in with `Some` of the *caller's* memory. This ends up
being `None` in a few places:

* The calling module may not actually have a memory
* There may not be a calling module in the case that you directly call
  `Func::call` or similar.

Note that this is also at least violating the spirit of the JS API,
since there's no equivalent to this functionality there whatsoever.
Additionally this breaks concepts of encapsulation because it means your
memory can be accessed regardless of whether it's exported or not. To
make matters even worse it has no way to cope with multi-memory, it
simply returns `None` if there's more than one memory defined.

Overall this is intended to be *a convenience* or an escape hatch. As a
host you sort of get the power to do whatever you want with the module.
Empirically there's at least one major use case of this today, WASI.
There's also been mentions on Zulip of other needs for something like
this as well. This feels like it's basically useful enough that we
should provide it today, with an eye towards deprecating it once
interface types are implemented.
@sunfishcode
Copy link
Member

The part about this that makes me nervous is being able to access a module's memory without it being exported.

Obviously, requiring modules to export their memory is unfortunate, and the simple name "memory" isn't ideal, and we should do something better, but for now, it does help to model what's going on. You're calling out with i32 values which are pointers to a memory accessed via the "memory" export. We don't want Wasmtime or any tools assuming that a non-exported memory isn't mutated by anything outside the module. The export also helps in the case of multiple memories, because you can pick which one to export as "memory".

Would it still achieve your goal here if you reinstated the lookup("memory") logic in from_abi, keeping the rest of the feature?

@lostman
Copy link
Contributor

lostman commented Mar 10, 2020

Typically you'll have to set up an Rc<RefCell<Option<Memory>>>, close over that in your function imports, and then fill it in once the instance is created.

When an instance is created it automatically runs the (start) function. If the Rc<...> was the only way to access memory then it wouldn't be possible to use a memory-accessing host call inside (start).

Is that the intended behavior?

Alternatively, if one could make an instance, tie the knot, and only then explicitly run (start), the Rc<...> approach would work.

How would interface types solve this problem?

@alexcrichton
Copy link
Member Author

@sunfishcode yeah we could definitely go that route, only giving access to exported memories. I would prefer to not hardcode the string "memory" , though, so we could switch to something like:

Func::wrap1(&store, |mem: wasmtime::Caller| {
    let mem: Option<&Memory> = mem.get_export("memory").and_then(|e| e.memory());
    // ...
});

That way the string "memory" would still be passed in by the caller, and we could enhance CallerMemory over time with other stuff if we really wanted. For now it'd just be a thin wrapper around *mut VMContext.

@lostman yes the current way to implement this today with Rc doesn't support calling imports in the start function, but this is basically the same limitation of the JS API and is one of the known caveats of the start function.

Interface types would solve this issue because callers would simply say "I need a string", and it's the job of wasmtime to figure out how to give you that string, not the caller to figure out how to read the string out of linear memory.

@sunfishcode
Copy link
Member

Passing in the "memory" string sounds good. Maybe we could also add a pub const WASM_MEMORY_EXPORT_NAME = "memory" or some such thing that we could recommend people use with this API, so that when we do change it, we have more options for helping them avoid silent breakage.

@lostman
Copy link
Contributor

lostman commented Mar 11, 2020

What about multiple memories? Is that anywhere on the road map? If it is, any API that relies on specific names will lead to trouble.

The current situation is already a bit messy. What if a module exports memory but calls it "mem"? I recall compiling C++ to Wasm while back and there was no option to chose the name. Not sure what the current state of Rust toolchain is either.

@alexcrichton
Copy link
Member Author

@lostman with @sunfishcode's suggestion nothing will be hardcoded here any more, so we'll have an easy way to specify which memory is being accessed (via an exported name). I'm working on an update to the various traits here to support all this.

@sunfishcode
Copy link
Member

As additional background, the "memory" export name is currently hard-wired into lld, and it's become a de-facto convention that engines assume now. This is obviously not ideal, but I am optimistic that we'll have chances to move away from programs exporting their entire memory to the world, which will be opportunities to fix this.

@alexcrichton
Copy link
Member Author

I'm gonna end up taking a pretty different approach to doing this, so I'm going to close this.

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Mar 11, 2020
This commit is a continuation of bytecodealliance#1237 and updates the API of `Func` to
allow defining host functions which have easy access to a caller's
memory in particular. The new APIs look like so:

* The `Func::wrap*` family of functions was condensed into one
  `Func::wrap` function.
* The ABI layer of conversions in `WasmTy` were removed
* An optional `Caller<'_>` argument can be at the front of all
  host-defined functions now.

The old way the wasi bindings looked up memory has been removed and is
now replaced with the `Caller` type. The `Caller` type has a
`get_export` method on it which allows looking up a caller's export by
name, allowing you to get access to the caller's memory easily, and even
during instantiation.
@alexcrichton
Copy link
Member Author

Ok I've followed up with #1290 which I think is a better way to tackle this, thanks for the suggestion @sunfishcode!

@alexcrichton alexcrichton deleted the memory-arg branch March 11, 2020 19:14
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Mar 12, 2020
This commit is a continuation of bytecodealliance#1237 and updates the API of `Func` to
allow defining host functions which have easy access to a caller's
memory in particular. The new APIs look like so:

* The `Func::wrap*` family of functions was condensed into one
  `Func::wrap` function.
* The ABI layer of conversions in `WasmTy` were removed
* An optional `Caller<'_>` argument can be at the front of all
  host-defined functions now.

The old way the wasi bindings looked up memory has been removed and is
now replaced with the `Caller` type. The `Caller` type has a
`get_export` method on it which allows looking up a caller's export by
name, allowing you to get access to the caller's memory easily, and even
during instantiation.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Mar 12, 2020
This commit is a continuation of bytecodealliance#1237 and updates the API of `Func` to
allow defining host functions which have easy access to a caller's
memory in particular. The new APIs look like so:

* The `Func::wrap*` family of functions was condensed into one
  `Func::wrap` function.
* The ABI layer of conversions in `WasmTy` were removed
* An optional `Caller<'_>` argument can be at the front of all
  host-defined functions now.

The old way the wasi bindings looked up memory has been removed and is
now replaced with the `Caller` type. The `Caller` type has a
`get_export` method on it which allows looking up a caller's export by
name, allowing you to get access to the caller's memory easily, and even
during instantiation.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Mar 18, 2020
This commit is a continuation of bytecodealliance#1237 and updates the API of `Func` to
allow defining host functions which have easy access to a caller's
memory in particular. The new APIs look like so:

* The `Func::wrap*` family of functions was condensed into one
  `Func::wrap` function.
* The ABI layer of conversions in `WasmTy` were removed
* An optional `Caller<'_>` argument can be at the front of all
  host-defined functions now.

The old way the wasi bindings looked up memory has been removed and is
now replaced with the `Caller` type. The `Caller` type has a
`get_export` method on it which allows looking up a caller's export by
name, allowing you to get access to the caller's memory easily, and even
during instantiation.
alexcrichton added a commit that referenced this pull request Mar 18, 2020
* Add a first-class way of accessing caller's exports

This commit is a continuation of #1237 and updates the API of `Func` to
allow defining host functions which have easy access to a caller's
memory in particular. The new APIs look like so:

* The `Func::wrap*` family of functions was condensed into one
  `Func::wrap` function.
* The ABI layer of conversions in `WasmTy` were removed
* An optional `Caller<'_>` argument can be at the front of all
  host-defined functions now.

The old way the wasi bindings looked up memory has been removed and is
now replaced with the `Caller` type. The `Caller` type has a
`get_export` method on it which allows looking up a caller's export by
name, allowing you to get access to the caller's memory easily, and even
during instantiation.

* Add a temporary note

* Move some docs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants