Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate modules while translating #2059

Merged
merged 2 commits into from
Oct 5, 2020

Conversation

alexcrichton
Copy link
Member

@alexcrichton alexcrichton commented Jul 21, 2020

This commit is a change to cranelift-wasm to validate each function body
as it is translated. Additionally top-level module translation functions
will perform module validation. This commit builds on changes in
wasmparser to perform module validation interwtwined with parsing and
translation. This will be necessary for future wasm features such as
module linking where the type behind a function index, for example, can
be far away in another module. Additionally this also brings a nice
benefit where parsing the binary only happens once (instead of having an
up-front serial validation step) and validation can happen in parallel
for each function.

Most of the changes in this commit are plumbing to make sure everything
lines up right. The major functional change here is that module
compilation should be faster by validating in parallel (or skipping
function validation entirely in the case of a cache hit). Otherwise from
a user-facing perspective nothing should be that different.

This commit does mean that cranelift's translation now inherently
validates the input wasm module. This means that the Spidermonkey
integration of cranelift-wasm will also be validating the function as
it's being translated with cranelift. The associated PR for wasmparser
(bytecodealliance/wasm-tools#62) provides the necessary tools to create
a FuncValidator for Gecko, but this is something I'll want careful
review for before landing!

@alexcrichton alexcrichton marked this pull request as draft July 21, 2020 22:44
@alexcrichton
Copy link
Member Author

I've marked this as a draft while the wasmparser changes have yet to be released.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:wasm lightbeam Issues related to the Lightbeam compiler wasmtime:api Related to the API of the `wasmtime` crate itself labels Jul 21, 2020
@github-actions
Copy link

Subscribe to Label Action

cc @bnjbvr, @peterhuene

This issue or pull request has been labeled: "cranelift", "cranelift:wasm", "lightbeam", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • bnjbvr: cranelift
  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\o/ I'll be quite happy to try this PR in Spidermonkey, before it lands, to give a confirmation this integrates nicely and it passes our extensive test suite :-)

cranelift/wasm/src/module_translator.rs Show resolved Hide resolved
@alexcrichton alexcrichton force-pushed the late-validate branch 3 times, most recently from 55489f0 to a729479 Compare July 22, 2020 16:23
@alexcrichton alexcrichton marked this pull request as ready for review July 22, 2020 16:24
@alexcrichton
Copy link
Member Author

Ok the wasmparser bits are published and this should be good to go!

@github-actions github-actions bot added the fuzzing Issues related to our fuzzing infrastructure label Jul 22, 2020
@github-actions
Copy link

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "fuzzing"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Contributor

@yurydelendik yurydelendik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Looks good on API and wasmparser integration side.

@@ -53,14 +53,14 @@ use wasmparser::{MemoryImmediate, Operator};
/// Translates wasm operators into Cranelift IR instructions. Returns `true` if it inserted
/// a return.
pub fn translate_operator<FE: FuncEnvironment + ?Sized>(
module_translation_state: &ModuleTranslationState,
validator: &mut FuncValidator<impl WasmModuleResources>,
Copy link
Member

@bnjbvr bnjbvr Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One early comment (before i try this tomorrow): can you please expose wasmparser::FuncValidator publicly, so that embedders aren't required to add wasmparser to their own dependencies? (and if embedders are to implement WasmModuleResources, this one as well maybe?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing! I've pushed up a comment which reexports wasmparser as a crate since there's a number of types/etc you'll likely want to pull from there.

@bnjbvr
Copy link
Member

bnjbvr commented Jul 24, 2020

The bits to implement don't seem terribly hard, we already required most of this information anyways, so it was around. It's nice to have a proper API to make sense of it!

One request, though: in wasmparser, most of the trait functions return a &Self::Type (or &Self::FuncType, etc.), which makes it a bit hard to implement in some places. For instance, our Type representation is a plain packed u32 at the moment: the ret type being a borrow makes it hard to return a plain primitive value. Could it be a Cow<Self::Type> instead? (Unless there's a better API for this of course; my Rust knowledge is reaching its boundaries here.)

Because of the above issue, I haven't gone through implementing it entirely in Spidermonkey. Since I wouldn't want to block this from landing, would it make sense to allow both APIs right now, mark the former one as deprecated, and allow for implementing the new one for clients who want it?

@bnjbvr
Copy link
Member

bnjbvr commented Jul 24, 2020

(I'm being told Rust nightly has generic associated types, which would allow putting a lifetime type parameter in the Type trait declaration, making it possible to return something that depends on &'self or not; this is not a viable possibility at the moment, though.)

@alexcrichton
Copy link
Member Author

After some discussion on Zulip, I think bytecodealliance/wasm-tools#68 will help address the issues here, so I'll get that landed/published and update it here before we merge this.

@alexcrichton
Copy link
Member Author

Ok I think wasmparser should have the necessary changes now, @bnjbvr did you want to double-check this again?

@bnjbvr
Copy link
Member

bnjbvr commented Jul 31, 2020

As agreed on Zulip, since i won't be around for some time, this should land later to give Julian/Chris an opportunity to look into Spidermonkey changes this coming week. Thanks Alex!

@alexcrichton alexcrichton force-pushed the late-validate branch 4 times, most recently from c34ea71 to 5570c95 Compare August 4, 2020 17:54
@cfallin
Copy link
Member

cfallin commented Sep 30, 2020

Getting closer on my end too; FWIW I had to make one small change to fix some tests, exposing a junk-bytes-at-end condition as an error return rather than panic: commit (feel free to incorporate). I just have to tackle some table/reftypes-related test failures and then SpiderMonkey will be ready for this!

This way we can let the validator take care of any issues with
mismatched `end` instructions and/or trailing operators/bytes.
@alexcrichton
Copy link
Member Author

Oh oops, good catch! The intention was to let wasmparser's validator catch that naturally, so I pushed up a slightly tweaked commit which should remove the panic as well.

@cfallin
Copy link
Member

cfallin commented Oct 4, 2020

I've gotten all tests to pass in SpiderMonkey using this validator, now! In order to do so, I locally updated the wasmparser dep in cranelift-wasm to 0.63.0; this included an update to the bytecode-operator enum so I have a local commit here on top of your two commits. (That patch-stack is rebased to current main as well.) I'm not sure if the update is strictly necessary, now, but I did so as I was working through some weird validation mismatches, and have now baked the new version's error messages into the unit-test updates, so it has become a requirement for that reason :-)

Given that, I think there isn't any further reason to hold this back -- thanks for waiting on SM to catch up!

(A tangential note, as hinted above -- the SpiderMonkey test suite has a whole bunch of "this wasm module should fail to compile in a way that matches this regex" tests, so we should be careful updating any error messages going forward, or at least consider the cost; I wish there were a better way (standardized error messages as per a spec?) but this is what we have for now...)

@fitzgen
Copy link
Member

fitzgen commented Oct 4, 2020

(A tangential note, as hinted above -- the SpiderMonkey test suite has a whole bunch of "this wasm module should fail to compile in a way that matches this regex" tests, so we should be careful updating any error messages going forward, or at least consider the cost; I wish there were a better way (standardized error messages as per a spec?) but this is what we have for now...)

The spec tests do have expected error messages and, except for a few exceptions, we match them in wasmparser. Does SpiderMonkey also (newly?) do this? Or am I misunderstanding which tests you're talking about?

@cfallin
Copy link
Member

cfallin commented Oct 4, 2020

Ah, I hadn't realized there were standard error messages. I can dig further into why SpiderMonkey doesn't hit issues with this (perhaps the harness is tweaked?) but actually all of the issues I hit were in the SpiderMonkey-specific test suite. An example here matches on a specific "type mismatch: expression has type A but expected B" message; wasmparser (version 0.63) emits "type mismatch: expected Some(A), found Some(B)" and A and B are e.g. I32, FuncRef rather than i32, funcref. (Seems this is just a direct Debug rendering of the Rust type representation.) So I had to tweak a bunch of /a/ regexes to /(a)|(b)/ (because SM still has the old backends). Not ideal, but workable for now.

@alexcrichton
Copy link
Member Author

"standard" is a bit of a stretch in that they're not super descriptitive, it's typically "type mismatch" for everything related to a type error and the implementation typically provides more descriptive stuff afterwards. That being said of Some(...) is showing up in wasmparser parser errors that's probably a bug on our end where we were too lazy to stringify something and went with format!("{:?}") for convenience. We'd be more than happy to update the error messages in wasmparser to be more readable, I don't think they've received too much love previously!

It's great to hear though that SpiderMonkey is coming along well so I'm going to carry over the approval from before and go ahead and merge this. Thanks again for the heads up @cfallin!

@alexcrichton alexcrichton merged commit 2c68410 into bytecodealliance:main Oct 5, 2020
@alexcrichton alexcrichton deleted the late-validate branch October 5, 2020 16:02
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Oct 7, 2020
…c. r=jseward

This patch pulls in an updated Cranelift with a new validation strategy,
introduced by bytecodealliance/wasmtime#2059. This new design validates
the Wasm module as it parses the function bodies. A subsequent patch
will adapt Baldrdash to work with this.

Differential Revision: https://phabricator.services.mozilla.com/D92503
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Oct 7, 2020
…c. r=jseward

This patch pulls in an updated Cranelift with a new validation strategy,
introduced by bytecodealliance/wasmtime#2059. This new design validates
the Wasm module as it parses the function bodies. A subsequent patch
will adapt Baldrdash to work with this.

Differential Revision: https://phabricator.services.mozilla.com/D92503
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Oct 7, 2020
This patch is an update of Benjamin Bouvier's WIP patch that supports
Cranelift's new Wasm validator functionality, as introduced in
bytecodealliance/wasmtime#2059.

Previously, Baldrdash allowed SpiderMonkey to validate Wasm function
bodies before invoking Cranelift. The Cranelift PR now causes validation
to happen in the function-body compiler.  Because the Cranelift backend
now takes on this responsibility, we no longer need to invoke the
SpiderMonkey function-body validator. This requires some significant new
glue in the Baldrdash bindings to allow Cranelift's Wasm frontend to
access module type information.

Co-authored-by: Benjamin Bouvier <benj@benj.me>

Differential Revision: https://phabricator.services.mozilla.com/D92504
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Oct 8, 2020
…c. r=jseward

This patch pulls in an updated Cranelift with a new validation strategy,
introduced by bytecodealliance/wasmtime#2059. This new design validates
the Wasm module as it parses the function bodies. A subsequent patch
will adapt Baldrdash to work with this.

Differential Revision: https://phabricator.services.mozilla.com/D92503
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Oct 8, 2020
This patch is an update of Benjamin Bouvier's WIP patch that supports
Cranelift's new Wasm validator functionality, as introduced in
bytecodealliance/wasmtime#2059.

Previously, Baldrdash allowed SpiderMonkey to validate Wasm function
bodies before invoking Cranelift. The Cranelift PR now causes validation
to happen in the function-body compiler.  Because the Cranelift backend
now takes on this responsibility, we no longer need to invoke the
SpiderMonkey function-body validator. This requires some significant new
glue in the Baldrdash bindings to allow Cranelift's Wasm frontend to
access module type information.

Co-authored-by: Benjamin Bouvier <benj@benj.me>

Differential Revision: https://phabricator.services.mozilla.com/D92504
sidvishnoi pushed a commit to sidvishnoi/gecko-webmonetization that referenced this pull request Oct 13, 2020
…c. r=jseward

This patch pulls in an updated Cranelift with a new validation strategy,
introduced by bytecodealliance/wasmtime#2059. This new design validates
the Wasm module as it parses the function bodies. A subsequent patch
will adapt Baldrdash to work with this.

Differential Revision: https://phabricator.services.mozilla.com/D92503
sidvishnoi pushed a commit to sidvishnoi/gecko-webmonetization that referenced this pull request Oct 13, 2020
…c. r=jseward

This patch pulls in an updated Cranelift with a new validation strategy,
introduced by bytecodealliance/wasmtime#2059. This new design validates
the Wasm module as it parses the function bodies. A subsequent patch
will adapt Baldrdash to work with this.

Differential Revision: https://phabricator.services.mozilla.com/D92503
sidvishnoi pushed a commit to sidvishnoi/gecko-webmonetization that referenced this pull request Oct 13, 2020
This patch is an update of Benjamin Bouvier's WIP patch that supports
Cranelift's new Wasm validator functionality, as introduced in
bytecodealliance/wasmtime#2059.

Previously, Baldrdash allowed SpiderMonkey to validate Wasm function
bodies before invoking Cranelift. The Cranelift PR now causes validation
to happen in the function-body compiler.  Because the Cranelift backend
now takes on this responsibility, we no longer need to invoke the
SpiderMonkey function-body validator. This requires some significant new
glue in the Baldrdash bindings to allow Cranelift's Wasm frontend to
access module type information.

Co-authored-by: Benjamin Bouvier <benj@benj.me>

Differential Revision: https://phabricator.services.mozilla.com/D92504
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Nov 3, 2020
This commit is intended to be the first of many in implementing the
module linking proposal. At this time this builds on bytecodealliance#2059 so it
shouldn't land yet. The goal of this commit is to compile bare-bones
modules which use module linking, e.g. those with nested modules.

My hope with module linking is that almost everything in wasmtime only
needs mild refactorings to handle it. The goal is that all per-module
structures are still per-module and at the top level there's just a
`Vec` containing a bunch of modules. That's implemented currently where
`wasmtime::Module` contains `Arc<[CompiledModule]>` and an index of
which one it's pointing to. This should enable
serialization/deserialization of any module in a nested modules
scenario, no matter how you got it.

Tons of features of the module linking proposal are missing from this
commit. For example instantiation flat out doesn't work, nor does
import/export of modules or instances. That'll be coming as future
commits, but the purpose here is to start laying groundwork in Wasmtime
for handling lots of modules in lots of places.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Nov 3, 2020
This commit is intended to be the first of many in implementing the
module linking proposal. At this time this builds on bytecodealliance#2059 so it
shouldn't land yet. The goal of this commit is to compile bare-bones
modules which use module linking, e.g. those with nested modules.

My hope with module linking is that almost everything in wasmtime only
needs mild refactorings to handle it. The goal is that all per-module
structures are still per-module and at the top level there's just a
`Vec` containing a bunch of modules. That's implemented currently where
`wasmtime::Module` contains `Arc<[CompiledModule]>` and an index of
which one it's pointing to. This should enable
serialization/deserialization of any module in a nested modules
scenario, no matter how you got it.

Tons of features of the module linking proposal are missing from this
commit. For example instantiation flat out doesn't work, nor does
import/export of modules or instances. That'll be coming as future
commits, but the purpose here is to start laying groundwork in Wasmtime
for handling lots of modules in lots of places.
ambroff pushed a commit to ambroff/gecko that referenced this pull request Nov 4, 2020
…c. r=jseward

This patch pulls in an updated Cranelift with a new validation strategy,
introduced by bytecodealliance/wasmtime#2059. This new design validates
the Wasm module as it parses the function bodies. A subsequent patch
will adapt Baldrdash to work with this.

Differential Revision: https://phabricator.services.mozilla.com/D92503
ambroff pushed a commit to ambroff/gecko that referenced this pull request Nov 4, 2020
This patch is an update of Benjamin Bouvier's WIP patch that supports
Cranelift's new Wasm validator functionality, as introduced in
bytecodealliance/wasmtime#2059.

Previously, Baldrdash allowed SpiderMonkey to validate Wasm function
bodies before invoking Cranelift. The Cranelift PR now causes validation
to happen in the function-body compiler.  Because the Cranelift backend
now takes on this responsibility, we no longer need to invoke the
SpiderMonkey function-body validator. This requires some significant new
glue in the Baldrdash bindings to allow Cranelift's Wasm frontend to
access module type information.

Co-authored-by: Benjamin Bouvier <benj@benj.me>

Differential Revision: https://phabricator.services.mozilla.com/D92504
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Nov 6, 2020
This commit is intended to be the first of many in implementing the
module linking proposal. At this time this builds on bytecodealliance#2059 so it
shouldn't land yet. The goal of this commit is to compile bare-bones
modules which use module linking, e.g. those with nested modules.

My hope with module linking is that almost everything in wasmtime only
needs mild refactorings to handle it. The goal is that all per-module
structures are still per-module and at the top level there's just a
`Vec` containing a bunch of modules. That's implemented currently where
`wasmtime::Module` contains `Arc<[CompiledModule]>` and an index of
which one it's pointing to. This should enable
serialization/deserialization of any module in a nested modules
scenario, no matter how you got it.

Tons of features of the module linking proposal are missing from this
commit. For example instantiation flat out doesn't work, nor does
import/export of modules or instances. That'll be coming as future
commits, but the purpose here is to start laying groundwork in Wasmtime
for handling lots of modules in lots of places.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Nov 6, 2020
This commit is intended to be the first of many in implementing the
module linking proposal. At this time this builds on bytecodealliance#2059 so it
shouldn't land yet. The goal of this commit is to compile bare-bones
modules which use module linking, e.g. those with nested modules.

My hope with module linking is that almost everything in wasmtime only
needs mild refactorings to handle it. The goal is that all per-module
structures are still per-module and at the top level there's just a
`Vec` containing a bunch of modules. That's implemented currently where
`wasmtime::Module` contains `Arc<[CompiledModule]>` and an index of
which one it's pointing to. This should enable
serialization/deserialization of any module in a nested modules
scenario, no matter how you got it.

Tons of features of the module linking proposal are missing from this
commit. For example instantiation flat out doesn't work, nor does
import/export of modules or instances. That'll be coming as future
commits, but the purpose here is to start laying groundwork in Wasmtime
for handling lots of modules in lots of places.
alexcrichton added a commit that referenced this pull request Nov 6, 2020
This commit is intended to be the first of many in implementing the
module linking proposal. At this time this builds on #2059 so it
shouldn't land yet. The goal of this commit is to compile bare-bones
modules which use module linking, e.g. those with nested modules.

My hope with module linking is that almost everything in wasmtime only
needs mild refactorings to handle it. The goal is that all per-module
structures are still per-module and at the top level there's just a
`Vec` containing a bunch of modules. That's implemented currently where
`wasmtime::Module` contains `Arc<[CompiledModule]>` and an index of
which one it's pointing to. This should enable
serialization/deserialization of any module in a nested modules
scenario, no matter how you got it.

Tons of features of the module linking proposal are missing from this
commit. For example instantiation flat out doesn't work, nor does
import/export of modules or instances. That'll be coming as future
commits, but the purpose here is to start laying groundwork in Wasmtime
for handling lots of modules in lots of places.
cfallin pushed a commit that referenced this pull request Nov 30, 2020
* Validate modules while translating

This commit is a change to cranelift-wasm to validate each function body
as it is translated. Additionally top-level module translation functions
will perform module validation. This commit builds on changes in
wasmparser to perform module validation interwtwined with parsing and
translation. This will be necessary for future wasm features such as
module linking where the type behind a function index, for example, can
be far away in another module. Additionally this also brings a nice
benefit where parsing the binary only happens once (instead of having an
up-front serial validation step) and validation can happen in parallel
for each function.

Most of the changes in this commit are plumbing to make sure everything
lines up right. The major functional change here is that module
compilation should be faster by validating in parallel (or skipping
function validation entirely in the case of a cache hit). Otherwise from
a user-facing perspective nothing should be that different.

This commit does mean that cranelift's translation now inherently
validates the input wasm module. This means that the Spidermonkey
integration of cranelift-wasm will also be validating the function as
it's being translated with cranelift. The associated PR for wasmparser
(bytecodealliance/wasmparser#62) provides the necessary tools to create
a `FuncValidator` for Gecko, but this is something I'll want careful
review for before landing!

* Read function operators until EOF

This way we can let the validator take care of any issues with
mismatched `end` instructions and/or trailing operators/bytes.
cfallin pushed a commit that referenced this pull request Nov 30, 2020
This commit is intended to be the first of many in implementing the
module linking proposal. At this time this builds on #2059 so it
shouldn't land yet. The goal of this commit is to compile bare-bones
modules which use module linking, e.g. those with nested modules.

My hope with module linking is that almost everything in wasmtime only
needs mild refactorings to handle it. The goal is that all per-module
structures are still per-module and at the top level there's just a
`Vec` containing a bunch of modules. That's implemented currently where
`wasmtime::Module` contains `Arc<[CompiledModule]>` and an index of
which one it's pointing to. This should enable
serialization/deserialization of any module in a nested modules
scenario, no matter how you got it.

Tons of features of the module linking proposal are missing from this
commit. For example instantiation flat out doesn't work, nor does
import/export of modules or instances. That'll be coming as future
commits, but the purpose here is to start laying groundwork in Wasmtime
for handling lots of modules in lots of places.
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Jul 9, 2024
As far as I can tell it's been unused since bytecodealliance#2059 in 2020.
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Jul 9, 2024
As far as I can tell it's been unused since bytecodealliance#2059 in 2020.
github-merge-queue bot pushed a commit that referenced this pull request Jul 10, 2024
As far as I can tell it's been unused since #2059 in 2020.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:wasm cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure lightbeam Issues related to the Lightbeam compiler wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants