Add benchmarks #1660
Conversation
…rce location tracking for traps, start to support multi-value. The only thing that still needs to be supported for multi-value is stack returns, but we need to make it compatible with Cranelift.
…impossible) instead of panicking
…ock args (since this stack space may become invalid)
Thanks for this! In terms of code, my main concern is that there's a lot of duplication with how *.wast files are handled, which I think we'll want to reduce. Beyond that, I'm a little concerned that the spec tests aren't very meaningful as benchmarks: they're either extremely small modules or aren't really exercising interesting performance characteristics, being mostly small correctness tests.
That being said, I don't really know what a good benchmark suite would look like, and it's perhaps better to have something than nothing at all!
@@ -0,0 +1,272 @@
#![cfg(feature = "benches")]
#![feature(test)]
Could this file be renamed to perhaps spec.rs, or something that indicates that it's benchmarking the spec test suite?
FWIW, I don't think the spec test suite is really the most interesting of benchmarks, but for our own internal tracking and performance monitoring it probably isn't a bad thing to track!
@@ -0,0 +1,272 @@
#![cfg(feature = "benches")]
Would it be possible to use a hand-rolled harness = false scheme, or something like criterion, that works on stable?
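For reference, a minimal sketch of what the criterion route could look like. Everything here is a placeholder rather than code from this PR: the bench file name spec.rs, the tiny module being compiled, and the wasmtime calls (which follow the crate's public Engine/Module API).

# Cargo.toml: criterion brings its own harness, so no nightly #![feature(test)]
[dev-dependencies]
criterion = "0.3"

[[bench]]
name = "spec"
harness = false

// benches/spec.rs (hypothetical): measures module compilation on stable.
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_compile(c: &mut Criterion) {
    let engine = wasmtime::Engine::default();
    // Placeholder module; a real benchmark would compile the spec test files.
    let wat = r#"(module (func (export "f") (result i32) i32.const 42))"#;
    c.bench_function("compile-small-module", |b| {
        b.iter(|| wasmtime::Module::new(&engine, wat).unwrap());
    });
}

criterion_group!(benches, bench_compile);
criterion_main!(benches);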
.wasm_reference_types(reftypes)
.wasm_multi_value(multi_val)
.strategy(strategy)?
.cranelift_debug_verifier(cfg!(debug_assertions));
I think this is duplicated with the testing code as well as with the function below; could this all be consolidated into one function, perhaps in the wasmtime-wast crate itself?
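For illustration, the consolidation could be as small as a shared helper along these lines. The name spectest_config is made up; the body just mirrors the builder chain in the diff above (including strategy returning a Result, as it does in this PR's snippet).

use anyhow::Result;
use wasmtime::{Config, Strategy};

// Hypothetical shared helper (e.g. in wasmtime-wast) so the benches, the
// *.wast tests, and the function below all build their Config the same way.
pub fn spectest_config(strategy: Strategy, reftypes: bool, multi_val: bool) -> Result<Config> {
    let mut config = Config::new();
    config
        .wasm_reference_types(reftypes)
        .wasm_multi_value(multi_val)
        .strategy(strategy)?
        .cranelift_debug_verifier(cfg!(debug_assertions));
    Ok(config)
}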
let file_contents = std::fs::read_to_string(path)
    .with_context(|| format!("failed to read `{}`", path.display()))?;
let buf = ParseBuffer::new(&file_contents).map_err(adjust_wast)?;
let ast = parser::parse::<wast::Wast>(&buf).map_err(adjust_wast)?;
Could this be deduplicated with the wasmtime-wast crate? Perhaps something where a function to handle each directive is provided and then we say "run this file", or something like that?
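One hedged sketch of that shape, mirroring the parsing code above: wasmtime-wast would own the file reading and parsing, and the caller supplies a callback per directive. The name for_each_directive is made up, and the error handling is simplified to anyhow instead of the adjust_wast helper.

use std::path::Path;
use anyhow::{Context, Result};
use wast::parser::{self, ParseBuffer};

// Hypothetical: the test runner and the benchmarks both call this and decide
// for themselves what to do with each directive.
pub fn for_each_directive<F>(path: &Path, mut handle: F) -> Result<()>
where
    F: for<'a> FnMut(wast::WastDirective<'a>) -> Result<()>,
{
    let contents = std::fs::read_to_string(path)
        .with_context(|| format!("failed to read `{}`", path.display()))?;
    let buf = ParseBuffer::new(&contents)?;
    let wast = parser::parse::<wast::Wast>(&buf)?;
    for directive in wast.directives {
        handle(directive)?;
    }
    Ok(())
}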
…ed to conditionally jump to blocks with heterogeneous calling conventions
…with multiple-return functions fail to compile and that is currently not implemented correctly
…hat somehow weren't caught by any tests until now
So I wrote these benchmarks with the aim of getting a vague idea of how Lightbeam compares to Cranelift in compile-time and runtime performance; you can see the results in this gist: https://gist.github.com/Vurich/8696e67180aa3c93b4548fb1f298c29e.
I'd love any criticism of the methodology of these benchmarks with respect to the likely reliability of the results. I tried to dig into the code to make sure that, in the loop being measured, what I thought was being run was actually being run (and what I didn't think would be run was not being run), but I'd welcome any holes being poked in the way these benchmarks are written.
Unfortunately, it changes the API of
wasmtime-wast
in order to make public a couple of functions (and one type returned by one of those functions) that the benchmarks need. I considered making these functions public only when cfg(test) is active, or inlining the benchmarks into that file itself instead of making a separate benches binary, but I figured it would be more helpful to ask here what people would consider the best way to go about it. I personally didn't see anything majorly problematic with just making these methods public, and it was the easiest way to implement this, so that's what I chose for the original version of this PR, with the understanding that I can change it later if necessary.
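For what it's worth, one hedged sketch of the gating alternative is below. Note that cfg(test) in a library crate isn't active when a downstream crate builds its benchmarks, so a feature gate is the more workable variant of that idea. All names here (the spec_helpers module and the re-exported items) are placeholders, not the actual wasmtime-wast items this PR exposes.

// In wasmtime-wast's lib.rs (hypothetical layout): the module stays private
// by default; only the `benches` feature re-exports the helpers, so the
// crate's normal public API is unchanged for other consumers.
mod spec_helpers;

#[cfg(feature = "benches")]
pub use crate::spec_helpers::{run_spec_file, SpecOutcome};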