-
Notifications
You must be signed in to change notification settings - Fork 624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix compilation determinism issue #7443
Fix compilation determinism issue #7443
Conversation
@@ -3,7 +3,7 @@ use parity_wasm::{ | |||
builder, | |||
elements::{self, FunctionType, Internal}, | |||
}; | |||
use std::collections::HashMap as Map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should create an issue to review all uses of HashMap
, HashSet
, etc. in our code to ensure that there aren't any non-determinism issues there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I'd be curious to hear some case studies of how to best enforce determinism consistently.
- The simplest option is indeed to use tree-based collections, but my understanding is that it is pretty inefficient, as collection lookups are usually hot in profiles, and trees are way less CPU&memory efficient than hashmaps. No gut feeling for just how inefficient it is.
- The second simplest option is to use hash-maps with deterministic iteration (index map, what modern python is doing). Again, this is trading perf for determinism, but I don't know how much perf. I think this would probably be my default pick for making deterministic software.
- The third option is to make sure that
HashMap
doesn't expose non-deterministic API (doesn't implementiter
). I think this is whatrustc
is doing, not sure how successful it is.
Interestingly, I think in rust-analyzer
we've never ever hit this problem. This is surprising to me, as, given our incremental compilation setup, the non-determinism would lead to straightforward panic, so it'd be hard to miss. I think that is because in rust-analyzer we are using FxHashMap
which doesn't do randomization (so, DOS-vulnerable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this fix the whole class of bugs for us!
@@ -3,7 +3,7 @@ use parity_wasm::{ | |||
builder, | |||
elements::{self, FunctionType, Internal}, | |||
}; | |||
use std::collections::HashMap as Map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I'd be curious to hear some case studies of how to best enforce determinism consistently.
- The simplest option is indeed to use tree-based collections, but my understanding is that it is pretty inefficient, as collection lookups are usually hot in profiles, and trees are way less CPU&memory efficient than hashmaps. No gut feeling for just how inefficient it is.
- The second simplest option is to use hash-maps with deterministic iteration (index map, what modern python is doing). Again, this is trading perf for determinism, but I don't know how much perf. I think this would probably be my default pick for making deterministic software.
- The third option is to make sure that
HashMap
doesn't expose non-deterministic API (doesn't implementiter
). I think this is whatrustc
is doing, not sure how successful it is.
Interestingly, I think in rust-analyzer
we've never ever hit this problem. This is surprising to me, as, given our incremental compilation setup, the non-determinism would lead to straightforward panic, so it'd be hard to miss. I think that is because in rust-analyzer we are using FxHashMap
which doesn't do randomization (so, DOS-vulnerable).
@@ -167,3 +170,31 @@ fn wasmer2_and_wasmtime_agree() { | |||
assert_eq!(wasmer2, wasmtime); | |||
}); | |||
} | |||
|
|||
#[test] | |||
fn wasmer2_is_reproducible() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we also attach a crash file here crash-7efd70f50949e0fdff8f4c8d6cc66709faf6d3cf
. I am a tiny bit skeptical of the utility of them -- it seems it's relatively easy for the crash file to become obsolete with unrelated changes.
I think what is in the crash file is data: &[u8]
, the seed we pass to arbitrary. But that means that, eg, upgrading arbitrary
can chance the .wasm
that is generated, which might break the test.
It seems to me that, for generative fuzzing [u8] -> T
we should save T
, rather than [u8]
, as a crash. So, in this case, an actual .wasm
.
Action items here:
- this probably wants some design discussion on the bolero repository
- I think I am fine with including the current crash file here, but I am curious whether the default 1000 iterations would be enough to get it? If it's easy to discover, I suggest not adding a file maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While what you say makes sense, what would be a better option in your opinion? I don’t think we can do this straight from bolero, because we’d have to have some serialization method for storing the crash report, and Arbitrary is as good a serialization method as any other afaict.
The alternative I can think of that would make sense would be to just add a unit test instead of checking in a crash report. I didn’t go that way first, but could switch to this method if you think it’d be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and, it took my computer approx. 20min/1hr to find the crash from scratch, so unfortunately I don’t think just not adding an explicit test is a solution. (Also, I let the computer search for a few more hours after that and didn’t find any other determinism issue, so hopefully the rest of our code is ok-ish)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and, it took my computer approx. 20min/1hr to find the crash from scratch
Yeah, that's a good argument for keeping this as is.
I don’t think we can do this straight from bolero, because we’d have to have some serialization method for storing the crash report, and Arbitrary is as good a serialization method as any other afaict.
That's the crux of my thinking here -- I do think that Arbitrary
is less good serialization method, because it is meaningfully less stable. Specifically:
- if we store raw bytes, and do
cargo upgrade -p wasm-smith
, the test could break - if we store wasm generated by arbitrary by some particualar version of
wasm-smith
, than the test would be reproducible even after upgrades.
A slightly different way to get at this: it might be the case that some .wasm
s are impossible to generate via wasm-smith
, while raw .wasm
naturally can represent any .wasm
. The generation routine is not necessary surjective, while proper serialization is.
OTOH, I also don't see small-incremental solutions here. A "large" solution would be something like the following
traint TestInput
: arbitrary::Arbitrary // to generate from raw bytes
+ Display // to show on failures
{
// to save/load examples independent from arbitrary,
// or for manual table-driven tests.
fn dump(&self) -> Vec<u8>
fn load(Vec<u8>) -> Result<Self>
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm so I’m not entirely sure how that’d work out, because even Arbitrary is but one of the ways to generate inputs with bolero, there’s also bolero’s own TypeGenerator and ValueGenerator, and I think there are even other generators they’re considering adding.
That said what you’re saying does make sense to me: it’d be good to be able to provide a "value" instead of a fuzzer seed, if only so that it becomes possible to use a known crash input for the fuzzer. I had opened camshaft/bolero#91 to track this, though I’m not sure how doable-in-practice that’d be. Does it make sense to you?
use near_primitives::runtime::fees::RuntimeFeesConfig; | ||
use near_primitives::version::PROTOCOL_VERSION; | ||
use near_vm_errors::{FunctionCallError, VMError}; | ||
use near_vm_logic::mocks::mock_external::MockedExternal; | ||
use near_vm_logic::{VMConfig, VMContext}; | ||
use wasmer_engine::Executable; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated to the PR, but we don't leave blank lines between imports.
match first_hash { | ||
None => first_hash = Some(hash), | ||
Some(h) => assert_eq!(h, hash), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
let first_hash = *first_hash.get_or_insert(hash);
assert_eq!(first_hash, hash);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm so this would definitely work, but I’m not sure the loss in readability is worth the gain in conciseness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was actually a bit confused about the original control flow here, hence I tried to think about a different way to write this, but it's true that get_or_insert
isn't necessary readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the following pattern?
let mut prev_hash = None;
for 0..3 {
...
if let Some(h) = prev_hash {
assert_eq!(h, hash);
}
prev_hash = Some(hash);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe a mix of the two, like this?
let mut first_hash = None;
for _ in 0..3 {
// ...
if let None = first_hash {
first_hash = Some(hash);
}
assert_eq!(first_hash, Some(hash));
}
Now I must say I don’t have any strong preferences, so unless someone comes forward with a strong preference for one of these styles I’ll just go forward and land it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't have a strong opinion at all. I just saw a bike shed like discussion and felt like jumping in =).
@Ekleog-NEAR do you want to apply automerge label here? By default, PR author merges PRs: https://github.com/near/nearcore/blob/master/CONTRIBUTING.md#after-the-pr-is-submitted |
Just did it, thank you for the ping! |
There was a compilation determinism issue in our instrumentation, due to iterating over a HashMap when generating the thunks. With this, I add: - A fuzzer that tries to automatically detect non-determinism instances - An output from this fuzzer that generates said non-determinism; auto-generated so that when running `cargo test` the regression would be caught - The fix for the compilation determinism issue per se I've been running the fuzzer locally for a bit to check there's no other obvious instance of compilation non-determinism, and coverage has already doubled since the fuzzer found the crash I checked in so hopefully other instances of compilation non-determinism would be hard to find. cc @akhi3030 @matklad
There was a compilation determinism issue in our instrumentation, due to iterating over a HashMap when generating the thunks. With this, I add:
cargo test
the regression would be caughtI've been running the fuzzer locally for a bit to check there's no other obvious instance of compilation non-determinism, and coverage has already doubled since the fuzzer found the crash I checked in so hopefully other instances of compilation non-determinism would be hard to find.
cc @akhi3030 @matklad