Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compilation determinism issue #7443

Merged

Conversation

Ekleog
Copy link
Contributor

@Ekleog Ekleog commented Aug 19, 2022

There was a compilation determinism issue in our instrumentation, due to iterating over a HashMap when generating the thunks. With this, I add:

  • A fuzzer that tries to automatically detect non-determinism instances
  • An output from this fuzzer that generates said non-determinism; auto-generated so that when running cargo test the regression would be caught
  • The fix for the compilation determinism issue per se

I've been running the fuzzer locally for a bit to check there's no other obvious instance of compilation non-determinism, and coverage has already doubled since the fuzzer found the crash I checked in so hopefully other instances of compilation non-determinism would be hard to find.

cc @akhi3030 @matklad

@Ekleog Ekleog requested a review from a team as a code owner August 19, 2022 15:20
@Ekleog Ekleog requested a review from nikurt August 19, 2022 15:20
@@ -3,7 +3,7 @@ use parity_wasm::{
builder,
elements::{self, FunctionType, Internal},
};
use std::collections::HashMap as Map;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should create an issue to review all uses of HashMap, HashSet, etc. in our code to ensure that there aren't any non-determinism issues there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'd be curious to hear some case studies of how to best enforce determinism consistently.

  • The simplest option is indeed to use tree-based collections, but my understanding is that it is pretty inefficient, as collection lookups are usually hot in profiles, and trees are way less CPU&memory efficient than hashmaps. No gut feeling for just how inefficient it is.
  • The second simplest option is to use hash-maps with deterministic iteration (index map, what modern python is doing). Again, this is trading perf for determinism, but I don't know how much perf. I think this would probably be my default pick for making deterministic software.
  • The third option is to make sure that HashMap doesn't expose non-deterministic API (doesn't implement iter). I think this is what rustc is doing, not sure how successful it is.

Interestingly, I think in rust-analyzer we've never ever hit this problem. This is surprising to me, as, given our incremental compilation setup, the non-determinism would lead to straightforward panic, so it'd be hard to miss. I think that is because in rust-analyzer we are using FxHashMap which doesn't do randomization (so, DOS-vulnerable).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akhi3030 I’ve opened #7507 to track doing this :)

runtime/near-vm-runner/src/tests/fuzzers.rs Show resolved Hide resolved
Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this fix the whole class of bugs for us!

@@ -3,7 +3,7 @@ use parity_wasm::{
builder,
elements::{self, FunctionType, Internal},
};
use std::collections::HashMap as Map;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'd be curious to hear some case studies of how to best enforce determinism consistently.

  • The simplest option is indeed to use tree-based collections, but my understanding is that it is pretty inefficient, as collection lookups are usually hot in profiles, and trees are way less CPU&memory efficient than hashmaps. No gut feeling for just how inefficient it is.
  • The second simplest option is to use hash-maps with deterministic iteration (index map, what modern python is doing). Again, this is trading perf for determinism, but I don't know how much perf. I think this would probably be my default pick for making deterministic software.
  • The third option is to make sure that HashMap doesn't expose non-deterministic API (doesn't implement iter). I think this is what rustc is doing, not sure how successful it is.

Interestingly, I think in rust-analyzer we've never ever hit this problem. This is surprising to me, as, given our incremental compilation setup, the non-determinism would lead to straightforward panic, so it'd be hard to miss. I think that is because in rust-analyzer we are using FxHashMap which doesn't do randomization (so, DOS-vulnerable).

@@ -167,3 +170,31 @@ fn wasmer2_and_wasmtime_agree() {
assert_eq!(wasmer2, wasmtime);
});
}

#[test]
fn wasmer2_is_reproducible() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we also attach a crash file here crash-7efd70f50949e0fdff8f4c8d6cc66709faf6d3cf. I am a tiny bit skeptical of the utility of them -- it seems it's relatively easy for the crash file to become obsolete with unrelated changes.

I think what is in the crash file is data: &[u8], the seed we pass to arbitrary. But that means that, eg, upgrading arbitrary can chance the .wasm that is generated, which might break the test.

It seems to me that, for generative fuzzing [u8] -> T we should save T, rather than [u8], as a crash. So, in this case, an actual .wasm.

Action items here:

  • this probably wants some design discussion on the bolero repository
  • I think I am fine with including the current crash file here, but I am curious whether the default 1000 iterations would be enough to get it? If it's easy to discover, I suggest not adding a file maybe.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While what you say makes sense, what would be a better option in your opinion? I don’t think we can do this straight from bolero, because we’d have to have some serialization method for storing the crash report, and Arbitrary is as good a serialization method as any other afaict.

The alternative I can think of that would make sense would be to just add a unit test instead of checking in a crash report. I didn’t go that way first, but could switch to this method if you think it’d be better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and, it took my computer approx. 20min/1hr to find the crash from scratch, so unfortunately I don’t think just not adding an explicit test is a solution. (Also, I let the computer search for a few more hours after that and didn’t find any other determinism issue, so hopefully the rest of our code is ok-ish)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and, it took my computer approx. 20min/1hr to find the crash from scratch

Yeah, that's a good argument for keeping this as is.

I don’t think we can do this straight from bolero, because we’d have to have some serialization method for storing the crash report, and Arbitrary is as good a serialization method as any other afaict.

That's the crux of my thinking here -- I do think that Arbitrary is less good serialization method, because it is meaningfully less stable. Specifically:

  • if we store raw bytes, and do cargo upgrade -p wasm-smith, the test could break
  • if we store wasm generated by arbitrary by some particualar version of wasm-smith, than the test would be reproducible even after upgrades.

A slightly different way to get at this: it might be the case that some .wasms are impossible to generate via wasm-smith, while raw .wasm naturally can represent any .wasm. The generation routine is not necessary surjective, while proper serialization is.

OTOH, I also don't see small-incremental solutions here. A "large" solution would be something like the following

traint TestInput
  : arbitrary::Arbitrary // to generate from raw bytes
  + Display // to show on failures
{
  // to save/load examples independent from arbitrary,
  // or for manual table-driven tests. 
  fn dump(&self) -> Vec<u8> 
  fn load(Vec<u8>) -> Result<Self>
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm so I’m not entirely sure how that’d work out, because even Arbitrary is but one of the ways to generate inputs with bolero, there’s also bolero’s own TypeGenerator and ValueGenerator, and I think there are even other generators they’re considering adding.

That said what you’re saying does make sense to me: it’d be good to be able to provide a "value" instead of a fuzzer seed, if only so that it becomes possible to use a known crash input for the fuzzer. I had opened camshaft/bolero#91 to track this, though I’m not sure how doable-in-practice that’d be. Does it make sense to you?

use near_primitives::runtime::fees::RuntimeFeesConfig;
use near_primitives::version::PROTOCOL_VERSION;
use near_vm_errors::{FunctionCallError, VMError};
use near_vm_logic::mocks::mock_external::MockedExternal;
use near_vm_logic::{VMConfig, VMContext};
use wasmer_engine::Executable;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to the PR, but we don't leave blank lines between imports.

runtime/near-vm-runner/src/tests/fuzzers.rs Show resolved Hide resolved
Comment on lines +193 to +196
match first_hash {
None => first_hash = Some(hash),
Some(h) => assert_eq!(h, hash),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

let first_hash = *first_hash.get_or_insert(hash);
assert_eq!(first_hash, hash);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm so this would definitely work, but I’m not sure the loss in readability is worth the gain in conciseness?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was actually a bit confused about the original control flow here, hence I tried to think about a different way to write this, but it's true that get_or_insert isn't necessary readable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the following pattern?

let mut prev_hash = None;
for 0..3 {
    ...
    if let Some(h) = prev_hash {
        assert_eq!(h, hash);
    }
    prev_hash = Some(hash);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe a mix of the two, like this?

let mut first_hash = None;
for _ in 0..3 {
    // ...
    if let None = first_hash {
        first_hash = Some(hash);
    }
    assert_eq!(first_hash, Some(hash));
}

Now I must say I don’t have any strong preferences, so unless someone comes forward with a strong preference for one of these styles I’ll just go forward and land it :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't have a strong opinion at all. I just saw a bike shed like discussion and felt like jumping in =).

@matklad
Copy link
Contributor

matklad commented Sep 1, 2022

@Ekleog-NEAR do you want to apply automerge label here? By default, PR author merges PRs: https://github.com/near/nearcore/blob/master/CONTRIBUTING.md#after-the-pr-is-submitted

@Ekleog-NEAR
Copy link
Collaborator

Just did it, thank you for the ping!

@near-bulldozer near-bulldozer bot merged commit 93c6372 into near:master Sep 1, 2022
nikurt pushed a commit that referenced this pull request Nov 9, 2022
There was a compilation determinism issue in our instrumentation, due to iterating over a HashMap when generating the thunks. With this, I add:
- A fuzzer that tries to automatically detect non-determinism instances
- An output from this fuzzer that generates said non-determinism; auto-generated so that when running `cargo test` the regression would be caught
- The fix for the compilation determinism issue per se

I've been running the fuzzer locally for a bit to check there's no other obvious instance of compilation non-determinism, and coverage has already doubled since the fuzzer found the crash I checked in so hopefully other instances of compilation non-determinism would be hard to find.

cc @akhi3030 @matklad
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants