Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift CLIF-level differential fuzzer #3038

Merged
merged 20 commits into from
Jul 1, 2021

Conversation

afonso360
Copy link
Contributor

Hey!

Here's an initial version of the cranelift fuzzer.

We generate a random cranelift function based on input bytes from the fuzzer. Currently we can generate 7 instructions, but it is already enough to get us into trouble.

There is also a gen_testcase util that grabs fuzzer artifacts and converts them into clif files for easier testing.

In order for the fuzzer to be useful in testing DivByZero traps we need to disable overflow checks:

 RUSTFLAGS="-C overflow-checks=off" cargo fuzz run fuzzgen

Otherwise, the interpreter crashes immediately with overflows (We also need to fix this).

I've based my design a lot on @fitzgen 's blog post about wasm-smith. Its what made me start thinking about building this for cranelift in the first place. Thanks!

  • Organization

    • I've created a new cargo-fuzz package for cranelift, should I merge the fuzzer with wasmtime's fuzzers?
    • Is there a way to keep track which cranelift features we are covering / not? It would be useful to keep track of this, so that we don't accidentally miss any feature.
  • Design questions:

    • How do we deal with arbitrary memory accesses?

    • How do we deal with traps?

      • This first version is also a POC for this, the idea being that the fuzzer will generate a div by zero or overflow.
      • We should be able to catch this in the interpreter
      • If the program traps, it would be useful to run it on the host and check for the same Trap. Can we do this?
    • How do we deal with infinite loops?

      • My idea here is to allow a fixed number of steps on the interpreter, and if we exceed those, we just ignore this program.
    • Is there any other bad behavior that we need to handle specially?

    • Should we distinguish between pointer types and i64's in the generator? We can just use any i64 and the program will crash until it does a valid thing, This might not be great in terms of fuzzing efficiency but it might generate more interesting programs.

CC: @cfallin, @andrewbrown, @fitzgen, @bjorn3 from the earlier fuzzer discussion

cranelift/filetests/src/function_runner.rs Outdated Show resolved Hide resolved
cranelift/fuzz/.gitignore Outdated Show resolved Hide resolved
cranelift/fuzz/Cargo.toml Outdated Show resolved Hide resolved
cranelift/fuzz/Cargo.toml Outdated Show resolved Hide resolved
cranelift/fuzz/Cargo.toml Outdated Show resolved Hide resolved
cranelift/fuzz/fuzz_targets/fuzzgen.rs Outdated Show resolved Hide resolved
cranelift/fuzz/fuzz_targets/fuzzgen.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Jun 27, 2021
@afonso360
Copy link
Contributor Author

Fixed, Thanks! @bjorn3

@afonso360 afonso360 changed the title Cranelift fuzzer POC Cranelift CLIF-level differential fuzzer Jun 28, 2021
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for taking this on, @afonso360!

I have a bunch of comments below, but I think the general direction here is good.

As a meta-comment, it's pretty clear to me that this is an effort that can take place over multiple PRs over a span of time; it is valuable even to have the first step of "straight-line code with arithmetic operators only". I think that we'll want to outline the next steps, maybe even in a centralized TODO comment. Specifically, I can think of a few things:

  • Generating test inputs with control flow
  • Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
  • Generating memory loads/stores (stack and heap)
  • Full coverage of arithmetic ops
  • Full coverage of SIMD ops (this will take a while undoubtedly!)

And at the testing-harness level, we need a bit more thought about (i) infinite loops and (ii) what to do about traps, though I have some thoughts below.

Anyway, I think we can iterate here to get just straightline testcases with a few ops working, then expand from there!

cranelift/filetests/src/function_runner.rs Outdated Show resolved Hide resolved
cranelift/fuzz/Cargo.toml Outdated Show resolved Hide resolved
use cranelift_interpreter::step::ControlFlow;
use cranelift_interpreter::step::CraneliftTrap;

enum RunResult {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an error-case for "timeout", i.e. ran too long? I'll keep reading to see how infinite loops are caught but this seems like a separate logical case from explicit trap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense. I'm not seeing a quick way to constrain the interpreter to run just N instructions, so lets add that when we implement branches and control flow.

fn run_in_interpreter(interpreter: &mut Interpreter, args: &[DataValue]) -> RunResult {
// The entrypoint function is always 0
let index = FuncIndex::from_u32(0);
let res = interpreter.call_by_index(index, args);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it seems we might want a "run for max N steps" method on the interpreter here. Otherwise, if we have an infinite loop (when fuzz-case generation becomes sophisticated enough for that, if it's not already?), we tie up a core indefinitely.

There's a question whether we step for N steps on both sides (interp and compiled) and then compare state at the end, or if we just reject inputs that run that long. I suspect probably the latter is good enough; the former would require instrumentation in the compiler to count ops and stop. We have "fuel" at the Wasmtime level but it's not built-in to Cranelift itself. And, anything a testcase could do to run indefinitely would be tested by other valid test cases as well, so I think it's fine to avoid these inputs altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when fuzz-case generation becomes sophisticated enough for that, if it's not already?

It isn't simply because we limit the number of generated instructions up to a maximum of 16. Otherwise I think we are simply limited by the input length. On a related note, I moved all of the arbitrary restrictions that we place on the generator into a Config module so that this become more obvious (and so that we can eventually do swarm fuzzing).

There's a question whether we step for N steps on both sides (interp and compiled) and then compare state at the end, or if we just reject inputs that run that long. I suspect probably the latter is good enough

Rejecting inputs that run that long on the interpreter seems like the way to go for me.

cranelift/fuzz/fuzz_targets/fuzzgen.rs Outdated Show resolved Hide resolved
cranelift/fuzzgen/README.md Outdated Show resolved Hide resolved
cranelift/fuzzgen/src/function_generator.rs Outdated Show resolved Hide resolved
// Reuse var
if self.vars_of_type(ty).len() != 0 {
opts.push(|fg, _, ty| {
let opts = fg.vars_of_type(ty);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "available variables" approach is nice, but gets somewhat complicated with control flow, and you can see e.g. in the regalloc2 SSA generator there's a two-staged approach where we decide what values are defined in each block first, then (i) pick among values defined in blocks that dominate us for uses, (ii) ensure we define all vars we said we would in a given block.

So, in this implementation, what happens if the variable is defined, but only in one predecessor? Do we unconditionally re-define it (and let the frontend handle the SSA fixup for that)?

Another workaround is to decide at the top of the function which variables will exist, and assign initial values to all of them. Then all the rest of the instruction outputs are just redefinitions. This is actually probably the easiest and least buggy (which is what we want for a fuzztest generator!) and still shouldn't sacrifice any coverage, since it can turn into arbitrary SSA.

Copy link
Contributor Author

@afonso360 afonso360 Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here was that this set of "available variables" is defined per block not per function (its not currently separated because we generate just one block), when branching or calling return we choose a random number of variables from the current block to transfer onto the next block (depending on the block "signature").

I don't think we have issues with picking variables defined in predecessors because we never do that, and I don't think we can do that with clif? (I may be wrong here)

Another workaround is to decide at the top of the function which variables will exist, and assign initial values to all of them.

Ah! I didn't know that we could transform non SSA code into SSA code. This might be something that we actually want to fuzz as well. In that case I think this is a good way to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when branching ... we choose a random number of variables ... to transfer onto the next block

This definitely works for a straightline sequence of blocks, but how does this work for loops? More generally, if you decide at the end of a block which variables to see as "defined" in the next block, you need to see all in-edges to a block before you know for sure that a variable is defined on every in-edge.

This is the origin of the two-stage approach -- it breaks the cycle: if you pin down what should be defined in each location, you can then, in parallel, both rely on that (in uses) and ensure that (in defs).

I think the easiest answer is to rely on the SSA-builder algorithm built into the variable helper layer which you're using here, and def everything at the top first, then select any variable for each use. The key thing (which I think may be flying under the radar here given the "transfer to the next block" phrasing above) is that "variables" as you're using them are not SSA values or blockparams; the latter are defined by the former depending on where you place the one or more definitions of a variable. (And indeed, a variable can have more than one def.) It's always correct to use a variable that has been defined at least once on every path from the entry block (has some dominating def), so the trivially correct thing to do is define everything (for the first time) in the entry block :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the fuzzer to perform this upfront allocation of variables. I'm slightly concerned that it now seems to be reusing the same variable a lot consecutively, but I think this may be the fuzzer exploring the non ssa to ssa code?

Copy link
Contributor Author

@afonso360 afonso360 Jun 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't seen your reply when writing my previous comment. I've changed it to the upfront allocation and i think it should be better because we are probably going to stress the variable helper a bit more with the fuzzer.

But i'm still not sure i'm following what you are saying.

This definitely works for a straightline sequence of blocks, but how does this work for loops? More generally, if you decide at the end of a block which variables to see as "defined" in the next block, you need to see all in-edges to a block before you know for sure that a variable is defined on every in-edge.

The way I'm thinking about this is as if every block is sort of a function.

function %test(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
    v2 = iconst.i32 10
    jump block1(v1)  ; Here we can choose any of v0, v1, v2

block1(v4: i32):
    v5 = iconst.i32 12
    v6 = iconst.i32 13
    jump block0(v4, v5)              ; Here we can choose either v4, v5 or v6 but never v0,v1,v2
}

So in a sense, what I was thinking was: We generate the next block (or reuse an existing one) see what types it has in its signature and choose variables available in the current block to "call" it with (either block params or values locally defined by us).

This way, we don't need to know who called this block, because we never use variables defined by anyone else.

This is more of a curiosity now, because I've changed into the other approach and think it is better, but I'd like to understand why it wouldn't work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generate the next block (or reuse an existing one) see what types it has in its signature and choose variables available in the current block to "call" it with (either block params or values locally defined by us).

Ah, I see! The key thing here, I think, is that "what types it has in its signature" is decided before code is generated, and so breaks the cycle: this is the equivalent to the pre-pass I mentioned that decides where to define things so that defs and uses can both rely on those decisions. So this approach would also work.

Given that the cranelift-frontend logic is built to accept multi-def vars and generate SSA, I think that the approach this PR takes now (upfront initializations) is still the right one, but your approach would work well for generating the SSA directly, if we decide to do that instead at some point.

cranelift/fuzzgen/src/gen_testcase.rs Outdated Show resolved Hide resolved
cranelift/fuzzgen/src/lib.rs Show resolved Hide resolved
@abrown
Copy link
Contributor

abrown commented Jun 28, 2021

@afonso360, just wanted to say this is shaping up to be a good contribution--nice work!

@github-actions github-actions bot added the fuzzing Issues related to our fuzzing infrastructure label Jun 29, 2021
@github-actions
Copy link

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "cranelift", "fuzzing"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@afonso360
Copy link
Contributor Author

@cfallin Thanks for taking the time to review this!

I think that we'll want to outline the next steps, maybe even in a centralized TODO comment

I think it would be good to have a tracking issue for this so that we could also keep the discussion there. (I can open one if so)

Besides the steps that you mentioned here, we should add the TODO's that were left pending from this PR, namely:

  • Deriving the instructions to be generated from cranelift-meta
  • Reconsider if we want to catch traps and ensure that they produce the same trap both on the interpreter and the host. (As @cfallin points out this is probably more effort than its worth)


// Define variables for the function signature
for (i, param) in sig.params.iter().enumerate() {
let var = self.create_var(&mut builder, param.value_type)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm looking at this again, I don't think its quite right, we probably should be assigning the block params to the pool of variables instead of creating new ones.

I'll look at this again tommorow

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be right? At least, my understanding is that the blockparams are SSA values while the variables are a higher-level abstraction; it would be more confusing to have a pool of both and then selectively lower vars to SSA values or use an SSA value directly, I think.

Copy link
Contributor Author

@afonso360 afonso360 Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this works only because the first block is special and has the function params. I think for the other blocks we have to associate the block params to existing variables? Otherwise we run into the issue you were pointing out earlier where we may not have defined a variable depending on where we came from. (i.e. we created it for a blockparam on a block that we haven't visited yet)

Anyway, I'll probably start working on the control flow improvements today, and should be in a more informed position to address this when I submit the next PR.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates -- this looks good to me and is a great starting point for further work! I'm excited to see what this turns into.

I think there's just one issue with the verify-publish CI job because of the new crate (cranelift-fuzzgen) -- you'll have to add it to a list somewhere. Once that's fixed, I'm happy to merge this.

Regarding the followup work, yes, please do open an issue or issues to track; a single top-level issue with a checklist (features until v1 -- control flow, memory, ... as listed above) would be fine for now, I think.

I assume it is probably not super-useful to add this to our oss-fuzz configuration now, but once it is more featureful, we can talk about that as well!

@afonso360
Copy link
Contributor Author

It should be ok to merge now. I've disabled publishing the fuzzgen crate, as we do with the wasmtime-fuzzing crate. If someone finds it useful in the future, we can always change it.

Thanks to everyone for reviewing and discussing this! 🎉

@abrown abrown merged commit 7453bd5 into bytecodealliance:main Jul 1, 2021
@afonso360 afonso360 deleted the fuzz-gen branch July 1, 2021 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants