Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fuzz testing to Module/ParseOutput #3

Merged
merged 17 commits into from
Mar 5, 2023

Conversation

jasikpark
Copy link
Contributor

@jasikpark jasikpark commented Mar 3, 2023

First off: Thanks for working on this project! I'm excited to see where this goes 😁!

This PR adds a cargo-fuzz fuzzer for ParseOutput

Seemed like a good first-step at adding some fuzz testing validation to the project :p

The methodology is that code printed out by the project should parse correctly a second time if it parsed correctly the first time.

I'm currently hitting an unimplemented!() match arm in testing, so I'll have to learn how to debug rust to be of help w/ a bugfix.

To run the fuzzer:

cargo install cargo-fuzz
rustup default nightly
cd parser/fuzz
cargo fuzz run statements

This fuzzer is a clone of the unit test in https://github.com/kaleidawave/ezno/blob/main/parser/tests/statements.rs

@jasikpark
Copy link
Contributor Author

Current failing fuzz run:

❯ cargo fuzz run statements
   Compiling ezno-parser v0.0.1 (/Users/calebjasik-defined/Github/ezno/parser)
   Compiling ezno-parser-fuzz v0.0.0 (/Users/calebjasik-defined/Github/ezno/parser/fuzz)
    Finished release [optimized + debuginfo] target(s) in 1m 06s
    Finished release [optimized + debuginfo] target(s) in 0.02s
     Running `target/aarch64-apple-darwin/release/statements -artifact_prefix=/Users/calebjasik-defined/Github/ezno/parser/fuzz/artifacts/statements/ /Users/calebjasik-defined/Github/ezno/parser/fuzz/corpus/statements`
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1060211075
INFO: Loaded 1 modules   (243109 inline 8-bit counters): 243109 [0x105754220, 0x10578f7c5),
INFO: Loaded 1 PC tables (243109 PCs): 243109 [0x10578f7c8,0x105b45218),
INFO:       63 files found in /Users/calebjasik-defined/Github/ezno/parser/fuzz/corpus/statements
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: seed corpus: files: 63 min: 1b max: 4b total: 177b rss: 64Mb
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /Users/calebjasik-defined/Github/ezno/parser/src/expressions/template_literal.rs:143:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
==86685== ERROR: libFuzzer: deadly signal
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /Users/calebjasik-defined/Github/ezno/parser/src/expressions/template_literal.rs:143:22
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /Users/calebjasik-defined/Github/ezno/parser/src/expressions/template_literal.rs:143:22
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /Users/calebjasik-defined/Github/ezno/parser/src/expressions/template_literal.rs:143:22
────────────────────────────────────────────────────────────────────────────────

Error: Fuzz target exited with signal: 6 (SIGABRT)

@jasikpark jasikpark marked this pull request as ready for review March 3, 2023 21:03
.gitignore Outdated Show resolved Hide resolved
parser/fuzz/Cargo.toml Show resolved Hide resolved
parser/fuzz/fuzz_targets/statements.rs Outdated Show resolved Hide resolved
@kaleidawave
Copy link
Owner

kaleidawave commented Mar 4, 2023

Wow this is awesome. I don't know a huge amount about testing practises and even less about fuzz testing, so I am learning some interesting things here.

So as I am learning this checks that the parsing (and lexing step) doesn't panic at runtime for a set of random string inputs. So it should bring up problems like out of bounds indexing, unhandled character codes or places which I think are unreachable! but actually are 😀.

It looks like the template literal parsing assumes that the lexing produces a certain sequence of tokens but another token has crept in there? Will merge this and fix afterwards!

This test fuzzes parsing Modules (which is a collection of statements), so it should cover all reachable AST which is nice!

You have also added it being printed back to a string and then reparsed so it doubles up as testing that the string output has the same logic as parsing! I wonder how often the string ends up being valid code? but still nice.

This is great addition to enforcing stability, so eager to merge. Just a few things before I merge:

  • With the vscode rust analyzer settings, wondering if there is a better way by adding it to the Cargo.toml workspace list?
  • Don't know what the practise is, but wondering if the naming the folder fuzz-testing or putting the word test in there somewhere might make it clearer what it is for.
  • If you know anything about GitHub actions, it would be great to add this to the current CI check script. Either inside the tests step or alongside in parallel. Don't worry if you don't, I will try and figure it out later.
  • To help with finding this change, the PR might be better named Add fuzz testing to Module rather than ParseOutput

Other comments were helpful and are okay! Are there any objections to a squash merge?

@kaleidawave kaleidawave added parser Related to Ezno's syntax parser, AST definitions and output testing-and-documentation labels Mar 4, 2023
@jasikpark jasikpark changed the title Add fuzzer for ParseOutput Add fuzz testing to Module/ParseOutput Mar 4, 2023
@jasikpark
Copy link
Contributor Author

the fuzz directory is what I've seen as standard across cargo-fuzz projects - maybe a FUZZING.md file w/ docs would be helpful?

@jasikpark
Copy link
Contributor Author

I think https://google.github.io/clusterfuzzlite/ is the best option for CI other than just running the fuzz tests with a max fuzz time..


# Prevent this from interfering with workspaces
[workspace]
members = ["."]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line makes me believe it shouldn't be added to the larger workspace.

If only rust-analyzer was smarter 😭

@addisoncrump
Copy link
Contributor

addisoncrump commented Mar 4, 2023

Another alternative here is to simply take the corpus entries as generated and make unit tests with them. This is ultimately one of the benefits of this fuzzer is that you can use it to generate test cases automatically, and since most bugs are discovered merely by hitting the code (having trouble finding a citation for this, but it's empirically true).

As it stands, this fuzzer will have trouble "solving" the grammar since it mutates text rather than mutating source code, and will have difficulty exploring the full program without grammar inference. For an example of a fuzzer which generates (mostly) valid ASTs, you should see: https://github.com/boa-dev/boa/tree/main/fuzz
(this one uses arbitrary to generate valid ASTs but still uses the round-trip property of parse/unparse to test the parser)

There are a lot of great options for fuzzing with Rust, but ultimately, I would recommend using this particular parser as a testcase generator for unit tests, then switch over to CI fuzzing with a grammar-aware fuzzer once you move beyond the parsing phase. Given that you have most of what you need already to do this, it should be fairly quick work. 🙂

@addisoncrump
Copy link
Contributor

addisoncrump commented Mar 4, 2023

I submitted a PR to @jasikpark's repo with a structured fuzzer. It will find different bugs than the string-based fuzzer, so they operate as different testing tools.

Also, given that there doesn't seem to be much unsafe in this crate, you can safely gain quite a bit of speed by executing cargo fuzz with cargo fuzz run -s none ...

use pretty_assertions::assert_eq;

fn do_fuzz(data: common::FuzzSource) -> Corpus {
let input = data.source;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be valid JS code in this instance, so we should have fewer Corpus:Rejects / failed parses

@jasikpark jasikpark requested a review from kaleidawave March 4, 2023 17:45
@kaleidawave
Copy link
Owner

Awesome, so looking at the latest commits it is reusing the fuzzing tests from boa. And (for reasons) they don't publish it as a library. So build.rs downloads it so we can reuse it. Little bit hacky but whatever.

So @VTCAKAVSMoACE s changes mean that the fuzzing input is strings that may be actual JS syntax? Is there a way to see some examples of that?

Is there a need for both statements.rs and structured.rs? Do they cover different things?

Once I understand that it will be nearly to merge.

I will do a few things just before:

  • Add a README for the fuzzing crate. To explain what it is, how to use and acknowledge boa.
  • I like the .vscode settings addition. But to keep this repo lean and because it won't apply to people who use other editors, I will leave it out for now. Ideally fuzzing is now setup we shouldn't need to change it often and so won't need the editor completions in it. The vscode settings tip is really useful so will add that to the README for those who want to work on fuzzing. I also think Rust analyzer should work if you open the project at the root of the fuzzing crate so will mention that.
  • Add the fuzzing command to the GitHub actions CI Rust check

@addisoncrump
Copy link
Contributor

addisoncrump commented Mar 4, 2023

My changes should always be valid JS syntax. This will help you find other problems in your code that are more high-level issues; where @jasikpark's fuzzer will generally find lexing issues, structured will help you find high level issues in parsing. For example, running structured for a few seconds gives two different (potential) bugs:

  • return; statements result in a difference in output (unsure of cause)
  • do {} while (...); panics with unimplemented

You can view the source code triggering the bug after the crash or by using cargo fuzz fmt structured path/to/testcase (using the same flags as run, e.g. cargo fuzz fmt -s none ...)

@addisoncrump
Copy link
Contributor

addisoncrump commented Mar 4, 2023

Should probably run the fuzz tests with a timeout and with -s none 🙂

For now, they will almost certainly be failing; as a general rule, fuzzing will find unbelievably specific problems 😅

@kaleidawave
Copy link
Owner

Awesome, I am windows (which as I have just found out, cargo-fuzz doesn't work on) so just wanted to see the output.

I think the commands I should be running in the CI are then: (with the || true so it runs despite it returning an error code)

cargo fuzz run statements -- -s none -timeout 120 || true
cargo fuzz run structured -- -s none -timeout 120 || true

I have added the README and am just wondering before I press the merge button whether statements and structured are the best names for the fuzz tests?

Even though merging will show red lights on main, it's fine as it is only highlighting existing errors, not introducing new ones. From there can make issues for the errors it has found, then begin work on those.

Will check back in on it tomorrow morning 😄

@addisoncrump
Copy link
Contributor

addisoncrump commented Mar 4, 2023

Here are the full commands I recommend:

cargo fuzz run -s none statements -- -timeout=10 -max_total_time=120 -use_value_profile=1 || true
cargo fuzz run -s none structured -- -timeout=10 -max_total_time=120 -use_value_profile=1 || true

You can review all of the available options here: https://llvm.org/docs/LibFuzzer.html#options

…ctured`

This also uses the timeout commands that VTCAKAVSMoACE recommended for CI
@jasikpark
Copy link
Contributor Author

If we run the fuzzers concurrently, it will take 240s for the fuzzer step in CI..

ig that's fine if they aren't required + they're catching stuff rn

@kaleidawave
Copy link
Owner

Ah it should be fine. It currently takes 7 mins in sequence, which isn't terrible.

Thanks for the file rename and the README additions.

The CI still needs some adjustment. It should run the two fuzzing tests in parallel. That would also allow GH actions to show a red cross for the failings. It currently dumps a lot of other information, don't know if there is a way to just the important lines. cargo fuzz should also see if there is a way to use a existing build.

But that is a different issue only for CI and needs some other stuff, so will open that separately.

This looks good to go! Ezno's parser now has fuzzing tests 🎉

@kaleidawave kaleidawave merged commit 97098d1 into kaleidawave:main Mar 5, 2023
@jasikpark jasikpark deleted the add-a-parser-fuzzer branch March 5, 2023 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Related to Ezno's syntax parser, AST definitions and output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants