Add fuzz testing to `Module`/`ParseOutput` #3

jasikpark · 2023-03-03T20:59:04Z

First off: Thanks for working on this project! I'm excited to see where this goes 😁!

This PR adds a cargo-fuzz fuzzer for ParseOutput

Seemed like a good first-step at adding some fuzz testing validation to the project :p

The methodology is that code printed out by the project should parse correctly a second time if it parsed correctly the first time.

I'm currently hitting an unimplemented!() match arm in testing, so I'll have to learn how to debug rust to be of help w/ a bugfix.

To run the fuzzer:

cargo install cargo-fuzz
rustup default nightly
cd parser/fuzz
cargo fuzz run statements

This fuzzer is a clone of the unit test in https://github.com/kaleidawave/ezno/blob/main/parser/tests/statements.rs

… rust-analyzer

jasikpark · 2023-03-03T21:02:27Z

Current failing fuzz run:

❯ cargo fuzz run statements
   Compiling ezno-parser v0.0.1 (/Users/calebjasik-defined/Github/ezno/parser)
   Compiling ezno-parser-fuzz v0.0.0 (/Users/calebjasik-defined/Github/ezno/parser/fuzz)
    Finished release [optimized + debuginfo] target(s) in 1m 06s
    Finished release [optimized + debuginfo] target(s) in 0.02s
     Running `target/aarch64-apple-darwin/release/statements -artifact_prefix=/Users/calebjasik-defined/Github/ezno/parser/fuzz/artifacts/statements/ /Users/calebjasik-defined/Github/ezno/parser/fuzz/corpus/statements`
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1060211075
INFO: Loaded 1 modules   (243109 inline 8-bit counters): 243109 [0x105754220, 0x10578f7c5),
INFO: Loaded 1 PC tables (243109 PCs): 243109 [0x10578f7c8,0x105b45218),
INFO:       63 files found in /Users/calebjasik-defined/Github/ezno/parser/fuzz/corpus/statements
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: seed corpus: files: 63 min: 1b max: 4b total: 177b rss: 64Mb
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /Users/calebjasik-defined/Github/ezno/parser/src/expressions/template_literal.rs:143:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
==86685== ERROR: libFuzzer: deadly signal
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /Users/calebjasik-defined/Github/ezno/parser/src/expressions/template_literal.rs:143:22
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /Users/calebjasik-defined/Github/ezno/parser/src/expressions/template_literal.rs:143:22
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /Users/calebjasik-defined/Github/ezno/parser/src/expressions/template_literal.rs:143:22
────────────────────────────────────────────────────────────────────────────────

Error: Fuzz target exited with signal: 6 (SIGABRT)

.gitignore

parser/fuzz/Cargo.toml

parser/fuzz/fuzz_targets/statements.rs

kaleidawave · 2023-03-04T09:35:27Z

Wow this is awesome. I don't know a huge amount about testing practises and even less about fuzz testing, so I am learning some interesting things here.

So as I am learning this checks that the parsing (and lexing step) doesn't panic at runtime for a set of random string inputs. So it should bring up problems like out of bounds indexing, unhandled character codes or places which I think are unreachable! but actually are 😀.

It looks like the template literal parsing assumes that the lexing produces a certain sequence of tokens but another token has crept in there? Will merge this and fix afterwards!

This test fuzzes parsing Modules (which is a collection of statements), so it should cover all reachable AST which is nice!

You have also added it being printed back to a string and then reparsed so it doubles up as testing that the string output has the same logic as parsing! I wonder how often the string ends up being valid code? but still nice.

This is great addition to enforcing stability, so eager to merge. Just a few things before I merge:

With the vscode rust analyzer settings, wondering if there is a better way by adding it to the Cargo.toml workspace list?
Don't know what the practise is, but wondering if the naming the folder fuzz-testing or putting the word test in there somewhere might make it clearer what it is for.
If you know anything about GitHub actions, it would be great to add this to the current CI check script. Either inside the tests step or alongside in parallel. Don't worry if you don't, I will try and figure it out later.
To help with finding this change, the PR might be better named Add fuzz testing to Module rather than ParseOutput

Other comments were helpful and are okay! Are there any objections to a squash merge?

jasikpark · 2023-03-04T15:33:36Z

the fuzz directory is what I've seen as standard across cargo-fuzz projects - maybe a FUZZING.md file w/ docs would be helpful?

jasikpark · 2023-03-04T15:34:59Z

I think https://google.github.io/clusterfuzzlite/ is the best option for CI other than just running the fuzz tests with a max fuzz time..

jasikpark · 2023-03-04T15:47:36Z

parser/fuzz/Cargo.toml

+
+# Prevent this from interfering with workspaces
+[workspace]
+  members = ["."]


This line makes me believe it shouldn't be added to the larger workspace.

If only rust-analyzer was smarter 😭

addisoncrump · 2023-03-04T15:47:47Z

Another alternative here is to simply take the corpus entries as generated and make unit tests with them. This is ultimately one of the benefits of this fuzzer is that you can use it to generate test cases automatically, and since most bugs are discovered merely by hitting the code (having trouble finding a citation for this, but it's empirically true).

As it stands, this fuzzer will have trouble "solving" the grammar since it mutates text rather than mutating source code, and will have difficulty exploring the full program without grammar inference. For an example of a fuzzer which generates (mostly) valid ASTs, you should see: https://github.com/boa-dev/boa/tree/main/fuzz
(this one uses arbitrary to generate valid ASTs but still uses the round-trip property of parse/unparse to test the parser)

There are a lot of great options for fuzzing with Rust, but ultimately, I would recommend using this particular parser as a testcase generator for unit tests, then switch over to CI fuzzing with a grammar-aware fuzzer once you move beyond the parsing phase. Given that you have most of what you need already to do this, it should be fairly quick work. 🙂

addisoncrump · 2023-03-04T16:45:35Z

I submitted a PR to @jasikpark's repo with a structured fuzzer. It will find different bugs than the string-based fuzzer, so they operate as different testing tools.

Also, given that there doesn't seem to be much unsafe in this crate, you can safely gain quite a bit of speed by executing cargo fuzz with cargo fuzz run -s none ...

Add structured fuzzer based on boa

jasikpark · 2023-03-04T16:55:28Z

parser/fuzz/fuzz_targets/structured.rs

+use pretty_assertions::assert_eq;
+
+fn do_fuzz(data: common::FuzzSource) -> Corpus {
+	let input = data.source;


This should be valid JS code in this instance, so we should have fewer Corpus:Rejects / failed parses

kaleidawave · 2023-03-04T18:40:05Z

Awesome, so looking at the latest commits it is reusing the fuzzing tests from boa. And (for reasons) they don't publish it as a library. So build.rs downloads it so we can reuse it. Little bit hacky but whatever.

So @VTCAKAVSMoACE s changes mean that the fuzzing input is strings that may be actual JS syntax? Is there a way to see some examples of that?

Is there a need for both statements.rs and structured.rs? Do they cover different things?

Once I understand that it will be nearly to merge.

I will do a few things just before:

Add a README for the fuzzing crate. To explain what it is, how to use and acknowledge boa.
I like the .vscode settings addition. But to keep this repo lean and because it won't apply to people who use other editors, I will leave it out for now. Ideally fuzzing is now setup we shouldn't need to change it often and so won't need the editor completions in it. The vscode settings tip is really useful so will add that to the README for those who want to work on fuzzing. I also think Rust analyzer should work if you open the project at the root of the fuzzing crate so will mention that.
Add the fuzzing command to the GitHub actions CI Rust check

addisoncrump · 2023-03-04T19:07:46Z

My changes should always be valid JS syntax. This will help you find other problems in your code that are more high-level issues; where @jasikpark's fuzzer will generally find lexing issues, structured will help you find high level issues in parsing. For example, running structured for a few seconds gives two different (potential) bugs:

return; statements result in a difference in output (unsure of cause)
do {} while (...); panics with unimplemented

You can view the source code triggering the bug after the crash or by using cargo fuzz fmt structured path/to/testcase (using the same flags as run, e.g. cargo fuzz fmt -s none ...)

addisoncrump · 2023-03-04T20:11:23Z

Should probably run the fuzz tests with a timeout and with -s none 🙂

For now, they will almost certainly be failing; as a general rule, fuzzing will find unbelievably specific problems 😅

kaleidawave · 2023-03-04T20:31:02Z

Awesome, I am windows (which as I have just found out, cargo-fuzz doesn't work on) so just wanted to see the output.

I think the commands I should be running in the CI are then: (with the || true so it runs despite it returning an error code)

cargo fuzz run statements -- -s none -timeout 120 || true
cargo fuzz run structured -- -s none -timeout 120 || true

I have added the README and am just wondering before I press the merge button whether statements and structured are the best names for the fuzz tests?

Even though merging will show red lights on main, it's fine as it is only highlighting existing errors, not introducing new ones. From there can make issues for the errors it has found, then begin work on those.

Will check back in on it tomorrow morning 😄

addisoncrump · 2023-03-04T20:34:54Z

Here are the full commands I recommend:

cargo fuzz run -s none statements -- -timeout=10 -max_total_time=120 -use_value_profile=1 || true
cargo fuzz run -s none structured -- -timeout=10 -max_total_time=120 -use_value_profile=1 || true

You can review all of the available options here: https://llvm.org/docs/LibFuzzer.html#options

parser/fuzz/fuzz_targets/structured.rs

…ctured` This also uses the timeout commands that VTCAKAVSMoACE recommended for CI

jasikpark · 2023-03-05T00:56:43Z

If we run the fuzzers concurrently, it will take 240s for the fuzzer step in CI..

ig that's fine if they aren't required + they're catching stuff rn

kaleidawave · 2023-03-05T11:57:00Z

Ah it should be fine. It currently takes 7 mins in sequence, which isn't terrible.

Thanks for the file rename and the README additions.

The CI still needs some adjustment. It should run the two fuzzing tests in parallel. That would also allow GH actions to show a red cross for the failings. It currently dumps a lot of other information, don't know if there is a way to just the important lines. cargo fuzz should also see if there is a way to use a existing build.

But that is a different issue only for CI and needs some other stuff, so will open that separately.

This looks good to go! Ezno's parser now has fuzzing tests 🎉

jasikpark added 7 commits March 2, 2023 23:06

Author initial fuzzer impl, ignore parse errors

04e272e

format

d6bf754

Refactor to separate function to make the code more understandable to…

8e61ca1

… rust-analyzer

Add fuzz project to rust-analyzer linked projects

97a5a0e

Return Corpus:Keep and Corpus:Reject instead of ()

b67dac8

Second parse should not error

d595807

Remove needless return

700a459

jasikpark marked this pull request as ready for review March 3, 2023 21:03

jasikpark commented Mar 3, 2023

View reviewed changes

.gitignore Outdated Show resolved Hide resolved

parser/fuzz/Cargo.toml Show resolved Hide resolved

parser/fuzz/fuzz_targets/statements.rs Outdated Show resolved Hide resolved

kaleidawave added parser Related to Ezno's syntax parser, AST definitions and output testing-and-documentation labels Mar 4, 2023

jasikpark changed the title ~~Add fuzzer for ParseOutput~~ Add fuzz testing to Module/ParseOutput Mar 4, 2023

jasikpark commented Mar 4, 2023

View reviewed changes

addisoncrump added 3 commits March 4, 2023 17:23

add structured fuzzer based on boa

b415f19

remove spaces, replace with tabs

b283adb

use fuzz source for source dump in fuzzer output

1a159ba

Merge pull request #1 from VTCAKAVSMoACE/add-a-parser-fuzzer

702800f

Add structured fuzzer based on boa

jasikpark commented Mar 4, 2023

View reviewed changes

Improve reporting on "second parse failure"

f3336c7

jasikpark requested a review from kaleidawave March 4, 2023 17:45

addisoncrump mentioned this pull request Mar 4, 2023

Migrate fuzz/fuzz_targets/common.rs to its own crate boa-dev/boa#2630

Open

Add README and small finalizing tweaks

0b55211

Set nightly as default for fuzzing in CI

3c26d9c

jasikpark commented Mar 4, 2023

View reviewed changes

parser/fuzz/fuzz_targets/structured.rs Outdated Show resolved Hide resolved

jasikpark added 2 commits March 4, 2023 18:48

Use panic!() instead of expect() + format!()

4f18d03

Rename fuzzers to module_roundtrip_naive and `module_roundtrip_stru…

e9ab83f

…ctured` This also uses the timeout commands that VTCAKAVSMoACE recommended for CI

Add info about how to run fuzz tests

da882b1

kaleidawave merged commit 97098d1 into kaleidawave:main Mar 5, 2023

jasikpark deleted the add-a-parser-fuzzer branch March 5, 2023 15:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add fuzz testing to `Module`/`ParseOutput` #3

Add fuzz testing to `Module`/`ParseOutput` #3

jasikpark commented Mar 3, 2023 •

edited

Loading

jasikpark commented Mar 3, 2023

kaleidawave commented Mar 4, 2023 •

edited

Loading

jasikpark commented Mar 4, 2023

jasikpark commented Mar 4, 2023

jasikpark Mar 4, 2023

addisoncrump commented Mar 4, 2023 •

edited

Loading

addisoncrump commented Mar 4, 2023 •

edited

Loading

jasikpark Mar 4, 2023

kaleidawave commented Mar 4, 2023

addisoncrump commented Mar 4, 2023 •

edited

Loading

addisoncrump commented Mar 4, 2023 •

edited

Loading

kaleidawave commented Mar 4, 2023

addisoncrump commented Mar 4, 2023 •

edited

Loading

jasikpark commented Mar 5, 2023

kaleidawave commented Mar 5, 2023

Add fuzz testing to Module/ParseOutput #3

Add fuzz testing to Module/ParseOutput #3

Conversation

jasikpark commented Mar 3, 2023 • edited Loading

jasikpark commented Mar 3, 2023

kaleidawave commented Mar 4, 2023 • edited Loading

jasikpark commented Mar 4, 2023

jasikpark commented Mar 4, 2023

jasikpark Mar 4, 2023

Choose a reason for hiding this comment

addisoncrump commented Mar 4, 2023 • edited Loading

addisoncrump commented Mar 4, 2023 • edited Loading

jasikpark Mar 4, 2023

Choose a reason for hiding this comment

kaleidawave commented Mar 4, 2023

addisoncrump commented Mar 4, 2023 • edited Loading

addisoncrump commented Mar 4, 2023 • edited Loading

kaleidawave commented Mar 4, 2023

addisoncrump commented Mar 4, 2023 • edited Loading

jasikpark commented Mar 5, 2023

kaleidawave commented Mar 5, 2023

Add fuzz testing to `Module`/`ParseOutput` #3

Add fuzz testing to `Module`/`ParseOutput` #3

jasikpark commented Mar 3, 2023 •

edited

Loading

kaleidawave commented Mar 4, 2023 •

edited

Loading

addisoncrump commented Mar 4, 2023 •

edited

Loading

addisoncrump commented Mar 4, 2023 •

edited

Loading

addisoncrump commented Mar 4, 2023 •

edited

Loading

addisoncrump commented Mar 4, 2023 •

edited

Loading

addisoncrump commented Mar 4, 2023 •

edited

Loading