This repository has been archived by the owner on Aug 31, 2023. It is now read-only.
Implement round-trip fuzzers for finding correctness bugs #4559
Merged
6748746 init fuzzers (addisoncrump)
1a2ed62 correct corpus link (addisoncrump)
0676030 add more fuzzers (addisoncrump)
008af4e add formatter fuzzers (addisoncrump)
7bbfebb document formatter strategy (addisoncrump)
906bee6 add fuzzer build to CI (addisoncrump)
83dd602 better github workflow (addisoncrump)
ed1fc0e whoops, need to specify where it runs (addisoncrump)
5539b4b fix CI (addisoncrump)
ce7c470 address naming nit (addisoncrump)
3db6138 add text diff to formatter (addisoncrump)
eaef5de add linter checks to formatter output (addisoncrump)
e4d87d2 correct diff args (addisoncrump)
08047c0 use strip dead code (ew) to resolve the memory usage issue (addisoncrump)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
artifacts/ | ||
corpus/rome_format_all | ||
corpus/rome_format_json | ||
corpus/rome_format_css | ||
Cargo.lock |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
[package] | ||
name = "rome_fuzz" | ||
version = "0.0.0" | ||
authors = [ | ||
"Addison Crump <research@addisoncrump.info>", | ||
] | ||
publish = false | ||
edition = "2021" | ||
|
||
[features] | ||
default = ["libfuzzer"] | ||
full-idempotency = [] | ||
libfuzzer = ["libfuzzer-sys/link_libfuzzer"] | ||
rome_all = [] | ||
|
||
[package.metadata] | ||
cargo-fuzz = true | ||
|
||
[dependencies] | ||
arbitrary = { version = "1.3.0", features = ["derive"] } | ||
libfuzzer-sys = { git = "https://github.com/rust-fuzz/libfuzzer", default-features = false } | ||
rome_analyze = { path = "../crates/rome_analyze" } | ||
rome_diagnostics = { path = "../crates/rome_diagnostics" } | ||
rome_formatter = { path = "../crates/rome_formatter" } | ||
rome_js_analyze = { path = "../crates/rome_js_analyze" } | ||
rome_js_formatter = { path = "../crates/rome_js_formatter" } | ||
rome_js_parser = { path = "../crates/rome_js_parser" } | ||
rome_js_syntax = { path = "../crates/rome_js_syntax" } | ||
rome_json_formatter = { path = "../crates/rome_json_formatter" } | ||
rome_json_parser = { path = "../crates/rome_json_parser" } | ||
rome_json_syntax = { path = "../crates/rome_json_syntax" } | ||
rome_service = { path = "../crates/rome_service" } | ||
similar = { version = "2.2.1" } | ||
|
||
# Prevent this from interfering with workspaces | ||
[workspace] | ||
members = ["."] | ||
|
||
[[bin]] | ||
name = "rome_parse_all" | ||
path = "fuzz_targets/rome_parse_all.rs" | ||
required-features = ["rome_all"] | ||
|
||
[[bin]] | ||
name = "rome_parse_d_ts" | ||
path = "fuzz_targets/rome_parse_d_ts.rs" | ||
|
||
[[bin]] | ||
name = "rome_parse_json" | ||
path = "fuzz_targets/rome_parse_json.rs" | ||
|
||
[[bin]] | ||
name = "rome_parse_module" | ||
path = "fuzz_targets/rome_parse_module.rs" | ||
|
||
[[bin]] | ||
name = "rome_parse_script" | ||
path = "fuzz_targets/rome_parse_script.rs" | ||
|
||
[[bin]] | ||
name = "rome_parse_jsx" | ||
path = "fuzz_targets/rome_parse_jsx.rs" | ||
|
||
[[bin]] | ||
name = "rome_parse_tsx" | ||
path = "fuzz_targets/rome_parse_tsx.rs" | ||
|
||
[[bin]] | ||
name = "rome_parse_typescript" | ||
path = "fuzz_targets/rome_parse_typescript.rs" | ||
|
||
[[bin]] | ||
name = "rome_format_all" | ||
path = "fuzz_targets/rome_format_all.rs" | ||
required-features = ["rome_all"] | ||
|
||
[[bin]] | ||
name = "rome_format_d_ts" | ||
path = "fuzz_targets/rome_format_d_ts.rs" | ||
|
||
[[bin]] | ||
name = "rome_format_json" | ||
path = "fuzz_targets/rome_format_json.rs" | ||
|
||
[[bin]] | ||
name = "rome_format_module" | ||
path = "fuzz_targets/rome_format_module.rs" | ||
|
||
[[bin]] | ||
name = "rome_format_script" | ||
path = "fuzz_targets/rome_format_script.rs" | ||
|
||
[[bin]] | ||
name = "rome_format_jsx" | ||
path = "fuzz_targets/rome_format_jsx.rs" | ||
|
||
[[bin]] | ||
name = "rome_format_tsx" | ||
path = "fuzz_targets/rome_format_tsx.rs" | ||
|
||
[[bin]] | ||
name = "rome_format_typescript" | ||
path = "fuzz_targets/rome_format_typescript.rs" | ||
|
||
# enabling debug seems to cause a massive use of RAM (>12GB) | ||
[profile.release] | ||
opt-level = 3 | ||
#debug = true | ||
debug = false | ||
|
||
[profile.dev] | ||
opt-level = 3 | ||
#debug = true | ||
debug = false | ||
|
||
[profile.test] | ||
opt-level = 3 | ||
#debug = true | ||
debug = false |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
# rome-fuzz | ||
|
||
Fuzzers and associated utilities for automatic testing of Rome. | ||
|
||
## Usage | ||
|
||
To use the fuzzers provided in this directory, start by invoking: | ||
|
||
```bash | ||
./fuzz/init-fuzzers.sh | ||
``` | ||
|
||
This will install [`cargo-fuzz`](https://github.com/rust-fuzz/cargo-fuzz) and optionally download | ||
datasets which improve the efficacy of the testing. | ||
**This step is necessary for initialising the corpus directory, as all fuzzers share a common | ||
corpus.** | ||
The dataset may take several hours to download and clean, so if you're just looking to try out the | ||
fuzzers, skip the dataset download, though be warned that some features simply cannot be tested | ||
without it (the fuzzer is very unlikely to generate valid JavaScript or TypeScript code from "thin air"). | ||
|
||
Once you have initialised the fuzzers, you can then execute any fuzzer with: | ||
|
||
```bash | ||
cargo fuzz run --strip-dead-code -s none name_of_fuzzer -- -timeout=1 | ||
``` | ||
|
||
**Users of Apple M1 devices must use a nightly compiler and omit the `-s none` portion of this | ||
command, as this architecture does not support fuzzing without a sanitizer.** | ||
You can view the names of the available fuzzers with `cargo fuzz list`. | ||
For specific details about how each fuzzer works, please read this document in its entirety. | ||
|
||
**IMPORTANT: You should run `./reinit-fuzzer.sh` after adding more file-based testcases.** This will | ||
allow the testing of new features that you've added unit tests for. | ||
|
||
### Debugging a crash | ||
|
||
Once you've found a crash, you'll need to debug it. | ||
The easiest first step in this process is to minimise the input such that the crash is still | ||
triggered with a smaller input. | ||
`cargo-fuzz` supports this out of the box with: | ||
|
||
```bash | ||
cargo fuzz tmin --strip-dead-code -s none name_of_fuzzer artifacts/name_of_fuzzer/crash-... | ||
``` | ||
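
To give a feel for what minimisation does, here is a small self-contained Rust sketch of a greedy shrinking loop. It is an illustration of the idea only, not how `cargo fuzz tmin` or libFuzzer implement it; the `minimise` function and the `still_crashes` predicate are hypothetical names invented for this example.

```rust
/// Greedily shrink `input` while `still_crashes` keeps returning true.
/// Hypothetical helper for illustration; not part of cargo-fuzz.
fn minimise(input: &[u8], still_crashes: impl Fn(&[u8]) -> bool) -> Vec<u8> {
    let mut current = input.to_vec();
    // Start by trying to delete half of the input, then halve the chunk size.
    let mut chunk = (current.len() + 1) / 2;
    while chunk > 0 {
        let mut start = 0;
        while start + chunk <= current.len() {
            // Candidate with `chunk` bytes removed at offset `start`.
            let mut candidate = current[..start].to_vec();
            candidate.extend_from_slice(&current[start + chunk..]);
            if still_crashes(&candidate) {
                // Keep the smaller input and retry at the same offset.
                current = candidate;
            } else {
                // These bytes are needed for the crash; move past them.
                start += chunk;
            }
        }
        chunk /= 2;
    }
    current
}
```

For example, with a stand-in predicate that "crashes" whenever the input contains `!`, calling `minimise(b"hello!world", |d| d.contains(&b'!'))` shrinks the input down to the single byte `!`.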
|
||
From here, you will need to analyse the input and potentially the behaviour of the program. | ||
The debugging process from here is unfortunately less well-defined, so you will need to apply some | ||
expertise here. | ||
Happy hunting! | ||
|
||
## A brief introduction to fuzzers | ||
|
||
Fuzzing, or fuzz testing, is the process of providing generated data to a program under test. | ||
The most common variety of fuzzer is the mutational fuzzer: given a set of existing inputs (a | ||
"corpus"), it attempts to slightly change (or "mutate") these inputs into new inputs that cover | ||
parts of the code that haven't yet been observed. | ||
Using this strategy, we can quite efficiently generate testcases which cover significant portions of | ||
the program, both with expected and unexpected data. | ||
[This is really quite effective for finding bugs.](https://github.com/rust-fuzz/trophy-case) | ||
|
||
The fuzzers here use [`cargo-fuzz`](https://github.com/rust-fuzz/cargo-fuzz), a utility which allows | ||
Rust to integrate with [libFuzzer](https://llvm.org/docs/LibFuzzer.html), the fuzzer library built | ||
into LLVM. | ||
Each source file present in [`fuzz_targets`](fuzz_targets) is a harness, which is, in effect, a unit | ||
test which can handle different inputs. | ||
When an input is provided to a harness, the harness processes this data and libFuzzer observes the | ||
code coverage and any special values used in comparisons over the course of the run. | ||
Special values are preserved for future mutations and inputs which cover new regions of code are | ||
added to the corpus. | ||
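
As a rough illustration of the loop described above, the following self-contained Rust sketch mutates a corpus and keeps any input that reaches a new branch. Everything in it is hypothetical: `target`, `mutate`, `fuzz`, and the `Rng` type are invented for this example, coverage is reported by hand rather than through compiler instrumentation, and the tracking of comparison values is left out. libFuzzer itself is far more sophisticated.

```rust
use std::collections::HashSet;

/// Toy program under test: reports which "branches" an input reached.
/// A real fuzzer obtains this from compiler instrumentation instead.
fn target(input: &[u8]) -> Vec<u32> {
    let mut hit = vec![0];
    if input.first() == Some(&b'{') {
        hit.push(1);
        if input.last() == Some(&b'}') {
            hit.push(2);
        }
    }
    hit
}

/// Small xorshift generator so the sketch needs no external crates.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Mutate one corpus entry: append a byte or overwrite a random one.
fn mutate(input: &[u8], rng: &mut Rng) -> Vec<u8> {
    let mut out = input.to_vec();
    let byte = b"{}[]\" a1"[(rng.next_u64() % 8) as usize];
    if out.is_empty() || rng.next_u64() % 2 == 0 {
        out.push(byte);
    } else {
        let idx = (rng.next_u64() % out.len() as u64) as usize;
        out[idx] = byte;
    }
    out
}

/// Run `iterations` mutations, keeping inputs that reach new branches.
fn fuzz(seed_corpus: Vec<Vec<u8>>, iterations: usize) -> (Vec<Vec<u8>>, HashSet<u32>) {
    let mut rng = Rng(0x2545_F491_4F6C_DD1D);
    let mut corpus = seed_corpus;
    if corpus.is_empty() {
        corpus.push(Vec::new()); // always have something to mutate
    }
    let mut seen: HashSet<u32> = HashSet::new();
    for input in &corpus {
        seen.extend(target(input));
    }
    for _ in 0..iterations {
        let parent = corpus[(rng.next_u64() % corpus.len() as u64) as usize].clone();
        let child = mutate(&parent, &mut rng);
        let coverage = target(&child);
        // Only inputs that cover something new are added to the corpus.
        if coverage.iter().any(|branch| !seen.contains(branch)) {
            seen.extend(coverage);
            corpus.push(child);
        }
    }
    (corpus, seen)
}
```

Calling `fuzz(vec![b"a".to_vec()], 2000)` returns the grown corpus together with the set of branches observed; inputs that reach nothing new are discarded, which is what keeps the corpus small.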
|
||
## Each fuzzer harness in detail | ||
|
||
Each fuzzer harness is designed to test different aspects of Rome. | ||
Since Rome's primary function is parsing, formatting, and linting, we can use fuzzing not only to | ||
detect crashes or panics, but also to detect violations of guarantees of the crate. | ||
This concept is used extensively throughout the fuzzers. | ||
|
||
### `rome_parse_*` | ||
|
||
Each of the `rome_parse_*` fuzz harnesses utilises the [round-trip | ||
property](https://blog.ssanj.net/posts/2016-06-26-property-based-testing-patterns.html) of parsing | ||
and unparsing; that is, given a particular input, if we parse some code successfully, we expect the | ||
unparsed code to have the same content as the original code. | ||
If they do not match, then some details of the original input were not captured on the first parse. | ||
The corpus for the JS-like parsers is based on unit tests and [a JS dataset for machine learning | ||
training](https://www.sri.inf.ethz.ch/js150). | ||
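
The round-trip check can be sketched with a toy lossless tokeniser standing in for the real parsers. This is an assumption-laden illustration: `Token`, `parse`, `unparse`, `unparse_lossy`, and `round_trips` are hypothetical names, and none of this reflects the actual Rome parser API.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// A token that keeps its exact source text, including whitespace.
#[derive(Debug, PartialEq)]
enum Token {
    Word(String),
    Space(String),
    Punct(char),
}

/// Consume the longest run of characters accepted by `keep`.
fn take_run(chars: &mut Peekable<Chars<'_>>, keep: fn(char) -> bool) -> String {
    let mut run = String::new();
    while let Some(&c) = chars.peek() {
        if !keep(c) {
            break;
        }
        run.push(c);
        chars.next();
    }
    run
}

/// Toy "parser": splits the input into tokens without discarding anything.
fn parse(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = source.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_alphanumeric() {
            tokens.push(Token::Word(take_run(&mut chars, char::is_alphanumeric)));
        } else if c.is_whitespace() {
            tokens.push(Token::Space(take_run(&mut chars, char::is_whitespace)));
        } else {
            tokens.push(Token::Punct(c));
            chars.next();
        }
    }
    tokens
}

/// Faithful "unparser": prints every token back verbatim.
fn unparse(tokens: &[Token]) -> String {
    let mut out = String::new();
    for token in tokens {
        match token {
            Token::Word(text) | Token::Space(text) => out.push_str(text),
            Token::Punct(c) => out.push(*c),
        }
    }
    out
}

/// Deliberately buggy "unparser" that forgets whitespace.
fn unparse_lossy(tokens: &[Token]) -> String {
    let mut out = String::new();
    for token in tokens {
        match token {
            Token::Word(text) => out.push_str(text),
            Token::Space(_) => {} // detail lost here
            Token::Punct(c) => out.push(*c),
        }
    }
    out
}

/// The property a harness would assert for each input.
fn round_trips(source: &str, printer: fn(&[Token]) -> String) -> bool {
    printer(&parse(source)) == source
}
```

Here `round_trips("let x = 1;\n", unparse)` holds, while the same input with `unparse_lossy` fails the check, which is the kind of mismatch these harnesses are designed to surface.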
|
||
Notes on specific fuzzers can be seen below. | ||
|
||
#### `rome_parse_json` | ||
|
||
Since JSON is distinct from JS source code and is a relatively simple format, it is not | ||
strictly necessary to use the shared corpus. | ||
[Fuzzbench](https://google.github.io/fuzzbench/) results consistently show that JSON parsers tend to | ||
max out their coverage with minimal or no corpora. | ||
|
||
At time of writing (June 11, 2023), JSONC does not seem to be supported, so it is not fuzzed. | ||
|
||
#### `rome_parse_all` | ||
|
||
This fuzz harness merely merges all the JS parsers together to create a shared corpus. | ||
It can be used in place of the parsers for d_ts, jsx, module, script, tsx, and typescript in | ||
continuous integration. | ||
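
One way to picture such a merged harness is a function that feeds the same input to every individual harness and keeps the input if any of them found it useful. The sketch below assumes that design; `Verdict`, `Harness`, `script_like`, `module_like`, `JS_HARNESSES`, and `run_all` are hypothetical stand-ins and are not taken from the actual `rome_parse_all` source.

```rust
/// Whether an input is worth keeping in the shared corpus (hypothetical type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Verdict {
    Keep,
    Reject,
}

/// Signature shared by every stand-in harness in this sketch.
type Harness = fn(&[u8]) -> Verdict;

/// Stand-in for a script harness: accepts inputs that start with `var `.
fn script_like(data: &[u8]) -> Verdict {
    if data.starts_with(b"var ") {
        Verdict::Keep
    } else {
        Verdict::Reject
    }
}

/// Stand-in for a module harness: accepts inputs that start with `import `.
fn module_like(data: &[u8]) -> Verdict {
    if data.starts_with(b"import ") {
        Verdict::Keep
    } else {
        Verdict::Reject
    }
}

/// The harnesses merged into the combined target.
const JS_HARNESSES: [Harness; 2] = [script_like, module_like];

/// Run every harness on the same input; keep it if any harness kept it.
fn run_all(harnesses: &[Harness], data: &[u8]) -> Verdict {
    let mut verdict = Verdict::Reject;
    for harness in harnesses {
        // Every harness runs, even after one has already accepted the input,
        // so that each one gets the chance to detect a bug.
        if harness(data) == Verdict::Keep {
            verdict = Verdict::Keep;
        }
    }
    verdict
}
```

With this shape, a single fuzzing process exercises all of the merged harnesses, and any input that is interesting to one of them ends up in the corpus that all of them share.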
|
||
### `rome_format_*` | ||
|
||
These fuzzers use the same corpora as the fuzzers previously mentioned, but check the correctness of | ||
the formatters as well. | ||
We assume the following qualities of formatters: | ||
- Formatters will not introduce syntax errors into the program | ||
- Formatting code twice will have the same result as formatting code once | ||
|
||
In this way, we verify the [idempotency](https://en.wikipedia.org/wiki/Idempotence) and syntax | ||
preservation properties of formatting. | ||
|
||
Of particular note: these fuzzers may have false negative results if e.g. two tokens are turned into | ||
one token and the reformatting result is the same. | ||
Unfortunately, we can't necessarily control for this because the formatter may reorganise the | ||
sequence of tokens. | ||
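
The two assumed qualities can be expressed as a small check. The sketch below uses a toy bracket-balance test in place of a real parser and a whitespace-collapsing function in place of a real formatter; `parses`, `format_source`, `indent_more`, and `check_formatter` are hypothetical names for this example and do not correspond to the Rome formatter API.

```rust
/// Toy syntax check: brackets must be balanced and properly nested.
fn parses(source: &str) -> bool {
    let mut stack = Vec::new();
    for c in source.chars() {
        match c {
            '(' | '[' | '{' => stack.push(c),
            ')' | ']' | '}' => {
                let open = match c {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                if stack.pop() != Some(open) {
                    return false;
                }
            }
            _ => {}
        }
    }
    stack.is_empty()
}

/// Toy formatter: collapses each run of whitespace into one space and trims.
fn format_source(source: &str) -> String {
    source.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Deliberately buggy formatter: adds more indentation on every pass.
fn indent_more(source: &str) -> String {
    format!("  {}", source)
}

/// Check both properties for one input; `Err` describes the violation.
fn check_formatter(source: &str, formatter: fn(&str) -> String) -> Result<(), String> {
    if !parses(source) {
        // Only inputs that parse cleanly are checked.
        return Ok(());
    }
    let once = formatter(source);
    if !parses(&once) {
        return Err("formatter introduced a syntax error".to_string());
    }
    let twice = formatter(&once);
    if once != twice {
        return Err("formatter is not idempotent".to_string());
    }
    Ok(())
}
```

`check_formatter("f( a,  [b] )", format_source)` passes, whereas `indent_more` is rejected because its second pass changes the output again. Note that this check shares the blind spot mentioned above: a formatter that changes the meaning of the code but still produces stable, parseable output would go unnoticed.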
|
||
## Errata | ||
|
||
Unfortunately, `--strip-dead-code` is necessary to build the target with a suitable amount of | ||
memory. | ||
This seems to be caused by some issue in LLVM, but I haven't been able to spend the time to | ||
investigate this fully yet. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
rome_format_all |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
rome_parse_all |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
rome_format_json |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
rome_parse_all |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
rome_parse_all |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
rome_parse_all |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
rome_parse_all |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
rome_parse_all |
💜