This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

Implement round-trip fuzzers for finding correctness bugs (#4559)
* init fuzzers

* correct corpus link

* add more fuzzers

* add formatter fuzzers

* document formatter strategy

* add fuzzer build to CI

* better github workflow

* whoops, need to specify where it runs

* fix CI

* address naming nit

* add text diff to formatter

* add linter checks to formatter output

* correct diff args

* use strip dead code (ew) to resolve the memory usage issue
addisoncrump authored Jun 14, 2023
1 parent 3de5a1a commit 171bc0f
Showing 31 changed files with 822 additions and 0 deletions.
15 changes: 15 additions & 0 deletions .github/workflows/pull_request.yml
@@ -6,6 +6,7 @@ on:
      - main
    paths: # Only run when changes are made to rust code or root Cargo
      - 'crates/**'
      - 'fuzz/**'
      - 'xtask/**'
      - 'Cargo.toml'
      - 'Cargo.lock'
@@ -83,6 +84,20 @@ jobs:
      - name: Run doctests
        run: cargo test --doc

  fuzz-all:
    name: Build and init fuzzers
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Install toolchain
        uses: moonrepo/setup-rust@v0
        with:
          bins: cargo-fuzz
      - name: Run init-fuzzer
        run: bash fuzz/init-fuzzer.sh

  test-node-api:
    name: Test node.js API
    runs-on: ubuntu-latest
5 changes: 5 additions & 0 deletions fuzz/.gitignore
@@ -0,0 +1,5 @@
artifacts/
corpus/rome_format_all
corpus/rome_format_json
corpus/rome_format_css
Cargo.lock
119 changes: 119 additions & 0 deletions fuzz/Cargo.toml
@@ -0,0 +1,119 @@
[package]
name = "rome_fuzz"
version = "0.0.0"
authors = [
"Addison Crump <research@addisoncrump.info>",
]
publish = false
edition = "2021"

[features]
default = ["libfuzzer"]
full-idempotency = []
libfuzzer = ["libfuzzer-sys/link_libfuzzer"]
rome_all = []

[package.metadata]
cargo-fuzz = true

[dependencies]
arbitrary = { version = "1.3.0", features = ["derive"] }
libfuzzer-sys = { git = "https://github.com/rust-fuzz/libfuzzer", default-features = false }
rome_analyze = { path = "../crates/rome_analyze" }
rome_diagnostics = { path = "../crates/rome_diagnostics" }
rome_formatter = { path = "../crates/rome_formatter" }
rome_js_analyze = { path = "../crates/rome_js_analyze" }
rome_js_formatter = { path = "../crates/rome_js_formatter" }
rome_js_parser = { path = "../crates/rome_js_parser" }
rome_js_syntax = { path = "../crates/rome_js_syntax" }
rome_json_formatter = { path = "../crates/rome_json_formatter" }
rome_json_parser = { path = "../crates/rome_json_parser" }
rome_json_syntax = { path = "../crates/rome_json_syntax" }
rome_service = { path = "../crates/rome_service" }
similar = { version = "2.2.1" }

# Prevent this from interfering with workspaces
[workspace]
members = ["."]

[[bin]]
name = "rome_parse_all"
path = "fuzz_targets/rome_parse_all.rs"
required-features = ["rome_all"]

[[bin]]
name = "rome_parse_d_ts"
path = "fuzz_targets/rome_parse_d_ts.rs"

[[bin]]
name = "rome_parse_json"
path = "fuzz_targets/rome_parse_json.rs"

[[bin]]
name = "rome_parse_module"
path = "fuzz_targets/rome_parse_module.rs"

[[bin]]
name = "rome_parse_script"
path = "fuzz_targets/rome_parse_script.rs"

[[bin]]
name = "rome_parse_jsx"
path = "fuzz_targets/rome_parse_jsx.rs"

[[bin]]
name = "rome_parse_tsx"
path = "fuzz_targets/rome_parse_tsx.rs"

[[bin]]
name = "rome_parse_typescript"
path = "fuzz_targets/rome_parse_typescript.rs"

[[bin]]
name = "rome_format_all"
path = "fuzz_targets/rome_format_all.rs"
required-features = ["rome_all"]

[[bin]]
name = "rome_format_d_ts"
path = "fuzz_targets/rome_format_d_ts.rs"

[[bin]]
name = "rome_format_json"
path = "fuzz_targets/rome_format_json.rs"

[[bin]]
name = "rome_format_module"
path = "fuzz_targets/rome_format_module.rs"

[[bin]]
name = "rome_format_script"
path = "fuzz_targets/rome_format_script.rs"

[[bin]]
name = "rome_format_jsx"
path = "fuzz_targets/rome_format_jsx.rs"

[[bin]]
name = "rome_format_tsx"
path = "fuzz_targets/rome_format_tsx.rs"

[[bin]]
name = "rome_format_typescript"
path = "fuzz_targets/rome_format_typescript.rs"

# enabling debug seems to cause a massive use of RAM (>12GB)
[profile.release]
opt-level = 3
#debug = true
debug = false

[profile.dev]
opt-level = 3
#debug = true
debug = false

[profile.test]
opt-level = 3
#debug = true
debug = false
126 changes: 126 additions & 0 deletions fuzz/README.md
@@ -0,0 +1,126 @@
# rome-fuzz

Fuzzers and associated utilities for automatic testing of Rome.

## Usage

To use the fuzzers provided in this directory, start by invoking:

```bash
./fuzz/init-fuzzer.sh
```

This will install [`cargo-fuzz`](https://github.com/rust-fuzz/cargo-fuzz) and optionally download
datasets which improve the efficacy of the testing.
**This step is necessary for initialising the corpus directory, as all fuzzers share a common
corpus.**
The dataset may take several hours to download and clean, so if you're just looking to try out the
fuzzers, skip the dataset download. Be warned that some features simply cannot be tested without
it: the fuzzer is very unlikely to generate valid JavaScript code from "thin air".

Once you have initialised the fuzzers, you can then execute any fuzzer with:

```bash
cargo fuzz run --strip-dead-code -s none name_of_fuzzer -- -timeout=1
```

**Users of Apple M1 devices must use a nightly compiler and omit the `-s none` portion of this
command, as this architecture does not support fuzzing without a sanitizer.**
You can view the names of the available fuzzers with `cargo fuzz list`.
For specific details about how each fuzzer works, please read this document in its entirety.

**IMPORTANT: You should run `./reinit-fuzzer.sh` after adding more file-based testcases.** This will
allow the testing of new features that you've added unit tests for.

### Debugging a crash

Once you've found a crash, you'll need to debug it.
The easiest first step is to minimise the input, i.e. to find a smaller input that still triggers
the crash.
`cargo-fuzz` supports this out of the box with:

```bash
cargo fuzz tmin --strip-dead-code -s none name_of_fuzzer artifacts/name_of_fuzzer/crash-...
```
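
To give a feel for what minimisation does, here is a rough sketch of a greedy minimiser. It is only an illustration: `minimise` and `toy_crash` are hypothetical names invented for this example, and libFuzzer's real minimiser works differently and is considerably smarter.

```rust
/// Greedily shrink `input` while `still_crashes` keeps returning true.
/// Toy illustration only; this is not how `cargo fuzz tmin` is implemented.
fn minimise(input: &[u8], still_crashes: impl Fn(&[u8]) -> bool) -> Vec<u8> {
    let mut current = input.to_vec();
    // Start with large chunks and halve the chunk size on every pass.
    let mut chunk = current.len().max(1);
    loop {
        let mut start = 0;
        while start < current.len() {
            let end = (start + chunk).min(current.len());
            // Candidate input with the bytes in start..end removed.
            let mut candidate = Vec::with_capacity(current.len() - (end - start));
            candidate.extend_from_slice(&current[..start]);
            candidate.extend_from_slice(&current[end..]);
            if still_crashes(&candidate) {
                // Still crashing: keep the smaller input, retry at the same offset.
                current = candidate;
            } else {
                // This chunk is needed to trigger the crash: move past it.
                start += chunk;
            }
        }
        if chunk == 1 {
            break;
        }
        chunk /= 2;
    }
    current
}

/// Hypothetical "bug": the program under test fails whenever the input
/// contains both an opening brace and a NUL byte.
fn toy_crash(data: &[u8]) -> bool {
    data.contains(&b'{') && data.contains(&0)
}
```

Calling `minimise(b"let x = {\0 y };", toy_crash)` reduces the 15-byte input to the two bytes that matter, `{` followed by NUL.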

From here, you will need to analyse the input and potentially the behaviour of the program.
Beyond this point the debugging process is unfortunately less well-defined, so you will need to
apply some expertise of your own.
Happy hunting!

## A brief introduction to fuzzers

Fuzzing, or fuzz testing, is the process of providing generated data to a program under test.
The most common variety of fuzzer is the mutational fuzzer; given a set of existing inputs (a
"corpus"), it attempts to slightly change (or "mutate") these inputs into new inputs that cover
parts of the code that haven't yet been observed.
Using this strategy, we can quite efficiently generate testcases which cover significant portions of
the program, both with expected and unexpected data.
[This is really quite effective for finding bugs.](https://github.com/rust-fuzz/trophy-case)

The fuzzers here use [`cargo-fuzz`](https://github.com/rust-fuzz/cargo-fuzz), a utility which allows
Rust to integrate with [libFuzzer](https://llvm.org/docs/LibFuzzer.html), the fuzzer library built
into LLVM.
Each source file present in [`fuzz_targets`](fuzz_targets) is a harness, which is, in effect, a unit
test which can handle different inputs.
When an input is provided to a harness, the harness processes this data and libFuzzer observes the
code coverage and any special values used in comparisons over the course of the run.
Special values are preserved for future mutations and inputs which cover new regions of code are
added to the corpus.
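
The feedback loop described above can be sketched in a few lines of plain Rust. This is a toy model under stated assumptions, not libFuzzer: the `target` function and its branch IDs are hypothetical, coverage is reported by hand rather than by compiler instrumentation, and the mutations are far cruder than the real ones.

```rust
use std::collections::HashSet;

/// Tiny deterministic xorshift PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Hypothetical program under test: returns the IDs of the branches it took.
fn target(data: &[u8]) -> Vec<u32> {
    let mut hit = vec![0];
    if data.first() == Some(&b'{') {
        hit.push(1);
        if data.last() == Some(&b'}') {
            hit.push(2);
        }
    }
    hit
}

/// Mutate a copy of `input` by overwriting, deleting, or inserting one byte.
fn mutate(input: &[u8], rng: &mut XorShift) -> Vec<u8> {
    let mut out = input.to_vec();
    let byte = (rng.next_u64() % 256) as u8;
    match rng.next_u64() % 3 {
        0 if !out.is_empty() => {
            let i = (rng.next_u64() as usize) % out.len();
            out[i] = byte;
        }
        1 if !out.is_empty() => {
            let i = (rng.next_u64() as usize) % out.len();
            out.remove(i);
        }
        _ => {
            let i = (rng.next_u64() as usize) % (out.len() + 1);
            out.insert(i, byte);
        }
    }
    out
}

/// Run `iterations` mutations and keep any input that reaches a new branch.
fn fuzz(seed_corpus: Vec<Vec<u8>>, iterations: usize) -> (Vec<Vec<u8>>, HashSet<u32>) {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    let mut corpus = seed_corpus;
    if corpus.is_empty() {
        corpus.push(Vec::new()); // always have at least one input to mutate
    }
    let mut seen: HashSet<u32> = HashSet::new();
    for input in &corpus {
        seen.extend(target(input));
    }
    for _ in 0..iterations {
        let parent = &corpus[(rng.next_u64() as usize) % corpus.len()];
        let child = mutate(parent, &mut rng);
        let coverage = target(&child);
        if coverage.iter().any(|id| !seen.contains(id)) {
            seen.extend(coverage);
            corpus.push(child);
        }
    }
    (corpus, seen)
}
```

Starting from `fuzz(vec![b"{".to_vec()], 5000)`, the corpus only grows when a mutated input reaches a branch that no earlier input reached, which is the same acceptance rule libFuzzer applies with real coverage data.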

## Each fuzzer harness in detail

Each fuzzer harness is designed to test different aspects of Rome.
Since Rome's primary functions are parsing, formatting, and linting, we can use fuzzing not only to
detect crashes or panics, but also to detect violations of the guarantees each crate makes.
This concept is used extensively throughout the fuzzers.

### `rome_parse_*`

Each of the `rome_parse_*` fuzz harnesses utilises the [round-trip
property](https://blog.ssanj.net/posts/2016-06-26-property-based-testing-patterns.html) of parsing
and unparsing; that is, given a particular input, if we parse some code successfully, we expect the
unparsed code to have the same content as the original code.
If they do not match, then some details of the original input were not captured on the first parse.
The corpus for the JS-like parsers is based on unit tests and [a JS dataset for machine learning
training](https://www.sri.inf.ethz.ch/js150).
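
As an illustration of the round-trip property, the following sketch uses a toy lossless tokenizer in place of a real parser. None of these names (`Token`, `parse`, `unparse`, `harness`) come from Rome's API; they are assumptions made purely for the example.

```rust
/// A token that keeps its exact source text, including whitespace ("trivia").
#[derive(Debug, PartialEq)]
enum Token {
    Whitespace(String),
    Word(String),
}

fn make_token(text: String, whitespace: bool) -> Token {
    if whitespace {
        Token::Whitespace(text)
    } else {
        Token::Word(text)
    }
}

/// Toy lossless "parser": split the input into alternating runs of
/// whitespace and non-whitespace. A stand-in for a real parser.
fn parse(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_whitespace = false;
    for ch in source.chars() {
        let is_ws = ch.is_whitespace();
        // A change of character class ends the current run.
        if !current.is_empty() && is_ws != in_whitespace {
            tokens.push(make_token(std::mem::take(&mut current), in_whitespace));
        }
        in_whitespace = is_ws;
        current.push(ch);
    }
    if !current.is_empty() {
        tokens.push(make_token(current, in_whitespace));
    }
    tokens
}

/// "Unparse" by concatenating the stored text of every token.
fn unparse(tokens: &[Token]) -> String {
    tokens
        .iter()
        .map(|token| match token {
            Token::Whitespace(text) | Token::Word(text) => text.as_str(),
        })
        .collect()
}

/// Shape of a round-trip harness: skip non-UTF-8 input, panic on a mismatch.
fn harness(data: &[u8]) {
    if let Ok(source) = std::str::from_utf8(data) {
        assert_eq!(unparse(&parse(source)), source, "round trip lost information");
    }
}
```

If `parse` dropped or normalised whitespace instead of storing it, `harness` would panic on the first input containing two consecutive spaces, and the fuzzer would report that input as a crash.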

Errata for specific fuzzers can be seen below.

#### `rome_parse_json`

Since JSON is distinct from JS source code and is a relatively simple format, it is not
strictly necessary to use the shared corpus.
[Fuzzbench](https://google.github.io/fuzzbench/) results consistently show that JSON parsers tend to
max out their coverage with minimal or no corpora.

At time of writing (June 11, 2023), JSONC does not seem to be supported, so it is not fuzzed.

#### `rome_parse_all`

This fuzz harness merely merges all the JS parsers together to create a shared corpus.
It can be used in place of the parsers for d_ts, jsx, module, script, tsx, and typescript in
continuous integration.

### `rome_format_*`

These fuzzers use the same corpora as the fuzzers previously mentioned, but check the correctness of
the formatters as well.
We assume the following qualities of formatters:
- Formatters will not introduce syntax errors into the program
- Formatting code twice will have the same result as formatting code once

In this way, we verify the [idempotency](https://en.wikipedia.org/wiki/Idempotence) and syntax
preservation properties of formatting.

Of particular note: these fuzzers may have false negative results if e.g. two tokens are turned into
one token and the reformatting result is the same.
Unfortunately, we can't necessarily control for this because the formatter may reorganise the
sequence of tokens.
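
The idempotency check and the false negative described above can be demonstrated with two toy formatters. This is a sketch under stated assumptions: `toy_format`, `buggy_format`, and the token-count check are hypothetical and deliberately simplistic. A real formatter may legitimately add or remove tokens, which is why a check like `preserves_token_count` cannot be applied in general.

```rust
/// Toy formatter: collapse every run of whitespace into a single space.
fn toy_format(source: &str) -> String {
    source.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Deliberately buggy formatter: deletes all whitespace, gluing tokens together.
fn buggy_format(source: &str) -> String {
    source.split_whitespace().collect()
}

/// Idempotency check: formatting twice must give the same result as formatting once.
fn is_idempotent(formatter: fn(&str) -> String, source: &str) -> bool {
    let once = formatter(source);
    let twice = formatter(&once);
    once == twice
}

/// Extra check that idempotency alone does not provide: the number of
/// whitespace-separated tokens is unchanged by formatting.
fn preserves_token_count(formatter: fn(&str) -> String, source: &str) -> bool {
    source.split_whitespace().count() == formatter(source).split_whitespace().count()
}
```

Here `is_idempotent(buggy_format, "a b")` returns `true` even though the formatter has merged two tokens into one, which is exactly the kind of false negative the idempotency check cannot detect on its own.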

## Errata

Unfortunately, `--strip-dead-code` is necessary to build the target with a suitable amount of
memory.
This seems to be caused by some issue in LLVM, but I haven't been able to spend the time to
investigate this fully yet.
1 change: 1 addition & 0 deletions fuzz/corpus/rome_parse_all
1 change: 1 addition & 0 deletions fuzz/corpus/rome_parse_d_ts
1 change: 1 addition & 0 deletions fuzz/corpus/rome_parse_json
1 change: 1 addition & 0 deletions fuzz/corpus/rome_parse_jsx
1 change: 1 addition & 0 deletions fuzz/corpus/rome_parse_module
1 change: 1 addition & 0 deletions fuzz/corpus/rome_parse_script
1 change: 1 addition & 0 deletions fuzz/corpus/rome_parse_tsx
1 change: 1 addition & 0 deletions fuzz/corpus/rome_parse_typescript