Fuzzing
Binaryen has built-in fuzzing and reducing capabilities. They can be used on Binaryen itself or on other compilers, VMs, and toolchains.
Binaryen's wasm-opt tool has the --translate-to-fuzz / -ttf option. When set, it treats the input as a stream of arbitrary bytes and converts it into a valid wasm module. That is, the input acts like a random seed to a deterministic random number generator, except that instead of numbers we generate wasm modules.
In other words, you can give wasm-opt -ttf any input file with any contents, and it will create a wasm file. You can then save it (using -o) and run it in another tool. For example, you can run a fuzzing script that generates a random string, feeds that to wasm-opt -ttf, and runs a VM on the output.
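Such a fuzzing script could look roughly like the following. This is a minimal sketch, not part of Binaryen: it assumes wasm-opt is on your PATH, and the function names and the vm_command parameter are hypothetical placeholders for whatever VM you want to test. Only the -ttf and -o flags come from the description above.

```python
import os
import subprocess
import tempfile


def make_random_input(path, size=1024):
    """Write `size` random bytes to `path`; -ttf accepts any contents."""
    with open(path, "wb") as f:
        f.write(os.urandom(size))


def ttf_command(input_path, output_path, wasm_opt="wasm-opt"):
    """Build the wasm-opt call that turns arbitrary bytes into a wasm file."""
    return [wasm_opt, input_path, "-ttf", "-o", output_path]


def fuzz_once(vm_command, workdir, wasm_opt="wasm-opt"):
    """Generate one random module and run a VM on it.

    `vm_command` is a hypothetical placeholder: a list of arguments to which
    the path of the generated wasm file is appended.
    """
    input_path = os.path.join(workdir, "input.dat")
    output_path = os.path.join(workdir, "output.wasm")
    make_random_input(input_path)
    # Fails loudly if wasm-opt is missing or exits with an error.
    subprocess.run(ttf_command(input_path, output_path, wasm_opt), check=True)
    # The caller decides what counts as a bug (crash, bad exit code, etc.).
    return subprocess.run(vm_command + [output_path], capture_output=True)
```

Calling fuzz_once in a loop, with a fresh temporary directory from tempfile.TemporaryDirectory() each time, gives a basic fuzzer. Keep the input file of any failing iteration, since the same bytes will regenerate the same module.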
For fuzzing of Binaryen itself, the following options are useful:
- --fuzz-exec: This runs the generated wasm module in the Binaryen interpreter, printing out results from calling its methods, similar to the JS wrapper from --emit-js-wrapper. It also does that a second time after optimizations, which lets you check whether they broke anything.
- --fuzz-passes: Select and run some random passes (based on the random input), to further shape the wasm before fuzzing.
These two options are not strictly necessary, but can greatly improve execution times, as a single invocation can do a full random module generation + optimization + binary test. For example,
wasm-opt input.dat -ttf --fuzz-exec --fuzz-passes -O3
Even on a fairly low-powered machine this lets afl-fuzz do hundreds of iterations per second.
The output wasms from -ttf are guaranteed not to hang, as they have built-in hang instrumentation. They may trap, though. The JS wrapper code will catch traps and print them.
fuzz_opt.py is a very useful script that runs
- various fuzzing modes (--fuzz-exec, specific fuzzing for wasm2js and asyncify, etc.)
- various random inputs (random bytes interpreted using -ttf)
- various random passes
Just running
$ python scripts/fuzz_opt.py
will run the script, which will continue to run until it finds a possible bug.
For maximum throughput, it is recommended to run scripts/fuzz_opt.py, together with its related binaries, on a fast drive or, alternatively, on a ramdrive. Running on spinning disks is about an order of magnitude slower.
This script will use existing wasm files as the basis for fuzzing (mutating and expanding upon them), which is good if that set of files represents realistic content. By default the script will use all testcases in the test suite as such initial content, with priority given to files modified in the last 30 days. You can also put wasm files in the ./fuzz/ directory and it will likewise treat them as high-priority initial content, which is useful when you have local files that you especially want to fuzz.
Binaryen has scripts for ClusterFuzz integration. See bundle_clusterfuzz.py.
A complementary feature is reducing: taking an existing interesting testcase and reducing it to as small a testcase as possible while keeping it interesting. Binaryen's wasm-reduce tool can do that, using something like
bin/wasm-reduce start.wasm "--command=checker-command test.wasm" -t test.wasm -w work.wasm
This takes an input wasm and a bunch of options:
- The command is the command to be run. Our goal during reduction is to keep the behavior of the command the same, namely, same exit code and same stdout. You should therefore make sure the command emits only the relevant output for you (e.g., if the command prints out the wasm binary size, reduction is impossible!).
- The "test file": the file we write each candidate to and then run the command on. The command should run on that file. Note how in the example above we explicitly pass that file name to the command (but if the command had that name hardcoded inside it, that would be fine as well).
- The "work file": the current reduction. You can look at that file while reduction is still going on to track progress. This will also contain the final reduction at the end.
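The "same exit code and same stdout" rule can be illustrated with a small sketch. This is not wasm-reduce's code; observe and still_interesting are hypothetical helper names, and the commands here are plain Python one-liners standing in for a real checker command.

```python
import subprocess
import sys


def observe(command):
    """Run a command and return what reduction must preserve.

    Only the exit code and stdout are compared, so stderr is ignored.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout


def still_interesting(command, baseline):
    """A candidate is kept only if the command behaves exactly as before."""
    return observe(command) == baseline


# Stand-in for the checker command run on the original testcase.
baseline = observe([sys.executable, "-c", "print('trap: unreachable')"])

# Same output and exit code: the candidate would be kept.
same = [sys.executable, "-c", "print('trap: unreachable')"]
print(still_interesting(same, baseline))

# Output that depends on the file (such as its size) never matches again.
sized = [sys.executable, "-c", "print('size: 1234')"]
print(still_interesting(sized, baseline))
```

This is why a command that prints anything tied to the exact contents of the file, such as its binary size, makes reduction impossible: every smaller candidate changes stdout and is rejected.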
wasm-reduce works by trying all sorts of changes to the file that shrink it, and if a change is valid (keeps it "interesting", i.e., same result on the command) then we keep it and continue from there.
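The try-a-change, keep-it-if-still-interesting loop can be sketched on a toy problem. The code below is only an illustration of the general idea on a plain list of items; it is not how wasm-reduce is implemented, which works on the parsed wasm structure. The names reduce_items and is_interesting are hypothetical.

```python
def reduce_items(items, is_interesting):
    """Greedily drop items for as long as the result stays interesting."""
    current = list(items)
    if not is_interesting(current):
        raise ValueError("the starting testcase must be interesting")
    progress = True
    while progress:
        progress = False
        i = 0
        while i < len(current):
            # Try a shrinking change: remove the item at position i.
            candidate = current[:i] + current[i + 1:]
            if is_interesting(candidate):
                # Keep the change and continue from the smaller testcase.
                current = candidate
                progress = True
            else:
                # Reject the change and move on to the next item.
                i += 1
    return current


# Toy "bug": the testcase is interesting while it contains both 3 and 7.
print(reduce_items(range(10), lambda xs: 3 in xs and 7 in xs))  # [3, 7]
```

Each call to is_interesting corresponds to one run of the command, which is why the cost of the command dominates the total reduction time.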
Reduction can be a slow process, because we need to check every change by running the command - so if the command takes 5 seconds, it may take that long to shrink by a single byte (!). wasm-reduce tries to get around that by taking advantage of the Binaryen optimizer: it will interleave "destructive reduction" (removing code, breaking code in ways that might alter program behavior) with "pass reduction" (running Binaryen optimizer passes, which should not alter program behavior). For example, destructive removal of a condition to an if might let an optimization pass remove one arm of the if, which can be much faster than removing all the parts of the arm one by one.
- Note that wasm-reduce only works on valid wasm files. It takes advantage of the structure of the wasm in order to reduce effectively, which means we parse it and then manipulate it. If we can't even parse it, we can't do anything. In that case you may want to use more general purpose binary testcase reduction tools.
third_party/setup.py can automatically install the necessary dependencies, such as the SpiderMonkey JS shell (mozjs), the V8 JS shell (d8), and WABT, in third_party/.
./third_party/setup.py [mozjs|v8|wabt|all]
This also helps when fuzzing on a ramdrive (requires about 300 MB):
./third_party/setup.py all
cp -r build/ scripts/ test/ third_party/ /path/to/ramdrive
cd /path/to/ramdrive
./scripts/fuzz_opt.py --binaryen-bin build/bin