Fuzzing: ClusterFuzz integration (#7079)
The main addition here is a bundle_clusterfuzz.py script which will package up
the exact files that should be uploaded to ClusterFuzz. It also documents the
process of bundling and testing. You can do

bundle_clusterfuzz.py OUTPUT_FILE.tgz

That bundles wasm-opt from ./bin, which is enough for local testing. For
actually uploading to ClusterFuzz, we need a portable build, and @dschuff
had the idea to reuse the emsdk build, which works nicely. Doing

bundle_clusterfuzz.py OUTPUT_FILE.tgz --build-dir=/path/to/emsdk/upstream/

will bundle wasm-opt (+libs) from the emsdk. I verified that those builds
work on ClusterFuzz.
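
As a quick sanity check before uploading, the contents of a bundle can be listed with Python's tarfile module. The sketch below is illustrative only: the helper name missing_bundle_files is hypothetical, and the required names are simply the archive names that scripts/bundle_clusterfuzz.py adds unconditionally in this commit.

```python
import tarfile

# Archive names that scripts/bundle_clusterfuzz.py always adds. Shared
# libraries are left out here, as they only exist in dynamic builds.
REQUIRED = ['run.py', 'scripts/fuzz_shell.js', 'bin/wasm-opt']


def missing_bundle_files(bundle_path):
    """Return the required archive names that are absent from the bundle."""
    with tarfile.open(bundle_path, 'r:gz') as tar:
        names = set(tar.getnames())
    return [name for name in REQUIRED if name not in names]
```

An empty list means the three core files are present; it says nothing about whether the bundled wasm-opt actually runs on ClusterFuzz.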

I added several forms of testing here. First, our main fuzzer fuzz_opt.py now
has a ClusterFuzz testcase handler, which simulates a ClusterFuzz environment.
Second, there are smoke tests that run in the unit test suite, and can also be
run separately:

python -m unittest test/unit/test_cluster_fuzz.py

Those unit tests can also run on a given bundle, e.g. one created from an
emsdk build, for testing right before upload:

BINARYEN_CLUSTER_FUZZ_BUNDLE=/path/to/bundle.tgz python -m unittest test/unit/test_cluster_fuzz.py
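
How the tests choose between a provided bundle and a fresh one is not shown in this excerpt. A minimal sketch of that kind of switch, assuming a hypothetical helper choose_bundle and a hypothetical make_bundle callback (neither is taken from test_cluster_fuzz.py), and reusing the log messages quoted in the bundle_clusterfuzz.py docstring, might look like:

```python
import os


def choose_bundle(make_bundle, environ=os.environ):
    """Return the bundle path: an existing one from the env var, else a new one.

    make_bundle is a hypothetical callback that creates a bundle and returns
    its path; the real test suite may be organized differently.
    """
    bundle = environ.get('BINARYEN_CLUSTER_FUZZ_BUNDLE')
    if bundle:
        # The bundle_clusterfuzz.py docstring asks for an absolute filename.
        assert os.path.isabs(bundle), 'pass an absolute filename'
        print(f'Using existing bundle: {bundle}')
        return bundle
    print('Making a new bundle')
    return make_bundle()
```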

A third piece of testing is to add a --fuzz-passes test. That is a mode for
-ttf (translate random data into a valid wasm fuzz testcase) that uses random
data to pick and run a set of passes, to further shape the wasm. (--fuzz-passes
had no previous testing; this PR fixes it and tidies it up a little, adding
some newer passes too.)

Otherwise, this PR includes the key run.py script that is bundled and then
executed by ClusterFuzz: a Python script that runs wasm-opt -ttf [..]
to generate testcases, sets up their JS, and emits them.
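
For orientation, the files that should appear in the output directory after a run can be enumerated from the naming scheme in run.py (the fuzz- and flags- prefixes followed by the binaryen- fuzzer name and an index). The helper expected_outputs below is a hypothetical illustration, not part of the commit.

```python
def expected_outputs(num):
    """List the testcase and flags file names run.py emits for num testcases."""
    names = []
    # run.py numbers testcases starting from 1.
    for i in range(1, num + 1):
        names.append(f'fuzz-binaryen-{i}.js')
        names.append(f'flags-binaryen-{i}.js')
    return names


print(expected_outputs(2))
```

Each testcase file is paired with a flags file of the same index, which is how run.py tells ClusterFuzz which V8 flags to use.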

fuzz_shell.js, which is the JS to execute testcases, will now check if it is
provided binary data of a wasm file. If so, it does not read a wasm file from
argv[1]. (This is needed because ClusterFuzz expects a single file for the
testcase, so we make a JS file with bundled wasm inside it.)
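
The single-file testcase can be pictured as follows. This sketch mirrors the approach of get_js_file_contents in run.py, but takes the shell source as a string rather than reading fuzz_shell.js; the function name embed_wasm and the placeholder shell text are illustrative only.

```python
def embed_wasm(wasm_bytes, shell_js):
    """Prepend the wasm bytes to the JS shell as a Uint8Array literal."""
    byte_list = ','.join(str(b) for b in wasm_bytes)
    return f'var binary = new Uint8Array([{byte_list}]);\n\n' + shell_js


# The 8-byte header of an empty wasm module: "\0asm" magic plus version 1.
header = b'\x00asm\x01\x00\x00\x00'
print(embed_wasm(header, '// fuzz_shell.js contents go here\n'))
```

The first line printed is `var binary = new Uint8Array([0,97,115,109,1,0,0,0]);`, so the shell finds `binary` already defined and does not need a separate wasm file.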
kripken authored Nov 19, 2024
1 parent 25b8e6a commit b0e999a
Showing 11 changed files with 808 additions and 22 deletions.
135 changes: 135 additions & 0 deletions scripts/bundle_clusterfuzz.py
@@ -0,0 +1,135 @@
#!/usr/bin/python3

'''
Bundle files for uploading to ClusterFuzz.

Usage:

    bundle_clusterfuzz.py OUTPUT_FILE.tgz [--build-dir=BUILD_DIR]

The output file will be a .tgz file.

If a build directory is provided, we will look under there to find bin/wasm-opt
and lib/libbinaryen.so. A useful place to get builds from is the Emscripten SDK,
as you can do

    ./emsdk install tot

after which ./upstream/ (from the emsdk dir) will contain builds of wasm-opt and
libbinaryen.so (that are designed to run on as many systems as possible, by not
depending on newer libc symbols, etc., as opposed to a normal local build).
Thus, the full workflow could be

    cd emsdk
    ./emsdk install tot
    cd ../binaryen
    python3 scripts/bundle_clusterfuzz.py binaryen_wasm_fuzzer.tgz --build-dir=../emsdk/upstream

When using --build-dir in this way, you are responsible for ensuring that the
wasm-opt in the build dir is compatible with the scripts in the current dir
(e.g., if run.py here passes a flag that is only in a newer/older version of
wasm-opt, a problem can happen).

Before uploading to ClusterFuzz, it is worth doing the following:

  1. Run the local fuzzer (scripts/fuzz_opt.py). That includes a ClusterFuzz
     testcase handler, which simulates what ClusterFuzz does.

  2. Run the unit tests, which include smoke tests for our ClusterFuzz support:

        python -m unittest test/unit/test_cluster_fuzz.py

     Look at the logs, which will contain statistics on the wasm files the
     fuzzer emits, and see that they look reasonable.

     You should run the unit tests on the bundle you are about to upload, by
     setting the proper env var like this (using the same filename as above):

        BINARYEN_CLUSTER_FUZZ_BUNDLE=`pwd`/binaryen_wasm_fuzzer.tgz python -m unittest test/unit/test_cluster_fuzz.py

     Note that you must pass an absolute filename (e.g. using pwd as shown).

     The unittest logs should reflect that that bundle is being used at the
     very start ("Using existing bundle: ..." rather than "Making a new
     bundle"). Note that some of the unittests also create their own bundles, to
     test the bundling script itself, so further down you will see logging of
     bundle creation even if you provide a bundle.

After uploading to ClusterFuzz, you can wait a while for it to run, and then:

  1. Inspect the log to see that we generate all the testcases properly, and
     their sizes look reasonably random, etc.

  2. Inspect the sample testcase and run it locally, to see that

        d8 --wasm-staging testcase.js

     properly runs the testcase, emitting logging etc.

  3. Check the stats and crashes page (known crashes should at least be showing
     up). Note that these may take longer to show up than 1 and 2.
'''

import os
import sys
import tarfile

# Read the filenames first, as importing |shared| changes the directory.
output_file = os.path.abspath(sys.argv[1])
print(f'Bundling to: {output_file}')
assert output_file.endswith('.tgz'), 'Can only generate a .tgz'

build_dir = None
if len(sys.argv) >= 3:
    assert sys.argv[2].startswith('--build-dir=')
    # Split only on the first '=', so that paths containing '=' survive.
    build_dir = sys.argv[2].split('=', 1)[1]
    build_dir = os.path.abspath(build_dir)
    # Delete the argument, as importing |shared| scans it.
    sys.argv.pop()

from test import shared # noqa

# Pick where to get the builds
if build_dir:
    binaryen_bin = os.path.join(build_dir, 'bin')
    binaryen_lib = os.path.join(build_dir, 'lib')
else:
    binaryen_bin = shared.options.binaryen_bin
    binaryen_lib = shared.options.binaryen_lib

with tarfile.open(output_file, "w:gz") as tar:
    # run.py
    run = os.path.join(shared.options.binaryen_root, 'scripts', 'clusterfuzz', 'run.py')
    print(f' .. run: {run}')
    tar.add(run, arcname='run.py')

    # fuzz_shell.js
    fuzz_shell = os.path.join(shared.options.binaryen_root, 'scripts', 'fuzz_shell.js')
    print(f' .. fuzz_shell: {fuzz_shell}')
    tar.add(fuzz_shell, arcname='scripts/fuzz_shell.js')

    # wasm-opt binary
    wasm_opt = os.path.join(binaryen_bin, 'wasm-opt')
    print(f' .. wasm-opt: {wasm_opt}')
    tar.add(wasm_opt, arcname='bin/wasm-opt')

    # For a dynamic build we also need libbinaryen.so and possibly other files.
    # Try both .so and .dylib suffixes for more OS coverage.
    for suffix in ['.so', '.dylib']:
        libbinaryen = os.path.join(binaryen_lib, f'libbinaryen{suffix}')
        if os.path.exists(libbinaryen):
            print(f' .. libbinaryen: {libbinaryen}')
            tar.add(libbinaryen, arcname=f'lib/libbinaryen{suffix}')

        # The emsdk build also includes some more necessary files.
        for name in [f'libc++{suffix}', f'libc++{suffix}.2', f'libc++{suffix}.2.0']:
            path = os.path.join(binaryen_lib, name)
            if os.path.exists(path):
                print(f' ......... : {path}')
                tar.add(path, arcname=f'lib/{name}')

print('Done.')
print('To run the tests on this bundle, do:')
print()
print(f'BINARYEN_CLUSTER_FUZZ_BUNDLE={output_file} python -m unittest test/unit/test_cluster_fuzz.py')
print()
163 changes: 163 additions & 0 deletions scripts/clusterfuzz/run.py
@@ -0,0 +1,163 @@
#
# Copyright 2024 WebAssembly Community Group participants
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

'''
ClusterFuzz run.py script: when run by ClusterFuzz, it uses wasm-opt to generate
a fixed number of testcases. This is a "blackbox fuzzer", see
https://google.github.io/clusterfuzz/setting-up-fuzzing/blackbox-fuzzing/
This file should be bundled up together with the other files it needs, see
bundle_clusterfuzz.py.
'''

import os
import getopt
import random
import subprocess
import sys

# The V8 flags we put in the "fuzzer flags" files, which tell ClusterFuzz how to
# run V8. By default we apply all staging flags.
FUZZER_FLAGS_FILE_CONTENTS = '--wasm-staging'

# Maximum size of the random data that we feed into wasm-opt -ttf. This is
# smaller than fuzz_opt.py's INPUT_SIZE_MAX because that script is tuned for
# fuzzing large wasm files (to reduce the overhead we have of launching many
# processes per file), which is less of an issue on ClusterFuzz.
MAX_RANDOM_SIZE = 15 * 1024

# The prefix for fuzz files.
FUZZ_FILENAME_PREFIX = 'fuzz-'

# The prefix for flags files.
FLAGS_FILENAME_PREFIX = 'flags-'

# The name of the fuzzer (appears after FUZZ_FILENAME_PREFIX /
# FLAGS_FILENAME_PREFIX).
FUZZER_NAME_PREFIX = 'binaryen-'

# The root directory of the bundle this will be in, which is the directory of
# this very file.
ROOT_DIR = os.path.dirname(os.path.abspath(__file__))

# The path to the wasm-opt binary that we run to generate testcases.
FUZZER_BINARY_PATH = os.path.join(ROOT_DIR, 'bin', 'wasm-opt')

# The path to the fuzz_shell.js script that will execute the wasm in each
# testcase.
JS_SHELL_PATH = os.path.join(ROOT_DIR, 'scripts', 'fuzz_shell.js')

# The arguments we provide to wasm-opt to generate wasm files.
FUZZER_ARGS = [
    # Generate a wasm from random data.
    '--translate-to-fuzz',
    # Run some random passes, to further shape the random wasm we emit.
    '--fuzz-passes',
    # Enable all features but disable ones not yet ready for fuzzing. This may
    # be a smaller set than fuzz_opt.py, as that enables a few experimental
    # flags, while here we just fuzz with d8's --wasm-staging.
    '-all',
    '--disable-shared-everything',
    '--disable-fp16',
]


# Returns the file name for fuzz or flags files.
def get_file_name(prefix, index):
    return f'{prefix}{FUZZER_NAME_PREFIX}{index}.js'


# Returns the contents of a .js fuzz file, given particular wasm contents that
# we want to be executed.
def get_js_file_contents(wasm_contents):
    # Start with the standard JS shell.
    with open(JS_SHELL_PATH) as file:
        js = file.read()

    # Prepend the wasm contents, so they are used (rather than the normal
    # mechanism where the wasm file's name is provided in argv).
    wasm_contents = ','.join([str(c) for c in wasm_contents])
    js = f'var binary = new Uint8Array([{wasm_contents}]);\n\n' + js
    return js


def main(argv):
    # Parse the options. See
    # https://google.github.io/clusterfuzz/setting-up-fuzzing/blackbox-fuzzing/#uploading-a-fuzzer
    output_dir = '.'
    num = 100
    expected_flags = ['input_dir=', 'output_dir=', 'no_of_files=']
    optlist, _ = getopt.getopt(argv[1:], '', expected_flags)
    for option, value in optlist:
        if option == '--output_dir':
            output_dir = value
        elif option == '--no_of_files':
            num = int(value)

    for i in range(1, num + 1):
        input_data_file_path = os.path.join(output_dir, f'{i}.input')
        wasm_file_path = os.path.join(output_dir, f'{i}.wasm')

        # wasm-opt may fail to run in rare cases (when the fuzzer emits code it
        # detects as invalid). Just try again in such a case.
        for attempt in range(0, 100):
            # Generate random data.
            random_size = random.SystemRandom().randint(1, MAX_RANDOM_SIZE)
            with open(input_data_file_path, 'wb') as file:
                file.write(os.urandom(random_size))

            # Generate wasm from the random data.
            cmd = [FUZZER_BINARY_PATH] + FUZZER_ARGS
            cmd += ['-o', wasm_file_path, input_data_file_path]
            try:
                subprocess.check_call(cmd)
            except subprocess.CalledProcessError:
                # Try again.
                print('(oops, retrying wasm-opt)')
                attempt += 1
                if attempt == 99:
                    # Something is very wrong!
                    raise
                continue
            # Success, leave the loop.
            break

        # Generate a testcase from the wasm
        with open(wasm_file_path, 'rb') as file:
            wasm_contents = file.read()
        testcase_file_path = os.path.join(output_dir,
                                          get_file_name(FUZZ_FILENAME_PREFIX, i))
        js_file_contents = get_js_file_contents(wasm_contents)
        with open(testcase_file_path, 'w') as file:
            file.write(js_file_contents)

        # Emit a corresponding flags file.
        flags_file_path = os.path.join(output_dir,
                                       get_file_name(FLAGS_FILENAME_PREFIX, i))
        with open(flags_file_path, 'w') as file:
            file.write(FUZZER_FLAGS_FILE_CONTENTS)

        print(f'Created testcase: {testcase_file_path}, {len(wasm_contents)} bytes')

        # Remove temporary files.
        os.remove(input_data_file_path)
        os.remove(wasm_file_path)

    print(f'Created {num} testcases.')


if __name__ == '__main__':
    main(sys.argv)