Fuzzing: ClusterFuzz integration #7079

kripken · 2024-11-14T22:08:41Z

The main addition here is a bundle_clusterfuzz.py script which will package up
the exact files that should be uploaded to ClusterFuzz. It also documents the
process of bundling and testing. You can do

bundle_clusterfuzz.py OUTPUT_FILE.tgz

That bundles wasm-opt from ./bin/, which is enough for local testing. For
actually uploading to ClusterFuzz, we need a portable build, and @dschuff
had the idea to reuse the emsdk build, which works nicely. Doing

bundle_clusterfuzz.py OUTPUT_FILE.tgz --build-dir=/path/to/emsdk/upstream/

will bundle wasm-opt (+libs) from the emsdk. I verified that those builds
work on ClusterFuzz.
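
For readers who want a feel for the bundling step, here is a minimal sketch. It is not the actual bundle_clusterfuzz.py: the helper name `make_bundle`, the `bin/` and `lib/` layout inside the archive, and the assumption that the build dir contains `bin/wasm-opt` plus an optional `lib/` directory are all made up for illustration.

```python
import os
import tarfile


def make_bundle(output_file, build_dir):
    """Pack wasm-opt and any libraries from build_dir into a .tgz archive.

    Hypothetical helper: the archive layout (bin/, lib/) is an assumption.
    Returns the list of archive member names that were added.
    """
    added = []
    with tarfile.open(output_file, 'w:gz') as tar:
        # The wasm-opt binary itself.
        wasm_opt = os.path.join(build_dir, 'bin', 'wasm-opt')
        tar.add(wasm_opt, arcname='bin/wasm-opt')
        added.append('bin/wasm-opt')

        # Shared libraries, if the build has any (e.g. an emsdk build).
        lib_dir = os.path.join(build_dir, 'lib')
        if os.path.isdir(lib_dir):
            for name in sorted(os.listdir(lib_dir)):
                path = os.path.join(lib_dir, name)
                if os.path.isfile(path):
                    tar.add(path, arcname='lib/' + name)
                    added.append('lib/' + name)
    return added
```

Returning the member names makes it easy to log exactly what went into the archive before uploading.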

I added several forms of testing here. First, our main fuzzer fuzz_opt.py now
has a ClusterFuzz testcase handler, which simulates a ClusterFuzz environment.
Second, there are smoke tests that run in the unit test suite, and can also be
run separately:

python -m unittest test/unit/test_cluster_fuzz.py

Those unit tests can also run on a given bundle, e.g. one created from an
emsdk build, for testing right before upload:

BINARYEN_CLUSTER_FUZZ_BUNDLE=/path/to/bundle.tgz python -m unittest test/unit/test_cluster_fuzz.py
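
As a rough sketch of how a test could honor that environment variable: the helper names `resolve_bundle` and `unpack_bundle` below are hypothetical, and the real test_cluster_fuzz.py may be organized quite differently. Only the variable name BINARYEN_CLUSTER_FUZZ_BUNDLE comes from this PR.

```python
import os
import tarfile
import tempfile


def resolve_bundle(env=None):
    """Return the path of a user-provided bundle, or None if unset."""
    if env is None:
        env = os.environ
    return env.get('BINARYEN_CLUSTER_FUZZ_BUNDLE') or None


def unpack_bundle(bundle_path):
    """Extract a bundle into a fresh temp dir and return that dir."""
    out_dir = tempfile.mkdtemp()
    with tarfile.open(bundle_path) as tar:
        # Use the safer 'data' extraction filter where Python supports it.
        kwargs = {'filter': 'data'} if hasattr(tarfile, 'data_filter') else {}
        tar.extractall(out_dir, **kwargs)
    return out_dir
```

A test setup could call `resolve_bundle()` first and only build a fresh bundle from the local build when it returns None.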

A third piece of testing is to add a --fuzz-passes test. That is a mode for
-ttf (translate random data into a valid wasm fuzz testcase) that uses random
data to pick and run a set of passes, to further shape the wasm. (--fuzz-passes
had no previous testing, and this PR fixes it and tidies it up a little, adding some
newer passes too).
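
To illustrate the general idea of using random data to pick passes, here is a toy sketch. It is only an illustration in Python: the real logic lives in Binaryen's own source and may work quite differently, and the `PASSES` list, `pick_passes`, and `max_passes` are assumptions made up for this example.

```python
# Illustrative subset of pass names; the real list is longer.
PASSES = ['dce', 'precompute', 'simplify-locals', 'vacuum']


def pick_passes(data, max_passes=8):
    """Map input bytes to a sequence of pass names, deterministically."""
    if not data:
        return []
    # The first byte decides how many passes to run (0..max_passes);
    # each following byte then picks one pass.
    count = data[0] % (max_passes + 1)
    return [PASSES[b % len(PASSES)] for b in data[1:1 + count]]
```

The useful property is that the same input bytes always select the same passes, so a failing testcase can be reproduced from its input alone.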

Beyond that, this PR includes the key run.py script that is bundled and then
executed by ClusterFuzz: a Python script that runs wasm-opt -ttf [..]
to generate testcases, sets up their JS, and emits them.
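
The first half of that flow can be sketched as follows. This is not the actual run.py: `make_random_input` and `wasm_opt_command` are hypothetical names, the size limits are arbitrary, and the remaining wasm-opt arguments (elided above) are left to the caller via `extra_args` rather than guessed here.

```python
import os
import random


def make_random_input(path, rng, min_size=1, max_size=1024):
    """Write random bytes for -ttf to translate into a wasm module."""
    size = rng.randint(min_size, max_size)
    with open(path, 'wb') as f:
        f.write(bytes(rng.getrandbits(8) for _ in range(size)))
    return size


def wasm_opt_command(wasm_opt, input_path, extra_args=()):
    """Build (but do not run) a wasm-opt -ttf command line."""
    return [wasm_opt, '-ttf', input_path] + list(extra_args)
```

The resulting list could then be handed to `subprocess.check_call`, for example with `extra_args=['-o', 'out.wasm']` to name the output file.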

fuzz_shell.js, which is the JS that executes testcases, now checks whether it
has been provided the binary data of a wasm file. If so, it does not read a
wasm file from argv[1]. (This is needed because ClusterFuzz expects a single
file for the testcase, so we make a JS file with the wasm bundled inside it.)
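
One way to produce such a single-file testcase is to prepend the wasm bytes to the shell as a typed-array literal. The sketch below assumes that approach; the helper name `embed_wasm_in_js` and the JS variable name `binary` are assumptions for illustration and may not match what run.py and fuzz_shell.js actually use.

```python
def embed_wasm_in_js(wasm_bytes, shell_js, var_name='binary'):
    """Return one JS source: a byte-array declaration followed by the shell."""
    byte_list = ','.join(str(b) for b in wasm_bytes)
    prefix = f'var {var_name} = new Uint8Array([{byte_list}]);\n\n'
    return prefix + shell_js
```

The shell can then test whether that variable is defined and fall back to reading argv[1] only when it is not.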

tlively · 2024-11-14T23:03:32Z

scripts/bundle_clusterfuzz.py

@@ -85,7 +85,7 @@
    # Delete the argument, as importing |shared| scans it.
    sys.argv.pop()

-from test import shared
+from test import shared # noqa


tlively (2024-11-14): Can we refactor the shared argument parsing to use less global state so we don't have to dodge the linter like this?

kripken (2024-11-18): That might be a very large refactoring. shared.py depends on parsing the arguments synchronously (it uses their results immediately), so putting it all in a function to call later wouldn't be enough. And I'm not sure how to add a "plugin" interface to add more things for that argparse code to handle.

I do agree that it is weird that this script has its own argument parsing in addition to the core parsing, but we do need that core parsing (for the flags to set the bin dir). We'd need to either duplicate that code, or do some kind of big refactoring that I don't have a good idea for.
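
For context, the general pattern under discussion (a script consuming its own arguments before a module that parses sys.argv at import time is imported) could look roughly like this. It is a sketch of a different technique than the `sys.argv.pop()` in the diff: it uses argparse's `parse_known_args`, and the helper name `split_own_args` is hypothetical.

```python
import argparse
import sys


def split_own_args(argv):
    """Split argv into this script's options and everything else."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument('output_file')
    parser.add_argument('--build-dir', default=None)
    own, rest = parser.parse_known_args(argv[1:])
    # Keep argv[0] so later parsers still see a program name.
    return own, [argv[0]] + rest
```

A script would do `own, sys.argv = split_own_args(sys.argv)` before the late import, which is why that import cannot sit at the top of the file and still trips the linter.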

tlively · 2024-11-15T02:16:03Z

scripts/bundle_clusterfuzz.py

+
+  ./emsdk install tot
+
+after which ./upstream/ (from the emsdk dir) will contain portable builds of


tlively (2024-11-15): What does "portable" mean in this context?

kripken (2024-11-18): In the sense of being able to run on as many targets as possible. For Linux, that means not depending on specific versions of system libc etc. The emsdk makes such builds, for example. Is there a better word for this?

tlively (2024-11-18): Maybe "hermetic," but that's a stronger property than what we mean here. How about instead of just saying "portable," we mention that the emsdk builds don't depend on system libc, etc.

kripken (2024-11-18): Sounds good, done.

dschuff (2024-11-19): Adjusts glasses and pocket protector...
well actually, the emsdk builds do depend on system libc. Just not on libc++ (and the libc is fairly old, so it doesn't depend on very new libc symbol versions).

kripken (2024-11-19): Heh, ok, I adjusted the comment to mention that. I think now it's general and accurate enough.

tlively · 2024-11-15T02:17:36Z

scripts/bundle_clusterfuzz.py

+
+  2. Run the unit tests, which include smoke tests for our ClusterFuzz support:
+
+       python -m unittest test/unit/test_cluster_fuzz.py


tlively (2024-11-15): Maybe this script should run these smoke tests automatically?

kripken (2024-11-18): I don't feel strongly, but given that the tests have some logging output that the user should review manually, it seems best to me to separate the two tasks in a clean way. In particular, the user might want to run those tests multiple times on a single bundle.

tlively (2024-11-18): If the user is supposed to inspect logged output, I think that makes it even better to have the bundler script run them. We can still allow the tests to be run separately as well, and could even print instructions for that in the bundler output.

kripken (2024-11-18): Hmm, that still feels a little less simple/unixey to me. The script would no longer be a bundler, but a "bundle-and-test" script, that does more than one thing. How about just printing the instructions after bundling?

tlively (2024-11-19): That sounds fine to me 👍
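
A minimal sketch of that compromise, printing the test command once the bundle is written. The helper name `post_bundle_message` and the exact wording are assumptions; the command itself is the one from the PR description.

```python
def post_bundle_message(bundle_path):
    """Return the text to print after a bundle has been written."""
    return (
        f'Bundle written to {bundle_path}\n'
        'To smoke-test it before uploading, run:\n'
        f'  BINARYEN_CLUSTER_FUZZ_BUNDLE={bundle_path} '
        'python -m unittest test/unit/test_cluster_fuzz.py'
    )
```

The bundler would end with `print(post_bundle_message(output_file))`, which keeps it a single-purpose tool while still pointing the user at the tests.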

kripken added 30 commits November 8, 2024 11:33

9926504 start
8d201ca work
fa633e9 work
d29bb70 prep
17e6e94 work
bc9a1d1 work
eb91fd3 work
e1d5be0 work
c9b057c work
fe8b47a work
cc22c7a work
1b97501 work
6fb3e45 work
ae2f663 work
b940d34 work
794980c work
823f146 work
1ed21d5 work
1657555 work
586bad8 work
ad6f5ee work
66e56db work
02a89b7 work
156f6b6 fix
07e1033 text
f0cab01 oops
a694dd7 restore
af7b2d5 finish
a0da68b moar
faf380c oops.in.advance

kripken added 6 commits November 14, 2024 12:50

b440b65 notes
53cec85 fix
e0fb922 format
23ae5a4 text
d0b254d note
ccf4683 note

kripken requested review from sbc100, dschuff and tlively November 14, 2024 22:08

kripken added 8 commits November 14, 2024 14:10

6487be1 lint
5fcf347 lint
b3859df lint
2b3e0f7 lint
e17046b update
51cff4d try to fix macos
9b08a40 Make the test use the right build dir, which varies on CI
e3c9915 find build dir properly

tlively reviewed Nov 15, 2024

kripken and others added 8 commits November 18, 2024 13:22

e3b905e Update scripts/clusterfuzz/run.py (Co-authored-by: Thomas Lively <tlively123@gmail.com>)
99ba1ee use unittest asserts
aa9bb5c Avoid regex-capturing stuff we don't need
f4d79b1 assert on having one line per regex
8de3f10 Update test/unit/test_cluster_fuzz.py (Co-authored-by: Thomas Lively <tlively123@gmail.com>)
8977b39 comment
60e2f97 get build dir in all tests in the same, correct, manner
310e161 Skip on windows

tlively approved these changes Nov 19, 2024

d713d6e comments

kripken merged commit b0e999a into WebAssembly:main Nov 19, 2024
13 checks passed

kripken deleted the clusterfuzz branch November 19, 2024 17:28
