-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fuzzing: ClusterFuzz integration #7079
Conversation
@@ -85,7 +85,7 @@ | |||
# Delete the argument, as importing |shared| scans it. | |||
sys.argv.pop() | |||
|
|||
from test import shared | |||
from test import shared # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we refactor the shared argument parsing to use less global state so we don't have to dodge the linter like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be a very large refactoring. shared.py
depends on parsing the arguments synchonously (it uses their results immediately), so putting it all in a function to call later wouldn't be enough. And I'm not sure how to add a "plugin" interface to add more things for that argparse code to handle.
I do agree that it is weird that this script has its own argument parsing in addition to the core parsing, but we do need that core parsing (for the flags to set the bin dir). We'd need to either duplicate that code, or do some kind of big refactoring that I don't have a good idea for.
scripts/bundle_clusterfuzz.py
Outdated
|
||
./emsdk install tot | ||
|
||
after which ./upstream/ (from the emsdk dir) will contain portable builds of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "portable" mean in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the sense of being able to run on as many targets as possible. For Linux, that means not depending on specific versions of system libc etc. The emsdk makes such builds, for example. Is there a better word for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "hermetic," but that's a stronger property than what we mean here. How about instead of just saying "portable," we mention that the emsdk builds don't depend on system libc, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusts glasses and pocket protector...
well actually, the emsdk builds do depend on system libc. Just not on libc++ (and the libc is fairly old, so it doesn't depend on very new libc symbol versions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, ok, I adjusted the comment to mention that. I think now it's general and accurate enough.
|
||
2. Run the unit tests, which include smoke tests for our ClusterFuzz support: | ||
|
||
python -m unittest test/unit/test_cluster_fuzz.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this script should run these smoke tests automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly, but given that the tests have some logging output that the user should review manually, it seems best to me to separate the two tasks in a clean way. In particular, the user might want to run those tests multiple times on a single bundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user is supposed to inspect logged output, I think that makes it even better to have the bundler script run them. We can still allow the tests to be run separately as well, and could even print instructions for that in the bundler output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that still feels a little less simple/unixey to me. The script would no longer be a bundler, but a "bundle-and-test" script, that does more than one thing. How about just printing the instructions after bundling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds fine to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Co-authored-by: Thomas Lively <tlively123@gmail.com>
Co-authored-by: Thomas Lively <tlively123@gmail.com>
The main addition here is a
bundle_clusterfuzz.py
script which will package upthe exact files that should be uploaded to ClusterFuzz. It also documents the
process and bundling and testing. You can do
That bundles
wasm-opt
from./bin.
, which is enough for local testing. Foractually uploading to ClusterFuzz, we need a portable build, and @dschuff
had the idea to reuse the emsdk build, which works nicely. Doing
will bundle
wasm-opt
(+libs) from the emsdk. I verified that those buildswork on ClusterFuzz.
I added several forms of testing here. First, our main fuzzer
fuzz_opt.py
nowhas a
ClusterFuzz
testcase handler, which simulates a ClusterFuzz environment.Second, there are smoke tests that run in the unit test suite, and can also be
run separately:
Those unit tests can also run on a given bundle, e.g. one created from an
emsdk build, for testing right before upload:
A third piece of testing is to add a
--fuzz-passes
test. That is a mode for-ttf
(translate random data into a valid wasm fuzz testcase) that uses randomdata to pick and run a set of passes, to further shape the wasm. (
--fuzz-passes
had no previous testing, and this PR fixes it and tidies it up a little, adding some
newer passes too).
Otherwise this PR includes the key
run.py
script that is bundled and thenexecuted by ClusterFuzz, basically a python script that runs
wasm-opt -ttf [..]
to generate testcases, sets up their JS, and emits them.
fuzz_shell.js
, which is the JS to execute testcases, will now check if it isprovided binary data of a wasm file. If so, it does not read a wasm file from
argv[1]
. (This is needed because ClusterFuzz expects a single file for thetestcase, so we make a JS file with bundled wasm inside it.)