Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fuzzing] wasm2c integration #2772

Merged
merged 139 commits into from
Apr 22, 2020
Merged

[fuzzing] wasm2c integration #2772

merged 139 commits into from
Apr 22, 2020

Conversation

kripken
Copy link
Member

@kripken kripken commented Apr 16, 2020

This adds support for fuzzing with wabt's wasm2c that @binji wrote.
Basically we compile the wasm to C, then compile the C to a native
executable with a custom main() to wrap around it. The executable
should then print exactly the same as that wasm when run in either
the binaryen interpreter or in a JS VM with our wrapper JS for that
wasm. In other words, compiling the wasm to C is another way to
run that wasm.

I ran this for many hours and could not find any bugs in wasm2c.
Nice work there!

The main reasons I want this are to fuzz wasm2c itself, and to
have another option for fuzzing emcc. For the latter, we do fuzz
wasm-opt quite a lot, but that doesn't fuzz the non-wasm-opt
parts of emcc. And using wasm2c for that is nice since the
starting point is always a wasm file, which means we
can use tools like wasm-reduce and so forth, which can be
integrated with this fuzzer.

This also:

  • Refactors the fuzzer harness a little to make it easier to
    add more "VMs" to run wasms in.

  • Do not autoreduce when re-running a testcase, which I hit
    while developing this.

src/tools/wasm-opt.cpp Outdated Show resolved Hide resolved
std::ofstream outfile;
outfile.open(emitWasm2CWrapper, std::ofstream::out);
outfile << generateWasm2CWrapper(wasm);
outfile.close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why generating this wrapper makes sense as part of wasm-opt. It seems like a separate function to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a new tool I suppose, but we have 3 wrapper generators now (spec, js, wasm2c), and it's convenient to run them from wasm-opt so you can emit the wrapper as you generate the fuzz code, in a single invocation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk, it seems similar to a lot of other auxiliary functionality in wasm-opt like the fuzzing stuff or the JS wrappers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, there is a precedent here. In that case that fine for now.

But maybe as a followup this should be a separate tool? So you would wasm-opt and then wasm-generate-fuzz-wrapper -type=wasm2c or something less clunky than that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe that's better. Another option would be to make all 3 of those be passes.

scripts/fuzz_opt.py Outdated Show resolved Hide resolved
# No legalization for JS means we can't compare JS to others, as any
# illegal export will fail immediately.
vm = self.vms[i]
if vm.can_compare_to_others() and results[i] is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems simpler to me to keep track of the used vms by making each element of results a tuple of a vm and its fixed output) rather than inserting holes into results. I think a list comprehension for this would be nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to use tuples, but am not quite sure where you wanted a list comprehension?

std::ofstream outfile;
outfile.open(emitWasm2CWrapper, std::ofstream::out);
outfile << generateWasm2CWrapper(wasm);
outfile.close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk, it seems similar to a lot of other auxiliary functionality in wasm-opt like the fuzzing stuff or the JS wrappers.

src/tools/wasm2c-wrapper.hpp Outdated Show resolved Hide resolved
src/tools/wasm2c-wrapper.hpp Outdated Show resolved Hide resolved

)";

for (auto& exp : wasm.exports) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also having trouble understanding what this loop is adding and why. More comments would be very welcome.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, I added a bunch more comments now.

@kripken
Copy link
Member Author

kripken commented Apr 21, 2020

Review comments addressed, but the clang-tidy stuff is an increasing rabbithole... the one file i added is now leading it to find interesting problems in many other files that existed before, but I guess it never scanned them until this PR.

See for example the last commit for the stuff I need to do here, and probably a lot left: c32f148

What's the best thing to do here? not sure if I'm doing the right thing in trying to fix these... cc @aheejin

@tlively
Copy link
Member

tlively commented Apr 21, 2020

Are we enforcing clang-tidy now? Before it was just clang-format.

@sbc100
Copy link
Member

sbc100 commented Apr 21, 2020

Are we enforcing clang-tidy now? Before it was just clang-format.

I think the intention was that were enforcing it, at least we were running that script. I did fix a couple possible bug in the script leading up to the github actions transition. @aheejin certainly believed it to be previously working.

Do we not want to enforce it? Does it generate false positives?

@kripken
Copy link
Member Author

kripken commented Apr 21, 2020

For the latter case of wanting fuzz emcc, are you basically just using wasm2c here as way to generate C source? Kind like csmith but where the C code itself comes from a wasm file? Kind of like using the wasm input as a seed to generate a bunch of C that might trip emcc up?

Yeah, that's basically it. It's easy to do and gives another source of C programs for fuzzing. And we have good tools around this for reduction on the wasm, options like no OOB and no NaNs, etc., already working on wasm.

But also, this helps fuzz wasm2c itself. I think that has some cool use cases and it's nice to know it's been fuzzed before recommending it broadly I think.

@aheejin
Copy link
Member

aheejin commented Apr 22, 2020

Review comments addressed, but the clang-tidy stuff is an increasing rabbithole... the one file i added is now leading it to find interesting problems in many other files that existed before, but I guess it never scanned them until this PR.

See for example the last commit for the stuff I need to do here, and probably a lot left: c32f148

What's the best thing to do here? not sure if I'm doing the right thing in trying to fix these... cc @aheejin

I can take a deeper look and run clang-tidy locally to see what's happening, but I think our current .clang-tidy effectively only enables readability-braces-around-statements, which enforces {} on one-line ifs and loops. What kind of clang-tidy errors are you seeing?

Also, if code is in a header file, it is checked only when it is included in a source file IIRC.

@sbc100
Copy link
Member

sbc100 commented Apr 22, 2020

Sorry, I found that clang-tidy problem, I'd forgotten that it needs to compilation database in order to be useful.

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@kripken kripken merged commit 35a36b1 into master Apr 22, 2020
@kripken kripken deleted the wasm2c branch April 22, 2020 19:11
@tlively
Copy link
Member

tlively commented Apr 23, 2020

@kripken I have WABT installed locally, but when I run the fuzzer with this change I am getting

clang-9: error: no such file or directory: '/usr/local/google/home/tlively/local/wasm2c/wasm-rt-impl.c'

Do you know what that might be about? cc @binji

@kripken
Copy link
Member Author

kripken commented Apr 23, 2020

The code assumes you build wabt with a build dir in the wabt root. So that if it finds wasm2c at /a/my_wabt_root/build_dir/wasm2c then it can find the wasm2c support files directory at /a/my_wabt_root/wasm2c/. Do you have just the binaries, maybe, or a different kind of build dir setup?

@tlively
Copy link
Member

tlively commented Apr 23, 2020

My build dir is in the wabt root, but after I build I do ninja install to install the binaries to my ~/local/bin directory. Is the runtime implementation something that should be installed alongside the binaries?

@kripken
Copy link
Member Author

kripken commented Apr 23, 2020

Oh, yes, we need those runtime support files to build with. We can't copy them into binaryen here because they need to match what wabt emits. If there's an install step, perhaps it should install those then?

@binji
Copy link
Member

binji commented Apr 23, 2020

Yeah, I guess it should... maybe to /usr/share?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants