
fuzz: differential V8 engine occasionally crashes #4786

Open
abrown opened this issue Aug 25, 2022 · 5 comments
Labels
bug (Incorrect behavior in the current implementation that needs fixing) · fuzzing (Issues related to our fuzzing infrastructure)

Comments

abrown (Contributor) commented Aug 25, 2022

Test Case

No test case produced.

Steps to Reproduce

$ ALLOWED_ENGINES=-spec,-wasmi cargo +nightly fuzz run differential -s none

Run long enough to hit the crash.

Expected Results

Not to crash.

Actual Results

...
=== Execution rate (1023 successes / 6000 attempted modules): 17.05% ===
        wasmi: 0.00%, spec: 0.00%, wasmtime: 87.70%, v8: 12.30%
        wasm-smith: 44.81%, single-inst: 55.19%
...
#6851   NEW    cov: 20559 ft: 63794 corp: 1188/146Kb lim: 170 exec/s: 360 rss: 106Mb L: 138/170 MS: 2 CopyPart-CopyPart-
#6862   NEW    cov: 20559 ft: 63795 corp: 1189/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 132/170 MS: 1 CopyPart-
#6864   NEW    cov: 20560 ft: 63796 corp: 1190/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 103/170 MS: 2 EraseBytes-ChangeBit-
#6869   NEW    cov: 20563 ft: 63809 corp: 1191/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 169/170 MS: 5 ShuffleBytes-InsertRepeatedBytes-InsertRepeatedBytes-ShuffleBytes-PersAutoDict- DE: "\xff\xff\xff\x07"-
#6988   NEW    cov: 20563 ft: 63838 corp: 1192/146Kb lim: 170 exec/s: 349 rss: 106Mb L: 129/170 MS: 4 InsertRepeatedBytes-ChangeBit-EraseBytes-InsertRepeatedBytes-
────────────────────────────────────────────────────────────────────────────────

Error: Fuzz target exited with signal: 11 (core dumped)

Versions and Environment

Wasmtime version or commit: main

Operating system: Fedora 35

Architecture: x86_64

abrown added the bug label on Aug 25, 2022
abrown (Contributor, Author) commented Aug 25, 2022

cc: @alexcrichton, guess that crash we saw wasn't the spec interpreter after all.

abrown added the fuzzing label on Aug 25, 2022
alexcrichton (Member) commented

Unfortunately I haven't been able to reproduce this locally. When I let the fuzzer run long enough it hit #4812 before hitting this fault.

Can you try running the fuzzer in gdb to see if you can grab a stack trace of the fault (presuming it doesn't take too long to fault)? Using gdb here is tricky, though, because segfaults can happen during normal wasm execution and there's no great way to handle that. handle SIGILL nostop worked for me locally, but with handle SIGSEGV nostop I believe gdb lets the process exit when the actual crashing failure hits, which is no good. So you may have to babysit it a bit and double-check that each SIGSEGV doesn't look like a normal wasm SIGSEGV (rr might be super helpful in this regard).

bjorn3 (Contributor) commented Aug 30, 2022

Maybe you can do handle SIGSEGV nostop but also add a breakpoint where the signal handler propagates the SIGSEGV if it didn't originate from the wasm module?

alexcrichton (Member) commented

I've been unable to fuzz with v8 locally for the longest time, so I tried to dig into this today. It turns out to be an issue with memory protection keys (MPK), so it only affects new enough x64 hardware with new enough glibc. I think that explains why Andrew initially ran into this and I couldn't reproduce it, and it would also help explain why it hasn't shown up on oss-fuzz at all despite being trivial to reproduce locally.

My best read of the situation is that if you run with v8 and differential fuzzing long enough locally it will eventually crash. This crash, when I've inspected it, always looks like this:

Thread 1 "differential" received signal SIGSEGV, Segmentation fault.
0x00002061775c8170 in ?? ()
(gdb) disas $rip,$rip+20
Dump of assembler code from 0x2061775c8170 to 0x2061775c8184:
=> 0x00002061775c8170:  jmp    *0x2(%rip)        # 0x2061775c8178

The memory being accessed is just below %rip itself, and the address 0x00002061775c8178 is indeed readable in gdb. The jump target is in fact always Builtins_WasmLiftoffFrameSetup. The problem is that printing $_siginfo.si_code in gdb yields 4, which corresponds to SEGV_PKUERR.
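As a reference, here's a minimal standalone sketch (Linux-only, using the libc crate; mine, not part of the fuzzer) that distinguishes a pkey fault from an ordinary segfault by checking si_code in a SIGSEGV handler, i.e. the same check $_siginfo does in gdb:

use std::mem;

// SEGV_PKUERR is 4 on Linux: the fault was caused by a pkey/MPK violation.
const SEGV_PKUERR: libc::c_int = 4;

extern "C" fn on_segv(_sig: libc::c_int, info: *mut libc::siginfo_t, _ctx: *mut libc::c_void) {
    let code = unsafe { (*info).si_code };
    let msg: &[u8] = if code == SEGV_PKUERR {
        b"SIGSEGV: SEGV_PKUERR (pkey violation)\n"
    } else {
        b"SIGSEGV: not a pkey fault\n"
    };
    // Only async-signal-safe calls in the handler.
    unsafe {
        libc::write(2, msg.as_ptr().cast(), msg.len());
        libc::_exit(1);
    }
}

fn main() {
    unsafe {
        let mut sa: libc::sigaction = mem::zeroed();
        sa.sa_sigaction = on_segv as usize;
        sa.sa_flags = libc::SA_SIGINFO;
        libc::sigaction(libc::SIGSEGV, &sa, std::ptr::null_mut());
        // Trigger an ordinary (non-pkey) fault to exercise the handler.
        std::ptr::null_mut::<u8>().write_volatile(0);
    }
}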

So the problem here is that this code isn't allowed to be read due to MPK and the current state of the PKRU register. The situation seems to be:

  • V8 looks like it uses a single pkey at most.
  • This is entirely unrelated to Wasmtime's PKRU support, if that's disabled the issue still happens.
  • V8 assumes that JIT code executes with pkey 1.
  • V8 uses pkeys to disallow writes to JIT code with pkey 1.
  • V8 assumes a steady state in which pkey 1 is accessible (executable/readable) but not writable.
  • V8, when it modifies JIT code, frobs the PKRU to temporarily allow writes.

This is all fine and dandy and Wasmtime shouldn't interfere with it, except it does. The problem is apparently signal handlers (it's always signal handlers). V8 assumes a steady-state PKRU register of 0x5..58: no access to keys >= 2, read-only/executable access to pkey 1, and full access to pkey 0. When a signal is received, however, the kernel resets the PKRU register to 0x5..54, which disables access to all keys >= 1, disagreeing with V8's steady state.
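As a sanity check on those bit patterns (this is just my arithmetic, not V8 code): PKRU assigns each pkey two bits, access-disable at bit 2*i and write-disable at bit 2*i+1, so the two values decode as claimed:

// Decode a PKRU value into per-key permissions.
fn decode_pkru(pkru: u32) {
    println!("PKRU = {pkru:#010x}");
    for key in 0..16 {
        let ad = (pkru >> (2 * key)) & 1; // access-disable bit
        let wd = (pkru >> (2 * key + 1)) & 1; // write-disable bit
        println!("  pkey {key:2}: access-disable={ad} write-disable={wd}");
    }
}

fn main() {
    decode_pkru(0x5555_5558); // steady state: pkey 0 free, pkey 1 read/exec only
    decode_pkru(0x5555_5554); // post-signal: pkey 1 fully inaccessible
}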

The problem here is that when Wasmtime executes, hits a signal, recovers, and goes back into V8, V8 is now in a corrupt state: it thinks JIT code can access pkey 1, but it in fact cannot. A reproduction of this problem can be found in this gist, which uses the v8 crate to run some code, then generates and catches a segfault, then runs code again. This second execution crashes.
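The kernel-side reset is also easy to observe standalone. This sketch (again mine, not from the gist; it assumes x86-64 Linux with MPK available, otherwise rdpkru raises SIGILL) reads PKRU before a signal, inside the handler, and after returning. On affected kernels the handler sees the reset value, and recovering via longjmp instead of returning through sigreturn, as Wasmtime's trap handling does, would leave that reset value live:

use std::sync::atomic::{AtomicU32, Ordering};

static PKRU_IN_HANDLER: AtomicU32 = AtomicU32::new(0);

// Read the PKRU register; requires MPK support (CR4.PKE), otherwise #UD.
fn rdpkru() -> u32 {
    let pkru: u32;
    unsafe {
        std::arch::asm!(
            "rdpkru",
            in("ecx") 0u32, // rdpkru requires ecx = 0
            out("eax") pkru,
            out("edx") _, // edx is zeroed by the instruction
        );
    }
    pkru
}

extern "C" fn on_signal(_sig: libc::c_int) {
    PKRU_IN_HANDLER.store(rdpkru(), Ordering::SeqCst);
}

fn main() {
    unsafe {
        libc::signal(libc::SIGSEGV, on_signal as libc::sighandler_t);
        println!("PKRU before signal:   {:#010x}", rdpkru());
        libc::raise(libc::SIGSEGV); // deliver a signal without a real fault
        println!("PKRU inside handler:  {:#010x}", PKRU_IN_HANDLER.load(Ordering::SeqCst));
        println!("PKRU after sigreturn: {:#010x}", rdpkru());
    }
}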


Ok, so what to do about this? I found that v8::V8::set_flags_from_string("--no-memory_protection_keys") looks promising: it fixes the standalone test case here, but it causes V8 to segfault in a different location when passed during fuzzing. According to the pkeys(7) man page, signal handlers need to manually restore the pkey state after a signal is received, and it looks like sig{set,long}jmp don't solve this automatically.
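For reference, the flag has to be passed before V8 is initialized; with the v8 crate that looks roughly like this (standard rusty_v8 initialization, and per the follow-up below, the flag only partially disables MPK):

fn main() {
    // Flags must be set before initialize().
    v8::V8::set_flags_from_string("--no-memory_protection_keys");
    let platform = v8::new_default_platform(0, false).make_shared();
    v8::V8::initialize_platform(platform);
    v8::V8::initialize();
    // ... create an isolate and run scripts as usual ...
}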

I have run out of ideas for how to fix this. If OSS-Fuzz picks up hardware with MPK we'll probably have to disable differential execution with v8 if nothing else changes.

alexcrichton (Member) commented

> That fixes the standalone test case here but it causes V8 to segfault in a different location when passed during fuzzing.

Looks like this flag doesn't fully disable MPK: the crash is still in V8 with si_code = 4, which points to pkey misconfiguration again.
