PCC: support x86-64. #7352

cfallin · 2023-10-24T21:36:21Z

This PR extends the proof-carrying-code infrastructure to support x86-64 as well as aarch64. In the process, many of the mechanisms had to be made a little more general.

One important change is that PCC now leaves more "breadcrumbs" in the frontend, avoiding the need for magic handling of facts on constant values, etc., in the backend. For the first time, a lowering rule can also add a fact to a vreg, preserving the chain of facts.
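To make "a lowering rule adds a fact to a vreg" concrete, here is a hypothetical, heavily simplified model. Every type and function name below (`RangeFact`, `VReg`, `LowerCtx`, `lower_uextend_32_to_64`) is invented for illustration and is not Cranelift's API. The idea it shows: an input value arrives already carrying a fact (standing in for a frontend annotation), and a lowering step that creates a new vreg copies the appropriate fact onto it so the chain is not broken.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified fact: the value, read as an unsigned `bit_width`-bit
/// integer, lies in `[min, max]`. Loosely modeled on a range fact; not Cranelift's API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RangeFact {
    bit_width: u16,
    min: u64,
    max: u64,
}

/// Hypothetical virtual-register id.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct VReg(u32);

/// Hypothetical lowering context that tracks facts per vreg.
#[derive(Default)]
struct LowerCtx {
    next_vreg: u32,
    facts: HashMap<VReg, RangeFact>,
}

impl LowerCtx {
    fn new_vreg(&mut self) -> VReg {
        let v = VReg(self.next_vreg);
        self.next_vreg += 1;
        v
    }

    /// Stand-in for a frontend annotation: the fact arrives already attached,
    /// so the backend needs no special case for where it came from.
    fn input_with_fact(&mut self, fact: RangeFact) -> VReg {
        let v = self.new_vreg();
        self.facts.insert(v, fact);
        v
    }

    /// A lowering step for a 32-to-64-bit zero-extend. It creates a fresh vreg and,
    /// if the source carries a 32-bit range fact, records the same bounds at 64 bits
    /// (zero-extension preserves unsigned bounds), so later checks can still use them.
    fn lower_uextend_32_to_64(&mut self, src: VReg) -> VReg {
        let dst = self.new_vreg();
        if let Some(f) = self.facts.get(&src).copied() {
            if f.bit_width == 32 {
                self.facts.insert(dst, RangeFact { bit_width: 64, ..f });
            }
        }
        dst
    }
}

fn main() {
    let mut ctx = LowerCtx::default();
    let idx = ctx.input_with_fact(RangeFact { bit_width: 32, min: 0, max: 0xffff_ffff });
    let wide = ctx.lower_uextend_32_to_64(idx);
    println!("{:?}", ctx.facts.get(&wide));
}
```

Without the copy in `lower_uextend_32_to_64`, the new vreg would have no fact, and any address computed from it could not be proven in bounds.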

With these changes, we can validate compilation of SpiderMonkey.wasm with Wasm static memories on x86-64 and aarch64:

cfallin@fastly2:~/work/wasmtime% target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
cfallin@fastly2:~/work/wasmtime% target/release/wasmtime compile -C pcc=yes --target aarch64 ../wasm-tests/spidermonkey.wasm
cfallin@fastly2:~/work/wasmtime%


cfallin · 2023-10-24T21:56:05Z

A performance measurement, also:

cfallin@fastly2:~/work/wasmtime% hyperfine -L pcc no,yes "target/release/wasmtime compile -C pcc={pcc} --target x86_64 ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile -C pcc=no --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):     995.9 ms ±  13.1 ms    [User: 7685.1 ms, System: 347.6 ms]
  Range (min … max):   981.5 ms … 1015.4 ms    10 runs

Benchmark 2: target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.009 s ±  0.008 s    [User: 7.828 s, System: 0.349 s]
  Range (min … max):    0.998 s …  1.026 s    10 runs

Summary
  target/release/wasmtime compile -C pcc=no --target x86_64 ../wasm-tests/spidermonkey.wasm ran
    1.01 ± 0.02 times faster than target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm

or in other words, ~1% overhead.
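Working the ratios out by hand from the means above (a sanity check, not part of hyperfine's output):

```latex
\frac{1.009\,\text{s}}{0.9959\,\text{s}} \approx 1.013
\qquad
\frac{7.828\,\text{s (user)}}{7.6851\,\text{s (user)}} \approx 1.019
```

so roughly 1.3% on wall-clock time and about 2% on user CPU time, consistent with hyperfine's 1.01 ± 0.02.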

Prior to this PR, turning on PCC automatically enabled the regalloc checker as well; I found this to have much higher overhead:

cfallin@fastly2:~/work/wasmtime% hyperfine -L checker no,yes "target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker={checker} --target x86_64 ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=no --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.034 s ±  0.010 s    [User: 7.798 s, System: 0.362 s]
  Range (min … max):    1.018 s …  1.055 s    10 runs

Benchmark 2: target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.741 s ±  0.033 s    [User: 15.546 s, System: 0.393 s]
  Range (min … max):    1.710 s …  1.820 s    10 runs

Summary
  target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=no --target x86_64 ../wasm-tests/spidermonkey.wasm ran
    1.68 ± 0.04 times faster than target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=yes --target x86_64 ../wasm-tests/spidermonkey.wasm

or about 68% more than PCC alone. Given that, IMHO a good design tradeoff is to run PCC in production but not the regalloc checker; we already fuzz continuously with the latter, and it can always be turned on explicitly.
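Working the ratios out by hand from the means above (a sanity check, not part of hyperfine's output):

```latex
\frac{1.741\,\text{s}}{1.034\,\text{s}} \approx 1.684
\qquad
\frac{15.546\,\text{s (user)}}{7.798\,\text{s (user)}} \approx 1.99
```

so the checker adds about 68% wall-clock time and roughly doubles user CPU time.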

jeffparsons · 2023-10-24T22:09:49Z

An inquisitive member of the peanut gallery would like to know if you have written anything high-level about your goals for this work? I've seen the PRs flying by and it sounds really cool, but I don't understand any of the context. In particular, I've been wondering:

Is this primarily aimed at having another layer of safety without compromising performance, or does it unlock opportunities for increased performance without having to compromise on existing safety by replacing blunter mechanisms?
Are there particular workloads that you expect this work to benefit?
If increased performance is a goal, do you have any targets/estimates/hopes in mind?

cfallin · 2023-10-24T22:31:13Z

@jeffparsons great questions! I haven't written anything beyond the initial issue proposing this work in #6090 -- the last section of that issue writeup describes the proof-carrying code / "memory capabilities" model. I plan to write more eventually.

The goal isn't performance -- the generated code doesn't change, and this doesn't allow any more aggressive strategies to be used -- but rather risk mitigation. We've had a few CVEs that allowed sandbox escapes from Wasmtime due to miscompiles, so I want to build infrastructure that performs translation validation, proving that a given compilation artifact doesn't have such an issue. Long-term, it could also be used to verify other invariants (e.g., @fitzgen and I have talked a bit about how it could provide additional safety in the implementation of Wasm GC).
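To illustrate the kind of per-access check a "memory capabilities" model makes possible, here is a hypothetical, heavily simplified sketch. The `PointsInto` type, the `access_in_bounds` function, and the 4 GiB + 2 GiB region size are illustrative assumptions, not Wasmtime's or Cranelift's actual data structures or configuration. A pointer carries a fact saying which region it points into and how large its offset can be; a load or store is accepted only if its last byte provably stays inside that region.

```rust
/// Hypothetical, simplified "memory capability" fact: a pointer known to point
/// into a region of `region_size` bytes, at an offset of at most `max_offset`.
#[derive(Clone, Copy, Debug)]
struct PointsInto {
    region_size: u64,
    max_offset: u64,
}

/// Returns true only if a load/store of `access_bytes` at `ptr + imm_offset`
/// provably stays inside the region; any arithmetic overflow means "not proven".
fn access_in_bounds(ptr: PointsInto, imm_offset: u64, access_bytes: u64) -> bool {
    ptr.max_offset
        .checked_add(imm_offset)
        .and_then(|end| end.checked_add(access_bytes)) // exclusive end of the access
        .map_or(false, |end| end <= ptr.region_size)
}

fn main() {
    // Illustrative numbers: a 32-bit index into a 4 GiB region followed by a 2 GiB guard.
    let p = PointsInto { region_size: (4u64 << 30) + (2u64 << 30), max_offset: 0xffff_ffff };
    println!("8-byte load at +16: {}", access_in_bounds(p, 16, 8));
}
```

A validator built this way rejects the compiled artifact if any access cannot be proven, rather than trusting that the compiler's bounds-check elision was correct.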

fitzgen

Nice!!

A few comments below.

cranelift/codegen/src/isa/x64/inst/args.rs

cranelift/codegen/src/isa/x64/pcc.rs

cranelift/codegen/src/isa/x64/inst.isle


cfallin · 2023-10-26T01:47:42Z

I reworked the whole PCC implementation for x64 based on the above feedback -- removing the ability to pattern-match into Gpr / Xmm types forced a transpose of the whole thing, but as a side-effect, I think the explicit case breakdown is kind of nice in its thoroughness. I was able to actually remove the _ catch-all and list every instruction kind explicitly, so we'll be forced to think about semantics (and catch memory accesses, etc.) whenever we add a new instruction kind. Let me know what you think!
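A generic Rust illustration of why dropping the `_` catch-all matters. The `Inst` enum and `accesses_memory` function here are hypothetical stand-ins, not the actual x64 `Inst` type or PCC checker. With every variant listed explicitly, adding a new instruction kind turns this match into a compile error until someone decides how the checker should treat it.

```rust
/// Hypothetical, trimmed-down instruction set; a real backend has far more kinds.
#[allow(dead_code)]
enum Inst {
    AluRR { dst: u8, src: u8 },
    Load64 { dst: u8, base: u8, offset: i32 },
    Store64 { src: u8, base: u8, offset: i32 },
    Nop,
}

/// Classifies whether an instruction touches memory (and so needs a PCC check).
/// There is deliberately no `_ =>` arm: adding a new `Inst` variant makes this
/// match non-exhaustive, and the compiler rejects it until the new kind is handled.
fn accesses_memory(inst: &Inst) -> bool {
    match inst {
        Inst::AluRR { .. } | Inst::Nop => false,
        Inst::Load64 { .. } | Inst::Store64 { .. } => true,
    }
}

fn main() {
    let insts = [
        Inst::AluRR { dst: 0, src: 1 },
        Inst::Load64 { dst: 2, base: 3, offset: 8 },
        Inst::Nop,
    ];
    let n = insts.iter().filter(|i| accesses_memory(i)).count();
    println!("{} of {} instructions access memory", n, insts.len());
}
```

The cost is verbosity; the benefit is that a new memory-accessing instruction cannot silently fall through a default arm that skips the check.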

cfallin requested a review from a team as a code owner October 24, 2023 21:36

cfallin requested review from abrown and fitzgen and removed request for a team and abrown October 24, 2023 21:36

Don't run regalloc checker if not requested in addition to PCC; it's fairly expensive.

c1ed20c

fitzgen approved these changes Oct 25, 2023


Refactor x64 PCC code to avoid deep pattern matches on Gpr/Xmm types; explicitly match every instruction kind.

a0cfee2

fitzgen approved these changes Oct 26, 2023


fitzgen enabled auto-merge October 26, 2023 18:10

fitzgen disabled auto-merge October 26, 2023 18:10

cfallin force-pushed the pcc-x86 branch from 72a09ec to a0cfee2 Compare October 26, 2023 18:43

cfallin enabled auto-merge October 26, 2023 19:17

cfallin added this pull request to the merge queue Oct 26, 2023

Merged via the queue into bytecodealliance:main with commit f262c31 Oct 26, 2023
40 checks passed

cfallin deleted the pcc-x86 branch October 26, 2023 20:11
