Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PCC: support x86-64. #7352

Merged
merged 3 commits into from
Oct 26, 2023
Merged

PCC: support x86-64. #7352

merged 3 commits into from
Oct 26, 2023

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Oct 24, 2023

This PR extends the proof-carrying-code infrastructure to support x86-64 as well as aarch64. In the process, many of the mechanisms had to be made a little more general.

One important change is that the PCC leaves more "breadcrumbs" on the frontend now, avoiding the need for magic handling of facts on constant values, etc., in the backend. For the first time a lowering rule also gains the ability to add a fact to a vreg to preserve the chain as well.

With these changes, we can validate compilation of SpiderMonkey.wasm with Wasm static memories on x86-64 and aarch64:

cfallin@fastly2:~/work/wasmtime% target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
cfallin@fastly2:~/work/wasmtime% target/release/wasmtime compile -C pcc=yes --target aarch64 ../wasm-tests/spidermonkey.wasm
cfallin@fastly2:~/work/wasmtime%

This PR extends the proof-carrying-code infrastructure to support x86-64
as well as aarch64. In the process, many of the mechanisms had to be
made a little more general.

One important change is that the PCC leaves more "breadcrumbs" on the
frontend now, avoiding the need for magic handling of facts on constant
values, etc., in the backend. For the first time a lowering rule also
gains the ability to add a fact to a vreg to preserve the chain as well.

With these changes, we can validate compilation of SpiderMonkey.wasm
with Wasm static memories on x86-64 and aarch64:

```
cfallin@fastly2:~/work/wasmtime% target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
cfallin@fastly2:~/work/wasmtime% target/release/wasmtime compile -C pcc=yes --target aarch64 ../wasm-tests/spidermonkey.wasm
cfallin@fastly2:~/work/wasmtime%
```
@cfallin cfallin requested a review from a team as a code owner October 24, 2023 21:36
@cfallin cfallin requested review from abrown and fitzgen and removed request for a team and abrown October 24, 2023 21:36
@cfallin
Copy link
Member Author

cfallin commented Oct 24, 2023

A performance measurement, also:

cfallin@fastly2:~/work/wasmtime% hyperfine -L pcc no,yes "target/release/wasmtime compile -C pcc={pcc} --target x86_64 ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile -C pcc=no --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):     995.9 ms ±  13.1 ms    [User: 7685.1 ms, System: 347.6 ms]
  Range (min … max):   981.5 ms … 1015.4 ms    10 runs

Benchmark 2: target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.009 s ±  0.008 s    [User: 7.828 s, System: 0.349 s]
  Range (min … max):    0.998 s …  1.026 s    10 runs

Summary
  target/release/wasmtime compile -C pcc=no --target x86_64 ../wasm-tests/spidermonkey.wasm ran
    1.01 ± 0.02 times faster than target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm

or in other words, ~1% overhead.

Prior to this PR, turning on PCC automatically enabled the regalloc checker as well; I found this to have much higher overhead:

cfallin@fastly2:~/work/wasmtime% hyperfine -L checker no,yes "target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker={checker} --target x86_64 ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=no --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.034 s ±  0.010 s    [User: 7.798 s, System: 0.362 s]
  Range (min … max):    1.018 s …  1.055 s    10 runs

Benchmark 2: target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.741 s ±  0.033 s    [User: 15.546 s, System: 0.393 s]
  Range (min … max):    1.710 s …  1.820 s    10 runs

Summary
  target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=no --target x86_64 ../wasm-tests/spidermonkey.wasm ran
    1.68 ± 0.04 times faster than target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=yes --target x86_64 ../wasm-tests/spidermonkey.wasm

or about 68% above just PCC. Given that, IMHO a good design tradeoff point is to run PCC in production, but not the regalloc checker; we already fuzz continuously with the latter. It can always be turned on explicitly.

@jeffparsons
Copy link
Contributor

An inquisitive member of the peanut gallery would like to know if you have written anything high-level about your goals for this work? I've seen the PRs flying by and it sounds really cool, but I don't understand any of the context. In particular, I've been wondering:

  • Is this primarily aimed at having another layer of safety without compromising performance, or does it unlock opportunities for increased performance without having to compromise on existing safety by replacing blunter mechanisms?
  • Are there particular workloads that you expect this work to benefit?
  • If increased performance is a goal, do you have any targets/estimates/hopes in mind?

@cfallin
Copy link
Member Author

cfallin commented Oct 24, 2023

@jeffparsons great questions! I haven't written anything beyond the initial issue proposing this work in #6090 -- the last section of that issue writeup describes the proof-carrying code / "memory capabilities" model. I plan to write more eventually.

The goal doesn't have anything to do with perf -- the generated code doesn't change, and this doesn't allow any more aggressive strategies to be used -- but rather, risk mitigation. We've had a few CVEs that have allowed sandbox escapes from Wasmtime due to miscompiles, and so I want to build infrastructure that does translation validation to prove a given compilation artifact doesn't have such an issue. Long-term, it could also be used to verify other invariants (e.g., @fitzgen and I have talked a bit about how it could be used to provide additional safety in the implementation of Wasm GC).

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Oct 25, 2023
Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!

A few comments below.

cranelift/codegen/src/isa/x64/inst/args.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/x64/inst/args.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/x64/inst/args.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/x64/pcc.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/x64/pcc.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/x64/inst.isle Show resolved Hide resolved
@cfallin
Copy link
Member Author

cfallin commented Oct 26, 2023

I reworked the whole PCC implementation for x64 based on the above feedback -- removing the ability to pattern-match into Gpr / Xmm types forced a transpose of the whole thing, but as a side-effect, I think the explicit case breakdown is kind of nice in its thoroughness. I was able to actually remove the _ catch-all and list every instruction kind explicitly, so we'll be forced to think about semantics (and catch memory accesses, etc.) whenever we add a new instruction kind. Let me know what you think!

@fitzgen fitzgen enabled auto-merge October 26, 2023 18:10
@fitzgen fitzgen disabled auto-merge October 26, 2023 18:10
@cfallin cfallin enabled auto-merge October 26, 2023 19:17
@cfallin cfallin added this pull request to the merge queue Oct 26, 2023
Merged via the queue into bytecodealliance:main with commit f262c31 Oct 26, 2023
40 checks passed
@cfallin cfallin deleted the pcc-x86 branch October 26, 2023 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants