RFC: Baseline compilation in Wasmtime #28

saulecabrera · 2022-08-03T14:06:31Z

This RFC proposes the addition of a new WebAssembly (Wasm) compiler to Wasmtime:
a single-pass or “baseline” compiler. A baseline compiler works to improve
overall compilation performance, yielding faster startup times, at the cost of
less optimized code.

This RFC describes:

The motivation for a baseline compiler
A high-level overview and description of the structure of the baseline
compiler
A high-level overview of its integration with Wasmtime

cc @cfallin @fitzgen @alexcrichton @bnjbvr

This RFC proposes the addition of a new WebAssembly (Wasm) compiler to Wasmtime: a single-pass or “baseline” compiler. A baseline compiler works to improve overall compilation performance, yielding faster startup times, at the cost of less optimized code. This RFC describes: * The motivation for a baseline compiler * A high-level overview and description of the structure of the baseline compiler * A high-level overview of its integration with Wasmtime

accepted/wasmtime-baseline-compilation.md

alexcrichton · 2022-08-03T15:19:21Z

Personally this all sounds great to me, I'm excited to see this take shape! I'm think that reusing Cranelift backends is a great idea and I think the integration strategy chosen here (no tiering yet) is the right answer for now.

I suspect that various abstractions in Wasmtime which baseline compilation would want to use are not as optimal as they could be as-implemented today. We haven't focused much on compile time performance and Cranelift often dwarfs any cost in Wasmtime or other parts of the system, but I think having a baseline compiler push on these abstractions to optimize them is a great forcing function to have. Mostly just saying this as a word of caution that there may be some surprsing bottlenecks come time for performance measurements but it should all be easy enough to handle.

Lastly one thing I might recommend is if possible getting AArch64 support up and running as soon as possible. In such an architecture-specific context as a baseline compiler adding a second architecture I find can often radically change existing designs so getting that forcing function sooner rather than later may be helpful. That being said I don't think it's necessary to push on it immediately and if y'all would prefer to defer it to the second phase as you've mentioned here I also think that's ok.

accepted/wasmtime-baseline-compilation.md

fitzgen

Exciting!

I think reusing Cranelift's instruction encoding but not vcode containers, and therefore avoiding any IR materialization between Wasm and machine code, is a great choice, practical, and complementary to Cranelift's position in the design space.

accepted/wasmtime-baseline-compilation.md

- Discussion of vector endianness (#4566) - Notification of plans to factor out part of Cranelift's base layer to share with Winch, a baseline compiler (bytecodealliance/rfcs#28)

- Discussion of vector endianness (bytecodealliance/wasmtime#4566) - Notification of plans to factor out part of Cranelift's base layer to share with Winch, a baseline compiler (bytecodealliance/rfcs#28)

bjorn3 · 2022-08-04T17:04:34Z

This paper may be useful:

https://arxiv.org/abs/2011.13127: Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode

Our WebAssembly compiler generates code 4.9X-6.5X faster than Liftoff, the WebAssembly baseline compiler in Google Chrome. The generated code also outperforms Liftoff's by 39%-63% on the Coremark and PolyBenchC WebAssembly benchmarks.

In the future if an interpreter tier is added below the baseline compiler the following is also interesting:

https://arxiv.org/abs/2205.01183: A fast in-place interpreter for WebAssembly

Our evaluation shows that in-place interpretation of Wasm code is space-efficient and fast, achieving performance on-par with interpreting a custom-designed internal format.

accepted/wasmtime-baseline-compilation.md

Move initial aarch64 support to phase 1 and move reference types to phase 2

cfallin · 2022-08-04T17:54:32Z

This paper may be useful:

https://arxiv.org/abs/2011.13127: Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode

Our WebAssembly compiler generates code 4.9X-6.5X faster than Liftoff, the WebAssembly baseline compiler in Google Chrome. The generated code also outperforms Liftoff's by 39%-63% on the Coremark and PolyBenchC WebAssembly benchmarks.

The paper is indeed really interesting, but the approach it takes is not terribly compatible with the goals that we have; in particular we don't want to depend on a C++ compiler and template specialization, and relocation magic, and then gluing together the machine code that comes from that separate offline process, to generate the actual bits of code. An equally important goal of the baseline compiler is to be straightforward to understand and modify; as such, a very simple "emit these three instructions for this Wasm opcode" mapping is ideal for maintainability.

saulecabrera · 2022-08-04T17:55:59Z

During today's wasmtime meeting, we discussed @fitzgen's and @alexcrichton's feedback around aarch64 support; we concluded that lowering the priority of Reference Types and bumping the priority of supporting at least a subset of aarch64 is probably the best path forward for the initial implementation. I've made the necessary changes to reflect this decision.

bjorn3 · 2022-08-05T09:49:42Z

in particular we don't want to depend on a C++ compiler and template specialization, and relocation magic, and then gluing together the machine code that comes from that separate offline process, to generate the actual bits of code.

Would it be possible to replace C++ with Cranelift IR? It could then be compiled at build time and at runtime.

saulecabrera · 2022-08-05T12:49:32Z

in particular we don't want to depend on a C++ compiler and template specialization, and relocation magic, and then gluing together the machine code that comes from that separate offline process, to generate the actual bits of code.

Would it be possible to replace C++ with Cranelift IR? It could then be compiled at build time and at runtime.

Personally I think that even if possible, this approach would still not be compatible with the design principles that we've laid out, mainly around (i) simplicity and (ii) no intermediate representation. In general I think that it might be worth thinking about keeping the implementation as simple and straightforward as possible.

bnjbvr

I'm quite excited by this proposal! Having a baseline compiler would help with experimenting with new wasm proposals faster, more easily support some dev-facing features (e.g. debugging), and be quite useful in use-cases where compile-time speed matters more than runtime speed.

I've added a few comments that should help with clarity, but I think this proposal is sound as it stands right now. Thanks for writing this up.

accepted/wasmtime-baseline-compilation.md

jameysharp · 2022-09-01T18:45:07Z

I want to expand on @bjorn3's suggestions. But, to be clear, I'm not going to argue that this alternative should replace the current RFC. If this RFC is the direction that you want to pursue, @saulecabrera, then I see no reason the rest of us shouldn't support you in that. That said, here's an alternative that I think has some advantages.

Imagine that for each WebAssembly instruction, we write down a sequence of Cranelift instructions to implement it, with holes for registers and labels. (To keep this short, I'm hand-waving things like control-flow.) That's basically what cranelift/wasm/src/code_translator.rs is, if you squint.

We currently get an optimizing WebAssembly compiler by filling those CLIF sequences with virtual registers from abstract interpretation, concatenating the snippets, and then handing the result off to Cranelift.

We could instead get a baseline compiler from those snippets of CLIF by compiling each snippet to native code in isolation, and then concatenating the compiled snippets. Compiling the snippets to native code can happen during the wasmtime build if desired. The number of variations we need to compile of each snippet is bounded by the number of physical registers used to shadow the wasm stack. This is essentially Copy-and-Patch, but using the compiler we already maintain instead of a C++ compiler.

And we could also get an interpreter by inserting those CLIF snippets into a dispatch loop, also written in CLIF, with register holes filled in by loads and stores to an explicit value stack. The "fast in-place interpreter" paper implemented the interpreter in native assembly because they couldn't control the generated code well enough from a high-level language, but I believe Cranelift IR gives us everything we'd need while being able to compile the interpreter for multiple architectures. (Chris tells me this resembles Spidermonkey's "baseline interpreter".)

I like this approach because we'd get three tiers, all supporting every target architecture, while only having to implement new instructions once. That appeals to me for giving us confidence that the tiers are all equally correct implementations.

fitzgen · 2022-09-01T19:30:53Z

Compiling the snippets to native code can happen during the wasmtime build if desired. The

I think this would be necessary to meet performance goals, since otherwise we would be materializing a bunch of IR that we wouldn't need to if we didn't do this approach.

Not clear what kind of impact this would have on the Wasmtime crate's compile times, but that is something I would be concerned about.

I like this approach because we'd get three tiers, all supporting every target architecture, while only having to implement new instructions once. That appeals to me for giving us confidence that the tiers are all equally correct implementations.

Yes, this is pretty nice. Tempting. I like using CLIF to side step using a C++ compiler and making a super hacky post-processing linker that chops off prologues and epilogues and patches control flow together. I like having control over the shape of the code that we would paste together, so we can be sure we aren't messing up edge cases.

But also, this is a bit more work to get to an initial baseline compiler implementation than what the RFC proposes.

cfallin · 2022-09-01T21:16:17Z

I do like @jameysharp 's idea here, from first principles: it would be a nice way to be able to obtain a series of compilers from one source-of-truth. We should experiment with this in the future, I think.

The tradeoff space I see right now is, on the one hand, a "handwritten straightforward baseline compiler (this RFC) with the advantages of:

Simple to verify by looking: one sees exactly the instructions that will be emitted for any given opcode, locked down in the source and not indirectly from any other toolchain or generation process;
Simpler to add new opcodes to than Cranelift. While on the one hand if we assume that all features will be in Cranelift so leveraging this work is "free", on the other hand in practice engines that have a separate baseline tier (like SpiderMonkey) often add features to the baseline compiler first, because it's an order of magnitude simpler. This could allow us to build out early support for some features and experiment more freely before we work out the details of a high-performance, optimizing-tier implementation.
A separate implementation adds "verification diversity": we can differentially fuzz Cranelift against this new backend and possibly find issues.

On the other hand:

A copy-and-patch-like approach that stitches together Cranelift-compiled fragments can much more flexibly grow its "library" into many op-combining possibilities. This would be a natural design point between an op-by-op baseline compiler / template-JIT (this RFC) and a full optimizing tier. "Macro-op fusion" is still a single-pass approach so retains compile speed while possibly getting pretty decent isel.
If we assume that the Cranelift implementation is a sunk cost, then this does give us baseline compiler support for new features as we add them for "free". We just need to pay for the complexity and effort to build the framework once.

also, to amplify and add a bit to @fitzgen's point here

Not clear what kind of impact this would have on the Wasmtime crate's compile times, but that is something I would be concerned about.

I'd be concerned about build complexity as well: this implies a "bootstrap compiler" that is built during the compiler build process to compile another compiler (or data that is embedded into it); that's probably nontrivial cost. And it implies that Cranelift itself is at least a dev-dependency of the baseline compiler, which may or may not be desirable.

On balance I think I still favor the approach in this RFC, but these ideas are intriguing for a possible later exploration (so thanks again for raising them!).

jameysharp · 2022-09-01T21:26:16Z

Thanks, Nick and Chris! I generally agree with both of you, although I'd quibble about the value of implementation diversity: we already have plenty of wasm implementations to use for differential testing. I have more thoughts on engineering details such as build time and build complexity, but I don't want to sidetrack discussion on this RFC further. So unless consensus swings towards implementing the copy-and-patch thing, I'll leave it at that.

saulecabrera · 2022-09-01T22:27:04Z

Thanks for expanding on this @jameysharp; I find your thoughts interesting here, mostly from the perspective of maximizing reusability and correctness. I must admit that one of the biggest unknowns for me, is the build-time complexity that this approach will/might introduce and what will be the ROI of that cost; in paper the ideas seem compelling, but I find it a bit hard to quantify in the scope of Wasmtime without more investment/investigation in that direction.

Expanding on @cfallin's point above around the simplicity to add new features:

One side-effect that I'm hoping to get with the current proposal (which is not explicit in the RFC and perhaps I should add it), is to make easier to add and maintain developer-facing features, the main feature that I have in mind is debugging. I feel that with the Copy-and-Patch approach, this will be more difficult to achieve (this is purely an assumption, so I might be wrong here).

On correctness: FWIW, we discussed (and the RFC states) integrating fuzzing in phase 1 to ensure a correctness-first approach; this doesn't void the fact that having a single source of truth is potentially less error-prone.

In conclusion my feeling is that switching the approach has the potential to bring in a set of desirable benefits, but we'd need to deal with more unknowns/complexity. The approach that we have today, offers a higher level of certainty in terms of maintenance, compilation and runtime performance in the scope of Wasmtime. That said, if there's a way to leave the door open to incrementally investigate this down the road, on top of the current proposal, I think we should consider it.

jameysharp · 2022-09-01T22:53:21Z

In conclusion my feeling is that switching the approach has the potential to bring in a set of desirable benefits, but we'd need to deal with more unknowns/complexity. The approach that we have today, offers a higher level of certainty in terms of maintenance, compilation and runtime performance in the scope of Wasmtime.

I agree. My guess is that copy-and-patch would let us maintain less total code in the end, but compared to the RFC, it's definitely not as clear how we get from here to there.

That said, if there's a way to leave the door open to incrementally investigate this down the road, on top of the current proposal, I think we should consider it.

I don't think there's anything about the current RFC that makes it hard to do something like copy-and-patch later. As long as Wasmtime has a path that still uses Cranelift, we can try slicing it up any time. We absolutely don't have to choose one approach or the other.

On the other hand, I don't think the work outlined in this RFC helps much with implementing copy-and-patch. So, if folks felt like that was something we want to do in the short term, then we could save some effort overall by going straight there. But since copy-and-patch has more uncertainty and engineering risk, I think it's reasonable to follow the RFC's plan first.

One side-effect that I'm hoping to get with the current proposal (which is not explicit in the RFC and perhaps I should add it), is to make easier to add and maintain developer-facing features, the main feature that I have in mind is debugging.

I think that would be a great note to add to the RFC, as well as these points that Chris made:

Simple to verify by looking: one sees exactly the instructions that will be emitted for any given opcode, locked down in the source and not indirectly from any other toolchain or generation process;

Simpler to add new opcodes to than Cranelift. [...] in practice engines that have a separate baseline tier (like SpiderMonkey) often add features to the baseline compiler first, because it's an order of magnitude simpler. This could allow us to build out early support for some features and experiment more freely before we work out the details of a high-performance, optimizing-tier implementation.

saulecabrera · 2023-01-18T14:02:00Z

The discussion here has stabilized and some considerable pieces of the compiler have landed in Wasmtime. With this, I would like to make a

Motion to finalize with disposition to merge

I've updated the stakeholder list to contain all stakeholders that have participated in Winch related discussions.

Embark Studios

@bnjbvr

Fastly

Intel

Shopify

@saulecabrera

Unaffiliated

@bjorn3

saulecabrera · 2023-01-18T21:14:53Z

Given the approvals, I'll move the RFC to the 10-day final comment period which spans until January 28th, 2023.

cfallin · 2023-01-30T18:03:59Z

The FCP has elapsed without any objections; therefore, we're now merging this RFC! Thanks @saulecabrera for defining this and doing the implementation work so far; it's exciting to see.

lqd reviewed Aug 3, 2022

View reviewed changes

accepted/wasmtime-baseline-compilation.md Outdated Show resolved Hide resolved

lqd reviewed Aug 3, 2022

View reviewed changes

accepted/wasmtime-baseline-compilation.md Outdated Show resolved Hide resolved

fitzgen reviewed Aug 3, 2022

View reviewed changes

accepted/wasmtime-baseline-compilation.md Outdated Show resolved Hide resolved

accepted/wasmtime-baseline-compilation.md Outdated Show resolved Hide resolved

accepted/wasmtime-baseline-compilation.md Outdated Show resolved Hide resolved

cfallin mentioned this pull request Aug 3, 2022

Add two items to Cranelift agenda for Aug 8. bytecodealliance/meetings#9

Merged

saulecabrera added 2 commits August 3, 2022 15:28

Typo corrections

d8d934e

Specify that changes will be introduced with a compile-time feature

00aacd0

bjorn3 reviewed Aug 4, 2022

View reviewed changes

accepted/wasmtime-baseline-compilation.md Outdated Show resolved Hide resolved

bjorn3 reviewed Aug 4, 2022

View reviewed changes

accepted/wasmtime-baseline-compilation.md Outdated Show resolved Hide resolved

bjorn3 reviewed Aug 4, 2022

View reviewed changes

accepted/wasmtime-baseline-compilation.md Show resolved Hide resolved

bjorn3 reviewed Aug 4, 2022

View reviewed changes

accepted/wasmtime-baseline-compilation.md Outdated Show resolved Hide resolved

saulecabrera added 2 commits August 4, 2022 13:40

Fix invalid rust snippet

47ad6a1

Modify phases

d1dffc1

Move initial aarch64 support to phase 1 and move reference types to phase 2

Explicitly call out Strategy::Winch

141a3c6

saulecabrera force-pushed the wasmtime-baseline-compilation branch from 2df33ed to 141a3c6 Compare August 5, 2022 12:59

bnjbvr reviewed Aug 8, 2022

View reviewed changes

saulecabrera added 2 commits August 8, 2022 15:39

General fixes to improve clarity

e04ccae

Explicitly call out Assembler

566ba62

Add new points to the _Design Principles_ section

e51a539

saulecabrera mentioned this pull request Sep 13, 2022

Initial skeleton for Winch bytecodealliance/wasmtime#4907

Merged

saulecabrera mentioned this pull request Nov 2, 2022

isle: Managing dependencies between crates bytecodealliance/wasmtime#5178

Open

bjorn3 approved these changes Jan 18, 2023

View reviewed changes

alexcrichton approved these changes Jan 18, 2023

View reviewed changes

cfallin merged commit 3f0ae11 into bytecodealliance:main Jan 30, 2023

saulecabrera deleted the wasmtime-baseline-compilation branch January 30, 2023 18:04

cfallin mentioned this pull request Jan 31, 2023

Cranelift Roadmap for 2023 #30

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Baseline compilation in Wasmtime #28

RFC: Baseline compilation in Wasmtime #28

saulecabrera commented Aug 3, 2022

alexcrichton commented Aug 3, 2022

fitzgen left a comment

bjorn3 commented Aug 4, 2022

cfallin commented Aug 4, 2022

saulecabrera commented Aug 4, 2022

bjorn3 commented Aug 5, 2022

saulecabrera commented Aug 5, 2022 •

edited

Loading

bnjbvr left a comment

jameysharp commented Sep 1, 2022

fitzgen commented Sep 1, 2022

cfallin commented Sep 1, 2022

jameysharp commented Sep 1, 2022

saulecabrera commented Sep 1, 2022

jameysharp commented Sep 1, 2022

saulecabrera commented Jan 18, 2023 •

edited by jameysharp

Loading

saulecabrera commented Jan 18, 2023

cfallin commented Jan 30, 2023

RFC: Baseline compilation in Wasmtime #28

RFC: Baseline compilation in Wasmtime #28

Conversation

saulecabrera commented Aug 3, 2022

alexcrichton commented Aug 3, 2022

fitzgen left a comment

Choose a reason for hiding this comment

bjorn3 commented Aug 4, 2022

cfallin commented Aug 4, 2022

saulecabrera commented Aug 4, 2022

bjorn3 commented Aug 5, 2022

saulecabrera commented Aug 5, 2022 • edited Loading

bnjbvr left a comment

Choose a reason for hiding this comment

jameysharp commented Sep 1, 2022

fitzgen commented Sep 1, 2022

cfallin commented Sep 1, 2022

jameysharp commented Sep 1, 2022

saulecabrera commented Sep 1, 2022

jameysharp commented Sep 1, 2022

saulecabrera commented Jan 18, 2023 • edited by jameysharp Loading

Motion to finalize with disposition to merge

Embark Studios

Fastly

Intel

Shopify

Unaffiliated

saulecabrera commented Jan 18, 2023

cfallin commented Jan 30, 2023

saulecabrera commented Aug 5, 2022 •

edited

Loading

saulecabrera commented Jan 18, 2023 •

edited by jameysharp

Loading