-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Baseline compilation in Wasmtime #28
RFC: Baseline compilation in Wasmtime #28
Conversation
This RFC proposes the addition of a new WebAssembly (Wasm) compiler to Wasmtime: a single-pass or “baseline” compiler. A baseline compiler works to improve overall compilation performance, yielding faster startup times, at the cost of less optimized code. This RFC describes: * The motivation for a baseline compiler * A high-level overview and description of the structure of the baseline compiler * A high-level overview of its integration with Wasmtime
Personally this all sounds great to me, I'm excited to see this take shape! I'm think that reusing Cranelift backends is a great idea and I think the integration strategy chosen here (no tiering yet) is the right answer for now. I suspect that various abstractions in Wasmtime which baseline compilation would want to use are not as optimal as they could be as-implemented today. We haven't focused much on compile time performance and Cranelift often dwarfs any cost in Wasmtime or other parts of the system, but I think having a baseline compiler push on these abstractions to optimize them is a great forcing function to have. Mostly just saying this as a word of caution that there may be some surprsing bottlenecks come time for performance measurements but it should all be easy enough to handle. Lastly one thing I might recommend is if possible getting AArch64 support up and running as soon as possible. In such an architecture-specific context as a baseline compiler adding a second architecture I find can often radically change existing designs so getting that forcing function sooner rather than later may be helpful. That being said I don't think it's necessary to push on it immediately and if y'all would prefer to defer it to the second phase as you've mentioned here I also think that's ok. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting!
I think reusing Cranelift's instruction encoding but not vcode containers, and therefore avoiding any IR materialization between Wasm and machine code, is a great choice, practical, and complementary to Cranelift's position in the design space.
- Discussion of vector endianness (#4566) - Notification of plans to factor out part of Cranelift's base layer to share with Winch, a baseline compiler (bytecodealliance/rfcs#28)
- Discussion of vector endianness (bytecodealliance/wasmtime#4566) - Notification of plans to factor out part of Cranelift's base layer to share with Winch, a baseline compiler (bytecodealliance/rfcs#28)
This paper may be useful: https://arxiv.org/abs/2011.13127: Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode
In the future if an interpreter tier is added below the baseline compiler the following is also interesting: https://arxiv.org/abs/2205.01183: A fast in-place interpreter for WebAssembly
|
Move initial aarch64 support to phase 1 and move reference types to phase 2
The paper is indeed really interesting, but the approach it takes is not terribly compatible with the goals that we have; in particular we don't want to depend on a C++ compiler and template specialization, and relocation magic, and then gluing together the machine code that comes from that separate offline process, to generate the actual bits of code. An equally important goal of the baseline compiler is to be straightforward to understand and modify; as such, a very simple "emit these three instructions for this Wasm opcode" mapping is ideal for maintainability. |
During today's wasmtime meeting, we discussed @fitzgen's and @alexcrichton's feedback around aarch64 support; we concluded that lowering the priority of Reference Types and bumping the priority of supporting at least a subset of aarch64 is probably the best path forward for the initial implementation. I've made the necessary changes to reflect this decision. |
Would it be possible to replace C++ with Cranelift IR? It could then be compiled at build time and at runtime. |
Personally I think that even if possible, this approach would still not be compatible with the design principles that we've laid out, mainly around (i) simplicity and (ii) no intermediate representation. In general I think that it might be worth thinking about keeping the implementation as simple and straightforward as possible. |
2df33ed
to
141a3c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite excited by this proposal! Having a baseline compiler would help with experimenting with new wasm proposals faster, more easily support some dev-facing features (e.g. debugging), and be quite useful in use-cases where compile-time speed matters more than runtime speed.
I've added a few comments that should help with clarity, but I think this proposal is sound as it stands right now. Thanks for writing this up.
I want to expand on @bjorn3's suggestions. But, to be clear, I'm not going to argue that this alternative should replace the current RFC. If this RFC is the direction that you want to pursue, @saulecabrera, then I see no reason the rest of us shouldn't support you in that. That said, here's an alternative that I think has some advantages. Imagine that for each WebAssembly instruction, we write down a sequence of Cranelift instructions to implement it, with holes for registers and labels. (To keep this short, I'm hand-waving things like control-flow.) That's basically what We currently get an optimizing WebAssembly compiler by filling those CLIF sequences with virtual registers from abstract interpretation, concatenating the snippets, and then handing the result off to Cranelift. We could instead get a baseline compiler from those snippets of CLIF by compiling each snippet to native code in isolation, and then concatenating the compiled snippets. Compiling the snippets to native code can happen during the wasmtime build if desired. The number of variations we need to compile of each snippet is bounded by the number of physical registers used to shadow the wasm stack. This is essentially Copy-and-Patch, but using the compiler we already maintain instead of a C++ compiler. And we could also get an interpreter by inserting those CLIF snippets into a dispatch loop, also written in CLIF, with register holes filled in by loads and stores to an explicit value stack. The "fast in-place interpreter" paper implemented the interpreter in native assembly because they couldn't control the generated code well enough from a high-level language, but I believe Cranelift IR gives us everything we'd need while being able to compile the interpreter for multiple architectures. (Chris tells me this resembles Spidermonkey's "baseline interpreter".) I like this approach because we'd get three tiers, all supporting every target architecture, while only having to implement new instructions once. That appeals to me for giving us confidence that the tiers are all equally correct implementations. |
I think this would be necessary to meet performance goals, since otherwise we would be materializing a bunch of IR that we wouldn't need to if we didn't do this approach. Not clear what kind of impact this would have on the Wasmtime crate's compile times, but that is something I would be concerned about.
Yes, this is pretty nice. Tempting. I like using CLIF to side step using a C++ compiler and making a super hacky post-processing linker that chops off prologues and epilogues and patches control flow together. I like having control over the shape of the code that we would paste together, so we can be sure we aren't messing up edge cases. But also, this is a bit more work to get to an initial baseline compiler implementation than what the RFC proposes. |
I do like @jameysharp 's idea here, from first principles: it would be a nice way to be able to obtain a series of compilers from one source-of-truth. We should experiment with this in the future, I think. The tradeoff space I see right now is, on the one hand, a "handwritten straightforward baseline compiler (this RFC) with the advantages of:
On the other hand:
also, to amplify and add a bit to @fitzgen's point here
I'd be concerned about build complexity as well: this implies a "bootstrap compiler" that is built during the compiler build process to compile another compiler (or data that is embedded into it); that's probably nontrivial cost. And it implies that Cranelift itself is at least a dev-dependency of the baseline compiler, which may or may not be desirable. On balance I think I still favor the approach in this RFC, but these ideas are intriguing for a possible later exploration (so thanks again for raising them!). |
Thanks, Nick and Chris! I generally agree with both of you, although I'd quibble about the value of implementation diversity: we already have plenty of wasm implementations to use for differential testing. I have more thoughts on engineering details such as build time and build complexity, but I don't want to sidetrack discussion on this RFC further. So unless consensus swings towards implementing the copy-and-patch thing, I'll leave it at that. |
Thanks for expanding on this @jameysharp; I find your thoughts interesting here, mostly from the perspective of maximizing reusability and correctness. I must admit that one of the biggest unknowns for me, is the build-time complexity that this approach will/might introduce and what will be the ROI of that cost; in paper the ideas seem compelling, but I find it a bit hard to quantify in the scope of Wasmtime without more investment/investigation in that direction. Expanding on @cfallin's point above around the simplicity to add new features: One side-effect that I'm hoping to get with the current proposal (which is not explicit in the RFC and perhaps I should add it), is to make easier to add and maintain developer-facing features, the main feature that I have in mind is debugging. I feel that with the Copy-and-Patch approach, this will be more difficult to achieve (this is purely an assumption, so I might be wrong here). On correctness: FWIW, we discussed (and the RFC states) integrating fuzzing in phase 1 to ensure a correctness-first approach; this doesn't void the fact that having a single source of truth is potentially less error-prone. In conclusion my feeling is that switching the approach has the potential to bring in a set of desirable benefits, but we'd need to deal with more unknowns/complexity. The approach that we have today, offers a higher level of certainty in terms of maintenance, compilation and runtime performance in the scope of Wasmtime. That said, if there's a way to leave the door open to incrementally investigate this down the road, on top of the current proposal, I think we should consider it. |
I agree. My guess is that copy-and-patch would let us maintain less total code in the end, but compared to the RFC, it's definitely not as clear how we get from here to there.
I don't think there's anything about the current RFC that makes it hard to do something like copy-and-patch later. As long as Wasmtime has a path that still uses Cranelift, we can try slicing it up any time. We absolutely don't have to choose one approach or the other. On the other hand, I don't think the work outlined in this RFC helps much with implementing copy-and-patch. So, if folks felt like that was something we want to do in the short term, then we could save some effort overall by going straight there. But since copy-and-patch has more uncertainty and engineering risk, I think it's reasonable to follow the RFC's plan first.
I think that would be a great note to add to the RFC, as well as these points that Chris made:
|
The discussion here has stabilized and some considerable pieces of the compiler have landed in Wasmtime. With this, I would like to make a Motion to finalize with disposition to mergeI've updated the stakeholder list to contain all stakeholders that have participated in Winch related discussions. Embark StudiosFastlyIntelShopifyUnaffiliated |
Given the approvals, I'll move the RFC to the 10-day final comment period which spans until January 28th, 2023. |
The FCP has elapsed without any objections; therefore, we're now merging this RFC! Thanks @saulecabrera for defining this and doing the implementation work so far; it's exciting to see. |
Rendered
This RFC proposes the addition of a new WebAssembly (Wasm) compiler to Wasmtime:
a single-pass or “baseline” compiler. A baseline compiler works to improve
overall compilation performance, yielding faster startup times, at the cost of
less optimized code.
This RFC describes:
compiler
cc @cfallin @fitzgen @alexcrichton @bnjbvr