How much should GC be influenced by JS engine design? #125
Comments
Practically speaking, since Web VMs are major stakeholders in WebAssembly, it seems expected and reasonable to me that their implementation concerns should influence the design of Wasm GC. Note that the exchange you quoted regards a piece of the MVP design that was explicitly influenced by feedback from V8's implementation work (see #91); in my view, this is the process working as intended. While there's certainly a balance to be struck between considering existing implementation constraints and hypothetical implementations, concrete implementation feedback is one of the most effective ways of making progress I've seen in the web standards world, and I wouldn't want to see implementation feedback from web VMs discounted.
Certainly JS engine feedback is extremely valuable in figuring out the details of any particular proposal, and I wouldn't want to see that feedback discounted either. This question is more about the broader discussion we've been having about alternative designs and what the goals of the GC proposal should be. The GC proposal must allow for the implementation strategies that Web engines will want to use, but it's unclear whether it should, may, or should not allow for additional standard implementation strategies as well. Our lack of consensus on this point is slowing us down. For example, Web engines may not want to use complex pointer-tagging schemes, but a non-Web engine may have no problem using such schemes if they were good for performance and the proposal allowed toolchains to request them. In this case, should we 1) try to include the pointer-tagging idea in the design (possibly post-MVP) to improve peak performance on the non-Web engine, or 2) reject it because Web engines won't use it and it makes the proposal more complex? Until we have consensus on a design principle to guide that decision, there's no way folks on either side of the issue can possibly reach an agreement.
I have a few thoughts that are more meta than concrete at this point:
My interpretation of the feedback I have received on this topic is that extensibility concerns beyond what JS engines would immediately support are fine to take into consideration, but they should not come at a cost to what JS engines would currently use, and they should be considered low priority. So focusing on @tlively's second example, i.e. application-directed pointer tagging, this feedback would suggest to me something like #119. It addresses the concerns raised in #118 at no performance cost, but unlike the SOIL Initiative's proposal it does not itself directly provide the functionality for application-directed pointer tagging. Instead, it regards pointer tagging as low priority and simply ensures there is room to add the feature at a later time if the CG ever chooses to do so.
To be clear, from a web engine's perspective, we certainly wouldn't want our current design to be set in stone: as Ross points out, we've gone through many iterations of design changes, and we definitely value the freedom to explore other implementation techniques in the future. (Also, there's no such thing as the one current web engine implementation. To use the example of pointer tagging: V8 uses Smi-tagging and SpiderMonkey uses NaN-boxing, so even if we did want to be very simplistic, we couldn't just bake "the one tagging scheme that web engines use" into the WasmGC design.)

That said, current JS engines are also representatives of general high-performance, high-complexity, long-lived virtual machine implementations. When (a majority of) web engines say "it's unlikely that we'll ever implement (or benefit from) X, because while it's certainly possible in theory, in practice the resulting implementation complexity seems prohibitive", then I think chances are that any other hypothetical future non-web Wasm engine may well arrive at a very similar conclusion. (Getting back to the tagging example, I'd also like to point out that there's no contradiction between the observation "there are VMs in existence that use tagging schemes X, Y, Z and are quite happy with that", and the prediction "it's unlikely that a future VM will let user code choose the tagging scheme to use, and will support loading/executing several modules specifying different tagging schemes at the same time".)

In particular, when web engines predict that they won't implement X, that does have the consequence that for many use cases, Wasm's performance in practice will not benefit from any theoretical improvements that X might unlock if implemented.

TL;DR: I agree that it's a balance.
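To make the two tagging schemes mentioned above concrete for readers who haven't seen them, here is a rough, illustrative C sketch of how a Smi-style scheme and a NaN-boxing scheme each pack a small integer or a heap pointer into a single machine word. All names and tag values are invented for illustration; this is a simplification for this thread, not V8's or SpiderMonkey's actual representation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Smi-style tagging (simplified): the low bit of a word distinguishes a
 * small integer (tag 0, payload shifted left by one) from a heap pointer
 * (tag 1). One payload bit is sacrificed for the tag, and heap pointers
 * are assumed to be at least 2-byte aligned. */
typedef uintptr_t SmiWord;

static inline SmiWord  smi_from_int(intptr_t v) { return (SmiWord)v << 1; }
static inline bool     smi_is_int(SmiWord w)    { return (w & 1) == 0; }
static inline intptr_t smi_to_int(SmiWord w)    { return (intptr_t)w >> 1; }  /* assumes arithmetic shift */
static inline SmiWord  smi_from_ptr(void *p)    { return (SmiWord)p | 1; }
static inline void    *smi_to_ptr(SmiWord w)    { return (void *)(w & ~(SmiWord)1); }

/* NaN-boxing (simplified): every value is a 64-bit word; doubles are stored
 * directly, and non-double values live in the payload bits of NaN patterns
 * that a canonicalizing engine never produces for real doubles. */
typedef uint64_t NanBox;
#define NANBOX_INT_TAG 0xFFF9000000000000ull  /* arbitrary made-up tag */

static inline NanBox   nanbox_from_int(uint32_t v) { return NANBOX_INT_TAG | (NanBox)v; }
static inline bool     nanbox_is_int(NanBox b)     { return (b & 0xFFFF000000000000ull) == NANBOX_INT_TAG; }
static inline uint32_t nanbox_to_int(NanBox b)     { return (uint32_t)b; }
```

The point relevant to this thread is that the two encodings are mutually incompatible: each reserves different bits, so a module compiled against the assumptions of one scheme would not map cleanly onto an engine using the other.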
To be clear, I was merely mentioning Web engines as one major example of a broader observation in this quote. So it's only partially related to the discussion at hand. Personally, I do not think that Wasm's design should be specifically optimised for JS. However, as @jakobkummerow says, JS engines are the most dominant examples of versatile, high-performance VMs these days, so they should naturally inform the design on many levels.
Thank you, everyone, and sorry for the confusing initial wording of the question. This issue feels much clearer to me now. My takeaway is that when a Web engine says it is unlikely to implement a particular optimization:
For design discussions, this means that we don't want to immediately discard ideas for features that enable optimizations that JS engines do not currently perform. Instead, we want to understand the complexity cost of the optimization and feature and consider whether any engine, JS or otherwise, would want to implement that optimization.
I think this might be less clear than it superficially seems. If the optimization leads to small performance or resource usage improvements, then it seems fine to support it, but it might not be worth much additional complexity. If, however, there's a significant performance gain to be had, and web engines will just never be able to take advantage of it, then that risks a bifurcation of ecosystems that I think we should try to avoid. It's of course the case that there will always be lots of content only targeting some specific environment, and that's fine. But the risk here is about a more fundamental split, where toolchains would just entirely target only a subset of runtimes even for modules that aren't otherwise environment specific at all.
Agreed with @tschneidereit. I suppose another way of phrasing this is that predictable performance remains an important goal, even if it gets considerably fuzzier with something like GC.
Good point. Hypothetically, if an engine were comfortable with an extra optimization that made a big difference, ecosystem bifurcation would only be a risk if modules had to be changed to take advantage of that optimization and those changes made performance worse on other engines. We would want to make sure that any such changes to modules would not affect performance on engines that don't support the optimization. On a side note, is this what we mean by "predictable performance"? That a change to a module should never improve performance on some engines and worsen performance on other engines? That's a much more specific definition than I've seen before, but it makes sense.
I don't think this is an exhaustive characterisation of "predictable performance", but my understanding is that we want to avoid creating a scenario where engines implement type-feedback-based "unsound" optimisations for Wasm which have the potential to deopt and create performance cliffs. I remember hearing strong opinions that this would be an explicit failure mode for the design of GC types, but it's possible my (second-hand) perspective is out of date.
I'm not sure I agree with this. A large enough performance difference can make a module effectively useless in one environment, but very useful in another. In theory, we could of course have a situation where there's a design that's demonstrated to have the best properties overall, including for web engines, but also has this property. In such a situation, I guess it'd make sense to go with that design. I'm having a hard time believing that that's a likely outcome. And I think it's more useful to focus on designs that don't have this kind of risk to begin with.
I generally agree with what @ajklein said here. Web engines are an important stakeholder and incorporating their feedback is extremely important to WebAssembly's continued success. I love the idea of custom Wasm engines and embeddings (and am happy to finally be free to work on them!), but the big iron that holds up this world is the major Web engines.

Just to be clear on some of the calculus here, though. Thomas mentioned JS engine design in the issue here, and several of the side issues have to do with implementation techniques that seem like just compiler/GC things. Web engine JITs and GCs can be refactored underneath to do all kinds of neat tricks like use fat pointers or crazy tagging schemes. That's a bounded amount of work. But when those values hit the JS API surface, and those values need to flow through the rest of the JS engine (ICs, runtime calls, JS reflection, prototype chains, the whole mess), then any representation trick for Wasm values can become insanely harder because of how huge a JS runtime is. That's less an "engine" issue and more a boundary issue and a JS complexity issue, as there are literally hundreds of thousands of lines of JS runtime code vs. a much, much smaller amount of compiler and GC code. I feel like discussions around some techniques haven't really acknowledged this.

Basically, the upshot of the previous paragraph is that crazy tagging schemes and fat pointers are really difficult to make zero-cost in a Web VM, because JavaScript. To the extent that WebAssembly is going to offer anything like that, it is going to have to choose to work within that JS reality, or compromise on the predictable performance goal a little, as nutty values may end up being boxed if they are any more complicated than a small tagged integer.

Personally I think we need a layer between language runtimes and whatever GC mechanisms we develop that both solves the late-binding problem and allows for flexibility in choosing implementation strategies. Some of these problems become less pressing if layered properly. It can be done in a way that is mostly orthogonal to what is going on here, but I won't sidetrack the discussion here with that. I'll show more about what I have been up to shortly.
This absolutely screams "layering" to me. If two engines have different mechanisms for implementing the same (source) thing, and modules are produced from source that target one or the other technique offline, then you need either conditional segments or to not do that. To not do that, you need to package up that source thing (e.g. a source type, a source operation, etc.) to be lowered when the engine's supported mechanisms are known, either at link time, instantiation time, or runtime. That's a form of late binding that can be done by a language runtime one layer up. Conditional segments only get you the ability to switch over techniques you know about now, while late binding gets you the ability to do absolutely anything in the future.
The notion of "predictable performance" seems to me to miss some very important points: 1) everything is an abstraction, 2) many advances in this field have been made by exploiting those abstractions, 3) y'all are a bunch of competing browsers looking for ways to outdo each other, and 4) that competition is healthy and spurs innovation and adaptation.
We cannot predict performance. We cannot predict how programs and workloads will change. We cannot predict how browsers will evolve (and we certainly shouldn't expect them not to evolve). No matter what we do, the performance and implementation of WebAssembly will be an ever-changing landscape. We should not overly base our design decisions on these things we cannot predict. (Yes, of course you can overexecute on this suggestion. Every suggestion should be just one of many considerations in any decision.)
Look, I love VMs. New and crazy optimizations are my bread and butter. I love writing compilers and garbage collectors and it's really been 20 years of fun and all that. But the hope of future optimization heroics is no excuse for a bad design. The reason VMs do heroics is because they are invariably trying to make some stupid interpreter with some bloody inefficient object model fast. That's some snark, but only barely.

There is a reason why all the heavily engineered and optimized VMs are for legacy scripting languages with tons of code in the wild that cannot be changed. Optimization is what happens at the end of this whole process, when design options have been exhausted and we're stuck with something we can't fix and there are a trillion lines of code that still need to run. Million-line systems don't turn on a dime. Complexity and technical debt are real things and any calculus without them is going to make wildly wrong predictions of what is easy and what is hard, and therefore what is likely to happen in the short term versus the long term.

What I mean, concretely, is that JavaScript engines are not magic cauldrons. JavaScript VMs are in the business of optimizing for JavaScript. Despite our plucky upstart here in Wasm land, there is still 10,000 times as much JavaScript code in the world. Big teams are focused, rightly, on optimizing that. Wasm optimization effort competes with JavaScript engineering effort. Wasm complexity competes with JavaScript complexity. Our demands are not challenges, but choices. Wasm effort is an investment under constant negotiation with competing concerns. It's dreaming to believe teams are going to just do us a magic rain dance one afternoon to implement the genius optimization, or that some virtuous benchmark war is gonna get kicked off among competing engines. I was around for the JS benchmark wars. They sucked. V8 had to break out of a debt spiral to refocus on real-world performance and language features, and our technical debt from squeezing every ounce of performance for those benchmarks was real. And, by the way, the Wasm engine landscape is totally different than the JS engine landscape of ~2010.

Let's talk about "predictable performance". Many of us who spent long years working on JavaScript VMs realized the performance peaks we built and lauded ourselves over also gave rise to horrible performance cliffs jutting out of the landscape. Many here are pretty scarred by the amount of arguing we've had to do with unfortunate, frightened and helpless application programmers, and then the literal person-decades of engineering necessary to smooth out the hard cases as a long apology. Predictable performance means less arguing, less confusion, and less complexity. Predictable performance means applications and language runtimes have more agency; they can make effective decisions about what to do next if something sucks.

And no, predictable performance doesn't mean the slowest common denominator. That's a strawman argument. No one is proposing the slowest possible thing. And that's because duh, slow things are exactly the thing that gives rise to both the opportunity for optimization and the subsequent performance cliffs! Slow (and complex) sucks. Fast and simple is better. Simple and slow is kind of OK, but kind of not OK.

Please read my comment again. The last part had a meaning, too. We need proper layering. We need to think carefully about what optimizations go where instead of just assuming we can stack all the smarts at the bottom.
Me from 10 years ago might have thought that worked. I was more academic then. But now I see how important it is that we do not end up with a massive anchor somewhere which is the fantastical thing at the bottom that everything depends on, and yet without which everything runs horribly slow. I've wandered deep into that magical thing; it's less an oracle and more a hall of mirrors so intricate that no one could conceivably write another competitive implementation from scratch. That's a failure mode that many of us are actively designing against.
I just wanted to highlight this part because it is an excellent argument for layering and definitely not putting all the smarts in the bottom layer. It should be pretty obvious but I'll say it explicitly: it's the Wasm engine's job to adapt the Wasm to the hardware, and it's the language's compiler/runtime system's job to adapt the program to Wasm. Wasm cannot and will not understand all languages' constructs and therefore it will need layers above it.
@titzer I'm worried the argument I made was misunderstood. You seem to be responding to an argument that we should rely on engines to achieve magic in order for WebAssembly to achieve good performance, but I never made such an argument. I simply pointed out that, no matter how we design WebAssembly, engines will likely find innovative ways to make it perform better than what the straightforward interpretation of its instructions would indicate. In other words, the only reason WebAssembly seems to you to avoid the wars you dread is that it is not popular enough to drive real browser competition, and as such the browsers have not made the effort to look for real innovations.
There have been multiple times where I have suggested we design an abstraction so that programs could more directly express their needs, so that engines whose designs were well suited to those needs could provide even better performance, and gotten pushback that we should not do this because it would cause less "predictable performance". For example, many programs need boxed integers or doubles, and so I suggested programs be able to more directly communicate that need so that engines that support integer packing or NaN boxing could provide these "boxed" values without actual allocation. But I was told doing so would be against "predictable performance" and so should not be done. That doesn't change the needs of these programs. It just gives them the worst of all worlds, in which all engines are (presumably) forced to box these values even if it would be trivial for them to not do so if the abstraction were better designed. That in turn, as you point out, prompts engines to look for more ad-hoc/hacky means to identify these patterns and optimize for them. (Also, #120 suggests that the current MVP's casting design is essentially a worst-of-all-worlds design.)
This is an argument that competition should not be centered around optimizing a suite of benchmarks. That's certainly true, but it's unrelated to the points I made, which apply to real-world performance.
It is control that gives runtimes more agency. But with host-managed GC, we cannot give programs direct control over how they represent their pointers or specify their object descriptors or walk the heap or use generational/conservative/copying garbage collection. That's what makes this proposal very different from the rest of WebAssembly. The best we can do is to let programs communicate their needs to the component of the system that has control, i.e. the engine, so that that component can take advantage of its control to best serve the programs' needs. Removing options like "I need a boxed 32-bit integer" for the sake of predictable performance does not grant any agency.
When in actuality you wrote:
And also:
Forgive me for misunderstanding the part where you proposed a whole line of speculative optimizations as well as a general hope that things will magically work out as something that I should respond to in the way I did. You now write:
(Now I am going to respond to the words you wrote here, because you wrote them, or typed them, and I read them, and I am in good faith going to assume that you meant the words that you typed when you typed them, and then I copied them here, to prove that you wrote them. I feel this is a regression in the level of the discussion if we have to proceed at such a low level, but anyway...)

You cannot have predictable performance by simply offering a large selection of (complex) choices and making absolutely no guarantee about their performance in the hope of future optimizations. That is not control, that is not agency; that is just a recipe for a lose-lose situation where engines become complex because they have to support a large variety of options (most of which they will punt on), but are under no apparent obligation to make any of them fast. Predictable performance is a contract that inherently means limiting options, and that is absolutely necessary to combat the combinatorial explosion that leads to complex systems and technical debt, which I spent considerable time trying to explain above. And because you ignored my comment about layering a second time, I'll restate it a third time and relate it to something you actually wrote.
Wonderful. Now replace |
A boxed 32-bit integer can be trivially defined as a struct wrapping a single i32 field (in the MVP, something like `(struct (field i32))`).
If you meant an unboxed 32-bit integer, then no, that cannot provide predictable performance. Because it's not representable on engines running on 32-bit hardware or using 32-bit compressed pointers on 64-bit hardware.
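Spelling out the arithmetic behind that claim (my gloss, not part of the comment above): if unboxed scalars and references have to share one 32-bit word, at least one bit is needed to tell them apart, which leaves at most 31 payload bits, so a full i32 cannot be stored unboxed without loss. A minimal C sketch of the range check such an engine would need, with an invented helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* With a one-bit pointer/integer tag in a 32-bit word, only 31 payload bits
 * remain, so a hypothetical engine could keep an i32 unboxed only when the
 * value fits in 31 signed bits and would have to box it otherwise. */
static inline bool fits_unboxed_31(int32_t v) {
    return v >= -(1 << 30) && v < (1 << 30);
}
```

This is the same reason the MVP's unboxed scalar type carries 31 bits of payload rather than a full 32.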
I did not propose these optimizations. Others proposed them and have indicated their plans to employ these optimizations in CG meetings. As for the quote, it simply states that I look forward to how innovation will make WebAssembly even better. There is no reference to relying on magic. Would you rather I look forward to WebAssembly implementations never improving?
You and I have discussed this, but the rest of the group hasn't. I didn't mean to offend, I just wanted to keep the conversation on topics that everyone had the same amount of context for.
In the current MVP, this associates an identity with every reference, even if the program has no need for that identity. As such, even engines that could otherwise unbox this cannot. This is a worst-of-all-worlds solution to the program's need for boxed integers.
No one said it provides predictable performance. But its worst-case performance is the only thing the current MVP and Post-MVP support.
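To illustrate the identity point from the two comments above with a deliberately crude C analogy (the type, the helper names, and the malloc-based "engine" are invented for this sketch; no real engine works this way): if two boxes holding the same i32 must remain distinguishable by reference identity, each needs its own address, so the engine cannot silently replace the box with a tagged immediate.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* A boxed i32 modeled as a heap object with identity: every allocation is distinct. */
typedef struct BoxedI32 {
    int32_t value;
} BoxedI32;

static BoxedI32 *box_i32(int32_t v) {
    BoxedI32 *b = malloc(sizeof *b);
    if (!b) abort();
    b->value = v;
    return b;
}

/* The analogue of ref.eq: compares addresses (identity), not payloads. */
static bool ref_eq(const BoxedI32 *a, const BoxedI32 *b) {
    return a == b;
}

int main(void) {
    BoxedI32 *a = box_i32(42);
    BoxedI32 *b = box_i32(42);
    /* Same payload, but ref_eq(a, b) must be false, so neither box can be
     * lowered to a bare tagged 42 without breaking identity. */
    return ref_eq(a, b) ? 1 : 0;
}
```

If the type system allowed a module to declare that it does not need identity for these boxes, an engine would be free to represent them as tagged immediates, which is what the "worst of all worlds" remark above is getting at.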
Fair enough, and there is a TODO regarding that in the MVP doc. It would be nice to address that, though it's not a showstopper or an MVP must-have either.
Sweet. And once that's figured out, engines can optimize this by unboxing the integer, meaning we will no longer have predictable performance. Instead, we will have improved performance on some engines, and the same old worst-of-all-worlds performance on other engines. That is the point I am making: once the program is able to express its needs more precisely, engines will be able to optimize for those particular needs if they choose to do so. The program does not have to rely on those "magic" optimizations to get reasonable performance, but it will benefit when those optimizations happen to be available. In a setting where programs cannot be given direct control and instead have to work through a higher-level abstraction, the more informative that abstraction is, the more opportunities (not necessities) for optimization there are. These optimization opportunities also make for more variation in program performance, but all those variations are better than performance without those optimizations. So, in a setting where programs need to use an abstraction, the goals of better performance and more predictable performance conflict with each other.
I brought up in the Requirements doc discussion that we may want to qualify what we mean by "(predictable) performance" -- as this discussion here illustrates nicely, it's not so obvious what kinds of expectations are realistic or reasonable. My take is that aiming for "good baseline performance" captures the essence of the notion of "predictable performance" and is more accurate than clinging to it. "Predictable performance", taken literally, is a myth. Not even machine instructions have predictable performance. I'd go as far as saying: any time a system lowers an abstraction, there is by definition some degree of freedom in how to do that, and that freedom creates differences in performance (between implementations, between different versions of the same evolving implementation, between different situations that this implementation encounters); and such differences are necessarily hard or impossible to predict.

I think one underlying desire we might agree on much more easily is that we want to avoid cases of pathologically bad performance. And as a language specification, the best way to contribute to this outcome is to design in such a way that even simple implementations will end up delivering reasonable performance. That doesn't stop advanced implementations from eventually becoming (maybe significantly) better than such a baseline -- what matters is that the baseline doesn't suck. (I realize that that's easier said than done, in part because the definition is relative.)

I think another corollary is that layering is, in a way, part of both the problem and the solution: each layer will present its own performance mysteries to the layers above it; at the same time, good layering probably gets us closer to the ideal of reasonable baseline performance, because each layer is solving a simpler problem, which is one reason why it can do a better job with less effort than a grand all-encompassing monolithic system could. (I for one am very much looking forward to what @titzer will present.)
There are a few different meanings of "optimizations" being used here, and I could have been clearer in my question and examples. In my original question, the "optimizations" I was thinking of were improvements to the expressiveness of the Wasm object model that would allow modules to provide hints about the best layouts for their objects. This increased expressiveness would certainly increase complexity and could lead to performance differences between engines that support different kinds of layouts, which would be bad for predictable performance, but only between engines. The other kinds of optimizations we've mentioned are speculative optimizations, which also increase complexity and are bad for predictable performance within even a single engine because they create performance cliffs. I wasn't thinking of these optimizations in my original question because I was operating under the assumption that they were off the table for WebAssembly engines.
I would like to avoid having an inefficient object model and getting stuck with something we can't fix. I would also like to avoid needing Wasm engines to adopt all the heroics of JS engines. That's why I was wondering if it would make sense to have Wasm engines give modules more control over their object representations, even though Web engines are currently opinionated about their object representations. @jakobkummerow, I have gotten the impression that you would be more OK with using speculative optimizations in the GC implementation, so it would be great to hear your thoughts on this (or whether that impression is wrong). I also agree that these design directions would be more clear-cut if we clarified what we meant by predictable performance. @titzer I look forward to your proposal about late binding and layering. It sounds like an interesting approach, and I am eager to see how it could simplify or inform the GC design :)
I don't agree with the "only" part here. For one thing, engines are not static, they evolve. "That thing will work fine, but not on {client | server | OS} versions before X" is not a great state to be in. (Example: JavaScript developers are forced to use tools like Babel to transpile their code to ES5 for the benefit of those 10% of their users who are stuck on old browsers.) Also, implementation reality will be more complicated than "engine A supports custom layouts, and engine B doesn't". Instead, for instance, maybe an engine will give you fast support for custom tagging schemes as long as there are no more than N different custom tags. Or only if you don't use the respective objects in, say, cross-module function calls. Or it could turn off more expensive compiler optimization passes on slow and/or memory-constrained (mobile) devices. The possible combinations of "feature X is supported, but only if Y / not if Z" are endless. (By "supported" here I mean: giving you the performance benefit you were hoping for.) That said, my conclusion is not "it's even worse for predictable performance than you claim, so we can't have it", but instead "performance isn't going to be predictable no matter what we do, so there are other considerations that matter more". (Also, I'm trying to speak generically here, not to express an opinion on the specific feature example.)
Oh, I love it when engines can get away with not having to do speculative optimizations: simplicity (of the implementation itself, of the mental model one needs to understand how stuff works under the hood, etc.) is awesome! And I think we should try very hard to design WasmGC such that engines that don't speculate can do a very good job of delivering good performance. That said, if Wasm-with-GC becomes as successful as I hope it will be, and if we accordingly venture into performance echelons that I hope we will reach, then pragmatically, I expect that "fancy" engine-side optimizations will sooner or later become part of the picture. That doesn't even have to be literal speculation (with potential deopts = throwing away mis-speculated code), it could just be having fast paths for some situations and slower paths for others. Concretely, in V8, we don't do speculative optimizations for Wasm code yet, and introducing it would be a fairly big effort (and come with obvious drawbacks), which we wouldn't mind avoiding, but we suspect that at some point we will have to do it. I'd love to be proven wrong on this :-)
We've settled into a productive design process with a strong implementer feedback loop and moved to phase 2, so I'll go ahead and close this issue.
One of the primary goals of the GC proposal is to enable seamless interop with JS, so it needs to be implementable in JS engines. But so far it is unclear whether the design of the GC proposal should be constrained to use mechanisms already present in JS engines or if it should allow for different styles of engines to make different object layout and optimization choices. On the one hand, the more the proposal differs from how JS engines work, the less likely it is to be acceptable to Web engines. On the other hand, many non-Web VMs would be able to take advantage of a design that makes fewer assumptions about specific implementation strategies.
One example where this problem has come up is in #119. @rossberg makes arguments based on what JS engines do:
While @RossTate argues that baking these assumptions into the GC design is limiting:
That's just one example, but I've noticed this disagreement causing friction in many discussions. To what extent should we consider and allow for implementation strategies not used by JS engines? Should it further be a goal for the proposal to allow for multiple implementation strategies so that engines and languages can individually make their own complexity/performance trade-offs, or is it OK to design around a single assumed implementation strategy because we also want portable performance?