Yet Another MVP proposal #94
Comments
I suspect that if you were to refine this idea, you’d end up with something looking close to this: #78 (comment) |
@RossTate Indeed it is similar! There are some things in here that I think are worth considering:
|
Also there is nothing stopping pointer values being pointers into the host, e.g. JS heap. |
The thing I can’t figure out with your suggestion is how the garbage collector is supposed to walk structures after finding roots. |
Let me write down a few more bits of my original idea:
|
I then cut out all the extraneous stuff, because you can do all that in a variety of ways. But as you pointed out, that just results in a sketch that is similar to what you posted. |
Okay, so it sounds like a structure is an array of cells that are dynamically either pointers or 32-bit integers. On 64-bit machines, each cell would likely be 64 bits in size, taking advantage of room for bit flags to distinguish pointers from integers. On 32-bit machines, each cell would likely be 32 bits in size with a bit set and negative offsets indicating which cells are pointers and which are integers (assuming no concurrent reads and writes). Does that sound like what you're envisioning? |
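To make the 64-bit option concrete, here is a minimal sketch of cells that are dynamically either pointers or 32-bit integers. It assumes bit 0 of each 64-bit cell is the flag; that choice, and the helper names `make_int`, `make_ptr`, `is_ptr`, `int_value`, and `ptr_address`, are illustrative assumptions rather than anything specified in this thread.

```python
MASK64 = (1 << 64) - 1
PTR_TAG = 1  # assumption: bit 0 set means "this cell holds a pointer"


def make_int(value: int) -> int:
    """Encode a signed 32-bit integer into a 64-bit cell (tag bit clear)."""
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("not a 32-bit integer")
    return ((value & 0xFFFFFFFF) << 1) & MASK64


def make_ptr(address: int) -> int:
    """Encode an address into a 64-bit cell (tag bit set)."""
    if not 0 <= address < (1 << 63):
        raise ValueError("address does not fit next to the tag bit")
    return ((address << 1) | PTR_TAG) & MASK64


def is_ptr(cell: int) -> bool:
    """A collector would use this test to decide which cells to follow."""
    return cell & PTR_TAG == PTR_TAG


def int_value(cell: int) -> int:
    """Decode an integer cell, restoring the sign of the 32-bit value."""
    if is_ptr(cell):
        raise TypeError("cell holds a pointer, not an integer")
    raw = (cell >> 1) & 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def ptr_address(cell: int) -> int:
    """Decode a pointer cell back to its address."""
    if not is_ptr(cell):
        raise TypeError("cell holds an integer, not a pointer")
    return cell >> 1
```

Because the full 32-bit range fits beside the flag, a negative integer such as `make_int(-5)` is never mistaken for a pointer in this encoding.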
The problem with that 32-bit option is what do you do when someone writes a negative integer into the memory? You're back to basically enshrining i31s, which doesn't mesh well with existing PLs. Another option I came up with is to not mix pointers and data, a la capnproto. allocate would have to take two parameters, number of pointers and number of data bytes. The nice part there is you don't need to know how big pointers are. The problem is I don't know how one would best represent a pointer to pointer vs a pointer to data. Doing it at the type level gets complicated fast. |
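A rough sketch of the "don't mix pointers and data" option, assuming an `allocate` that takes a pointer count and a data byte count. The `Block` class and its field names are hypothetical; the point is only that the data region can hold any bit pattern without being confused with a pointer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    pointers: List[Optional["Block"]]  # only references may live here
    data: bytearray                    # raw bytes, never scanned by the GC


def allocate(num_pointers: int, num_data_bytes: int) -> Block:
    """Allocate a block with separate pointer and data regions."""
    if num_pointers < 0 or num_data_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return Block([None] * num_pointers, bytearray(num_data_bytes))


# A list node with one "next" pointer and four bytes of payload.
tail = allocate(1, 4)
head = allocate(1, 4)
head.pointers[0] = tail
# Writing a negative integer into the data region is harmless.
head.data[0:4] = (-1).to_bytes(4, "little", signed=True)
```

Note that nothing in the sketch depends on how wide a pointer is, which matches the observation above.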
(FWIW, in CHERI there is a separate data region that stores the "valid pointer" bits.) |
By "negative offsets" I meant with respect to the object's layout, not negative integers. But regardless, I can't tell what your proposal has in mind here. How are you mixing integers and pointers?
I had originally ruled out this option, but I figured out a way to address the problem I had foreseen. Do as you say, but have the space for pointers grow negatively and the space for integers grow positively. This still supports the common OO pattern where subclasses add more fields—both reference and primitive fields—and yet still need the layout to align with that of superclasses. |
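The layout described above can be sketched as follows, assuming references occupy offsets -1, -2, ... and integers occupy offsets 0, 1, .... The `Obj` class and the `Point`/`Point3D` layouts are hypothetical examples, not part of the proposal.

```python
class Obj:
    """References live at offsets -1, -2, ...; integers live at 0, 1, ..."""

    def __init__(self, num_refs: int, num_ints: int):
        self._refs = [None] * num_refs
        self._ints = [0] * num_ints

    def _slot(self, offset: int):
        # Map an offset to (backing list, index), trapping when out of bounds.
        if offset < 0:
            store, index = self._refs, -offset - 1
        else:
            store, index = self._ints, offset
        if index >= len(store):
            raise IndexError("trap: offset out of bounds")
        return store, index

    def load(self, offset: int):
        store, index = self._slot(offset)
        return store[index]

    def store(self, offset: int, value) -> None:
        store, index = self._slot(offset)
        if offset < 0 and not (value is None or isinstance(value, Obj)):
            raise TypeError("trap: negative offsets hold references only")
        if offset >= 0 and not isinstance(value, int):
            raise TypeError("trap: non-negative offsets hold integers only")
        store[index] = value


# Hypothetical superclass layout: one reference (label), two integers (x, y).
LABEL, X, Y = -1, 0, 1
# Hypothetical subclass layout: adds one reference (parent) and one integer (z).
PARENT, Z = -2, 2


def sum_xy(point: Obj) -> int:
    """Written against the superclass layout only."""
    return point.load(X) + point.load(Y)


point3d = Obj(num_refs=2, num_ints=3)
point3d.store(X, 3)
point3d.store(Y, 4)
point3d.store(Z, 5)
point3d.store(PARENT, Obj(num_refs=1, num_ints=2))
```

Since both regions grow away from offset 0, `sum_xy` works unchanged on the subclass instance: the superclass fields keep their offsets.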
Oh, I see. Negative offsets, not negative values. Hm, that might work? |
I think so. It has various pros and cons, though I don't think any of them are particularly subtle at this point. So it seems we're at a good place for others to input their thoughts. |
Reading this issue, it seems to have more in common with the existing proposal than with #78: it wants to have memory managed by the engine, whose pointers are always exact, and which can be exchanged with other modules or with JS refs. That is all very different from #78. So it is better to look at how it differs from the main proposal, and I guess it is trying to replace strongly typed GC structs/arrays with generic memory blocks, at the cost of more runtime tracking (bounds checks). I don't see that as a win. I think if we're going to have GC objects entirely engine managed we might as well statically know the references in them. If more flexibility is required, you could make the current GC proposal more flexible by making scalars untyped: rather than declaring a struct with certain sized int/float types in them, you'd declare a struct with N exact anyrefs and M bytes of "scalar memory" that can be addressed in whatever way. |
Good point on the comparison with #78!
The current GC proposal does not provide that. An
This is more what I had in mind (and I'm guessing @taralx too) for the integer portion. That is, a structure is a chunk of linear memory (at positive offsets) and an array of references (at negative offsets).
My sense is that the win here lies in its simplicity. It adds just one or two types and doesn't introduce subtyping, and yet it seems flexible enough to express GC languages. It would be relatively easy to implement and ship quickly. And although it's not particularly efficient due to bounds checks (which there are many mitigation techniques for), it would be easy to express in a more advanced GC proposal that is more efficient, so it could serve well as a holdover. |
Sorry, I should have said "strongly" rather than "statically". But yes, to your point, some level of dynamic checking is already required. I do like the elegance of all GC objects being uniformly two adjacent arrays (refs + bytes), but it would be good to have an idea of what the expected cost would be compared to the current proposal. |
The strategy here would do a single indirection (twice on 32-bit machines; once on 64-bit machines) to perform a (double) bounds check. You could have an instruction that just asserts up front that there are so many "scalar" and reference fields (trapping if it's not the case), after which even simple streaming optimization would be able to eliminate subsequent bounds checks. The current proposal requires a double indirection to perform a cast. So I suspect which one performs better would vary by benchmark, and I imagine that often their performances would be quite similar. My guess is binary size would likely be smaller with the strategy here, and type-checking would likely be faster with the strategy here. |
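The up-front assertion idea can be illustrated with a small model that counts dynamic checks. The name `assert_shape` and the `CheckedView` wrapper are hypothetical; in a real engine the "view" would be knowledge held by the compiler, not a runtime object.

```python
class Struct:
    def __init__(self, num_refs: int, num_bytes: int):
        self.refs = [None] * num_refs
        self.data = bytearray(num_bytes)
        self.checks = 0  # counts dynamic bounds checks, for illustration only

    def set_ref(self, index: int, value) -> None:
        """Naive access: every store pays for its own bounds check."""
        self.checks += 1
        if not 0 <= index < len(self.refs):
            raise IndexError("trap: reference index out of bounds")
        self.refs[index] = value

    def assert_shape(self, min_refs: int, min_bytes: int) -> "CheckedView":
        """One combined check that covers both regions, trapping on failure."""
        self.checks += 1
        if len(self.refs) < min_refs or len(self.data) < min_bytes:
            raise IndexError("trap: structure is smaller than asserted")
        return CheckedView(self, min_refs)


class CheckedView:
    """Stands in for what a compiler knows after the up-front assertion."""

    def __init__(self, target: Struct, num_refs: int):
        self._target = target
        self._num_refs = num_refs

    def set_ref(self, index: int, value) -> None:
        # A compiler would prove this statically for constant indices; the
        # assert documents the invariant and is not counted as a dynamic check.
        assert 0 <= index < self._num_refs
        self._target.refs[index] = value


naive = Struct(3, 0)
for i in range(3):
    naive.set_ref(i, "x")

hoisted = Struct(3, 0)
view = hoisted.assert_shape(min_refs=3, min_bytes=0)
for i in range(3):
    view.set_ref(i, "x")
```

In this model the naive path performs three checks and the hoisted path performs one, which is the saving a simple streaming optimization would aim for.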
I think I understand the model being described here, though it's changed a bit from the initial post (in particular, I don't see how
And I guess we're assuming that |
The reason I said "one or two" types is because we'd probably want a type for |
What do you think about the idea of pointers carrying their bounds around? It increases the pointer size, but removes the need to perform indirection to do bounds checks. |
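As a sketch of the tradeoff in question, the following compares a thin pointer, whose bound lives in a block header, with a fat pointer that carries its bound. The 4-byte header and the names `FatPointer`, `load_thin`, and `load_fat` are assumptions made for the example.

```python
from dataclasses import dataclass

HEADER_BYTES = 4  # assumption: thin-pointer blocks keep their length in a header


@dataclass(frozen=True)
class FatPointer:
    base: int    # address of the first payload byte
    length: int  # payload size in bytes, carried inside the pointer itself


def load_thin(heap: bytearray, address: int, offset: int) -> int:
    """Thin pointer: read the length from the block header, then check."""
    length = int.from_bytes(heap[address:address + HEADER_BYTES], "little")
    if not 0 <= offset < length:
        raise IndexError("trap: out of bounds")
    return heap[address + HEADER_BYTES + offset]


def load_fat(heap: bytearray, ptr: FatPointer, offset: int) -> int:
    """Fat pointer: the bound is already in hand, so no header read is needed."""
    if not 0 <= offset < ptr.length:
        raise IndexError("trap: out of bounds")
    return heap[ptr.base + offset]


heap = bytearray(16)
heap[0:4] = (3).to_bytes(4, "little")  # header: a block of 3 payload bytes
heap[4:7] = b"abc"
thin = 0
fat = FatPointer(base=4, length=3)
```

Both loads return the same byte; the fat version trades a larger pointer for one fewer memory read per check.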
That sounds like an implementation strategy that wouldn't need to be visible at the wasm level (so long as we didn't somehow make the design incompatible with that strategy). That is, I don't think anything above forces |
Thoughts from others? |
Well, I'm not others, but it seems like there is a qualitative difference in the goals of proposals like this one vs the existing GC proposal. Would it be a good idea to split "linear GC" from this more complex proposal? |
It started off that way, but after working out more details, it seems to serve the same purpose as the existing proposal. Is there some particular difference you have in mind? |
The existing proposal has avoiding dynamic checks as an explicit goal, according to @rossberg. There's no way to avoid them in this version, since we don't know what a pointer points to at compile time. |
I appreciate the striving for simplicity here: simplicity is good. That said, minimalism is not the only design goal that matters, even for an "MVP". From a Web Platform perspective, beyond coming up with a GC proposal that's expressive enough to work, there's also a performance requirement: it's already possible to compile whatever managed language for the web by compiling it to JavaScript, as e.g. GWT/J2CL, KotlinJS, Dart2JS, and a bunch of others are demonstrating. That's not exactly an ideal solution, especially from a technical point of view, but it works. So with this background in mind, the mission for Wasm-GC, should we choose to accept it, is to provide a better alternative, and that primarily means better performance according to one or more of the metrics: execution performance, memory consumption, wire bytes size. So adding a bit of complexity to the concept (like fully typing each pointer field in a struct) seems like a good tradeoff if it makes it more likely that the end result will have compelling arguments going for it compared to just continuing to compile stuff to JavaScript. |
Let's not lose track of the advantages of this proposal over the status quo, for example:
|
@taralx That's a good point, though putting it that way makes it seem like you're actually describing two different features -- the first is the ability to allocate and use garbage-collected memory buffers. The second is allowing these buffers to also contain references to other garbage-collected objects. We could support this first case via first-class memory instances, which may be more natural for how wasm currently works. |
I'm not sure first-class memory instances are a good match for this. Memories are expensive, generally requiring special guard pages, etc. I doubt you're going to want to make a memory for each malloc(). Additionally, you'd have to store the memory references in a table, making a kind of strange indirect "fat pointer" that will bloat code size. |
(I started writing this earlier today, but then got distracted by personal responsibilities, so sorry that this forks off from earlier in the discussion.) Both good points, but it's important to be aware that any GC proposal will have dynamic checks. Consider, for example, the following Java code (which is meant as a pedagogical device rather than something you'd see in the wild):
First let's consider how we'd implement this in the existing proposal. One important fact to be aware of is that the current proposal cannot guarantee a
Second, let's consider the same using
I think it's reasonable to expect that Now there will be Java benchmarks in which the current proposal will outperform
Before I go on, it's important to realize that an With I can give more examples where |
Are we sure that is the only way? I'd say not being able to represent Worse, can you imagine I know Java doesn't allow It would be rather odd if a program using Sounds like in the specific case of |
I think the main questions I have are:
|
Good questions. Regarding 1, it's an oversimplification to expect every field access to require a bounds check. If you have a Maybe this is what you meant by your point 2—I just wanted to head off any misconceptions. |
What you’re proposing @RossTate, could be a cast from a general managed reference (let’s say of type For the MVP, we’d allow structs to contain a number of Alternatively, instead of a @binji, what would be the best way to check on performance of real world programs? Are there some (semi artificial) examples, like @RossTate showed above, which cover most use cases? At first sight, I think this is highly dependent on the source language and its abilities. |
Fwiw, I'd suspect that every bit of additional work performed by fundamental language features will inevitably add up and is worth avoiding, since multiple tradeoffs in one code path potentially multiply. And more tradeoffs will be made, if not by the spec, then by implementers, and if not by implementers, then by users. |
@timjs, I'm not actually proposing to have the Suppose we didn't have such an instruction. Then if code were to set the first reference field, then the second reference field, and then the third reference field, this would require three bounds checks because it's possible the Note that this issue of implicit information tracking is present in the existing proposal as well. OCaml is likely to run into it in one of two ways. The reason is that OCaml has a number of operations that are intended to be implemented by walking over its datastructures in an abstract manner. These include polymorphic structural equality, two inequivalent variations on structural comparison, and polymorphic hashing. It seems OCaml will have to implement this in one of two ways: have most values be arrays of references, or have most values have an associated v-table. Let's consider the former strategy: using arrays. Something like Let's consider the latter strategy: using v-tables. The implementation of structural comparison for a value that's the third case of an algebraic data type will proceed as follows. It will take the value it's being compared to and So the issue of implicit simple (i.e. streaming-compatible) optimization is present in both designs, though |
@timjs I'm not seeing the advantage of |
@RossTate, I'm trying to understand the benefits of your proposal to not encode length information into a I think the implementation of a polymorphic list ( I wasn't entirely clear before that I'm assuming each This model greatly mimics OCaml's memory blocks, Haskell's heap objects, and Clean's heap objects (chapter 10). As my background is mostly in the implementation of functional languages, I could be missing an important thing here to support object-oriented languages. Nevertheless, Go's structs as well as OCaml's variants should fit nicely in this scheme. The idea of putting references at negative offsets and linear memory at positive offsets would allow for the implementation of classes and casting @taralx, the only advantage of using |
@timjs, that's a fair question (and amongst a bunch of useful thoughts!). The high-level answer is simplicity. The following are a bunch of more detailed answers:
At a meta-level, right now it is very difficult to know which typing features will actually improve performance. We simply don't have the infrastructure in place to collect such knowledge, and without that knowledge we'll just be guessing as to which extensions/refinements/variations are actually worthwhile. Having something flexible like this gets people compiling to WebAssembly without the type system limiting how they represent data. That gets us corpuses that we can analyze for patterns, and it gets us backends we can experiment with and modify to test out whether variations have impact and whether it's actually practical to expect backends to produce such variations. So keeping it simple also avoids guesswork, leaving development of extensions/refinements/variations to when they can be informed by real data. Does that seem like overall sound reasoning? |
That's a very thorough and clear answer @RossTate! I can imagine that for the MVP it is not needed to add bound information to the types. In my proposal it can always be added later as a subtype, i.e. Btw, I think my description diverged somewhat from the original proposal by @taralx. Should I open another issue or write it out as a more complete proposal? Are these proposals something that the subgroup wants to discuss during meetings? |
My sense is that this is a collective brainstorming session where we are getting a sense of how this rough direction could work, what its advantages and disadvantages might be, and whether there's interest in exploring further. If there is interest, then I think it'd be worthwhile separately developing a bunch of detailed variations (e.g. |
Sounds good! More comments and ideas are always welcome. |
Open questions:
Looking at languages that compile to JavaScript, I think
|
The current proposal seems well suited for the assumption that most code is monomorphic and casts are rare (either because types are known statically, or because a single runtime cast guards a nontrivial amount of code, i.e. a statically typed section). In such a scenario, having precise types on fields means that values loaded from fields don't have to be cast or typechecked at all. There are plenty of examples where this assumption holds (e.g. pretty much all languages that have a notion of "classes" with "methods", so long as they don't have high-frequency calls that interleave methods defined on subtypes and supertypes). There are also plenty of examples where it doesn't hold (e.g. hashing any generic object's contents, as was discussed on the other thread). (FWIW, in JavaScript we see both: a large part of V8's performance is based on the insight that most code ends up being monomorphic even in such a highly dynamic language; at the same time we've seen plenty of examples where code makes use of the language's flexibility, and any engine feature that assumes otherwise falls flat onto its face.) |
Unfortunately not true for C# due to reified generics, nested structures, interior pointers, and so on. Nested structures and interior pointers are also issues for Go. I don't know enough about Dart these days to assess. Also doesn't work well for Scala due to multiple inheritance of traits, though they at least have the infrastructure for compiling to the JVM that they can possibly fall back on. (I'm assuming your statement was only meant to apply to typed languages.) So, amongst the top 50 TIOBE languages, this statement really only seems to hold for Java and Kotlin (and possibly Dart). |
Hrm. Makes me wonder if the heterogeneous version from earlier isn't better, despite the need to maintain word types. |
Very good questions and observations IMHO! I think that creating a heap model supporting so many different languages can only end in two ways:
I think the last point is the reason why compiling high level statically typed languages to dynamic languages like Scheme or JavaScript works so well, while you have to bend over backwards when compiling a "non-OO language" to the JVM or CLR. However, Wasm wants to provide some safety with its memory model. This means we can't be too dynamic. So the question is whether adding more type information to a GC reference, besides its lower level layout, is better or not. I.e. what is the added value of knowing that a reference is of type
Which version are you referring to? |
The value is that whenever we know the static type, we don't need any checks at all to access the struct's fields, and static type knowledge propagates over dependent field loads. So the question becomes: how often do we know the static types of things? And how large or small is the performance penalty when we don't? When we start with a property loading chain like If we think about scenarios where most things are statically typed as anyref and we have to typecheck/cast everything all the time, then just doing bounds checks is faster than RTT-based subtype checks, no doubt about that. If we consider scenarios where most types are statically known, then having all those types allows skipping all checks, which obviously yields higher peak performance. I suppose in an ideal world, someone would volunteer to prototype both approaches end-to-end, so that we could measure their practical performance on a wide range of real-world applications... any takers? :-) |
Yes, this is a point I think we're not taking into account enough. Sadly I think "performance through strong typing at the Wasm level" and "ease of porting as many existing language representation/runtimes as possible" are fundamentally at odds with each other.
There is one point where a dedicated Wasm GC can win over a linear memory GC in performance, and that is the stack. A linear memory GC has to put most of its stack into memory, for it to be scannable, turning many But yes, a dedicated Wasm GC shouldn't be slower than the linear memory alternative, which would make it an unattractive target for many languages also. |
I like that thought because it goes well with the mental model of a compiler person. It provides some feeling of "being in control of the layout", or maybe the superficial notion of "being able to reuse the layout of the existing compiler to the degree possible". I suspect that this is mostly prejudice on my part, though, and we must identify what the qualitative difference to the "more typed" proposals actually is. (Or whether, in the end, with all the necessary extensions, this proposal turns into more or less the same thing as the existing proposal, but with value types replaced by byte sequences.) Generally, when looking at linear memory, and the heap, I see only the size in bytes and alignment of values (which compilers traditionally have to deal with). I don't need to see whether there is a float, an unsigned int or a signed int or whatever stored in that linear memory of a heap block. On the linear memory implementation, WASM gets this right: Only when I read from the memory and the value goes into a register, it needs a type (in order for the engine to know what registers it can reasonably go in! So that makes perfect sense!). If I write an int and read a float, that's my own, harmless mistake: WASM doesn't need to prevent me from running that, as there is no security risk associated with that, right? If I'm wrong here, correct me please.
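The "write an int, read a float" point can be shown with a byte buffer standing in for linear memory. This sketch uses Python's standard `struct` module; the comparison to `i32.store` and `f32.load` in the comments is an analogy, not engine code.

```python
import struct

memory = bytearray(8)  # stand-in for a small linear memory

# Like i32.store: the memory itself records only bytes, not a type.
struct.pack_into("<i", memory, 0, 0x3F800000)

# Like f32.load: the type is chosen at the point of the read.
(as_float,) = struct.unpack_from("<f", memory, 0)

# Reading the same bytes back as an integer still works.
(as_int,) = struct.unpack_from("<i", memory, 0)
```

Here the bit pattern `0x3F800000` reads back as the float `1.0`. The reinterpretation is well defined and exposes nothing outside the buffer, which is the sense in which the mismatch is harmless.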
Currently, this is true. If WASM engines implemented support for stack operations (e.g. allow linear memory GCs to walk the local variables of the stack to relocate their references), linear memory GC could be faster. But... for the sake of reduction of overall complexity, I would rather not want to have to deal with two (or more) GCs. What is really quite interesting is that, with one shared GC, suddenly the complexity cost of interoperating with other garbage collected languages becomes manageable. Yes, you still have to write some wrappers to account for calling conventions and heap representation, but there is no need to carefully place and remove pointers in GC root tables, deal with circular unreachable structures, etc. in addition. |
Then default to copies, not pointers ;) No really, that's what I argue in #78 |
@jakobkummerow Thanks, I see the qualitative difference between structref (without typed reference arrays) and the current GC MVP proposal now. |
Well, let's start with fleshing this proposal out, since it has drawn so much attention. I'll try to put together a more detailed proposal (including considerations like non-reference type imports) in time for the GC meeting in 2 weeks. |
Let me know if you'd like some help. |
(moved to new issue since people probably muted this one) |
Update: More fleshed-out proposal in the follow-up issue.
Thinking about a true "MVP" for GC got me to this proposal, inspired by CHERI:
- `pointer`, probably 64-bit for wasm32.
- `pointer` instead of `i32` + `memidx`.
- `allocate` that accepts `i32` size and returns a new `pointer` of at least that size.
- `pointer.load` and `pointer.store` that only accept `pointer`, not `i32` + `memidx`.

Big difference from existing proposals is that it doesn't try to track the full structure of a thing, only which parts of it are pointers.
As constructed above, this requires tag bits to track which parts of the structure are pointers, which doesn't sound cheap. One alternative is to fix the region that can store pointers, which has the potential advantage of enabling code to be pointer-size agnostic.
Anyway, this is probably half-baked and doesn't work for some reason, but I figured I'd put it out there. :)
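For readers who want something concrete, here is a toy model of the idea: opaque pointers, an `allocate` operation, and a tag bit per word so that a collector can tell which words to follow. It is a sketch under simplifying assumptions (sizes are in words rather than bytes, and the `Heap`, `store_int`, `store_ptr`, `load_ptr`, and `collect` names are hypothetical), not a description of how an engine would implement it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    block_id: int  # stands in for an opaque handle handed out by the engine


class Heap:
    def __init__(self):
        self.blocks = {}  # block_id -> {"words": [...], "tags": [...]}
        self._next_id = 0

    def allocate(self, size_in_words: int) -> Pointer:
        ptr = Pointer(self._next_id)
        self._next_id += 1
        self.blocks[ptr.block_id] = {
            "words": [0] * size_in_words,
            "tags": [False] * size_in_words,  # True means "word is a pointer"
        }
        return ptr

    def _block(self, ptr: Pointer, index: int) -> dict:
        block = self.blocks[ptr.block_id]
        if not 0 <= index < len(block["words"]):
            raise IndexError("trap: out of bounds")
        return block

    def store_int(self, ptr: Pointer, index: int, value: int) -> None:
        block = self._block(ptr, index)
        block["words"][index], block["tags"][index] = value, False

    def store_ptr(self, ptr: Pointer, index: int, target: Pointer) -> None:
        block = self._block(ptr, index)
        block["words"][index], block["tags"][index] = target, True

    def load_ptr(self, ptr: Pointer, index: int) -> Pointer:
        block = self._block(ptr, index)
        if not block["tags"][index]:
            raise TypeError("trap: word is not a pointer")
        return block["words"][index]

    def collect(self, roots) -> None:
        """Mark from the roots by following tagged words, then free the rest."""
        live, stack = set(), list(roots)
        while stack:
            ptr = stack.pop()
            if ptr.block_id in live:
                continue
            live.add(ptr.block_id)
            block = self.blocks[ptr.block_id]
            stack.extend(w for w, t in zip(block["words"], block["tags"]) if t)
        for dead in set(self.blocks) - live:
            del self.blocks[dead]


heap = Heap()
a, b, c = heap.allocate(2), heap.allocate(1), heap.allocate(1)
heap.store_ptr(a, 0, b)   # a -> b
heap.store_int(a, 1, 42)  # plain data, ignored by the collector
heap.collect(roots=[a])   # c is unreachable and gets freed
```

The tags are what let `collect` walk a structure without knowing its full type, and they are also the per-word cost mentioned above.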