-
Notifications
You must be signed in to change notification settings - Fork 758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recursive Types #3063
Comments
Ok, so I read that paper. The gist is that it presents an algorithm for canonicalizing recursive types in O(n log n) time via a clever variable numbering scheme a la De Buijn indices. The downside is that the presentation in the paper is extremely dense and further complicated by the presence of universal types in the system the paper considers as well as discussion of many properties we don't care about. On the other hand, there are diagrams and examples and a complete low-level description of the algorithm, so I believe that with some work we would be able to translate that algorithm into code. I also read another paper, https://open.bu.edu/handle/2144/1800, which solves the problem of canonicalizing, equating, and hash-consing recursive types by translating them to DFAs, minimizing the DFAs, then translating the DFAs back to recursive types. This paper was much easier to read, but that's mostly because all the complexity was offloaded onto DFA minimization algorithms presented elsewhere. There were also no diagrams or examples, so I'm not sure it would be any easier to implement this method. The algorithm this paper presents also requires a completely separate DFA implementation and has a time bound of O(n^2), so I think we should try to implement the method from the first paper first. |
What if represent equirecursive types as "infinite trees"? So type equality of recursive types may be simplified to infinite tree equality. WDYT? Another approach is using bisimulation and circular coinduction of streams: |
Wondering if we can get around a fancy algorithm due to binary and text format, by design, already giving us a canonicalized name or index upon reading, so we may just use a placeholder type for every name or index of an unknown type seen and canonicalize into our internal map once parsing is complete. Directly recursive case: (struct $a (field (ref $a))
;; ^ $a = Struct()
;; ^ $a.fields += &$a Indirectly recursive cases: (struct $a (field (ref $b))
;; ^ $a = Struct()
;; ^ $a.fields += &($b = Placeholder())
(struct $b (field (ref $a))
;; ^ &&$b = Struct()
;; ^ $b.fields += &$a I'm probably missing something? :) |
@MaxGraey, yes, both those papers present concrete algorithms for checking equality on infinite trees. It's complicated because you can't actually calculate or store the infinite tree directly. My guess is that the bisimulation and coinduction approaches would be very similar. @dcodeIO Unfortunately I think there are edge cases that will require more sophistication. For example, consider the following progressively more interesting cases. I use a slightly folded representation of the types for brevity.
Here,
After that substitution, it is clear that But here's a more interesting example:
Here
These types are also all equivalent, but they don't even include references to themselves that can be abstracted away by some canonicalization. I don't think there is a straightfoward algorithm that would be able to figure this out. |
Oh, and all this is just for the initial hash-consing of the types that lets us check equality in constant time. I haven't even looked into calculating subtyping relations yet 🙃 |
Hmm, but looking at your examples, the only equality we need during parsing is of names/indexes, right? So after parsing, we have a set of potentially duplicate types with unique names/ids that we can deduplicate by first hashing the contents of the now complete types, and on collisions check if equal, and if equal deduplicate? Edit: Oh, just realized that one does not simply hash recursive types. |
Yeah exactly, before you hash it you have to canonicalize it somehow. The papers I linked to both provide solutions for canonicalizing recursive types. |
That moment you realize that you need to collect potentially cyclic types in order to collect potentially cyclic garbage to ultimately overcome the cycle of prematurely assuming that there must be an easier way. Damn cycles. Cycles everywhere. Sorry for the noise :) |
Have been thinking about this a little while looking into other changes, and it appears that either some fundamental change to how we are canonicalizing types will be necessary or we'd need a separate For instance, even if we'd have an API like In either case, i.e. fundamentally changing how types are represented internally or adding a builder doing the necessary C++ magic, this looks like something better suited to be done by someone with a clear vision of how all of this should fit together. Is there a chance that someone of you guys can pick this rather interesting task up? :) |
Yes, I have some ideas about how to support creating recursive types. I can start on an implementation sometime this week :) |
Any progress on this? 🙂 Canonicalization aside, it would be very useful for Binaryen to support reading and processing modules defining recursive struct types, as these arise naturally when translating many programs. It is currently blocking running Binaryen on the output from the Dart-to-Wasm prototype. |
Thanks for the ping! There has been some progress, particularly the infrastructure committed in #3418. The remaining tasks are to incorporate that infrastructure into the binary and text parsers and to update various routines that traverse types to gracefully handle recursive types. It's good to know that this is blocking the Dart prototype from working with Binaryen; I will prioritize this work again. |
Update: I have landed #3584 and #3588 that update the text and binary parsers to support use-before-definitions of types, so the "only thing" left is to figure out how represent and construct the recursive types themselves. Yes, that's the hard part, but at least all the code for actually using that functionality is done now. I don't see why we wouldn't have full recursive type support within a week. |
Another update: Now that #3602, #3610, and #3624 have landed, recursive types should mostly work and be ready for experimentation. There is a known bug in which extra types are sometimes emitted into the output, but it shouldn't affect correctness and I have a fix pending in #3626. Please let me know about any other bugs you might find, While the GC proposal currently uses equi-recursive structural types and the V8 prototype currently implements nominal types, Binaryen currently implements an ad-hoc hybrid system where self-referential type definitions (i.e. definitions of types that directly or indirectly refer contain themselves) are nominal and all other types definitions are structural. Despite the underlying weirdness here, the difference between nominal and structural types should not matter at all as long as you don't go out of your way to try detect it. |
Recursive types seem to be working, thanks! Binaryen can now successfully optimize some programs coming from I have now run into an issue with implicit upcasts from subtypes, though. This seems to be unrelated, so I have filed a separate issue. |
The typed function reference proposal introduces directly recursive and mutually recursive types, and the possibilities for recursion are expanded in the original GC proposal, which @dcodeIO is currently working on prototyping in Binaryen. #3012 expands Binaryen's type system to support arbitrarily nested types introduced in the GC proposal, but it does not allow for the creation of recursive types. This issue sketches out my thinking on how we could build support for recursive types.
First of all, creating a recursive type requires that the type be available before its definition is constructed. That means we need a way to pre-allocate types, and because of mutual recursion, we will need to be able to pre-allocate an arbitrary number of types at once. The preallocated types should be in an uninitialized state that makes them unusable while their definitions are constructed. Then they should be "finalized" all at once, which should involve applying the type definitions then canonicalizing and de-duplicating the underlying interned type definitions, as happens today. Preallocated types that turn out to be duplicates should be freed and should not be leaked to the rest of the program.
Canonicalization of recursive types is going to be the tricky part. The way #3012 is currently written, only
value_type
s as defined in the grammar of GC types are represented asType
s, unlike in the WebAssembly text and binary formats, which give bothvalue_type
s anddef_type
s identifiers in the type section. That means we have to deduplicate slightly more complex structures, but doesn't change the fundamental problem much.The state of the art for canonicalizing and checking equality of equi-recursive types like these seems to be described in https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.2276, which I have not yet had a chance to read. Should be fun, though!
The text was updated successfully, but these errors were encountered: