
Nominal types vs. inter-module interaction #148

Closed

jakobkummerow opened this issue Oct 6, 2020 · 52 comments

Comments

@jakobkummerow (Contributor)

If I understand correctly, one of the biggest arguments for the current MVP's structural static type system is that it makes sharing types across module boundaries quite straightforward:

;; Module A
(type $string (array i8))
(func $hash (export "hash") (param (ref $string)) (result (ref $string))
  ;; body...
)

;; Module B
(type $my_string (array i8))
(func $hash (import "module" "hash") (param (ref $my_string)) (result (ref $my_string)))

...and things just work, because $string and $my_string are automatically recognized as being the same type.

Notable observations here are:

  • "things just work" might be a bit of an exaggeration: the two modules must still intentionally choose to use compatible/identical type definitions. Maybe one of them publishes an API spec that the other adopts, or maybe both authors communicate out-of-band and discuss what types to use, etc.
  • the names don't need to match, as the coordinating host (in the case of browsers: JavaScript) can take e.g. one module's "md5hash" export and feed it to the other module's "stringhash" import (see the sketch after this list).
  • considering that Wasm modules are typically not handwritten but rather generated by toolchains, it is not entirely clear how developers writing "surface language" code (Java etc.) can even control/influence the Wasm types that their toolchain will generate for them. Maybe this whole multi-module scenario is only realistic for sets of modules produced in one go by the same version of the same toolchain having a whole-program view of the surface language source? If so, that would likely change our collective perspective on a bunch of the concerns discussed below.
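
To illustrate the second observation, here is a minimal sketch using the standard JS API (moduleA and moduleB are assumed to be compiled WebAssembly.Module objects):

declare const moduleA: WebAssembly.Module;  // exports "md5hash" (assumed)
declare const moduleB: WebAssembly.Module;  // imports ("module", "stringhash")

const a = await WebAssembly.instantiate(moduleA);
const b = await WebAssembly.instantiate(moduleB, {
  // The host rewires names: B's import "stringhash" is satisfied by A's
  // export "md5hash"; the names themselves never have to match.
  module: { stringhash: a.exports.md5hash },
});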

Now, if we switched the static type system to nominal semantics, that would in particular mean that two type definitions give us two distinct types, so the example above would no longer work "just like that". To still support cross-module object sharing, e.g. for function calls, we have to find an alternative solution.

One idea is to not change what happens at the module boundary, i.e. keep structural typing there. In a case like the above, where a module attempts to satisfy a function import with another module's function export and these functions use certain types, the engine would automatically try to match the types. I believe that this would work for the module interaction case, however it creates another problem: merging modules would potentially change semantics, and that is unacceptable. (Merging two modules is straightforward and should work fine, but consider a three-module case: module M imports functions from N and O, the modules define types $m, $n, $o that are structurally identical and used in these functions' signatures. If N and O are merged, the merging tool cannot know that $n and $o are supposed to be the same nominal type (because there are no function calls between N and O that would implicitly provide this information); when linking M with the merged NO, it becomes apparent that two nominally distinct types $n and $o in NO are supposed to match the same type $m in M. This would re-introduce structural type equivalence into NO "through the back door".) So I think this idea does not hold up to critical inspection.

Another idea that has been mentioned before is to import and export types just like functions:

;; Module A
(type $string (export "string") (array i8))

;; Module B
(type $my_string (import "module" "string") (array i8))

(Note that module B repeats the (array i8) type definition in order to allow separate compilation, i.e. we want to enable engines to compile code for B before having seen A or knowing that B will import things from A.)
That solves the problem of needing a single type, but creates a (potential?) new problem: it requires the modules to agree on the export/import role assignment. For functions, this is natural, because the whole point is that the function definition is large and complex and one module's responsibility. For types, this is less natural, especially if we imagine an NPM-style decentralized ecosystem of libraries: what if I want to import one module's string hashing function and another module's string compression function, who gets to export the string type used for these interactions?

Per the observations above, there is some doubt whether this scenario is realistic (and hence whether the concern is warranted), but we may want to avoid the issue anyway, so as not to paint ourselves into a corner. We could work around this difficulty by searching for more symmetric/flexible approaches to sharing types across different modules.

One idea is to define a symmetric version of the directed export→import concept. Maybe:

;; Module A
(type $string (importexport "string") (array i8))

;; Module B
(type $my_string (importexport "string") (array i8))

(Side note: maybe we should pick a different strawperson keyword than importexport, such as publicname, just to make (spoken) conversations less ambiguous: "importexport" sounds like "import/export", but describes a significantly different notion.)

Problems solved with this approach:

  • we now have an explicit way to achieve the identification of two type definitions with one another that structural static types gave us implicitly. Notably, when two modules annotate two types this way as supposedly identical, the engine has to do the same structural type check as it does with the current MVP to verify this annotation's validity. (Or at least a very similar check; nominal types make recursive steps return more quickly but otherwise the algorithm is about the same.)
  • there's no longer a need to select an exporting module; all modules have equal "rank" w.r.t. type definitions
  • module merging is easy, as a merging tool knows exactly which types should be identified. (Such a tool may or may not perform its own checks to ensure that this identification of types is valid; if it is invalid, then the resulting merged module will be invalid and will fail validation later on.)

Problems created:

  • the modules must agree on the name "string". I would argue that this is not a problem in practice: since even in the structural world, module authors have to agree on matching type definitions, they now simply have to agree on one more detail, which is the type's importexport name. That's not a significant increase in coordination burden.
  • by having an implicit global shared namespace for importexport type names, there could be unintentional collisions. That is a serious concern that needs a solution.

Solution A: treat identical importexport names as requests that may well fail: a matching name causes the engine to perform a structural type check; if that check fails, then that's not a validation error, the types simply won't be identified with each other. If the two modules don't try to use the types in question for any interactions, then that's just fine. One way to look at this would be to say "we still have structural typing across module boundaries, but with the twist that the importexport name is part of the structural definition".
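
A minimal sketch of Solution A's matching rule, written as engine-internal pseudocode in TypeScript (TypeDef, structurallyEquivalent, and unify are hypothetical stand-ins, not a real API):

interface TypeDef { importexportName?: string /* plus the structure itself */ }

declare function structurallyEquivalent(a: TypeDef, b: TypeDef): boolean;
declare function unify(a: TypeDef, b: TypeDef): void;

function maybeIdentify(a: TypeDef, b: TypeDef): void {
  if (a.importexportName !== undefined &&
      a.importexportName === b.importexportName &&
      structurallyEquivalent(a, b)) {
    unify(a, b);  // the two definitions now denote one type
  }
  // Otherwise: no validation error; the types simply remain distinct.
}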

Solution B: analogous to how the coordinating host can map function exports to imports with different names, we could also empower (or burden) this coordinator with matching up type names. This would, effectively, give each module its own namespace for public type names. (This might be desirable in addition to solution A in order to solve the opposite problem: unintentional mismatches of type names.)
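
A hypothetical sketch of Solution B from the host's side, mirroring how function exports can be renamed at instantiation. Type values appearing in an instance's exports are an assumption here; no such JS API exists today:

declare const moduleA: WebAssembly.Module;  // publishes its string type as "string"
declare const moduleB: WebAssembly.Module;  // expects a type under the name "text"

const a = await WebAssembly.instantiate(moduleA);
const b = await WebAssembly.instantiate(moduleB, {
  // The coordinator decides that A's "string" satisfies B's "text",
  // giving each module its own namespace for public type names.
  types: { text: a.exports.string },
});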

Solution C: please contribute other ideas by commenting :-)

Speaking of comments, I'm mostly trying to get a conversation / thought process started here. There may well be other problems that I have overlooked; please point them out! There may well be better, or at least additional, ideas for solutions that I haven't thought of; please present them!
If we're generally happy with nominal typing inside a module, it would be great to find a way to make it viable for multi-module scenarios as well -- maybe we can then finally arrive at a type system that we can reach consensus on?

To give credit where it's due: thanks to @tebbi, @rossberg, and @RossTate for having discussed these ideas with me. This post is more thought-through than it would have been without their input :-)

@dcodeIO commented Oct 6, 2020

Recently thought about something potentially related, but in the context of interface types: what if one module's string is (struct (field i32) (field (ref (array i16)))) and another module's string is (struct (field i32) (field (ref (array i16)))) as well, both using the name jstring by coincidence, but the i32 is a hashCode that each module computes differently? The types are structurally identical and merely seem compatible, so things will fail late and apparently randomly at runtime, likely hard to debug. May well be that this is a non-concern in this context, but I thought I'd mention it.

@taralx commented Oct 6, 2020

I'm just going to say that, at first glance, this looks like the same problem that linkers try to solve using weak symbols.

@aardappel

Solution A) is nice because it just improves upon structural typing with "nominal typing light", i.e. it simply reduces the number of structural typing matches by additionally matching on the tag "string". Unintentional collisions are thus not a serious problem, since they make things strictly better than not having this tag at all.

And it's not hard to come up with more elaborate names that will not clash unless someone specifically wants to.

Conversely, you could allow people to still opt in to structural typing, by specifying "" as a name (matches structurally unless a tag is specified) or "*" (matches anything structurally). Again, that would make it strictly better than only structural typing.

@littledan (Contributor)

Based on @rossberg's observation that, to express generics, rtts might be dynamically created at runtime in a potentially unbounded way: this sort of canonicalization system would also have to be available at runtime. Maybe there could be an instruction that accepts an rtt and some kind of chunk of memory, to cache the rtt in the system using the memory chunk as a dynamic key. (Or, it could be done in a user module which everyone is expected to import, as many others have discussed...)
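
A rough sketch of the user-module variant mentioned at the end: a shared registry that memoizes rtts under a dynamic key, so repeated requests for the same type instantiation yield the same rtt. Treating rtts as host values is an assumption; no such API exists today:

const rttRegistry = new Map<string, unknown>();

function canonicalRtt(key: string, create: () => unknown): unknown {
  let rtt = rttRegistry.get(key);
  if (rtt === undefined) {
    rtt = create();  // e.g. performs the rtt-creating instruction once
    rttRegistry.set(key, rtt);
  }
  return rtt;  // every caller with the same key observes the same rtt
}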

@RossTate (Contributor)

Based on @rossberg 's observation that, to express generics, rtts might be dynamically created at runtime in a potentially unbounded way.

This observation is false. There already exists a C# compiler that produces independently verifiable (i.e. typed) x86 that counters this claim.

@conrad-watt (Contributor)

This observation is false. There already exists a C# compiler that produces independently verifiable (i.e. typed) x86 that counters this claim.

Could you make an issue that describes how this is done, with reference to the example @rossberg used in his slides?

@rossberg (Member)

Based on @rossberg 's observation that, to express generics, rtts might be dynamically created at runtime in a potentially unbounded way.

This observation is false. There already exists a C# compiler that produces independently verifiable (i.e. typed) x86 that counters this claim.

My precise observations were:

  1. The C#/.NET semantics allows for an unbounded number of dynamic type instantiations.
  2. All these must be distinguishable at runtime by language-level (explicit and implicit) casts.
  3. Hence an implementation of C#/.NET has to implement dynamic type generation.
  4. If it ought to be able to piggyback Wasm RTTs and casts for that, then Wasm RTTs must be generated dynamically.

Please explain which of these is false. AFAICT, the only solution was to essentially forego the premise of (4) and implement a mirror RTT representation in user space. I believe that's what you said as well.

The problem with such an implementation is that it fundamentally requires "erasure" for generic type parameters. Hence it would not be able to benefit from more precise typing via Wasm-level generics once we add them, and would instead be left with "unnecessary casts" for each access to something typed by a parameter -- the very problem you complain about elsewhere wrt array accesses.

@jakobkummerow (Contributor, Author)

I think "the premise of (4)" is the crucial point. Yesterday's presentation sounded (to me) as if it took that premise as a given -- which I had trouble wrapping my head around, but clarifying that it is a conditional premise clears that up.

I think this premise is very much up for discussion. As even yesterday's presentation pointed out early on, the primary purpose of the types in WasmGC is to aid the engine in producing safe and fast code. Supporting every single feature of every single surface-level language's type system is a non-goal in particular because it is impossible.

For a low-level language with "assembly" in its name, it is appropriate and also nicely general/flexible to provide low-level primitives; we'd then expect module producers to build their individual required semantics on top of that. On the other hand, for better performance, we want to allow engines to (safely!) drop "unnecessary casts", which requires the type system to express somewhat higher-level (i.e. closer to the surface language) notions and guarantees. Clearly we have to draw the line somewhere.

I'm not sure where exactly we should draw the line; most folks on this issue tracker would probably agree that that's a very difficult question. I think it's fair to argue that supporting cast-free array element loads is more important than enabling dynamically generated type parameters to piggyback on the Wasm-level casts -- so drawing the line somewhere between these two wouldn't generally sound unreasonable to me. I mean, even if the C# type system allows you to generate String[][][][][] dynamically at runtime, (1) how often do you write such code, and (2) how often is it critical for the success of your app that such code can eliminate "unnecessary" casts?

@RossTate (Contributor)

The problem with such an implementation is that it fundamentally requires "erasure" for generic type parameters.

Following up on @jakobkummerow's comment, this sentence seems to be the source of misunderstanding. On slide 36 of my presentation overviewing the related literature titled "Custom Casting with Identifiers", I showed how "nominal" typed assembly languages reason about casts at the assembly level. That is, these "nominal" systems are able to reason about casts below the level of WebAssembly rtts; they can describe the invariant maintained by a WebAssembly engine's use of an array of identifiers in an object's GC descriptor and guarantee that the assembly instructions used to access that array and compare its content with a given rtt actually ensure the object has the type specified by the rtt. To these "nominal" systems, the data structure used by engines to implement rtts is just one of the many data structures that one can use to implement casting and that these systems can reason effectively about. Some of these data structures place the relevant casting information in the object's GC descriptor, while others (like that on slide 36) place these data structures in the object itself, which enables even a program with generics to be implemented with a finite number of GC descriptors (if it wants).

@rossberg (Member)

@jakobkummerow, I agree, and that was kind of what I was getting at. Unfortunately, though, it does not matter how often such a feature is used. In a separate compilation scenario you cannot make useful assumptions about that and usually need to compile for the general case. (The only localised scenario would be such a constrained and rare case that it's hardly worth optimising for.) In practice, the implication is that you won't be able to piggyback C#/.NET casts on Wasm casts.

That in turn means that adding generative RTTs to Wasm really only helps Java. Which raises the question that I have raised before: whether it is worth adding them at all for the MVP.

@rossberg (Member)

@RossTate, yes, but that is assuming a whole machinery of highly non-standard and specialised typing extensions that I estimate would be post-post-mvp at best.

@RossTate (Contributor)

Can you explain what you mean by "highly non-standard"? There is only one implemented system that is comparable to the Post-MVP goals of the GC proposal, and it uses these techniques, so I don't understand how you can classify them as highly non-standard.

@tlively (Member) commented Mar 12, 2021

I would like to voice my support for the importexport solution suggested here. In particular, I like that it solves the import/export asymmetry problem while keeping the benefits of nominal types. Notably, Wasm consumers would not need to do any graph canonicalization, and calculating LUBs would be much simpler and faster, which is important for Binaryen.

@littledan (Contributor)

I like the idea of somehow adopting nominal struct types (if only to facilitate interaction with JS, e.g., so that fields can be easily named), but I worry that it could be a bit difficult to maintain this importexport namespace (just having to agree on the names, I mean). It also feels a bit weird to me that the importexport names are suddenly "global" in a sense, which I think is new in the Wasm module system.

Another option we might consider is having array types be structural but struct types be nominal. I think this is a sort of natural split that many programming languages use anyway. The whole graph canonicalization problem goes away with this definition of type equality, if I understand correctly.

With structural array types and nominal struct types, simple examples like strings can just work without needing an importexport capability. By contrast, classes (structs) usually have one module where they're initially declared, so the need for them to be structural/importexport seems somewhat lower.

@tlively (Member) commented Mar 12, 2021

Yes, the global namespace is a little weird. Perhaps the importexport mechanism should have to explicitly specify the modules with which it is trying to coordinate, just like normal imports do.

Another option we might consider is having array types be structural but struct types be nominal. I think this is a sort of natural split that many programming languages use anyway. The whole graph canonicalization problem goes away with this definition of type equality, if I understand correctly.

Unfortunately not. Here's an example of three array type definitions that are equivalent under an equirecursive system but not syntactically equivalent:

(type $a (array (ref $a)))
(type $b (array (ref $c)))
(type $c (array (ref $b)))

@conrad-watt (Contributor) commented Mar 13, 2021

@tlively this seems to still require a new global namespace, for modules rather than types. At least for tools, the importexport idea is like having a convention that all type definitions are imported, and a separate "manifest" module is responsible for exporting all types. When merging/packaging modules ahead of time, any type exports not respecting this convention can be moved into the manifest and turned into imports.

Things do get more complicated when attempting to compose modules together at runtime (maybe provided by a CDN). In the importexport world, the engine will implement one choice of runtime type management logic (e.g. managing global namespace, and resolving clashing type/module names) and the user's composition of modules must fit that logic. In a (nominal) import world, the user has to implement runtime type management logic themselves to ensure type imports and exports are matched up, which has the disadvantage of requiring more glue code, but the advantage that resolution of edge cases can be tailored to individual usecases.

Unfortunately not. Here's an example of three array type definitions that are equivalent under an equirecursive system...

In the world @littledan is hypothesising, it might make sense to restrict array types to being defined inductively (with respect to other array types).

@sbc100 (Member) commented Mar 14, 2021

I'm just going to say that, at first glance, this looks like the same problem that linkers try to solve using weak symbols.

Indeed, I thought of C++'s COMDATs which work much like groups of weakly defined symbols: https://www.airs.com/blog/archives/52: "In C++ there are several constructs which do not clearly live in a single place".

@tlively (Member) commented Mar 15, 2021

@tlively this seems to still require a new global namespace, for modules rather than types.

I don't think this is different from how normal imports work, is it? e.g. all Emscripten imports come from the global "env" or "wasi" modules.

At least for tools, the importexport idea is like having a convention that all type definitions are imported, and a separate "manifest" module is responsible for exporting all types. When merging/packaging modules ahead of time, any type exports not respecting this convention can be moved into the manifest and turned into imports.

Yes, I agree that this is the case for tools operating on module graphs. For tools operating on a single module at a time, it is more akin to having the module declare and export all of its types. Both points of view are useful in their respective contexts.

Things do get more complicated when attempting to compose modules together at runtime (maybe provided by a CDN). In the importexport world, the engine will implement one choice of runtime type management logic (e.g. managing global namespace, and resolving clashing type/module names) and the user's composition of modules must fit that logic. In a (nominal) import world, the user has to implement runtime type management logic themselves to ensure type imports and exports are matched up, which has the disadvantage of requiring more glue code, but the advantage that resolution of edge cases can be tailored to individual usecases.

Yes, I would expect that in addition to importexport, we would also allow toolchains to use normal imports and exports for use cases where there is already inherent asymmetry, such as when central runtime modules export types to client modules. importexport would be able to fill the role of either import or export, so it would be possible to import an importexported type or to importexport an exported type.

@conrad-watt (Contributor) commented Mar 15, 2021

I don't think this is different from how normal imports work, is it? e.g. all Emscripten imports come from the global "env" or "wasi" modules.

Currently it's a very flexible convention of the host as to how "global" the module namespace really is. For example, the JS-API doesn't have a global namespace; a record labelled "env" must be passed as an explicit argument to the instantiation call (and could be varied with each instantiation). An importexport declaration wouldn't map so neatly to this approach, unless we added a first-class "type manifest" object to JS which is passed in and returned with every instantiation.

@tlively (Member) commented Mar 15, 2021

Oh, I see what you mean. I wasn't thinking that importexport should introduce any new namespace that is more global than what we already have.

Say modules foo and bar want to coordinate their string types.

;; foo.wasm
(type (importexport "bar" "bar_string" as "foo_string") ...)
;; bar.wasm
(type (importexport "foo" "foo_string" as "bar_string") ...)

Say we instantiate foo.wasm first, so we don't yet have the exports of bar.wasm available to pass as imports to foo. Despite not having "bar" imports, the instantiation of foo.wasm succeeds, and foo_string is exported as a newly created type. The foo instance's exports are passed as imports to the bar instance in a record labeled "foo", so the bar instance uses and re-exports foo_string as bar_string and does not create a fresh type. Since the importexports are symmetrical, we similarly could have instantiated the modules in the other order with no problems. Nothing new needs to be introduced in the JS API.
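
A sketch of that instantiation order, assuming (hypothetically) that type importexports surface through the ordinary JS-API import/export machinery:

declare const fooModule: WebAssembly.Module;
declare const barModule: WebAssembly.Module;

// No "bar" imports are available yet, so foo_string is created fresh:
const foo = await WebAssembly.instantiate(fooModule, {});
// bar receives foo's exports under the module name "foo"; its importexport
// finds "foo_string" there and reuses it instead of creating a new type:
const bar = await WebAssembly.instantiate(barModule, { foo: foo.exports });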

@conrad-watt (Contributor)

@tlively ah I see, this is like a fleshed-out solution B of the OP? I was reading all of the comments in the frame of (something like) solution A... sorry. I agree it makes sense to add a module namespace to the importexport declaration in this case.

I like this idea. It would still require some user-defined central coordinating logic (to keep a record of previously exported types), so I don't know what @rossberg thinks. This solution's primary value would be in smoothing over issues of unclear ownership (e.g. two modules failing to compose because they both export a nominal type they "morally" want to share).

@tlively (Member) commented Mar 15, 2021

@conrad-watt, can you elaborate on the user-defined central coordinating logic that would still be needed?

@conrad-watt (Contributor)

@tlively imagine some system that needs to dynamically compose modules together at runtime in some fancy way. In general, the top-level may not know ahead of time whether the modules it's composing together want to share types. So it has to come up with some convention to decide when to allow an importexport declaration to create a new type, and when to provide a previously created type as a matching import.

The obvious system would be to use the declared namespaces naively (although I'm sure something more complicated could be done). So the top-level script keeps a map from (module,name) to type. Before instantiating a module, the script checks whether any of the module's importexport declarations already exist in the map, and if so provides them as imports. Otherwise, it grabs the newly exported types and sticks them in the map for future use. This means that all instantiations have to flow through some central user code which manages the map.
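
A sketch of that naive coordination logic. typeImportExports (a helper that reads a module's importexport declarations) and type values flowing through exports are both hypothetical:

declare function typeImportExports(
  m: WebAssembly.Module
): { module: string; name: string }[];

const typeMap = new Map<string, unknown>();

async function instantiateCoordinated(name: string, m: WebAssembly.Module) {
  const imports: Record<string, Record<string, unknown>> = {};
  for (const decl of typeImportExports(m)) {
    const known = typeMap.get(`${decl.module}/${decl.name}`);
    if (known !== undefined) {
      (imports[decl.module] ??= {})[decl.name] = known;  // reuse existing type
    }
  }
  const instance = await WebAssembly.instantiate(m, imports as WebAssembly.Imports);
  for (const [exp, value] of Object.entries(instance.exports)) {
    typeMap.set(`${name}/${exp}`, value);  // record types this module created
  }
  return instance;
}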

I personally don't think requiring such code in this scenario would be a problem, but @rossberg has been attached to the idea that even the most complicated module composition schemes could in principle avoid requiring the user to manage types in a central location, by using rtt.canon everywhere. Essentially, the engine takes responsibility for managing the central collection of types (through rtt.canon/sub memoisation). This was something that solution A was trying to achieve as well (by having an ambient/global namespace).

@tlively (Member) commented Mar 16, 2021

When using the declared namespaces naively, it is necessary to instantiate the modules in a topological order such that exporters are instantiated before the corresponding importers. It is also sufficient to unconditionally provide the exports of every module instantiated so far as the available imports to the next module. This is true with just function imports and exports, and neither type imports and exports nor type importexport change that as far as I can tell.
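
A sketch of that linking loop with the standard JS API: instantiate in topological order, unconditionally offering every export seen so far to the next module:

declare const modulesInOrder: [name: string, m: WebAssembly.Module][];

const available: WebAssembly.Imports = {};
for (const [name, m] of modulesInOrder) {
  const instance = await WebAssembly.instantiate(m, available);
  available[name] = instance.exports;  // later modules may import from "name"
}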

As far as more complicated systems go, I was just chatting with @sbc100 about how Emscripten's dynamic linking system already requires type coordination for correctly implementing dynamically loaded function pointers with MVP WebAssembly, so it's already not the case that type coordination can be avoided by all systems. I'm sympathetic to the idea that type coordination should not be overly burdensome, and in particular I think it makes sense to resolve the asymmetry problem, but beyond that we will need to be more precise about the problems we want to solve to make forward progress.

@lukewagner (Member)

To check my understanding first: could this idea be framed as enabling modules to optimize their validation+instantiation-time by reducing the number of type definitions that need to be checked for structural equality? In theory, importexport adds an extra degree of expressivity, allowing types that would otherwise be identified structurally to be kept distinct, but that doesn't seem to be part of the motivation, as far as I can see.

If that is the case: could these optimizations also be achieved by replacing (importexport "a" "b") with a boolean attribute, say, "canonicalize"? If canonicalize is not present, the type definition would be "nominal"/"generative" (producing a distinct type per instance, i.e., what you get without structural typing), and if canonicalize were present, it would behave the same as if it were an importexport with a string that was automatically derived from the structural type. The benefit being that toolchains wouldn't need to worry about string collisions and namespace management.

But I also wonder if there's a separate unstated performance objective here of specifically wanting to have a string as a way to simplify the underlying structural equality algorithm? If so, then I think the actual win depends on the pending choice of iso- vs. equi-recursive structural equality, since, iiuc, iso-recursion would make type definitions fairly easy to hash and canonicalize.

@conrad-watt (Contributor) commented Mar 17, 2021

To check my understanding first: could this idea be framed as enabling modules to optimize their validation+instantiation-time by reducing the number of type definitions that need to be checked for structural equality? In theory, importexport adds an extra degree of expressivity, allowing types that would otherwise be identified structurally to be kept distinct, but that doesn't seem to be part of the motivation, as far as I can see.

I think this is something that comes from having nominal types in general, rather than specifically importexport. Two modules could choose to export their types under different names, or one could import the other's type (which would require a shallow structural check at instantiation-time). The novelty of importexport is that it allows you to defer this choice to instantiation-time (edit: since having to make the ahead-of-time choice as to who is responsible for exporting was one of the big criticisms of nominal types).

@tlively (Member) commented Mar 17, 2021

This discussion of importexport is predicated on a hypothetical assumption that we will move from structural typing to nominal typing for recursive types. The motivation for that right now is purely to eliminate the complexity of graph canonicalization that structural typing requires for realistic engines. Once we have an implementation, it will also be good to measure its performance and factor that into this discussion as well. If we allowed structural typing to be used at all for recursive types, we would not achieve the goal of eliminating graph canonicalization.

@conrad-watt (Contributor)

If so, then I think the actual win depends on the pending choice of iso- vs. equi-recursive structural equality, since, iiuc, iso-recursion would make type definitions fairly easy to hash and canonicalize.

If we allowed structural typing to be used at all for recursive types, we would not achieve the goal of eliminating graph canonicalization.

Are we concretely considering iso-recursive types? This would also avoid graph canonicalisation.

@lukewagner (Member)

Ah, avoiding graph canonicalization is what I was guessing and getting at in my third paragraph in my last response (although I said "performance objective", when it's also an "engineering complexity objective"). If that is the major motivating goal, +1 to considering iso-recursion. IIUC, with importexport and names, the structural compatibility conditions that need to be checked end up being (non-canonicalizing) simple graph comparisons in the rough shape of iso-recursion anyways.

@tlively (Member) commented Mar 18, 2021

Yes, both nominal types and iso-recursive types would eliminate the graph canonicalization in essentially the same way. @rossberg has hinted that he will present on the iso-recursive types option in the not-too-distant future, and I expect we will want to compare and contrast that design with both the current equi-recursive proposal and the nominal design explored in this issue. I would be surprised if any of those three options was the clear best by all the measures we care about.

@RossTate (Contributor)

I'll note that iso-recursive types will not remove the burden of type canonicalization from tools or engines. With iso-recursive types, you can still have distinct type indices in the same module or across modules defining the same type, and rtt.canon will still require those to produce the same rtt.

Also, the various proofs of undecidability I gave apply just as well to (extensions of) iso-recursive types as to equi-recursive types.

And iso-recursive types have composability problems that are well-known to be problematic in the world of modules, the details of which I'm sure @rossberg knows better than me.

So, although iso-recursive types will make subtyping/canonicalization linear-time rather than quadratic-time (and still not constant-time), I do not expect them to solve any of the real existing problems and I expect them to introduce new even subtler problems.

@tlively (Member) commented Jun 9, 2021

In #220, we had this example using importexport:

;; A.wasm
(module
  (type $B.U (importexport "B" "U" as "B.U") (struct (field (ref $T))))
  (type $T (importexport "B" "A.T" as "T") (struct (field (ref $B.U))))
)
;; B.wasm
(module
  (type $A.T (importexport "A" "T" as "A.T") (struct (field (ref $U))))
  (type $U (importexport "A" "B.U" as "U") (struct (field (ref $A.T))))
)

@rossberg asked a few questions and comments about this example, and I'm answering here so as not to clutter the other thread with the details of this particular mechanism.

Technically, I don't think that mechanism would be nominal types -- certainly not in the established sense of generative type definitions.

It's still nominal in the sense that types have identity independent of their structure and two instances agree on the identity of a type only if it has explicitly been exported (possibly via importexport) from one and imported (possibly via importexport) into the other.

Nobody has worked out a proper semantics for this idea (there is no precedent for something like it AFAICT (*)), so it's difficult to say what it actually would be. For example, if I externally project corresponding types from each module, are they equivalent or not?

The semantics is no different from normal (nominal) type definitions and type imports, except that whether it is a definition or an import is determined by whether an import with the expected name is supplied at instantiation time. If the import is supplied, this is no different from any other nominal type import. Otherwise, it is no different from any other nominal type definition.

So if you look at corresponding types exported from two instances, they will be equivalent if and only if the export from one instance was supplied as an import to the other instance.

Note that you don't have their importexport names at this point anymore, because they're not part of the types themselves (and if they were, we'd be back to an anti-modular global type namespace). If this is coherently definable at all, I believe it's ultimately some hybrid that is more structural than nominal in nature, though in some odd way tied to module boundaries (which makes me wary, because that typically breaks composability).

Does it make sense now that importexport builds on the existing import/export mechanisms and is not any more dependent on names or module boundaries than the existing mechanisms?

But even then I don't see how it would solve the recursive linking problem without real recursive linking in Wasm. Concretely, how would you link your two modules given just Wasm's module instantiation constructs?

Either module could be instantiated first, but without loss of generality let's assume A is instantiated first. We instantiate it with no imports, so both of the importexports act as type definitions and exports. The new instance A exports the two new types with names "B.U" and "T". These exports are then passed as imports (with module name "A") to the instantiation of B. These imports match the names expected by B's importexports, so they both act as imports rather than type definitions. Because the types were explicitly exported from A and imported into B, A and B agree on the identity of the types.

This linking action of taking all the exports produced so far and passing them as potential imports to the next module to be instantiated is no different from how linking would be done today.

@rossberg, does that all make sense?

@tlively (Member) commented Jun 9, 2021

A couple more points I just thought of:

  1. The exporting part of importexport doesn't do anything new, so a better name and semantics for the new construct would be weak_import, which uses the imported type if provided or otherwise defines a new type. I'll keep using importexport in conversation for consistency, though.

  2. This is actually a little weirder than I previously described because type validation could not be completed until the identities of all imports are determined at instantiation time. Module validation and compilation would produce a list of subtype constraints that have to be met, and the constraints would have to be checked at instantiation time. On one hand I can see how this could be considered to meaningfully diverge from the current validation model, but on the other hand it does not require any instantiation-time code generation, and instantiation already does type checking of the imports, so I don't think this is a clear-cut violation of the current model.
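
A sketch of the deferred checking in point 2; every type and helper here is a hypothetical stand-in:

interface TypeRef { index: number }  // reference into the module's type space
interface Constraint { sub: TypeRef; sup: TypeRef }

declare function resolve(r: TypeRef): unknown;  // identity known at instantiation
declare function isSubtype(sub: unknown, sup: unknown): boolean;

function checkDeferredConstraints(constraints: Constraint[]): void {
  for (const { sub, sup } of constraints) {
    if (!isSubtype(resolve(sub), resolve(sup))) {
      throw new WebAssembly.LinkError("deferred type constraint violated");
    }
  }
}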

@rossberg (Member)

@tlively, thanks for the clarification. If I understand what you have in mind then it is that a definition becomes retroactively generative if it's not supplied as an import. Or in other words, if the import isn't supplied, then a generative one is synthesised implicitly.

Just to clarify, how does that work transitively? Imagine:

(module $A
  (type $t (importexport "" "t") (struct))
  (type $u (importexport "" "u") (struct (field (ref $t))))
)

What if I supply "u" as an import, but not "t"? Is that rejected? What if I had

  (type $t (struct))

And what if it was

  (type $t (func))

or

  (type $t (importexport "" "t") (func))

and "t" is or is not supplied?

  1. The exporting part of importexport doesn't do anything new, so a better name and semantics for the new construct would be weak_import, which uses the imported type if provided or otherwise defines a new type. I'll keep using importexport in conversation for consistency, though.

I agree that separating these from exports would be a natural factorisation (and would avoid complicating static linking tools more than necessary).

  2. This is actually a little weirder than I previously described because type validation could not be completed until the identities of all imports are determined at instantiation time. Module validation and compilation would produce a list of subtype constraints that have to be met, and the constraints would have to be checked at instantiation time. On one hand I can see how this could be considered to meaningfully diverge from the current validation model, but on the other hand it does not require any instantiation-time code generation, and instantiation already does type checking of the imports, so I don't think this is a clear-cut violation of the current model.

If I understand you correctly, then that would be a total show stopper. The signature of a module (as a list of imports and exports) would no longer suffice to actually describe the type of a module. You'd have additional constraints on the side that you'd need to surface and make explicit somehow, because they affect what instantiations are allowed. And these constraints would generally depend on implementation details of the module, so they could be an abstraction leak. Consequently, not only does this break our current compilation model (and the module linking proposal), but there is also no clean way to extend it to fix that, at least not with such a semantics.

The obvious alternative to this semantics would be that validation conservatively assumes that any weak imports are distinct. If they become equivalent after the fact then that's okay, but for all intents and purposes of validation, they are considered to be distinct types.


But apart from these technical questions, the bigger one is: would such a feature actually solve the problem with nominal types? And the answer is: no. While it allows you to defer the decision whether a type is an import or an export until link time, you still have to make a choice at that point. And to maintain type sharing, you'd be forced to opt all importexports into imports, and therefore still produce a module graph with inverted dependencies. That still amounts to a global program transformation, which is only viable if all clients of a module are known and linking is performed and controlled at a central place. So you essentially gain nothing over regular type imports; modular layering and incremental linking remain broken.

Basically, this already follows from my description above: "in other words, if the import isn't supplied, then a generative one is synthesised implicitly." This makes clear that this feature does not fundamentally change the expressiveness of the system. It merely adds some convenience: you don't need to define fresh types explicitly. But there is no further difference to just doing that at the place where you would omit them.

@tlively (Member) commented Jun 10, 2021

@tlively, thanks for the clarification. If I understand what you have in mind then it is that a definition becomes retroactively generative if it's not supplied as an import. Or in other words, if the import isn't supplied, then a generative one is synthesised implicitly.

Yes, that's a good way to put it 👍

Just to clarify, how does that work transitively? Imagine:

(module $A
  (type $t (importexport "" "t") (struct))
  (type $u (importexport "" "u") (struct (field (ref $t))))
)

What if I supply "u" as an import, but not "t"? Is that rejected?

At instantiation time, there would be a shallow structural check (taking nominal identities into account) to ensure that all of the imported types have the same top level structure as the fallback definitions that were used during validation and compilation. In this example, the type of $u's field cannot have the same identity as $t because $t is generated during the instantiation of $A, when $u must already have been created, so instantiation would fail.

What if I had

  (type $t (struct))

Instantiation would necessarily fail in this case for the same reason: the instantiation of $A generates $t, so it cannot possibly be part of the imported $u.

And what if it was

  (type $t (func))

or

  (type $t (importexport "" "t") (func))

and "t" is or is not supplied?

I think func would work the same way as struct in this regard. Is there a compatibility hazard here with signature types currently being specified as being structural? If so, I think it would be good to reason through that together on a new thread.

...

  2. This is actually a little weirder than I previously described because type validation could not be completed until the identities of all imports are determined at instantiation time...

If I understand you correctly, then that would be a total show stopper...

The obvious alternative to this semantics would be that validation conservatively assumes that any weak imports are distinct. If they become equivalent after the fact then that's okay, but for all intents and purposes of validation, they are considered to be distinct types.

Ah, good idea. I think that would work.

But apart from these technical questions, the bigger one is: would such a feature actually solve the problem with nominal types? And the answer is: no. While it allows you to defer the decision whether a type is an import or an export until link time, you still have to make a choice at that point. And to maintain type sharing, you'd be forced to opt all importexports into imports, and therefore still produce a module graph with inverted dependencies. That still amounts to a global program transformation, which is only viable if all clients of a module are known and linking is performed and controlled at a central place. So you essentially gain nothing over regular type imports; modular layering and incremental linking remain broken.

Basically, this already follows from my description above: "in other words, if the import isn't supplied, then a generative one is synthesised implicitly." This makes clear that this feature does not fundamentally change the expressiveness of the system. It merely adds some convenience: you don't need to define fresh types explicitly. But there is no further difference to just doing that at the place where you would omit them.

I don't see how inverted dependencies are a problem, but I'll respond in more depth over in #220, where you have raised the same point in #220 (comment), since this objection is not specific to this particular scheme.

Edit: actually, I'll reply to that here, too, since I think the details of importexport specifically are important to my response.

@tlively (Member) commented Jun 10, 2021

But apart from these technical questions, the bigger one is: would such a feature actually solve the problem with nominal types? And the answer is: no. While it allows you to defer the decision whether a type is an import or an export until link time, you still have to make a choice at that point. And to maintain type sharing, you'd be forced to opt all importexports into imports, and therefore still produce a module graph with inverted dependencies. That still amounts to a global program transformation, which is only viable if all clients of a module are known and linking is performed and controlled at a central place. So you essentially gain nothing over regular type imports; modular layering and incremental linking remain broken.

The compilation of each module does need to know about the modules it imports types from and the structure of those imported types. That knowledge also needs to include transitive dependencies. The compilation scheme would use importexport for all of those dependencies and transitive dependencies to describe the structure of the type graph in each module without introducing new "instantiated-before" constraints between those modules.

Critically, however, this compilation scheme does not require knowledge of client modules, so I disagree that it requires global program knowledge or a global program transformation. AFAICT, this scheme supports fully separate compilation of libraries that do not depend on each other, even if those libraries end up getting linked into the same program.

Basically, this already follows from my description above: "in other words, if the import isn't supplied, then a generative one is synthesised implicitly." This makes clear that this feature does not fundamentally change the expressiveness of the system. It merely adds some convenience: you don't need to define fresh types explicitly. But there is no further difference to just doing that at the place where you would omit them.

AFAICT, it adds just a tiny bit of expressiveness: the ability to create type import-export pairs between modules that do not create "instantiated-before" constraints between those modules. And that extra expressiveness is precisely what is required to support cyclic dependencies in the type graph.

@rossberg (Member)

You are correct that the instantiated-before constraints are not hardwired into the modules themselves. But the same would be true if I used regular imports and created the types manually, by some mechanism, wouldn't it? There is no effective difference I can see.

Moreover, while the instantiated-before constraint is not technically hardwired into the modules, you still have to commit to it in your linking scheme/convention, and all modules have to agree on the same conventions upfront. Otherwise they will not be linkable with one another. So for all practical purposes, whether the direction is decided by enforcement or by mere convention has the same effect.

That knowledge also needs to include transitive dependencies.

FWIW, that is not generally necessary when types are exports, only when they are inverted into imports. In the export approach, it's sufficient to reexport a transitive export, and the intermediate import may be devoid of any constraints on the type if they are irrelevant to the intermediate module, i.e., it has no usage constraints. With the import model, all these become definitional constraints, imposed on some other module supplying the types (e.g., a client), and they have to be repeated at every level to check the constraints transitively.

It's probably more effective if I prepare another presentation for some of the next meetings to explain these problems more concretely.

@tlively (Member) commented Jun 15, 2021

You are correct that the instantiated-before constraints are not hardwired into the modules themselves. But the same would be true if I used regular imports and created the types manually, by some mechanism, wouldn't it? There is no effective difference I can see.

Yes, I agree. The only difference is that importexport could be an addition to the core spec, so no external mechanism or refactoring of modules would be necessary. Besides that, this idea is essentially the same as the refactoring @RossTate suggests in #220 (comment).

Moreover, while the instantiated-before constraint is not technically hardwired into the modules, you still have to commit to it in your linking scheme/convention, and all modules have to agree on the same conventions upfront. Otherwise they will not be linkable with one another. So for all practical purposes, whether the direction is decided by enforcement or by mere convention, has the same effect.

Agreed, but is there anything special about type imports and exports that make this more constraining than before? These constraints and conventions already have to be in place for all other imports and exports to work correctly.

That knowledge also needs to include transitive dependencies.

FWIW, that is not generally necessary when types are exports, only when they are inverted into imports. In the export approach, it's sufficient to reexport a transitive export, and the intermediate import may be devoid of any constraints on the type if they are irrelevant to the intermediate module, i.e., it has no usage constraints. With the import model, all these become definitional constraints, imposed on some other module supplying the types (e.g., a client), and they have to be repeated at every level to check the constraints transitively.

Yes, I agree that intermediate modules that just re-export imported types would require knowledge of the type structure in these nominal schemes but not in the equirecursive scheme with type bounds. But in either system the defining module upstream from the intermediate module and the using module downstream from the intermediate module both need to know (to some nontrivial extent) the shape of the type, so I can't imagine a realistic compilation scheme for which requiring the type shape in the intermediate module would be a problem.

Can you describe such a compilation scheme?

It's probably more effective if I prepare another presentation for some of the next meetings to explain these problems more concretely.

More examples would be great and it would be helpful if you could post them to a discussion thread so we all can take the time to digest and respond to them. Then once we're all on the same footing, we can continue the discussion with higher bandwidth in a meeting.

@rossberg (Member)

Agreed, but is there anything special about type imports and exports that make this more constraining than before? These constraints and conventions already have to be in place for all other imports and exports to work correctly.

Yes, but the problem isn't the presence of type imports or exports per se. The problem is inverting the original direction of (structural) type exports to imports during either compilation or linking. That only works if you consistently do it for all modules in the program. And it's a whole program transformation because it does not respect the dependency structure of the source, e.g., changes which modules you can link independently and in what order. Obviously, if you are doing whole-program linking anyway then that doesn't matter, but otherwise it does.

Yes, I agree that intermediate modules that just re-export imported types would require knowledge of the type structure in these nominal schemes but not in the equirecursive scheme with type bounds. But in either system the defining module upstream from the intermediate module and the using module downstream from the intermediate module both need to know (to some nontrivial extent) the shape of the type, so I can't imagine a realistic compilation scheme for which requiring the type shape in the intermediate module would be a problem.

In the export scheme, only the defining module needs to know the exact type, all downstream ones can choose to ignore it, partially or completely, while still passing through the information. For one, that changes code size, the amount of transitive imports needed (potentially exponentially so!), and the recompilation story when changing a type, but all those are comparably minor points.

The big problem is handling type sharing across multiple independent modules. When exporting structural types, you can just define them in multiple places, and you can link and use each module independently, in different places, and reexport their types without worrying how downstream clients might use them (together).

(In fact, for structural types, you don't even need explicit exports, but that's just another aside.)

When inverted into nominal imports from the client, then you simply cannot do that. You cannot link either of the modules before you know if there is or isn't a downstream use where these types flow together and need to be compatible. You ultimately need to know the entire program before you can decide where to define the types and how to supply them.

More examples would be great and it would be helpful if you could post them to a discussion thread so we all can take the time to digest and respond to them. Then once we're all on the same footing, we can continue the discussion with higher bandwidth in a meeting.

I could try that, but it would take way more time. I have increasingly realised that quite a bit of context needs to be established for a thorough explanation, and there's been ample evidence here that a GitHub discussion is not an effective way to achieve that, because we all get dragged down into excessively time-consuming tangents. I believe it's much more effective and time-efficient for everybody to walk through examples interactively, at least for starters.

@RossTate (Contributor)

There is already an example in flight in #220. If you now believe that the example you chose is somehow not representative, then please provide one that is more representative. Feel free to give more context on the example, e.g. not just the source code but also how things are compiled, loaded, and linked—in fact, I'd prefer you provide that additional context. But I, for one, have found the GitHub discussions to be much more effective, and especially much more fair, than presentations.

@rossberg (Member)

#220 is about mutual recursion, which is a different can of worms. As per the current issue, I was referring to explaining the general modularity hazards of nominal types, to get past repeating myself over and over. This time I even ought to be able to demo how they directly affect something like the Wob compiler.

@tlively (Member) commented Jun 17, 2021

Agreed, but is there anything special about type imports and exports that make this more constraining than before? These constraints and conventions already have to be in place for all other imports and exports to work correctly.

Yes, but the problem isn't the presence of type imports or exports per se. The problem is inverting the original direction of (structural) type exports to imports during either compilation or linking. That only works if you consistently do it for all modules in the program. And it's a whole-program transformation because it does not respect the dependency structure of the source, e.g., it changes which modules you can link independently and in what order. Obviously, if you are doing whole-program linking anyway then that doesn't matter, but otherwise it does.
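Concretely, this inversion would turn a module as naturally compiled (first sketch below, with a structural type) into one whose type definition must be supplied externally (second sketch, using the hypothetical import syntax from the sketches above):

;; As compiled against structural types: the module owns the definition
(module
  (type $t (struct (field i32)))
  (func (export "get") (param (ref $t)) (result i32) body...))

;; After inversion to nominal imports: the definition must come from elsewhere
(module
  (import "env" "t" (type $t))
  (func (export "get") (param (ref $t)) (result i32) body...))

Every module mentioning the type has to be rewritten the same way, and all of them have to agree on the single place where the definition now lives.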

I'm not too worried about having to do this inversion for all modules in a program, since they all presumably use the same compilation scheme. I'm curious about the distinction you're drawing between whole-program and non-whole-program linking here. Can you explain what the latter might look like?

Yes, I agree that intermediate modules that just re-export imported types would require knowledge of the type structure in these nominal schemes but not in the equirecursive scheme with type bounds. But in either system the defining module upstream from the intermediate module and the using module downstream from the intermediate module both need to know (to some nontrivial extent) the shape of the type, so I can't imagine a realistic compilation scheme for which requiring the type shape in the intermediate module would be a problem.

...

The big problem is handling type sharing across multiple independent modules. When exporting structural types, you can just define them in multiple places, and you can link and use each module independently, in different places, and re-export their types without worrying about how downstream clients might use them (together).

...

When this is inverted into nominal imports from the client, you simply cannot do that. You cannot link either of the modules before you know whether there is a downstream use where these types flow together and need to be compatible. You ultimately need to know the entire program before you can decide where to define the types and how to supply them.

To be clear, types in independent modules would "flow together" only if they were structural types in the source language as well, right? Rather than examining the downstream users of such types, the compilation scheme could also unconditionally import them, but that would require the dynamic linker to perform some module introspection to discover which shared types to provide at runtime. This gets us back to the old arguments about whether it is acceptable to have to import source-level structural types like tuples from a central shared location, so I'm not sure there's much more to say here until we get guidance from real-world implementations.
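The central-shared-location scheme might look like the following sketch (names are invented; type import/export syntax is as hypothetical as above):

;; Shared runtime module, instantiated once per program by the loader
(module
  (type $pair_i32 (struct (field i32) (field i32)))  ;; canonical home of the source-level tuple (i32, i32)
  (export "pair_i32" (type $pair_i32)))

;; Every module using (i32, i32) tuples imports the one shared definition
(module
  (import "runtime" "pair_i32" (type $pair))
  (func (export "swap") (param (ref $pair)) (result (ref $pair)) body...))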

@tlively
Copy link
Member

tlively commented Jun 17, 2021

More examples would be great, and it would be helpful if you could post them to a discussion thread so we can all take the time to digest and respond to them. Then, once we're all on the same footing, we can continue the discussion with higher bandwidth in a meeting.

I could try that, but it would take way more time. I have increasingly realised that quite a bit of context needs to be established for a thorough explanation, and there's been ample evidence here that a GitHub discussion is not an effective way to achieve that, because we all get dragged down into excessively time-consuming tangents. I believe it's much more effective and time-efficient for everybody to walk through examples interactively, at least for starters.

I agree that high-bandwidth discussions during the video calls are valuable as well. Perhaps we could get the best of both worlds if the slides were posted at least a few business days before the meeting so folks can get a rough idea of what will be covered and how they might respond. @rossberg, does that sound good to you? If that doesn't leave time to prepare slides for the next meeting, it would be fine to defer to the following meeting.

@rossberg
Copy link
Member

I know very few people who ever succeed in completing slides days in advance, no matter when a presentation is scheduled. ;)

Practicalities aside, I want to point out that recent requests like this appear to reflect a rather non-constructive understanding of discussion: individuals demand early access to slides in order to prepare their "response" (as I've read here a number of times). Why should the presenter not equally demand access to that response beforehand, so that they have time to think about a response to that? And so on? Obviously, taken to its conclusion, this reduces the idea of an in-person discussion to absurdity.

A discussion is not a shoot-out. If people need time to think about something that's said or presented in a meeting, then it's totally fine to have follow-up discussions.

@tlively
Copy link
Member

tlively commented Jun 24, 2021

Fair enough, and we do have recent examples of productive follow-on conversations such as our iso-recursive typing discussion.

@RossTate
Copy link
Contributor

I would be less concerned if there were not even more examples of follow-on discussions to @rossberg's presentations that have come to a halt because of @rossberg's absence (and then he has even spoken in meetings as if the issue had been decided due to his presentation). To make matters worse, even for those discussions that have come to an apparent conclusion (typically due to others' efforts), I often find that people still believe @rossberg's presentation is the status quo even when the conclusion ends up contradicting his presentation (because, understandably, people do not follow these detailed discussions). This pattern leads to a sprawl of unresolved issues in which everyone has a different understanding of where each issue stands, making for a huge mess in communication and coordination, with no sense of progress.

So, before adding yet another item to the sprawl, I request that we first schedule time to discuss the outcomes of the follow-ons to @rossberg's prior presentations and hopefully bring at least some items to a close. We already have a plan to discuss #217, which was created as a follow-on to his most recent presentation, and I think we should do the same for the other issues that were created as follow-ons to prior presentations.

@tlively
Copy link
Member

tlively commented Jun 24, 2021

I agree that we should do a better job of summarizing and closing out old discussion issues. But a public GitHub discussion is not an appropriate place to discuss the conduct of members of the community. Since we don't yet have a standard process for appropriately raising this kind of concern, let's discuss your specific concerns in a private meeting instead.

@rossberg, please schedule time to present in an upcoming meeting whenever it is convenient for you. I am eager to make progress on these discussions however we can.

@rossberg
Copy link
Member

I agree that there are too many open issues, but as far as I can tell, for the relevant ones, that is due to the lack of a satisfactory solution or conclusion. There are also various issues open that are only marginally relevant to the MVP. Perhaps we should close some of those in order to focus the discussion better and waste less time and energy on tangents.

@jakobkummerow
Copy link
Contributor Author

In light of today's meeting and how the "importexport" idea showed up there, I'd like to re-emphasize that the core idea behind this suggestion was to make canonicalization opt-in for each type.

Instead of all types getting canonicalized automatically (which costs a lot of time, with no way to opt out), modules could annotate the types that they (use for interfacing with other modules and hence) need to have canonicalized (using whichever canonicalization algorithm/semantics we settle on). The hope is that the number of types participating in canonicalization would be significantly smaller (1%? 10%?) than the total number of types showing up in a module, and hence the time spent on canonicalization would be a lot less. An order of magnitude (or two) less instantiation work would make a big difference for practical viability.

Using a name along with that keyword was just a strawman detail intended to further reduce the number of possible pairs of types that the canonicalization system would have to check. If that's not considered viable, we can totally drop that detail and have an optional, parameterless please-canonicalize-this annotation on (type ...) definitions.
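Under that strawman, a module might look like the following sketch, where the parameterless canon annotation is purely hypothetical syntax for the opt-in:

(module
  ;; opted in: canonicalized, equal to any other canonicalized (array i8)
  (type $string (canon) (array i8))
  ;; module-internal: nominally distinct, never canonicalized
  (type $scratch (array i8))
  ...)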

@rossberg
Copy link
Member

@jakobkummerow, thanks for the clarification. Isn't that just another way of saying that you suggest having both structural and nominal type definitions? (Which would be completely orthogonal to exports and imports?)

FWIW, if canonicalisation can be made linear, then I would be very surprised if it showed up as a relevant cost factor in compilation or instantiation.

@jakobkummerow
Copy link
Contributor Author

Yes, it'd be one particular way of having both kinds of types.

I'm interested in making forward progress, so I'm looking for pragmatic compromises. Without going into detail here, there appear to be undeniable arguments that some (current or future) use cases are best served by nominal types, whereas others require type canonicalization; so to me the resulting question is: "how can we practically accommodate both?"

The canonicalization time results we've seen so far in the equirecursion experiments were worrying.

@tlively
Copy link
Member

tlively commented Feb 10, 2022

We've settled on the isorecursive type system in #243 as the solution to many of the issues raised here, so I'll close this issue.

@tlively tlively closed this as completed Feb 10, 2022