
Incompatibilities with Bounded Parametric Polymorphism #116

Closed
RossTate opened this issue Aug 14, 2020 · 19 comments

Comments

@RossTate
Contributor

RossTate commented Aug 14, 2020

Terminology note: Parametric polymorphism is the use of instantiable type parameters, whereas subtype polymorphism is the use of subtyping. Bounded parametric polymorphism furthermore lets type parameters have subtyping constraints, combining both parametric and subtype polymorphism.
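To make the terminology concrete, here is a small illustrative sketch in Java (chosen only as a familiar surface language; the thread itself is about WebAssembly-level types, and the names below are hypothetical):

```java
import java.util.List;

public class Bounds {
    // Parametric polymorphism: T is an instantiable type parameter.
    static <T> T first(List<T> xs) { return xs.get(0); }

    // Subtype polymorphism: any subtype of Number may flow in.
    static double toDouble(Number n) { return n.doubleValue(); }

    // Bounded parametric polymorphism: the parameter T carries a
    // subtyping constraint (T <: Comparable<T>), combining both.
    static <T extends Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        assert first(List.of(1, 2, 3)) == 1;
        assert toDouble(Integer.valueOf(2)) == 2.0;
        assert max("a", "b").equals("b");
        System.out.println("ok");
    }
}
```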

In #114, the following extensions are described as part of a sketch for supporting parametric polymorphism:

  • Allow type parameters on function types:

    • functype ::= (func <typeparam>* <valtype>* <valtype>*)
    • typeparam ::= (typeparam $x <typeuse>?)
    • as with type imports, the <typeuse> describes a bound, constraining instantiation to a subtype; type parameters are likewise instantiated with heap types
  • Add a way to reference a type parameter as a heap type:

    • heaptype ::= ... | (typeparam $x)
    • a type parameter is a subtype of its bound

These two extensions make subtyping undecidable. See Benjamin Pierce's proof of undecidability of this system in Bounded Quantification is Undecidable from 1993.
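Concretely, Pierce's result concerns "full" F-sub, whose rule for comparing bounded quantifiers is contravariant in the bound (this is the standard textbook presentation, not a rule taken from the sketch in #114):

```latex
\frac{\Gamma \vdash S_2 <: S_1 \qquad \Gamma,\, X <: S_2 \vdash T_1 <: T_2}
     {\Gamma \vdash \forall X <: S_1.\, T_1 \;<:\; \forall X <: S_2.\, T_2}
```

The bound flips direction, and the second premise re-enters the subtype relation under an extended context; Pierce shows this combination lets one encode a non-terminating computation into subtype checking.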


#114 also describes the following extensions towards supporting parametric polymorphism:

  • Allow type parameters on type definitions:

    • typedef ::= (type <typeparam>* <deftype>)
    • recursive type definitions with parameters must be uniformly recursive, i.e., must not expand to infinitely large types (details TBD)
  • Add a way to supply type arguments in a type use:

    • typeuse ::= (type $t <heaptype>*)
    • type uses in a heap type must instantiate all parameters

These two extensions, combined with the two above, make type validation, i.e. whether a type itself is valid, undecidable.


#114 states the following goal:

  1. It is highly desirable to avoid the rabbit hole of reified generics, where every type parameter has to be backed by runtime type information. That approach is highly complex and has a substantial overhead, even where that information isn't needed.

#114 then gives the following example for polymorphically constructing pairs:

 (type $Pair (typeparam $X)
   (struct (field (ref (typeparam $X))) (field (ref (typeparam $X))))
 )

 (func
   (typeparam $T)
   (param $rttT (rtt (typeparam $T)))
   (param $x (ref $T))
   (result (ref $Pair (typeparam $T)))

   ;; allocate a pair with full RTT information
   (struct.new_with_rtt
     (rtt.canon $Pair (typeparam $T) (local.get $rttT))
     (local.get $x) (local.get $x)
   )
 )

Note that this effectively uses reified generics. The subset of extensions listed above require reified generics, so even if WebAssembly does not always reify its type parameters, these extensions would require all polymorphic languages compiling to WebAssembly to use reified generics (if they want to take advantage of WebAssembly's polymorphism).
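For contrast, Java's erased generics show what a non-reified scheme looks like at runtime: all instantiations share one runtime class, so there is no per-instantiation type information to cast against. (A hedged illustration of the erased/reified distinction, not part of the proposal itself.)

```java
import java.util.ArrayList;

public class Erasure {
    public static void main(String[] args) {
        ArrayList<String> strings = new ArrayList<>();
        ArrayList<Integer> ints = new ArrayList<>();
        // Under erasure both instantiations share one runtime class:
        // there is no RTT distinguishing ArrayList<String> from
        // ArrayList<Integer> at runtime.
        assert strings.getClass() == ints.getClass();
        System.out.println(strings.getClass() == ints.getClass()); // prints "true"
    }
}
```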

#114 also provides the following extension:

  • Possibly, also allow type fields in structs, which would technically amount to existential types:
    • datatype ::= ... | (struct <typefield>* <fieldtype>*)
    • typefield ::= (typefield $x <typeuse>?)
    • again, the <typeuse> describes a bound
    • consecutive field types can refer to type fields in the same way as to type parameters

This approach can eliminate the need for reified generics. However, limiting typefield to only specify an upper bound is known to be unable to express many common patterns. For example, it's insufficient for enabling writes to (generic) arrays or for enabling function calls to (first-class) functions.

Unfortunately, the further extension with lower bounds, which would be able to express these common patterns, is undecidable (by dualizing Pierce's proof).
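Java's use-site variance gives a familiar analogue of this upper-versus-lower-bound gap (an illustration in a surface language, not the proposal's own types): a reference known only by an upper bound supports reads but rejects writes, while writes require a lower (super) bound.

```java
import java.util.ArrayList;
import java.util.List;

public class Variance {
    public static void main(String[] args) {
        List<? extends Number> producer = new ArrayList<Integer>(List.of(1, 2));
        Number n = producer.get(0);       // reading through an upper bound: fine
        // producer.add(3);               // rejected by javac: an upper bound
                                          // alone cannot justify a write

        List<? super Integer> consumer = new ArrayList<Number>();
        consumer.add(3);                  // writing requires a lower bound
        assert n.intValue() == 1 && consumer.size() == 1;
        System.out.println("ok");
    }
}
```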


Developing decidable, sufficiently expressive combinations of bounded parametric polymorphism and structural subtyping is, to the best of my knowledge, still a longstanding open research problem. Note that these incompatibilities exist without even taking into account the current MVP's use of type recursion, making it all the more unlikely that we will be able to add bounded parametric polymorphism to the current MVP.

@taralx

taralx commented Aug 15, 2020

Perhaps the issue 14 that you wanted to refer to is in a different repo?

@RossTate
Contributor Author

Oops! That was supposed to be 114, and somehow I got 14 in my head. Thanks.

@rossberg
Member

rossberg commented Aug 25, 2020

Once more, an interesting list of strong claims.

These two extensions make subtyping undecidable. See Benjamin Pierce's proof of undecidability of this system in Bounded Quantification is Undecidable from 1993.

The sketch intentionally left out details like the subtyping rules on quantified types, so I don't know how you arrive at that conclusion. It's well-known how to avoid the problem, see e.g. the same paper you cite.

These two extensions, combined with the two above, make type validation, i.e. whether a type itself is valid, undecidable.

Same here.

Note that this effectively uses reified generics. The subset of extensions listed above require reified generics, so even if WebAssembly does not always reify its type parameters, these extensions would require all polymorphic languages compiling to WebAssembly to use reified generics (if they want to take advantage of WebAssembly's polymorphism).

It's left to the producer to decide where they need reified type parameters. Not all polymorphic languages would need it, only if and where they need casts. The whole point of introducing polymorphism is to reduce the need for casts. For example, it should be possible to compile System-F-style polymorphism under this extension without a single cast, which is sufficient for a class of languages. Of course, languages that provide reified generics (or user casts) will have to compile them as such. But even they may be able to optimise them away in a relevant number of cases.
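A hedged Java analogue of the point about casts (illustrative only, with made-up function names): under parametric polymorphism the result type flows through to the use site without a cast, whereas a monomorphic top-typed encoding forces the consumer to cast.

```java
public class NoCasts {
    // Without polymorphism: callers get Object back and must cast.
    static Object firstUntyped(Object[] pair) { return pair[0]; }

    // With parametric polymorphism: the type flows through, no cast needed.
    static <T> T first(T a, T b) { return a; }

    public static void main(String[] args) {
        Object[] p = { "hello", "world" };
        String s1 = (String) firstUntyped(p); // cast required
        String s2 = first("hello", "world");  // no cast
        assert s1.equals(s2);
        System.out.println("ok");
    }
}
```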

(Edit: That is not to say that we might not want additional features to support richer languages.)

@RossTate
Contributor Author

It's well-known how to avoid the problem, see e.g. the same paper you cite.

All the restrictions/variants in the paper I cited were already explored in this context and were unable to express the invariants of major languages. (The cited paper was published before even the first TAL paper.)

The sketch intentionally left out details like the subtyping rules on quantified types

Finding decidable subtyping rules that are sufficiently expressive for the application at hand (i.e. representing major languages) has been one of the open problems in this space for over a decade.

For example, it should be possible to compile System-F-style polymorphism under this extension without a single cast

As an example of how challenging this problem is, this statement appears to be false, even putting decidability issues aside. As I mentioned in the original post, you cannot express type-erased first-class functions with only upper-bounded existential types.
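The first-class-function point can be made concrete via standard typed closure conversion (in the style of Minamide, Morrisett, and Harper; this presentation is a gloss on the comment above, not taken from it): a closure becomes a pair of an environment and a code pointer, with the environment's type hidden existentially:

```latex
\mathrm{Clos}(\tau_1 \to \tau_2) \;=\; \exists X.\ \{\; \mathit{env} : X,\ \mathit{code} : (X, \tau_1) \to \tau_2 \;\}
```

Calling the closure unpacks $X$ and applies $\mathit{code}$ to $\mathit{env}$ at exactly that $X$. Because $X$ occurs in a contravariant position (the code pointer's first argument), knowing only an upper bound on $X$ is not enough to type the call.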


It's one thing when a plan leaves out various details to be ironed out, but it's a completely different thing when ironing out those details involves solving multiple longstanding open research problems (only one of which is decidability). I presented these issues with your plan to you years ago, and yet they are still unresolved. And across this entire time, there has been an alternative design strategy that has been demonstrated to overcome the challenges the GC proposal is facing, and we could have been spending these last few years collaborating on improving upon that design strategy.

@rossberg
Member

All the restrictions/variants in the paper I cited were already explored in this context and were unable to express the invariants of major languages.

First, that does not make your original claim any more correct.

Second, this is an inevitable problem in the general case. Expressing all "invariants" of all "major languages" (let alone others) would require a magical all-encompassing über type system. Clearly that is infeasible, since there is no tractable or decidable type system that could do that, as I'm sure you are aware.

Worse, almost all mainstream type systems are unsound or undecidable or both, so the idea that they could ever map down seamlessly to a sound decidable Wasm type system is broken to begin with.

Adding all sorts of ad-hoc typing mechanisms for specifically supporting and privileging a hand-selected set of (sub)languages is not a reasonable option either.

Thus, all we can do is (a) accept that there will always be gaps, and (b) make fallback to dynamic checks as flexible and efficient as possible.

Finding decidable subtyping rules that are sufficiently expressive for the application at hand (i.e. representing major languages) has been one of the open problems in this space for over a decade.

Exactly. Which is why it would be a bad idea to design the system around the idea that we could.

And again, nothing will be "sufficiently expressive" for almost any major language.

As an example of how challenging this problem is, this statement appears to be false, even putting decidability issues aside.

I'm not sure how that would be so, given that F-style polymorphism doesn't even need bounds. I suspect you are again conflating multiple issues, like closure conversion for a language with first-class bounded quantification?

I presented these issues with your plan to you years ago, and yet they are still unresolved.

The issue is that there is no satisfying solution to the problem, let alone a complete one. So the only way to deal with that is to accept these limitations, and avoid the temptation to make this a research problem or prematurely optimise the design for incomplete ad-hoc solutions.

@taralx

taralx commented Aug 26, 2020

@rossberg:

The issue is that there is no satisfying solution to the problem, let alone a complete one. So the only way to deal with that is to accept these limitations, and avoid the temptation to make this a research problem or prematurely optimise the design for incomplete ad-hoc solutions.

Then why are we doing anything more than the simplest possible mapping to a GC? The current MVP is highly opinionated about the direction we should go with this.

@RossTate
Contributor Author

avoid the temptation to make this a research problem

I, too, am confused by this comment. It is already known how to make the nominal approach I have been advocating for express multiple major languages. The open research problem is how to do the same with the structural approach the current MVP commits us to.

@fgmccabe

If I understand the situation, the biggest issue with a nominal type system is how to deal with module boundaries. The implication I am hearing is that you need structural types at module boundaries.
IMO, I am not sure of the logic of that. Again, IMO, it is possible to expose nominal types at module boundaries and have type definitions checked at module boundaries.
But, if correct, then that becomes a serious impediment to an extensive nominal type system.

@RossTate
Contributor Author

Are you referring to the discussion at the in-person meeting? The central claim of that presentation was never substantiated. I asked for the evidence of the claim offline and was never given any. I also asked @rossberg to file an issue on the topic so that we could all better understand it, and he never did. We should not let unexamined claims about alternatives deter us from even considering those alternatives.

@RossTate
Contributor Author

Also, as a meta-note, presenting surprise technical arguments during CG meetings wastes the CG's time because it pulls everyone into an uninformed discussion, causes lots of confusion and misinformation because no one understands any of the details, and creates a hostile environment because you never know when you'll suddenly be put on the spot to improvise a response to a prepared one-sided presentation. We should discourage such surprise technical arguments in CG meetings.

@conrad-watt
Contributor

conrad-watt commented Aug 26, 2020

Are you referring to the discussion at the in-person meeting? The central claim of that presentation was never substantiated. I asked for the evidence of the claim offline and was never given any. I also asked @rossberg to file an issue on the topic so that we could all better understand it, and he never did. We should not let unexamined claims about alternatives deter us from even considering those alternatives.

Starting from the slides for that presentation (link), can we home in on the parts that deserved more evidence/examination, as potential topic sentences for such an issue?

@RossTate
Contributor Author

The arguments in the slides are mostly examples showing how complicated nominal types supposedly make imports. But notice that none of the examples remotely resemble the nominal types in the SOIL Initiative's proposal nor the conceptual import design we suggested we would use. Instead, each of these examples are artificially complicated constructions. For example, the last example claimed that a nominal system would need to import an entire class hierarchy (fields and v-tables and all) just to compile a surface-level class import, when in fact the SOIL Initiative's design would let you just import the scheme (without any attributes).

So the entire slide deck is centered around strawman arguments. Unfortunately, because I was completely surprised by these examples, by the time I understood what they were doing it was too late to correct the fallacy. Consequently, in my opinion the real result of the hour-and-a-half discussion was that the entire CG came away misinformed about the issue, rather than better informed.


But the issue here is about the current MVP. I would appreciate us staying on topic, and in particular the comments @taralx and I made above should be responded to.

@conrad-watt
Contributor

conrad-watt commented Aug 27, 2020

#116 (comment)
I'm sympathetic to this stance, although it applies equally to every proposal that's been put forward aside from the 2-types proposal. I like the 2-types proposal and wouldn't mind seeing it implemented as a safety valve while the interminable nominal vs structural debate continues. I like the idea that language implementations could incrementally optimise parts of their state by swapping out particular uses of structref (e.g. vtable) with whatever gets spec'd next. One could even go further and make a 1-type proposal without tagref by defining a structref with disjoint anyref, funcref, structref, and scalar heap index spaces.

#116 (comment)
This comment is a general statement about structural vs nominal types, so bringing up the in-person structural vs nominal slide deck is not off-topic. I think you have legitimate points in response, and if you feel the committee has been misinformed, this should be dealt with. Consider that if the committee could reach a high-level consensus on structural vs nominal types as the right approach for the MVP, most of the current debate would be cut through (and this would not preclude both co-existing in more optimised post-MVP extensions). There needs to be some kind of dialectic here, not just a parade of structural and nominal proposals talking past each other.

@RossTate
Contributor Author

Thanks for those thoughts.

I think you have legitimate points in response, and if you feel the committee has been misinformed, this should be dealt with.

Yeah, I'm not sure what the proper way to do this is. Previously I have been told that all issues should be about the proposal the issue is on. The issue with that talk is not about the current MVP but about misinformation about other alternatives, so I have been unsure how to proceed.

Consider that if the committee could reach a high-level consensus on structural vs nominal types as the right approach for the MVP, most of the current debate would be cut through

Yes, but for that discussion to be productive, we first need to discuss and understand the various dimensions in isolation (to at least some degree). That's why I've posted a number of issues each focusing on different dimensions, and why I'm trying to keep each of those issues focused. This issue is about extensibility with quantification. Based on the above discussion, I think a reasonable summary of this dimension is that there are known techniques for extending nominal type systems with quantification in a manner that can express multiple major languages, whereas that is an open research question for structural type systems and the plan for the current MVP relies on having an answer to that question but does not make progress towards finding that answer. I would like to get consensus on that summary for this dimension of the larger discussion.

@conrad-watt
Contributor

conrad-watt commented Aug 27, 2020

Yeah, I'm not sure what the proper way to do this is. Previously I have been told that all issues should be about the proposal the issue is on. The issue with that talk is not about the current MVP but about misinformation about other alternatives, so I have been unsure how to proceed.

Given that we currently have at least two alternative proposals under discussion in the issues of this repo, that ship might have sailed already. I think @fgmccabe was insightful in pointing out that we are currently lacking a procedural way to bump competing proposals/ideas into each other and declare a provisional winner (at least for the purpose of focussing immediate MVP work).

I would be very interested in reading an explanation as to why the SOIL proposal (or maybe now the proposal of #119) avoids the issues laid out in the in-person slides - maybe as a separate issue (can reference #116 (comment) as needing a response). It might be less confrontational to conceptualise the response as "why nominal types can be better than you think", rather than as rebutting "misinformation".

I would like to get consensus on that summary for this dimension of the larger discussion.

I would summarise the discussion up to this point (bearing in mind the example from #120) as "there are scenarios where a post-MVP nominal system can have fewer dynamic casts than a structural one". If I'm understanding correctly, without bounded parametric polymorphism a nominal MVP (like #119) would require similar casts to the current structural MVP in the scenarios of #120?

With the various extensions laid out in #114, would the following (quote from #120) still be true?

every virtual/interface method in Java/C#/Kotlin/Scala will have to cast the this pointer,

^ ah, maybe not a productive question to ask right now without thinking about @rossberg's proposed subtyping rule

@rossberg
Member

@taralx:

Then why are we doing anything more than the simplest possible mapping to a GC? The current MVP is highly opinionated about the direction we should go with this.

The MVP tries to adopt the simplest possible typed data model: the well-established "C" model of structs & arrays. As such, I'd argue that it is a fairly common model for describing low-level data layout that has been adopted in many other places, including JS's Typed Objects proposal. The MVP tries to stay away from getting inventive as much as possible, though unfortunately, it's not always possible.

@RossTate:

So the entire slide deck is centered around strawman arguments. Unfortunately, because I was completely surprised by these examples, by the time I understood what they were doing it was too late to correct the fallacy. Consequently, in my opinion the real result of the hour-and-a-half discussion was that the entire CG came away misinformed about the issue, rather than better informed.

I did not present any surprising new insight there. The scalability issue of sharing type identities by parameterisation (a.k.a. imports) has been identified since at least 1984 (MacQueen's seminal module paper) and is even explained in a standard textbook such as Advanced Topics in Types and Programming Languages (Section 8.8), albeit in a slightly different context. If you think that's "misinformation", or that your proposal magically solves this problem, then I'd like to learn how.

@fgmccabe:

If I understand the situation, the biggest issue with a nominal type system is how to deal with module boundaries. The implication I am hearing is that you need structural types at module boundaries.
IMO, I am not sure of the logic of that. Again, IMO, it is possible to expose nominal types at module boundaries and have type definitions checked at module boundaries.

There are multiple things to be aware of, but the one with module boundaries is that any transparent type import (i.e., one exposing some structure, so that the importing module can e.g. access fields), whether nominal or not, will still require a full structural link-time check of (the transitive closure of) the provided type against the expected type. That's exactly a structural equivalence/subtype check. There is no way around that if you want to avoid later runtime checks (like in the JVM). The coherence of type definitions cannot be vacuously guaranteed across module boundaries, whether nominal or not.
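A minimal sketch of what such a link-time structural check involves, over a toy recursive type graph (this is an illustration of the general algorithm, with invented names, not the MVP's actual type grammar): walk both graphs in lockstep, coinductively assuming equivalence for pairs already in progress so that recursive types terminate.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StructuralCheck {
    // Toy type graph: primitive value types and struct types with fields.
    interface Type {}
    record Prim(String name) implements Type {}
    static final class Struct implements Type {   // mutable, to allow cycles
        final List<Type> fields = new ArrayList<>();
    }

    // Link-time structural equivalence: pairs already under examination are
    // assumed equivalent (the standard coinductive check for recursive types).
    static boolean equiv(Type a, Type b, Set<List<Type>> assumed) {
        if (a == b) return true;
        if (a instanceof Prim pa && b instanceof Prim pb)
            return pa.name().equals(pb.name());
        if (a instanceof Struct sa && b instanceof Struct sb) {
            List<Type> pair = List.of(a, b);
            if (!assumed.add(pair)) return true;  // already being checked
            if (sa.fields.size() != sb.fields.size()) return false;
            for (int i = 0; i < sa.fields.size(); i++)
                if (!equiv(sa.fields.get(i), sb.fields.get(i), assumed))
                    return false;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Two separately defined copies of the recursive type
        //   t = struct { i32, (ref t) }
        Struct t1 = new Struct();
        t1.fields.add(new Prim("i32")); t1.fields.add(t1);
        Struct t2 = new Struct();
        t2.fields.add(new Prim("i32")); t2.fields.add(t2);
        assert equiv(t1, t2, new HashSet<>());
        System.out.println("ok");
    }
}
```

Note that the check is over the transitive closure of reachable definitions, which is exactly the "full structural link-time check" described above.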

@RossTate:

avoid the temptation to make this a research problem

I, too, am confused by this comment. It is already known how to make the nominal approach I have been advocating for express multiple major languages. The open research problem is how to do the same with the structural approach the current MVP commits us to.

I've yet to see a scalable solution to the modularity problem of separately compiling and linking structural type constructs under a nominal typing discipline. Preferably one that doesn't just shove the problem onto external tools or magic linker hacks, or resorts to runtime checks for the most basic primitives.

(It's worth remembering that Wasm already consciously chose structural typing for its function types, as e.g. observed by call_indirect. There was a valuable discussion about that in the early days of Wasm (unfortunately not recorded), where it dawned on us that a nominal treatment would cause hairy problems for producers and for linking. That's why the group decided to go structural with them. It is no different for other forms of types, once we realise that there is more data to represent than just objects and classes, which already happen to be conveniently nominal in the source language.)

That being said, the above quote was rather directed at the idea of making it a prerequisite for the MVP that we already have an all-encompassing design that does everything perfectly. That is far, far away from being a solved problem -- occasional assertions to the contrary based on special-case solutions notwithstanding.

@RossTate
Contributor Author

RossTate commented Aug 27, 2020

That being said, the above quote was rather directed at the idea of making it a prerequisite for the MVP that we already have an all-encompassing design that does everything perfectly.

I don't believe I ever suggested this. My suggestion was to consider an approach that is known to handle at least some languages well, rather than the current approach that has not been demonstrated to handle any language well.

then I'd like to learn how

Then ask this question before giving a presentation that has not been informed by the answer. Or at least warn me that we will be discussing the topic and give me the opportunity to illustrate how we address the problem.

I would be very interested in reading an explanation as to why the SOIL proposal (or maybe now the proposal of #119) avoids the issues laid out in the in-person slides - maybe as a separate issue (can reference #116 (comment) as needing a response). It might be less confrontational to conceptualise the response as "why nominal types can be better than you think", rather than as rebutting "misinformation".

Thanks for the advice, @conrad-watt! I'll take up your suggestion once I have the time, probably next week.

@taralx

taralx commented Sep 3, 2020

Okay, so I finally got around to reading all of this and understanding it.

@rossberg:

The MVP tries to adopt the simplest possible typed data model: the well-established "C" model of structs & arrays.

The part of the MVP that is structs & arrays is uncontested. But the MVP also establishes a particular kind of limited discriminated union, which remains the main point of contention.

The MVP tries to stay away from getting inventive as much as possible, though unfortunately, it's not always possible.

But it is more possible than the MVP proposes, unless you think that i31ref is not inventive.

If we're going for simplest possible, then how about we go with discriminated union? #109 has tagref, but in a more explicitly typed variant we'd need to actually spell out the acceptable alternatives. It removes the need for anyref and i31ref, because you can always declare an explicit union of the types you care about. Unique reference variants can easily map to no-ops, meaning that for many implementations these unions would be pay-as-you-go, with little to no overhead except in those cases where you need to coerce numeric types.

@tlively
Member

tlively commented Jun 16, 2022

Given that we're at phase 2 with the current MVP and that we've been fairly aggressive about cutting out unnecessary complexity, I'm going to close this issue as not actionable right now.

@tlively tlively closed this as completed Jun 16, 2022
rossberg added a commit that referenced this issue Jul 3, 2024
Test `return_call*`s to callees that return more values than the caller does