Operator == on reified types #821


Closed
eernstg opened this issue Feb 6, 2020 · 12 comments
Labels
nnbd (NNBD related issues), question (Further information is requested)

Comments


eernstg commented Feb 6, 2020

The nnbd spec has new rules for the result returned by operator == in the case where it compares two instances of type Type which were obtained as reified representations of Dart types.

In short, they consider the underlying types, normalize them, erase *, and then compare syntactically.

For example, Object* compares equal to Object, but not equal to Object?.

However, it is not unreasonable to claim that the properties of Object* resemble those of Object? more closely than those of Object. In other words, it seems to contradict reasonable expectations that operator == treats Object* in this way.
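To make that concrete: in a library that has not opted in to null safety, the type literal Object denotes Object*, and it admits null just as Object? would. A minimal sketch, assuming it is compiled as a legacy (opted-out) library:

// In a legacy library, Object means Object*:
void main() {
  Object x = null;       // allowed: Object* admits null, just as Object? would
  print(null is Object); // prints true here (in a null-safe library it is false)
  print(x);              // null
}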

Of course, the dynamic type of an instance never has a * at the top level, so these conflicts arise mainly for type arguments, and for parameter/return types of function types.

Cf. also #818, where this kind of conflict comes up in the special case of (a) X* Function<X extends Object*>(X*) vs. (b1) X Function<X extends Object>(X) and (b2) X Function<X extends Object?>(X). Again, (a) compares equal to (b1) and unequal to (b2), but the behavior of (a) more closely resembles that of (b2).

In general, these new rules will always consider legacy types to be equal to some nnbd types where ? does not occur, and unequal to the same types where ? does occur, even though the latter are, arguably, more reasonable candidates for being considered equal.

Can we do something about this? Isn't it both confusing and impractical that the legacy Object* type is equal to the nnbd Object type and unequal to Object?, but it's much more like the latter (and the same holds for any T*, T, and T?)?

The currently specified approach uses mutual subtyping, that is: If t1 and t2 are reifications of the types T1 and T2 then t1 == t2 evaluates to true iff T1 <: T2 and T2 <: T1. The language team agreed on this, but we have held back on getting it implemented because of worries about breakage.

We could do it now, as part of the switch to NNBD. One nice property of the mutual-subtyping approach is that it is consistently aligned with assignability (that is, if the run-time type of an object is equal to the type annotation of a variable then it is safe to assign that object to that variable).
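As a sketch, with a hypothetical isSubtype helper exposing the subtype judgment on reified types:

// Mutual-subtyping criterion (sketch; isSubtype is hypothetical).
bool typeEquals(Type t1, Type t2) => isSubtype(t1, t2) && isSubtype(t2, t1);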

@leafpetersen, @munificent, @lrhn, WDYT?

eernstg added the question and nnbd labels on Feb 6, 2020
@sigmundch

My main worry about moving towards mutual subtyping for implementing equality is that I'm not sure we could still implement it via canonicalization techniques.

Up until recently, dart2js used to override Type.==, and we are very excited that we no longer do. Overriding == meant that we couldn't properly support type literals in switch statements, and I believe we had other issues too. For reference, here are a few issues highlighting this problem:
dart-lang/sdk#17207
dart-lang/sdk#21553
dart-lang/sdk#30257
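The switch limitation mentioned there concerns code like the following, which is only allowed because Type does not override == (a sketch):

// Constant switch cases require the scrutinee's type to have primitive
// equality, so an overridden Type.== would break this pattern.
String describe(Type t) {
  switch (t) {
    case int:
      return 'int';
    case String:
      return 'String';
    default:
      return 'something else';
  }
}

void main() => print(describe(1.runtimeType)); // prints: int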

My 2c: I'd like to make sure that we continue to have the freedom to use canonicalized runtime types to implement equality (that is, to allow us to implement a.runtimeType == b.runtimeType via identical(a.runtimeType, b.runtimeType)).


eernstg commented Feb 7, 2020

@lrhn and I just discussed this with @johnniwinther, and he also mentioned that if we wish to move to mutual subtyping, now is the time to get it done. It is not obvious how difficult this will be, but at least it wasn't obvious to @johnniwinther that it is unrealistic. So maybe we can do it.

continue to have the freedom to use canonicalized runtime types

Ideally, we could get this kind of speed and maintain that operator == returns a result that makes sense.

Up until recently, dart2js used to override Type.==, and we are very excited that we no longer do.

Right, if all run-time entities representing reified types are canonicalized then type equality can be implemented as a simple pointer comparison. But I'm slightly surprised that there is no need for exceptions (that is, I thought that only most run-time entities representing types would be canonicalized, not all of them).

Anyway, a pure mutual-subtype approach would actually allow us to use canonicalization, though based on a slightly more coarse-grained model of types: We take the Dart subtype graph and collapse all cycles (for instance, there is only one top type and it is not possible at run time to detect whether it was spelled dynamic or FutureOr<Object?> in the source code).

(Of course, this applies to instances of Type obtained as reified types; it would never fly to drop the distinction between the representations of List<int> and List<int?> when used as types.)

We already have normalization which will collapse a lot of types that are mutual subtypes into a single representative, and NNBD_TOP_MERGE will collapse many others. We will need to double-check all the definitions involved, but the end result would be that the Dart subtype graph is transformed into a partial order (in particular: an acyclic graph), and mutual subtyping holds between two nodes T1 and T2 in this graph iff T1 and T2 are the same type. So this definitely allows for using canonicalization.
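A sketch of the observable behavior under that collapsed model, using a small typeOf helper; the commented results are what the collapsed model would produce, not what current implementations print:

import 'dart:async' show FutureOr;

Type typeOf<T>() => T;

void main() {
  // Under the collapsed (mutual-subtype) model, all top types would
  // reify to a single Type object:
  print(typeOf<dynamic>() == typeOf<Object?>());           // would be true
  print(typeOf<FutureOr<Object?>>() == typeOf<dynamic>()); // would be true
}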

The trade-off is that we will then have a rather coarse-grained model of types in terms of reifications of type Type. But this equality relation is reflexive/transitive/anti-symmetric as it should be, and I believe it is a model which is comprehensible and consistent.

In the future, with pure nnbd, we can eliminate the * types. This allows us to make the run-time type reification more fine grained (such that List<int> and List<int?> will be distinct), if we wish to do so.

Note that reified types are often used for purposes that are somewhat similar to run-time reflection, e.g., to create a map where a type can be mapped into a function that somehow performs a task with respect to that type. I suspect that a clear and consistent model of types, albeit somewhat coarse-grained, will be at least as useful as a more fine-grained model where we claim that a list that will throw if we add null has the same type as List<int*>, but the one that won't throw has a different type. ;-)
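For instance, the kind of registry in question might look like this (a sketch):

// A reflection-ish registry keyed by reified types.
final Map<Type, Function> handlers = {
  int: (Object o) => print('int handler: $o'),
  String: (Object o) => print('String handler: $o'),
};

void handle(Object o) {
  final f = handlers[o.runtimeType];
  if (f != null) f(o); // dynamic invocation through Function
}

void main() {
  handle(42);      // int handler: 42
  handle('hello'); // String handler: hello
}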


eernstg commented Feb 7, 2020

Here's a bit more about the rationale. The approach that I'm now recommending (after a lot of discussion), and which is described here in more detail, could be roughly summarized as follows:

Reified types ignore nnbd.

This is because the mutual-subtyping approach implies that we eliminate all cycles in the subtype graph in order to get a proper equivalence relation, and one which can be implemented using canonicalization. But this means that equality on reified types is a bit more coarse-grained than you might expect (and maybe more than you'd like ;-), in particular because it makes List<int> and List<int?> equal (when compared as reified types; not, of course, for other purposes like dynamic type checks in assignments).

The main rationale for this is that (1) it is a mild transition from current code, so the semi-reflection-ish kind of code that exists and uses reified types (say, places where a Map<Type, Function> is used) would just continue to work; (2) a more fine-grained distinction, e.g., the nnbd spec rules, seems to create a complex and confusing interaction between legacy types and nnbd types; and (3) this kind of reflection-ish code may not benefit hugely from a more fine-grained relation.

We can use this approach forever, if we wish to do that.

We can also choose to refine the equivalence when * types disappear (because elimination of subtype cycles at that point would simply maintain the distinctions like the one between List<int> and List<int?>), if we consider that to be worthwhile. That will be a breaking change, and we can decide on having it or not having it when we get there. I think this is safer and more practical.


lrhn commented Feb 20, 2020

There are definitely a number of problematic complex cases that we might not be able to solve satisfactorily in all cases.

Core Requirement

I think there is a core piece of functionality that we should ensure we support. It mostly boils down to:

new Foo().runtimeType == Foo

That operation must work (as long as nobody overrides runtimeType, so don't do that!)

The runtimeType of an object should not depend on where the object is created (we do not want to distinguish between null-safe and legacy objects, only types). It should be a non-nullable type (except for null.runtimeType which is tautologically nullable).

The test should work whether the code occurs in a null-safe library or a legacy library.
Since the type of new Foo().runtimeType is a non-nullable type, the value of a type literal must be equal to that non-nullable type.

If the value of Foo in a legacy library is a type object for the Foo* type, then that type object must be equal to the type object for the non-nullable Foo type.

It should preferably work whether Foo is generic or not. Maybe it can't, because that depends on the type arguments put into the object at creation time, which do depend on where the object is created. At least both types should be instantiated to bounds the same way.
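A sketch of the requirement, including the generic wrinkle:

class Foo {}
class Bar<T> {}

void main() {
  print(new Foo().runtimeType == Foo); // must be true
  // For generics this depends on the creation-site type arguments: here
  // both sides come out as Bar<dynamic>, so implementations that
  // instantiate the literal to bounds in the same way will print true.
  print(new Bar().runtimeType == Bar);
}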

That does not say whether the type object of Foo* should be equal to the type object of Foo?. They are mutual subtypes, so it's not unreasonable, but it will mean that equality is not transitive unless the type objects of Foo and Foo? are equal. In the long run, they probably shouldn't be, and that means that making them so in the short run will push a tough breaking change ahead of us. I'd prefer to avoid that. (Not being transitive is an option, just not a particularly nice one.)

So, summary: the type objects of Foo and Foo* should be equal. That can mean that they are the same object (there is no type object for Foo* at all), or that they are just equal (which still means same hash code too).

Composite types

That leaves us with the issue of composite types. Is the type object of List<int*> equal to those of List<int> and/or List<int?>?

First remember that any two type objects which are equal must also have the same hashCode. That means that any complex computation needed to figure out whether two type structures are equal must also come with a normalization that can be used to compute the hash code.
And even if we go for an equality that is not transitive, the hash code normalization needs to combine all objects that are transitively equal anyway.
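A toy illustration of that contract (not the real Type implementation): whatever normal form drives ==, the same form must drive hashCode:

class NormalizedType {
  // Imagine this string is the NORM of some type.
  final String normalForm;
  NormalizedType(this.normalForm);

  @override
  bool operator ==(Object other) =>
      other is NormalizedType && other.normalForm == normalForm;

  // Derived from the same normal form as ==, as Object's contract requires.
  @override
  int get hashCode => normalForm.hashCode;
}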

That's one of the primary arguments used so far against defining Type object equality as mutual subtyping. It would mean having to pick a canonical object for every equivalence class in order to get a hash code, which is potentially even more complicated than checking mutual subtyping itself, because it's not restricted to the two types being compared.

And with that restriction, I think we should not try to do composite type equality in any clever way; we should just use "the same type" as the rule, with "the same type" defined as having the same normalization under the NNBD normalization rules (NORM, plus equating dynamic, void and Object? to avoid distinguishing between those; they should also be equal).

That would allow a structural equality check traversing normalized type structures and allowing different top types to be equal.

(I'd still recommend removing the distinction between Object?, dynamic and void entirely at run-time, mapping all of them to the run-time type Object?, so we don't even have to worry about that normalization).

Summary

In summary I'd suggest:

  • The type objects of the types T* and T (no * or ?) are the same object, that of T. Creating the type object removes an outer *, and only an outer *.
  • The type object of the type T? is a different object.
  • Type objects are equal if and only if the types normalize to the same type using NORM.

That lets us use the existing type normalization algorithm (which we'd implement anyway), it admits a consistent hashCode, and it satisfies the core usage above for types that are not generic.
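Putting the three bullets together as a sketch, over a hypothetical structural representation TypeRepr with hypothetical helpers norm (the NNBD NORM function), stripOuterStar (removes only an outermost *), and structurallyEqual:

// Sketch only: TypeRepr and the helpers are stand-ins for the
// implementation's internals.
bool typeObjectEquals(TypeRepr t1, TypeRepr t2) =>
    structurallyEqual(norm(stripOuterStar(t1)), norm(stripOuterStar(t2)));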

It does mean that, given Type typeOf<T>() => T, typeOf<List<int>>() gives non-equal type objects when used in a null-safe library and in a legacy library.

The only other alternative is to do what Erik suggested above: remove all nullability information (and then normalize). But I don't think the later breaking change to reintroduce the difference is worth it, and I don't think never introducing the difference is viable in the long run either.

A non-transitive equality where the type objects of int and int* are equal, those of int* and int? are equal, but those of int and int? are not, would work recursively for composite types too: a List<int> would be equal to a List<int*>, which would be equal to a List<int?>, but List<int> would not be equal to List<int?>. It's consistent, but expensive, because it would require us to remove nullability information and normalize in order to compute the hash code, while still keeping the un-normalized type around for equality.
And it's still not transitive, which has its own issues.


Hixie commented Feb 20, 2020

My impression is that regardless of what we do, this will be confusing, and thus the best thing we can do is be as simple as possible, meaning, have as little magic as possible. I would also posit the following principles:

  • In practice, this won't matter unless the affected code is already pretty crazy, so trying to make things "intuitive" is unlikely to actually make code better.
  • In practice, types like Foo* only matter during transition, and so if they become a problem you just migrate the code further along until they go away.
  • In practice, performance is going to matter more than semantics here.

Given that, I would say that there should be no magic runtimeTypes, and so int, int*, and int?, and correspondingly Foo<int>, Foo<int*>, and Foo<int?>, should all be distinct types that do not compare equal and have different hash codes.

If magic were to be introduced, I think I would limit it to the top-level *-stripping behaviour @lrhn describes above, such that the types would actually be int, int?, Foo<int>, Foo<int*>, and Foo<int?> and there is never such a thing as a lone int*.

@leafpetersen

This is a very long discussion about something that I thought was pretty well settled. I haven't read through all of the discussion in detail yet, since a number of its initial premises seem wrong to me. Let me summarize my understanding of the situation, and then perhaps someone can (briefly!) summarize what, if any, the remaining issues are.

So I see two issues raised. The first is mutual subtyping, and the second is the treatment of * types.

Mutual subtyping

First, mutual subtyping. I'm not sure that I understand the discussion around whether to use mutual subtyping as the criterion for equality, since I am proposing to use something (hopefully) equivalent to mutual subtyping. That is, the specification defines runtime type equality to be performed by comparing the normal forms of the types. In the final type system with no * types, I hypothesize that syntactic equality on normal forms is equivalent to mutual subtyping. @stefantsov has done some work on the side towards a formal proof of this.

In the presence of * types, I'm not sure if normalization is equivalent to mutual subtyping. Is that the source of concern above? I'm not worried about this, should I be?

Legacy type equality

As specified, I have chosen to make runtime type equality equate T and T*. I believe this is almost entirely a forced choice. We have customers that use type objects as keys in DI maps, and I believe it would be completely unworkable if inserting something into a map with int as a key in an opted-in library, and then looking it up with int as a key in an opted-out library, failed, as it would if we did not equate int and int*. I believe that @crelier ran into this concretely when he first implemented this. As I have specified it, two type objects that are textually equal in the source code will compare as equal at runtime regardless of whether they are in an opted-in or opted-out library. If we make the change proposed here, this is no longer true.
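The scenario, sketched in one file; in reality the two marked lines would live in different libraries with different opt-in status, so the two int literals would reify as int and int* respectively:

final di = <Type, Object>{};

void main() {
  di[int] = 'the int service'; // imagine this line in an opted-in library
  print(di[int]);              // imagine this lookup in an opted-out library
  // If int and int* compared unequal, this lookup would return null.
}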

So that's the summary of the existing specification and why the choices in it were made. Are there problems with the above that I'm missing?


lrhn commented Feb 21, 2020

@leafpetersen I do think that the proposed equality satisfies my "core requirement", and the only difference is that it ignores * in composite components as well, which means that the two agree on entirely null-safe programs.
A compatible hash code should be easily computable by simply letting T* have the same hash code as T.

My worry about performance would be that you need to traverse the type structure on every equality check, even if you canonicalize all type objects (at least until the program is entirely null safe). That's where ignoring * only at the top level, rather than in component types, could be more efficient, but it can also cause more inequalities.

That assumes one implementation strategy (embedding the internal type representation directly in the Type object). Another alternative is to traverse the type and remove the *s when creating the Type object, which would also allow you to canonicalize at this point and compute the hash code up front if necessary.

I'm fine with either approach if the people implementing it are.

@leafpetersen

Another alternative is to traverse the type and remove the *s when creating the Type object, which would also allow you to canonicalize at this point and compute the hash code up front if necessary.

I believe that the implementation strategy that we sketched on the board here for the web compilers was to cache two normalized forms: the NORM, and the * erased NORM. Am I remembering that correctly @fishythefish @sigmundch @rakudrama @nshahan ?


crelier commented Feb 21, 2020

I agree with @leafpetersen that this issue should have been settled by now. We raised concerns about type equality in November, and type equality was redefined then. In the VM, we have implemented the nnbd-specified type equality (and optimized it with intrinsics on all architectures). There is no noticeable slowdown, although we cannot rely on hashes, since different hashes do not imply unequal types. Hashes reflect canonical equality.

@sigmundch

I believe that the implementation strategy that we sketched on the board here for the web compilers was to cache two normalized forms: the NORM, and the * erased NORM. Am I remembering that correctly @fishythefish @sigmundch @rakudrama @nshahan

Correct. Whenever type objects are returned, you get the normalized type objects, so == continues to be implemented as identical. (That's basically why I raised my concern earlier in this bug that we should not change equality into mutual subtyping.)


lrhn commented Feb 21, 2020

Excellent. And if we don't print the *s on legacy types, it also means that types which print the same are also equal. That's probably worth something.


eernstg commented Feb 24, 2020

I was worried about reified type equality in a transitional world (that is, one where some * types occur in the current program). I hoped that we could make the rules less confusing. I proposed that we could normalize reified types by legacy-erasing them, which would let identical and hashCode just work, ensuring good performance, and would make migration to nnbd easy.

However, as I also mentioned, the legacy-erasure approach would introduce the need to have a breaking change later on, in order to get to the model in the pure nnbd setting.

So let's keep the rules as they are.

My worries about migration haven't disappeared, of course, and I've written an example at the end of this comment to illustrate why migration will have some inconvenient elements with the current rules.


A couple of remarks on the discussion:

@Hixie was also worried about the transitional semantics:

regardless of what we do, this will be confusing

As @leafpetersen noted, we will have approximately the mutual subtyping semantics (except that we still distinguish dynamic, void, and Object?) in the pure nnbd setting:

In the final type system with no * types, I hypothesize that syntactic equality on normal forms is equivalent to mutual subtyping.

That's good, and I had no doubts about that. @leafpetersen continues:

In the presence of * types, I'm not sure if normalization is equivalent to mutual subtyping. Is that the source of concern above? I'm not worried about this, should I be?

The transitional subtyping relation isn't a partial pre-order (that is, it isn't transitive), so I agree that it's unclear what it means to say that it corresponds to normalization. That's the reason why I proposed using legacy-erased types as the basis for reified type equality: This is basically transitional subtyping made transitive, and equating top types.

This would make equality on reified types more coarse-grained, but it would give us some nice properties: The reified types would reflect a proper partial order (transitive, reflexive, anti-symmetric), and the story about which reified types are equal (across a program with some nullsafe and some legacy libraries) would be very simple: Legacy-erase, considering top types as equal, then check. Of course, if reified types are always normalized in this sense then identical and hashCode can be used to test for equality, and there is no performance issue.
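As a sketch over a hypothetical structural representation, with hypothetical helpers legacyErase (removes every *), topEquate (maps all top types to one representative), and structurallyEqual:

// Sketch only: the proposed legacy-erasing equality on reified types.
bool reifiedTypeEquals(TypeRepr t1, TypeRepr t2) =>
    structurallyEqual(topEquate(legacyErase(t1)), topEquate(legacyErase(t2)));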

About the question:

I'm not worried about this, should I be?

With the specified rules, null-safe code will need to compensate for the kind of type inequality that I was worried about: it will need to manually erase nnbd types when they are to be reified and tested for equality.

Here is an example. Assume that we have a piece of legacy code where a function f actually uses null (so when migrating it we will need to use ? on some types, which is the situation where the discrepancy arises). Assume that we are using a map to enable lookups for this function based on a reified type:

// Library L before migration.
List<int> f(int i) => [i, null];
typedef List<int> fType(int i);
Map<Type, Function> map = { fType: f };

void main() {
  fType g = f;
  print("Call via lookup: ${map[fType](42)}. Directly: ${g(42)}");
}

When we migrate this library, we need to introduce a distinction between the regular type (here: fTypeMigrated, the type of f), and the type which is used for equality tests (here: fTypeForTypeEquality):

// Library L after migration.
List<int?> f(int? i) => [i, null];

// Actual type of `f`.
typedef List<int?> fTypeMigrated(int? i);

 // "Additional type of `f`", needed for equality tests.
typedef List<int> fTypeForTypeEquality(int i);

Map<Type, Function> map = {
  // Use the additional type when reified and tested for equality.
  fTypeForTypeEquality: f,
};

void main() {
  // Use the actual type of `f` when it is used as a type.
  fTypeMigrated g = f;
  // Use the additional type for lookups (again: reified and tested for equality).
  print("Call via lookup: ${map[fTypeForTypeEquality](42)}. "
      "Directly: ${g(42)}");
}

In the migrated code, it is a compile-time error to change the type of g to fTypeForTypeEquality. So we can't just migrate by leaving the declaration and all usages of fType unchanged. If fType is used as a type, we must use the type alias declared as fTypeMigrated (adding ?s). Of course, that type could still have the name fType in the migrated code, but I used the new name fTypeMigrated here in order to make it easy to distinguish the pre-migration and the post-migration entities.

But we also need fTypeForTypeEquality: Legacy code cannot denote any type which is equal to fTypeMigrated, but it can denote a type which is equal to fTypeForTypeEquality. So if it should be possible for legacy code to continue to perform lookups in map, using a type which is denotable by legacy code, then we must store f with key fTypeForTypeEquality, not fTypeMigrated.

So my response to the question about whether we should be worried would be as follows:

I expect migration of code using reified types (for instance: using maps from Type to entities related to the given type, like map above) to be forced into making a new distinction: There is (1) the actual type, and there is (2) the type-as-needed-for-reified-equality.

Each location in client code where said type is mentioned would need to use one or the other (only one of them will work). The choice depends on whether it will be used as a type, or it will be reified and used for equality tests. This may or may not be worrying.

In any case, I wasn't aware that implementation work was already under way in November 2019. The conclusion today must be that we leave the rules unchanged.
