Support child class fields that reference parent class param/type fields #8018

Closed
bradcray opened this issue Dec 13, 2017 · 46 comments

@bradcray
Member

It seems that a hole in the current initializer design is caused when child class fields refer to type/param fields in a parent class. The catch-22 is that the parent class type/param fields are not set up until the super.init() call is made, yet the child class fields are intended to be initialized before making that call. For example:

class Parent {
  param rank: int;
  ...
}

class Child: Parent {
  var bounds: rank*int;

  proc init(param rank: int) {
    // compiler's attempt to initialize `bounds` will occur here, which requires knowing `rank`
    // yet rank is not known until the following call is made
    super.init(rank);
  }
}
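
As a rough runtime analogy only (Python rather than Chapel; in Chapel `rank` is a compile-time `param`, and every name below is illustrative), the ordering problem can be sketched as a child reading an inherited attribute before the parent initializer has set it:

```python
class Parent:
    def __init__(self, rank):
        self.rank = rank


class Child(Parent):
    def __init__(self, rank):
        # The child's "phase 1" runs first and needs the inherited rank,
        # but Parent.__init__ has not run yet, so self.rank does not exist.
        self.bounds = (0,) * self.rank
        super().__init__(rank)


def try_build(rank):
    """Return the error message raised while building a Child, or None."""
    try:
        Child(rank)
    except AttributeError as err:
        return str(err)
    return None


message = try_build(2)
print(message)
```

The analogy is loose: the Chapel compiler reports the problem at compile time, whereas this sketch fails only when the initializer runs.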

This order of operations is a little artificial because in actuality, the compiler is going to resolve all the generic fields at compile-time before the program even starts running. This suggests to me one of two possible broad approaches (which, admittedly, have come up in discussions with users about initializers):

  1. Since the compiler has already computed the value of rank before any execution-time code runs, have it do the work necessary to permit references to the generic field in phase 1 of the child's initializer. I.e., don't change the language, make the implementation smarter.

  2. Introduce an explicit phase 0 alongside the existing phase 1 and phase 2 of initialization, where phase 0 ensures that all the generic fields are calculated. Again, this would happen at compile time, but might be made manifest in the code explicitly by having a separator between phases 0 and 1, as we do today between 1 and 2. Though I don't actually like the details of the following proposal, imagine something like:

class Child {
  proc init(param rank: int) {
    // phase 0 is here and would establish any generic `type`/`param` fields in `Child`
    super.typeInit(rank);  // here ends phase 0: set up the parent's generic fields
    // phase 1 is here and would establish any normal fields in `Child`
    super.init();  // here ends phase 1: set up the parent's normal fields
    // phase 2 is here: do whatever you want
  }
}

Or maybe in the above, the call to super.typeInit() should come before the Child class's local generic initialization (so that a child's generic fields could depend on a parent's values?)

@bradcray
Member Author

I'd be curious to learn what Swift does here.

@lydia-duncan
Member

We should specify that the problem is with field type declarations. Field declared initial values can be sidestepped with the appropriate value that would be used.

@lydia-duncan
Member

I've also debated whether this implies that we should have the call to Phase 1 of the parent type be separable from Phase 2 of the parent type, and from the division between the phases itself. When we discussed initializers, the alternate proposed syntax would permit this easily:

  proc init() {
    super.init();
    ... // Phase 1 code
  } finalize {
    ... // Phase 2 code
    super.init(); // or an alternate syntax for the Phase 2 call
  }

This way, either strategy (parent fields before child fields, parent fields after child fields) would be equally supported without making it dependent on compile time versus execution time fields, meaning that writing the program will be more comprehensible (though with the negative aspects mentioned in our discussion of the syntax choice).

@bradcray
Member Author

I'd be curious to learn what Swift does here.

Oh, I think the answer to this question is "angle brackets." :'(

@bradcray
Member Author

We should specify that the problem is with field type declarations. Field declared initial values can be sidestepped with the appropriate value that would be used.

I think it's field initializers as well, isn't it? I.e., replacing the bounds declaration above with:

  var bounds = rank*2;

yields a similar error (as it seems it should/would).

@lydia-duncan
Member

You get the same error (as it should), but you can explicitly give the field a different value, e.g.

proc init(param rank) {
  bounds = rank*2;
  super.init(rank);
}

Which isn't possible for the declared type.

@lydia-duncan
Member

But I have debated about making omitted initialization "do the right thing" as well, if that's what you're wondering

@bradcray
Member Author

but you can explicitly give the field a different value

Well sure, but you can't not give it a value and have it work as it should. I.e., I don't think the problem is strictly about field type declarations.

@lydia-duncan
Member

lydia-duncan commented Dec 13, 2017

Well sure, but you can't not give it a value and have it work as it should.

I guess I view the main problem as "this gives you an error message and you can't do anything to fix it in your initializer", which only applies when the problem is with the declared type. But it sounds like the main problem for you is "I can't use an inherited field in the declaration of a child field without getting an error message if I rely on its omitted initialization". Which is fair (and worth discussing), but was an intentional design decision rather than something we forgot about.

@noakesmichael
Contributor

noakesmichael commented Dec 13, 2017 via email

@bradcray
Member Author

but was an intentional design decision

I think it's a design decision that needs to be revisited then.

@noakesmichael
Contributor

noakesmichael commented Dec 13, 2017 via email

@lydia-duncan
Member

Alright, then the question we will discuss is "should an inherited field be accessible in Phase 1 of the child?"

I'd like to know what we think should be on the table for this re-evaluation.

  • Param and type fields only?
  • The type of any inherited field? (i.e. should var y: inheritedField.type; be a valid declaration?)
  • The value of var and const fields?

Personally, I think we should strive for a unified policy on all the cases above - I think it is worthwhile to preserve the unity of the previous approach, as it will simplify the mental model for the user. But that's just my opinion.

@bradcray
Member Author

There are elements of this conversation that remind me of a recent “error” that Nick filed against initializers (#7938).

I agree that there are some similarities, but I also think that different conclusions could be reached for them, potentially. One interpretation of @nspark's issue under the phase 0 notion proposed above might be "none of the class fields are known until phase 0 has run, so the reference to idxType in the formal argument list doesn't make sense because it's a use-before-def" (assuming that "the end of phase 0" is "somewhere within the body of the initializers"). I think a challenge for a unified approach that tried to extend phase 0 to be complete prior to evaluating the formal arguments is that it seems you could write unstable programs in which the definition of phase 0's assignments depended on the arguments and their types, and vice versa.

the question we will discuss is "should an inherited field be accessible in Phase 1 of the child?"

The primary question I'm interested in seeing discussed by opening this issue is "Should an inherited field that has already been computed at compile-time (and is guaranteed to be by the language) be available in phase 1 of a child initializer at execution time? And if so, how does that impact the current initializer design?" (where I strongly believe that the answer to the first question needs to be "yes" — otherwise there's not much value in making it a compile-time field or inheriting from a generic class).

@noakesmichael
Contributor

noakesmichael commented Dec 14, 2017 via email

@bradcray
Member Author

I agree that there are differences and potentially more challenges for Nick’s case. And I agree that one doesn’t need to unify these two cases. However

  1. The cognitive slip seems similar to me

I agree-ish.

  2. One might argue that Nick’s initializer does not modify idxType within its body and “so” the compiler “knows” that this particular initializer is only applicable for those concrete instantiations where idxType is certain to be int.

This would remain true even if there are other initializers that do assign idxType and hence generate additional concrete types.

A case that made (makes) me think that extending phase 0's evaluation to include the formal argument list would present instability challenges is the following which seems halting-problem-esque:

class C {
  type idxType;
  var x: idxType;
  proc init(x: idxType) {
    if (x.type == real) {
      this.idxType = int;
      this.x = x: int;
    } else if (x.type == int) {
      this.idxType = real;
      this.x = x: real;
    }
  }
}

This doesn’t feel entirely different from trying to develop a robust definition/implementation for phase0.

I agree. I was really just trying to say that I am skeptical that phase 0 could extend as far as the formal argument list due to cases like the above (but could easily be missing something).

I continue to think this challenge is at least partly driven by the effort to use a single block of code to describe both the steps to determine the concrete type at compile-time and to initialize execution-time instances of a particular concrete type.

I think that may be true, but don't want our solution to be "angle brackets!" I am comfortable with more of a phase 0 (compile-time initialization) delineation within a block of code or within a special block of code. For me, Lydia's example falls into that category as well. I'd also be comfortable with an approach in which the compiler simply treated compile-time code differently without requiring the user to demarcate it. These two approaches are what I was trying to illustrate at the outset (I'm open to others as well, but don't have any in hand personally).

@noakesmichael
Contributor

noakesmichael commented Dec 14, 2017 via email

@bradcray
Member Author

the following which seems halting-problem-esque:
...

Waking up this morning, I'm wondering if that example would not have been legal due to its assignment of fields within conditionals? In which case, maybe I can't create an unstable example...

I also spent more time musing about Lydia's example and am trying to recall what the advantages were for putting super.init() at the transition point between phase 1 and phase 2 rather than at the head of the initializer? One answer would be "to distinguish between the phases", but it seems like there were others as well?

@lydia-duncan
Member

I agree that any Phase 0 would have to be separate from the handling of the argument list.

@lydia-duncan
Member

@bradcray - one of the arguments in support of the super.init()-as-transition-point strategy (as opposed to the two-code-blocks strategy) was that it allowed local variables to be shared across the phases. I think in practice we haven't felt the need to use that functionality as much, though, so maybe it isn't a good argument for sticking with the current strategy.

@lydia-duncan
Member

Additionally, it is trivially easy to enforce that the same super.init() or this.init() call is made to impact both bodies, and we could surround the call with a compile-time conditional without having to duplicate it.

Also, the alternate syntax could allow the finalize block to be dropped but not necessarily the Phase 1 block (at least, it wouldn't be as visually easy), making it more supportive of Phase 1 as the default. I do have a long list of initializers that would probably benefit from Phase 1 as the default, though, so that also might not be as strong a reason to keep with the current strategy.

A lot of these decisions on initializers seem to be intertwined or linked.

@bradcray
Member Author

bradcray commented Dec 15, 2017

Here's my current initializer counterproposal w.r.t. this challenge after stewing on it for a few days and taking into account the experiences of the past year, but I'd be curious what flaws others see in it (besides the fact that it's different than what we have today). My hope is to come up with something that addresses current perceived flaws and which is, for the most part, something that we could develop a script to mechanically translate to from today's code:

In English:

  • the first line of a [class] initializer may be a super.init(...) call that invokes a parent class initializer; if it is not, an implicit super.init() will be inserted by the compiler, and the parent class must be callable with a 0-argument initializer (Object will be). Records have no inheritance, so they have no need of this call (explicit or implicit).

  • for a class, after the above call, the class ID of the object is set to the parent class. (Thus in an object hierarchy Grandparent::Parent::Child, the object will start as an Object, become a Grandparent, a Parent, and a Child as the initializers progress. More on that below.)

  • the subsequent lines correspond to phase 1 today, and must share similar rules to phase 1 except that they may refer to parent class fields since super.init() has already completed. We might also consider permitting class initializers to call methods and/or functions, keeping in mind that the type of this would be the parent type and would resolve/dispatch only on that basis.

  • the end of phase 1 is demarcated by ...something... but not using curly brackets. It could either be a built-in placeholder function call (e.g., finalize() or initDone() or whatever good name we can come up with) or a standalone keyword-based statement. The goal of avoiding curly brackets is the same as in today's world: to avoid introducing new lexical scopes (or the implication of them). Also to keep the body of the initializer monolithic.

  • this transition in phase also reflects when the class ID is changed from the parent class to the child class such that any subsequent actions would have full access to the fields. For records, it represents the point at which methods/functions can be called on the record at all (as today)

  • in the absence of this demarcation, I think I'd say that the body of the initializer is "phase 1 by default" based on experience living in a phase 2 by default world this year, and the fact that the superclasses would already be initialized at the outset. But I haven't thought about this point for very long yet.

  • phase 2 is much like today.

In code, and supporting the notion of transitioning the class type at the demarcation point:

class Parent {
  param rank: int;
  ...
}

class Child: Parent {
  var bounds: rank*int;

  proc init(param rank: int) {
    super.init(rank);
    this.foo();  // calls as though `this` was a Parent
    bounds = ...;
    bar(this);   // calls as though `this` was a Parent
    fieldsComplete();  // this is the transition call... the cue the compiler uses like it does super.init() today
    this.baz();  // calls as though `this` was a Child
  }
}

@nspark
Contributor

nspark commented Dec 15, 2017

I thought I'd chime in with a few thoughts/observations, some of which may wander out of the main thread of this discussion.

  1. If I understand @bradcray's (counter-)proposal and the current state of initializers correctly, the parent-child initialization order changes in Brad's new scheme. I think it goes:

    • current initializers: Child.init() ➡️ Child: Phase 1 ➡️ Parent: Phase 1 ➡️ Parent: Phase 2 ➡️ Child: Phase 2
    • Brad's proposal: Child.init() ➡️ Parent: Phase 1 ➡️ Parent: Phase 2 ➡️ Child: Phase 1 ➡️ Child: Phase 2
  2. I'd prefer the Phase 1 ➡️ Phase 2 transition to happen without a magic function name; e.g., fieldsComplete() in Brad's proposal. I think a method on this seems more appropriate; e.g., this.fini(). Such a method should be permissible to define and overload or even to call outside of the initializer. It just happens to serve an additional, special purpose within the initializer.

  3. I think Phase 1 by default makes sense. If they're not explicitly added, the compiler can add calls to super.init() and this.fini() at the beginning and end of the init() function body.

  4. One argument I'd make for angle brackets would be to make the scope of rank clear in the member variable declaration of Brad's example:

class Child<param rank: int>: Parent<rank> {
  var bounds: rank*int;
  ...
}

  5. Alternatively, accessing rank from Parent could be required to be explicit:

class Child: Parent {
  var bounds: super.rank*int;
  ...
}

All in all, I think Brad's proposal makes sense with my strong preference for including (2) from above.

@bradcray
Member Author

Replying to some of Nick's points:

If I understand @bradcray's (counter-)proposal and the current state of initializers correctly

I think you got it.

I think a method on this seems more appropriate

I'm open to that as well. At present, I don't have any real opinion as to what the separator should be.

...If they're not explicitly added, the compiler can add calls ...

I should've called out that the separator could be added implicitly in the phase 1 by default world.

One argument I'd make for angle brackets

I'm still strongly opposed to angle brackets... :D

@mppf
Member

mppf commented Dec 15, 2017

I'm coming a little bit late to this party, but one thing does occur to me.
If we want the program Brad described at the outset to work, 2 strategies in particular appeal to me:

  1. Initialize parent fields before child fields (so that 'rank' being param doesn't impact the fact that it's initialized in the parent and visible in the child initializer).
  2. Initialize compile-time known things in advance, in a separate trip up and down the inheritance hierarchy.

I think that Brad's proposal is some combination of these, but that other variants of it are possible. For example, if all that was really necessary was to get parent fields initializing before child ones, perhaps there is an even simpler proposal?

One interesting thing is that in D, the boundary between Phase 1 and Phase 2 happens once all the fields are initialized. So there might not even need to be an explicit boundary. We might want one anyway, of course.

I worry about trying to handle all the "compile-time" things at once, since we have runtime types for arrays. In particular, there needs to be a runtime path for data to flow as the runtime representation of the type (i.e. the array bounds).

class Parent {
  var A; // array type
}

class Child: Parent {
  var OtherArray: A.type;

  proc init(n: int) {
    // compiler attempts to initialize OtherArray here

    var parentArray: [1..n] int;
    super.init(parentArray); // "type" ie bounds of OtherArray established here
  }
}

I believe that we drew inspiration for initializing child fields before parent ones from Swift. I'm not sure that's the right choice for us, though, given the way types can have runtime components (or the way our language allows it to look like types/params work at runtime, including with runtime variables of generic type).

@bradcray
Member Author

I believe that we drew inspiration for initializing child fields before parent ones from Swift. I'm not sure that's the right choice for us, though

Swift also arguably has an initial run up the hierarchy by way of its angle brackets that we don't have in the current implementation, right?

One interesting thing is that in D, the boundary between Phase 1 and Phase 2 happens once all the fields are initialized.

I believe we've kicked around proposals in which the boundary was implicit, but decided to start with an explicit one first in order to simplify the compiler's job, to make the distinction highly visible to users, and because it seemed easier to relax the requirement over time than to add it in. I continue to find this "implicit transition" intriguing but continue to feel more secure keeping it explicit for now.

@mppf
Member

mppf commented Dec 15, 2017

Initializing child fields first has the advantage that a parent initializer can always call a virtual method in its Phase 2. That's not true in Brad's proposed alternative, for example. (Or in C++, as far as I know).

I think any plan for changing the child-to-parent order should have a strategy for dealing with virtual method calls. I think Brad's proposal basically disallows these by adjusting the object type during initialization.

One "food for thought" idea: could we allow child fields to be initialized before super.init(), even if they didn't have to be? Maybe there is a way to develop this idea into something that allows developers to opt-in to Swift like functionality in this area.

@mppf
Member

mppf commented Dec 15, 2017

Swift also arguably has an initial run up the hierarchy by way of its angle brackets that we don't have in the current implementation, right?

Yes, I agree about that.

edit: I think it'd be interesting to find a solution that had parent-before-child for type/param fields and establishing the type of fully generic fields in Phase 0, and then child-before-parent for other fields in Phase 1. I'm not sure yet if Brad's proposal does this or doesn't.

@bradcray
Member Author

Initializing child fields first has the advantage that a parent initializer can always call a virtual method in its Phase 2. That's not true in Brad's proposed alternative, for example. (Or in C++, as far as I know).

This was intended to be supported by my proposal — this was the bit about setting the class ID to the respective class ID of each class as that stage completes its phase 1 initialization (so for awhile the class will be an Object, then a Grandparent, then a Parent before finally being a Child). The dynamic dispatches can be written and can occur, but will only resolve to ancestors in the tree that have completed phase 1, never to that of the final child class (at least, until its own phase 2 is reached). That seems natural, safe, and like it should "just work" to me, which is part of what I liked about the proposal. Do you think it misses something?
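
The class-ID progression described above can be modeled roughly as follows. This is a Python sketch under stated assumptions, not Chapel: `Base` stands in for Chapel's root `Object`, and `dispatch`, `construct`, and `who` are hypothetical helpers invented for the illustration.

```python
class Base:
    def who(self):
        return "Base"


class Grandparent(Base):
    def who(self):
        return "Grandparent"


class Parent(Grandparent):
    def who(self):
        return "Parent"


class Child(Parent):
    def who(self):
        return "Child"


def dispatch(obj, class_id, name):
    """Call method `name` on obj as if its dynamic type were class_id."""
    return getattr(class_id, name)(obj)


def construct(leaf):
    """Walk the hierarchy root-first, recording what who() resolves to."""
    obj = object.__new__(leaf)
    # Root-to-leaf chain, e.g. [Base, Grandparent, Parent, Child].
    chain = [c for c in reversed(leaf.__mro__) if c is not object]
    trace = []
    class_id = chain[0]  # the object starts out as the root type
    for cls in chain[1:]:
        # Phase 1 of cls: `this` still dispatches as the parent of cls.
        trace.append((cls.__name__, "phase 1", dispatch(obj, class_id, "who")))
        class_id = cls  # the transition point promotes the class ID
        trace.append((cls.__name__, "phase 2", dispatch(obj, class_id, "who")))
    return trace


trace = construct(Child)
for stage in trace:
    print(stage)
```

In this model no stage ever resolves `who()` to a class below the one currently being initialized, which is the safety property being claimed.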

@mppf
Member

mppf commented Dec 16, 2017

What if we went all the way up & down the inheritance hierarchy initializing types and then did Phase 1 and Phase 2 as we do now?

E.g.

class Parent {
  param rank: int;

  proc init(param rank: int) {
    this.typeinit(rank);
    // phase 1
    super.init();
    // phase 2
    writeln(getNameDuringInit());
  }

  proc getNameDuringInit() {
    return "Parent";
  }

}

class Child: Parent {
  var bounds: rank*int;

  proc getNameDuringInit() {
    return "Child";
  }

  proc init(param rank: int) {
    // super.typeinit must be 1st statement in initializer
    // super.typeinit establishes all type/params all the way up the hierarchy
    // parent types/params are established before child types/params
    super.typeinit(rank);
    // after super.typeinit we are in Phase 1 as today
    bounds = ...;
    super.init();
    // phase 2 statements
    this.baz();  // calls as though `this` was a Child
  }
}

Main open question about this: is super.typeinit always required?

edit: this isn't really different from what Brad proposed in the issue description...

@mppf
Member

mppf commented Dec 16, 2017

Here is an example of a program that benefits from Swift-style Phase 2 that can call child methods (and it's arguably a variant of the above example, I just wanted to post it separately to be clear).

class Parent {
  var name:string;

  proc getNameDuringInit() {
    return "Parent";
  }

  proc init() {
    super.init();
    name = getNameDuringInit();
  }
}

class Child: Parent {
  proc getNameDuringInit() {
    return "Child";
  }

  proc init() {
    super.init();
  }
}

var p = new Parent();
writeln(p.name); // outputs "Parent"
var c = new Child();
writeln(c.name); // outputs "Child"

To do this in Parent-Child field initialization order requires code creating the objects to do something like this:

var c = new Child();
c.setup(); // finish setting up the c object
           // possibly calling virtually-dispatched methods in the process
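
For comparison, here is a small Python sketch of both patterns. Python is used only as a stand-in: its `__init__` always dispatches to the most-derived override, which mirrors the child-visible Phase 2 behavior wanted here. The `LateParent`/`LateChild` classes and `setup()` are hypothetical names modeling the two-step workaround.

```python
class Parent:
    def __init__(self):
        # The object already has its final dynamic type, so this call
        # dispatches to the most-derived override.
        self.name = self.get_name_during_init()

    def get_name_during_init(self):
        return "Parent"


class Child(Parent):
    def get_name_during_init(self):
        return "Child"


class LateParent:
    """Models an order in which init cannot reach child methods."""

    def __init__(self):
        self.name = None  # cannot be computed until setup() is called

    def setup(self):
        self.name = self.get_name_during_init()

    def get_name_during_init(self):
        return "Parent"


class LateChild(LateParent):
    def get_name_during_init(self):
        return "Child"


p = Parent()
c = Child()
late = LateChild()
name_before_setup = late.name
late.setup()  # the caller must remember this second step
print(p.name, c.name, name_before_setup, late.name)
```

The second pattern leaves a window in which the object exists but is not fully set up, which is the cost of the two-step approach.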

@mppf
Member

mppf commented Dec 16, 2017

I don't think the typeinit strategy I described solves the problem well enough in the presence of runtime types. The idea was to treat types differently, but a runtime type can depend on regular variables.

For example:

class Parent {
  var n: int;
  var A:[1..n] int;
}
class Child : Parent {
}

Now n must be initialized before the (runtime) type of A is known. That implies that we need to do one of the following things:

  1. Stop having runtime types.
  2. Treat runtime types differently from types for initialization purposes (i.e. something like typeinit could establish the compile-time portion of the types but not the run-time portions). This might include disallowing the above pattern.
  3. Change the field initialization order.

(1) It seems to me that runtime types are an appealing part of the language especially as a way to enable generic programming with arrays.

(2) Treating runtime types differently seems fraught to me. I think the language is generally designed in a manner that tries to avoid making runtime types require different code from compile-time-only types. Of course there are plenty of bugs in this area...

(3) I think changing the field initialization order might work. What would we need to change it to?

Supposing that each class type up the inheritance diagram has Phase 1 initializing its fields and Phase 2 that completes initialization with the ability to call methods on the full object. Then we could have the following order which would support even the above pattern.

Parent Phase 1
Child Phase 1
Parent Phase 2
Child Phase 2

I believe somebody proposed this when we were designing the current initializers. I don't think we had these examples in mind when we made the current choice.

What implications would this new order have for the syntax? I can think of three strategies:

Strategy One : weird control flow
class Parent {
  param rank: int;
  var x:int;
  proc init() {
    // we could insist on a super.init() call at this point

    // Phase 1 begins
    // method calls not available (or operate only with object type)
    x = 1;

    yield to subclass init; // Or something to mark the Phase 1 - Phase 2 transition point

    // Phase 2 begins
    // method calls now available on fully initialized object (with Child type)
  }
}

class Child: Parent {
  var bounds: rank*int;

  proc init(param rankArg: int) {
    super.init(rankArg);

    // Phase 1 begins
    // method calls not available (or operate only with Parent type)
    bounds = ...;

    yield to subclass init; // Or something to mark the Phase 1 - Phase 2 transition point

    // Phase 2 begins
    // method calls now available on fully initialized object (with Child type)
  }
}

The main drawback I see of Strategy One is that the control flow is pretty weird (and not so apparent from the code written in the initializer). What we get out of this weird control flow is that Phase 2 of the initializers can use temporaries from Phase 1 or arguments from the initializer.

What control flow am I talking about? This:
Parent Phase 1
Child Phase 1
Parent Phase 2
Child Phase 2
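
One way to picture this control flow is with Python generators, where a bare `yield` stands in for the hypothetical `yield to subclass init;` marker. This is only a model under assumptions: `parent_init`, `child_init`, and `construct` are invented names, and the driver loop plays the role the compiler would play.

```python
def parent_init(obj, trace):
    # Phase 1: set the parent's fields.
    obj["x"] = 1
    scratch = "parent temp"  # a Phase 1 local
    trace.append("Parent Phase 1")
    yield  # Phase 1 / Phase 2 transition point
    # Phase 2: the Phase 1 local is still in scope here.
    trace.append("Parent Phase 2 (sees " + scratch + ")")


def child_init(obj, trace, rank):
    # Phase 1: parent fields are already set, so they can be read.
    obj["bounds"] = (obj["x"],) * rank
    trace.append("Child Phase 1")
    yield  # Phase 1 / Phase 2 transition point
    # Phase 2: the initializer argument is still available.
    trace.append("Child Phase 2 (sees rank=" + str(rank) + ")")


def construct(rank):
    obj, trace = {}, []
    stages = [parent_init(obj, trace), child_init(obj, trace, rank)]
    for stage in stages:  # run every Phase 1, parent first
        next(stage)
    for stage in stages:  # then resume each initializer for Phase 2
        next(stage, None)
    return obj, trace


obj, trace = construct(2)
for step in trace:
    print(step)
```

Each initializer is suspended mid-body and resumed later, which is why its locals and arguments survive into Phase 2 and also why the flow is not apparent from reading a single initializer.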

Strategy Two : Two-block

Lydia already pointed out that the two-block variant of initializer syntax might be better for this kind of thing:

I've also debated whether this implies that we should have the call to Phase 1 of the parent type be separable from Phase 2 of the parent type, and from the division between the phases itself. When we discussed initializers, the alternate proposed syntax would permit this easily:

  proc init() {
    super.init();
    ... // Phase 1 code
  } finalize {
    ... // Phase 2 code
     // [Lydia had a super.init() here that I've removed]
  }

This way, either strategy (parent fields before child fields, parent fields after child fields) would be equally supported without making it dependent on compile time versus execution time fields, meaning that writing the program will be more comprehensible (though with the negative aspects mentioned in our discussion of the syntax choice).

For my running example, it would look like this:

class Parent {
  param rank: int;
  var x:int;
  proc init() {
    // we could insist on a super.init() call at this point

    // Phase 1 begins
    // method calls not available (or operate only with object type)
    x = 1;
  } finalize {
    // Phase 2 begins
    // method calls now available on fully initialized object (with Child type)
  }
}

class Child: Parent {
  var bounds: rank*int;

  proc init(param rankArg: int) {
    super.init(rankArg);

    // Phase 1 begins
    // method calls not available (or operate only with Parent type)
    bounds = ...;
  } finalize {
    // Phase 2 begins
    // method calls now available on fully initialized object (with Child type)
  }
}

The two-block syntax would remove the ability to have temporaries across Phase 1 and Phase 2, but it would enable arguments across Phase 1 and Phase 2.

This would implement the ordering I described above like this:
Parent Phase 1 aka init
Child Phase 1 aka init
Parent Phase 2 aka finalize
Child Phase 2 aka finalize

Strategy Three : separate finalize method

Actually, the compiler currently supports a separate zero-arguments proc initialize() as part of the old-style constructors. Note though that in the current compiler, if both proc initialize() and the constructor are provided, proc initialize() runs before the constructor.

Brad has occasionally argued that he's found this initialize method useful. One way we could (arguably) keep it is to decide that it is the way to implement Phase 2.

So, the idea is that proc init would only ever implement Phase 1 and that Phase 2 would be implemented in a proc finalize.

class Parent {
  param rank: int;
  var x:int;
  proc init() {
    // we could insist on a super.init() call at this point
    // this function is implementing Phase 1
    // method calls not available (or operate only with object type)
    x = 1;
    // Phase 2 is not available in any 'proc init'
  }

  proc finalize() {
    super.finalize(); // May want to insist this is present somewhere in finalize()
    // this function is implementing Phase 2
    // method calls now available on fully initialized object (with Child type)
  }
}

class Child: Parent {
  var bounds: rank*int;

  proc init(param rankArg: int) {
    super.init(rankArg);

    // Phase 1 begins
    // method calls not available (or operate only with Parent type)
    bounds = ...;
    // Phase 2 is not available in any 'proc init'
  }

  proc finalize() {
    // this function is implementing Phase 2
    // method calls now available on fully initialized object (with Child type)
    super.finalize(); // May want to insist this is present somewhere in finalize()
  }
}

This version has the advantage that Chapel programmers need less special knowledge about initializers. The main details to know are that the compiler adds calls to proc init and proc finalize to implement object construction. But the bodies of these functions themselves don't have any special control flow rules.

It has the disadvantage that any information that proc finalize needs from the proc init arguments has to be encoded into the class instance itself somehow - the arguments are no longer available. But - that's arguably also an advantage, in that it's obvious how to write code that is run no matter which proc init was called - you put that in proc finalize.

Interestingly, depending on where the super.finalize() call appears in Child.finalize, it can implement either of these orders:

Parent init (Phase 1)
Child init (Phase 1)
Parent finalize (Phase 2)
Child finalize (Phase 2)

Parent init (Phase 1)
Child init (Phase 1)
Child finalize (Phase 2)
Parent finalize (Phase 2)

(But of course we could decide to insist that super.finalize be always at the start or end of a proc finalize).
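The two orders above can be modeled in a few lines. This is a minimal Python sketch, not Chapel: the `construct` helper is a hypothetical stand-in for the calls the compiler would insert, and the class names `ChildParentFirst` and `ChildParentLast` are invented for illustration. It only shows that the position of the `super.finalize()` call decides which Phase 2 body runs first.

```python
class Parent:
    def init(self):
        # Phase 1 for Parent: field setup only.
        self.log = ["Parent init (Phase 1)"]

    def finalize(self):
        # Phase 2 for Parent.
        self.log.append("Parent finalize (Phase 2)")


class ChildParentFirst(Parent):
    def init(self):
        super().init()
        self.log.append("Child init (Phase 1)")

    def finalize(self):
        super().finalize()  # placed first: Parent's Phase 2 runs before Child's
        self.log.append("Child finalize (Phase 2)")


class ChildParentLast(Parent):
    def init(self):
        super().init()
        self.log.append("Child init (Phase 1)")

    def finalize(self):
        self.log.append("Child finalize (Phase 2)")
        super().finalize()  # placed last: Child's Phase 2 runs before Parent's


def construct(cls):
    """Hypothetical stand-in for the compiler-inserted calls: init, then finalize."""
    obj = cls()
    obj.init()
    obj.finalize()
    return obj.log
```

Calling `construct(ChildParentFirst)` yields the first order listed above, and `construct(ChildParentLast)` yields the second.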

@bradcray
Member Author

I haven't caught up with Michael's weekend musings other than to understand that by asking about virtual dispatch he was asking for a parent's initializer to be able to call into a child's method (whereas the dynamic dispatch I was talking about would only permit calls into ancestor methods within an initializer, or within one's own methods in phase 2, never a child's). I wanted to point out that having something like the current initialize() hook (or postInit() as I've been thinking of it in the post-constructor world) would provide that support while also permitting users to continue leveraging the default initializer as advocated for elsewhere.

@cassella
Contributor

The first two points of @bradcray's counterproposal put me in mind of
C++. If you move the optional super.init() call earlier, before
the opening { of the initializer, even more so. (I presume the
proposal would allow for a sibling this.init() call there
instead?) And the effective type of the object changing as it
progresses is also C++esque.

Before the { is also where they put field initialization,
particularly for const and ref fields. Though they don't have the
ability to use loops and local variables to compute those values.

C++'s approach to the object's type changing through its construction
is that in the body of a class C constructor, the object is a
C. It's on the writer of the constructor to call only methods
that can cope with the object in whatever partially-constructed state
it may be in.

Unrelated to anything thus far, I was wondering if there'd be any
mileage in giving explicit initialization its own syntax, e.g.
x := 27? Then in cases where initialization vs. assignment is
important, the programmer can specify them explicitly without having
to manage phase1 or phase2. And in cases where it's not important,
the programmer doesn't need to think about it.

(I'm imagining/hoping that the compiler would be free to substitute an
initialization if the first use of a field is written as an
assignment. Subject to consideration of side effects, etc. Then the
:= syntax might only be a way to assert that initialization would
be happening anyway.)

@lydia-duncan
Member

lydia-duncan commented Dec 19, 2017

I suspect Brad's position on shifting to only allowing the initialization of parent fields first is more a simplification than a strong objection to allowing both strategies, but just in case I wanted to reiterate that I think allowing both orders and letting the user pick is the right call. I worry that switching from one strategy to the other will just lead to churn again later down the road, and see supporting both as a way to avoid that potential for churn (and as friendlier to the user).

I'm concerned that altering virtual/dynamic dispatch throughout the course of the initializer would be confusing for users, potentially dangerous, and difficult to implement correctly. I feel similarly worried about allowing method calls during Phase 1, though I recognize that we have some code that seems to desire it and that the virtual/dynamic dispatch proposal is an attempt to allow it. I'm not sure I have a good alternate proposal (and there is a part of me that wonders if the desire for this feature is the old constructors implementation hurting our forward progress rather than in keeping with our stated goal of following a more principled approach).

I am intrigued by Michael's Option 3 proposal, but would likely need to muse/discuss it more before feeling confident in choosing between it and Option 2. Michael asked a very good driving question in conversation today, which seemed important to capture: "Do we tend to want the same Phase 2 for all initializers on a type, or do we tend to want a different Phase 2 per initializer?"

I think I'm otherwise in agreement with what has been discussed so far.

@lydia-duncan
Member

@cassella - apologies, but I would prefer to keep this thread on initializers and inheritance, rather than initializers in general. We did consider an alternate syntax for initialization in our original discussions, but chose to forgo it. If you feel strongly that this should be revisited now, would you mind opening another issue?

@cassella
Contributor

Sorry about that. I don't feel strongly about it. I can delete my comments here if it would help keep this issue focused.

@bradcray
Member Author

I have now caught up on this thread and was excited to see that Michael's response also referred to using an initialize() replacement to capture calls from parent object creation into child methods. I also like that it obviates the need for a fieldsComplete() marker as in my most recent proposal (if I'm understanding it correctly and we're not losing anything). As far as I can tell, any code that I would've put after fieldsComplete() in that proposal could now come at the start of Michael's finalize() routine.

I'm not crazy about the name finalize() but I think I like the concept.

Responding to some of Lydia's comments:

I suspect Brad's position on shifting to only allowing the initialization of parent fields first is more a simplification than a strong objection to allowing both strategies, but just in case I wanted to reiterate that I think allowing both orders and letting the user pick is the right call.

I think calling it a simplification is correct, but I might object to supporting both strategies in the name of simplicity. I feel pretty confident that the module code I've converted (and am working on converting) does not need the Swift-style "init child fields first" approach and also find it counterintuitive (since I think of child classes as specializing parent classes, it seems only natural that they would establish their unique aspects second). So I think switching between "whose fields are initialized first?" might be overkill. That said, one way to get it might be to permit the super.init() call to appear anywhere within an init() routine in Michael's proposal (whereas mine required it to be the first line). Since, in Michael's proposal, init() no longer has a phase 1 and phase 2, this means it could be placed wherever without needing to separate the phases. That said, is the ability to initialize a child's fields before a parent's considered a strength of Swift's, or was it just a tactic used in order to get the phase 1 vs. 2 semantics and dynamic dispatch from parent initializers to children?

I worry that switching from one strategy to the other will just lead to churn again later down the road, and see supporting both as a way to avoid that potential for churn (and as friendlier to the user).

I don't for the reason I alluded to above: It seems hard for me to imagine a case in which an initializer author would need to require that a child's fields were initialized before a parent's.

I'm concerned that altering virtual/dynamic dispatch throughout the course of the initializer would be confusing for users, potentially dangerous, and difficult to implement correctly.

I definitely don't agree with the latter two. The implementation seems trivial (during the phase 1 to phase 2 transition for a class, set the object's CID to that initializer's class). I don't think it's dangerous (the object is a valid instance of that class at that point). I concede that it may be confusing, but frankly, don't think it would be all that confusing or surprising to someone who was already creating a class hierarchy with inherited initializers...
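To make the "set the CID at the phase transition" idea concrete, here is a small Python model, offered only as a sketch. It assumes the CID can be represented by Python's `__class__` attribute, and the `describe` method and `seen` list are hypothetical names added for illustration. Each initializer finishes its own field setup, stamps the object with its own class, and only then makes a dynamically dispatched call.

```python
class Parent:
    def describe(self):
        return "Parent"

    def init(self):
        # Phase 1 for Parent: field setup only.
        self.seen = []
        # Phase 1 -> Phase 2 transition: the object now counts as a Parent.
        self.__class__ = Parent
        # Phase 2 for Parent: dispatch resolves against Parent, never a child.
        self.seen.append(self.describe())


class Child(Parent):
    def describe(self):
        return "Child"

    def init(self):
        Parent.init(self)
        # Phase 1 for Child would initialize Child's fields here.
        # Transition: the object now counts as a Child.
        self.__class__ = Child
        # Phase 2 for Child: dispatch resolves against Child.
        self.seen.append(self.describe())


def construct(cls):
    """Hypothetical stand-in for 'new': allocate, then run the initializer."""
    obj = cls()
    obj.init()
    return obj
```

In this model a `Child` under construction answers `describe()` as a `Parent` while `Parent.init` is running and as a `Child` afterwards, so a parent initializer can never reach a method whose fields have not been set up yet.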

Michael asked a very good driving question in conversation today, which seemed important to capture: "Do we tend to want the same Phase 2 for all initializers on a type, or do we tend to want a different Phase 2 per initializer?"

That's an interesting question... I think I have been writing different phase 2 code for different initializers on a given type, but I think it's typically been due to restrictions as to what can be expressed in phase 1 (or maybe philosophical thoughts about what I think should be in phase 1?). If this turns out to be a problem (which seems... likely), I think my preference would be to go with my previous proposal plus a postInit / finalize concept like I alluded to last night / Michael did in his option 3 for the sake of getting child-class dispatch plus the ability to leverage the compiler-provided initializer.

I should also add that I'm curious whether anyone has a more compelling / realistically-oriented example of a parent class wanting to call a child method during initialization than the simple one above?

@mppf
Member

mppf commented Dec 20, 2017

Re this question:

"Do we tend to want the same Phase 2 for all initializers on a type, or do we tend to want a different Phase 2 per initializer?"

I don't know the answer to the question, but we could work with either answer in Strategy Two or in an adjusted Strategy Three.

If the answer is "tend to want the same", Strategy Three (the separate finalize() method) naturally does that. Strategy Two can do it as well but you'd write a new method e.g. setup() and call it from each finalize block.

If the answer is "tend to want different", Strategy Two does it naturally with the finalize block per initializer, but Strategy Three can be adapted to handle it as well. If we wanted Strategy Three to support such an idea, we might make a rule about which finalize(...) method is called in the event there are several with different signatures - e.g. we try to resolve first one with the same arguments that init had, and if that didn't work, try the no-arguments version. (Or we could even consider always insisting that a finalize(...) be available with the same argument signature as the init).
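The resolution rule suggested for Strategy Three (first try a finalize with the same arguments that init received, then fall back to the no-argument version) can be sketched as follows. This is a Python model rather than Chapel; because Python has no overloading, the overload set is represented by a hypothetical `finalizers` tuple, and `accepts`, `pick_finalize`, `construct`, `Grid`, and `Mesh` are all names invented for the example.

```python
import inspect


def accepts(func, *args):
    """Return True if func could be called with exactly these positional arguments."""
    try:
        inspect.signature(func).bind(*args)
    except TypeError:
        return False
    return True


def pick_finalize(candidates, obj, args):
    """Prefer a candidate matching init's arguments, else one taking none."""
    for attempt in (args, ()):
        for func in candidates:
            if accepts(func, obj, *attempt):
                return func, attempt
    return None, ()


def construct(cls, *args):
    """Hypothetical stand-in for the compiler-inserted init and finalize calls."""
    obj = cls()
    obj.init(*args)
    chosen, call_args = pick_finalize(cls.finalizers, obj, args)
    if chosen is not None:
        chosen(obj, *call_args)
    return obj


class Grid:
    def init(self, n):
        self.n = n
        self.trace = []

    def _finalize_plain(self):
        self.trace.append("finalize()")

    def _finalize_with_n(self, n):
        self.trace.append(f"finalize(n={n})")

    # Models two overloads of finalize with different signatures.
    finalizers = (_finalize_plain, _finalize_with_n)


class Mesh:
    def init(self, n):
        self.n = n
        self.trace = []

    def _finalize_plain(self):
        self.trace.append("finalize()")

    # Only the no-argument finalize exists, so it is the fallback.
    finalizers = (_finalize_plain,)
```

With these definitions, `construct(Grid, 4)` selects the finalize that takes `n`, while `construct(Mesh, 4)` falls back to the no-argument one. The stricter alternative mentioned above, insisting that a matching signature always exists, would amount to treating the fallback case as an error.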

In any event I don't think we need 3 phases.

@mppf
Member

mppf commented Dec 21, 2017

Just a note, C# calls what we'd call a deinitializer a finalizer...

@bradcray
Member Author

Earlier in this issue, Lydia asked whether a child initializer should be able to refer to parent const/var fields in phase 1 of its initializer and I essentially said "I don't care about that here/now." But having stewed on it a bit longer, I think I actually do. Here's a motivating example:

class C {
  var D = {1..n};
  var A: [D] real;
}

class C2 : C {
  var B: [D] string;
}

It feels very natural to me that a subclass should be able to declare arrays over a parent class's domain (or a domain in terms of the parent class's integer field or ...) and yet I believe that, at present, C2's phase 1 initializer wouldn't be able to establish fields like 'B' because C had not been initialized yet.

The proposal that Mike and I are working on ought to address this I believe. Just wanted to correct myself and note that it isn't simply the type/param/compile-time fields that seem likely to run afoul of our current "initialize child phase 1" first approach... other dependent types/values seem like they might as well.

@bradcray
Member Author

Completely unrelated to my previous note: The Collections modules (DistributedBag, DistributedQueue) also run into the original challenge of having subclasses who want to refer to a parent class's type field in their field declarations. Their hierarchy is also much simpler than domain maps, and so would be a better case to study for any new proposal.

@mppf
Member

mppf commented Jan 11, 2018

@bradcray - it's possible to construct an example like your latest C2/C that uses a runtime type instead, which makes it even clearer to me that we need a solution to the ordering issue that works for both type and value fields. Let me know if you'd like me to create a full example along these lines.

@bradcray
Member Author

bradcray commented Jan 18, 2018

Let me know if you'd like me to create a full example along these lines.

I feel like we've got plenty of examples to motivate an "initialize parent first" approach, but if you want to supply an additional case, I'll throw it into the mix. I can't guess what you're alluding to (and would've thought that C2.B above did have a runtime type).

@mppf
Member

mppf commented Jan 19, 2018

@bradcray - here's the example I'm thinking of:

class ComputationState {
   var n:int;
   var StateArray:[1..n] real;
}
class ExtendedComputationState : ComputationState {
  var ExtendedStateArray:StateArray.type;
}

Of course, we could construct examples where the runtime type is stored in a type field, too:

class ComputationState {
   type StateArrayType;
   var StateArray:StateArrayType;
}
class ExtendedComputationState : ComputationState {
  var ExtendedStateArray:StateArrayType;
}

Here not only are there variables with runtime types, but the runtime type of the parent class field is used to initialize the child class field. I think this is strong evidence that we need to initialize parent fields first (at least as long as such patterns with runtime types are possible).
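The ordering dependency in these examples can be demonstrated with a small Python model. It is a sketch only: a list's length stands in for the array's runtime type, and `init_parent_fields`, `init_child_fields`, and `build` are hypothetical names that model the two Phase 1 steps and the order in which they are run.

```python
class ComputationState:
    def init_parent_fields(self, n):
        self.n = n
        # The "runtime type" of state_array is modeled by its length.
        self.state_array = [0.0] * n


class ExtendedComputationState(ComputationState):
    def init_child_fields(self):
        # Models 'var ExtendedStateArray: StateArray.type': the child field's
        # shape is taken from the parent's already-initialized field.
        self.extended_state_array = [0.0] * len(self.state_array)


def build(n, parent_first):
    """Run the two field-initialization steps in the requested order."""
    obj = ExtendedComputationState()
    if parent_first:
        obj.init_parent_fields(n)
        obj.init_child_fields()
    else:
        obj.init_child_fields()  # fails: state_array does not exist yet
        obj.init_parent_fields(n)
    return obj
```

Here `build(3, parent_first=True)` succeeds and gives the child array the same length as the parent's, while `build(3, parent_first=False)` raises `AttributeError` because the child step reads a parent field that has not been set up yet.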

@bradcray
Member Author

Closing this issue, as I believe it has now been superseded by #8283.

6 participants