How should non-nullability handle fields that are initialized after object creation? #146

Closed
Tracked by #110
leafpetersen opened this issue Dec 18, 2018 · 58 comments

@leafpetersen
Member

A pattern that comes up with some frequency is to have fields in an object which are not initialized in the constructor (or at least not in the initializer list), but which should never be observed to be null once some further initialization is done. How should Dart with non-nullable types handle this? Some options that have been used in other languages:

  • Kotlin style lateinit: mark a field as "will be initialized before use". The programmer writes the code as if it is non-nullable, and the compiler checks uses that it can't prove are safe.
  • Swift style implicitly unwrapped types. Swift allows you to mark a type as nullable, but allowed to be accessed as non-nullable.
  • Pre-post conditions on methods (allow methods to specify in some way that they initialize certain fields, and other methods to specify that they require certain fields to be initialized).

cc @lrhn @eernstg @munificent

@srawlins
Member

I think there is a request to support a "late initialize" for finals as well, including final locals, though I can't find it. e.g.

void f(bool x) {
  final int i;
  if (x) {
    i = 7;
  } else {
    i = 42;
  }
  // ...
}

This could be a very similar request; could maybe be solved with one solution.

Broadly, declaring locals with type foo; of course cannot be elevated to be non-nullable without really rearranging the code. If a solution for fields will be inapplicable to locals, it would be great to have a similar thread about locals.

@leafpetersen
Member Author

I think there is a request to support a "late initialize" for finals as well, including final locals, though I can't find it. e.g.

For locals, I think that we will likely have some form of definite assignment analysis to allow you to write code like your example. It's an interesting question as to whether this should be allowed for final variables as well: it seems to me that it would be nice, but on the other hand single assignment definite analysis is a bit trickier.

@kasperpeulen

kasperpeulen commented Dec 18, 2018

Kotlin style lateinit: mark a field as "will be initialized before use". The programmer writes the code as if it is non-nullable, and the compiler checks uses that it can't prove are safe.

Kotlin also allows you to initialize final non-nullable fields in the constructor, even without marking them as lateinit. Dart doesn't support this, which always frustrates me, because it often means you need to write a factory constructor if you want final fields. Kotlin supports this, and it works the same way for final locals as in @srawlins's example.

fun main(args: Array<String>) {
    print(MyClass(true).i)
    f(false)
}

class MyClass(x: Boolean) {
    val i: Int
    init {
        if (x) {
            i = 7
        } else {
            i = 42
        }
    }
}

fun f(x: Boolean) {
    val i: Int
    if (x) {
        i = 7
    } else {
        i = 42
    }
    println(i)
}

I think supporting this style is more elegant than lateinit, and should handle most cases.
Dart now implicitly assigns null in both cases.
I think it would be great if Dart stops implicitly assigning null to variables.
In most cases this is not what you want.

Note that i is not given the value undefined as in JavaScript. You get an error if you try to read the variable:

image

@munificent
Member

munificent commented Dec 18, 2018

I think we can split this into a few problems:

Analyzable

These are storage locations (variables, fields, top-level variables) that aren't initialized at their declaration but where we can reasonably statically determine that it is assigned before use.

For local variables, definite assignment analysis works for most common cases:

void f(bool x) {
  var i;
  if (x) {
    i = 7;
  } else {
    i = 42;
  }
  print(i);
}

Whether you can mark a local like this as final or not is orthogonal, I think.

For instance fields, Dart does have a solution: factory constructors and constructor initialization lists. @kasperpeulen, I think you're right to point out that it's not a very usable solution, but it's there. Given that, I think we should move discussion of improving this to a separate issue, or just not worry about it for now.

The good thing about analyzable locations is that we can make them soundly non-nullable without needing the performance cost of any runtime checking to validate that it isn't null.

The bad part is that we can't reasonably do this for all storage locations. The analysis just isn't feasible, or would require adding too much complexity to the language (i.e. some kind of complex typestate mechanism that tracks which fields for a given instance have been known to be initialized) to be worth it.

A variable in this category:

  • Has a non-nullable getter.
  • Has a non-nullable setter.
  • Never needs to promote. It's always non-nullable.
  • Needs no runtime checking ever.

"Latched non-null"

These are storage locations where we can't statically prove that one will be initialized to a non-null value before it's read. But we can statically prove that it will only ever be assigned a non-null value.

This means that if you ever read it and get a non-null result, you know it will be non-null forever after. That implies that we could safely promote it:

class Box {
  // Not real syntax.
  @latched int value;
}

main() {
  var box = Box();
  print(box.value); // "null".
  box.value = null; // Static error. Setter is non-nullable.
  box.value = 3; // OK.

  if (box.value != null) {
    box.value.isEven; // OK. Can promote.
  }
}

We don't even need to do additional runtime checking inside the promoted body. For this to work, we do need more control over overriding to ensure that there couldn't be some malicious subclass of Box that overrides the value getter to return null after the setter has been called.

So this basically means a "latched variable":

  • Has a nullable getter.
  • Has a non-nullable setter.
  • Can soundly promote after checking.
  • Needs to store a bit indicating whether or not it is null so that you can do null checks. Does not need runtime checking after promotion.

"Non-null by policy"

Then I think we may also run into cases where a variable may get assigned null, but where the code has some surrounding policy that ensures it's never used when it is null and where the overhead of manually null-asserting on every use would be frustrating.

These are addressed by Swift's "Implicitly Unwrapped Optionals". For example:

class Border {
  int _thickness;
  Color _color;

  Border(this._thickness, [this._color]) {
    if (_thickness > 0 && _color == null) {
      throw ArgumentError("Must have color if border has non-zero thickness.");
    }
  }

  void draw() {
    if (_thickness > 0) {
      paintBox(_thickness, _color.red, _color.green, _color.blue);
    }
  }
}

There's a policy here, and the class enforces it, but the type system can't see it.
We could choose not to add any language support for cases like this. I think that's probably the best plan to start with.

But, if stuff like this comes up often, we could add another kind of implicitly checked variable that:

  • Has a nullable getter.
  • Has a nullable setter.
  • Cannot soundly promote after checking.
  • Needs to be checked at runtime before using or assigning to something non-nullable.

It's basically syntax sugar to not require ! every time you use it, but it has all of the unsafety and performance cost of a nullable type.

@leafpetersen
Member Author

Kotlin also allows to initialize final non-nullable fields in the constructor, even without marking them as lateinit.

Note that Kotlin constructor initialization is also not null-sound though - you can see and call methods on non-nullable variables in their uninitialized state.

@kasperpeulen

kasperpeulen commented Dec 20, 2018

@munificent

So this basically means a "latched variable":

  • Has a nullable getter.

I don't understand why you would want this. Kotlin gives a runtime error (UninitializedPropertyAccessException) if you read or use it before it is initialized. That seems better than allowing it to have a nullable getter.

image

I think your model is that it assigns null first, and then this type is later promoted. But I have the feeling that is not what happens. In Kotlin, you don't implicitly assign null if you assign nothing, not even if the type is nullable:

image

I think a better analogy is that Kotlin implicitly assigns something like a JS undefined value to the variable. Trying to read undefined is then a compile-time error, and with lateinit it raises an UninitializedPropertyAccessException.

@leafpetersen

Note that Kotlin constructor initialization is also not null-sound though - you can see and call methods on non-nullable variables in their uninitialized state.

That is not true. It is a compile time error if you see and call methods on non-nullable variables in their uninitialized state. With lateinit it is a runtime exception.

@leafpetersen
Member Author

leafpetersen commented Dec 20, 2018

@kasperpeulen

That is not true. It is a compile time error if you see and call methods on non-nullable variables in their uninitialized state. With lateinit it is a runtime exception.

This seems to disagree with you, and the following Kotlin program compiles, runs and prints out the value of an uninitialized non-nullable field in the Kotlin playground.

fun main(args: Array<String>) {
    print("${TestClass().v}\n")
}

class AA() {
    override fun toString() : String { return "AA" }
}
class TestClass() {
    val v: AA;
    init {
      setup();
      v = AA();
    }
    fun setup() {
        print("$v + \n");
    }
}

@kasperpeulen

kasperpeulen commented Dec 20, 2018

@leafpetersen
Ah, that’s right. So I guess if the compiler fails to recognize that it is not initialized, then it does compile and it indeed is null.

I think I was mistaken in my analogy, as you normally have to explicitly set it to null. A better analogy would then be Dart’s ‘void’: the compiler tries to make sure you don’t use a void variable, but at runtime it has no special value if you “escape it”.

However, I would rather have a runtime exception like with lateinit, if the compiler fails to recognize.

@srawlins
Member

srawlins commented Jan 2, 2019

LMK if locals should be a separate discussion, but I had a thought about how this will affect test code:

void main() {
  Completer<bool> c1;
  List<int> l1;
  // ...
  setUp(() {
    c1 = ...;
    l1 = [1, 2, 3];
  });
  // tests use and modify c1, l1, whose values and state are always set/reset between tests.
  test('list remove removes', () {
    l1.remove(1);
    expect(l1, isNot(contains(1)));
  });
}

(E.g. this typical test in angular)

This is a super common pattern in all testing, and it would be great if users didn't have to make them all nullable (both in the short term, for the migration pain, and in the long term, for the benefits of NNBD).

But this code is not analyzably non-nullable (from Bob's analyzable above); it could be encoded as latched non-nullable.

Non-null by policy, @munificent, would just mean that these are declared as Completer<bool>? and List<int>?, right? In order to support the latched concept, we'd need something like latched Completer<bool>? If that's the case, then there will either be a huge migration to question marks or latched annotations.

@leafpetersen
Member Author

This is a super common pattern in all testing, and it would be great if users didn't have to make them all nullable (both in the short term, for the migration pain, and in the long term, for the benefits of NNBD).

Well, the problem is that without some syntax, I don't know how to tell the difference between that code (for which you would want runtime checking) and other cases (for which you want the compiler to tell you that you've forgotten to assign a local). Unless you're suggesting that all locals be latched by default? This seems a bit loose, but maybe.

I guess another option would be to say that

  • it's a warning if definite assignment fails
  • a local that isn't definitely assigned becomes latched by default

Then you could suppress definite assignment warnings for test files.

@srawlins
Member

srawlins commented Jan 2, 2019

Unless you're suggesting that all locals be latched by default? This seems a bit loose, but maybe.

No, I'm not suggesting that, but it sure sounds like I am 🙁. I think I just painted myself into a corner and realized there will probably be a migration for tests with uninitialized variables declared in main.

If we say that an NN local declared without an assigned value is legal (because it is latched), that might be confusing... sort of changes the whole idea of NN. And I imagine back-ends want to weigh in on the whole latching concept anyhow. I'm not casting a vote in any direction here, just noting that this will probably require that every test file go through a migration for uninitialized locals.

@munificent
Member

I don't understand why you would want this. Kotlin gives a runtime error (UninitializedPropertyAccessException) if you read or use it before it is initialized. That seems better than allowing it to have a nullable getter.

There are two different use cases:

  1. A field that may not be initialized until later. Users may access the field before it's initialized and see that it's null. In fact, seeing that it's null is how they tell if it's initialized. But, it can only be assigned a non-null value. Once it has, you know for certain that it will never be null again, which makes it safe to type promote on it.

  2. A field that may not be initialized until later. It's a programmatic bug to try to access it before it has been initialized. You don't even want users to be able to write code that determines at runtime whether or not it has been. It should just throw an exception.

The "latched non-null" I describe is for the former, but I think you have in mind the latter. For the latter, something more like the "non-null by policy" is a better fit.

@lrhn
Member

lrhn commented Jan 7, 2019

The descriptions here sound very much like Dart's lazily initialized static fields. If you read it before you write it, it executes some code to get the initial value.

We could define lazily initialized instance fields, or even local variables, in almost the same way: If you read before writing, some code is run. That code can either initialize the variable lazily or (if the initializer expression is omitted) just throw an UninitializedError. If you write first, it's fine and no initialization will happen. That gives you late-init and lazy-init in the same language feature, with late-init just being a throwing lazy-init initializer.

That approach would be using an existing language feature in a more general way, rather than inventing a new one (which is great, because we don't want too many similar features). The only issue is that static fields are lazy by default, but instance fields or local variables are not, so we'd need more syntax to opt-in to it. Some options:

  • Make lazy a built-in identifier and write lazy int x;. Or lateinit or some other word.
  • Make non-nullable variables with no initializer late-init. That leaves no way to get lazy initialization, though.
  • Invent a new assignment operator: int? x =#= initializer and use that for lazy initialization (but it's annoying that it's different from static fields).
  • Prefix the variable somehow: int *x = ... or int *x;. Works both with and without initializer, but not intuitive (Not for any character in !@#%^&*\ - some even look like they mean something else).

We actually do need to modify what we do to static fields too, because if the initialization fails (throws, perhaps due to cyclic references), then it currently stores null in the field. We should probably not do that any more, since we can't always store null. I'd say that failure during initialization should keep the field uninitialized, so the next access also fails.
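
A minimal sketch of the read-before-write behavior described above, emulated with an ordinary class. `LazyCell` is a hypothetical name used only for illustration, and `StateError` stands in for the `UninitializedError` mentioned above; the actual feature would be a language modifier, not a wrapper:

```dart
/// Hypothetical helper (not part of Dart or of this proposal): emulates
/// "if you read before writing, some code is run".
class LazyCell<T> {
  LazyCell([this._initializer]);

  /// Null means "no initializer", i.e. the late-init flavor.
  final T Function()? _initializer;

  /// A separate flag is used instead of a null sentinel, so T may be nullable.
  bool _hasValue = false;
  T? _value;

  /// Reading before writing runs the initializer, or throws if there is none.
  T get value {
    if (!_hasValue) {
      final init = _initializer;
      if (init == null) {
        throw StateError('Read before initialization');
      }
      _value = init(); // If this throws, the cell stays uninitialized.
      _hasValue = true;
    }
    return _value as T;
  }

  /// Writing first is fine; the initializer then never runs.
  set value(T newValue) {
    _value = newValue;
    _hasValue = true;
  }
}
```

With this sketch, `LazyCell<int>(() => 42)` behaves like a lazy-init variable, while `LazyCell<int>()` behaves like a late-init one whose first read throws unless something has been written to it.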

@munificent
Member

We could define lazily initialized instance fields, or even local variables, in almost the same way: If you read before writing, some code is run.

Very interesting! I assume for instance fields, you would then be able to access this in the initializer? That would make a common use case much more pleasant. Today, you have to mark that field non-final and initialize it in the body. This would let you give it a normal initializer expression.

That code can either initialize the variable lazily or (if the initializer expression is omitted) just throw an UninitializedError. If you write first, it's fine and no initialization will happen. That allows you late-init and lazy-init in the same language feature, with late-init just being a throwing lazy-init initializer.

I like it. This should be able to be combined with final so that you have lazy final fields that rely on their initializer but take advantage of the initializer being run after this is available.

I think a modifier keyword is the right approach for this, and lazy sounds like a good one. Swift and Scala use the same modifier to select similar behavior.

@leafpetersen
Member Author

Note that Kotlin has both. Combining them may be reasonable.

This does seem like it has a substantial implication on the cost model though. A lateinit field is going to be a lot cheaper (at least in terms of code size) than a lazy field is, so if all we give you is laziness, that makes lateinit expensive.

@leafpetersen leafpetersen added the nnbd NNBD related issues label Jan 23, 2019
@leafpetersen
Member Author

The main remaining concern that I have with combining these is the cost model. The latched non-null model has a very clear and relatively cheap implementation strategy:

  • Initialize the field to null
  • Check that the field value is non-null on every read that is not dominated by another read or write of the same variable.

The lazy model is more expensive. You can't count on using null as a sentinel value in general. For fields with non-nullable type, you could use null as a sentinel value, but then you need extra storage for the initializer lambda, and you need extra caller side code to call the initializer lambda in the case that it's empty. Or you put the logic on the callee side with a getter, probably better for code size but not necessarily for performance.

cc @rakudrama @a-siva @mraleph for thoughts on cost model.

@yjbanov

yjbanov commented Feb 6, 2019

Have you considered postponing solving this problem until we know more about how nnbd without this feature ends up being used? It would be interesting to know how often this comes up and neither initializer lists nor factories help.

BTW, Go solves this issue with zero values. However, Go does not have non-null pointers, which may be why this solution works.

@leafpetersen
Member Author

Have you considered postponing solving this problem until we know more about how nnbd without this feature ends up being used?

We always consider the null hypothesis (do nothing!) :) That said, discussions with the angular team point to a lot of potential use cases for this (they are doing some digging for data on whether it would actually work for them). And the fact that Swift, Kotlin, and TypeScript all have some feature aimed specifically at this kind of use (implicit unwrapping, lateinit, and definite assignment assertions, respectively) is pretty strong evidence in favor of a need for this (though of course, every language is different).

@rakudrama
Member

There are a couple of interesting cases that come up in code I am familiar with.

  1. final cycles in constructor
    In the following we are forced to make _myHelper non-final and nullable. I'd like to have a way for it to be final and non-nullable.
class Thing {
  final arg;
  Helper? _myHelper;
  Thing(this.arg) {
    this.prepare();
    _myHelper = Helper(this);
  }
}
class Helper {
  final Thing myThing;
  Helper(this.myThing);
}

By the time Thing is completely constructed, _myHelper is assigned and never changes. prepare() calls unbounded code. Examples are protobufs (GeneratedMessage <->FieldSet) and Angular Dart AppViews (AppView <-> AppViewData).

  2. late-init final fields
class FrameworkThing {
  Sub? _sub;
  Derived? _derived;
  FrameworkThing() { ...}

  activity1(arg) {
    _sub = Sub(this, arg, ...);
    _derived = Derived(_sub);
  }
  activity2() {
    _sub.activity2a(_derived);
    _sub.activity2b();
  }
  activity3() {
    _sub.activity3();
  }
}

_sub is assigned once before any use. I'd like to find a way to declare these fields final and non-null.
The inability to express this is contagious: _derived has the same problem only because it is a function of _sub. The framework ensures that activity1() is called once and before activity2(), but there is no realistic way to determine that at compile time.

The cost model of late-init final is reasonable, it can be defined as modified getters and setters, and overriding definitions explained in terms of the getter and setter just like regular fields.

Sub? _sub_field;
void set _sub(Sub value) {
  assert(_sub_field == null);
  _sub_field = value;
}
Sub get _sub {
  assert(_sub_field != null); // or a non-assert check.
  return _sub_field;
}

Every case I have seen has been null -> non-null, but a different sentinel value or a side field could be used instead of a null sentinel for int fields in the VM.

I think the initializer case and the general case are subtly different. The initializer case does not need extra syntax, since the class declaration can be checked to see if the final field is assigned once in all constructor bodies, and some constructors could do a regular early initialization. The C++ style initializer list is needlessly complex and uncomfortable, and I can imagine the front end 'promoting' trivial assignments to initializers.

In the general case, some syntax would be needed

final Sub _sub late-init;

I have been experimenting with an annotation on the field:

@pragma('dart2js:assignedOnceBeforeUsed')
Sub? _sub;

By putting the annotation on one example of the initializer pattern in AppView, a large angular app was reduced in size by 0.18%. This is entirely due to the load-elimination optimization knowing that the field is effectively final - dominating stored value or load can be reused.
The optimization is better than could be achieved by changing the Angular code generator (to cache field values in locals) because it works across source code method boundaries for inlined code.
This is not a huge win, but it does seem promising enough to try an experiment of making the Angular Dart generator emit the experimental annotation to assess the full potential @sigmundch @ferhatb.

@lrhn
Member

lrhn commented Feb 11, 2019

For the final cycles, a late-init/write-once semantics would work.
If you write:

class Thing {
  final arg;
  lateinit final Helper _myHelper;
  Thing(this.arg) {
    _myHelper = Helper(this);
    this.prepare();
    _myHelper.init();
  }
  // May be overridden in subclasses.
  void prepare() {}
}

then the compiler should be able to see that the assignment to the lateinit final field dominates everything, so it can make access cheap, even if you have to delay initialization until after prepare. If you initialize _myHelper after calling the open-ended this.prepare(), then it's not clear that there won't be earlier reads (or writes), so there must be an overhead on accesses.

Or using lazy initialization:

class Thing {
  final arg;
  lazy final Helper _myHelper = Helper(this);  // initialized on first read. There can be no write.
  Thing(this.arg) {
    this.prepare();
    _myHelper;  // Force initialization here, if not earlier.
  }
}

For the 2. final late-init field, I don't see a simple rewrite to laziness. You need the initialization to depend on arguments to a method, so there must be a write operation, and then you really do need a "write once" semantics. On the other hand, not all protocol invariants need to be encoded as a state invariant. Nothing in the example code documents that activity1 is called first, so the compiler must be very clever to realize that.

As you state, this protocol invariant can be implemented using _sub_field and custom getters/setters.
Introducing late-init for this case is not going to be more efficient unless the compiler can detect that activity1 is always called before anything that reads _sub. (Annotations are probably not going to get you very much, unless you have an analysis to verify that they are correct).

There are many true things about programs that cannot be expressed statically. In this case, I'd probably just do:

Sub? _sub;
/// on every access: _sub!...;
/// on every store: _sub ??= value; 

This would ensure that I only assign once (but won't prevent me from trying again).

Defensive programming inside a single class shouldn't be necessary, the threat model there is someone who can edit the class/library source. If the code is too complicated to ensure proper internal invariants, then you might need more documentation and more asserts.
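
To make the `!` / `??=` pattern above concrete, here is a small sketch. The `name` field on `Sub` and the method bodies are made up for the example; only the access and store idioms come from the comment above:

```dart
// Illustrative stand-in for the Sub class in the example above.
class Sub {
  Sub(this.name);
  final String name;
}

class FrameworkThing {
  Sub? _sub;

  void activity1(String name) {
    // Store: `??=` only assigns while _sub is still null, so a second call
    // is silently ignored rather than reported as an error.
    _sub ??= Sub(name);
  }

  String activity2() {
    // Access: `!` throws at runtime if activity1 has not run yet.
    return _sub!.name;
  }
}
```

Calling `activity1('first')` and then `activity1('second')` leaves the first `Sub` in place, which is the "won't prevent me from trying again" behavior: the second write does not fail, it is just dropped.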

@leafpetersen
Member Author

Briefly summarizing some white board discussion from last week (partially also captured in comments from @rakudrama above).

It seems clear that we could unify late init and lazy under one syntax, but in order to get all of the benefit, we would need to allow lazy fields to be declared with no initializer, and essentially treat them as late init. So you could write final lazy int x;, and this would be allowed to be written to once. My takeaway from the discussion (and I think @munificent was in the same camp) was that this starts to feel too weird. It's not at all obvious what that is intended to mean, and even if you know what it means, it feels bolted on. The semantics are really different from lazy, so using the lazy keyword just seems confusing. Given that it's not clear to me that you can get the same cost model for late init if you overload lazy to get it (because overriding), my thinking is currently that:

  • We should have some form of late init, syntax TBD
    • This should be available on statics as well for uniformity
    • Not clear that this is useful on locals, but could allow
  • We might add now, or later, some form of lazy
    • Should be available on locals as well

@srawlins
Member

There was a brief locals-in-tests discussion above, and I'll just say that having late init or lazy available before (or along with) NNBD is important for test readability and ergonomics. Otherwise all shared variables must be nullable. A test would look like:

void main() {
  Foo? foo;   // Honestly a real Foo, instantiated in setUp().
  Bar? bar1;  // Honestly a real Bar, instantiated in setUp().
  Bar? bar2;  // Honestly a real Bar, instantiated in setUp().
  Baz? baz;   // Honestly a real Baz, instantiated in setUp().
  // ...

  setUp(() {
    foo = Foo();
    bar1 = Bar(foo!!); // Ouch.
    // ...
  });

  // Tests use and modify foo, bar1, bar2, baz, whose values and state are always
  // set/reset between tests.
  test('foo something', () {
    foo.m1(bar1!!); // Ouch.
    var nnList = <Bar>[bar1!!, bar2!!];  // Ouch ouch.
    // ...
    // I imagine we'll have a Warning/Hint/Lint about unguarded access on a
    // nullable object?
    foo?.m1();  // Ouch.
  });
}

@lrhn
Member

lrhn commented Mar 4, 2019

My thinking is more towards implicit laziness.

  1. A local variable is only lazy if it's non-nullable and has no initializer. Then reading before writing throws. It makes no sense to lazily initialize a local variable; it's too confusing.
  2. A static/top-level variable is always lazy.
  3. An instance variable is lazy if its initializer refers to this or if it is non-nullable and has no initializer.

A lazy variable is in one of four states:

  • uninitialized (throws on access)
  • initializer (initializer expression not evaluated yet)
  • initializing (throws cyclic init error on access)
  • initialized (has value)

A non-null variable with no initializer is always lazy and starts as "uninitialized".
A lazy variable with an initializer starts as "initializer".

Reading an "uninitialized" variable throws.

Reading an "initializer" variable changes the state to "initializing" and evaluates the initializer expression and, if successful, stores the result in the variable and makes it "initialized", and if unsuccessful, makes the variable "uninitialized" (future reads will throw, but will not evaluate initializer again).

Reading a variable which is "initializing" throws a cyclic initialization error (otherwise it's similar to uninitialized, it throws on access).

Reading a variable which is "initialized" returns the value.

Writing a non-final variable stores the value and makes it "initialized".

This does not cover computed "late-init" of a final variable. That's not a new problem, either, so I'm not sure we need to handle it now.

For local variables, we might be able to address most concerns with assignment-based type promotion. If you make the variable nullable and non-final, then assign a non-null value to it on all branches, maybe we can deduce that the variable is non-nullable, and promote to that locally.

For instance variables, we can still go the two-constructor way.
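
The state transitions described above can be sketched as an ordinary class. `LazyVar` and `LazyState` are hypothetical names for illustration, and `StateError` stands in for whatever errors the language would actually throw; the real feature would be built into variables rather than being a wrapper:

```dart
enum LazyState { uninitialized, initializer, initializing, initialized }

/// Hypothetical wrapper that follows the four-state description above.
class LazyVar<T> {
  /// Without an initializer the variable starts as "uninitialized";
  /// with one it starts as "initializer".
  LazyVar([this._init])
      : state =
            _init == null ? LazyState.uninitialized : LazyState.initializer;

  final T Function()? _init;
  LazyState state;
  T? _value;

  T read() {
    if (state == LazyState.uninitialized) {
      throw StateError('Variable is uninitialized');
    }
    if (state == LazyState.initializing) {
      throw StateError('Cyclic initialization');
    }
    if (state == LazyState.initializer) {
      state = LazyState.initializing;
      try {
        _value = _init!();
        state = LazyState.initialized;
      } catch (_) {
        // Failure leaves the variable "uninitialized": future reads throw
        // and the initializer is not evaluated again.
        state = LazyState.uninitialized;
        rethrow;
      }
    }
    return _value as T;
  }

  /// Writing (non-final variables only) stores the value and makes the
  /// variable "initialized".
  void write(T value) {
    _value = value;
    state = LazyState.initialized;
  }
}
```

Note that this sketch needs a state field next to the value, which is the kind of per-variable overhead the cost-model comments earlier in the thread are concerned about.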

@eernstg
Member

eernstg commented Mar 4, 2019

@lrhn wrote:

A lazy variable is in one of four states:

That's cool! But why wouldn't it work to do the same thing for final variables, cutting it down to the paths through this status diagram that make sense: They can be 'initializer' variables (which means that we can have final instance variables whose initializing expression has access to this). They cannot be 'uninitialized' initially, but they could still get there due to the error path, say, because of a cyclic initialization dependency.

The point is that this would allow developers some extra flexibility (laziness and access to this from the initializer for a final instance variable, laziness for a final local variable, and a guarantee that you won't see this variable with two different values).

I believe that the discipline that goes with final for these kinds of variables would still be similar enough to the discipline for other final variables to avoid too much confusion.

@lrhn
Member

lrhn commented Mar 4, 2019

I did intend it to work for final variables too, and I think it does as written. It won't allow "late init" write-once to a final variable, because you just can't write to a final variable. A final variable will need an initializer, but then it should just work.

@leafpetersen
Member Author

My thinking is more towards implicit laziness.

I need a lot of convincing to go this route. It feels like a complete foot gun to me that the evaluation order of the program is completely changed by minor (and potentially inadvertent) changes. Change a static method to an instance method, and all of a sudden some initializers that used to run eagerly start running lazily (because they call that method). Refactor a nullable field to be non-nullable, get no static warning.

Moreover, if I want a lazy init field, I have to make sure it references this in order to get laziness. If we're going to support the feature, make it available to everyone.

For local variables, we might be able to address most concerns with assignment-based type promotion

This doesn't address the case that @srawlins described above.

This does not cover computed "late-init" of a final variable. That's not a new problem, either, so I'm not sure we need to handle it now.

It's a nice one to handle en passant though, no? There was a comment in another thread that you thought this could be handled via laziness. Do you have a better pattern to handle this than the following?

class A {
  B? _b;
  void set_b(B _b) {
    if (this._b != null) throw DuplicateInit();
    this._b = _b;
  }
  final lazy B b = _b!;
}
class B {
  A? _a;
  void set_a(A _a) {
    if (this._a != null) throw DuplicateInit();
    this._a = _a;
  }
  final lazy A a = _a!;
}
A buildCycle() {
  var a = new A();
  var b = new B();
  a.set_b(b);
  b.set_a(a);
  return a;
}

It's true that you can do this... but yikes.

@eernstg
Member

eernstg commented Mar 11, 2019

evaluation order .. completely changed by minor (and potentially inadvertent) changes

A lazy init field would obviously need a syntactic marker (e.g., a lazy modifier), for exactly that reason. I thought this was already part of the proposals, but otherwise I agree that it should be.

With respect to the final cycle, we could at least make a static decision about where to break the cycle, and then initialize everything except the "breaking point" in the topologically required order. Not very pretty, but at least it's one step less ugly. ;-)

class A {
  final lazy B b = _b!;
  B? _b;
  void set_b(B _b) {
    if (this._b != null) throw DuplicateInit();
    this._b = _b;
  }
}

class B {
  final A a;
  B(this.a);
}

A buildCycle() {
  var a = new A();
  var b = new B(a);
  a.set_b(b);
  return a;
}

However, this means that the final lazy feature doesn't play a very important role here, we might as well use B get b => _b!; except for the additional optimization opportunities that final lazy B b = _b!; might allow for.
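For comparison, here is roughly what the plain-getter variant looks like in Dart that runs without any new modifier. This is only a sketch: `setB` and the use of `StateError` are placeholders I picked, not part of any proposal.

```dart
class A {
  B? _b;

  /// Non-nullable view of the late-bound field; throws if read too early.
  B get b => _b!;

  /// Write-once setter that closes the cycle after construction.
  void setB(B b) {
    if (_b != null) throw StateError('b is already initialized');
    _b = b;
  }
}

class B {
  final A a; // This side of the cycle can be an ordinary final field.
  B(this.a);
}

/// Builds the two-object cycle, breaking it at `A.b`.
A buildCycle() {
  final a = A();
  final b = B(a);
  a.setB(b);
  return a;
}
```

The getter re-checks `_b` on every read, which is exactly the cost a dedicated modifier might let a compiler optimize away.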

@munificent
Member

I'm sorry I didn't get caught up on this discussion before you all talked about it in AAR.

A couple of questions:

1. What about Sam's scenario above:

void main() {
  Foo? foo;   // Honestly a real Foo, instantiated in setUp().
  Bar? bar1;  // Honestly a real Bar, instantiated in setUp().
  Bar? bar2;  // Honestly a real Bar, instantiated in setUp().
  Baz? baz;   // Honestly a real Baz, instantiated in setUp().
  // ...

  setUp(() {
    foo = Foo();
    bar1 = Bar1(foo!!); // Ouch.
    // ...
  });

  // Tests use and modify foo, bar1, bar2, baz, whose values and state are always
  // set/reset between tests.
  test('foo something', () {
    foo.m1(bar1!!); // Ouch.
    var nnList = <Bar>[bar1!!, bar2!!];  // Ouch ouch.
    // ...
    // I imagine we'll have a Warning/Hint/Lint about unguarded access on a
    // nullable object?
    foo?.m1();  // Ouch.
  });
}

Can those local variables be marked lazy and have a non-nullable type? If not, it would be good to come up with some solution for this. It's a very common pattern in tests. We're talking thousands and thousands of local variables. In fact, it's probably the dominant reason people write type annotations for locals inside Google code.

2. Do lazy fields need initializers?

Can I do:

class C {
  lazy int f;

  initialize(int value) { f = value; }
}

If so, lazy doesn't seem like the right name for it. If not, it seems like we're losing an important use case.

How about late? It's as short as lazy and carries the same connotation of "it won't happen right now" without the explicit baggage of lazy evaluation from other languages.

3. Can lazy final fields be assigned to?

I would assume no. It's final, after all. But Leaf's last example does that. Is that a mistake, or does final mean something different?

@munificent
Member

I spent some more time trying to work through how lazy composes with other features. There are three aspects of a field (and maybe local variable) declaration:

  • Is it final?
  • Is it lazy?
  • Does it have an initializer?

(There is a fourth, "does it have a non-nullable type?". But I don't think we want that to affect the semantics beyond the usual type error checking, so I'll ignore that.)

If we allow all the combinations, here's what I think the semantics could be:

  • int i;: Mutable field, default initialized to null. If the field has a non-nullable type, that's a type error.
  • int i = foo();: Mutable field eagerly initialized to foo().
  • final int i;: Must be initialized in the constructor initialization list. Compile error to assign. Compile error to write this for a local variable.
  • final int i = foo();: Eagerly initialized to foo(). Compile error to assign.
  • lazy int i;: No static error. Runtime error to read before assigning.
  • lazy int i = foo();: Can be freely assigned. If accessed before assigned, initialized to foo().
  • lazy final int i;: No static error. Runtime error to assign more than once or read before assigning.
  • lazy final int i = foo();: Lazily initialized by calling foo() on first access. Static error to assign?
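To make the four `lazy` combinations concrete, here is a rough emulation in plain Dart. Everything in it is an assumption for illustration: `LazyCell` is a made-up helper rather than a proposed API, `StateError` stands in for whatever error the spec would choose, and the "static error to assign" case is modeled as a runtime error because a library cannot add compile-time checks.

```dart
/// Hypothetical helper emulating a `lazy` variable; not part of any SDK.
class LazyCell<T> {
  LazyCell({T Function()? init, this.isFinal = false}) : _init = init;

  final T Function()? _init; // Optional initializer, run at most once.
  final bool isFinal;
  bool _isSet = false;
  T? _value;

  T get value {
    if (!_isSet) {
      final init = _init;
      // No initializer and no prior write: reading is an error.
      if (init == null) throw StateError('read before assignment');
      _value = init();
      _isSet = true;
    }
    return _value as T;
  }

  set value(T newValue) {
    // A final cell rejects any write if it has an initializer,
    // and any second write if it does not.
    if (isFinal && (_init != null || _isSet)) {
      throw StateError('final cell cannot be assigned again');
    }
    _value = newValue;
    _isSet = true;
  }
}
```

The four `lazy` rows correspond to `LazyCell()`, `LazyCell(init: foo)`, `LazyCell(isFinal: true)`, and `LazyCell(init: foo, isFinal: true)`.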

Those are the behaviors I would intuit and that I think are useful. In particular, Sam's example is covered by allowing lazy uninitialized local variables:

void main() {
  lazy Foo foo;

  setUp(() {
    foo = Foo();
  });

  // Tests use and modify foo, etc.
  test('foo something', () {
    foo.m1(); // Fine. Checked at runtime.
  });
}

If those are the semantics you all have in mind, then the next step is thinking about how we explain them (and whether lazy is the best keyword). final no longer means "give me a compile error if I forget to initialize it". I would explain them like:

  • final means the variable cannot be re-assigned. It will get a value once and then never change. It does not mean that it must be assigned. You can have a lazy final variable that never ever gets initialized, as long as you never try to use it.

  • lazy means "defer the behavior as far as possible". It means the initializer does not run until right before its result is needed, if ever. It means that the invariant checking that a final variable is initialized before use happens later at runtime instead of at compile time. Likewise, the invariant that a final variable is not re-assigned is validated at runtime instead of at compile time.

I think that works, though I worry lazy will mislead. How much of this do I have right?

I spent a little time talking to Stephen about this and he's worried about the code size implications of the runtime invariant checking for lazy fields. He'd like a production build to be able to eliminate all of those checks. When you say "will throw an error", is it enough to say that that's an AssertionError and that dart2js can simply not throw those?

@eernstg
Member

eernstg commented Apr 5, 2019

Sounds good, @munificent!

I was wondering about the exact same thing that Stephen was worried about. For instance, do we actually wish to enforce this property?:

lazy final int i;: No static error. Runtime error to assign more than once or read before assigning.

It sounds like such a variable would need to start out having a special value meaning "uninitialized". If it is allocated inline as a bit array of length 64 and interpreted as a two's complement signed integer encoding then we can't use null for the special value, but we could have a second field storing the "has_been_initialized" boolean state for i. With that, we'd raise the dynamic error whenever we read it and get the special value (but we may then have to read two values), and whenever we are about to write it and see that it currently does not have the special value (so every write is preceded by a read, unless we can optimize that away based on a definite assignment analysis).

A deployed application could omit these checks, assuming that it is acceptable to have non-standard semantics (maybe this means that the variable starts out with the value zero, and there are no checks to enforce that it is initialized before use).

But it doesn't seem likely to me that the deployed application could maintain the precise semantics and eliminate those checks (or those extra "has_been_initialized" storage locations).

@lrhn
Member

lrhn commented Apr 5, 2019

@munificent

1.

The variable declarations will be lazy Foo var1; It is non-nullable, can be assigned by setUp, and it's an error to read it before the first write.

2.

Lazy fields do not need initializers. If they do not have one, they will throw if they are read before being written. It is as if the default initializer expression of a lazy variable is throw UninitializedError() instead of null.

Combinations

Lazy means initialized on first read.
No initializer means a default initializer of null for non-lazy variables and throw for lazy variables.
Final means you cannot assign.

That makes it a static type error to have no initializer on a non-lazy non-nullable variable because null is not assignable to the variable type, but on a lazy variable, the type of throw is Never, which is assignable to a non-nullable type.
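The typing argument can be checked in ordinary null-safe Dart, where `throw` already has static type `Never`. A small sketch (the function name is made up, and `StateError` is a stand-in for the `UninitializedError` mentioned above):

```dart
/// Returns [value] if it is present; otherwise throws.
int valueOrThrow(int? value) {
  // `throw` has static type `Never`, a subtype of every type, so it is
  // accepted where a non-nullable `int` is required. `null` would not be.
  return value ?? (throw StateError('read before initialization'));
}
```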

So, the only one that is wrong is:

lazy final int i;: No static error. Runtime error to assign more than once or read before assigning.

This could be a compile-time error for any non-instance variable, because it's an error to read it before it's assigned (lazy+no initializer) and it's a compile-time error to assign to it (final), so the variable is as useless as a local final int x; with no initializer. We can also allow it and it will always throw when read.
If it's an instance variable, it must be initialized by the constructor, so the lazy won't matter. (Unless we make lazy-final variables not need to be initialized like plain final variables, then you can leave it uninitialized and forever throwing, or you can initialize it in the initializer).

In any case, this is not "write once" semantics, and we do not have any "write once" variables.

Omitting errors

It's not an assertion error, it's a proper run-time error situation in the run-time semantics that has no alternative valid behavior. Omitting the throw means that the code has no meaning. The variable has no value, so reading it cannot return a value.

The reason for lazy variables throwing is that if they are non-nullable, they cannot have a value until they are assigned. It can't just be an assertion: reading them before that cannot possibly return any type-safe result (and the specification will not specify a type-unsafe behavior).
No Dart implementation can omit throws that the specification requires.
That goes for Dart2js omitting type errors, and they do that anyway. It's just not Dart semantics any more.

If Dart2js ignores this check and returns null, they are (again) not implementing Dart by omitting a specified throw, but then they are not type-safe to begin with, so I won't say whether that's good or bad.
It's just not Dart.

@leafpetersen
Member Author

I think some of this has been covered, but some additional context and comments.

There are a couple of different questions in play here:

  • What set of semantic choices from the list that @munificent described above do we cover?
  • Do we cover them using a lazy modifier only, or lazy + late?
  • What overrides are valid for lazy and/or late?
  • What cost semantics do we expect to be able to provide for these?
  • For lazy variables, do we do cyclic initialization checking?

Starting with the cost semantics. Throughout this discussion, unless otherwise specified, I'm going to assume that we are in a situation where we are not able to devirtualize the field read.

Question 1: Is there a performance difference between late int x; and lazy int x;

That is, for the non-final late init use case:

  • On the callee side, can we implement the getter for a late int x more efficiently than for lazy int x?
  • On the caller side, can we get any benefit from knowing that we are reading from something marked late instead of lazy?

On the callee side, I believe the implementations are equivalent.

On the caller side, I believe that in the absence of override restrictions, there is essentially no performance benefit to late over lazy, because:

  • any read of a field that you see marked as late still needs to go through the getter, since it might be overridden arbitrarily.
  • you could choose to, by convention, have an "unchecked" entry to use in the case where you are doing a read dominated by another read/write, which then anything which implements an interface with a 'lazy' or 'late' variable must provide with appropriate semantics (possibly just redirecting to the main entry point).

So in the absence of override restrictions, I don't see a perf benefit here to having late.

If you don't allow a late variable to be implemented/overridden by anything except a late variable (and similarly for lazy), then I believe that you can optimize late much more efficiently than lazy. You can also implement lazy more efficiently.

class A {
  late int x;   // We compile this to just a field, and use `null` as a sentinel value
  late int? y;  // We add a compiler private __y__sentinel value (or use getter)
  lazy int l;   // One approach:
                // We have a compiler private backing store __l__backing
                // We use `null` as a sentinel value
                // We must provide the getter as well
}

void test(A a) {
  a.x; // First read, compiles to `load(a.x)!`
  a.x; // Dominated second read, compiles to `load(a.x)`
  a.y; // First read, compiles to `if (!a.__y__sentinel) throw NullError; load(a.y);`
  a.y; // Dominated second read, compiles to `load(a.y)`

  a.l; // First read of lazy, could compile to `a.__l__backing ?? call_getter(a.l)` if we use a sentinel
  a.l; // Dominated second read, compiles to `load(a.__l__backing)`
}

This is all a bit speculative, but my read on it is that if we were to restrict overriding, then there are perf benefits to having both, otherwise no perf reason to have both.
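Written out by hand in plain null-safe Dart, the two sentinel strategies for the late case look roughly like this. It is a source-level emulation under my own naming (`_yIsSet`, `StateError`), not what a compiler would literally emit:

```dart
class A {
  // Emulates `late int x`: the declared type excludes null, so null can
  // double as the "not yet initialized" sentinel.
  int? _x;
  int get x => _x ?? (throw StateError('x read before initialization'));
  set x(int value) => _x = value;

  // Emulates `late int? y`: null is a legal value, so a separate flag
  // has to record whether a write has happened.
  int? _y;
  bool _yIsSet = false;

  int? get y {
    if (!_yIsSet) throw StateError('y read before initialization');
    return _y;
  }

  set y(int? value) {
    _y = value;
    _yIsSet = true;
  }
}
```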

Question 2: Do we do cyclic initialization checking on lazy?

In my initial draft spec, I am proposing removing cyclic initialization checking from lazy variables in general (i.e. existing toplevel and static fields), and specifically for the new lazy variables. I do not believe that the benefit of catching this error early is worth the cost: the implementation of the checking requires quite a bit of heavy mechanism (e.g. wrapping the evaluation of the initializer in a try catch, keeping an extra bit of state around to see whether you are in process of being initialized, etc). The code gets a fair bit smaller if you elide all of that. In the rare case that you actually do accidentally introduce an initialization cycle, you will almost certainly get a stack overflow immediately anyway, so the benefit of doing this checking seems close to zero to me.

However, there is one unfortunate side effect of this, which is that it is possible to do a cyclic read which does not cause an infinite loop. See, e.g. the code above. This is ok for non-final variables, but @lrhn was unhappy about the fact that this allows a final variable to be observed to have two different values. So per the referenced comment, we are proposing to make that a checked error for final variables. So the implementation of final lazy int? x = e is:

  var __x_backing_store = __private_sentinel;
  int? get x {
    if (__x_backing_store != __private_sentinel) return __x_backing_store;
    var tmp = e;
    if (__x_backing_store != __private_sentinel) throw DoubleWriteToFinal;
    return __x_backing_store = tmp;
  }

For locals, you don't need the check. For fields, in the fairly common case that e does not reference this, then you can eliminate the check. Unfortunately, for toplevel variables and static fields you can only eliminate the check if e is fairly trivial.
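Here is a runnable sketch of that check catching a cyclic read that terminates. The class `Cyclic`, the `_unset` sentinel, and the contrived initializer are all made up for the demonstration, and `StateError` stands in for `DoubleWriteToFinal`:

```dart
/// Private sentinel meaning "not yet initialized".
final Object _unset = Object();

class Cyclic {
  Object? _xBacking = _unset;
  int _entries = 0;

  /// Stand-in for the initializer `e`; its first evaluation reads `x` again.
  int? _initializer() {
    _entries++;
    if (_entries == 1) {
      x; // Re-entrant read: runs the initializer a second time.
      return 1;
    }
    return 2;
  }

  int? get x {
    if (!identical(_xBacking, _unset)) return _xBacking as int?;
    final tmp = _initializer();
    // If the backing store was filled while `e` ran, writing `tmp` now
    // would let a final variable be observed with two different values.
    if (!identical(_xBacking, _unset)) {
      throw StateError('double write to final variable');
    }
    _xBacking = tmp;
    return tmp;
  }
}
```

The first read of `x` throws, because the nested read already stored a value; later reads return that stored value.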

I would expect that dart2js would elide this check in production mode.

3. Can lazy final fields be assigned to?

I would assume no. It's final, after all. But Leaf's last example does that. Is that a mistake, or does final mean something different?

My example does not assign to any final fields. I don't see any mistakes in it. It does illustrate the fact that without cyclic initialization checking, you can end up initializing a final field to two different values.

Question 3: Given that context, do we disallow overriding late with non-late etc?

I'm very tempted by this, but my sense is that it's a bit un-Dart like to do so. If I have an interface that specifies that there is an int x field, it seems really useful to be able to implement that with a late or lazy concrete field. Otherwise the interface writer has restricted the possible implementations in a way that seems unfortunate.

So at a minimum, it seems to me that we want to allow overriding a non-lazy/late with a lazy/late.

Just allowing that direction might be reasonable, and it might actually not hurt optimization. You need to support non-lazy/late access, which means that you do need to provide the getter access path, but when you have an instance that has late/lazy fields, you can use the optimized path.

It's not clear to me how much expressiveness we lose by forbidding overriding a lazy/late with a non-lazy/late. If you really want non-lazy semantics or non-late semantics in a subclass, you might have to jump through small hoops:

class A {
  lazy int x;
  late int y;
}

class B extends A {
  int _x = foo(); // I really want this to be run at allocation time, but x has to be lazy, so I cache it here.
  lazy int x = _x;
  late int y = 3;  // We could just allow you to write this, I guess?
}

So this is a tenable position, and if we made this restriction, I would be more strongly in favor of having both.

Question 4: What combinations of semantics do we want to cover?

It is a hard requirement, from my standpoint, that we cover the use case of a non-nullable variable with no initializer that is initialized after allocation. We have copious evidence from other languages that this is a very useful feature.

There are three ways to get this:

  • With late only.
  • With lazy, where we allow you to write lazy int x; as shorthand for lazy int x = throw Uninitialized
  • With both lazy and late

It is a nice to have (but not a requirement) to have a way to write lazy fields and lazy locals. This is mostly orthogonal to NNBD.

  • This requires lazy

It is a nice to have (but not a requirement) to support the use case of final non-nullable variables that are not initialized in the constructor header.

  • This could be covered by allowing final lazy int x; which gets treated as a write once late variable.
  • This could be covered via final late int x; which gets treated as a write once late variable.

My take on the last two is that both are fairly niche, but that the first (lazy fields/variables) is probably more generally useful than the second (final late).

My take on the question of allowing final lazy int x; is that I'm ok with it, but at that point you're essentially assigning two different special case semantics to lazy variables based on finality and initializer, and that seems like at least one (and possibly two) too many. So if we want to cover that case, I'd lean towards having late and lazy both.

Summary

  • I'm fine with adding both modifiers
  • If we only have one, I'm mildly inclined towards it just being lazy, and not covering the final lazy int x; use case
  • I'm open to the possibility of disallowing overriding lazy/late with non-lazy/late, but probably not to disallowing the other direction
  • I still prefer getting rid of the cyclic initialization checking

Thoughts, comments, corrections welcome.

@rakudrama
Member

Q2: "For fields, in the fairly common case that e does not reference this, then you can eliminate the check."
This is hard to prove since, once you are past the eager initializers, this can have an alias.

@munificent
Member

So, the only one that is wrong is:

lazy final int i; No static error. Runtime error to assign more than once or read before assigning.

This could be a compile-time error for any non-instance variable, because it's an error to read it before it's assigned (lazy+no initializer) and it's a compile-time error to assign to it (final), so the variable is as useless as a local final int x; with no initializer.

Yeah, I think this is the least compelling of the combinations I went through. What I had in mind is that it would let you write code like:

class Cache {
  lazy final Object _value;

  void cache(Object value) {
    _value = value;
  }

  Object get() => _value;
}

The intent is that you have a field that can't be eagerly initialized, but you only want to allow it to be initialized once. Today, I usually write that using an explicit assert:

  void cache(Object value) {
    assert(_value == null, "Can only cache once.");
    _value = value;
  }

This shows up fairly frequently when you have cyclic references between objects. You want those to be immutable, but only one can be actually eagerly final. This would give you a more graceful way to express that.

Do we cover them using a lazy modifier only, or lazy + late?

I may have been unclear in my comment, but by late I wasn't suggesting adding another keyword with different "lateinit-like" semantics. I was simply suggesting using the keyword late instead of lazy because some of the combinations where it comes into play don't intuitively seem "lazy" to me.

My example does not assign to any final fields. I don't see any mistakes in it.

Dumb mistake on my part. I misread first as field. Oops!

It is a hard requirement, from my standpoint, that we cover the use case of a non-nullable variable with no initializer that is initialized after allocation. We have copious evidence from other languages that this is a very useful feature.

+1. And from our own code. Look at any test inside Google that contains a call to setUp() and you'll see lots of examples of it.

at that point you're essentially assigning two different special case semantics to lazy variables based on finality and initializer, and that seems like at least one (and possibly two) too many.

That was a concern of mine too. That's why I tried to come up with a new plausible explanation for each modifier such that those explanations do roughly compose and produce the semantics I sketched out for each combination. I think the explanations I came up with for lazy (which may or may not be spelled late) and final more or less work OK, but I'm curious what other people think.

@lrhn
Member

lrhn commented Apr 8, 2019

I'm not particularly worried about overriding lazy fields with non-lazy fields or vice-versa, overriding fields with fields of the same name is likely very rare (and mostly a mistake).

I do care about allowing a getter/setter declaration to override any getters/setters introduced by a field.
So, it doesn't matter whether your field is lazy or not, I do want to be able to override it with a getter declaration. If that is the case, then there is no reason to prohibit overriding with a field because I can always do:

class SubClass extends SuperClass {
   // can't override lazy field foo?!?
   int _myFoo;
   int get foo => _myFoo;
   void set foo(int value) { _myFoo = value; }
}

So, I don't see any value to a restriction around that which doesn't preclude getters and setters, and I do not want to prohibit getters and setters.
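For reference, this is what that workaround looks like in Dart that compiles today, with an ordinary field in the superclass standing in for the lazy one (the class bodies and initial values are illustrative only):

```dart
class SuperClass {
  int foo = 0; // Stand-in for the field a subclass wants to replace.
}

class SubClass extends SuperClass {
  int _myFoo = 10;

  // A getter/setter pair overrides the accessors the field introduced.
  @override
  int get foo => _myFoo;

  @override
  set foo(int value) {
    _myFoo = value;
  }
}
```

Callers going through the `SuperClass` interface see only the subclass storage, so a rule against field-with-field overrides would not actually stop anyone.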

About the missing "late-init final" write-once case, it is a useful case and we are not covering it with the "lazy" modifier.

The late write-once case is not covered by Dart today, and the change to non-nullable types is mostly orthogonal to the feature. We need to do something for the non-nullable-variable-without-initializer case which is allowed by Dart today (because everything is nullable), but which won't be allowed under NNBD. The lazy modifier allows that use case, by introducing a default initializer which throws. We also needed to modify the current lazy behavior used by static fields because it initialized to null on a throwing initializer, which would again not work under NNBD.

So, the changes to lazy, and the allowing it in new places, were necessitated by the NNBD change, which is why we are planning them now.

The "write-once" variable is not made necessary by NNBD. If we can introduce it at the same time, and we have a good, consistent syntax and semantics for it, and the necessary time to implement it, then that's fine. If not, we are still no worse off than we already are, and we can add the feature at a later time.

@leafpetersen
Member Author

overriding fields with fields of the same name is likely very rare

It's not that rare. We did that experiment, remember... :(

(and mostly a mistake)

Won't argue with that... :)

So, the changes to lazy, and the allowing it in new places, were necessitated by the NNBD change, which is why we are planning them now.

This isn't really true. The most direct way to solve the NNBD issues is to add late. The lazy feature is almost entirely orthogonal to NNBD. We just happened to find a way to tack on a solution to the NNBD problem (allow lazy int x; to mean the same thing as late int x;) using lazy, and we decided that we wanted lazy anyway. So the question of whether to solve the NNBD issues with lazy vs late is entirely an open one.

I think I am of the opinion that if I can only have one of lazy int x = e; or late final int x; as a byproduct of solving the NNBD related issue, that it is probably a more generally useful feature to have lazy fields over final late fields. But I'm not 100% convinced. In particular, lazy int x; is not an entirely intuitive way to say "non-nullable variable which can be assigned at some later time and will be checked on reads", and late final int x does have interesting use cases.

@leafpetersen
Member Author

Following up with some notes from discussion with @rakudrama . There's an interesting initialization pattern that he sees in code (particularly angular code) where a number of fields are initialized outside of the constructor, in one location, and in sequence:

 build(arguments) {
  c.x = something;
  c.y = somethingelse;
  c.z = everything;
}

Making these final would have optimization benefits elsewhere (you can eliminate redundant loads).

You could optimize the initialization code very nicely, since you only need to check the first write (if the first write was not done, you haven't entered this code, and if it was done, then it's an error).

@leafpetersen
Member Author

leafpetersen commented Apr 10, 2019

Based on discussion, I think I'm inclined towards the following:

  • We do an initial spec supporting both lazy int x; and final lazy int x;
  • We get some experience with it
  • We consider trying to get some UX work done on this
  • Based on that, we do one of:
    • Keep this syntax
    • Replace both lazy int x; and final lazy int x; with the late versions
    • Remove final lazy int x; entirely

I still have some doubts about re-using lazy. I think getting some UX input on this would be great if we could.

@lrhn
Member

lrhn commented Apr 10, 2019

True, the problem we need to solve for NNBD is the ability to have non-nullable variables with no immediate initializer, which are then initialized later. Both lazy and late solve that problem, and late is more directly aimed at that particular problem.
On top of that, final late int x; allows write-once semantics, which we don't have now, and lazy var x = StringBuffer() allows initialize-on-use that would otherwise be done by x ??= StringBuffer() where the variable is read.
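The `??=` idiom referred to here, as it is typically written by hand (the class and member names are invented for the example):

```dart
class Report {
  StringBuffer? _buffer;

  /// Creates the buffer on first use and returns the same one afterwards.
  StringBuffer get buffer => _buffer ??= StringBuffer();

  /// True once something has asked for the buffer.
  bool get hasBuffer => _buffer != null;
}
```

An initialize-on-use modifier would fold the nullable backing field and the getter into a single declaration.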

So, both solve the immediate problem, both have extra uses, and it may or may not be too confusing to add both.

Or we could combine them into one feature (using the word late because doing things late is still lazy, but it sounds better):

int x; // Compile-time error (unless instance variable initialized by all constructors).
int? x; // Allowed, eagerly initialized to null.
int x = 2; // Eagerly initialized to 2.
int? x = 2; // Eagerly initialized to 2.
final int x; // Compile-time error (unless instance variable initialized by all constructors).
final int? x; // Compile-time error (unless instance variable initialized by all constructors).
final int x = 2; // Eagerly initialized to 2. Cannot be written.
final int? x = 2; // Eagerly initialized to 2. Cannot be written.

late int x; // Throws when read, until written.
late int? x; // Throws when read, until written.
late int x = 42; // Initializes when read, unless written first.
late int? x = 42; // Initializes when read, unless written first.
final late int x; // Throws when read, until written, can only be written once.
final late int? x; // Throws when read, until written, can only be written once.
final late int x = 42; // Initializes when read, cannot be written.
final late int? x = 42; // Initializes when read, cannot be written.

The final late-no-initializer case stands out as the only one with "can only be written once".
That write can be a constructor initializer, but you only need the late if there is some construction which doesn't write the value.
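A sketch of the two final rows, assuming a null-safe Dart SDK in which this landed with the modifier order `late final` and the semantics in the table. The class `Pair` and the helper `throwsOnRun` are made up, and the helper deliberately catches anything rather than naming a specific error type:

```dart
class Pair {
  // No initializer: throws when read until written; only one write allowed.
  late final Pair other;

  // With initializer: evaluated on first read, and cannot be assigned.
  late final int id = _nextId++;

  static int _nextId = 0;
}

/// Returns true if running [action] throws anything.
bool throwsOnRun(void Function() action) {
  try {
    action();
    return false;
  } catch (_) {
    return true;
  }
}
```

With a fresh `Pair p`, `throwsOnRun(() => p.other)` should be true before the first write, and `throwsOnRun(() => p.other = p)` should be true after it.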

There is the option of allowing a single write to final late-with-initializer. I think it would be confusing, though, because you see final ... x = 42; and then you assume that x can only have the value 42. Having to recognize the late and deducing that the value could be changed anyway seems like a foot-gun.

@munificent
Member

I think your table is exactly the same as all of the cases I suggested/inferred here, so those all look great to me.

Having to recognize the late and deducing that the value could be changed anyway seems like a foot-gun.

I could be wrong (and it would be great to get some UX data on this), but I'm not too worried about this. A reader would hopefully see late there and assume the author put it there for a reason, so they know something interesting is going on.

I think it would have been a much worse foot gun to infer lateness from the nullability of the type because then it's really non-obvious what's happening.

@munificent
Member

munificent commented Apr 15, 2019

No, top-level variables and static fields would remain implicitly late/lazy as they are now. (The semantics would be very slightly different in error cases where you have cyclic references.) This is more or less necessary, since there is no "eager" order that we could evaluate them in. The top level of a Dart library is not "executed" top-down. All declarations are, well, declarative, and happen "simultaneously".

Also, implicit laziness is a good thing for these in terms of program startup time. (Many years ago, there was discussion of also making instance fields be implicitly lazily initialized, but the language team felt that would be too surprising to users coming from other languages.)

@munificent
Member

assume that for global variables, "late" is the default.

I don't think Leaf and Lasse are proposing to make top-level and static variables behave exactly like an instance variable with late would behave. Just that they continue to be "lazy" like they are now.

Making them behave exactly as if they implicitly had late/lazy might be nice in terms of symmetry and consistency, but I don't think it would do what users actually want. In particular, the combination of final and (implicit) late/lazy is strange.

With an instance field or local variable, you see both the final and late/lazy, so it's clear you are opting in to the combination of those features. In particular, you bothered to write late/lazy, which means you went out of your way to turn some static checking around initialization into runtime checking.

I don't think that implication carries over to implicit late/lazy. If you just write:

int i;

I don't think it's reasonable to assume that you don't want a compile-time error that you forgot to initialize it.

@lrhn
Member

lrhn commented Apr 18, 2019

True. I think you should be able to write lazy on top-level/static variables and get the same behavior as local/instance lazy variables, but if you omit the lazy, you get the same static behavior as a non-lazy variable, except that the initializer is evaluated lazily:

int x; // Compile-time error.
int? x; // Allowed, eagerly initialized to null.
int x = 2; // Lazily initialized to 2.
int? x = 2; // Lazily initialized to 2.
final int x; // Compile-time error.
final int? x; // Compile-time error.
final int x = 2; // Lazily initialized to 2. Cannot be assigned to.
final int? x = 2; // Lazily initialized to 2. Cannot be assigned to.

@leafpetersen
Member Author

Filed a discussion issue on the question of late x vs late var x here.

@lrhn
Member

lrhn commented Apr 18, 2019

Is it final late x; or late final x; or either way?

I'd be fine with either way. I'm not sure which one I would prefer if we have to pick one order.

@leafpetersen
Member Author

I'm proposing late final x in the feature spec PR, based mostly on:

  • Asking a few people which they preferred
  • The analogy to late var x vs var late x
  • The possibility that we go with late var x instead of late x (and I definitely don't like var late x)

I also don't have super strong feelings though.

@leafpetersen
Member Author

Ok, I think this is fully resolved, at least until we get more data from prototyping.

@rrousselGit

Playing around with the preview of NNBD, I've found a corner case where late variables are not sound:

void main() {
  late final int a;

  void cb() {
    a = 42;
  }

  print(a); // null
}

or:

void main() {
  late final int a;

  void cb() {
    a = 42;
  }

  cb();
  cb(); // assignment performed twice
}

Is this intended?

From my understanding, the compiler shouldn't allow assigning late final variables inside local functions/closures.

Nor should the compiler consider that the late variable is initialized if the only init is performed by a local function/closure.

@lrhn
Member

lrhn commented Dec 15, 2019

Yes, it is intended.

Late variables are not sound. At all.
They are the "dynamic" of initialize-before-use/definite-assignment, intended for the cases where the programmer knows, but can't otherwise convince the compiler, that the variable is indeed always initialized before it is read. So it is certainly possible to read them before they are initialized, and the compiler will let you unless it's absolutely 100% certain the variable cannot possibly be initialized yet.

You are allowed to assign a late final variable anywhere, except if the compiler can say with absolute 100% certainty that the variable is already initialized. The compiler isn't that clever about it.
It's up to you to ensure that you only initialize each late final variable once, and if you do it more than once, it's a run-time error. There is no problem initializing it inside a local function (or inside multiple local functions), and the compiler won't try to track whether you call those functions more than once. That's on you.

The compiler also doesn't consider the variable initialized. It considers it possibly initialized, which is why it would allow you to read it. There exists an assignment, and the compiler isn't clever enough to rule out that it has been executed, and since you said the variable was late, it assumes you know what you are doing.
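As a rough illustration of the run-time check on repeated assignment, here is a sketch assuming released null-safe Dart. The function name `assignTwice` is hypothetical, and the error is caught as a plain `Error` because the concrete late-initialization error class is not part of the public core API:

```dart
String assignTwice() {
  late final int a;

  void cb() {
    a = 42; // Accepted statically: calls to cb are not tracked.
  }

  cb(); // First assignment succeeds.
  try {
    cb(); // Second assignment to a late final local throws at run time.
  } on Error {
    return 'second assignment rejected, a = $a';
  }
  return 'unexpected: second assignment succeeded';
}

void main() {
  print(assignTwice()); // second assignment rejected, a = 42
}
```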

So, in short:

  • A non-late variable must be definitely assigned before you are allowed to read it.
  • A non-late final variable must be definitely unassigned when you try to assign it.
  • A late variable must not be definitely unassigned when you read it. (Insert "So you're telling me there is a chance" meme here).
  • A late final variable must not be definitely assigned when you try to assign to it.

A late variable with an initializer is considered definitely assigned.
A non-late variable with an initializer, or with a definitely nullable type, is considered definitely assigned.

@rrousselGit

I see.

Then is there a plan to infer the situations where it's obvious that we're doing something illogical?

For example I've also found we can do:

late final a;
a = 0;
a = 1;

or:

late final a = 0;
a = 1;

@lrhn
Member

lrhn commented Dec 15, 2019

Yes.

There is a "definite assignment"/"definite unassigned" analysis which tries (best effort, not too clever) to recognize when a variable is definitely assigned, when it's definitely not assigned, or when it's potentially either (can't say for sure). The last one is what happens if your code is too clever for the analyzer to figure out, and probably what happens for all non-local variables except in very clear cases.

For a case like late final a; a = 0; a = 1;, the analysis would recognize at the second assignment that a is definitely assigned by the previous statement, and disallow assigning it again.
A variable with an initializer expression is always definitely assigned. So is a non-late variable with a nullable type (it's always at least null).

The difference between late and non-late variables is what happens in the potentially assigned/unassigned case.

A non-late variable cannot be read if it's only potentially assigned; it must be definitely assigned. A late variable can be read unless it's definitely unassigned. In that case, the compiler trusts that you know what you are doing, but adds a run-time check if it's not definitely assigned, just to be sure.

A final non-late variable can only be assigned if it's definitely unassigned. A final late variable can be assigned unless it's already definitely assigned. Again, in that case, the compiler trusts you, but adds a run-time check if the variable isn't definitely unassigned.

I'm not sure about the current status of the implementation of this analysis. I believe most of it is working, but there can easily be edge cases we haven't covered yet.
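The difference in the "potentially assigned" case can be sketched as follows, assuming released null-safe Dart. `pick` and `readLate` are hypothetical names, and the failed late read is caught as a generic `Error`:

```dart
int pick(bool flag) {
  int value; // Non-late: must be definitely assigned before any read.
  if (flag) {
    value = 1;
  } else {
    value = 2;
  }
  return value; // OK: assigned on every path reaching this point.
}

String readLate(bool assign) {
  late int value; // Late: may be read while only potentially assigned.
  if (assign) {
    value = 7;
  }
  try {
    return 'read $value'; // Run-time check: throws if never assigned.
  } on Error {
    return 'late read failed';
  }
}

void main() {
  print(pick(true)); // 1
  print(readLate(true)); // read 7
  print(readLate(false)); // late read failed
}
```

Dropping the `late` modifier in `readLate` would turn the read into a compile-time error, since `value` is then only potentially assigned.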

@leafpetersen
Member Author

#750

@rajkananirk

How can the fields be assigned in this code? https://laratuto.com/non-nullable-instance-field/

@lrhn
Member

lrhn commented Feb 7, 2022

@rajkananirk
The linked code is

class Question {
  String questionText;
  bool questionAnswer;

  Question({required String q, required bool a}) {
    questionText = q;
    questionAnswer = a;
  }
}

The canonical way to write this in Dart, before and after null safety, is

class Question {
  final String questionText;
  final bool questionAnswer;

  Question({required String q, required bool a}) : questionText = q, questionAnswer = a;
}

(I'd question the variable naming too. I'd probably go with:

class Question {
  final String question;
  final bool answer;

  Question({required this.question, required this.answer});
}

but I can see that "q" and "a" are common abbreviations that make sense in the context.)
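For contrast, when a field genuinely cannot be set in the initializer list (the situation this issue is about), a late final field is the alternative. A sketch only: the `resolve` method and the `demo` helper are hypothetical and not part of the linked code.

```dart
class Question {
  final String question;
  late final bool answer; // Set once after construction; an earlier read throws.

  Question({required this.question});

  // Hypothetical second-phase initializer.
  void resolve(bool value) {
    answer = value;
  }
}

String demo() {
  final q = Question(question: 'Is Dart null safe?');
  q.resolve(true);
  return '${q.question} ${q.answer}';
}

void main() {
  print(demo()); // Is Dart null safe? true
}
```

This trades the compile-time guarantee of the initializer-list version for a run-time check, so the initializer list remains preferable whenever the value is available at construction.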
