Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we provide a way to query the status of late variables? #324

Closed
Tracked by #110
leafpetersen opened this issue Apr 23, 2019 · 26 comments
Closed
Tracked by #110

Should we provide a way to query the status of late variables? #324

leafpetersen opened this issue Apr 23, 2019 · 26 comments
Labels
nnbd NNBD related issues

Comments

@leafpetersen
Copy link
Member

As proposed, the only way to query the state of a late initialized variable to see if it has been written would be to try to read it, and catch the resulting error. We explicitly discourage catching errors, so this seems non-ideal. Should we provide this functionality? And if so, how do we expose it?

For comparison, Kotlin provides a way to do this via reflection.

If we do provide this, then what is the right behavior for it on late int x = 3 + 3? Is this variable considered initialized before the first read/write (since it has an initializer)? Or is it considered uninitialized since the initializer has not yet been run (and may in principle never complete with a value and cause the variable to actually be initialized)?

@leafpetersen
Copy link
Member Author

In Kotlin, you can do it without reflection

Not really, at least as I understand it. The .isInitialized property is defined on property references which are part of the reflective API.

@munificent
Copy link
Member

munificent commented Apr 23, 2019

My quick answer is "no", let's try to get by without it, at least for now. If you need access to that bit, you can always not use late and (more or less) implement the same behavior yourself manually with a "was initialized" field you can control:

// Instead of:
class Memo<T> {
  late T value_;

  void store(T value) {
    value_ = value;
  }

  T getOrDefault(T defaultValue) {
    if (<magic...>) return value_;
    return defaultValue;
  }
}

// Do:
class Memo<T> {
  T? value_;
  bool hasValue_;

  void store(T value) {
    value_ = value;
    hasValue_ = true;
  }

  T getOrDefault(T defaultValue) {
    // Can't check for null value_ here since T might be a nullable type.
    if (hasValue_) return value_!;
    return defaultValue;
  }
}

My hunch is that this need is rare enough that the workaround isn't too onerous. It might be nice to eventually provide access to the state somehow, but that tiptoes towards something meta-programmy around working with identifiers as storage locations instead of as the values they contain. I'm not sure if we're ready to bite that off just now. This is a feature I think we could add later without undue pain today.

Here's some more stuff that might be related:

Detecting optional parameter passage

We have an analogous issue around detecting if an optional parameter was explicitly passed or not. The language does track that extra bit of data because it needs it to determine whether the default value should be used or not. (I hoped we could eliminate that bit of magic in Dart 2.0, but alas.) Consider:

original([param = "some secret value only original knows"]) {
  print("original got $param");
}

forward([param]) {
  original(param);
}

main() {
  original();
  original(null);

  forward();
  forward(null);
}

There's no way to implement forward() such that a call to it behaves like an unwrapped direct call to original(). Direct calls to original() pass a hidden bit of "was param passed?" state that forward() has no way to capture on calls to itself.

Ancient Dart used to support this with a ?parameter operator. The idea was that this would let you forward parameters. The above then becomes:

forward([param]) {
  if (?param) {
    original(param);
  } else {
    original();
  }
}

The problem, of course, is that it's scales exponentially (!) in the number of parameters you want to forward. To forward a function with five optional parameters, you have to write out all 32 combinations.

The operator was killed but, unfortunately, the hidden state wasn't.

I bring this up because it's another place where the runtime tracks state that the user can't access. If we want to solve it for late variables, we may want to see if that solution can cover this too.

A third potential use is conditional logic mixed with definite assignment for locals. I think we want to support:

int i; // Non-nullable.

if (condition) {
  i = 1;
} else {
  i = 2;
}

i.isEven; // OK.

It's not inconceivable that a user might want to write something like:

int i;

if (condition) i = 1;
stuff();
if (another) i = 2;
moreStuff();
if (<i wasn't caught by either of the above two cases) i = 3;

i.isEven;

Capturing a parameter's name

When you throw an ArgumentError, it's useful to include the name of the argument in the error message:

sqrt(int i) {
  if (i < 0) throw ArgumentError.value(i, "i", "Must be positive.");

  // ...
}

Right now, you use a string literal, which is brittle in the face of typos and parameter renames. C# 6.0 added a nameof() notation to extract the name of the given identifier as a string:

sqrt(int i) {
  if (i < 0) throw ArgumentError.value(i, nameof(i), "Must be positive.");

  // ...
}

You can also use it to capture other named elements like classes and maybe members. It is unfortunate that the syntax is much longer than the string literal.

A C programmer would say this is a good place to use the preprocessor and the stringizing operator.

@kevmoo
Copy link
Member

kevmoo commented Apr 23, 2019

Another vote for "no" – worried that we'd lose the ability to optimize this in many contexts if we need to be able to query the value.

Revisit if there's a screaming user requirement.

@lrhn
Copy link
Member

lrhn commented Apr 24, 2019

Also "no" from me.

It smells like reflection, and it's at a local variable level, where we never did reflection before.

Giving that ability would mean that we are forced to delay evaluation of late x = 3 + 3; because the specification says that the initializer is evaluated on the first access. The compiler can totally recognize that the initializer is constant and convert it to var x = 6; with no late-ness overhead. I guess that's essentially Kevin's optimization concern again.

And if our experience with allowing you to query a parameter for being passed or not was anything to go by, this ability might cause more problems than it solves. We will likely see late bool x; used as a tri-value state (with isAssigned(x) == false being the third), rather than just doing the proper thing.
It smells just like a different kind of null, except that it stacks with that, so now you can have late bool? x; as a four-valued variable ("unknown", "yes", "no", "neither"?)

So, too scary for me.

(I do want a ##i operator that "symbolizes" an identifier in scope).

@munificent
Copy link
Member

It smells like reflection, and it's at a local variable level, where we never did reflection before.

It's worse than that. If it was just for locals, we could easily statically tell if the query capability was ever used and if not still optimize the lateness away when possible. But because public fields can be marked late, a modular compiler can't tell if there isn't some other code out there doing someObject.someField.isInitialized.

We will likely see late bool x; used as a tri-value state (with isAssigned(x) == false being the third)

Everyone knows the correct three states are true, false, and fileNotFound.

@leafpetersen
Copy link
Member Author

Ok, this seems like a clear no.

@leafpetersen
Copy link
Member Author

I want to re-open this at least briefly to discuss this again in light of some experience with the migration. In migrating the core libraries, we have a number of examples of code that ends up being migrated to look like the following:

E singleWhere(bool test(E element), {E orElse()?}) {
    late E result;
    bool foundMatching = false;
    for (E element in this) {
      if (test(element)) {
        if (foundMatching) {
          throw IterableElementError.tooMany();
        }
        result = element;
        foundMatching = true;
      }
    }
    if (foundMatching) return result;
    if (orElse != null) return orElse();
    throw IterableElementError.noElement();
  }

This isn't terrible, but it does have some redundancy in that you are forced to explicitly represent the state of the late variable (which must be tracked by the compiler anyway) .

Does this example cause anyone to change their mind about this?

@matanlurey
Copy link
Contributor

Isn't this just the wrong application of late? I'd thought to write it as:

E singleWhere(bool test(E element), {E orElse()?}) {
    E? result;
    for (E element in this) {
      if (test(element)) {
        if (result != null) {
          throw IterableElementError.tooMany();
        }
        result = element;
      }
    }
    if (result != null) return result;
    if (orElse != null) return orElse();
    throw IterableElementError.noElement();
}

@eernstg
Copy link
Member

eernstg commented Nov 1, 2019

The tricky bit here is that E can be nullable, which means that null cannot be used as evidence for not being initialized.

@lrhn
Copy link
Member

lrhn commented Nov 1, 2019

Agree, you would still need the extra boolean, but it's true that it's duplicate effort to have an extra "isInitialized" boolean and a late field. So:

E singleWhere(bool test(E element), {E orElse()?}) {
    E? result;
    bool foundMatching = false;
    for (E element in this) {
      if (test(element)) {
        if (foundMatching) {
          throw IterableElementError.tooMany();
        }
        result = element;
        foundMatching = true;
      }
    }
    if (foundMatching) return result as E;
    if (orElse != null) return orElse();
    throw IterableElementError.noElement();
  }

would be the corresponding non-late approach.
It contains an unnecessary E cast instead of an unnecessary late-init-check. Not sure which one is cheaper.

Another approach is to do it with two loops:

E singleWhere(bool test(E element), {E orElse()?}) {
  var it = this.iterator;
  while (it.moveNext()) {
    E result = it.current;
    if (test(result)) {
      while (it.moveNext()) {
        if (test(it.current)) {
          throw IterableElementError.tooMany();
        }
      }
      return result;
    }
    if (orElse != null) return orElse();
    throw IterableElementError.noElement();
  }

We are storing a state into a variable instead of keeping it in the control flow. The value of result depends on that state, but we don't have dependent types (and we don't want to allocate an Option like value to combine the state and value).

@eernstg
Copy link
Member

eernstg commented Nov 4, 2019

It would be convenient to be able to use the data that the implementation must maintain for a late variable without initializer:

E singleWhere(bool test(E element), {E orElse()?}) {
    late E result;
    for (E element in this) {
      if (test(element)) {
        if (result.isInitialized) {
          throw IterableElementError.tooMany();
        }
        result = element;
      }
    }
    if (result.isInitialized) return result;
    if (orElse != null) return orElse();
    throw IterableElementError.noElement();
  }

This code is nicer than the code that we'd use today:

We avoid allocating the extra variable bool foundMatching = false; and we avoid maintaining it along the way. We would also have the added benefit that a lint could be raised at if (result.isInitialized) if static analysis could prove that result.isInitialized is always true at that point, or always false. So it is tempting to consider a mechanism like this.

Given that it is semantically quite different from a member access, we might prefer a specialized syntax for it. However, no specialized operators seem to work really well, and we could also treat isInitialized as a "magic" language-defined extension getter which is only available on a late variable with no initializer, and only when no other member with that basename is available (that is, no instance members and no extension members).

This would also make it non-breaking, and we could add it at any point in the future where we have the resources to do it.

PS: I don't think it smells like reflection, it smells much more like using a resource which is guaranteed by the language semantics to be available anyway, and it's definitely not costly in a way that resembles reflection, so why not. ;-)

@lrhn
Copy link
Member

lrhn commented Nov 4, 2019

My gut feeling is that we are breaking an abstraction, and that is always bad. In the long run it may cause us as much anguish as the "is optional parameter passed or not" query would have.

If a late variable needs to be queried, it's no longer just a late variable, it's an optional variable. Use a nullable variable if the type is not nullable, or an Option class if it is, or use your program state, or do something else to know that it has been initialized. Don't try to fit a square peg in a round hole. This is not what late variables are for.

@eernstg
Copy link
Member

eernstg commented Nov 5, 2019

breaking an abstraction

Why wouldn't it be just as reasonable to consider a late variable along with its magic isInitialized getter to be an abstraction? Part of the motivation is exactly that we can use isInitialized without thinking about whether it's implemented via a special value (a variable whose type is non-nullable can use null) or a separate boolean variable (when the type is potentially nullable).

The x? query on an optional parameter was different in that there is no guarantee that a reasonable implementation of the rest of Dart will provide support for determining whether that parameter was passed on omitted, and there was no appetite for investing a non-trivial amount of overhead into supporting it (e.g., by having an extra boolean named parameter for each named parameter).

With a late variable with no initializer the language does promise to be able to determine dynamically whether an initialization has taken place (such that we can throw if we're reading it too early, and we can throw if a final late is "initialized twice"), so we can support isInitialized without incurring an overhead. It's actually a manually re-implemented boolean variable like foundMatch that amounts to an overhead, both in that it takes space to allocate and time to update, and also in that it could be maintained incorrectly—so we might as well use an abstraction to ensure correctness and save the time & space. ;-)

@munificent
Copy link
Member

munificent commented Nov 5, 2019

It does feel weird that the runtime has to track some state that the user can't access. At the same time, I think using late should be an implementation detail of a library. If we allow any user that has access to the name of some member to also access the magic "check late bit" API, then the fact that the member uses late is now part of its API:

// foo.dart
late int i;

// main.dart
import 'foo.dart';
main() {
  print(i.isInitialized);
}

Here, if the maintainer of "foo.dart" removes late from i, they will break "main.dart". The nice thing about the old ? operator for parameters was that it was at least implicitly encapsulated because it only applied to parameters, which naturally have local scope.

We could say that you can only use the magic "is initialized" API inside the library where the member is declared, but in that case, it pushes even more towards having a different syntax instead of it looking like a getter.

But perhaps the simpler way to think about this problem is: If you don't like that the runtime has a bit you can't access, instead of adding a way to access it, you can just choose to not use late for that case and get rid of the hidden bit entirely.

@leafpetersen
Copy link
Member Author

Thinking about this, an additional concern is that this potentially forces the compiler to maintain the "isInitialized" state even when it is not otherwise needed (especially in a modular setting). For example, I expect that in many cases the compiler will be able to prove that a late field is initialized before it is ever accessed, in which case it should just implement it as a normal field. If there is a possibility that some other part of the program will query the 'initialized' status of that field however, it becomes harder to eliminate that state.

@eernstg
Copy link
Member

eernstg commented Nov 6, 2019

@leafpetersen wrote:

prove that a late field is initialized before it is ever accessed

How would we prove statically that a field with no initializer is always initialized before it is accessed, if that isn't because it always gets initialized in a constructor initializer list? In that case I don't think there is any reason for it to be late.

@tatumizer wrote:

isInitialized call can be supported for any field,

If anybody is worried that this could be confusing, we can always lint v.isInitialized when it is trivial (e.g., when v is not late, or when v is a local variable which is 'definitely assigned' or 'definitely not assigned' at the point where v.isInitialized occurs).

@lrhn
Copy link
Member

lrhn commented Nov 6, 2019 via email

@eernstg
Copy link
Member

eernstg commented Nov 6, 2019

If you can see that this is not leaked before the late field is
initialized in the body of the constructor, then it can also be optimized.

Sure, if you're doing it really, really early in that constructor body. ;-)

class A {
  late int i, j;
  A() {
    i = 1; j = 2;
  }
}

class B extends A {
  set i(value) {
    print(super.i);
    print(j);
    super.i = value;
  }
}

main() => B();

(The tools don't currently implement enough to handle this code, but dart --enable-experiment=non-nullable runs and prints 'null' twice; it should throw instead when the uninitialized storage location is read by the A.i getter.)

@leafpetersen
Copy link
Member Author

From discussion this morning, we will not provide support for this for now. There are too many concerns about this becoming part of the API, and about implications for optimizability, as well has how to present this nicely to the user. Adding an "external" extension method would be one approach, but this is unappealing, since without extensive custom static error checking, the method would be applicable in nonsensical places.

If we choose to do this in the future, there is an expressed preference to instead make this state available via a "pseudo variable/field" in the syntax, something along the lines of:

  late (int x, bool isInitialized);

or perhaps

  late int x {isInitialized};

@Levi-Lesches
Copy link

I know this was closed a while ago, but don't the issues described go away if the feature is implemented like that?

  1. If the library author doesn't want isInitialized to become a part of the API, then they can just declare it as _isInitialized.
  2. As far as optimization goes, I don't see how this would be any different than a regular bool field. And if declaring isInitialized becomes optional in the first place (which the second example seems to imply), then the library author would have to explicitly declare that the value is more important than optimizing it away.
  3. And again, the issues of how the variable can be accessed outside the class become irrelevant if isInitialized is treated as just another variable/field. The only issue I can find with this is that it adds a non-final bool field to the class (which is clearly intended), which might trip up const constructors and the @immutable annotation, but again, that's how classes with non-final fields behave anyway.

I find myself writing the code with a separate bool value that I have to maintain quite often, and if the runtime can do that for me then that's great.

@lrhn
Copy link
Member

lrhn commented Jun 12, 2020

I do think the API based problems go away if we, say, allow late{_initX} int x; to be queried for initialization as _initX.
It doesn't remove the the optimization based problems, and I personally don't think the feature would carry its own complexity weight. It's not that important, but we'd introduce a completely new syntax which only works to expose a bool getter.

I think it's actually better, on average, to not do anything here.

Now, if we had a different syntax for getter/setters, maybe one where you could encapsulate a field or other propertie, then it could fit in more easily.
Say:

int x { // declare setter and getter with same name and type easily.
  get => _x;
  set => _x = it;
}
late int y { // Use *same kind* of block syntax for init property of late variable.
  init _init;
}

Then the syntax would not be as foreign.

Even then I still think exposing initialization is a breach of abstraction and will prevent perfectly good optimizations, and that's reason enough to not do it.

@Hixie
Copy link

Hixie commented Aug 19, 2020

FWIW I ran into sort of wanting this just now. I don't think we should have a way to actually do it.

What I had was a field that I'd only set once, to a non-null value, and later I would either use it if it was non-null or else not use it. I could express this as a nullable mutable field, but the mutability doesn't really convey that it should only be set once. I could use a late final nullable field, but then I'd have to explicitly set it to null in the constructor, which is awkward. Maybe that's the right answer though.

@lrhn
Copy link
Member

lrhn commented Aug 20, 2020

We don't have a way to represent "initialized to null and then set once to non-null" (would that be "single promotion" rather than "single assignment"?). That's just a nullable variable.

Using late final means that it's either uninitialized or initialized, but we have no way to tell which one from the outside. You can't initalize that in a constructor, and still be able to set it to non-null later.

@Hixie
Copy link

Hixie commented Aug 20, 2020

In the case I had, if it was ever initialized, it would be done in the constructor. I worked around it by adding an "else" branch that initialized it to null explicitly. It's a bit ugly but it works with late final.

@xuanswe
Copy link

xuanswe commented Jan 5, 2021

We don't have a way to represent "initialized to null and then set once to non-null" (would that be "single promotion" rather than "single assignment"?). That's just a nullable variable.

Using late final means that it's either uninitialized or initialized, but we have no way to tell which one from the outside. You can't initalize that in a constructor, and still be able to set it to non-null later.

I also see that late final T? t; is useful.
It's ok to change from late T t to T? t. But it's not the case when we need final.

@om-ha
Copy link

om-ha commented Oct 22, 2021

Some tips I came up with from advice of different dart maintainers, and my self-analysis:

late usage tips:

  • Do not use late modifier on variables if you are going to check them for initialization later.
  • Do not use late modifier for public-facing variables, only for private variables (prefixed with _). Responsibility of initialization should not be delegated to API users. EDIT: as lrhn mentioned, this rule makes sense for late final variables only with no initializer expression, they should not be public. Otherwise there are valid use cases for exposing late variables. Please see his descriptive comment!
  • Do make sure to initialize late variables in all constructors, exiting and emerging ones.
  • Do be cautious when initializing a late variable inside unreachable code scenarios. Examples:
    • late variable initialized in if clause but there's no initialization in else, and vice-versa.
    • Some control-flow short-circuit/early-exit preventing execution to reach the line where late variable is initialized.

Please point out any errors/additions to this.

Enjoy!

Sources:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
nnbd NNBD related issues
Projects
None yet
Development

No branches or pull requests

10 participants