A fast and safe List.cast #1189

Hixie · 2020-08-28T20:51:39Z

A pattern we see in some of Flutter's hottest code paths is:

   List<Foo> algorithm(List<Foo> oldList) {
     List<Foo?> newList = List<Foo?>.generate(oldList.length, null, growable: false);
     // fill in newList using oldList
     ...
     // the algorithm guarantees that newList has no nulls in it
     return newList.cast<Foo>();
   }

The cast introduces a hidden expense: every access to the new list in the future checks for nulls.

Right now we're planning on working around it like this:

   class _NullFoo extends Foo {
     static Foo instance = _NullFoo();
   }

   List<Foo> algorithm(List<Foo> oldList) {
     List<Foo> newList = List<Foo?>.generate(oldList.length, _NullFoo.instance, growable: false);
     // fill in newList using oldList
     ...
     // the algorithm guarantees that newList has no instances of _NullFoo in it
     return newList;
   }

...but it's awkward to do this and doesn't really reflect the desire, which is for an equivalent of "late".

Proposal: List could have a new method, "castAndClear" or "destructiveCast" or some such, which returns a new List cast as specified, but using the backing store of the old list, and neutering the old list (so it doesn't provide a back door to breaking the type system).

leafpetersen · 2020-08-28T21:04:02Z

cc @alexmarkov @mraleph @a-siva for thoughts on possible approaches using targeted optimizations to make an existing pattern fast.

cc @lrhn @natebosch @jakemac53 @zichangg for thoughts on the library API

natebosch · 2020-08-28T21:20:31Z

I can't think of a reasonable way to support this on user defined List implementations.

Hixie · 2020-08-28T21:26:22Z

user-defined list implementations aren't going to have good performance anyway, so their implementation can be:

List<R> castAndClear<R>() {
  List<R> result = cast<R>().toList();
  clear();
  return result;
}

lrhn · 2020-08-28T21:40:50Z

From a library API perspective, it looks like you want a "ListBuilder" for a List<Foo> where some slots can be uninitialized during construction.
You want this more efficiently than copying all the elements to a new list. Otherwise it would be easy, you would just eagerly create a new list: var newList = List<T>.from(list, growable: false);. That checks each element exactly once, and anything less than that is not going to be sound.

We can probably create such a builder class. It would basically be a List<T?> wrapped in some builder API, and then have a destructive getList which checks that the internal list contains only T elements, then overwrites the internal list's type parameter (using platform magic) and releases it, while making the builder become inert.
The builder would probably throw when reading an uninitialized (null) slot of the underlying list in any way, so that would do checks on every read instead. If you don't need to read until you're done building, it shouldn't cost extra.

I don't think we can do something directly on a fixed-length list. The VM implementation implement those as a single object, and we will not change the type parameters of an object that others might have a reference to. Especially if we get any kind of variance control, that'll be unsound. With unsound covariant generics, it's more of an annoyance issue if someone else can change the type of a list that you created.

Growable lists are easy on the VM. There you can create a new growable-list wrapper for the internal fixed-length list backing store, and change the type parameter of the internal backing store if necessary, because nobody else has access to it.
Probably not as easy in JavaScript, though.

Hixie · 2020-08-28T21:57:32Z

In the updateChildren and mount cases, we only ever write to the list, which would be consistent with that design. I'm not sure whether that's always the case though. In general I think it's simpler if we have fewer APIs; adding a list builder API when the list API works fine would be unfortunate IMHO.

Hixie · 2020-08-28T21:58:50Z

(i should say... the benefits here pale in comparison to what we could get with the ability to seal a list, dart-lang/sdk#27755)

jakemac53 · 2020-08-29T04:09:39Z

Do you fill in the list in sequence and just want to allocate it at a predefined size, or do you arbitrarily assign entries at any index (while earlier indexes might still be null)?

If the former, I could imagine an api that just allows you to pre-allocate the list at a certain size, but doesn't set its length to that size. The length would grow normally as you add values, and accessing items outside of the length would result in out of range errors. That would allow you to not have to continually grow the list when you know its size ahead of time, while still typing the generic as non-nullable from the start.

rakudrama · 2020-08-29T05:38:21Z

Lists with non-null elements are definitely more painful to use. I'd like to see this problem improved.

I think in most cases a cast<T>() is worse that copying, especially copying a temporary list that becomes free.
I would recommend just copying the list. The temporary list is a list-builder and it will be GC-ed soon.

It looks like mount could be written with List.generate.
It would be worth asking the VM team to check if after this optimization whether AOT fully inlines the closure and does scalar replacement on the context to generate the desired code.
It looks promising that this might currently be the best way to generate a fixed length list like this.

Inlining of List.generate and its closure argument is an example of how the standard library can be made more efficient without the need to add even more ways of doing things.

One of the problems of introducing a new kind of List is that it tends to degrade the performance of all code handling Lists through the disease of polymorphism. In addition to being horribly slow due to the extra checks, cast<T>() Lists are different to other lists, so increase polymorphism - another nail in the coffin of performance. I have a hypothesis that Flutter might be faster if it copied children lists just prior to assignment to reduce polymorphism. The JIT VM does a good job of discovering that only a few kinds of List flow to a particular place (e.g. field). For AOT and dart2js the problem is much harder. You can help the compiler by using this pattern:

    final List<Element> _children;
    ...
    SomeConstructor(List<Element> children)
        : _children = List.of(childen, growable: false)

There is a chance here with the direct initialization of a final field with a value directly coming from a constructor that the compiler should know (or could be taught to know) the exact kind of list stored in _children. It might seem expensive to copy the children, but it also expensive to index the children if the compiler can't tell what kind of list it is. That access cost is more smeared out in a profile and harder to identify.

I wish there were more type-safe ways to generate unmodifiable lists.

Unmodifiable lists are great for protecting algorithms from modifications and are as compact as fixed length lists. They share a lot of implementation details with const Lists. I would add one twist for unmodifable lists - that the standard ways of creating an unmodifiable list are permitted to return an existing unmodifiable list. e.g List.unmodifiable(a) may return a if it already is an unmodifiable list with the right type.

Now that we have more pain points in creating fixed length lists, please let us solve them in a way that also works for unmodifiable lists.

Imagine, if you will, that something like unmodifiable[...a, ...b] created an unmodifiable list with a single allocation.

Hixie · 2020-08-29T06:02:38Z

Do you fill in the list in sequence and just want to allocate it at a predefined size, or do you arbitrarily assign entries at any index (while earlier indexes might still be null)?

The latter, at least for the updateChildren case. For the former there's a constructor which makes this trivial.

mraleph · 2020-08-31T11:19:25Z

What we could do on the VM side (and I think on dart2js side) is to add a recognition of the following pattern: given List<Foo?> newList = /* list allocation */ and List<T>.from(newList, growable: false) being the last use of newList we could emit a special code which verifies that newList does not contain null - after which we "transmute" it in place and return it as a result of List.from instead of allocating a new list.

This optimisation is rather trivial to implement but it has some degree of fragility - e.g. it depends on escape analysis which might fail for various reasons silently. So the pattern can go from 0 allocation to list allocation rather easily. One way to address this is to have a special method for this sort of "transmutation" and mandate that compilers issue warnings/errors when they can't perform a 0 cost transmutation.

Notice that the only way to satisfy type system here is to perform a scan over the whole list to verify that the list does not contain null values. I am looking at this code which builds list from both ends - meaning that traditional approach with a unidirectional builder pattern (building list from 0 to length one element at a time), does not work for you. Would additional scan over a list reflect on performance? How performance critical this place really is? What happens if you add a list copy in this method (copying List<T?> into List<T>), do any benchmarks actually regress?

A more provocative question here would be: is it really important that List<Foo> algorithm(List<Foo> oldList) returns list of non-nullable Foo? If it is just an internal implementation detail it might be okay to return List<Foo?>. I understand that it might be undesirable from API documentation point, but if you weight it against doing tricks to satisfy NNBD - maybe it is just simpler to return List<Foo?>.

Finally, I would like to highlight that workaround _NullFoo in general might have some non-local effects, especially if Foo originally was a leaf (did not have subclasses or implementors) - because it expand the set of concrete classes which might appear in variables of type Foo from just Foo to both _NullFoo and Foo. I think we should be capable of coping with most of the impact because _NullFoo does not override anything, but at least type-checks will grow in size (previously is Foo would be a single comparison, not it has to check for range). This is less of a concern if Foo already has a lot of subclasses/implementors (which seems to be the case for your particular use cases).

Hixie · 2020-08-31T16:44:00Z

Banning nulls from these codepaths has led to some interesting features like we can finally make some of these classes have const constructors since we can remove the null-checking asserts. This will have huge performance benefits for static UIs.

If we had the ability to run non-trivial code in const expressions then this wouldn't be an issue and we could use List<Foo?> but today that's not an option, there's no non-const code between the public API and the internal storage.

Obviously any additional scan is O(N) work. This is an O(N) algorithm so we're just increasing the constant factor. It's pretty performance-critical though. If this was Rust I'd mark this code as "unsafe" and go crazy.

jakemac53 · 2020-08-31T17:12:04Z

+1 for the ListBuilder type approaches suggested. To get really concrete the api could be probably as simple as this:

/// [T] here is the desired eventual type of the generic for the list produced by `build`.
abstract class ListBuilder<T extends Object> {
  int length;
  
  // If [length] is provided, then this produces a fixed length list.
  ListBuilder({this.length = 0});

  /// Returns a non-nullable list, and invalidates this object for any future use.
  /// Asserts that all entries are non-null.
  /// O(n) for the null checks, but avoids copies when possible on the platform.
  List<T> build();

  /// Only allow assigning objects of type T.
  void operator []= (int index, T object);
  
  /// Potentially we could avoid even defining this? Probably useful for some algorithms though.
  T? operator[](int index);
}

This should be feasible to implement on all major platforms efficiently.

mraleph · 2020-09-01T07:32:21Z

It's pretty performance-critical though.

If there are any benchmarks that demonstrate this it would be worth mentioning them here for posterity - so that we for example can refer to them if we were to prototype some solutions.

Hixie · 2020-09-01T07:54:11Z

Most of the benchmarks that involve layout (so e.g. those that scroll or do transitions) should be exercising this code.

lrhn · 2020-09-01T07:55:48Z

There is also the option, a little far out perhaps, to make all List implementations "late non-nullable":

Always initialize list slots to null.
Throw if reading a slot which is still null while the list is non-nullable.

That makes every list a list builder. You create it in uninitialized state, write to every slot, then you can pass it on without worrying about anyone getting errors.

We could potentially detect that every slots has been filled, and then transition the object to an implementation which doesn't check. It would mean a non-null check on every read, or (more) polymorphism of lists.

It would avoid having a ListBuilder<T> which will probably implement List<T> anyway, so it is a List (because the List interface is intended for building lists), but where we have some small tweak, like the toList being destructive, or adding one extra method doing the same thing.

alexmarkov · 2020-09-01T17:34:52Z

I think destructive cast / move constructor is by far the simplest solution for this problem, still having only 1 object allocation + O(n) pass extra overhead which is hard to beat. Ordinary growable List<T?> could serve as a list builder, and move constructor (maybe something like List<T>.moveFrom(List elements, {bool growable = true})) can verify elements and steal backing storage from the source growable array, either wrapping it into another growable list instance or morphing it into a fixed-size list of appropriate element type (it is safe to do as backing storage is not exposed out of growable list). This move constructor should only accept standard growable lists as a source to guarantee that it doesn't have any hidden overheads. Source growable list is cleared as a result.

Of course, we can also introduce specialized NonNullableListBuilder, but it would have similar performance characteristics and it shouldn't implement List for efficiency (to avoid extra polymorphism). I think it would be just an extra API to learn for programmers, without any benefits over simpler move constructor solution.

Re: making lists late: extra null check on every access to a list element is likely to be a significant and unacceptable performance hit.

natebosch · 2020-09-01T17:56:06Z

This move constructor should only accept standard growable lists as a source to guarantee that it doesn't have any hidden overheads.

I don't know how we can accomplish this ergonomically. This is a contract that can't be expressed in the signature, and may be something that leaks into other API boundaries in ways that also can't be expressed in the signature.

alexmarkov · 2020-09-01T18:09:13Z

@natebosch Yeah, the fact that growable vs. fixed-size list distinction is not exposed explicitly in public types is inconvenient (but also has its advantages of List APIs being simpler). This shouldn't be a big problem when building lists, as source list is likely to be allocated in the same scope.

We can describe that limitation in the constructor's documentation and enforce by throwing ArgumentError at run time, but that may not be enough. Move constructor is expected to clear the source list, so I'm not sure it even makes sense for sized-size lists. As for user-defined growable lists, we can probably support them in move constructor with less efficiency if it makes more sense from API perspective.

jakemac53 · 2020-09-01T19:02:54Z

It would avoid having a ListBuilder<T> which will probably implement List<T> anyway, so it is a List (because the List interface is intended for building lists), but where we have some small tweak, like the toList being destructive, or adding one extra method doing the same thing.

I don't think it should implement List<T> - it doesn't need the full api surface and its [] operator should return T?, imo. I also think it is worth while to distinguish between the apis so it is more explicit if you have a ListBuilder and not a List. Otherwise you could have what you think is a list, and halfway through using it you are no longer allowed to write to it because somebody else "sealed" it.

It is a pretty fundamentally different thing, imo.

There is also the option, a little far out perhaps, to make all List implementations "late non-nullable":

How is this meaningfully different from List.cast? This would be a major step in the wrong direction imo, providing significant negative value compared to the existing non-nullable list types we have today. Those would now all come with extra checks on access.

jakemac53 · 2020-09-01T19:12:34Z

As far as the move constructor I think that is also a good idea, except as @natebosch brought up the problem with not being able to enforce the normal growable list constraint.

The constructor should still work I think regardless of the original list type and just fall back on slower behavior if it can't be optimized based on that. Note that even a ListBuilder type solution would have similar issues - not being optimizable for custom ListBuilder types (unless we disallowed those).

rakudrama · 2020-09-01T19:49:29Z

It is not possible to implement a moveFrom constructor in JavaScript such that it (1) avoids copying by stealing and (2) vacates the original list. To avoid copying, the input and output must be the same JavaScript Array object.

It seems hard to write a builder that would be better than a List. As @lrhn says, the List API already is a List builder.
A new builder is just moving the problem into the Builder class.
Lists already have lots of special optimizations that won't work as well on a Builder, for example, bounds check elimination.

A small library change we could make for using List as a builder would be to have some constructors take an initial capacity to avoid growing when adding.

  final List<Element> children = List.empty(capacity: widget.children.length);
  ...
  for ... {
    ...
    children.add(newChild);
    ...
  }
  _children = List.of(children, growable: false);  // or List.unmodifiable

This would not help dart2js but might help the VM avoid intermediate allocations, and leave an internal _List of exactly the right size to recycle.

jakemac53 · 2020-09-01T20:02:34Z

It is not possible to implement a moveFrom constructor in JavaScript such that it (1) avoids copying by stealing and (2) vacates the original list. To avoid copying, the input and output must be the same JavaScript Array object.

It seems hard to write a builder that would be better than a List. As @lrhn says, the List API already is a List builder.
A new builder is just moving the problem into the Builder class.

Wouldn't the ListBuilder class fix this problem? You can have an internal JS Array object which you build up, and then you can safely use that as the backing storage for the produced List (after checking it for compatibility with the desired type). You don't have to worry about other outside references to the list.

rakudrama · 2020-09-01T23:10:28Z

The title of this issue is 'A fast and safe List.cast'.
Copying a list is the 'safe and fast List.cast'.
Issue closed? 😉

Seriously, use copying where necessary to avoid List.cast.
Lets restart the conversation with profiling data showing which list copying needs attention. I believe we might be better off working on higher impact improvements (I will detail one below), but only concrete profiling data will tell.

If list copying is a bottleneck and we decide some clever optimizations are the solution, I want to design something that works equally reliably for dart2js and AOT.

mount can be written using a growable list, List.add and List.of; or List.generate as I mentioned yesterday.

updateChildren It is probably easiest to use a temporary List<Element?>, but it could be adjusted as follows:

If there are no oldChildren, run a simplified algorithm that creates the first child before allocating the List.filled(length, firstNewChild, growable: false), and then fill in the rest. If this is common, having the streamlined loop uncluttered with the general case might pay for itself in performance.
This is an example from the SDK of the pattern of getting the first element before allocating.

If there are some elements in oldChildren, use List.filled(newWidgets.length, oldChildren.first, growable: false).

Looking at the code, I believe that the list copying cost will be swamped by the other costs, and the code could be sped up in other places to more than compensate. Consider mount

  void mount(Element parent, dynamic newSlot) {
    super.mount(parent, newSlot);
    _children = List<Element>(widget.children.length);
    Element previousChild;
    for (int i = 0; i < _children.length; i += 1) {
      final Element newChild = inflateWidget(widget.children[i], IndexedSlot<Element>(i, previousChild));
      _children[i] = newChild;
      previousChild = newChild;
    }
  }

The innocent-looking widget.children[i] is calling this.widget. This is one of any number of get:widget methods in the subclasses, so it can't be inlined. Each of the targets looks like this:

  Viewport get widget => super.widget as Viewport;

There is a chain of super calls

  MultiChildRenderObjectWidget get widget => super.widget as MultiChildRenderObjectWidget;
  RenderObjectWidget get widget => super.widget as RenderObjectWidget;

and finally

abstract class Element extends DiagnosticableTree implements BuildContext {
  ...
  Widget get widget => _widget;
  Widget _widget;

This cost can be reduced from N+1 calls to one call by using a local variable:

  final widgetChildren = widget.children;
  ...
     widgetChildren[i]

dnfield · 2020-09-02T06:20:52Z

This cost can be reduced from N+1 calls to one call by using a local variable:

Why isn't the compiler just doing that for us already? Shouldn't it see repeated access in a loop that doesn't modify the accessor and just throw that into a single accessor that's cached for the loop?

dnfield · 2020-09-02T06:23:23Z

Similar question if the compiler sees that I've created a nullable list that ends up filled completely with non-nulls, and then I cast it to non-null.

Why can't the compiler just optimize my code so that it works out fine? Why do I need to coax it by filling a list with a dummy non-null sentinel (like the first item in the list in @rakudrama's example, or the private sentinal in @Hixie's)?

lrhn · 2020-09-02T07:32:15Z

@dnfield
The compiler can't see, for certain, that the call to widget will always go to a the same field. The widget access is virtual, so it could be overridden in a subclass.
A whole-program analysis may be able to recognize that no subclass exists, or overrides the widget getter, but apparently that isn't true here (which is why @rakudrama said that the getter cannot be inlined).

Similarly, a compiler really has no chance figuring out that a nullable list is going to be completely filled by non-nulls, not unless it's being filled using a particularly trivial scheme (like for (var i = 0; i < list.length; i++) list[i] = nonNullValue;). In such trivial cases you could usually also just generate the list using [for (var i = 0; i < length; i++) nonNullValue] or List.generate anyway.
The compiler can't guess. It has to be absolute sure that the list is really filled with non-null values before it can optimize away the checks, and that kind of knowledge is incredibly hard to deduce at compile-time.

mraleph · 2020-09-02T07:39:06Z

FWIW at least on hello_world Dart VM AOT compiler manager to fully inline this particular getter chain (it seems there are no overrides in subclasses on such a trival app). It still can't hoist it out because there is a call inside the loop which is not inlined - and thus can mutate the widget field. (our global analysis does not compute field-write impacts, so we can't figure out that inflateWidget can't change widget field).

dnfield · 2020-09-02T18:31:34Z

@lrhn - within the lexical scope, how could the widget access change?

IOW, why can't the virtual call just be made once and then cached for that lexical scope? This should work regardless of whether there are any subclasses or not, right?

For the list case, the more I think about it, the more I suspect what we want is an ImmutableList that can be pre-allocated and written once to during a build. I think there's probably some way to statically analyze when a user uses a list that way, but I also suspect that it would be expensive in developer time and confusing for users when their list suddnely stopped performing as nicely because they altered some code (rather than having an ImmutableList type that would just not allow them to do the wrong thing).

jakemac53 · 2020-09-02T21:04:50Z

IOW, why can't the virtual call just be made once and then cached for that lexical scope? This should work regardless of whether there are any subclasses or not, right?

See this gist as an example, widget in this scope could be coming from a subclass and have arbitrary behavior.

dnfield · 2020-09-02T21:07:22Z

Ohhh. I forget that I think of getters as things that should be side-effect free. That's not necessarily the case though. It's all making sense now.

leafpetersen added the nnbd NNBD related issues label Aug 28, 2020

mraleph mentioned this issue Sep 2, 2020

remove redundant casts in expressions like (x as A) as B where B is a subtype of A dart-lang/sdk#43296

Open

rakudrama mentioned this issue Aug 17, 2021

Sealing or watching List objects, or copy-on-write List objects dart-lang/sdk#27755

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A fast and safe List.cast #1189

A fast and safe List.cast #1189

Hixie commented Aug 28, 2020

leafpetersen commented Aug 28, 2020

natebosch commented Aug 28, 2020

Hixie commented Aug 28, 2020 •

edited

Loading

lrhn commented Aug 28, 2020 •

edited

Loading

Hixie commented Aug 28, 2020

Hixie commented Aug 28, 2020

jakemac53 commented Aug 29, 2020 •

edited

Loading

rakudrama commented Aug 29, 2020

Hixie commented Aug 29, 2020

mraleph commented Aug 31, 2020

Hixie commented Aug 31, 2020

jakemac53 commented Aug 31, 2020 •

edited

Loading

mraleph commented Sep 1, 2020

Hixie commented Sep 1, 2020

lrhn commented Sep 1, 2020

alexmarkov commented Sep 1, 2020

natebosch commented Sep 1, 2020

alexmarkov commented Sep 1, 2020

jakemac53 commented Sep 1, 2020

jakemac53 commented Sep 1, 2020

rakudrama commented Sep 1, 2020 •

edited

Loading

jakemac53 commented Sep 1, 2020 •

edited

Loading

rakudrama commented Sep 1, 2020

dnfield commented Sep 2, 2020

dnfield commented Sep 2, 2020

lrhn commented Sep 2, 2020

mraleph commented Sep 2, 2020

dnfield commented Sep 2, 2020

jakemac53 commented Sep 2, 2020

dnfield commented Sep 2, 2020

A fast and safe List.cast #1189

A fast and safe List.cast #1189

Comments

Hixie commented Aug 28, 2020

leafpetersen commented Aug 28, 2020

natebosch commented Aug 28, 2020

Hixie commented Aug 28, 2020 • edited Loading

lrhn commented Aug 28, 2020 • edited Loading

Hixie commented Aug 28, 2020

Hixie commented Aug 28, 2020

jakemac53 commented Aug 29, 2020 • edited Loading

rakudrama commented Aug 29, 2020

Hixie commented Aug 29, 2020

mraleph commented Aug 31, 2020

Hixie commented Aug 31, 2020

jakemac53 commented Aug 31, 2020 • edited Loading

mraleph commented Sep 1, 2020

Hixie commented Sep 1, 2020

lrhn commented Sep 1, 2020

alexmarkov commented Sep 1, 2020

natebosch commented Sep 1, 2020

alexmarkov commented Sep 1, 2020

jakemac53 commented Sep 1, 2020

jakemac53 commented Sep 1, 2020

rakudrama commented Sep 1, 2020 • edited Loading

jakemac53 commented Sep 1, 2020 • edited Loading

rakudrama commented Sep 1, 2020

dnfield commented Sep 2, 2020

dnfield commented Sep 2, 2020

lrhn commented Sep 2, 2020

mraleph commented Sep 2, 2020

dnfield commented Sep 2, 2020

jakemac53 commented Sep 2, 2020

dnfield commented Sep 2, 2020

Hixie commented Aug 28, 2020 •

edited

Loading

lrhn commented Aug 28, 2020 •

edited

Loading

jakemac53 commented Aug 29, 2020 •

edited

Loading

jakemac53 commented Aug 31, 2020 •

edited

Loading

rakudrama commented Sep 1, 2020 •

edited

Loading

jakemac53 commented Sep 1, 2020 •

edited

Loading