Should ToArray methods be allowed to return Array.Empty<T>()? #14848

stephentoub · 2015-07-15T13:22:51Z

We've been trying to use Array.Empty<T> in most places where we would otherwise construct a new T[0]. Some places we've explicitly tried to stay away from changing are ToArray methods, in particular because of an assumption that there's an expection that these methods always produce a new array, e.g. with MSDN documentation that states things like "Creates an array", "Copies the elements to a new array", etc.

The question is: should we relax that and instead use Array.Empty<T> in ToArray methods as well?

Several months back, before the corefx libraries were on GitHub, I'd done a sweep through them doing hundreds of empty array replacements. In general I stayed away from changing anything that looked like it was risky, e.g. ToArray methods, places where the arrays were obviously used for reference equality, etc. I apparently was a little over aggressive with regards to my own guidelines, and ended up changing some of the non-generic collection's ToArray methods to use Array.Empty, though.

Personally I'm tempted to say that ToArray methods should be able to use Array.Empty, and that we should update the generic collections and LINQ to do so, but that since the non-generic collections exist primarily for compatibility reasons, we should revert the changes to those ToArray methods.

Thoughts?

The text was updated successfully, but these errors were encountered:

JonHanna · 2015-07-15T14:06:00Z

I was half-way to writing this exact issue last night when a crying baby took priority, and I say yes.

I don't think reference uniqueness is something that should be inferred from any API method's result unless:

The object is mutable.
The documentation explicitly says that it is unique (rather than just language that implies it like "creates").

In any case of an immutable reference type, I think using a pre-existing object is not just a reasonable optimisation to take, but a reasonable optimisation to expect that other code may or may not take. (And by extension therefore, it is unreasonable to expect the same policy from version to version and build a dependency on it).

I'd note that there have been some changes about just what returns string.Empty over past frameworks, and while they seem to have confused some people in a general "why is that?" level, not stymied any.

JohanLarsson · 2015-07-15T14:55:16Z

I vote for performance and updated documentation, starve the GC.

JonHanna · 2015-07-15T15:29:35Z

cc @James-Ko since they've done the work agreement here would allow for.

rickbrew · 2015-07-15T17:46:59Z

I'm highly in favor of doing this. I've always needed to have a ToArrayEx extension method (lovely name, right?) that does this (it does a few other things too). You'd be surprised how many 0-length arrays are lying around if you don't do anything to consolidate them.

In particular it makes it much cheaper to have something like an immutable tree data structure where each node can have a list of attributes or children, but where many nodes have 0 of those. Instead of scattering null checks everywhere (or some other non-terse pattern), just have a constructor that takes your usual IEnumerable and call ToArray() to copy it. So it's good for performance as well as readability and maintainability. And on a mobile platforms you'll have fewer objects and smaller code size, which is really important.

By baking this into a core extension method, it will propagate and encourage good habits and performance. If someone really needs unique object identity then they can do their own new T[0] or write their own ToUniqueArray() extension method.

It may also be worth taking this pattern further and ensuring that methods such as AsReadOnly() employ a cached 0-length ReadOnlyCollection.

I'd also like to say that this seems like a very good place to test out a policy of "if no one screams then ship it." This is a trivial change to make, and also a trivial one to undo if it really becomes necessary.

justinvp · 2015-07-15T17:49:28Z

👍

jamesqo · 2015-07-15T21:25:46Z

I do think that method names like ToArray seem to imply that you are creating a new array, but the average person probably wouldn't test for this in code. I rarely call methods like ToArray, ToList, etc., and when I do, I call them only once (no comparisons between two results of ToArray). Also, there doesn't seem to be a good reason to test for reference equality between arrays- they're just containers for other objects. I don't see a practical place where this could happen.

Edit: If this passes, I'm wondering if the same logic could be applied to other To... methods in Enumerable. Should we check for

if (source is T[]) return (T[])source;

in ToArray, and similarly in ToList, to avoid creating a new array if it already is one? To my knowledge lists don't have any side-effects when you enumerate through them, so I'm wondering if that works as well. Obviously this might be less sound than returning Array.Empty for empty arrays, but I still haven't seen two arrays being tested for reference equality.

bbowyersmyth · 2015-07-15T22:13:27Z

I'm in favour of ToArray returning a cached empty array. I would shy away from making any behaviour changes to the legacy collection types though. Anyone still using them likely has compatibility high on their agenda.

rickbrew · 2015-07-15T22:42:56Z

@James-Ko, you can't just return the same array back to the caller if it has 1+ items in it because one of the most important use cases for ToArray() and ToList() is specifically making a copy so you can make changes without modifying the original. You can't change the length of an array, and there's no items you can modify within a zero-length array, therefore all zero-length arrays are interchangeable. Zero-count Lists aren't interchangeable because you can modify them, e.g. by calling Add().

What you're thinking of could be articulated as something like AsArrayOrToArray(): if the input is an array, then give it back to me, otherwise call ToArray and give me its result. It's certainly easy enough to implement but I'm not sure it needs to be in CoreFX because the return value creates two drastically different outcomes depending on what the input is, and it's not clear which one you've got just by inspecting the code locally ("did I get a copy or was this basically just a cast?"). It is a useful method (I have it in my code!) but you do have to be careful and intentional about using it.

JonHanna · 2015-07-16T09:11:37Z

Should we check for
if (source is T[]) return (T[])source;
in ToArray, and similarly in ToList,

Definitely not. The optimisation of replacing a new T[0] with Array.Empty<T>() is one of the general class of aliasing different cases of equivalent immutable types together.

Non-empty arrays, lists, and most other collections are not immutable, so such a use would always be a bug. (Internal ToLists can benefit from such code but they put the onus on the calling code not to mutate which can't be enforced by a public API and reduces usefulness if it could).

More generally, it's worth noting that replacing new T[0] with Array.Empty<T>() is not really an optimisation at all; it's almost always a pure improvement because it has up-sides (faster, less memory use) but no down-side (assuming we do agree that expecting reference difference isn't reasonable).

There are some cases though where the use is more of a question of optimisation for one quality over another, or for performance in one case over another.

An optimisation would be one that introduced a _size == 0 case where one had not existed before; costing non-empty cases a slight penalty to gain empty-cases a benefit. (Conversely, you could remove the _size == 0 case from Queue<T>.ToArray() for the opposite trade-off). Such introductions are less no-brainers than where _size == 0 or similar already exists, with the benefit depending not just on the relative improvement, but the relative frequency of empty to non-empty cases.

I think that ToArray() and indeed all other API methods should be free to return Array.Empty<T>() instead of T[0]. I also feel they should be free to switch from Array.Empty<T>() to T[0] unless documented specifically as returning Array.Empty<T>(). Whether they should in a given case is another matter. Queue<T>.ToArray() follows the empty test with another branch and at least one method call, perhaps two, so it's probably worth keeping the size == 0 test there. With a ToArray() that after creating the array went straight into a for loop that terminated on i == _size the trade-offs are less clear-cut.

omariom · 2015-07-16T20:07:32Z

I never assume reference equality of results of any methods I don't control.
👍 to Array.Empty

stephentoub · 2015-07-17T13:34:11Z

@terrajobst, good topic for next design meeting?

MrJul · 2015-08-30T16:45:47Z

Considering that even some string constructors return cached objects today, I think that sets a good precedence for allowing BCL methods to return cached immutable objects in general.

ReferenceEquals(new string(new char[0]), String.Empty) is true.

terrajobst · 2016-01-21T00:28:25Z

We discussed this offline and concluded that we should do the following:

We make this change to .NET Core without any quirking. Customers using a newer package will get the new behavior.
We bring this change to .NET Framework, but we’ll quirk it. Only customers targeting the latest version will get the new behavior.

joshfree assigned stephentoub Oct 12, 2015

stephentoub closed this as completed in dotnet/corefx#5562 Jan 21, 2016

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 1.0.0-rtm milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Jan 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should ToArray methods be allowed to return Array.Empty<T>()? #14848

Should ToArray methods be allowed to return Array.Empty<T>()? #14848

stephentoub commented Jul 15, 2015

JonHanna commented Jul 15, 2015

JohanLarsson commented Jul 15, 2015

JonHanna commented Jul 15, 2015

rickbrew commented Jul 15, 2015

justinvp commented Jul 15, 2015

jamesqo commented Jul 15, 2015

bbowyersmyth commented Jul 15, 2015

rickbrew commented Jul 15, 2015

JonHanna commented Jul 16, 2015

omariom commented Jul 16, 2015

stephentoub commented Jul 17, 2015

MrJul commented Aug 30, 2015

terrajobst commented Jan 21, 2016

Should ToArray methods be allowed to return Array.Empty<T>()? #14848

Should ToArray methods be allowed to return Array.Empty<T>()? #14848

Comments

stephentoub commented Jul 15, 2015

JonHanna commented Jul 15, 2015

JohanLarsson commented Jul 15, 2015

JonHanna commented Jul 15, 2015

rickbrew commented Jul 15, 2015

justinvp commented Jul 15, 2015

jamesqo commented Jul 15, 2015

bbowyersmyth commented Jul 15, 2015

rickbrew commented Jul 15, 2015

JonHanna commented Jul 16, 2015

omariom commented Jul 16, 2015

stephentoub commented Jul 17, 2015

MrJul commented Aug 30, 2015

terrajobst commented Jan 21, 2016