Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Reduce allocations in {Concat,SelectMany,Append,Prepend}.ToArray by close to 50% #14675

Merged
merged 3 commits into from
Jan 26, 2017

Conversation

jamesqo
Copy link
Contributor

@jamesqo jamesqo commented Dec 22, 2016

Background: Some ToArray implementations employ a double-buffering strategy if we don't know the exact length of the output array. First the items are copied into some sort of buffer, then they are transferred into an array. For example, in the following snippet

var array = new[] { -1, -2, -3 };
Enumerable.Range(1, 100).Concat(array.Where(i => true)).ToArray();

In ToArray, we know the size of the lhs enumerable but not how many elements are in the rhs one. So we allocate a buffer with 100+ elements to hold everything, calculate the final size of the array, and re-copy those elements.

Buffer: [1 2 3 .. 100 -1 -2 -3 0 0 .. 0]
Array: [1 2 3 .. 100 -1 -2 -3]

Description: Instead of copying everything twice, we can buffer only the enumerables we can't get the count of. If we reach an ICollection or an IIListProvider and it has N items, we queue it to be copied later, and tell the builder to leave a gap of N items in the output array. Finally, when ToArray is called, we revisit all of those queued collections and copy them directly to the output.

Here is how the above example works now:

// Stage 1: First pass
Buffer: [-1 -2 -3 0 .. 0]
Markers: [(Index: 0, Count: 100)] // During ToArray(), leave a gap of 100 items at index 0
QueuedCollections: [Enumerable.Range(1, 100)]

// Stage 2: Call ToArray()
Array: [0 0 0 .. 0 -1 -2 -3] // The first 100 slots are empty.

// Stage 3: Loop thru each of the indices in Markers and copy the corresponding
//          queued collection to that index.
Array: [1 2 3 ... 100 -1 -2 -3]

Performance results: An initial test is showing almost 50% decrease in GCs in SelectMany(e => e).ToArray(), assuming e is always an ICollection. results / test

Methods this affects: Concat().ToArray(), SelectMany().ToArray(), Append/Prepend().ToArray()

I had to pull in the changes from #13942 because this affects SelectMany, so the LOC in the diffs is exaggerated. Once/if that's merged, I'll rebase.

cc @stephentoub @VSadov @JonHanna

@jamesqo jamesqo changed the title Introduce SparseArrayBuilder to reduce some ToArray allocations in Linq. Reduce allocations in {Concat,SelectMany,Append,Prepend}.ToArray by close to 50% Dec 22, 2016
@jamesqo jamesqo changed the title Reduce allocations in {Concat,SelectMany,Append,Prepend}.ToArray by close to 50% [RFC] Reduce allocations in {Concat,SelectMany,Append,Prepend}.ToArray by close to 50% Dec 23, 2016
array[arrayIndex + i] = enumerator.Current;
}

Debug.Assert(!enumerator.MoveNext(), $"Expected length of {nameof(source)} to be exactly {count}.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By using a for-loop we save 1 MoveNext call in Release, since we know exactly when the sequence will end. e.g. For 0-length enumerables we would normally call MoveNext once, but here we will skip the loop entirely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this requires that MoveNext() will return true exactly count times anyway, we could have a conventional foreach, leave count out of the call and ++arrayIndex rather than ++i and doing the addition.
Which of these micro-opts is the greater is hard to guess at from first principles rather than profiling though (and would depend on MoveNext() complexity).
Losing the ability to test count in debug would make some flaws harder to spot, though they should still be obvious enough to the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JonHanna While it is true that count is only of debug value, I don't think think this function semantically makes sense without it. If you're saying it's safe to copy this enumerable to this array, then you would have to know the count of the enumerable (or at least an upper bound thereof) in advance. I doubt there will ever be a callsite for this function that could not provide a count.

Copy link
Contributor Author

@jamesqo jamesqo Jan 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have decided to remove this optimization and just go with the regular foreach, however. I don't think calling side-effecting code like MoveNext only in debug is a good idea, and who knows-- maybe some iterators/enumerators will not work the same if disposed too early.

/// <summary>
/// The index within the buffer to select.
/// </summary>
internal int Column { get; set; }
Copy link
Contributor Author

@jamesqo jamesqo Dec 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-note: This doesn't really need to be a mutable struct. Maybe it would be better to make it immutable. (edit 1 mo. later: done)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe profile that. The effect either way can be surprising sometimes (at least I find so).

Copy link
Contributor Author

@jamesqo jamesqo Dec 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JonHanna 👍 I doubt it will make much of a difference since this is an 8-byte struct that fits in an x64 register, and it's only being copied once per CopyTo call. Regardless, I will take your advice.

(edit 1 mo. later: I have had trouble building the repo recently and so am unable to run perf tests. I very much doubt making this struct immutable will have a noticeable impact because an extra copy is only made once per method call. Besides, the struct is only 8 bytes.)

/// </summary>
public T[] ToArray()
{
if (_count == _first.Length)
T[] array;
if (TryMove(out array))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a bit of cleanup-- added a TryMove function that attempts to get this builder as a single array without copying.

++index;
}
}
EnumerableHelpers.Copy(_source, array, index, count - 1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some cleanup-- I stumbled across this snippet of code, which looks like it has the exact same semantics as the Copy function I've introduced.

{
builder.Add(item);
}
builder.Reserve(_appended.Count);
Copy link
Contributor Author

@jamesqo jamesqo Dec 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general pattern is

var builder = new SparseArrayBuilder<T>(initialize: true);

...

if (CanPredetermineSizeOfSomeItems)
{
    // Semantically equivalent to builder.AddRange(
    // Enumerable.Repeat(default(T), PredeterminedSize)),
    // except we don't add anything to the buffer.
    builder.Reserve(PredeterminedSize);
}

...

// Creates an array with a bunch of zero-initialized
// gaps in it where `Reserve` has been called
T[] array = builder.ToArray();

// Get the list of locations where gaps were added.
ArrayBuilder<Marker> markers = builder.Markers;

for (int i = 0; i < markers.Count; i++)
{
    // Each Marker tells us the index of the gap in
    // the array, as well as how wide the gap is.
    Marker marker = markers[i];

    // Copy the corresponding collection/nodes/etc.
    // to this gap
    ...
}

This is not the best example because we don't have to use the markers (we know exactly where the gaps are here, at the start & at the end), but you can see the above code structure in Concat & SelectMany.

// remove this workaround.
Debug.Assert(initialize);

_builder = new LargeArrayBuilder<T>(initialize: true);
Copy link
Contributor Author

@jamesqo jamesqo Dec 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into it extensively yet, but I'm conjecturing that some of the speed-wise regressions in the tests for small collections could be abated if we could switch to the : this()-style pattern here and in LargeArrayBuilder. The JIT does not handle large structs very well yet, and additionally will not be able to inline either of these methods. (edit: Made the changes)

@JonHanna
Copy link
Contributor

It certainly seems soundly reasoned. I think passing the count which is of debug value only AFAICT and then optimising out a MoveNext() is the wrong direction compared to a normal foreach that assumes the count would be correct, but apart from that this seems sound. What's the impact on time taken?

@JonHanna
Copy link
Contributor

Test Innerloop Ubuntu14.04 Debug Build and Test
(Build timeout)
Test Innerloop Ubuntu14.04 Release Build and Test
(https://ci.dot.net/job/dotnet_corefx/job/master/job/ubuntu14.04_release_prtest/1626/testReport/junit/System.Security.Cryptography.X509Certificates.Tests/PublicKeyTests/TestPublicKey_Key_RSA/ since there's a collection involved its not entirely impossible that this is related so lets see if it happens again and is a flaw here before opening a CI-related issue).

@jamesqo
Copy link
Contributor Author

jamesqo commented Dec 24, 2016

@JonHanna

What's the impact on time taken?

I posted some benchmarks for SelectMany in the description: results / test. For moderate to large subcollections (8, 13, 21, 34) there seems to be a clear improvement (10-60% faster). For smaller subcollections (1, 2, 3, 5), however, the time seems to have increased dramatically, even though the GCs have decreased because we're not allocating enumerators anymore. I'm considering just falling thru to AddRange if the count of the collection is small enough.

@karelz
Copy link
Member

karelz commented Jan 10, 2017

@VSadov @OmarTawfik can you please review the change when you get a chance?

@OmarTawfik
Copy link
Contributor

LGTM. @VSadov?

@jamesqo jamesqo changed the title [RFC] Reduce allocations in {Concat,SelectMany,Append,Prepend}.ToArray by close to 50% [WIP] Reduce allocations in {Concat,SelectMany,Append,Prepend}.ToArray by close to 50% Jan 20, 2017
@jamesqo jamesqo changed the title [WIP] Reduce allocations in {Concat,SelectMany,Append,Prepend}.ToArray by close to 50% Reduce allocations in {Concat,SelectMany,Append,Prepend}.ToArray by close to 50% Jan 21, 2017
@jamesqo
Copy link
Contributor Author

jamesqo commented Jan 21, 2017

@VSadov PTAL when you have time-- I have tidied up this PR and posted perf results at #14675 (comment)

@karelz
Copy link
Member

karelz commented Jan 25, 2017

@VSadov @OmarTawfik ping?

@VSadov
Copy link
Member

VSadov commented Jan 26, 2017

LGTM

@VSadov VSadov merged commit a75d133 into dotnet:master Jan 26, 2017
@jamesqo jamesqo deleted the sparse-array-builder branch January 26, 2017 22:18
@karelz karelz modified the milestone: 2.0.0 Feb 4, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Reduce allocations in {Concat,SelectMany,Append,Prepend}.ToArray by close to 50%

Commit migrated from dotnet/corefx@a75d133
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants