This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Specialize the single-selector overload of SelectMany. #13942

Merged
VSadov merged 8 commits into dotnet:master from the select-many branch on Dec 28, 2016

Conversation

jamesqo
Contributor

@jamesqo jamesqo commented Nov 24, 2016

e.SelectMany(i => i) is a very popular option for flattening a list of enumerables: http://stackoverflow.com/questions/1590723/flatten-list-in-linq

This PR optimizes SelectMany calls followed by ToArray or ToList, leading to a substantial (~40%) speedup. Instead of looping through each item of the projected sequence, yield returning it from the iterator, and adding it to the list/LargeArrayBuilder, we simply call AddRange.
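As a rough illustration of the difference described above (not the PR's actual code; the class and method names here are hypothetical), the two ways of building the list can be sketched as follows. The first mirrors the per-element path, the second appends each projected sub-sequence with a single `AddRange` call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only; not part of System.Linq.
public static class FlattenSketch
{
    // Per-element path: every item of every sub-sequence is added individually.
    public static List<TResult> ToListViaIterator<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector)
    {
        var list = new List<TResult>();
        foreach (TSource element in source)
        {
            foreach (TResult subElement in selector(element))
            {
                list.Add(subElement);
            }
        }
        return list;
    }

    // Bulk path: each projected sub-sequence is handed to AddRange as a whole.
    public static List<TResult> ToListViaAddRange<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector)
    {
        var list = new List<TResult>();
        foreach (TSource element in source)
        {
            list.AddRange(selector(element));
        }
        return list;
    }
}
```

Both methods produce the same elements in the same order; only the way items reach the list differs.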

Performance test: https://github.com/jamesqo/Dotnet/blob/05daf42403b615942706a7e9be32c8b22db80e67/Program.cs
Results: https://gist.github.com/jamesqo/635e6e49b3ceb6e9527161f3472188f7

You may notice that there are a lot of regressions (substantially more allocations happening) for ToList. That puzzled me too, until I found out the version of S.P.CoreLib I was using did not include this change; in my tests, a buffer was being allocated every time List.AddRange was called. Even so, the new implementation was faster.

I've also added more testing for these changes in the PR.

cc @JonHanna @VSadov @stephentoub

yield return subElement;
}
}
return new SelectManySingleSelectorIterator<TSource, TResult>(source, selector);
Contributor Author

@jamesqo jamesqo Nov 24, 2016

I would have preferred to keep this SelectManyIterator, but the compiler complains because there are other SelectManyIterator members.

{
    checked
    {
        count += _selector(element).Count();
Contributor Author

TODO: Add tests similar to the ones in Concat to make sure checked is used.

Member

Are you going to add these tests in this PR?

Contributor Author

Ah, I forgot. Yes.
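For readers unfamiliar with why `checked` matters in the snippet under review, here is a minimal sketch, assuming a hypothetical `CountFlattened` helper rather than the PR's real iterator class. It sums the sizes of the projected sub-sequences inside a `checked` block so that an overflow surfaces as an exception instead of wrapping silently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class CheckedCountSketch
{
    public static int CountFlattened<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector)
    {
        int count = 0;
        foreach (TSource element in source)
        {
            // checked turns int overflow into an OverflowException
            // rather than letting the running total wrap around.
            checked
            {
                count += selector(element).Count();
            }
        }
        return count;
    }
}
```

If the sub-sequence counts together exceeded `int.MaxValue`, the addition inside the `checked` block would throw `OverflowException`, which is the behavior the requested tests are meant to pin down.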

if (!_subEnumerator.MoveNext())
{
    _subEnumerator.Dispose();
    _state = 2;
Member

Should we set _subEnumerator to null here? That would help in the case where the subsequent GetEnumerator() call throws, at which point if we didn't null this out we'd be holding on to a disposed enumerator.

Contributor Author

Sure, done.

@stephentoub
Member

> You may notice that there are a lot of regressions (substantially more allocations happening) for ToList. That puzzled me too, until I found out the version of S.P.CoreLib I was using did not include this change; in my tests, a buffer was being allocated every time List.AddRange was called

What about the regressions for iteration and ToArray?

Member

@stephentoub stephentoub left a comment

Other than my few questions/comments, LGTM. Thanks!

@jamesqo
Contributor Author

jamesqo commented Nov 24, 2016

@stephentoub

> What about the regressions for iteration and ToArray?

I believe those are just random fluctuations in the results... most of them appear to be when the sub-collection length is really small, like 1-3, and AddRange doesn't help much. Also this PR is only supposed to help ToArray / ToList perf, I only included iteration benchmarks to show they maintained the status quo.

edit: I forgot to add, most of the regressions are like 5-10% and inconsistent.

@VSadov
Member

VSadov commented Nov 24, 2016

LGTM
Nice change!!

Also, it was interesting to learn about #6892. Nice change too. Normally I would not like the idea of exposing internal buffers to strangers, but indeed, List<T> is mutable, and hardly any extra damage could be done from sharing in that case.

@stephentoub
Member

> I believe those are just random fluctuations in the results

I'm not convinced that there aren't real regressions in some cases. Consider the common case where the enumerable returned for each iteration isn't an ICollection. Won't List.AddRange with its current implementation end up doing more work than if each element were Add'd individually?

_sourceEnumerator.Dispose();
_sourceEnumerator = null;
}

Contributor

_subEnumerator may not be null here, and it needs to be disposed too if the enumeration is exited early.

Contributor Author

Good point. I'll add a dispose for that too.

@jamesqo
Contributor Author

jamesqo commented Nov 25, 2016

@stephentoub Ah, you make a good point. Looks like how it's currently implemented we'll end up in a loop calling Insert on each item on the source, while right now we call Add on each item in the iterator. Insert has 2 more branches than Add; it's debatable whether adding 2 extra branches is faster than 1 virtual method call, but at any rate it looks like the code in List can be improved. I'll make a PR to coreclr.
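The distinction being discussed here, between sub-sequences that implement `ICollection<T>` and those that do not, can be sketched as below. This is only an illustration under assumed names (`AddRangeSketch.AppendAll` is hypothetical); it is neither the code inside `List<T>` nor what this PR does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class AddRangeSketch
{
    public static void AppendAll<T>(List<T> list, IEnumerable<T> items)
    {
        if (items is ICollection<T> collection)
        {
            // The element count is known up front, so AddRange can
            // reserve space once and copy the items in bulk.
            list.AddRange(collection);
        }
        else
        {
            // No count is available: append one element at a time.
            foreach (T item in items)
            {
                list.Add(item);
            }
        }
    }
}
```

Arrays and lists take the first branch; lazily evaluated sequences such as iterator methods take the second, which is the case raised in the review.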

    goto case 3;
case 3:
    // Take the next element from the sub-collection and yield.
    if (!_subEnumerator.MoveNext())
Member

I believe that if previously an exception came out of here, such as from the MoveNext, the instance would be Dispose'd immediately; now it requires Dispose to be called on the enumerator explicitly. Are we ok with that @VSadov?

Contributor Author

Doesn't this happen every time we replace a yield-based iterator with a class?

Contributor Author

@jamesqo jamesqo Dec 20, 2016

@VSadov, can you comment on whether you think this is an acceptable change when you have time? Thanks.

Member

@VSadov VSadov Dec 28, 2016

It is a common practice in LINQ. See, for example, the implementation of SelectEnumerableIterator.
Since the client can stop enumerating at any time, voluntarily or due to a fault, the client needs to be able to dispose the entire internal state of the iterator. Thus we create a composite Dispose.
We could stop at that, but we often, as a best effort, dispose on state transitions as well. That is mostly to release resources early and does not need to be hardened for faulting scenarios.
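The pattern described in this thread, a composite `Dispose` plus best-effort disposal on state transitions, might look roughly like the sketch below. `FlattenEnumerator` is a hypothetical, simplified stand-in written for this illustration; the PR's actual iterator class is structured differently (it uses a `_state` field and a `switch`).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a hand-written SelectMany iterator.
public sealed class FlattenEnumerator<TSource, TResult> : IEnumerator<TResult>
{
    private readonly Func<TSource, IEnumerable<TResult>> _selector;
    private IEnumerator<TSource> _sourceEnumerator;
    private IEnumerator<TResult> _subEnumerator;

    public FlattenEnumerator(
        IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector)
    {
        _sourceEnumerator = source.GetEnumerator();
        _selector = selector;
    }

    public TResult Current { get; private set; }

    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        // After Dispose, _sourceEnumerator is null and MoveNext returns false.
        while (_sourceEnumerator != null)
        {
            if (_subEnumerator != null)
            {
                if (_subEnumerator.MoveNext())
                {
                    Current = _subEnumerator.Current;
                    return true;
                }

                // Best-effort early release on the state transition. Nulling the
                // field means a later failure in GetEnumerator() does not leave
                // us holding a disposed enumerator.
                _subEnumerator.Dispose();
                _subEnumerator = null;
            }

            if (!_sourceEnumerator.MoveNext())
            {
                Dispose();
                return false;
            }

            _subEnumerator = _selector(_sourceEnumerator.Current).GetEnumerator();
        }

        return false;
    }

    // Composite Dispose: releases whatever is still held, so a caller that
    // stops early (voluntarily or after an exception) can clean up everything.
    public void Dispose()
    {
        if (_subEnumerator != null)
        {
            _subEnumerator.Dispose();
            _subEnumerator = null;
        }

        if (_sourceEnumerator != null)
        {
            _sourceEnumerator.Dispose();
            _sourceEnumerator = null;
        }
    }

    public void Reset() => throw new NotSupportedException();
}
```

Unlike a compiler-generated `yield` iterator, nothing here disposes automatically when `MoveNext` throws; cleanup in that case relies on the caller invoking `Dispose`, which `foreach` does for you.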

@jamesqo
Contributor Author

jamesqo commented Dec 5, 2016

Resolved merge conflicts

@jamesqo
Contributor Author

jamesqo commented Dec 5, 2016

@stephentoub Responding to your comments from earlier:

> I'm not convinced that there aren't real regressions in some cases. Consider the common case where the enumerable returned for each iteration isn't an ICollection.

My tests didn't actually hit that branch since each subcollection was an array. However, I modified them slightly to remove the ToArray call here and ran with a build of coreclr that included dotnet/coreclr#8306 (which has been merged). It still looks like mostly improvements excepting a couple of fluctuations: https://gist.github.com/jamesqo/fb78ba5b5b3c26f4a67fac4aa40dabf3

@karelz karelz modified the milestone: 1.2.0 Dec 6, 2016
@VSadov
Member

VSadov commented Dec 28, 2016

LGTM

@VSadov VSadov merged commit 5bf69d1 into dotnet:master Dec 28, 2016
@jamesqo jamesqo deleted the select-many branch December 28, 2016 01:29
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Specialize the single-selector overload of SelectMany.

Commit migrated from dotnet/corefx@5bf69d1
7 participants