Specialize the single-selector overload of SelectMany. #13942
Conversation
```csharp
            yield return subElement;
        }
    }

    return new SelectManySingleSelectorIterator<TSource, TResult>(source, selector);
```
I would have preferred to keep this `SelectManyIterator`, but the compiler complains because there are other `SelectManyIterator` members.
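To illustrate the naming constraint being discussed: the other `SelectMany` overloads keep their compiler-generated helpers as methods named `SelectManyIterator`, and C# does not allow a nested type to share a name with a method in the same containing type. A minimal sketch (names and bodies are illustrative, not the actual corefx source):

```csharp
using System;
using System.Collections.Generic;

internal static class NamingDemo
{
    // The other overloads use yield-based helper *methods* named
    // SelectManyIterator, along these lines:
    private static IEnumerable<TResult> SelectManyIterator<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, int, IEnumerable<TResult>> selector)
    {
        int index = 0;
        foreach (TSource element in source)
            foreach (TResult subElement in selector(element, index++))
                yield return subElement;
    }

    // A nested *class* can't reuse that name -- uncommenting this produces a
    // "type already contains a definition" compile error, hence the longer
    // SelectManySingleSelectorIterator name chosen in the PR:
    // private sealed class SelectManyIterator<TSource, TResult> { }
}
```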
```csharp
{
    checked
    {
        count += _selector(element).Count();
```
TODO: Add tests similar to the ones in Concat to make sure `checked` is used.
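For context on what such tests verify: because the count accumulation runs inside a `checked` block, a combined length that exceeds `int.MaxValue` throws `OverflowException` instead of silently wrapping to a negative value. A standalone sketch of the behavior (not the actual test code from the PR):

```csharp
using System;

class CheckedDemo
{
    static void Main()
    {
        int count = int.MaxValue;
        try
        {
            // With 'checked', the addition throws rather than wrapping around.
            checked { count += 1; }
            Console.WriteLine("no overflow?");
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }
    }
}
```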
Are you going to add these tests in this PR?
Ah, I forgot. Yes.
```csharp
if (!_subEnumerator.MoveNext())
{
    _subEnumerator.Dispose();
    _state = 2;
```
Should we set _subEnumerator to null here? That would help in the case where the subsequent GetEnumerator() call throws, at which point if we didn't null this out we'd be holding on to a disposed enumerator.
Sure, done.
What about the regressions for iteration and `ToArray`?
Other than my few questions/comments, LGTM. Thanks!
I believe those are just random fluctuations in the results... most of them appear to be when the sub-collection length is really small, like 1-3. Edit: I forgot to add, most of the regressions are around 5-10% and inconsistent.
LGTM. It was also interesting to learn about #6892; nice change too. Normally I would not like the idea of exposing internal buffers to strangers, but indeed,
I'm not convinced that there aren't real regressions in some cases. Consider the common case where the enumerable returned for each iteration isn't an `ICollection`. Won't `List.AddRange` with its current implementation end up doing more work than if each element were `Add`'d individually?
```csharp
    _sourceEnumerator.Dispose();
    _sourceEnumerator = null;
}
```
`_subEnumerator` may not be null here, and needs to be disposed too, if the enumeration is jumped out of.
Good point. I'll add a dispose for that too.
@stephentoub Ah, you make a good point. Looks like, as it's currently implemented, we'll end up in a loop calling
```csharp
    goto case 3;
case 3:
    // Take the next element from the sub-collection and yield.
    if (!_subEnumerator.MoveNext())
```
I believe that if previously an exception came out of here, such as from the MoveNext, the instance would be Dispose'd immediately; now it requires Dispose to be called on the enumerator explicitly. Are we ok with that @VSadov?
Doesn't this happen every time we replace a yield-based iterator with a class?
@VSadov, can you comment on whether you think this is an acceptable change when you have time? Thanks.
It is a common practice in LINQ. See, for example, the implementation of `SelectEnumerableIterator`.

Since the client can stop enumerating at any time, voluntarily or due to a fault, the client needs to be able to dispose the entire internal state of the iterator. Thus we create a composite Dispose.

We could stop just at that, but we often, as a best effort, dispose on state transitions as well. That is mostly to release resources early and does not need to be hardened for faulting scenarios.
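The pattern described above can be sketched as a small hand-written flattening iterator. This is a hypothetical, simplified version (the real `SelectManySingleSelectorIterator` is more involved: it clones itself per `GetEnumerator` call and tracks an explicit `_state` field); the point is the composite `Dispose` plus the early dispose-and-null on state transitions:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

internal sealed class FlattenIterator<TSource, TResult> : IEnumerable<TResult>, IEnumerator<TResult>
{
    private readonly IEnumerable<TSource> _source;
    private readonly Func<TSource, IEnumerable<TResult>> _selector;
    private IEnumerator<TSource> _sourceEnumerator;
    private IEnumerator<TResult> _subEnumerator;

    internal FlattenIterator(IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector)
    {
        _source = source;
        _selector = selector;
    }

    public TResult Current { get; private set; }
    object IEnumerator.Current => Current;

    public IEnumerator<TResult> GetEnumerator()
    {
        _sourceEnumerator = _source.GetEnumerator();
        return this; // simplification: real LINQ iterators clone here
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public bool MoveNext()
    {
        while (true)
        {
            if (_subEnumerator != null)
            {
                if (_subEnumerator.MoveNext())
                {
                    Current = _subEnumerator.Current;
                    return true;
                }

                // Early dispose on state transition: release (and null out)
                // the inner enumerator as soon as it is exhausted.
                _subEnumerator.Dispose();
                _subEnumerator = null;
            }

            if (!_sourceEnumerator.MoveNext())
            {
                return false;
            }

            _subEnumerator = _selector(_sourceEnumerator.Current).GetEnumerator();
        }
    }

    // Composite Dispose: the client may abandon enumeration at any point,
    // so both layers of internal state must be released here.
    public void Dispose()
    {
        _subEnumerator?.Dispose();
        _subEnumerator = null;
        _sourceEnumerator?.Dispose();
        _sourceEnumerator = null;
    }

    public void Reset() => throw new NotSupportedException();
}
```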
Resolved merge conflicts
@stephentoub Responding to your comments from earlier:
My tests didn't actually hit that branch since each subcollection was an array. However, I modified them slightly to remove the
LGTM
Specialize the single-selector overload of SelectMany. Commit migrated from dotnet/corefx@5bf69d1
`e.SelectMany(i => i)` is a very popular option for flattening a list of enumerables: http://stackoverflow.com/questions/1590723/flatten-list-in-linq

This PR optimizes `SelectMany` calls followed by `ToArray` or `ToList`, leading to a substantial (~40%) speedup. Instead of looping through each item of the projected sequence, yield returning it from the iterator, and adding it to the list/`LargeArrayBuilder`, we simply call `AddRange`.

Performance test: https://github.com/jamesqo/Dotnet/blob/05daf42403b615942706a7e9be32c8b22db80e67/Program.cs
Results: https://gist.github.com/jamesqo/635e6e49b3ceb6e9527161f3472188f7
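The core idea can be shown in a few lines. This is a standalone sketch (not the PR's code): element-at-a-time `Add` versus handing each projected sequence to `AddRange`, which can bulk-copy via `ICollection<T>.CopyTo` when the sequence is a collection:

```csharp
using System;
using System.Collections.Generic;

class AddRangeDemo
{
    static void Main()
    {
        var source = new[] { new[] { 1, 2 }, new[] { 3 }, new[] { 4, 5, 6 } };

        // Before: yield each element from the iterator and Add it one at a time.
        var slow = new List<int>();
        foreach (int[] sub in source)
            foreach (int item in sub)
                slow.Add(item);

        // After (the idea in this PR): give each projected sequence to AddRange,
        // which does a single bulk copy per sub-collection.
        var fast = new List<int>();
        foreach (int[] sub in source)
            fast.AddRange(sub);

        Console.WriteLine(string.Join(",", fast)); // 1,2,3,4,5,6
    }
}
```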
You may notice that there are a lot of regressions (substantially more allocations happening) for `ToList`. That puzzled me too, until I found out that the version of S.P.CoreLib I was using did not include this change; in my tests, a buffer was being allocated every time `List.AddRange` was called. Still, the new implementation was faster.

I've also added more testing for these changes in the PR.
cc @JonHanna @VSadov @stephentoub