Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Remove two iterators from Concat #15943

Merged
merged 4 commits into from
Mar 24, 2017
Merged

Remove two iterators from Concat #15943

merged 4 commits into from
Mar 24, 2017

Conversation

jamesqo
Copy link
Contributor

@jamesqo jamesqo commented Feb 8, 2017

Was part of #15389 but moved to another PR. Fixes https://github.com/dotnet/corefx/issues/15942

Changes:

  • Remove 2 iterators from Concat that specialize for ICollections. Instead of having different iterators, set a flag on the iterator if all inputs so far have been ICollections. This allows us to determine in constant time whether we can get the count cheaply and preallocate for ToArray/ToList.

  • This allows us to move the source enumerables into linked list nodes that are of uniform type, ConcatNIterator<TSource>. Previously, we had to typecast every time we crossed a node.

  • Implement GetCount and ToArray in the derived classes rather than the base to avoid virtual calls and improve time complexity.

    • Avoid walking the linked list quadratically for GetCount, since we don't need to sum the counts in order.
    • Remove unnecessary virtual calls to GetEnumerable for 1 Concat followed by ToArray.
    • Make 1 Concat followed by Count cheap if the both of the concatees' counts can be gotten cheaply.
  • Add a ReserveOrAdd method to SparseArrayBuilder which either reserves space for the enumerable if its count can be predetermined, or eagerly adds it otherwise. This saves quite a few lines of code.

Performance tests: memory leak reduction / regressions from this change Need to be updated.

/cc @JonHanna @stephentoub @VSadov

@jamesqo jamesqo changed the title Refactor Concat and reduced linked list size Refactor Concat and reduce linked list size Feb 8, 2017
@karelz
Copy link
Member

karelz commented Feb 8, 2017

cc: @ianhays @safern as it touches also Collections.

@jamesqo
Copy link
Contributor Author

jamesqo commented Feb 8, 2017

@karelz Not really. Those are just Common files which are only used by Linq.

@karelz
Copy link
Member

karelz commented Feb 8, 2017

Ah, thanks for clarification. Will try to remember it in future :)

@jamesqo
Copy link
Contributor Author

jamesqo commented Feb 11, 2017

Test Innerloop Ubuntu14.04 Release Build and Test

int count;
if (EnumerableHelpers.TryGetCount(enumerable, out count))
// TODO: Add test to ensure the bug is fixed.
if (builder.ReserveOrAdd(enumerable))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please submit a separate PR with the SelectMany bug fix. It doesn't have anything to do with Concat.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

/// At face value, using list nodes seems worse than chaining together the iterators directly and
/// letting the iterator itself hold a reference to the latest source. So why allocate a list node in
/// addition to each iterator? It lets the GC reclaim intermediary iterators in the chain which
/// have fields that are not relevant to later iterators, like <see cref="_enumerator"/> or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a win in real usage? We're allocating additional objects in order to allow the GC to reclaim other objects. And in general, the objects you build up in these chains aren't held onto for very long: you do a few concats and enumerate the result.

@stephentoub
Copy link
Member

Performance tests: memory leak reduction / regressions from this change

I'm not seeing this as an improvement. The test for letting things get GC'd is creating a chain of 5M concats... that's not a real scenario. The cases shown in the regressions list are more representative of real usage and are regressions.

@jamesqo jamesqo changed the title Refactor Concat and reduce linked list size Remove two iterators from Concat Feb 14, 2017
GetEnumerable is faster, and we avoid covariant array typechecks that come with reference types.
@jamesqo
Copy link
Contributor Author

jamesqo commented Feb 14, 2017

@stephentoub, I've worked on this some more and you might want to take another look. I've removed the SingleLinkedNode thing for the time being, so we only make 1 allocation per Concat. I've posted some more comprehensive benchmarks here that demonstrate both the improvements/regressions introduced by this PR. Some things to note:

  • Before, we were saying that x.Concat(y) could not be counted cheaply even if x and y were both IIListProviders. This meant we did not preallocate for ToList and there was wasteful memory usage. This PR fixes that for the case of 1 Concat and decreases GCs a lot.

  • The results at the top of the analysis show pretty drastic regressions if all inputs are ICollections. However, these results are actually skewed in favor of the old implementation; the old implementation typecasts to ICollection before the Concat iterator is returned, so that doesn't get included in the benchmark. The new implementation doesn't do the typecast until GetCount/ToArray/ToList is called, so it does get included. So, many of the regressions are exaggerated.

/// If the items' count is predetermined to be 0, no reservation is made and the return value is <c>false</c>.
/// The effect is the same as if the items were added, since adding an empty collection does nothing.
/// </remarks>
public bool ReserveOrAdd(IEnumerable<T> items)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add them when I get the chance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub Nvm, this type actually depends on EnumerableHelpers.Linq.cs which makes use of internal Linq interfaces. So there's no way to write unit tests for this type currently, unless maybe with InternalsVisibleTo. But either way that should probably be done in a separate PR bc some existing methods are outside the scope of this PR.

@@ -231,18 +231,10 @@ public TResult[] ToArray()
{
IEnumerable<TResult> enumerable = _selector(element);

int count;
if (EnumerableHelpers.TryGetCount(enumerable, out count))
if (builder.ReserveOrAdd(enumerable))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change to SelectMany in this PR for Concat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced this method to help with Concat (it's used there) then I realized it could be used in SelectMany too.

return new ConcatNEnumerableIterator<TSource>(this, next, 2);
// Instead of linking directly to this Concat2Iterator, we create 2 new ConcatNIterators
// for the first two sources and create a third one to hold the next source.
// This simplifies `GetEnumerable` because the nodes are of uniform type and we don't
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This simplifies 'GetEnumerable'

And makes the common case of foreach (var item in source.Concat(something).Concat(somethingElse)) more expensive, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@stephentoub
Copy link
Member

@jamesqo, thanks for your work on this, but I'm still not understanding why this PR is actually a good thing. The most common scenario for Concat is to have a few concats that are then enumerated, and it seems this PR makes that slower, no?

@jamesqo
Copy link
Contributor Author

jamesqo commented Feb 16, 2017

thanks for your work on this, but I'm still not understanding why this PR is actually a good thing. The most common scenario for Concat is to have a few concats that are then enumerated, and it seems this PR makes that slower, no?

The main point of this PR was to remove the two iterators, but I guess I got a little sidetracked with the uniform-type node thing. I thought the previous code for looping over the enumerables looked pretty verbose with all of the typecasts, so that's why I wanted to change that. Nevertheless, I've reverted back to the original scheme where the tail is an abstract class, and fortunately I found a way to loop without too much boilerplate by enscapulating the typecast in a property.

@jamesqo
Copy link
Contributor Author

jamesqo commented Feb 21, 2017

Updated the gist again with latest commit.

@danmoseley
Copy link
Member

@VSadov @OmarTawfik @stephentoub seems this is ready for next review.

@karelz
Copy link
Member

karelz commented Mar 16, 2017

@VSadov @OmarTawfik ping again

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@VSadov VSadov merged commit d1c4eea into dotnet:master Mar 24, 2017
@jamesqo jamesqo deleted the concat.nodes branch March 25, 2017 01:15
@karelz karelz modified the milestone: 2.0.0 Mar 25, 2017
jkotas added a commit to jkotas/coreclr that referenced this pull request Jan 20, 2018
The timezone ids used case insensitive comparisons everywhere, except in the dictionary used to cache timezones.

Fixes dotnet/corefx#15943
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants