-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid allocating HashSet in Distinct() for some counts #97845
Conversation
If we can get the count for the underlying source and it's 0 or 1, we can avoid allocating the HashSet, as distinctness only matters when there are multiple elements.
Tagging subscribers to this area: @dotnet/area-system-linq Issue DetailsIf we can get the count for the underlying source and it's 0 or 1, we can avoid allocating the HashSet, as distinctness only matters when there are multiple elements.
|
I wonder if 2 elements can get similar optimization in a sense that you can just compare those without using Hashset and return either 1 or 2 elements. Or using something like ValueHashSet for some small count of values like 7 or less? Or are these an overreach? |
I considered that, but it would mean extra invocations of GetHashCode and Equals, which can be user-provided functions of arbitrary complexity, and while it might help the smaller cases, it would hurt the larger cases when that work then needed to be duplicated. Plus extra complexity here. |
public TSource[] ToArray() => Enumerable.HashSetToArray(new HashSet<TSource>(_source, _comparer)); | ||
public TSource[] ToArray() | ||
{ | ||
if (TryGetNonEnumeratedCount(_source, out int count) && count < 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimizing for counted sources of size < 2 seems somewhat niche to me, is it worth the added type tests?
Just curious, because the iterator state count for Distinct was maxed at 2 for a long time, but is there a reason that a third state couldn't be added to defer an allocation in the enumerator? The allocation of HashSet could be deferred to state 2 if MoveNext has an additional item, and state 3 can be the final state. I think it comes out to be the same number of ops. |
I'm changing this code in another PR. Will close this for now and revisit. |
If we can get the count for the underlying source and it's 0 or 1, we can avoid allocating the HashSet, as distinctness only matters when there are multiple elements.