Improve Enumerable.Distinct<T> performance like HashSet<T> #42760
Comments
Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley
The code of your benchmarks will prove that you're doing it right much more than a thousand words :) Also, can you please convert it to use BenchmarkDotNet?
Hi @NetMage
Thank you for your proposal. I've given it a try and measured using the following benchmarks:

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using System.Collections.Generic;
    using System.Linq;

    namespace SetOrHashSet
    {
        class Program
        {
            static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
        }

        // Runs every benchmark once for int and once for string.
        [GenericTypeArguments(typeof(int))]
        [GenericTypeArguments(typeof(string))]
        public class BenchmarksRefactor<T>
        {
            [Params(10, 1000)]
            public int Size { get; set; }

            // true: all elements are distinct; false: every value appears twice.
            [Params(true, false)]
            public bool Unique { get; set; }

            private T[] _array;

            [GlobalSetup]
            public void Setup()
            {
                if (typeof(T) == typeof(int) && Unique)
                    _array = (T[])(object)(Enumerable.Range(0, Size).ToArray());
                else if (typeof(T) == typeof(int) && !Unique)
                    _array = (T[])(object)Enumerable.Range(0, Size / 2).Concat(Enumerable.Range(0, Size / 2)).ToArray();
                else if (typeof(T) == typeof(string) && Unique)
                    _array = (T[])(object)Enumerable.Range(0, Size).Select(x => x.ToString()).ToArray();
                else if (typeof(T) == typeof(string) && !Unique)
                    _array = (T[])(object)Enumerable.Range(0, Size / 2).Concat(Enumerable.Range(0, Size / 2)).Select(x => x.ToString()).ToArray();
            }

            // Consumes the lazy Distinct() sequence without materializing it.
            [Benchmark]
            public void Enumerate()
            {
                foreach (var item in _array.Distinct())
                {
                }
            }

            [Benchmark]
            public int Count() => _array.Distinct().Count();

            [Benchmark]
            public T[] ToArray() => _array.Distinct().ToArray();

            [Benchmark]
            public List<T> ToList() => _array.Distinct().ToList();
        }
    }

The results I got: For
For
If we switch to
If it was just about the CPU regression for small inputs, we could think about changing it. But a 65% increase in allocated memory is not worth it. Having said that, I am going to close the issue. Thanks,
Can we improve that internal class? Surely,
Adding optimizations typically increases code complexity. This needs to be justified using measurements from real-world applications. So far I've never observed the internal However, if someone is willing to study the optimizations applied to Here you can find the docs that describe how to profile and benchmark local dotnet runtime builds: |
Description
The improvements to HashSet mean that the lightweight Set internal class used by Distinct is slower than implementing your own Distinct with HashSet. I suggest that Set be improved similarly, or that it be replaced with HashSet.
Configuration
.NET Core 5.0.0-rc.1.20451.14
Windows 10 v1909
x64
Data
Here are some rough benchmarks comparing the performance of Enumerable.Distinct with MyDistinct, which basically uses a copy of DistinctIterator with Set replaced with HashSet, and MyDistinct2, which is
Here are the benchmark results. They were run over millions of random ints from 1 to 105 for Lots of collisions, 1 to int.MaxValue for Few collisions, using ToList then Select(x => x). The ToList was to pre-compute the ToString conversions for the string and object tests (which was done by casting the string to object). The Select was to prevent any IList optimizations.
As can be seen, the library Distinct method is always slowest, with MyDistinct2 either fastest or close to MyDistinct for fastest. I am not sure why foreach is so much faster than DistinctIterator for int.
Analysis
I believe the Set class Add method should be modified to be similar to HashSet.AddIfNotPresent().
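For readers who want to reproduce the comparison, here is a minimal sketch of a Distinct-style operator built on HashSet<T>. It is not the reporter's MyDistinct or MyDistinct2 (that code is not shown in this thread) and it is not the library's DistinctIterator; the names DistinctSketch and DistinctViaHashSet are hypothetical. It relies only on the documented behavior that HashSet<T>.Add returns false when the item is already present.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of System.Linq.
public static class DistinctSketch
{
    // Validates arguments eagerly, then defers the actual work to a lazy iterator.
    public static IEnumerable<T> DistinctViaHashSet<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterator(source, comparer);
    }

    private static IEnumerable<T> Iterator<T>(IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        // A null comparer makes HashSet<T> fall back to EqualityComparer<T>.Default.
        var seen = new HashSet<T>(comparer);
        foreach (T item in source)
        {
            // Add returns true only the first time a value is seen,
            // so each distinct element is yielded once, in first-occurrence order.
            if (seen.Add(item))
                yield return item;
        }
    }
}
```

Calling new[] { 3, 1, 3, 2, 1 }.DistinctViaHashSet() yields 3, 1, 2. Whether this is faster or allocates more than Enumerable.Distinct depends on the runtime version and input size, so it should be measured rather than assumed.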