Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Overload for Distinct to receive a func. #27665

Closed
RubenMateus opened this issue Oct 18, 2018 · 14 comments
Closed

Feature Request: Overload for Distinct to receive a func. #27665

RubenMateus opened this issue Oct 18, 2018 · 14 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Linq
Milestone

Comments

@RubenMateus
Copy link

I noticed in the Distinct there is no overload for receiving a func, we always have to create a new lambdacomparer and put the func inside the comparer, is it possible to have that abstracted and use it like:
Distinct((a, b) => a.Stuff == b.Stuff && a.Stuff1 == b.Stuff1)
Does this even make sense?
If so i think i have a simple solution for this.

@stephentoub
Copy link
Member

To be efficient, wouldn't it need two funcs, one for equality and one for hash code?

@RubenMateus
Copy link
Author

what would the hash code do? i dont really know how efficient it would be.
Please give me some guidance, i can gladly contribute to this.

@stephentoub
Copy link
Member

stephentoub commented Oct 18, 2018

Distinct works by building a hash set (https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Set.cs) so that for each new item it sees, it can hash it and determine relatively quickly whether it's already seen that item. That requires a hash code, and that's why IEqualityComparer has both a comparison and hash code method. Without that, Distinct would end up being O(N^2), having to compare every new item individually against every previous item yielded to determine whether it was the same.

So the overload would need to take two delegates, and the call site would end up being quite complicated. I don't think this is something we should add.

If we wanted to do something, I'd prefer to see us add the equivalent of Comparer.Create for EqualityComparer, with EqualityComparer.Create taking the same two delegates. That could then be used with Distinct, but also elsewhere.

@svick
Copy link
Contributor

svick commented Oct 18, 2018

I think the right way to improve this is to add DistinctBy (already mentioned in comments of https://github.com/dotnet/corefx/issues/14119).

With that and C# 7.0 tuples, you could write:

DistinctBy(a => (a.Stuff, a.Stuff1))

This is simpler to use than the proposed overload of Distinct and doesn't have any issues with GetHashCode (because the implementation can call GetHashCode on the tuple).

@RubenMateus
Copy link
Author

RubenMateus commented Oct 18, 2018

i like @svick idea alot, how hard would it be?

@GSPP
Copy link

GSPP commented Oct 19, 2018

There should by all kinds of XxxBy methods (DistinctBy, ExceptBy, MaxBy, ...). I have needed them many times and therefore implemented them.

I have seen some of them proposed in a scattered around way. I would be willing to make a ticket with a formal API proposal that centralizes all of the XxxBy proposals. Is that a good idea?

@jnm2
Copy link
Contributor

jnm2 commented Oct 20, 2018

.DistinctBy(a => (a.Stuff, a.Stuff1)) is the equivalent of:

.GroupBy(a => (a.Stuff, a.Stuff1)).Select(group => group.First())

Or by merging the Select into the GroupBy:

.GroupBy(a => (a.Stuff, a.Stuff1), (key, group) => group.First())

But with performance opportunities that these equivalents do not have.

@jnm2
Copy link
Contributor

jnm2 commented Oct 20, 2018

Same situation with MaxBy. .MaxBy(a => (a.Stuff, a.Stuff1)) is the equivalent of the poorly-performing:

.OrderByDescending(a => (a.Stuff, a.Stuff1)).First()

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@RubenMateus
Copy link
Author

any news on this? will it enter in any release?

@jnm2
Copy link
Contributor

jnm2 commented Jul 22, 2020

I recently learned that Distinct has a documented guarantee that values will be returned in the original order in which they are seen, but GroupBy documentation explicitly does not guarantee that groups will be returned in the original order in which each key was first seen.

Edit: Actually it's the other way around

@svick
Copy link
Contributor

svick commented Jul 22, 2020

@jnm2 Where did you see that? The documentation for GroupBy<TSource,TKey>(IEnumerable<TSource>, Func<TSource,TKey>) says:

The IGrouping<TKey,TElement> objects are yielded in an order based on the order of the elements in source that produced the first key of each IGrouping<TKey,TElement>. Elements in a grouping are yielded in the order they appear in source.

So, at least this overload does guarantee the order. (Though not all overloads have this note.)

@jnm2
Copy link
Contributor

jnm2 commented Jul 22, 2020

Oh no, I had them exactly backwards! GroupBy preserves ordering and Distinct does not.

@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Sep 2, 2020
@eiriktsarpalis
Copy link
Member

So, something like this?

public static class Enumerable
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
}

FWIW F# core has a similar method.

@eiriktsarpalis
Copy link
Member

This request is already covered by #27687. Closing this one.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Linq
Projects
None yet
Development

No branches or pull requests

9 participants