Add Enumerable.*By operators (DistinctBy, ExceptBy, IntersectBy, UnionBy, MinBy, MaxBy) #27687
Comments
Not sure why #14753 was closed way back when. It is crazy that LINQ doesn't have these methods built in. In the 10 years since LINQ's introduction, I have rarely had a project where I didn't end up writing one or more of these extension methods myself, again and again. I think the argument (mentioned in the other thread) of, "oh well you can do these things by creating a custom comparer" doesn't hold water. Writing a custom comparer is very heavyweight. And, in practice, I can probably count on one hand the number of times I've seen developers actually go and write their own comparers. Instead they usually a) use home-grown extension methods that do these things or b) fall back to writing a foreach loop. One of the main points of LINQ is brevity: writing what you want to do, not all the plumbing code required to do it. Forcing users to write a separate class each time they want to slice their enumerable a different way is the absolute antithesis of this. To implement these methods, I'd imagine it could just use something like Jon Skeet's ProjectionEqualityComparer and call the existing non-*By methods. (This probably deserves a separate issue, but that class should be exposed publicly in the framework proper as well.) |
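To make the "custom comparer is heavyweight" point concrete, here is a minimal sketch contrasting the two styles. It assumes .NET 6 or later, where a DistinctBy method with the shape discussed in this thread is available in System.Linq. The Person record and the CityComparer class are hypothetical names made up for illustration; they do not come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data, not from the thread.
var people = new List<Person>
{
    new("Ada", "London"),
    new("Alan", "London"),
    new("Grace", "New York"),
};

// Style 1: a dedicated comparer class, written once per property you want to compare on.
var viaComparer = people.Distinct(new CityComparer()).ToList();

// Style 2: the comparison logic is visible at the call site as a key selector.
var viaKeySelector = people.DistinctBy(p => p.City).ToList();

Console.WriteLine(string.Join(", ", viaComparer.Select(p => p.Name)));
Console.WriteLine(string.Join(", ", viaKeySelector.Select(p => p.Name)));

// Hypothetical element type used only for this illustration.
record Person(string Name, string City);

// Hypothetical comparer: two people are "equal" when their cities match.
sealed class CityComparer : IEqualityComparer<Person>
{
    public bool Equals(Person? x, Person? y) =>
        ReferenceEquals(x, y) || (x is not null && y is not null && x.City == y.City);

    public int GetHashCode(Person obj) => obj.City.GetHashCode();
}
```

Both calls keep one person per city; the difference is that the second one needs no extra class and the reader can see what "distinct" means without opening another file.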
@maryamariyan |
In an effort to unify my own *By extension methods (with the hope that those would someday become part of the runtime) I came looking for a reference implementation that I could grab. |
@terrajobst What needs to happen to get these API suggestions reviewed and get any official feedback from the team? I saw you mention in the standup yesterday that you guys have a clear plate now... :) |
The API process is documented. Each area has an owner and it's their responsibility to work with the issue opener to get the issue either into the state api-ready-for-review or closed. In this case, that's @eiriktsarpalis and @adamsitnik. |
A few remarks:
|
@eiriktsarpalis Is there a reason you guys haven't reviewed this (or scheduled it for review) yet? I realize it doesn't follow the template but it seems pretty clear what it's asking and obviously you guys are aware of the proposal. |
It's not different in terms of behavior (I assume you meant IEqualityComparer), and the same is true of all those proposed methods (ExceptBy, UnionBy, IntersectBy, DistinctBy). But writing a custom equality comparer every time you want to perform one of those operations based on a given property is time-consuming, and the logic of the comparison isn't immediately visible without looking at the implementation. The *By overloads make the code terser and more readable. |
Couple of reasons. The size of the backlog is significant so it takes us time before we can get to triage everything. I also need to be confident that any proposal has a strong chance of getting approved before I mark it as ready for review: this includes gathering evidence that the addition provides value to users, poking holes in the design, and finalizing the API shape. Following the template is part of that, but that's merely a convention meant to speed up API review that frankly takes a tiny fraction of the whole effort.
I suppose my original question is should they contain the |
If somebody were to step up and commit to implementing what's approved, would that be enough to take it forward? There are 56 positive votes at this point. I'll also use this opportunity to lobby for #27449 (comment) (IEnumerable should have an extension for creating fixed size chunks) 😄 |
Yes, this would be manually implementing what Here is a more complex example for
This showcases the need for supporting two collections of differing types. This cannot be supported with an
I believe you are correct. It seems consistent to make all these types of methods have a
Can you elaborate on what might not make sense here?
Some of these could be done with a custom comparer while some cannot. Those, that process items of differing types need a key selector. I hope you find my "business example" plausible. I certainly have worked many times on line of business app code like this (I'm a consultant). |
Ah, I had missed that clarification. I had assumed that So I would probably tend towards adding the following APIs:
namespace System.Linq
{
public static class Enumerable
{
IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
// Missing min & max overloads accepting custom comparers added for completeness
TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
}
}
Note that I've removed the Min*/Max* overloads that take a |
Removing support for heterogeneous sources (e.g. In my estimation, most uses of these I don't see much of a clarity impact here. The call sites look very simple and if you just write the obvious code the correct overload will be picked. |
I'm simply arguing that the particular signature is not defining an intersection operation, at least from a strict set theoretic perspective. If the point of the *By methods is to provide an alternative to the existing Do you have examples of other LINQ-like APIs that use a similar signature for intersection operations? |
I do not believe there is a lambda-taking existing method. A comparer certainly is different from a
I can certainly see your point that a different name than |
Perhaps I didn't phrase my original statement very well: I'm not claiming that there currently is a method accepting a lambda, but that the proposed |
@GSPP I'd be keen on getting a version of this approved in time for .NET 6, but I would still gravitate towards the shape as described here. Would it be possible to update your original post using the API proposal issue template so I can mark this ready for API review? I still think intersecting and subtracting over heterogeneous sources is useful, so it's certainly something I'll be bringing up during the discussion. |
namespace System.Linq
{
public static class Enumerable
{
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst, IEqualityComparer<TKey>? comparer);
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst, IEqualityComparer<TKey>? comparer);
public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
// Missing min & max overloads accepting custom comparers added for completeness
public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, IComparer<TResult>? comparer);
public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, IComparer<TResult>? comparer);
}
public static class Queryable
{
public static IQueryable<TSource> DistinctBy<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector);
public static IQueryable<TSource> DistinctBy<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TKey> source2, Expression<Func<TSource, TKey>> keySelectorFirst);
public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TKey> source2, Expression<Func<TSource, TKey>> keySelectorFirst, IEqualityComparer<TKey>? comparer);
public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TKey> source2, Expression<Func<TSource, TKey>> keySelectorFirst);
public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TKey> source2, Expression<Func<TSource, TKey>> keySelectorFirst, IEqualityComparer<TKey>? comparer);
public static IQueryable<TSource> UnionBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
public static IQueryable<TSource> UnionBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
public static TSource MinBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector);
public static TSource MinBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);
public static TSource MaxBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector);
public static TSource MaxBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);
// Missing min & max overloads accepting custom comparers added for completeness
public static TResult Min<TSource, TResult>(this IQueryable<TSource> source, IComparer<TResult>? comparer);
public static TResult Max<TSource, TResult>(this IQueryable<TSource> source, IComparer<TResult>? comparer);
}
} |
And will the corresponding methods be added to |
@YohDeadfall we're adding queryable methods for all of the above. |
Awesome that this is happening. I guarantee you: These new methods will be appreciated widely and we will see mentions on Stack Overflow and other community places. Developers love little gems like these. |
Now that this is approved, should we make a separate issue for |
We shouldn't need to; the approved API includes the following overloads, which should be sufficient to address the requirement:
public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TKey> source2, Expression<Func<TSource, TKey>> keySelectorFirst);
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst); |
Side question: are we going to be adding any of these to PLINQ? |
@eiriktsarpalis That only works if the second collection is already the type of the projection. I'd say that's less common than the case when the collections are two completely unrelated types.
class Dog { public string DogName {get; set;} }
class Cat { public string CatName {get; set;} }
public static IEnumerable<TFirst> IntersectBy<TFirst, TSecond, TKey>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TKey> firstKeySelector,
Func<TSecond, TKey> secondKeySelector);
public static IEnumerable<TFirst> ExceptBy<TFirst, TSecond, TKey>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TKey> firstKeySelector,
Func<TSecond, TKey> secondKeySelector);
//Find dogs where a cat shares their name
var popularDogs = dogs.IntersectBy(cats, d => d.DogName, c => c.CatName);
//Find dogs where a cat does not share their name
var unpopularDogs = dogs.ExceptBy(cats, d => d.DogName, c => c.CatName);
Obviously this example is contrived but real-world scenarios like this happen all the time. |
@MgSam this should probably cover your use case:
//Find dogs where a cat shares their name
var popularDogs = dogs.IntersectBy(cats.Select(c => c.CatName), d => d.DogName);
//Find dogs where a cat does not share their name
var unpopularDogs = dogs.ExceptBy(cats.Select(c => c.CatName), d => d.DogName);
Seems reasonable, but I would defer to @adamsitnik, @carlossanlop and @jozkee on that call. I would recommend creating a separate issue and we can revisit at a later iteration. |
@eiriktsarpalis Great point that there is a relatively easy workaround given the new methods. I haven't wrapped my brain around having them available yet. :) That said, these overloads that I've proposed are straightforward and useful enough to be added outright, IMO. In other parts of LINQ, (like |
I would like to work on this if possible |
Hi @C-xC-c, thanks for volunteering. I had actually started working on the feature a few weeks back but just didn't have enough time to see it through: eiriktsarpalis@7e37b77. Feel free to reuse any or none of that. |
Hi again @C-xC-c, have you begun work on the feature? I actually just finished work on my own branch so will post a PR today. Hope that's ok. |
I was working on my branch locally but that's okay! Glad to see this get done! |
I propose to add LINQ methods of the pattern *By. For example:
This method would behave like Distinct except that equality is determined based on the key provided by keySelector. The key could be any value including anonymous types and value tuples.
A motivating case could be this:
Logic like that is reasonably common in business logic code. It is not easy to implement without ExceptBy. In particular, the following is undesirable because it leads to quadratic cost and repeated enumeration:
In the past I have had a need for *By methods many times so I have written them myself. A web search reveals great interest. I believe there is a strong case for adding methods like this.
There has been interest on this issue tracker as well:
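The code samples from the original post are not reproduced above, so the following is only a sketch of the kind of situation the motivating case describes, not the author's code. It assumes .NET 6 or later, where ExceptBy is available in System.Linq with an overload taking a sequence of keys. The order data and the variable names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical data: incoming orders, and the IDs of orders that were already processed.
var incoming = new[]
{
    (Id: 1, Item: "book"),
    (Id: 2, Item: "pen"),
    (Id: 3, Item: "lamp"),
};
var processedIds = new[] { 2, 3 };

// Nested-scan version: for every order, processedIds is enumerated again,
// so the cost grows with incoming.Length * processedIds.Length.
var slow = incoming.Where(o => !processedIds.Any(id => id == o.Id)).ToList();

// ExceptBy version: the keys of the second sequence are looked up by key
// rather than re-scanned for every element of the first sequence.
var fast = incoming.ExceptBy(processedIds, o => o.Id).ToList();

Console.WriteLine(string.Join(", ", slow.Select(o => o.Item)));
Console.WriteLine(string.Join(", ", fast.Select(o => o.Item)));
```

One difference worth keeping in mind: like Except, ExceptBy has set semantics, so if the first sequence contained two orders with the same key only the first would be returned, whereas the Where/Any version would return both.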
Proposed API
@eiriktsarpalis has provided the following shape for API review. Please refer to my original proposal below for reference.
and equivalent Queryable APIs:
EDIT @eiriktsarpalis: the key change in this amendment is that the ExceptBy and IntersectBy overloads do not allow heterogeneous element types for the second parameter. This makes it less of a join-like construct and more compatible with both the existing Except and Intersect methods as well as the proposed signature for UnionBy. Please follow the conversation after this comment for more details on the issue.
Open Questions
- The ExceptBy and IntersectBy methods can be generalized by admitting heterogeneous element types in the second parameter. This enables applications like the one cited in the first example, at the cost of requiring a separate keySelector argument for the second collection. Note that this generalization is not admissible in the UnionBy case.
- Should the *By methods accept custom comparers for the key types in addition to key selector lambdas? While there is certainly precedent for similar methods in LINQ, there is also an element of over-engineering here: the natural equality/ordering semantics of ad-hoc key projections are almost always sufficient. YAGNI.
Original API Proposal
Here is the API proposal. All of these methods come with an overload with and without comparer. I kept the existing naming conventions and argument ordering.
Further notes:
- DistinctBy: The order of the output elements should be documented to be the same order as the input. Any reasonable implementation that comes to mind does it this way. For compatibility reasons this could not ever be changed anyway after the first version ships. Distinct should be similarly documented if not already done. Distinct already behaves this way.
- For ExceptBy and IntersectBy the same is true for the first input. Only items from the first input are ever returned and their order can be kept. Except does the same thing today.
- For UnionBy I'm not sure about the order.
- For MinBy/MaxBy it should be documented that the first element in the sequence with the minimum/maximum key is returned.
- MinBy/MaxBy can take a defaultValue which is used in case the sequence is empty. Overloads without default value throw in that case.
- Overloads are also proposed for Min/Max to bring them on par with the functionality added by MinBy/MaxBy (comparer and defaultValue).
- The element types of the two inputs to ExceptBy and IntersectBy can be different. This is because we only ever return items of the first sequence. We need two key selectors in that case. There's a simpler overload that has only one element type as well.
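The ordering notes above can be checked against the methods as they eventually shipped. This is a small sketch, assuming .NET 6 or later; the sample tuples are hypothetical, and the comments describe what I would expect from the current System.Linq implementation rather than a documented guarantee for every method.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: (Name, Rank) pairs with duplicate ranks.
var first = new[] { (Name: "b", Rank: 2), (Name: "a", Rank: 1), (Name: "c", Rank: 2) };
var second = new[] { (Name: "d", Rank: 1), (Name: "e", Rank: 3) };

// DistinctBy keeps the first element seen for each key, in input order: b, a
var distinct = first.DistinctBy(x => x.Rank).Select(x => x.Name).ToArray();

// UnionBy yields elements of the first input, then new keys from the second: b, a, e
var union = first.UnionBy(second, x => x.Rank).Select(x => x.Name).ToArray();

// ExceptBy only ever returns items of the first input, and each key at most once: b
var except = first.ExceptBy(new[] { 1 }, x => x.Rank).Select(x => x.Name).ToArray();

// MinBy/MaxBy return the first element carrying the minimum/maximum key.
var min = first.MinBy(x => x.Rank); // a
var max = first.MaxBy(x => x.Rank); // b, not c

Console.WriteLine(string.Join(", ", distinct));
Console.WriteLine(string.Join(", ", union));
Console.WriteLine(string.Join(", ", except));
Console.WriteLine($"{min.Name} {max.Name}");
```

Note that the shipped MinBy/MaxBy do not take a defaultValue argument as the original proposal suggested, so that part of the notes is not exercised here.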
can be different. This is because we only ever return items of the first sequence. We need two key selectors in that case. There's a simpler overload that has only one element type as well.The text was updated successfully, but these errors were encountered: