Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Enumerable.*By operators (DistinctBy, ExceptBy, IntersectBy, UnionBy, MinBy, MaxBy) #27687

Closed
GSPP opened this issue Oct 21, 2018 · 35 comments · Fixed by #50335
Closed

Add Enumerable.*By operators (DistinctBy, ExceptBy, IntersectBy, UnionBy, MinBy, MaxBy) #27687

GSPP opened this issue Oct 21, 2018 · 35 comments · Fixed by #50335
Assignees
Labels
api-approved API was approved in API review, it can be implemented api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Linq help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@GSPP
Copy link

GSPP commented Oct 21, 2018

I propose to add LINQ methods of the pattern *By. For example:

    IEnumerable<TSource> DistinctBy<TSource, TKey>(
         IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector,
         IEqualityComparer<TKey> comparer = null)

This method would behave like Distinct except that equality is determined based on the key provided by keySelector. The key could be any value including anonymous types and value tuples.

A motivating case could be this:

	IEnumerable<Order> allOrders = GetTransactions();
	IEnumerable<int> completedOrderIDs = GetCompletedOrderIDs();
	IEnumerable<Order> remainingOrders =
          allOrders.ExceptBy(completedOrderIDs, order => order.ID, orderID => orderID);

Logic like that is reasonably common in business logic code. It is not easy to implement without ExceptBy. In particular, the following is undesirable because it leads to quadratic cost and repeated enumeration:

	IEnumerable<Order> remainingOrders = allOrders.Where(o => !completedOrderIDs.Contains(o.ID));

In the past I have had a need for *By methods many times so I have written them myself. A web search reveals great interest. I believe there is a strong case for adding methods like this.

There has been interest on this issue tracker as well:

Proposed API

@eiriktsarpalis has provided the following shape for API review. Please refer to my original proposal below for reference.

namespace System.Linq
{
    public static class Enumerable
    {
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        
        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        
        // Missing min & max overloads accepting custom comparers added for completeness
        public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
    }
}

and equivalent Queryable APIs:

namespace System.Linq
{
    public static class Queryable
    {
        public static IQueryable<TSource> DistinctBy<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> DistinctBy<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

        public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

        public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

        public static IQueryable<TSource> UnionBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> UnionBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

        public static TSource MinBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector);
        public static TSource MinBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);

        public static TSource MaxBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector);
        public static TSource MaxBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);

        // Missing min & max overloads accepting custom comparers added for completeness
        public static TResult Min<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);
        public static TResult Max<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);
    }
}

EDIT @eiriktsarpalis: the key change in this amendment is that the ExceptBy and IntersectBy overloads do not allow heterogeneous element types for the second parameter. This makes it less of a join-like construct and more compatible with both the existing Except and Intersect methods as well as the proposed signature for UnionBy. Please follow the conversation after this comment for more details on the issue.

Open Questions

  • The ExceptBy and IntersectBy methods can be generalized by admitting heterogeneous element types in the second parameter. This enables applications like the one cited in the first example, at the cost of requiring a separate keySelector argument for the second collection. Note that this generalization is not admissible in the UnionBy case.
  • Should the *By methods accept custom comparers for the key types in addition to key selector lambdas? While there is certainly precedent for similar methods in LINQ, there is also an element of over-engineering here: the natural equality/ordering semantics of ad-hoc key projections are almost always sufficient. YAGNI.

Original API Proposal

Here is the API proposal. All of these methods come with an overload with and without comparer. I kept the existing naming conventions and argument ordering.

namespace System.Linq
{
    public static class Enumerable
    {
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer);

        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
        public static IEnumerable<TSource1> ExceptBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond);
        public static IEnumerable<TSource1> ExceptBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond, IEqualityComparer<TKey> comparer);

        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
        public static IEnumerable<TSource1> IntersectBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond);
        public static IEnumerable<TSource1> IntersectBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond, IEqualityComparer<TKey> comparer);

        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer);

        public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TResult defaultValue);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);

        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TSource defaultValue);

        public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TResult defaultValue);

        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TSource defaultValue);
    }
}

Further notes:

  1. DistinctBy: The order of the output elements should be documented to be the same order as the input. Any reasonable implementation that comes to mind does it this way. For compatibility reasons this could not ever be changed anyway after the first version ships. Distinct should be similarly documented if not already done. Distinct already behaves this way.
  2. For ExceptBy and IntersectBy the same is true for the first input. Only items from the first input are ever returned and their order can be kept. Except does the same thing today.
  3. For UnionBy I'm not sure about the order.
  4. For MinBy/MaxBy it should be documented that the first element in the sequence with the minimum/maximum key is returned.
  5. MinBy/MaxBy can take a defaultValue which is used in case the sequence is empty. Overloads without default value throw in that case.
  6. I added overloads to Min/Max to bring them on par with the functionality added by MinBy/MaxBy (comparer and defaultValue).
  7. Note, that the input element types for ExceptBy and IntersectBy can be different. This is because we only ever return items of the first sequence. We need two key selectors in that case. There's a simpler overload that has only one element type as well.
@MgSam
Copy link

MgSam commented Oct 22, 2018

Not sure why #14753 was closed way back when.

It is crazy that LINQ doesn't have these methods built in. I think it's been rare in the 10 years since LINQ's introduction that I've ever had a project where I don't end up writing one or more of these extension methods again and again myself.

I think the argument (mentioned in the other thread) of, "oh well you can do these things by creating a custom comparer" doesn't hold water. Writing a custom comparer is very heavyweight. And, in practice, I can probably count on one hand the number of times I've seen developers actually go and write their own comparers. Instead they usually a) use home-grown extension methods that do these things or b) fall back to writing a foreach loop.

One of the main points of LINQ is brevity- writing what you want to do, not all the plumbing code required to do it. Forcing users to write a separate class each time they want to slice their enumerable a different way is the absolute antithesis of this.

To implement these methods, I'd imagine it could just use something like Jon Skeet's ProjectionEqualityComparer and call the existing non-*By methods. (Probably deserves a separate issue, but that class should be exposed publicly in the framework proper as well)

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@omariom
Copy link
Contributor

omariom commented Jun 2, 2020

@maryamariyan
The original issue is five years old. Seven versions of .NET Core have been released since then.
These methods are extremely convenient (unlike the overloads with IEqualityComparer) and are easy to implement.

@lipchev
Copy link

lipchev commented Jun 28, 2020

In an effort to unify my own *By extension methods (with the hope that those would someday would become part of the runtime) I came looking for a reference implementation that I could grab.
As far as I can tell the extrema-finding functions from both MoreLINQ and Ix.NET cause the evaluation of the selector function for single-element collections.
The decision to skip it (when possible) does come with the side effects of "hiding" a possible exception occurring in it, but at the same time- there is no "valid selector guarantee" with the empty collection either- so to me it would make sense to avoid the possibly lengthy selector evaluation (when possible)- but since the normal Enumerable.Min(source, selector) does it as well- might be the dominating factor in the final decision...
Anyway, can anybody point me to an existing implementation of the proposed API - looking through these 100+ pages of search results is quite painful (in more than one ways)- and I see no way of filtering out the "*ByTest.cs" files...

@maryamariyan
Copy link
Member

/cc: @eiriktsarpalis @adamsitnik

@MgSam
Copy link

MgSam commented Aug 14, 2020

@terrajobst What needs to happen to get these API suggestions reviewed and get any official feedback from the team? I saw you mention in the standup yesterday that you guys have a clear plate now... :)

@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Sep 2, 2020
@terrajobst
Copy link
Member

terrajobst commented Oct 2, 2020

@terrajobst What needs to happen to get these API suggestions reviewed and get any official feedback from the team? I saw you mention in the standup yesterday that you guys have a clear plate now... :)

The API process is documented. Each area has an owner and it's their responsibility to work with the issue opener to get the issue either into the state api-ready-for-review or closed.

In this case, that's @eiriktsarpalis and @adamsitnik.

@eiriktsarpalis
Copy link
Member

A few remarks:

  • I would support implementing MinBy and MaxBy. They are useful methods without good alternatives at the moment. I'm not sure if we should add the defaultValue overloads.
  • Likewise, I think DistinctBy is useful. It should also come with an IQueryable implementation.
  • Playing devil's advocate: what problem does ExceptBy solve? You assert that a naive workaround would take quadratic time, however most instances I've seen would simply use completedOrderIDs.ToHashSet(); outside of the main pipeline which is effectively equivalent.
  • Why do we need ExceptBy overloads that select keys for the second enumerable? Using your example it could be easily rewritten in terms of the first oveload: allOrders.ExceptBy(completedOrderIDs.Select(orderID => orderID), order => order.ID);
  • If I understand UnionBy correctly, is it basically using keySelector to define custom a equality semantics for TSource? How is that different from passing a custom IComparer to the existing Union method? Why should it even have a different name?
  • I'm confused by the IntersectBy signatures. Am I right to assume they're just mistakenly copied from the ExceptBy APIs?

@MgSam
Copy link

MgSam commented Dec 2, 2020

@eiriktsarpalis Is there a reason you guys haven't reviewed this (or scheduled it for review) yet? I realize it doesn't follow the template but it seems pretty clear what it's asking and obviously you guys are aware of the proposal.

@thomaslevesque
Copy link
Member

@eiriktsarpalis

How is that different from passing a custom IComparer to the existing Union method?

It's not different in terms of behavior (I assume you meant IEqualityComparer), and the same is true of all those proposed methods (ExceptBy, UnionBy, IntersectBy, DistinctBy). But writing a custom equality comparer every time you want to perform one of those operations based on a given property is time-consuming, and the logic of the comparison isn't immediately visible without looking at the implementation. The *By overloads make the code terser and more readable.

@eiriktsarpalis
Copy link
Member

Is there a reason you guys haven't reviewed this (or scheduled it for review) yet? I realize it doesn't follow the template but it seems pretty clear what it's asking and obviously you guys are aware of the proposal.

Couple of reasons. The size of the backlog is significant so it takes us time before we can get to triage everything. I also need to be confident that any proposal has a strong chance of getting approved before I mark it as ready for review: this includes gathering evidence that the addition provides value to users, poking holes in the design, and finalizing the API shape. Following the template is part of that, but that's merely a convention meant to speed up API review that frankly takes a tiny fraction of the whole effort.

It's not different in terms of behavior (I assume you meant IEqualityComparer), and the same is true of all those proposed methods (ExceptBy, UnionBy, IntersectBy, DistinctBy). But writing a custom equality comparer every time you want to perform one of those operations based on a given property is time-consuming, and the logic of the comparison isn't immediately visible without looking at the implementation. The *By overloads make the code terser and more readable.

I suppose my original question is should they contain the By suffix at all, given that they could just be thought of as overloads that accept a key selector instead of a comparer. While most current LINQ methods accepting a keySelector do have a By suffix, none of them have equivalent methods that accept IEqualityComparer instead.

@GSPP
Copy link
Author

GSPP commented Dec 9, 2020

If somebody were to step up and commit to implementing what's approved, would that be enough to take it forward? There are 56 positive votes at this point.

I'll also use this opportunity to lobby for #27449 (comment) (IEnumerable should have a extension for creating fixed size chunks) 😄

@GSPP
Copy link
Author

GSPP commented Dec 9, 2020

@eiriktsarpalis

Playing devil's advocate: what problem does ExceptBy solve? You assert that a naive workaround would take quadratic time, however most instances I've seen would simply use completedOrderIDs.ToHashSet(); outside of the main pipeline which is effectively equivalent.

Yes, this would be manually implementing what ExceptBy would automate. Having ExceptBy makes this a side-effect-free expression which composes nicely with the rest of the query. It takes away the need to manage collections and iterate. It also leads to lazy behavior.

Here is a more complex example for ExceptBy. We have two collections of objects of different type:

IEnumerable<Order> orders = ...;
IEnumerable<Payment> payments = ...;
var unpayedOrders = orders.ExceptBy(o => o.InvoiceID, payments, p => p.InvoiceID);

This showcases the need for supporting two collections of differing types. This cannot be supported with an IEqualityComparer.


If I understand UnionBy correctly, is it basically using keySelector to define custom a equality semantics for TSource? How is that different from passing a custom IComparer to the existing Union method? Why should it even have a different name?

I believe you are correct. It seems consistent to make all these types of methods have a By postfix. This could be done differently.


IntersectBy seems to be correct to me. I'll quote myself for the rationale:

Note, that the input element types for ExceptBy and IntersectBy can be different. This is because we only ever return items of the first sequence. We need two key selectors in that case. There's a simpler overload that has only one element type as well.

Can you elaborate on what might not make sense here?


custom equality comparer

Some of these could be done with a custom comparer while some cannot. Those, that process items of differing types need a key selector. I hope you find my "business example" plausible. I certainly have worked many times on line of business app code like this (I'm a consultant).

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Dec 9, 2020

IntersectBy seems to be correct to me. I'll quote myself for the rationale:

Note, that the input element types for ExceptBy and IntersectBy can be different. This is because we only ever return items of the first sequence. We need two key selectors in that case. There's a simpler overload that has only one element type as well.

Can you elaborate on what might not make sense here?

Ah, I had missed that clarification. I had assumed that IntersectBy, being dual to UnionBy, would share the same signatures as well. You're right that the second source doesn't need to be of the same type. But I'm wondering if we're sacrificing clarity in order to extract the most generalized solution possible. The term intersection implies finding the common elements of two collections of same things, so I'm wondering if this particular generalization should be called something different. By extension, I'm wondering if this rationale also applies to ExceptBy, assuming we associate this operation with set difference.

So I would probably tend towards adding the following APIs:

namespace System.Linq
{
    public static class Enumerable
    {
        IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
        IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);

        IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);

        IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);

        IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);

        TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);

        TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);

        // Missing min & max overloads accepting custom comparers added for completeness
        TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
    }
}

Note that I've removed the Min*/Max* overloads that take a defaultValue parameter. Did this since #20064 uses the *OrDefault suffix for methods that accept default values. Since it's an unrelated addition I would recommend considering those overloads in a future iteration.

@GSPP
Copy link
Author

GSPP commented Dec 26, 2020

Removing support for heterogeneous sources (e.g. ExceptBy<TSource1, TSource2, TKey>) loses quite a bit of usefulness. For instance, it breaks my example in which I use ExceptBy to combine orders with payments.

In my estimation, most uses of these By methods with two sources would be heterogeneous. This is kind of the point of selecting a key to join by. These operations logically are quite related to a group join operation. Essentially, they match up elements from two sources by key equality. Then, each group is processed independently. Joins, too, require heterogeneous sources to be useful. So this is a heuristic argument.

I don't see much of a clarity impact here. The call sites look very simple and if you just write the obvious code the correct overload will be picked.

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Jan 5, 2021

I'm simply arguing that the particular signature is not defining an intersection operation, at least from a strict set theoretic perspective. If the point of the *By methods is to provide an alternative to the existing Except method, that uses a lambda instead of a custom equality comparer, then the proposed signature is trying to do a lot more than that. I therefore think it should go by a different name.

Do you have examples of other LINQ-like APIs that use a similar signature for intersection operations?

@GSPP
Copy link
Author

GSPP commented Jan 5, 2021

the existing Except method that uses a lambda

I do not believe there is a lambda-taking existing method. A comparer certainly is different from a keySelector for two reasons:

  1. Creating a custom comparer is so much code overhead that it often is not worth it. A key innovation in LINQ was that comparers could often be replaced with a lambda expression. That was magical back then.
  2. The keySelector moves the comparison from the main item to a secondary computed value. For example, we are no longer examining "customers", we are examining the ID of a customer.

Do you have examples of other LINQ-like APIs that use a similar signature for intersection operations?

I can certainly see your point that a different name than IntersectBy or ExceptBy would be appropriate. I personally consider it fully appropriate to keep the naming scheme. At the core, this is still a set operation even if not in the classical mathematical sense.

@eiriktsarpalis
Copy link
Member

the existing Except method that uses a lambda

I do not believe there is a lambda-taking existing method.

Perhaps I didn't phrase my original statement very well: I'm not claiming that there currently is a method accepting a lambda, but that the proposed ExceptBy method could be considered an overload of the existing Except methods (the By suffix being a convention for methods that define equality via projection). As such I think that any addition should adhere to what the existing methods do rather than try to generalize into something else even if it is "essentially the same".

@eiriktsarpalis eiriktsarpalis added the api-needs-work API needs work before it is approved, it is NOT ready for implementation label Jan 20, 2021
@eiriktsarpalis
Copy link
Member

@GSPP I'd be keen on getting a version of this approved in time for .NET 6, but I would still gravitate towards the shape as described here. Would it be possible to update your original post using the API proposal issue template so I can mark this ready for API review?

I still think intersecting and subtracting over heterogeneous sources is useful, so it's certainly something I'll be bringing up during the discussion.

@eiriktsarpalis eiriktsarpalis self-assigned this Jan 21, 2021
@terrajobst
Copy link
Member

terrajobst commented Jan 28, 2021

Video

  • Looks good as proposed.
namespace System.Linq
{
    public static class Enumerable
    {
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
        public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst, IEqualityComparer<TKey>? comparer);

        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
        public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst, IEqualityComparer<TKey>? comparer);
        
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
        public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
        
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
        public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);
        
        // Missing min & max overloads accepting custom comparers added for completeness
        public static TResult Min<TSource, TResult>(this IEnumerable<TSource> source, IComparer<TResult>? comparer);
        public static TResult Max<TSource, TResult>(this IEnumerable<TSource> source, IComparer<TResult>? comparer);
    }

    public static class Queryable
    {
        public static IQueryable<TSource> DistinctBy<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> DistinctBy<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

        public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
        public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TKey> source2, Expression<Func<TSource, TKey>> keySelectorFirst);
        public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TKey> source2, Expression<Func<TSource, TKey>> keySelectorFirst, IEqualityComparer<TKey>? comparer);

        public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
        public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TKey> source2, Expression<Func<TSource, TKey>> keySelectorFirst);
        public static IQueryable<TSource> IntersectBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TKey> source2, Expression<Func<TSource, TKey>> keySelectorFirst, IEqualityComparer<TKey>? comparer);

        public static IQueryable<TSource> UnionBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector);
        public static IQueryable<TSource> UnionBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TSource> source2, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

        public static TSource MinBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector);
        public static TSource MinBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);

        public static TSource MaxBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector);
        public static TSource MaxBy<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector, IComparer<TResult>? comparer);

        // Missing min & max overloads accepting custom comparers added for completeness
        public static TResult Min<TSource, TResult>(this IQueryable<TSource> source, IComparer<TResult>? comparer);
        public static TResult Max<TSource, TResult>(this IQueryable<TSource> source, IComparer<TResult>? comparer);
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jan 28, 2021
@YohDeadfall
Copy link
Contributor

And will the corresponding methods be added to Queryable, at least some of them? DistinctBy, for reference, can be used in queries by EF and even has the corresponding construction in PostgreSQL (DISTINCT BY).

@eiriktsarpalis
Copy link
Member

@YohDeadfall we're adding queryable methods for all of the above.

@eiriktsarpalis eiriktsarpalis added the help wanted [up-for-grabs] Good issue for external contributors label Jan 28, 2021
@eiriktsarpalis eiriktsarpalis removed their assignment Jan 28, 2021
@eiriktsarpalis eiriktsarpalis modified the milestones: Future, 6.0.0 Jan 28, 2021
@omariom
Copy link
Contributor

omariom commented Jan 29, 2021

@GSPP
Copy link
Author

GSPP commented Jan 31, 2021

Awesome that this is happening. I guarantee you: These new methods will be appreciated widely and we will see mentions on Stack Overflow and other community places. Developers love little gems like these.

@MgSam
Copy link

MgSam commented Feb 4, 2021

Now that this is approved, should we make a separate issue for IntersectBy and ExceptBy for heterogenous sources? I think those are also extremely useful and there's not a simple workaround to get the same functionality with existing LINQ methods.

@eiriktsarpalis
Copy link
Member

Now that this is approved, should we make a separate issue for IntersectBy and ExceptBy for heterogenous sources?

We shouldn't need to, the approved API includes the following overloads which should be sufficient to address the requirement:

public static IQueryable<TSource> ExceptBy<TSource, TKey>(this IQueryable<TSource> source1, IEnumerable<TKey> source2, Expression<Func<TSource, TKey>> keySelectorFirst);
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);

@Clockwork-Muse
Copy link
Contributor

Side question: are we going to be adding any of these to PLINQ?

@MgSam
Copy link

MgSam commented Feb 4, 2021

@eiriktsarpalis That only works if the the second collection is already the type of the projection. I'd say that's less common than the case when the collections are two completely unrelated types.

class Dog { public string DogName {get; set;} }
class Cat { public string CatName {get; set;} }

public static IEnumerable<TFirst> IntersectBy<TFirst, TSecond, TKey>(
    this IEnumerable<TFirst> first, 
    IEnumerable<TSecond> second, 
    Func<TFirst, TKey> firstKeySelector, 
    Func<TSecond, TKey> secondKeySelector);
public static IEnumerable<TFirst> ExceptBy<TFirst, TSecond, TKey>(
    this IEnumerable<TFirst> first, 
    IEnumerable<TSecond> second, 
    Func<TFirst, TKey> firstKeySelector, 
    Func<TSecond, TKey> secondKeySelector);

//Find dogs where a cat shares their name
var popularDogs = dogs.IntersectBy(cats, d => d.DogName, c => c.CatName);

//Find dogs where a cat does not share their name
var unpopularDogs = dogs.ExceptBy(cats, d => d.DogName, c => c.CatName);

Obviously this example is contrived but real-world scenarios like this happen all the time.

@eiriktsarpalis
Copy link
Member

@MgSam this should probably cover your use case:

//Find dogs where a cat shares their name
var popularDogs = dogs.IntersectBy(cats.Select(c => c.CatName), d => d.DogName);

//Find dogs where a cat does not share their name
var unpopularDogs = dogs.ExceptBy(cats.Select(c => c.CatName), d => d.DogName);

Side question: are we going to be adding any of these to PLINQ?

Seems reasonable, but I would defer to @adamsitnik, @carlossanlop and @jozkee on that call. I would recommend creating a separate issue and we can revisit at a later iteration.

@MgSam
Copy link

MgSam commented Feb 8, 2021

@eiriktsarpalis Great point that there is a relatively easy workaround given the new methods. I haven't wrapped my brain around having them available yet. :)

That said, these overloads that I've proposed are straightforward and useful enough to be added outright, IMO. In other parts of LINQ, (like GroupBy<TSource,TKey,TResult>(IEnumerable<TSource>, Func<TSource,TKey>, Func<TKey,IEnumerable<TSource>,TResult>)) additional Selects are eschewed in favor of offering useful overloads that do the projection for you. Also, the new Zip overload is similarly just for convenience.

@C-xC-c
Copy link
Contributor

C-xC-c commented Mar 23, 2021

I would like to work on this if possible

@eiriktsarpalis
Copy link
Member

Hi @C-xC-c, thanks for volunteering. I had actually started working on the feature a few weeks back but just didn't have enough time to see it through: eiriktsarpalis@7e37b77. Feel free to reuse any or none of that.

@eiriktsarpalis
Copy link
Member

Hi again @C-xC-c, have you begun work on the feature? I actually just finished work on my own branch so will post a PR today. Hope that's ok.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 28, 2021
@C-xC-c
Copy link
Contributor

C-xC-c commented Mar 29, 2021

I was working on my branch locally but that's okay! Glad to see this get done!

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 9, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Linq help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet