Consider adding an AggregateBy method to LINQ #91533
Comments
Tagging subscribers to this area: @dotnet/area-system-linq
Issue Details
A potential alternative is having an AggregateBy method:
public static IEnumerable<KeyValuePair<TKey, TAccumulate>> AggregateBy<TSource, TAccumulate, TKey>(
    this IEnumerable<TSource> source,
    TAccumulate seed,
    Func<TAccumulate, TKey, TSource, TAccumulate> func,
    Func<TSource, TKey> keySelector);
which lets you express a per-key count as
source.AggregateBy(0, (count, _, _) => ++count, keySelector);
Originally posted by @eiriktsarpalis in #77716 (comment)
|
I would suggest that the seed be selected per key:
public static IEnumerable<KeyValuePair<TKey, TAccumulate>> AggregateBy<TSource, TKey, TAccumulate>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TKey, TAccumulate> seedSelector,
Func<TAccumulate, TKey, TSource, TAccumulate> accumulator) |
What does this mean? Which |
@chrisoverzero Dummy example for how the method would be called:
var sum = Enumerable.Range(1, 100)
    .AggregateBy(
        keySelector: x => x % 10,
        seedSelector: k => k + 100,
        accumulator: (sum, key, src) => sum + src);
It's probably rare that the seed would need to be selected per key, but it would be simple to design for it and would help significantly in those cases. |
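To make the dummy example concrete, here is a minimal sketch of what it would compute. It assumes the `Enumerable.AggregateBy` overload that eventually shipped in .NET 9, whose accumulator does not receive the key (unlike the signature suggested in this comment), so the arguments are passed positionally and the result variable is named `sums` for illustration only.

```csharp
// Assumption: .NET 9 or later, where Enumerable.AggregateBy is available.
using System;
using System.Collections.Generic;
using System.Linq;

// Group 1..100 by last digit; each group's seed is derived from its key.
Dictionary<int, int> sums = Enumerable.Range(1, 100)
    .AggregateBy(
        x => x % 10,              // key selector
        k => k + 100,             // per-key seed
        (sum, src) => sum + src)  // accumulator (no key parameter here)
    .ToDictionary(pair => pair.Key, pair => pair.Value);

// Key 1 covers 1, 11, ..., 91 (sum 460) plus its seed 101.
Console.WriteLine(sums[1]); // 561
// Key 0 covers 10, 20, ..., 100 (sum 550) plus its seed 100.
Console.WriteLine(sums[0]); // 650
```

Each key ends up with its own starting value, which is the one thing a single fixed seed cannot express.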
What's a real-world use case where the seed value is predicated on the key? In most applications I can think of the value of "zero" is fixed (typically either
public static IEnumerable<KeyValuePair<TKey, TAccumulate>> AggregateBy<TSource, TAccumulate, TKey>(
    this IEnumerable<TSource> source,
    TAccumulate seed,
    Func<TAccumulate, TSource, TAccumulate> func,
    Func<TSource, TKey> keySelector);
Assuming the
public static IEnumerable<KeyValuePair<TKey, TAccumulate>> AggregateBy<TSource, TKey, TAccumulate>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TKey, TAccumulate> seedSelector,
Func<TAccumulate, TKey, TSource, TAccumulate> accumulator)
{
return source.
AggregateBy<TSource, KeyValuePair<TKey, TAccumulate>?, TKey>(
seed: null,
func: (state, source) =>
{
if (state is not { Key: TKey key, Value: TAccumulate acc })
{
key = keySelector(source);
acc = seedSelector(key);
}
acc = accumulator(acc, key, source);
return new(key, acc);
},
keySelector: keySelector)
.Select(kvp => kvp.Value!.Value);
} |
@eiriktsarpalis Fair enough. Two more questions:
|
Because we tend to put the key selector delegate at the very end of a method signature in
Possibly, what ends up being used in |
I've just updated the OP with a full-blown proposal and examples. Your feedback would be appreciated. |
@eiriktsarpalis Thanks for updating the proposal. I like the overload option of using a static I would argue that the
In the most relevant similarity ( In the set-based So, from a reading standpoint, the most natural order of parameters to me would go: 1. select a key, 2. define a seed from the key (or use a static seed), 3. accumulate based on the group selected by the key. I welcome any comments to the contrary, as this is just my opinion on the matter. |
Also, if we are allowing more than one overload, does it make sense to offer a |
namespace System.Linq;
public static partial class Enumerable
{
public static IEnumerable<KeyValuePair<TKey, TAccumulate>> AggregateBy<TSource, TKey, TAccumulate>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
TAccumulate seed,
Func<TAccumulate, TSource, TAccumulate> func,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull;
public static IEnumerable<KeyValuePair<TKey, TAccumulate>> AggregateBy<TSource, TKey, TAccumulate>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TKey, TAccumulate> seed,
Func<TAccumulate, TSource, TAccumulate> func,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull;
}
public static partial class Queryable
{
public static IQueryable<KeyValuePair<TKey, TAccumulate>> AggregateBy<TSource, TKey, TAccumulate>(
this IQueryable<TSource> source,
Expression<Func<TSource, TKey>> keySelector,
TAccumulate seed,
Expression<Func<TAccumulate, TSource, TAccumulate>> func,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull;
public static IQueryable<KeyValuePair<TKey, TAccumulate>> AggregateBy<TSource, TKey, TAccumulate>(
this IQueryable<TSource> source,
Expression<Func<TSource, TKey>> keySelector,
Expression<Func<TKey, TAccumulate>> seed,
Expression<Func<TAccumulate, TSource, TAccumulate>> func,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull;
} |
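As a rough illustration of the fixed-seed overload together with the optional `keyComparer`, the sketch below counts strings case-insensitively. It assumes .NET 9 or later, where `Enumerable.AggregateBy` is available; the `tags` data is made up for the example.

```csharp
// Assumption: .NET 9 or later, where Enumerable.AggregateBy is available.
using System;
using System.Collections.Generic;
using System.Linq;

string[] tags = { "Linq", "linq", "LINQ", "Span", "span" };

// Fixed seed of 0, one increment per element, keys compared ignoring case.
Dictionary<string, int> counts = tags
    .AggregateBy(
        tag => tag,
        0,
        (count, _) => count + 1,
        StringComparer.OrdinalIgnoreCase)
    .ToDictionary(pair => pair.Key, pair => pair.Value, StringComparer.OrdinalIgnoreCase);

Console.WriteLine(counts["linq"]); // 3
Console.WriteLine(counts["span"]); // 2
```

Without the comparer, the five strings would be treated as five distinct keys.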
The existing overloads have full control over the aggregate and how it is folded, why add another transform at the end? Even if it were necessary in certain scenarios, it seems to me that chaining it with a |
I am a bit late here as it is already approved, but why don't we have an overload without the seed as in the existing |
Not providing the |
I was thinking about this kind of overload, without a seed:
public static IEnumerable<KeyValuePair<TKey, TSource>> AggregateBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TSource, TSource, TSource> func,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull; |
This is a reducer method, so should most probably be called |
The CountBy implementation has just been merged. It is then time to go to the next level! |
@manandre go for it 👍 |
Background & Motivation
It is fairly common for LINQ users to want to calculate an aggregate for enumerables, grouped by a specific key. This is currently achievable using the GroupBy methods, however this approach is often inefficient as it forces intermediate IGrouping allocations. This is highlighted as the motivation of the CountBy API proposal.
AggregateBy is an extension of the existing Aggregate methods that lets users fold a source enumerable, grouped by key. It is a useful method that generalizes methods such as GroupBy and CountBy, but also works well for other applications. Similar APIs available in frameworks like Spark are called foldByKey and aggregateByKey.
API Proposal
API Usage
Implementing LongCountBy
Implementing GroupBy
Calculating total scores by key
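The three usages named above can be sketched roughly as follows. This is not the original issue's code; it assumes .NET 9 or later, where `Enumerable.AggregateBy` is available, and the `data` array is invented for illustration.

```csharp
// Assumption: .NET 9 or later, where Enumerable.AggregateBy is available.
using System;
using System.Collections.Generic;
using System.Linq;

(string Id, int Score)[] data = { ("0", 42), ("1", 5), ("2", 4), ("1", 10), ("0", 25) };

// LongCountBy-style: fixed seed 0L, add one per element.
Dictionary<string, long> counts = data
    .AggregateBy(d => d.Id, 0L, (count, _) => count + 1)
    .ToDictionary(pair => pair.Key, pair => pair.Value);

// GroupBy-style: the seed selector creates a fresh list for every key.
// A single shared list passed as a fixed seed would mix all groups together.
Dictionary<string, List<int>> groups = data
    .AggregateBy(d => d.Id, _ => new List<int>(), (list, d) => { list.Add(d.Score); return list; })
    .ToDictionary(pair => pair.Key, pair => pair.Value);

// Total scores by key: fixed seed 0, sum the scores.
Dictionary<string, int> totals = data
    .AggregateBy(d => d.Id, 0, (total, d) => total + d.Score)
    .ToDictionary(pair => pair.Key, pair => pair.Value);

Console.WriteLine(counts["0"]);                    // 2
Console.WriteLine(string.Join(",", groups["1"])); // 5,10
Console.WriteLine(totals["0"]);                    // 67
```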
Open Questions
We might want to consider the order of type parameters and delegates in the type signature. As proposed it follows the convention that TKey and the keySelector delegate go to the end of the method signature but perhaps that's not as natural here.
Reference Implementation
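For orientation only, a dictionary-based fold along these lines might look like the sketch below. It is not the implementation from the issue or from the runtime; `AggregateBySketch` is a hypothetical helper written as a plain local function rather than an extension method, and the order in which keys are yielded is left unspecified.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

string[] words = { "apple", "avocado", "banana", "blueberry", "cherry" };

// Sum word lengths by first letter using the hypothetical helper below.
Dictionary<char, int> lengthsByInitial =
    AggregateBySketch(words, w => w[0], _ => 0, (total, w) => total + w.Length)
        .ToDictionary(pair => pair.Key, pair => pair.Value);

Console.WriteLine(lengthsByInitial['a']); // 12
Console.WriteLine(lengthsByInitial['b']); // 15

// Hypothetical sketch: one dictionary entry per key, no intermediate groupings.
static IEnumerable<KeyValuePair<TKey, TAccumulate>> AggregateBySketch<TSource, TKey, TAccumulate>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TKey, TAccumulate> seedSelector,
    Func<TAccumulate, TSource, TAccumulate> func,
    IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull
{
    var accumulators = new Dictionary<TKey, TAccumulate>(keyComparer);
    foreach (TSource item in source)
    {
        TKey key = keySelector(item);
        // Start from the per-key seed the first time a key is seen.
        TAccumulate current = accumulators.TryGetValue(key, out var existing)
            ? existing
            : seedSelector(key);
        accumulators[key] = func(current, item);
    }

    foreach (KeyValuePair<TKey, TAccumulate> pair in accumulators)
    {
        yield return pair;
    }
}
```

Because the helper is an iterator, nothing is enumerated until the result is consumed, and the whole source is folded before the first pair is yielded.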
Originally posted by @eiriktsarpalis in #77716 (comment)