[API Proposal]: Linq CountBy method #77716
Comments
Tagging subscribers to this area: @dotnet/area-system-linq
Issue Details
Background and motivation
Currently, it is not so easy to count occurrences of IEnumerable<T> elements with existing LINQ methods in C#.
var arr = new[]{1, 2, 2, 3, 3, 3};
var counts = arr
    .GroupBy(x => x)
    .ToDictionary(x => x.Key, x => x.Count());
Motivation
Other language examples
F#:
let arr = [|1; 2; 2; 3; 3; 3|]
let counts = arr |> Seq.countBy id
Python:
from collections import Counter
arr = [1, 2, 2, 3, 3, 3]
counts = Counter(arr)
Kotlin:
val arr = arrayOf(1, 2, 2, 3, 3, 3)
val counts = arr.groupingBy { it }.eachCount()
Benchmark
Benchmark results:
API Proposal
namespace System.Collections.Generic;
public static partial class Enumerable
{
    public static IEnumerable<KeyValuePair<TKey, int>> CountBy<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull
}
API Usage
var array = new []{1, 2, 2, 3, 3, 3};
var counts = array.CountBy(x => x);
foreach (var (key, count) in counts)
    Console.WriteLine($"{key}: {count}");
Alternative Designs
// 1
namespace System.Collections.Generic;
public static partial class Enumerable
{
    public static IEnumerable<KeyValuePair<TKey, long>> LongCountBy<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull
}
// 2
array.GroupBy(x => x).Counts();
Risks
Looks like there is no C# method with the same name in dotnet organization projects.
@epeshk yep, true. I am just pointing out that if someone wants this functionality today, it is offered by a respectable third-party library. No need to wait until the advent of .NET 8 (assuming that this API proposal will be approved). |
I'm marking this as Future because it isn't in our plans for now. If anyone is willing to contribute an implementation, I will change it to 8.0, which should give API review higher priority. |
I want to contribute this feature and start working on it in https://github.com/epeshk/runtime/tree/countBy |
Please hold until the issue is marked api-approved. Thanks! |
Please also note return type and nullability concerns. It is probably good to discuss and remove bad options before review. I think good choices for the return type are (ordered by my preferences):
This method also may or may not accept nulls as |
For source-level compatibility with MoreLinq and SuperLinq, it would be helpful to keep return type as either |
Also, both MoreLinq and SuperLinq allow |
Seeing as x.CountBy();
x.CountBy(y => y.Key);
x.LongCountBy();
x.LongCountBy(y => y.Key); Also, I feel like prefixing the methods with |
@KieranDevvs Those versions of
And since this is a fairly niche method, I don't think it's important for it to have a more succinct way to write
@krwq @eiriktsarpalis Since @epeshk offered to contribute implementation, should this be moved back to the 8.0 milestone, to speed up API review? |
Any chance to get it in for the .NET 8 release? |
@WeihanLi very unlikely at this point. |
Name suggestions:
|
A potential alternative is having an AggregateBy method:
public static IEnumerable<KeyValuePair<TKey, TAccumulate>> AggregateBy<TSource, TAccumulate, TKey>(
    this IEnumerable<TSource> source,
    TAccumulate seed,
    Func<TAccumulate, TKey, TSource, TAccumulate> func,
    Func<TSource, TKey> keySelector);
which lets you express
source.AggregateBy(0, (count, _, _) => ++count, keySelector); |
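To make the AggregateBy idea in the comment above concrete, here is a minimal sketch of how such a method could be written on top of a Dictionary. It is an illustration only, not the runtime's implementation: the name AggregateBySketch and the class AggregateByExtensions are hypothetical, the parameter order simply follows the signature quoted in the comment, and the notnull constraint is an assumption carried over from the CountBy proposal.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: counting is an aggregation that ignores the element
// and adds one to the per-key accumulator.
string[] words = { "apple", "avocado", "banana", "blueberry", "cherry" };

foreach (var pair in words.AggregateBySketch(0, (count, _, _) => count + 1, w => w[0]))
    Console.WriteLine($"{pair.Key}: {pair.Value}");

public static class AggregateByExtensions
{
    // Hypothetical name; parameter order mirrors the signature in the comment above.
    public static IEnumerable<KeyValuePair<TKey, TAccumulate>> AggregateBySketch<TSource, TAccumulate, TKey>(
        this IEnumerable<TSource> source,
        TAccumulate seed,
        Func<TAccumulate, TKey, TSource, TAccumulate> func,
        Func<TSource, TKey> keySelector) where TKey : notnull
    {
        // One accumulator per key; a key seen for the first time starts from the seed.
        var accumulators = new Dictionary<TKey, TAccumulate>();
        foreach (TSource item in source)
        {
            TKey key = keySelector(item);
            TAccumulate current = accumulators.TryGetValue(key, out var existing) ? existing : seed;
            accumulators[key] = func(current, key, item);
        }

        // Because this is an iterator, nothing above runs until the result is enumerated.
        foreach (var pair in accumulators)
            yield return pair;
    }
}
```

The order in which keys come back depends on Dictionary enumeration, so callers that need a stable order should sort the result.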
I would suggest that the
public static IEnumerable<(TKey key, TAccumulate result)> AggregateBy<TSource, TKey, TAccumulate>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TKey, TAccumulate> seedSelector,
    Func<TAccumulate, TKey, TSource, TAccumulate> accumulator)
which lets you express
source.AggregateBy(keySelector, _ => 0, (count, _, _) => ++count); |
namespace System.Linq;
public static partial class Enumerable
{
public static IEnumerable<KeyValuePair<TKey, int>> CountBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull;
}
public static partial class Queryable
{
public static IQueryable<KeyValuePair<TKey, int>> CountBy<TSource, TKey>(
this IQueryable<TSource> source,
Expression<Func<TSource, TKey>> keySelector,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull;
} |
Bringing back for API review since the question of eager vs. delayed evaluation was brought up, see
Proposed alternative API with eager semantics:
namespace System.Linq;
public static partial class Enumerable
{
public static IReadOnlyDictionary<TKey, int> CountBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull;
}
public static partial class Queryable
{
public static IQueryable<IReadOnlyDictionary<TKey, int>> CountBy<TSource, TKey>(
this IQueryable<TSource> source,
Expression<Func<TSource, TKey>> keySelector,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull;
} |
After a long discussion we ended up back at the previously-approved shape, with a delayed evaluation.
namespace System.Linq;
public static partial class Enumerable
{
public static IEnumerable<KeyValuePair<TKey, int>> CountBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull;
}
public static partial class Queryable
{
public static IQueryable<KeyValuePair<TKey, int>> CountBy<TSource, TKey>(
this IQueryable<TSource> source,
Expression<Func<TSource, TKey>> keySelector,
IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull;
} |
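The approved shape with delayed evaluation could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the implementation that went into the runtime: CountBySketch, CountByIterator and CountByExtensions are hypothetical names (chosen so they do not clash with any built-in CountBy), and the code assumes .NET 6 or later for MaxBy and ArgumentNullException.ThrowIfNull. The usage part covers the two scenarios named in the proposal: finding a most frequent element and finding duplicates.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

int[] array = { 1, 2, 2, 3, 3, 3 };

// Nothing is counted here; counting starts when the result is enumerated.
IEnumerable<KeyValuePair<int, int>> counts = array.CountBySketch(x => x);

// Find any most frequent element.
int mostFrequent = counts.MaxBy(pair => pair.Value).Key;

// Find duplicates: keys that occur more than once (sorted for a stable order).
int[] duplicates = counts
    .Where(pair => pair.Value > 1)
    .Select(pair => pair.Key)
    .OrderBy(key => key)
    .ToArray();

Console.WriteLine(mostFrequent);                  // 3
Console.WriteLine(string.Join(", ", duplicates)); // 2, 3

public static class CountByExtensions
{
    // Hypothetical name; the signature follows the approved CountBy shape.
    public static IEnumerable<KeyValuePair<TKey, int>> CountBySketch<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey>? keyComparer = null) where TKey : notnull
    {
        // Argument checks run immediately; the counting itself is deferred.
        ArgumentNullException.ThrowIfNull(source);
        ArgumentNullException.ThrowIfNull(keySelector);
        return CountByIterator(source, keySelector, keyComparer);
    }

    private static IEnumerable<KeyValuePair<TKey, int>> CountByIterator<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey>? keyComparer) where TKey : notnull
    {
        // A single pass over the source, with one int per distinct key
        // instead of one materialized group per key.
        var tally = new Dictionary<TKey, int>(keyComparer);
        foreach (TSource item in source)
        {
            TKey key = keySelector(item);
            tally.TryGetValue(key, out int current);
            tally[key] = checked(current + 1);
        }

        foreach (var pair in tally)
            yield return pair;
    }
}
```

One consequence of delayed evaluation is visible in the usage: counts is enumerated twice (once by MaxBy, once by Where), so the source is counted twice. A caller who needs repeated lookups would likely materialize the result once, for example with ToDictionary.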
For |
This was discussed, however |
Draft implementation: alternative design 2
Background and motivation
Currently, it is not so easy to count occurrences of IEnumerable<T> elements with existing LINQ methods in C#.
Motivation
GroupBy for counting because it materializes Grouping objects with a list of values related to each key.
.Count() will not enumerate the sequence again.
Other language examples
F#:
Python:
Kotlin:
Benchmark
Benchmark results:
Benchmark for case without repetitions:
Seq = Enumerable.Range(0, SequenceLength).ToArray();
API Proposal
API Usage
Find any most frequent element:
Find duplicates:
Alternative Designs
1. With separated result type, inspired by ILookup
2. Same as previous, but without TKey : notnull constraint
Find duplicates:
3. The extension method for IEnumerable<IGrouping<TKey, TValue>> with internal knowledge about GroupingEnumerator, allowing it to count values without materializing lists in memory.
Questions to decide:
IEnumerable<KeyValuePair<TKey, TValue>> as a return type or specific type instead for better naming, like IEnumerable<(TKey Key, TCount Count)>:
Find duplicates:
Should we return IEnumerable<> for simplicity or go with IDictionary<>/IReadOnlyDictionary<>/ICounts<> to avoid copying? A specific return type could improve performance by about 1.5x for cases when the user needs a lookup.
F# returns a sequence, Kotlin a map, Python a special map-like type.
Should TKey allow null? For example, ToLookup allows null and Dictionary does not.
Risks
Looks like there is no C# method with the same name in dotnet organization projects.
https://github.com/search?q=org%3Adotnet+CountBy&type=code
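On the question above of whether TKey should allow null, the difference between ToLookup and Dictionary can be seen in a few lines. This is only a small demonstration of existing behavior, with made-up sample data; it does not say which option the proposal should pick.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

string?[] names = { "a", null, "a", null };

// ToLookup accepts null as a key and groups the null elements together.
ILookup<string?, string?> lookup = names.ToLookup(n => n);
Console.WriteLine(lookup[null].Count()); // 2

// Dictionary rejects a null key with ArgumentNullException.
var tally = new Dictionary<string, int>();
bool rejected = false;
try
{
    tally[null!] = 1;
}
catch (ArgumentNullException)
{
    rejected = true;
}
Console.WriteLine(rejected); // True
```

A Dictionary-backed CountBy therefore either needs the notnull constraint or has to track the count for the null key separately.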