[Proposal] SortedDistinctBy #836
This is needed! There is no Linq equivalent, and using |
I believe Following is an example showing
var data = new[]
{
    new { Date = new DateTime(2010, 1, 1), Value = 1 },
    new { Date = new DateTime(2010, 1, 1), Value = 2 },
    new { Date = new DateTime(2010, 1, 1), Value = 3 },
    new { Date = new DateTime(2010, 1, 2), Value = 4 },
    new { Date = new DateTime(2010, 1, 2), Value = 5 },
    new { Date = new DateTime(2010, 1, 2), Value = 6 },
};

var q =
    from g in data.GroupAdjacent(e => e.Date)
    select new { Date = g.Key, Value = g.Average(e => e.Value) };

foreach (var e in q)
    Console.WriteLine(e.ToString());
Prints:
I'll close this assuming |
@atifaziz You can get that as |
@atifaziz Close enough, but I believe it is still a bit more expensive than directly implemented |
What if you want the last value? What if you want a count of items under the key? Who decides? Assuming the first would be very limiting.
It's not that it's clumsy, but perhaps wasteful to collect all values for a group only to throw away all but the first.
@Shelim That's a hard problem to solve. I think what's being asked here is a variation of
public static IEnumerable<TResult> AggregateAdjacent<TSource, TKey, TAccumulator, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TKey, TSource, TAccumulator> seedSelector,
    Func<TKey, TAccumulator, TSource, TAccumulator> aggregator,
    Func<TKey, TAccumulator, TResult> resultSelector,
    IEqualityComparer<TKey>? comparer = null)
{
    comparer ??= EqualityComparer<TKey>.Default;
    using var item = source.GetEnumerator();
    if (!item.MoveNext())
        yield break;
    var key = keySelector(item.Current);
    var runKey = key;
    var accumulator = seedSelector(key, item.Current);
    while (item.MoveNext())
    {
        key = keySelector(item.Current);
        if (comparer.Equals(runKey, key))
        {
            accumulator = aggregator(runKey, accumulator, item.Current);
            continue;
        }
        else
        {
            yield return resultSelector(runKey, accumulator);
            runKey = key;
            accumulator = seedSelector(key, item.Current);
        }
    }
    yield return resultSelector(runKey, accumulator);
}
Here are some examples of uses:
var data = new[]
{
    new { Date = new DateTime(2010, 1, 1), Value = 1 },
    new { Date = new DateTime(2010, 1, 1), Value = 2 },
    new { Date = new DateTime(2010, 1, 2), Value = 4 },
    new { Date = new DateTime(2010, 1, 2), Value = 5 },
    new { Date = new DateTime(2010, 1, 2), Value = 6 },
    new { Date = new DateTime(2010, 1, 3), Value = 7 },
};

Console.WriteLine("First of each date:");
foreach (var item in data.AggregateAdjacent(e => e.Date, (_, e) => e, (_, a, _) => a, (_, a) => a))
    Console.WriteLine(item.ToString());

Console.WriteLine("Last of each date:");
foreach (var item in data.AggregateAdjacent(e => e.Date, (_, e) => e, (_, _, e) => e, (_, e) => e))
    Console.WriteLine(item.ToString());

Console.WriteLine("Count per date:");
foreach (var item in data.AggregateAdjacent(e => e.Date, (_, e) => 1, (_, a, _) => a + 1,
                                            (d, a) => new { Date = d, Count = a }))
    Console.WriteLine(item.ToString());
Prints:
To permit multiple aggregations efficiently in a single iteration, one could then use the same strategy as we did with |
The semantics of the existing This is intuitive and quite useful. If someone needs something else they should use another primitive like the
This is clearly not a question for I feel like you're expanding this into harder territory than needs be.
Having new higher-level APIs that perform optimized aggregations over sorted sequences is a great idea, but it's a different, more complex thing that surely should have its own issue.
Exactly as stated The base-pure LINQ |
What if the input is not sorted? I propose:
"Ordered" is a vague concept anyway when a comparator is not given... for all we know they could be sorted in descending order, or with their bits reversed, or anything really...
It's important for a use case of yours.
Yes, I agree. The expected result is totally deterministic, and if you have repeated but not adjacent values, this API would yield two items that share the same key (albeit separated by at least one other item).
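To make the behaviour on unsorted input concrete, here is a minimal sketch. The helper name `DistinctAdjacentBy` and the sample data are hypothetical, chosen only for illustration; this is not the MoreLINQ implementation. It assumes the operator compares each key only with the key of the previous element.

```csharp
using System;
using System.Collections.Generic;

public static class AdjacentDistinctSketch
{
    // Hypothetical helper: yields the first element of each run of equal adjacent keys.
    // Only the previous key is remembered, so memory use is constant.
    public static IEnumerable<T> DistinctAdjacentBy<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        var hasPrevious = false;
        TKey previous = default!;
        foreach (var item in source)
        {
            var key = keySelector(item);
            if (hasPrevious && comparer.Equals(previous, key))
                continue; // same run as the previous element: skip it
            previous = key;
            hasPrevious = true;
            yield return item;
        }
    }

    // Unsorted input: the key 'a' appears in two separate runs.
    public static string Demo()
    {
        var unsorted = new[] { "a1", "a2", "b1", "a3", "a4" };
        return string.Join(",", unsorted.DistinctAdjacentBy(s => s[0]));
    }
}
```

Here `Demo()` returns "a1,b1,a3": the key `a` is yielded twice because its two runs are separated by `b1`, which is the deterministic result described above.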
For reference here's the code we use for that in my project.
public static IEnumerable<T> OrderedDistinctBy<T, K>(this IEnumerable<T> source, Func<T, K> selector)
{
    var comparer = EqualityComparer<K>.Default;
    var first = true;
    var previous = default(K);
    foreach (var x in source)
    {
        if (first)
        {
            previous = selector(x);
            first = false;
            yield return x;
            continue;
        }
        var current = selector(x);
        if (!comparer.Equals(previous!, current))
        {
            previous = current;
            yield return x;
        }
    }
}
@jods4 - FYI: This is already implemented in the System.Interactive package as |
@jods4, @Shelim: I don't see the point in duplicating |
@atifaziz |
@atifaziz I don't want to bring in System.Interactive just for a single function that is less than 20 LoC, so I'm gonna stick with my local implementation.
Actually, this is not something that needs a separate operator; there is already a solution using the existing operators:
var distinct = source.Lag(1, (curr, lag) => (curr, lag)).Where(x => !comparer.Equals(x.lag, x.curr)).Select(x => x.curr);
This has similar memory performance to your implementation, but doesn't require a full operator to be implemented.
Same API as DistinctBy, but assumes a sorted input (by the distinct key), so it doesn't have to keep a full hashmap of previously seen values.
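The trade-off in the proposal can be sketched as follows. Both methods below are illustrative stand-ins written for this comparison (`HashDistinctBy` and this `SortedDistinctBy` body are assumptions, not library code): the first remembers every key seen so far, while the second remembers only the previous key and is therefore correct only when equal keys are adjacent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctComparisonSketch
{
    // Hash-based variant: memory grows with the number of distinct keys.
    public static IEnumerable<T> HashDistinctBy<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
            if (seen.Add(keySelector(item)))
                yield return item;
    }

    // Sorted-input variant (sketch of the proposal): keeps only the previous key.
    public static IEnumerable<T> SortedDistinctBy<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        var hasPrevious = false;
        TKey previous = default!;
        foreach (var item in source)
        {
            var key = keySelector(item);
            if (hasPrevious && comparer.Equals(previous, key))
                continue;
            previous = key;
            hasPrevious = true;
            yield return item;
        }
    }

    // On input sorted by the key, both variants produce the same sequence.
    public static bool SameOnSortedInput()
    {
        var sorted = new[] { 1, 1, 2, 3, 3, 3, 4 };
        return sorted.HashDistinctBy(x => x)
                     .SequenceEqual(sorted.SortedDistinctBy(x => x));
    }
}
```

On sorted input the two agree (`SameOnSortedInput()` is true); they diverge only when equal keys are not adjacent.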