Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Enumerable.Repeat overload accepting a Func. #19454

Closed
jamesqo opened this issue Nov 23, 2016 · 11 comments
Closed

Add Enumerable.Repeat overload accepting a Func. #19454

jamesqo opened this issue Nov 23, 2016 · 11 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Linq
Milestone

Comments

@jamesqo
Copy link
Contributor

jamesqo commented Nov 23, 2016

Background

The current Enumerable.Repeat function, accepting a T, returns an enumerable with the same value repeated N times. Sometimes, however, you don't want the same value repeated N times but the same function called N times. For example:

var r = new Random();
var randomNumbers = Enumerable.Repeat(r.Next(), 100); // Newbie: Why isn't this working?
var reallyRandomThisTime = Enumerable.Repeat(_ => r.Next(), 100); // That does the trick!

The current workaround for this is to do something like Enumerable.Range(0, size).Select(_ => r.Next()), which is not really as readable. I do it myself all the time in my benchmarks for example, & you can see such examples on StackOverflow.

Proposal

We should add an overload of Repeat accepting a Func<T, T>. This function will be evaluated N times as the enumerable is iterated. The input to each invocation of the function will be the result of the last invocation; the input to the first invocation will either be a user-specified seed, or default(T) if no seed was specified.

namespace System.Linq
{
    public static class Enumerable
    {
        public static IEnumerable<T> Repeat<T>(Func<T, T> selector, int count);
        public static IEnumerable<T> Repeat<T>(T seed, Func<T, T> selector, int count);
    }
}

Usage

// Numbers from 100 to 1
Enumerable.Repeat(100, n => n - 1, 100);

// Collect 100 nodes of a linked list (slowly)
Enumerable.Repeat(head, n => n.Next, 100);

// Record the results of a benchmark 10 times
Enumerable.Repeat(_ => RunBenchmark(), 10);
@jamesqo jamesqo changed the title Proposal: Add Enumerable.Repeat overload accepting a Func<T>. Proposal: Add Enumerable.Repeat overload accepting a Func. Nov 23, 2016
@jamesqo
Copy link
Contributor Author

jamesqo commented Nov 23, 2016

cc @JonHanna & @svick & @VSadov

@bartdesmet
Copy link
Contributor

Related proposal and discussion @ https://github.com/dotnet/corefx/issues/5078.

One possible concern with the overloads proposed is that a closed-over resource used in the selector is not exclusive to the enumeration. In the example, this can lead to usage of Random from multiple threads. It may make sense to have an overload that has a factory for a resource that's used by the selector.

Symmetry with the Aggregate family would make sense because generators and aggregators are closely related (anamorphism versus catamorphism, see paper). Some inspiration could be found in Ix.NET as well, where we extended LINQ to Objects with operators based on the duality with Rx.NET.

@svick
Copy link
Contributor

svick commented Nov 26, 2016

I'm not sure the combination of count and iterative selector makes sense.

Compare this with F#, which has the two functions init and unfold (expressed using C# syntax for convenience):

public static IEnumerable<T> init<T>(int count, Func<int, T> initializer);
public static IEnumerable<T> unfold<T, State>(
    Func<State, Option<Tuple<T, State>>> generator, State state);

Notice that init, which takes a count does not have iterative function, but instead a function that takes an index. And unfold, which is iterative, ends based on a condition that is governed by the passed-in function (when the Option it returns is None).

I think this reflects more closely the cases where this kind of generating a sequence is useful:

  1. The input parameter does not matter. For example the Random.Next case.
  2. The input parameter should be index. This is what init is meant for. I think this would be better match for the "numbers from 100 to 1" case: init(100, i => 100 - i). Another example would be getting values at even positions in a list: init((list.Count+1) / 2, i => list[i * 2]).
  3. The input parameter should be generated by the previous iteration. This is useful for iterating linked lists and similar situations and it's what unfold is for. You usually don't know the length of the sequence up front, but instead it's ended based on some condition in the function (like encountering the end of the linked list). You also commonly want to return different value than what you use for iteration (e.g. returning the value of a linked list node, instead of the whole node), though that can be solved by a subsequent Select. If you also want to limit the count, you could use a subsequent Take.

Case 1 is not an issue, but I think the proposed design of Repeat does not serve cases 2 or 3 well. For 2, the index is not provided, so it would have to be computed manually, which would result in awkward code. For 3, the requirement to specify the exact count up front does not make sense.

So, I would instead suggest a design that follows init (and thus serves cases 1 and 2):

public static IEnumerable<T> Repeat<T>(Func<int, T> selector, int count);

I would probably leave serving case 3 to a separate proposal, though I think it could look like this (not sure about the name):

public static IEnumerable<T> Unfold<T, TState>(
    TState seed, Func<TState, (T value, TState state)? generator);

Or maybe:

public static IEnumerable<T> Unfold<T>(T seed, Func<T, (T value, bool end) generator);

@karelz karelz changed the title Proposal: Add Enumerable.Repeat overload accepting a Func. Add Enumerable.Repeat overload accepting a Func. Nov 30, 2016
@julealgon
Copy link

Did we abandon this proposal? It's already 2 years old and I see no new discussion over here. Is there something limiting us from going ahead with a final proposal for this one? These methods are really useful and I've needed this exact behavior multiple times and ended up creating extensions myself to do it. I do not think a Enumerable.Range(..).Select(..) is a fair substitute as that's not very intuitive to read especially since on Range you have to specify start and count (IMHO).

I agree with @svick that the initial proposal is a bit too barebones/not generic enough, but would suggest using Generate instead of Unfold as the name for the more advanced method, especially because that would also match the same methods from Reactive Extensions.

@stephentoub
Copy link
Member

stephentoub commented Jul 18, 2019

I do not think a Enumerable.Range(..).Select(..) is a fair substitute

I actually do think it's a fair substitute. The goal of Enumerable isn't to provide every possible extension method, but rather to provide the most useful and impactful set of methods that enable the most common scenarios and that can be composed to implement others.

The two main examples motivating this provided were:

// Numbers from 100 to 1
Enumerable.Repeat(100, n => n - 1, 100);

// Record the results of a benchmark 10 times
Enumerable.Repeat(_ => RunBenchmark(), 10);

That can be accomplished with Enumerable today with:

// Numbers from 100 to 1
Enumerable.Range(0, 100).Select(n => 100 - n)

// Record the results of a benchmark 10 times
Enumerable.Range(0, 10).Select(_ => RunBenchmark());

The third example of gathering the nodes of a list doesn't make sense to me as a "Repeat" method.

New APIs have significant overhead, not just in terms of implementing, testing, maintaining, documenting, etc. them, but also in terms of developers seeing them, needing to learn what they mean (personally the proposed overload that takes three arguments is very confusing to me at the call site), needing to decide which option they need, and so on. This does not meet the bar for me.

@jkindwall
Copy link

I actually do think it's a fair substitute. The goal of Enumerable isn't to provide every possible
extension method, but rather to provide the most useful and impactful set of methods that enable
the most common scenarios and that can be composed to implement others.

It's one thing to compose a solution to a problem out of steps that are clear in their intent and make sense as contributing to the final output. It's another thing to force a developer to generate garbage data simply to take advantage of the side-effect that it produces a sequence of the desired length to iterate over.

If you simply wanted to execute a function 10 times, you wouldn't do this:

int[] zeroes = new int[10];
foreach (int zero in zeroes)
{
    DoSomething();
}

instead of

for(int i = 0; i < 10; i++)
{
    DoSomething();
}

as the former seems at a glance to imply there is some other purpose for the int array outside of the foreach loop.

Likewise, if all you want is to use the cleaner, Linq syntax to create a sequence of 10 unique items, doing this:

IEnumerable<Item> myItems = Enumerable
    .Range(0, 10)
    .Select(_ => CreateAnItem());

seems to imply you are somehow using a sequence of consecutive numbers to generate your items when actually, you're just throwing those numbers away.

Whereas this is much more clear in its intent:

IEnumerable<Item> myItems = Enumerable
    .Repeat(() => CreateAnItem(), 10);

Another alternative to using Enumerable.Range is just to use the existing Repeat method in the same way:

IEnumerable<Item> myItems = Enumerable
    .Repeat(<literally any value of any type>, 10)
    .Select(_ => CreateAnItem());

But the problem is the same. You're generating garbage data that has nothing to do with the goal you are trying to accomplish other than the fact that it's packaged in a sequence that you can use to iterate over.

@stephentoub
Copy link
Member

for(int i = 0; i < 10; i++) is "generating garbage data", too.

@jkindwall
Copy link

Actually, its not. You need something to keep track of the number of times the loop is iterated, and its immediately clear to anyone familiar with the basics of the language that that is the purpose of the variable "i" in the for statement (even if i isn't actually used in the body of the loop).

Contrast that with Enumerable.Range(0, 10).Select(...) where the actual contents of the generated integer sequence is useless and only the length of the sequence matters.

@stephentoub
Copy link
Member

In both cases, there is an iteration variable. In both cases, it's available to the work being performed for each iteration. In both cases, it's ignored.

@jkindwall
Copy link

Its about clarity of intent. In one case the intent is immediately clear, in the other its not.

@stephentoub
Copy link
Member

We will have to agree to disagree then. I believe the intent here is clear.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Linq
Projects
None yet
Development

No branches or pull requests

7 participants