Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unfold/Disaggregate #246

Closed
leandromoh opened this issue Feb 5, 2017 · 14 comments
Closed

Unfold/Disaggregate #246

leandromoh opened this issue Feb 5, 2017 · 14 comments
Milestone

Comments

@leandromoh
Copy link
Collaborator

leandromoh commented Feb 5, 2017

Unfold function (present and many functional languages like Haskell and F#), is a dual to fold/aggregate.

Unfold builds a list from a seed value. It takes a function which either returns null if it is done producing the list, or returns the tuple (x, y), x which is prepended to the list, and y is used as the next element in the recursive call. x and y may be of different types.

Unfold signature is:

IEnumerable<TResult> UnFold<TSource, TResult>(TSource seed, Func<TSource, Tuple<TResult, TSource>> generator)

Examples of usage:

var naturalNumbers = UnFold(0, n => Tuple.Create(n, n + 1));
var naturalNumbersAsString = UnFold(0, n => Tuple.Create(n.ToString(), n + 1));
var squaresLowerThan100 = UnFold(1, n => (n * n) < 100 ? Tuple.Create(n * n, n + 1) : null);

for some cases, the following overload may also be usefull/readable. it bassically split the function into 3 functions:

IEnumerable<TResult> UnFold<TSource, TResult>(TSource seed, Func<TSource, bool> predicate, Func<TSource, TResult> resultGenerator, Func<TSource, TSource> seedGenerator)

for example, we can do

var fibonacciNumbersLowerThan100 = UnFold(Tuple.Create(0, 1), n => n.Item1 < 100, n => n.Item1, n => Tuple.Create(n.Item2, n.Item1 + n.Item2));

instead of

var fibonacciNumbersLowerThan100 = UnFold(Tuple.Create(0, 1), n => n.Item1 < 100 ? Tuple.Create(n.Item1, Tuple.Create(n.Item2, n.Item1 + n.Item2)) : null);

I'm planning to submit a PR for this.

What do you think?

@leandromoh leandromoh changed the title UnFold UnFold/UnAggregate Feb 6, 2017
@fsateler
Copy link
Member

It would be great if we could use a type with better member names that Item1 and Item2:

class UnfoldMemo<TState, TResult> {
    public TState State { get; }
    public TResult Result { get; }

   public UnfoldMemo(TState state, TResult result) {
       State = state;
       Result = result;
   }
}
static class UnfoldMemo {
    public static UnfoldMemo<TState, TResult>Create<TState, TResult>(TState state, TResult result) => new UnfoldMemo(state, result);
}

@atifaziz
Copy link
Member

atifaziz commented Mar 1, 2017

Or make it completely generic so you can build on top and use anonymous types, tuples and what not. I would go with the following signature:

public static IEnumerable<TResult> Unfold<TState, T, TResult>(
    TState state,
    Func<TState, T> generator,
    Func<T, bool> predicate,
    Func<T, TState> stateSelector,
    Func<T, TResult> resultSelector)

Usage-wise, your examples would become:

var naturalNumbers =
    Unfold(0, x => new { x, s = x + 1 }, _ => true, e => e.s, e => e.x);
    
var naturalNumbersAsString =
    Unfold(0, x => new { x, s = x + 1 }, _ => true, e => e.s, e => e.x.ToString());
    
var squaresLowerThan100 =
    Unfold(1, x => new { x = x * x, s = x + 1 }, e => e.x < 100, e => e.s, e => e.x);

Granted they are longer but there's one benefit in the last case of squaresLowerThan100: the square is computed only once and then used in the predicate and result projection.

Anyway, if you add an overload using Tuple:

public static IEnumerable<T> Unfold<TState, T>(
    TState state, Func<TState, Tuple<T, TState>> generator) =>
    Unfold(state, generator, e => e != null, e => e.Item2, e => e.Item1);

then your examples would be as you proposed them:

var naturalNumbers = Unfold(0, n => Tuple.Create(n, n + 1));
var naturalNumbersAsString = Unfold(0, n => Tuple.Create(n.ToString(), n + 1));
var squaresLowerThan100 = Unfold(1, n => (n * n) < 100 ? Tuple.Create(n * n, n + 1) : null);

Here's the one for Fibonacci series:

var fibonacciNumbersLowerThan100 = // 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
    Unfold(new { x = 0, y = 1 },
           s => new { r = s.x, x = s.y, y = s.x + s.y },
           s => s.r < 100,
           s => new { s.x, s.y },
           s => s.r);

Note that there's no Tuple or Item1 and Item2 and reads cleaner as @fsateler wanted it.

I wonder, though, if we should add the Tuple-based one at all. Once C# 7 ships, one can imagine adding an overload using the succinct tuple support, with the added benefit that there's no GC allocation:

public static IEnumerable<T> Unfold<TState, T>(
    TState state, Func<TState, (T, TState)?> generator) =>
    Unfold(state, generator, e => e != null, e => e.Value.Item2, e => e.Value.Item1);

The Fibonacci series example will then read quite nice and short (the only ugly part being ((int, (int, int))?) null):

var fibonacciNumbersLowerThan100 = // 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
    Unfold((x: 0, y: 1), s => s.x < 100 ? (s.x, (s.y, s.x + s.y)) : ((int, (int, int))?) null);

@atifaziz
Copy link
Member

atifaziz commented Mar 1, 2017

As for name, I would go with either Unfold or Disaggregate.

@atifaziz atifaziz changed the title UnFold/UnAggregate Unfold/Disaggregate Mar 1, 2017
@leandromoh
Copy link
Collaborator Author

leandromoh commented Mar 2, 2017

Or make it completely generic so you can build on top and use anonymous types, tuples and what not. I would go with the following signature:

I really like this signature and its use! Although user writes more, it is of course cleaner.

there's one benefit in the last case of squaresLowerThan100: the square is computed only once and then used in the predicate and result projection.

in the original signature we can simple do

var squaresLowerThan100 = MoreEnumerable.UnFold(1, n => {
	var square = n * n;
	return square < 100 ? Tuple.Create(square, n + 1) : null;
});

I wonder, though, if we should add the Tuple-based one at all. Once C# 7 ships, one can imagine adding an overload using the succinct tuple support, with the added benefit that there's no GC allocation

I wondered the same. I think is valid create unfold using your 2 signatures (with generic and generic-based tuple) and later also add the C# 7 tuple overload.

Ah, one question: how a user that uses Visual Studio with C# 6 would see the unfold signature that use C# 7 tuples? Will he need to update something on VS? update VS itself?

The Fibonacci series example will then read quite nice and short (the only ugly part being ((int, (int, int))?) null):

Can't we simple return null since method signature define return type as (T, TState)? ?

As for name, I would go with either Unfold or Disaggregate.

Between these 2, I prefer Unfold, after all it's the concept name.

@atifaziz
Copy link
Member

atifaziz commented Mar 2, 2017

in the original signature we can simple do var squaresLowerThan100 = MoreEnumerable.UnFold(1, n => { var square = n * n; return square < 100 ? Tuple.Create(square, n + 1) : null; });

Yes, of course, technically you can but then you introduce a statement lambda as the optimal path. Perhaps if declaration expressions ever make it back into C# then this point will be mute.

how a user that uses Visual Studio with C# 6 would see the unfold signature that use C# 7 tuples? Will he need to update something on VS? update VS itself?

They will see them as ValueTuple. The usage won't be as nice and you'll only be able to work with Item1 and Item2 so the only benefit over a Tuple would be avoiding heap allocations. My hunch is then readability will win over that and people will revert to the most generic overload until a time they are in C# 7.

Can't we simple return null since method signature define return type as (T, TState)? ?

The problem here is the conditional operator (?:) that expects the true and false case expressions to be the same type or at least implicitly convertible between one and the other. When you look at s => s.x < 100 ? (s.x, (s.y, s.x + s.y)) : ((int, (int, int))?) null, the true case expression is (int, (int, int)) and not (int, (int, int))?, and there's no implicit conversion from null to the value type of (int, (int, int)). If you want to use null directly then you have to turn the other expression into (int, (int, int))?, which I don't know about you, but reads way uglier to me:

var fibonacciNumbersLowerThan100 = // 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
    Unfold((x: 0, y: 1), s => s.x < 100 ? ((int, (int, int))?) (s.x, (s.y, s.x + s.y)) : null);

I'd rather add the ceremony to the far end and next to null as it has lesser value when reading back the code.

Name-wise Disaggregate would be the dual of Aggregate. I think we can make an exception in this case and go with Unfold but my fear is that people will get confused with its relation to Fold that's been in MoreLINQ since a very long time. Of course, that Fold is still an aggregator but which treats the sequence like a tuple that's then folded. What do you guys think?

@fsateler
Copy link
Member

fsateler commented Mar 2, 2017

Name-wise Disaggregate would be the dual of Aggregate. I think we can make an exception in this case and go with Unfold but my fear is that people will get confused with its relation to Fold that's been in MoreLINQ since a very long time. Of course, that Fold is still an aggregator but which treats the sequence like a tuple that's then folded. What do you guys think?

I'm actually surprised I've been using MoreLINQ for years and I had never noticed Fold 😮 . Apparently using ToList(); if (list.Count != x) { throw new InvalidOperationException(); } was enough for me.

I think striving for consistence with other ecosystems is a good idea, so my vote would be for Unfold.

Maybe Fold could be renamed (for 3.0), so that the confusion disappears? I googled for names relating to what this method does, and I didn't find anything. Python supports casting lists to tuples natively, haskell and f# don't appear to have a method for it.

@atifaziz
Copy link
Member

atifaziz commented Mar 2, 2017

F# doesn't need it because it has pattern matching for lists and arrays.

You could look at Fold like deconstruction, so you could call it Deconstruct (which sort of matches with deconstruction for tuples in C# 7) but isn't that being prescriptive about its usage? And if it's folding then why call it something else? There's no conflict with overloading. Consider the following example:

var csv = @"1,2,3
            4,5,6
            7,8,9";

var q =
    from s in csv.Split('\n')
    select s.Trim()
            .Split(',')
            .Select(int.Parse)
            .Fold((x, y, z) => x + y * z)

Isn't Fold just folding here?

@leandromoh
Copy link
Collaborator Author

leandromoh commented Mar 3, 2017

Technically Fold method folds, but in a unnexpected manner (the expected would be as wikipedia explain; that is like functional languages define it. The first time I saw fold on MoreLinq I thought something like "why there is a fold method if we already have aggregate?". I am in favor of rename fold. Deconstruct sounds me an appropriate name.

While we define if fold will be renamed or not, can I submit a PR for Unfold using @atifaziz signatures?

@atifaziz
Copy link
Member

atifaziz commented Mar 3, 2017

can I submit a PR for Unfold using @atifaziz signatures?

Yes, go for it.

why there is a fold method if we already have aggregate?

They do the same thing. The Fold in MoreLINQ is just an overload for sequences to fold them column-wise (horizontally) instead of row-wise (vertically). Is there anything wrong with that? Are we bastardizing the concept? Perhaps but I don't feel so and if we're not, we don't have to forcefully rename the method as long as the operation is logically the same. Aggregate is fine if you're folding the values homogeneously, like summing or averaging all values, but look at my CSV example. It's splitting a row into fields and those fields as a whole make up a row, so it aggregates or folds the fields back into a single row. Well, in the example I even turn it into a single computation ((x, y, z) => x + y * z) that's not a simple addition, but I could have done (x, y, z) => new { x, y, z }.

@atifaziz
Copy link
Member

atifaziz commented Mar 3, 2017

@leandromoh I did a very rough implementation in LINQPad earlier to test out some ideas, so throwing that in here in case you can use any bits for your PR (don't mean to jump the gun)…

static partial class MoreEnumerable
{
    // Following overload requires C# 7
    public static IEnumerable<T> Unfold<TState, T>(
        TState state, Func<TState, (T, TState)?> generator) =>
        Unfold(state, generator, e => e != null, e => e.Value.Item2, e => e.Value.Item1);

    public static IEnumerable<T> Unfold<TState, T>(
        TState state, Func<TState, Tuple<T, TState>> generator) =>
        Unfold(state, generator, e => e != null, e => e.Item2, e => e.Item1);

    public static IEnumerable<TResult> Unfold<TState, T, TResult>(
        TState state,
        Func<TState, T> generator,
        Func<T, bool> predicate,
        Func<T, TState> stateSelector,
        Func<T, TResult> resultSelector)
    {
        while (true)
        {
            var step = generator(state);
            if (!predicate(step))
                break;
            yield return resultSelector(step);
            state = stateSelector(step);
        }
    }
}

Since the seal's broken, all warranties are void of course. 😆

@leandromoh
Copy link
Collaborator Author

leandromoh commented Mar 4, 2017

@atifaziz Why do you call it "very rough implementation"? My draft was exactly like yours 😆

@atifaziz
Copy link
Member

atifaziz commented Mar 4, 2017

@leandromoh Well, there's only so many ways in which you can implement something as simple as Unfold so I wouldn't be surprised if two programmers put in separate rooms would arrive at the same code. If you have it exactly the same then I guess it must not be that rough after all. 😏 Perhaps it's rough because it needs the polishing of argument checking, laziness, unit tests and documentation. I didn't mean to jump the gun and hand you down an implementation. It was meant for your notes, to compare in case there was any difference that needed to debating further.

@leandromoh
Copy link
Collaborator Author

leandromoh commented Mar 7, 2017

Perhaps it's rough because it needs the polishing of argument checking, laziness, unit tests and documentation. I didn't mean to jump the gun and hand you down an implementation. It was meant for your notes, to compare in case there was any difference that needed to debating further.

I see. Thanks for the explanation @atifaziz. The PR was submitted.

@atifaziz atifaziz added this to the 2.4.0 milestone Mar 29, 2017
@atifaziz
Copy link
Member

@leandromoh @fsateler I'm scheduling this for 2.4.0 which is presently slated for end of April (the minor versions for 2.0 are lining up nicely with month numbers so far for this year 😄). If we can wrap up #260 today then I'll consider it for 2.3.0 that's scheduled for this Friday.

@atifaziz atifaziz modified the milestones: 2.3.0, 2.4.0 Mar 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants