Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we throw an exception for TaskSeq.zip when the sequences are of unequal length? #32

Closed
abelbraaksma opened this issue Oct 16, 2022 · 7 comments · Fixed by #102
Closed
Labels
bug Something isn't working question Further information is requested topic: surface area Adds functions to the public surface area
Milestone

Comments

@abelbraaksma
Copy link
Member

abelbraaksma commented Oct 16, 2022

While writing tests for #31, I started pondering this ☝️.

For comparison, F# throws an ArgumentException for Array.zip and List.zip, but not for Seq.zip. Most likely, this is because "getting the next item" can have an unwanted side effect, so instead, they choose to stop processing as soon as one of the sequences is exhausted.

Arguably, such side effect can still happen. I.e., for zip we have to call MoveNextAsync, which will give us a bool that tells us whether we reached the end of the sequence. But, if the first sequence still has items, but the second sequence is exhausted, the side effect of the first sequence for "one item past the last" has already happened.

In fact, I think the Seq.zip in F# should probably come with a warning. What if each item is "getting a web page of several megabytes"? AFAIK, there's no peek function that would potentially tell me, without side effects, that a sequence is exhausted.

Edit: linking the discussion in the F# repo: dotnet/fsharp#14121

@abelbraaksma
Copy link
Member Author

abelbraaksma commented Oct 16, 2022

Of relevance, for infinite sequences, this example is relevant: https://stackoverflow.com/questions/7226004/seq-zip-in-f-may-require-seemingly-extra-sequence-element-to-complete. This post gives these two examples (adjusted for clarity):

// this will run forever
Seq.initInfinite id
|> Seq.filter (fun x -> x < 2)  // causes the filter to try to find a non-existing item beyond item 2, forever
|> Seq.zip ["A";"B"]

// this "just works"
Seq.initInfinite id
|> Seq.filter (fun x -> x < 3)  // same, but now beyond item 3
|> Seq.zip ["A";"B"]

Basically: if you have an infinite sequence, and you have a filter that returns only two elements of the infinite set, then, combining that with another sequence using zip, map2, iter2 and their variants, will always blow up!

The cause being: we need to "peek", and (at least one of) the sequence(s) will try to MoveNext, but since no other element in the infinite sequence yields a proper result, we will run forever.

Currently, the way this is implemented in F# is such that it is deterministic: if one of the two sequences exhibits this behavior (i.e. MoveNext not finding the "next" item, ever), the combination of the two will run forever.

@abelbraaksma
Copy link
Member Author

abelbraaksma commented Oct 16, 2022

/cc @gusty, what do you think? I tend towards raising an exception, as we do know whether there is a "next" item (the only reason we wouldn't know this is by using the Haskell way, explained in the above SO post, but that makes this operation non-deterministic to argument order) and it feels like this was an oversight in F# Core. But at the same time, parity is a good thing as well.

TLDR, we have these options:

  • Leave as-is, which means: deterministic towards argument order and raise an exception like List.zip, but unlike Seq.zip
  • Change as with Seq.zip, that is do not raise exception, but remain deterministic towards argument order
  • Change as in Haskell, that is: do not raise exception, but allow argument order to change whether or not this works with the above scenario (see SO question).

Basically the same questions would apply to Seq.map2 and Seq.mapi2 and similar double-sequence pairwise iterations.

@gusty
Copy link
Member

gusty commented Oct 17, 2022

I would definitely align with the mechanism for Seq.zip.
IMHO the fact that it hangs in that specific situation is a corner case, which normally is not encountered in the wild, as doing a filter like that is an unsafe operation.

@abelbraaksma
Copy link
Member Author

I’m considering adding a shadowed function set that can be opened, that have an overload that takes a CancelationToken. This would not be specifically for infinite sequences, but everything here is async, so it makes sense to make them cancelable.

I’m still a little on the fence about parity with F#. They made a mistake (IMO) by not having the same behaviour for all collection types (where that makes sense and doesn’t require potential extra side effects). But there’s also the “principle of least surprise”. Hmm, yeah, ultimately I guess I have to bite the bullet and go for parity.

@abelbraaksma abelbraaksma added the question Further information is requested label Oct 22, 2022
@gusty
Copy link
Member

gusty commented Oct 22, 2022

I agree in that they made a mistake, but to me the mistake was on List and Array, while seq has the right behavior.

Of interest, in F#+ there is *.zipShortest defined for List and arrays.

@abelbraaksma
Copy link
Member Author

abelbraaksma commented Oct 22, 2022

I agree with you there. I mean: the different collections should have parity in behaviour. And if you can get away without exceptions it’s generally better, but then that should have applied to all of them.

zipShortest: interesting!

@abelbraaksma abelbraaksma added bug Something isn't working topic: surface area Adds functions to the public surface area labels Nov 24, 2022
@abelbraaksma
Copy link
Member Author

This has now been solved in #102.

@abelbraaksma abelbraaksma modified the milestones: v0.2.0, v0.3.0 Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested topic: surface area Adds functions to the public surface area
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants