Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reported length of zip of stateful iterators wrong #47790

Closed
jakobnissen opened this issue Dec 3, 2022 · 4 comments · Fixed by #51747
Closed

Reported length of zip of stateful iterators wrong #47790

jakobnissen opened this issue Dec 3, 2022 · 4 comments · Fixed by #51747
Labels
domain:iteration Involves iteration or the iteration protocol kind:bug Indicates an unexpected problem or unintended behavior kind:correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing

Comments

@jakobnissen
Copy link
Contributor

A common idiom in the Python world to partition an iterator into chunks of length N is to turn it into a stateful iterator, then zip it with itself N times.

However, such a Zip iterator in Julia falsely report to have the same length as the original iterator.

Perhaps the takeaway is that stateful iterators cannot be HasLength or HasShape, as it's unknowable if another reference to the iterator iterates it.

@jakobnissen jakobnissen added kind:bug Indicates an unexpected problem or unintended behavior domain:iteration Involves iteration or the iteration protocol labels Dec 3, 2022
@StefanKarpinski
Copy link
Sponsor Member

I don't think this is really a bug: one has to be careful to only be currently be using a stateful iterator in one place or it will not produce the desired results. This is similar to the fact that if you modify a collection while iterating it bad things can happen. What could be done, however, is to copy an iterator (and its state) when doing something like this. Can you give a more concrete example?

@jakobnissen
Copy link
Contributor Author

jakobnissen commented Dec 5, 2022

Good point. On the other hand I would argue that the major use case for opting in to stateful iterators is exactly to use it multiple places. For example, one can make a Stateful(itr) to pass into two functions and in that way make sure that the second function starts the iterator where the first one stopped. Of course concurrently iterating stateful iterators is dangerous, but this is not what zip is supposed to do, I think.

In my particular example, I got an annoying type instability from using Iterators.partition, which tends to yield Vector{Any} when partitioning Generators. To get around it, I tried to use the old Python idiom for chunking an iterator:

julia> itr = (i for i in 1:9); # Base.eltype == Any

julia> first(Iterators.partition(itr, 3)) isa Vector{Any}
true

julia> collect(zip(repeat([Iterators.Stateful(itr)], 3)...))
9-element Vector{Tuple{Int64, Int64, Int64}}:
 (1, 2, 3)
 (4, 5, 6)
 (7, 8, 9)
 (139908521644592, 32433, 65537)
 (3, 4, 8)
 (3, 32, 1)
 (100, 1000, 250)
 (20, 0, 3)
 (32, 4, 0)

This produces NTuple{3, Int} which is much better for efficiency and stability, but because length is wrongly calculated, most of the resulting array is uninitialized.

It seems to me to be a tradeoff: Stateful iterators can only preserve shape information if the user promises to never have two active iterators at the same time. A safety/performance tradeoff.

@StefanKarpinski
Copy link
Sponsor Member

Aha, I see what you're trying to do: zipping an interator with itself so that the values are interleaved. If seems to me that either the contract has to be that the iterators are all unconnected or we cannot rely on predicting the length in advance. It would be nice if this worked though. Copying wouldn't help here since you want the same iterator used multiple times. One way to fix this would be for zip to only claim to be able to predict its length when its arguments are unentangled. We'd need a way to check for that though, which is challenging.

@martinholters
Copy link
Member

We don't guarantee the order in which zip iterates its inputs. The first element could just as well be (3,2,1) or any other permutation, making this use case a bit questionable.

@LilithHafner LilithHafner added the kind:correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing label Aug 7, 2023
vtjnash added a commit that referenced this issue Oct 17, 2023
Stateful iterators do not have a consistent notion of length, as it is
continuously changing as elements are removed. As the main purpose of
Stateful is to take elements from multiple places, any notion of
HaveShape is invalid for those cases, and thus not useful in general.

Fix #47790
vtjnash added a commit that referenced this issue Oct 26, 2023
Stateful iterators do not have a consistent notion of length, as it is
continuously changing as elements are removed. As the main purpose of
Stateful is to take elements from multiple places, any notion of
HaveShape is invalid for those cases, and thus not useful in general.

Fix #47790
vtjnash added a commit that referenced this issue Jan 30, 2024
Stateful iterators do not have a consistent notion of length, as it is
continuously changing as elements are removed. As the main purpose of
Stateful is to take elements from multiple places, any notion of
HaveShape is invalid for those cases, and thus not useful in general.

Fix #47790
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:iteration Involves iteration or the iteration protocol kind:bug Indicates an unexpected problem or unintended behavior kind:correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants