-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Add Stateful
iterator wrapper
#25731
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it wouldn't be better to implement this as an immutable struct in which state
and elements_produced
are RefValue
cells so that the itr
field can't be changed (and the compiler knows that).
base/iterators.jl
Outdated
""" | ||
mutable struct Stateful{S,T} | ||
state::Union{Some{S}, Nothing} | ||
itr::T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems more natural for the iterator to come first – it's common for the inferred fields to follow the explicitly supplied fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it follow the order of type parameters, which was intentionally chosen so you could specify the state type independently of the iterator type (since I default the state type to whatever was first returned from start
)
base/iterators.jl
Outdated
end | ||
|
||
function isempty(s::Stateful{S}) where {S} | ||
done(s.itr, coalesce(s.state)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't 1-argument coalesce
just the identify function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it unwraps Some
, but is the identity otherwise:
julia> coalesce(Some(1))
1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it can be a bit surprising, but that way we don't need a separate unwrap
function (formerly get
).
base/iterators.jl
Outdated
isa(IteratorSize(s.itr), SizeUnknown) ? SizeUnknown() : HasLength() | ||
eltype(s::Stateful) = eltype(s.itr) | ||
IteratorEltype(s::Stateful) = IteratorEltype(s.itr) | ||
length(s::Stateful) = length(s.itr) - s.elements_produced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the length of these? Is that used in the string code, for example? Without that, we don't need the whole elements_produced
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we don't really, but it's easy enough to maintain we rely on the compiler inlining this code anyway in performant code, so the field will go away if unused.
base/iterators.jl
Outdated
mutable struct Stateful{S,T} | ||
state::Union{Some{S}, Nothing} | ||
itr::T | ||
elements_produced::Int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call it taken
instead of elements_produced
? For brevity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
base/iterators.jl
Outdated
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase) | ||
|
||
julia> collect(Iterators.take(a, 3)) | ||
3-element Array{Any,1}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intended not to infer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, will look at it.
Yeah, I think so. I don't think it actually matters much from the compiler perspective. This was just easier for the first version. Will change that for the next update. |
In that case leaving it as-is seems simpler. |
47bf0a1
to
13ec1a4
Compare
base/iterators.jl
Outdated
whenever an item is produced. | ||
|
||
`Stateful` provides the regular iterator interface. Like other mutable iterators | ||
(e.g. Channel), if iteration is stopped early (e.g. by a `break` in a for loop), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`Channel`
and `for`
base/iterators.jl
Outdated
itr::T | ||
# A bit awkward right now, but adapted to the new iteration protocol | ||
nextvalstate::Ref{Union{VS, Nothing}} | ||
taken::Ref{Int} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field counts how many "Taken" movies have been released at the time of execution. It is currently set to 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't expect that using a ref-cell will necessarily be a performance win. In most cases, we'll probably end up stripping it away, but if we don't, it's an extra memory indirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A syntax I had proposed once upon a time was the equivalent of this:
mutable struct Stateful{VS,T}
const itr::T
state::Union{VS, Nothing}
taken::Int
end
which would disallow assignment to the itr
field and let the compiler assume that it can never change without forcing the other two fields to be ref cells. Seems like an occasionally useful feature. We could plan on doing that and just leave this as mutable now and document that the itr
field should never be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be so bad about assignment to the itr
field? It seems like assignment to the taken
or state
field would be much more disruptive to the correct execution of the program.
test/iterators.jl
Outdated
for x in a; x == 1 || break; end | ||
@test sum(a) == 7 | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these tests somehow implicitly exercise peek
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but it's also not an exported function.
julia> for x in a; x == 1 || break; end | ||
|
||
julia> Base.peek(a) | ||
3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not overloads first
, like zip
does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, that would be consistent, though the generic definition of first
iterates, so people might expect it to consume an item (that doesn't really bother me too much though, since take!
is the consume-one-item function). first
is defined to error on an empty collection though, I find returning nothing
a more useful behavior, but I guess peek
on IO doesn't really do that either. Maybe we should have a first
equivalent that returns nothing or another sentinel when the collection is empty? @nalimilan ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, that would be consistent, though the generic definition of
first
iterates, so people might expect it to consume an item.
I don't think first
will consume item given that it's not first!
.
first
is defined to error on an empty collection though
We already have something breaks that rule.
julia> first(1:0)
1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the state of stateful iterators and doing I/O has never qualified as mutation in the sense that !
at the end of a name indicates. Otherwise all of the I/O functions would end in !
yet they do not. We need to decide if first
qualifies as iteration or not. Having peek
as a non-iterating form of take
seems reasonable to me, but then so would first
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also expect first
to consume elements, so I think I prefer adding peek
. We could generalize it so that it returns nothing
for empty collections as @Keno suggests. Also, it sounds like one function is redundant in the first
/peek
/take!
triad, and I'd tend to drop take!
in favor of first
.
Regarding the empty ranges issue at #25385, I guess we could make first
return nothing
in general for empty collections, but I'm a bit concerned about the performance implications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also arguable that take!
for channels should not have a !
since it is like read
whereas take!(::IOBuffer)
takes ownership of the data, which is why it has a !
, so this is a bit muddled. By that reasoning, take!(::Channel)
should be first(::Channel)
but that's kind of unintuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use popfirst!
instead of take!
for the mutating operation. I still like peek
rather than first
, because the latter feels like a bit of a pun.
c45b92a
to
7ad9312
Compare
Replaced |
Updated to use |
base/iterators.jl
Outdated
VS = typejoin(VS, Base._return_type(next, Tuple{itrT, | ||
tuple_type_head(tuple_type_tail(VS))})) | ||
VS = typejoin(VS, Base._return_type(next, Tuple{itrT, | ||
tuple_type_head(tuple_type_tail(VS))})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fieldtype(VS, 2) is better defined and optimized
base/iterators.jl
Outdated
VS = typejoin(VS, Base._return_type(next, Tuple{itrT, | ||
tuple_type_head(tuple_type_tail(VS))})) | ||
!(VS <: Tuple{Any, Any}) && return Tuple{Any, Any} | ||
Base._return_type(next, Tuple{itrT, tuple_type_head(tuple_type_tail(VS))}) <: VS ? VS : Tuple{Any, Any} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long line. Would be more readable as a multi line if statement anyways too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe define this recursively, so that you're not having to deal with copying the code and conditional tests (like the several missing is-subtype NTuple{2, Any} tests above). Also, calling this after calling next(start)
also means you're giving it type-unstable input, which I would expect is performing very poorly. It'll be better to call this on the result of start(itr)
.
function approx_iter_type(itrT::Type, state::Type)
nextstate = Base._return_type(next, Tuple{itrT, state})
nextstate <: Tuple{Any, Any} || return Any
nextstate = typejoin(state, fieldtype(nextstate, 2))
return (state == nextstate ? state : approx_iter_type(itrT, nextstate))
end
Also, Tuple{Any, Any}
sounds nice, and is very precise, but can also perform much worse at runtime than just returning Any
(since it'll force us to emit extra code to try to check + convert everything).
I'm not sure how we'll make this work after switching to the new, type-unstable iteration protocol. I think we can get inference to union-split the inference call to fieldtype
, but it may be tricky to typejoin
the result – perhaps we should be only unrolling the loop only exactly once to avoid that call:
approx_iter_type(itrT::Type) = approx_iter_type(itr, Base._return_type(iterate, Tuple{itrT}))
function approx_iter_type(itrT::Type, vstate::Type)
vstate <: Union{Nothing, Tuple{Any, Any}} || return Any
nextvstate = Base._return_type(next, Tuple{itrT, fieldtype(vstate, 2)})
return (vstate <: nextvstate ? nextvstate : (nextvstate <: vstate ? vstate : Any))
end
Since anyways, if we don't pretty much get a concrete type right away, there's likely very little to be gained from continuing to try. (considering also the structure of a for loop, which also only splits off the first element, and the storage, which will only be efficient if it always has a nearly-concrete type)
@@ -765,3 +765,7 @@ Indicate whether `x` is [`missing`](@ref). | |||
""" | |||
ismissing(::Any) = false | |||
ismissing(::Missing) = true | |||
|
|||
function popfirst! end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this generally gets loaded quite early.
There are several different ways to think about this iterator wrapper: 1. It provides a mutable wrapper around an iterator and its iteration state. 2. It turns an iterator-like abstraction into a Channel-like abstraction. 3. It's an iterator that mutates to become its own rest iterator whenever an item is produced. `Stateful` provides the regular iterator interface. Like other mutable iterators (e.g. Channel), if iteration is stopped early (e.g. by a `break` in a for loop), iteration can be resumed from the same spot by continuing to iterate over the same iterator object (in contrast, an immutable iterator would restart from the beginning). The motivation for this iterator came from looking at the string code, which makes frequent use of the underlying iteration protocol rather than higher-level wrappers, because it needs to carry state from one loop to another.
Rebased and updated to use @vtjnash's suggestion for the type determination. |
Ooo. Exciting! |
There are several different ways to think about this iterator wrapper:
1. It provides a mutable wrapper around an iterator and
its iteration state.
2. It turns an iterator-like abstraction into a Channel-like
abstraction.
3. It's an iterator that mutates to become its own rest iterator
whenever an item is produced.
Stateful
provides the regular iterator interface. Like other mutable iterators(e.g. Channel), if iteration is stopped early (e.g. by a
break
in a for loop),iteration can be resumed from the same spot by continuing to iterate over the
same iterator object (in contrast, an immutable iterator would restart from the
beginning).
The motivation for this iterator came from looking at the string
code, which makes frequent use of the underlying iteration protocol
rather than higher-level wrappers, because it needs to carry state
from one loop to another.