Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add Stateful iterator wrapper #25731

Merged
merged 1 commit into from
Jan 27, 2018
Merged

RFC: Add Stateful iterator wrapper #25731

merged 1 commit into from
Jan 27, 2018

Conversation

Keno
Copy link
Member

@Keno Keno commented Jan 24, 2018

There are several different ways to think about this iterator wrapper:
1. It provides a mutable wrapper around an iterator and
its iteration state.
2. It turns an iterator-like abstraction into a Channel-like
abstraction.
3. It's an iterator that mutates to become its own rest iterator
whenever an item is produced.

Stateful provides the regular iterator interface. Like other mutable iterators
(e.g. Channel), if iteration is stopped early (e.g. by a break in a for loop),
iteration can be resumed from the same spot by continuing to iterate over the
same iterator object (in contrast, an immutable iterator would restart from the
beginning).

The motivation for this iterator came from looking at the string
code, which makes frequent use of the underlying iteration protocol
rather than higher-level wrappers, because it needs to carry state
from one loop to another.

Copy link
Member

@StefanKarpinski StefanKarpinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it wouldn't be better to implement this as an immutable struct in which state and elements_produced are RefValue cells so that the itr field can't be changed (and the compiler knows that).

"""
mutable struct Stateful{S,T}
state::Union{Some{S}, Nothing}
itr::T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems more natural for the iterator to come first – it's common for the inferred fields to follow the explicitly supplied fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it follow the order of type parameters, which was intentionally chosen so you could specify the state type independently of the iterator type (since I default the state type to whatever was first returned from start)

end

function isempty(s::Stateful{S}) where {S}
done(s.itr, coalesce(s.state))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't 1-argument coalesce just the identify function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it unwraps Some, but is the identity otherwise:

julia> coalesce(Some(1))
1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it can be a bit surprising, but that way we don't need a separate unwrap function (formerly get).

isa(IteratorSize(s.itr), SizeUnknown) ? SizeUnknown() : HasLength()
eltype(s::Stateful) = eltype(s.itr)
IteratorEltype(s::Stateful) = IteratorEltype(s.itr)
length(s::Stateful) = length(s.itr) - s.elements_produced
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the length of these? Is that used in the string code, for example? Without that, we don't need the whole elements_produced field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't really, but it's easy enough to maintain we rely on the compiler inlining this code anyway in performant code, so the field will go away if unused.

mutable struct Stateful{S,T}
state::Union{Some{S}, Nothing}
itr::T
elements_produced::Int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call it taken instead of elements_produced? For brevity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

julia> collect(Iterators.take(a, 3))
3-element Array{Any,1}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended not to infer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, will look at it.

@Keno
Copy link
Member Author

Keno commented Jan 24, 2018

I wonder if it wouldn't be better to implement this as an immutable struct in which state and elements_produced are RefValue cells so that the itr field can't be changed (and the compiler knows that).

Yeah, I think so. I don't think it actually matters much from the compiler perspective. This was just easier for the first version. Will change that for the next update.

@StefanKarpinski
Copy link
Member

I don't think it actually matters much from the compiler perspective.

In that case leaving it as-is seems simpler.

@ararslan ararslan added the iteration Involves iteration or the iteration protocol label Jan 24, 2018
@Keno Keno force-pushed the kf/stateful branch 2 times, most recently from 47bf0a1 to 13ec1a4 Compare January 24, 2018 20:43
whenever an item is produced.

`Stateful` provides the regular iterator interface. Like other mutable iterators
(e.g. Channel), if iteration is stopped early (e.g. by a `break` in a for loop),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`Channel` and `for`

itr::T
# A bit awkward right now, but adapted to the new iteration protocol
nextvalstate::Ref{Union{VS, Nothing}}
taken::Ref{Int}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taken

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field counts how many "Taken" movies have been released at the time of execution. It is currently set to 3.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't expect that using a ref-cell will necessarily be a performance win. In most cases, we'll probably end up stripping it away, but if we don't, it's an extra memory indirection.

Copy link
Member

@StefanKarpinski StefanKarpinski Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A syntax I had proposed once upon a time was the equivalent of this:

mutable struct Stateful{VS,T}
    const itr::T
    state::Union{VS, Nothing}
    taken::Int
end

which would disallow assignment to the itr field and let the compiler assume that it can never change without forcing the other two fields to be ref cells. Seems like an occasionally useful feature. We could plan on doing that and just leave this as mutable now and document that the itr field should never be changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be so bad about assignment to the itr field? It seems like assignment to the taken or state field would be much more disruptive to the correct execution of the program.

for x in a; x == 1 || break; end
@test sum(a) == 7
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these tests somehow implicitly exercise peek?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but it's also not an exported function.

julia> for x in a; x == 1 || break; end

julia> Base.peek(a)
3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not overloads first, like zip does?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, that would be consistent, though the generic definition of first iterates, so people might expect it to consume an item (that doesn't really bother me too much though, since take! is the consume-one-item function). first is defined to error on an empty collection though, I find returning nothing a more useful behavior, but I guess peek on IO doesn't really do that either. Maybe we should have a first equivalent that returns nothing or another sentinel when the collection is empty? @nalimilan ?

Copy link
Member

@iblislin iblislin Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, that would be consistent, though the generic definition of first iterates, so people might expect it to consume an item.

I don't think first will consume item given that it's not first!.

first is defined to error on an empty collection though

We already have something breaks that rule.

julia> first(1:0)
1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the state of stateful iterators and doing I/O has never qualified as mutation in the sense that ! at the end of a name indicates. Otherwise all of the I/O functions would end in ! yet they do not. We need to decide if first qualifies as iteration or not. Having peek as a non-iterating form of take seems reasonable to me, but then so would first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also expect first to consume elements, so I think I prefer adding peek. We could generalize it so that it returns nothing for empty collections as @Keno suggests. Also, it sounds like one function is redundant in the first/peek/take! triad, and I'd tend to drop take! in favor of first.

Regarding the empty ranges issue at #25385, I guess we could make first return nothing in general for empty collections, but I'm a bit concerned about the performance implications.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also arguable that take! for channels should not have a ! since it is like read whereas take!(::IOBuffer) takes ownership of the data, which is why it has a !, so this is a bit muddled. By that reasoning, take!(::Channel) should be first(::Channel) but that's kind of unintuitive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use popfirst! instead of take! for the mutating operation. I still like peek rather than first, because the latter feels like a bit of a pun.

@Keno Keno force-pushed the kf/stateful branch 2 times, most recently from c45b92a to 7ad9312 Compare January 26, 2018 20:46
@Keno
Copy link
Member Author

Keno commented Jan 26, 2018

Replaced take! by popfirst!. Any more comments? Otherwise I'll merge once green.

@Keno
Copy link
Member Author

Keno commented Jan 26, 2018

Updated to use return_type magic rather than letting the user pick the state of the tuple.

VS = typejoin(VS, Base._return_type(next, Tuple{itrT,
tuple_type_head(tuple_type_tail(VS))}))
VS = typejoin(VS, Base._return_type(next, Tuple{itrT,
tuple_type_head(tuple_type_tail(VS))}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fieldtype(VS, 2) is better defined and optimized

VS = typejoin(VS, Base._return_type(next, Tuple{itrT,
tuple_type_head(tuple_type_tail(VS))}))
!(VS <: Tuple{Any, Any}) && return Tuple{Any, Any}
Base._return_type(next, Tuple{itrT, tuple_type_head(tuple_type_tail(VS))}) <: VS ? VS : Tuple{Any, Any}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long line. Would be more readable as a multi line if statement anyways too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe define this recursively, so that you're not having to deal with copying the code and conditional tests (like the several missing is-subtype NTuple{2, Any} tests above). Also, calling this after calling next(start) also means you're giving it type-unstable input, which I would expect is performing very poorly. It'll be better to call this on the result of start(itr).

function approx_iter_type(itrT::Type, state::Type)
    nextstate = Base._return_type(next, Tuple{itrT, state})
    nextstate <: Tuple{Any, Any} || return Any
    nextstate = typejoin(state, fieldtype(nextstate, 2))
    return (state == nextstate ? state : approx_iter_type(itrT, nextstate))
end

Also, Tuple{Any, Any} sounds nice, and is very precise, but can also perform much worse at runtime than just returning Any (since it'll force us to emit extra code to try to check + convert everything).

I'm not sure how we'll make this work after switching to the new, type-unstable iteration protocol. I think we can get inference to union-split the inference call to fieldtype, but it may be tricky to typejoin the result – perhaps we should be only unrolling the loop only exactly once to avoid that call:

approx_iter_type(itrT::Type) = approx_iter_type(itr, Base._return_type(iterate, Tuple{itrT}))
function approx_iter_type(itrT::Type, vstate::Type)
    vstate <: Union{Nothing, Tuple{Any, Any}} || return Any
    nextvstate = Base._return_type(next, Tuple{itrT, fieldtype(vstate, 2)})
    return (vstate <: nextvstate ? nextvstate : (nextvstate <: vstate ? vstate : Any))
end

Since anyways, if we don't pretty much get a concrete type right away, there's likely very little to be gained from continuing to try. (considering also the structure of a for loop, which also only splits off the first element, and the storage, which will only be efficient if it always has a nearly-concrete type)

@@ -765,3 +765,7 @@ Indicate whether `x` is [`missing`](@ref).
"""
ismissing(::Any) = false
ismissing(::Missing) = true

function popfirst! end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this generally gets loaded quite early.

There are several different ways to think about this iterator wrapper:
    1. It provides a mutable wrapper around an iterator and
       its iteration state.
    2. It turns an iterator-like abstraction into a Channel-like
       abstraction.
    3. It's an iterator that mutates to become its own rest iterator
       whenever an item is produced.

`Stateful` provides the regular iterator interface. Like other mutable iterators
(e.g. Channel), if iteration is stopped early (e.g. by a `break` in a for loop),
iteration can be resumed from the same spot by continuing to iterate over the
same iterator object (in contrast, an immutable iterator would restart from the
beginning).

The motivation for this iterator came from looking at the string
code, which makes frequent use of the underlying iteration protocol
rather than higher-level wrappers, because it needs to carry state
from one loop to another.
@Keno
Copy link
Member Author

Keno commented Jan 27, 2018

Rebased and updated to use @vtjnash's suggestion for the type determination.

@Keno Keno merged commit 42f9131 into master Jan 27, 2018
@StefanKarpinski StefanKarpinski deleted the kf/stateful branch January 27, 2018 23:11
@StefanKarpinski
Copy link
Member

Ooo. Exciting!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
iteration Involves iteration or the iteration protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants