`Unfold()` is an iterable based on a transition function #44873
Duplicate of #43203? |
Thanks @Seelengrab , there are indeed some similarities. I hope we can talk about these ideas here and then maybe we can turn that frown upside-down! I suppose what I implemented is indeed a bit similar to some patterns you can do in Python. Also there's the connection to I find iterators in Julia generally hard to write, unless you're writing a generator expression, which are pretty limited in what you can do. If you're doing anything more complex, you'll miss the flexibility of writing a do-block, or a for-loop. And writing a full With a tool like this, it becomes a lot easier to write code in the following fashion: first you write a Then you wrap this up in a function, add something like Regarding I can also imagine having a version where the function is more intended to be pure, and returns a value and "new state", which gets fed to |
This is a version that still lets you create an iterator using do-syntax, but here we have an initial state, and the function takes the state as an input and returns an output value and the next state. I think it might be nice to have something like this as well; that's fine by me. My first wish is just being able to write a function straight away with do-syntax and get an iterator. I still think the version with the closure is important to have, though. It's an implicit, impure version that can be very handy, while this version here would be the necessary one for pure functions, which is neat to have if you want to put in the work. And we can also have something more like a Mealy/Moore machine, that would also take input values,
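A minimal sketch of the state-based variant described in this comment, assuming a hypothetical type named `StateUnfold` (the name is illustrative and is not part of Base or of this PR). The transition function takes the current state and returns either `(value, newstate)` or `nothing`:

```julia
# Hypothetical type name; not part of Base or of this PR.
struct StateUnfold{F,S}
    f::F      # transition function: state -> (value, newstate), or nothing to stop
    init::S   # initial state
end

Base.IteratorSize(::Type{<:StateUnfold}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{<:StateUnfold}) = Base.EltypeUnknown()

# The transition function already has the shape that `iterate` expects.
Base.iterate(u::StateUnfold, state=u.init) = u.f(state)

# do-syntax passes the block as the first argument, so this is StateUnfold(f, 3).
countdown = StateUnfold(3) do n
    n == 0 ? nothing : (n, n - 1)
end

collect(countdown)  # [3, 2, 1]
```

Because the state is threaded through `iterate` instead of being captured and mutated, the same `countdown` object can be iterated more than once.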
I'm not opposed to making iterators easier to write, I've written a few a bit involved ones myself. I just think that this approach to making it easier has some disadvantages:
All of these are fixable, of course, but they more or less amount to.. reimplementing the iterator protocol with a different interface. If you then decide to e.g. specialize On top of this - there is still #15276, which I don't dare assess the impact of on this.
I'm not sure how you got that impression - I have quite the opposite, there are iterators hiding absolutely everywhere in the ecosystem. There are just fewer functional ones, but I think that has more to do with the people using the language than the protocol or its interface. Most folks using julia aren't functional programmers or call themselves programmers at all.
That's great! Please remember though that julia is not those other (possibly hardcore functional) languages like haskell. I've used a similar approach to turn imperative code into iterator structs - after all, there are only three things you need to know to implement an iterator:
Most of the time, this results in a Personally, I don't think the iterator protocol is too complicated, it's just different from what functional languages do because you have to actively think about what you need & have to carry around. |
I mean.. to me that just already is the current |
Thanks for your insights, @Seelengrab . I'm aware there may be performance penalties. It's just part of the things that programmers must learn and take into account during their work, in my opinion. Just one of many compromises. On the other hand there can be practical matters like I brought up. The main concern here is being able to write code quickly to produce an iterator. Generator expressions are not powerful enough. A It should be made clear that I'm not proposing to replace the iterator protocol, or to use this construct while implementing core libraries, etc. I feel this is a simple thing that can be very handy to many people, especially for prototyping. If it's not available in Julia, I might have it in my |
I do think we should have something like this. I like some variation of The main problem here is, of course, the poor performance of mutable captured variables. I would not want to add a nice-looking feature that encourages people to fall into that performance trap. But, this should probably be added to IterTools.jl if it's not there already. |
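To illustrate the performance trap mentioned here, the following sketch contrasts a closure that reassigns a captured variable with one that mutates a typed `Ref`. The function names are hypothetical, and the boxing behaviour shown is what current Julia versions typically do (see #15276), not a guarantee of the language:

```julia
# Closure that reassigns a captured variable: `i` is typically stored in a Core.Box,
# which hides its type from the compiler.
function boxed_counter(n)
    i = 0
    return () -> i < n ? (i += 1) : nothing
end

# Same behaviour, but the closure captures a never-reassigned binding to a typed Ref.
function ref_counter(n)
    i = Ref(0)
    return () -> i[] < n ? (i[] += 1) : nothing
end

f = boxed_counter(3)
g = ref_counter(3)

fieldtypes(typeof(f))  # typically includes Core.Box
fieldtypes(typeof(g))  # includes Base.RefValue{Int64} instead

# Both closures yield 1, 2, 3 and then nothing.
results_f = [f(), f(), f(), f()]
results_g = [g(), g(), g(), g()]
```

The `Ref` version is a common workaround, but it still relies on hidden mutable state, which is the concern raised about encouraging this pattern.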
Ah shoot, did my reply sound too dismissive? Sorry, that was not my intention! I just think that As Jeff mentions, captured variables have that well-known performance trap that people run into all the time and it's really not transparent when it might occur, especially when you're not deeply involved with all these gotchas. In any case, here's a version that would fix most of the other problems I mentioned above:
struct IterableClosure{FuncType, Eltype}
    f::FuncType
    IterableClosure{Eltype}(f::FuncType) where {Eltype, FuncType} = new{FuncType, Eltype}(f)
end
IterableClosure(f) = IterableClosure{Any}(f)
Base.eltype(::Type{IterableClosure{F, Eltype}}) where {F, Eltype} = Eltype
Base.IteratorEltype(::Type{<:IterableClosure}) = Base.HasEltype()
Base.IteratorSize(::Type{<:IterableClosure}) = Base.SizeUnknown()
function Base.iterate(ic::IterableClosure{F, Eltype}, _=nothing) where {F, Eltype}
    nextvalue = ic.f()
    if isnothing(nextvalue)
        return nothing
    else
        return nextvalue::Eltype, nothing
    end
end
The |
Thank you @Seelengrab , this already seems a very productive collaboration so far! You definitely know a lot more about Julia type inference and iterators than I do, I'm surely at least going to learn a lot from this PR. The point of using I would imagine that the version that takes a pure function mapping state to value-state can perform better. Maybe we can try implementing that as well with proper typing and test the performance. I completely understand being aware of the issues that may happen, and giving the user an indication of what should be preferred or not, give it an ugly name inside a module, etc. I think enabling the creation of iterators with do-syntax has a great potential, though, we just need to explore the idea and see where it goes. And I'm glad to see that there's actually hope for #15276. Maybe we should just keep that uncompromising attitude that Julia is known for, and then one day this will be just another barrier that was turned to rubble. |
Adding a few more thoughts because otherwise I keep ruminating on it the whole day. First, I don't really like the name
I get it that there's a difference because it produces an iterator, you get the intermediary states as an iterator, not just the final "reduction". But it's still kind of a fold in my opinion, That piece of code contains a weird idea, an infinite generator of "unit" values. This basically makes one piece of the function moot. I find it pretty insightful, actually, to consider what happens when you do that. In fact, there seems to be a sort of taxonomy we can come up with when we iterate over a (pure) function
The last two wouldn't make sense with pure functions, but work with argument-less closures modifying some inscrutable state. What it looks like to me is that Julia currently has great support for Anyways, I think Scala's version of this A final thought is that looking at |
I'd much prefer having to unwrap at the use site of the I'm curious whether you've seen this blog post about the protocol or this post by @tkf (which mentions that
Yes it can be, but that doesn't mean it has to or should be. Accumulating into a list is already handled by In julia, iterators are lazy by design - they should do the minimal amount of work necessary to get from one state to the next and let the user handle how (or if!) to accumulate. For example, your
julia> struct fib end
julia> Base.iterate(::fib, state=(1,1)) = state[1], (state[2], state[1]+state[2])
julia> Base.IteratorSize(::Type{fib}) = Base.IsInfinite()
julia> Base.eltype(::Type{fib}) = Int
julia> foldl(Iterators.take(fib(), 7); init=Int[]) do acc,el
           push!(acc, el)
       end
7-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13
# or just
# collect(Iterators.take(fib(), 7))
which keeps the iterator generic & possible for (re-)use in other scenarios as well as keep performance how it should be: fast. Yes I know, explicitly modifying something is not very functional, but then again julia isn't a pure functional language. |
When you say "pure function", what do you mean by that? I'm asking because in the past there has been some confusion about what people mean by that in a julia context.
I'm not saying there's hope for that. I'm saying that until that issue is fixed, you have to be very careful with providing an interface in |
Actually, I disagree :) Your table also pretty much shows the duality of
I feel there's a quite strong benefit in using non-mutating API even though Julia is not a "pure functional language." For example, non-mutating API is much easier and more predictable for execution on GPU. The compiler's optimizations for non-mutating local functions are also much better than variable-mutating closures. These are big upsides in JuliaFolds packages that have been pursuing non-mutating implementations IMHO. So, I think #43203 is a better direction for supporting "ad-hoc iterators." |
Absolutely! This was more related to requiring the |
Here's one example. My favorite function that returns optional values is
I mean the general meaning, not
I'm not sure what you mean, your example, as well as moving away from closures that mutate external values, this all looks to me exactly like sticking to pure functions, and going for a more "functional" way of doing things. The point with this fibo example is just to say that it's not like an iterator produces collections, and a fold produces a value. A fold can produce a collection, the best way to do it is a very minor point. Thanks for alerting me of the allocation, I would hope [A;b] can reuse the same vector, but we could always go with this as well
Anyways, here's what I'm hoping now we could do. Forget about the closure stuff. Basing myself on your previous code:
and basing myself on your example:
Do you see any big problems with that? Isn't it interesting that we can define the same iterator using just a Thanks for the references, I'll try to check it out later. |
Let me try to be clearer: my point is that small changes to how an iterator works transform it into a fold that produces a collection. On the other hand, imagine you have an iterator that takes S and produces [a,b,c,...,z]. The way I see some people talking, it's as if you could easily produce a related fold that takes [a,b,c,...,z] and then produces S. This is rather unlikely in my opinion. That would mean a stronger "opposition" between the two, and that's what "fold, unfold" suggests to me. I suppose I'm just quirky like that! |
Could you highlight the differences in direction you see right now? I understand talking about closures is probably not a good idea. Forget that. I'm talking about writing a transition function that outputs the same as |
Great discussion. I think the |
I wouldn't necessarily call them "pure", because that may be understood as "there's no I/O" or "there's no allocations" as well. Neither of those examples are relevant to this though - with "pure functional languages" I was only referring to languages like Haskell, where "state" is much less of a thing than in julia.
I disagree! An iterator produces a number of values, while a fold always produces/reduces to a single value. That may be a list itself, but there's still only one list (or one tuple, one aggregate.. you get the idea). The expectation is different, in that you usually get more than one thing from an iterator & its state(s), while you only get one thing from a fold. In an automaton context you could say that an iterator produces all intermediary values, while a fold only produces a single, final value.
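The distinction drawn here can be seen with functions that already exist in Base. In this small sketch, `foldl` returns only the final value, while `Iterators.accumulate` (available since Julia 1.5) lazily yields every intermediate value:

```julia
xs = 1:5

# A fold reduces the whole sequence to a single, final value.
total = foldl(+, xs)                           # 15

# An iterator exposes each intermediate value along the way.
partials = collect(Iterators.accumulate(+, xs))  # [1, 3, 6, 10, 15]

# The fold's result is the last value the iterator produces.
last(partials) == total                        # true
```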
This is a version I can get behind and is imo the best version of this so far. It still has the same closure capturing problems, but you can't really get away from them with closures in the first place and it also allows to have a very straightforward "migration path" to an explicit iterator should it become a performance problem:
This requires minimal changes to the function created in the
No, I like it! It's basically taking the minimal possible interface for writing an iterator, except that you don't use the
That sounds like a good idea. The last two caveats from my POV are:
I'm a scruffy FP programmer, you're way more rigorous!
I actually think we're talking similar things.
Nice! I wish you had made a patch, hope you don't mind I mostly copy-pasted your code.
You have seen the
Very good point, I think it's definitely good to hear from more people. I went with |
@Seelengrab the original |
`Iterator()` is an iterable based on a transition function
Made the |
I think the name will need to be more specific than |
I'm fine with my PR being mentioned in the News file. I think the docs I wrote in my PR are maybe better than in this one? Maybe take a look and see if you agree? In particular, I was careful to include the string unfoldr in the docs so that people would be more likely to find this function with apropos(); and I was quite happy with the explanatory comment explaining how unfold is to No pressure, tho. My PR stalled cos I never got a clear answer on whether Jeff et al wanted unfold(f) as well as unfold(f, init). Your PR provides Unfold(f) and unfold(f, init), so maybe that will satisfy those who wanted both? |
@cmcaine ok thanks for agreeing, and thanks for your work! I think we might take at least your whole |
I'm on holiday now, but I could do the patches later this week when I have access to a computer. Also happy for you to do it. |
I went ahead and copied the extended help, only modified the example code to rely on |
Bumping this, what's the next step? @Seelengrab @tkf |
It looks good from my POV, so 👍 from me - I can't merge things though :) |
I just came across another example where I wanted this function. Can we get this merged by triage if there are no blockers? @nlw0, you said:
about this code: |
function unfold(f, initialstate, eltype::Type{Eltype}) where {Eltype}
I think `eltype` should be a keyword argument in case we want to add support for `IteratorSize` at some later date. A keyword argument will also lead to code that self-documents better.
The tests should also check this version of the function. At the moment only `unfold(f, initialstate)` is tested.
Co-authored-by: Colin Caine <cmcaine@gmail.com>
triage added to get eyes on this |
Totally. Also, I find it strange that IteratorEltype and IteratorSize are handled differently - either both should have their argument like |
I have changed How would we ensure this plays well with
Here's an implementation of I also wrote a curried version of
module Unfolds
using Base:
SizeUnknown, HasLength, HasShape, IsInfinite, EltypeUnknown, HasEltype,
@propagate_inbounds
import Base:
length, size, eltype, IteratorSize, IteratorEltype, iterate, isdone
const SizeTypes = Union{SizeUnknown, IsInfinite, <:Integer, <:NTuple{N, <:Integer} where {N}}
size_type_to_iteratorsize(T::Type{<:Union{SizeUnknown, IsInfinite}}) = T()
size_type_to_iteratorsize(::Type{<:Integer}) = HasLength()
size_type_to_iteratorsize(::Type{<:NTuple{N, <:Integer}}) where {N} = HasShape{N}()
"""
unfold(f, initialstate; [eltype], [size])
Iterable object that generates values from an initial state and a transition
function `f(state)`. The function must follow the same rules as `iterate`.
It returns either `(newvalue, newstate)` or `nothing`, in which case the
sequence ends.
The optional parameters `eltype` and `size` specify the element type and size of the iterator.
If `size` is specified it must be one of:
- an integer, representing the length of the iterator
- a tuple of integers, representing the `size` of the iterator (length will be defined as `prod(size)`)
- `Base.IsInfinite()`, meaning that the iterator is of infinite length
- `Base.SizeUnknown()`, if the iterator has an unknown length (this is the default).
See also: [`iterate`](@ref), [the iteration interface](@ref man-interface-iteration)
!!! compat "Julia 1.10"
This function was added in Julia 1.10.
# Examples
```jldoctest
julia> fib = Iterators.unfold((1,1)) do (a,b)
a, (b, a+b)
end;
julia> reduce(hcat, Iterators.take(fib, 7))
1×7 Matrix{Int64}:
1 1 2 3 5 8 13
julia> frac(c, z=0.0im) = Iterators.unfold((c, z); eltype=ComplexF64) do (c, z)
if real(z * z') < 4
z, (c, z^2 + c)
else
nothing
end
end;
julia> [count(Returns(true), frac(-0.835-0.2321im, (k+j*im)/6)) for j in -4:4, k in -8:8]
9×17 Matrix{Int64}:
2 2 2 3 3 3 5 41 8 4 3 3 2 2 2 2 1
2 3 5 4 5 8 20 11 17 23 4 3 3 3 2 2 2
4 10 17 12 7 56 18 58 33 22 6 5 4 5 4 3 2
26 56 15 13 18 23 13 14 27 46 8 9 16 12 8 4 3
10 7 62 17 16 23 11 12 39 12 11 23 16 17 62 7 10
3 4 8 12 16 9 8 46 27 14 13 23 18 13 15 56 26
2 3 4 5 4 5 6 22 33 58 18 56 7 12 17 10 4
2 2 2 3 3 3 4 23 17 11 20 8 5 4 5 3 2
1 2 2 2 2 3 3 4 8 41 5 3 3 3 2 2 2
```
# Extended help
The interface for `f` is very similar to the interface required by `iterate`, but `unfold` is simpler to use because it does not require you to define a type. You can use this to your advantage when prototyping or writing one-off iterators.
You may want to define an iterator type instead for readability or to dispatch on the type of your iterator.
`unfold` is related to a `while` loop because:
```julia
collect(unfold(f, initialstate))
```
is roughly the same as:
```julia
acc = []
state = initialstate
while true
x = f(state)
isnothing(x) && break
element, state = x
push!(acc, element)
end
```
But the `unfold` version may produce a more strictly typed vector and can be easily modified to return a lazy collection by removing `collect()`.
In Haskell and some other functional programming environments, this function is known as `unfoldr`.
"""
function unfold(f, initialstate; eltype=nothing, size::SizeTypes=SizeUnknown())
rest(Unfold(f, eltype), initialstate; size)
end
"""
unfold(f; [eltype], [size])
Create a function that will return an iterator unfolded by `f` when given an initial state. Equivalent to `initial -> unfold(f, initial; eltype, size)`.
# Example
```jldoctest
julia> const collatz_path = Iterators.unfold() do n
if isnothing(n)
n
elseif isone(n)
(n, nothing)
else
(n, iseven(n) ? n÷2 : 3n+1)
end
end
#1 (generic function with 1 method)
julia> collatz_path(3) |> collect
8-element Vector{Int64}:
3
10
5
16
8
4
2
1
```
"""
function unfold(f; eltype=nothing, size::SizeTypes=SizeUnknown())
initial -> unfold(f, initial; eltype, size)
end
struct Unfold{Eltype, FuncType}
f::FuncType
Unfold{E, F}(f::F) where {E, F} = new{E, F}(f)
Unfold(f::F, eltype) where {F} = new{eltype, F}(f)
end
Unfold(f) = Unfold(f, nothing)
eltype(::Type{<:Unfold{Eltype}}) where {Eltype} = Eltype
eltype(::Type{<:Unfold{nothing}}) = Any
IteratorEltype(::Type{<:Unfold{nothing}}) = EltypeUnknown()
IteratorEltype(::Type{<:Unfold}) = HasEltype()
IteratorSize(::Type{<:Unfold}) = SizeUnknown()
@propagate_inbounds iterate(it::Unfold, state) = it.f(state)
# Iterators.Rest, but it can know how big the iterator will be.
struct Rest{I,S,Z<:SizeTypes}
itr::I
st::S
size::Z
end
"""
rest(iter, state; [size])
An iterator that yields the same elements as `iter`, but starting at the given `state`.
If `size` is specified it must be one of:
- an integer, representing the length of the returned iterator
- a tuple of integers, representing the `size` of the returned iterator (length will be defined as `prod(size)`)
- `Base.IsInfinite()`, meaning that the returned iterator is of infinite length
- `Base.SizeUnknown()`, if the returned iterator has an unknown length
!!! compat "Julia 1.10"
The `size` parameter was added in Julia 1.10.
See also: [`Iterators.drop`](@ref), [`Iterators.peel`](@ref), [`Base.rest`](@ref).
# Examples
```jldoctest
julia> collect(Iterators.rest([1,2,3,4], 2))
3-element Vector{Int64}:
2
3
4
```
"""
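# Pass the IteratorSize trait so that the `::IsInfinite` method of `rest_iteratorsize` can apply.
rest(itr, state; size=rest_iteratorsize(IteratorSize(itr))) = Rest(itr, state, size)
rest(itr::Rest, state; size=rest_iteratorsize(IteratorSize(itr))) = Rest(itr.itr, state, size)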
rest(itr) = itr
@propagate_inbounds iterate(i::Rest, st=i.st) = iterate(i.itr, st)
isdone(i::Rest, st...) = isdone(i.itr, st...)
eltype(::Type{<:Rest{I}}) where {I} = eltype(I)
IteratorEltype(::Type{<:Rest{I}}) where {I} = IteratorEltype(I)
rest_iteratorsize(a) = SizeUnknown()
rest_iteratorsize(::IsInfinite) = IsInfinite()
IteratorSize(::Type{<:Rest{<:Any, <:Any, Z}}) where {Z} = size_type_to_iteratorsize(Z)
length(u::Rest{<:Any, <:Any, <:Integer}) = u.size
size(u::Rest{<:Any, <:Any, <:NTuple{N, <:Integer}}) where {N} = u.size
length(u::Rest{<:Any, <:Any, <:NTuple{N, <:Integer}}) where {N} = prod(u.size)
end
module UnfoldsTests
using ..Unfolds: Unfolds, unfold
using Test
@testset "unfold" begin
@testset "eltype" begin
@test eltype(unfold(x -> nothing, 1; eltype=String)) == String
function fib_int(x)
Iterators.take(unfold((1, 1); eltype=Int) do (a, b)
a, (b, a+b)
end, x)
end
@test eltype(fib_int(1000)) == Int
@test eltype(collect(fib_int(4))) == Int
@test collect(fib_int(4)) == [1, 1, 2, 3]
end
@testset "size" begin
bad_one_to(n, size) = Unfolds.unfold(x -> x > n ? nothing : (x, x+1), 1; size)
@test Base.IteratorSize(bad_one_to(10, 10)) == Base.HasLength()
@test Base.IteratorSize(bad_one_to(10, (10,))) == Base.HasShape{1}()
@test Base.IteratorSize(bad_one_to(10, Base.SizeUnknown())) == Base.SizeUnknown()
@test collect(bad_one_to(10, 10)) == 1:10
@test collect(bad_one_to(10, (10,))) == 1:10
@test collect(bad_one_to(10, Base.SizeUnknown())) == 1:10
infinite_itr = Unfolds.unfold(x -> (x, x), 1; size=Base.IsInfinite())
@test Base.IteratorSize(infinite_itr) == Base.IsInfinite()
# collect refuses to try and collect iterators of infinite size
@test_throws MethodError collect(infinite_itr)
shaped_itr1 = bad_one_to(9, (3, 3))
@test collect(shaped_itr1) == reshape(1:9, (3, 3))
end
@testset "size and eltype" begin
itr1 = Unfolds.unfold(x -> x > 9 ? nothing : (x, x+1), 1; eltype=Int, size=9)
@test collect(itr1) == 1:9
itr2 = Unfolds.unfold(x -> x > 9 ? nothing : (x, x+1), 1; eltype=Int, size=(3, 3))
@test collect(itr2) == reshape(1:9, (3, 3))
end
end
end
Todo: examples of specifying size and eltype; tests for curried unfold; performance review; maybe move the tests of size to be testing |
I'm not sure there's already an alternative for this available in Julia, so I came up with this proposal. Please let me know if there's another way available. I'm especially interested in having a way to use do-syntax to create an iterator.
The idea is that defining iterators using `Base.iterate` is a little bit inconvenient. Then I noticed that it's pretty convenient to create a `Channel`. But there's no equally easy way to create a simpler iterator.
Some languages allow you to create an iterator based on a closure that returns an optional type. The proposed class and its corresponding `iterate` method should allow you to do that. Using this, it's easy to write a function that defines a closure and returns an iterator based on it, as the example illustrates.
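A minimal sketch of the kind of usage described above, assuming a hypothetical wrapper type `ClosureIter` (the name and details are illustrative and are not the PR's actual code). It wraps a zero-argument closure that returns the next value, or `nothing` when the sequence ends, and compares it with the equivalent `Channel`:

```julia
# Hypothetical wrapper; illustrative only, not the code proposed in the PR.
struct ClosureIter{F}
    f::F   # zero-argument closure returning the next value, or nothing when done
end

Base.IteratorSize(::Type{<:ClosureIter}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{<:ClosureIter}) = Base.EltypeUnknown()

function Base.iterate(it::ClosureIter, _=nothing)
    value = it.f()
    return value === nothing ? nothing : (value, nothing)
end

# A function that defines a closure over local state and returns an iterator.
function squares_upto(n)
    i = Ref(0)
    return ClosureIter() do
        i[] >= n && return nothing
        i[] += 1
        return i[]^2
    end
end

# The same sequence written with a Channel, for comparison.
squares_channel(n) = Channel{Int}() do ch
    for i in 1:n
        put!(ch, i^2)
    end
end

from_closure = collect(squares_upto(4))     # [1, 4, 9, 16]
from_channel = collect(squares_channel(4))  # [1, 4, 9, 16]
```

Note that an iterator built this way is single-use: the state lives inside the closure, so iterating the same object a second time yields nothing.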