Support indexing with generators #37648

omus · 2020-09-18T16:25:00Z

Indexing into generators is something I found myself wanting as part of JuliaTime/TimeZones.jl#291. Basically I had function which contained code similar to:

x = [expensive_call(i) for i in range]

In some cases I would need access to all of the elements but in the most performance critical case I only needed access to a single element for which the index was already known. Using an index-able generator was a good solution for this.

I realize that not all generators will be index-able but this allows a generator to use indexing if the underlying iterator used by the generator also supports indexing.

mbauman · 2020-09-18T16:28:14Z

I have a branch that begins work on adding indexing to the ProductIterator — I've been out of the loop for a while now but if that's still not supported I think it should be before doing this.

omus · 2020-09-18T16:30:18Z

have a branch that begins work on adding indexing to the ProductIterator — I've been out of the loop for a while now but if that's still not supported I think it should be before doing this.

Adding indexing to ProductIterator would be great! Although having indexing with ProductIterator would definitely be useful here I don't think this PR requires that functionality as we can always add it later.

rfourquet · 2020-09-18T16:40:15Z

And for reference, I also have an old PR (#22489) which adds indexing/setindexing to a bunch of iterators from Iterators (but not finished for products ;-) ). but which failed to get traction. Maybe one reason was that it was trying to add also traits to classify indexing capabilities (I could easily remove the traits aspect and keep only indexing methods). Mentioning thishere so that it gets also considered when making a decision for this PR...

N5N3 · 2020-09-21T07:10:27Z

Should these be included in LazyArray.jl as new lazy types?

base/generator.jl

test/functional.jl

base/generator.jl

omus · 2020-09-28T15:04:01Z

Some benchmarks with comparing indexing into generators versus having to collect the generator to index into them:

julia> g = (string(x) for x in [1 3 5; 2 4 6])
Base.Generator{Matrix{Int64}, typeof(string)}(string, [1 3 5; 2 4 6])

julia> @btime g[end]
  67.197 ns (2 allocations: 96 bytes)
"6"

julia> @btime collect(g)[end]
  286.318 ns (13 allocations: 704 bytes)
"6"

julia> @btime g[1:6]
  292.162 ns (14 allocations: 832 bytes)
6-element Vector{String}:
 "1"
 "2"
 "3"
 "4"
 "5"
 "6"

julia> @btime collect(g)[1:6]
  311.695 ns (14 allocations: 832 bytes)
6-element Vector{String}:
 "1"
 "2"
 "3"
 "4"
 "5"
 "6"

However, I did notice worse performance when handling colon:

julia> @btime g[:]
  471.658 ns (16 allocations: 864 bytes)
6-element Vector{String}:
 "1"
 "2"
 "3"
 "4"
 "5"
 "6"

julia> @btime collect(g)[:]
  307.751 ns (14 allocations: 832 bytes)
6-element Vector{String}:
 "1"
 "2"
 "3"
 "4"
 "5"
 "6"

tkf · 2020-09-28T19:16:43Z

base/generator.jl

@@ -52,7 +52,16 @@ size(g::Generator) = size(g.iter)
 axes(g::Generator) = axes(g.iter)
 ndims(g::Generator) = ndims(g.iter)

-getindex(g::Generator, I...) = map(g.f, g.iter[I...])
+function getindex(g::Generator, I...)
+    I′ = to_indices(g.iter, I)


to_indices doesn't work on non-array indexables like NamedTuple and Dict. As I commented earlier, I don't think we should support vectorized indexing.

You can use LazyArrays.jl or MappedArrays.jl to get a full array API. If we were to add vectorized indexing for lazy mapping in Base, Broadcasted is probably a better interface to do it since it's already very array-like.

As I commented earlier, I don't think we should support vectorized indexing.

The tricky part of your statement is how would we limit getindex to just use scalar indexing?

Just define getindex(::Generator, ::Integer...) and nothing else. I also think this is the right thing to do.

how would we limit getindex to just use scalar indexing?

Ah, good point. How about introducing

function to_scalar_indices(A, I) J = to_indices(A, I) J isa Tuple{Vararg{Integer}} || error("expected scalar indices. got: ", I) return J end scalar_getindex(A::AbstractArray, I...) = getindex(A, to_scalar_indices(A, I)...) scalar_getindex(A, I...) = getindex(A, I...)

and use it for Generator?

Just define getindex(::Generator, ::Integer...)

Isn't it too restrictive and a half-way solution if it were to restrict the API? For example, it'd mean x[1] for x = Iterators.map(string, (a=1, b=2)) works but x[:a] doesn't.

Also, I guess we need a different implementation for AbstractDict or error out? Applying f after getindex is not correct for AbstractDict.

Just define getindex(::Generator, ::Integer...)

Mostly that works but we get into some strange corner cases with AbstractDict:

julia> function Base.getindex(g::Base.Generator, I::Integer...) g.f(g.iter[I...]) end julia> g = (x * y for (x, y) in Dict(3 => 4)) Base.Generator{Dict{Int64,Int64},var"#1#2"}(var"#1#2"(), Dict(3 => 4)) julia> collect(g)[1] 12 julia> g[1] ERROR: KeyError: key 1 not found Stacktrace: [1] getindex at ./dict.jl:467 [inlined] [2] getindex(::Base.Generator{Dict{Int64,Int64},var"#1#2"}, ::Int64) at ./REPL[1]:2 [3] top-level scope at REPL[3]:1

This was the main reason I ended up restricting the logic to AbstractArray

I guess we need a different implementation for AbstractDict or error out

That was also my conclusion. A first pass at the AbstractDict version looks like:

function Base.getindex(g::Base.Generator{<:AbstractDict}, I...) subset = collect(pairs(g.iter))[I...] if subset isa Pair g.f(subset) else map(g.f, subset) end end

As in my #37648 (comment) to this whole PR, why do you want to support AbstractDict in general?
IMO Generator should behave like a lazy map. And for map we have:

julia> map(x->println(x), Dict(:a=>2, :c=>"h")) ERROR: map is not defined on dictionaries Stacktrace: [1] error(::String) at .\error.jl:33 [2] map(::Function, ::Dict{Symbol,Any}) at .\abstractarray.jl:2190 [3] top-level scope at REPL[10]:1

So I'd definitively error out, optimally with the same error.

omus · 2020-09-28T20:43:27Z

As pointed out supporting getindex when the iterator is not array-like gets difficult. We could define a fallback of getindex(g::Generator, I...) = collect(g)[I...] which is functional but not very performant. Instead I've limited this PR to only support getindex(::Generator{<:AbstractArray}, I...) which should provide the functionality most people care about while ensuring that users don't use getindex on generators where they should be doing collect(g)[I...] themselves.

mcabbott · 2020-09-30T10:31:49Z

Should this also define keys? Today I wanted something like this to work:

 findall(c->c[1]>0, eachcol(randn(2,20)))

Adding Base.keys(g::Base.Generator) = keys(g.iter) would make it so, and might fit well with being able to index.

omus · 2020-10-09T19:53:26Z

Should this also define keys?

See: #37648 (comment)

omus · 2020-10-09T20:43:37Z

I found a reasonable fallback for "handling" index with iterators. As getindex and iteration don't have to return the same values I need index on the iteration results. There's still a good performance here for cases where f is an expensive call:

julia> g = (string(p) for p in Dict(Pair.('a':'z', 1:26)));

julia> @btime g[2:5]
  1.481 μs (64 allocations: 3.03 KiB)
4-element Vector{String}:
 "'f' => 6"
 "'w' => 23"
 "'d' => 4"
 "'e' => 5"

julia> @btime collect(g)[2:5]
  8.895 μs (409 allocations: 15.81 KiB)
4-element Vector{String}:
 "'f' => 6"
 "'w' => 23"
 "'d' => 4"
 "'e' => 5"

julia> g = (p for p in Dict(Pair.('a':'z', 1:26)));

julia> @btime g[2:5]
  187.209 ns (3 allocations: 784 bytes)
4-element Vector{Pair{Char, Int64}}:
 'f' => 6
 'w' => 23
 'd' => 4
 'e' => 5

julia> @btime collect(g)[2:5]
  172.204 ns (2 allocations: 640 bytes)
4-element Vector{Pair{Char, Int64}}:
 'f' => 6
 'w' => 23
 'd' => 4
 'e' => 5

rapus95 · 2020-12-17T14:42:31Z

Suggestion: Change lines 61 and 71 to their lazy equivalents, i.e. Generator(g.f, subset) instead of map(g.f, subset)

Reasoning 1

We shouldn't mix lazy and eager evaluation if the user already decided on lazy evaluation by supplying a generator in the first place.

Reasoning 2

There are two things that Generator(identity, iter)[:] could mean:

a) collect(iter)
b) iter

To me b) seems to be favourable. In that case we must not call map(g.f, iter).

Example

(1:8)[2:3] == 2:3 (not ==[2,3])
If we collect in those cases we would mix up lazy and eager evaluation and probably introduce some unexpected allocations.

Reasoning 3

Given that Generator is the lazy equivalent of map, I would expect to stay in the lazy operation mode as long as possible (i.e. without collecting).
And as such the result of indexing into a generator should also be the lazy, non-collected, equivalent of indexing into an allocated structure.
Ultimately the decision about collection should be made by indexing into the iterator (for which the Generator is just a lazy relay) and thus shouldn't be made by the relay type by explicitly collecting the iterator.

This wouldn't cost us any possibilities because anybody who wants could still apply a manual explicit collect.

Overall question

I know, I'm reasking what already has been asked, but I'm not convinced by the arguments.
Why do we want the lazy map variant to be more capable than the eager variant? I would consider that to be unexpected. Especially I wouldn't want the lazy variant to start collecting just to be more capable than the eager variant (which it wouldn't be anyway since it is itself eager then).

Why not just stick to
getindex(g::Generator, I...) = Generator(g.f, g.iter[I...])
to make it behave exactly like the eager variant and make it an effective pass-through call that respects indexing behaviour of the iterator that is mapped lazily. (i.e. Generator(f, iter)[I...] should be equal to Generator(f, iter[I...]), instead of map(f, iter[I...]))
It'd also align perfectly with #34678
That way repeated sub-indexing into a generator won't necessarily allocate at all and definitively won't apply f to unnecessary elements.

omus added feature Indicates new feature / enhancement requests collections Data structures holding multiple items, e.g. sets labels Sep 18, 2020

tkf reviewed Sep 22, 2020

View reviewed changes

base/generator.jl Outdated Show resolved Hide resolved

nalimilan reviewed Sep 22, 2020

View reviewed changes

test/functional.jl Show resolved Hide resolved

base/generator.jl Show resolved Hide resolved

nalimilan requested a review from mbauman September 28, 2020 16:38

tkf reviewed Sep 28, 2020

View reviewed changes

mcabbott mentioned this pull request Oct 4, 2020

missing == for Reverse #37867

Open

omus added 3 commits October 9, 2020 15:36

Support indexing with generators

0c7856b

Limit indexing to AbstractArray

f40839b

Support all iterators again

bad925e

omus force-pushed the cv/generator-indexing branch from 91f4da0 to bad925e Compare October 9, 2020 20:37

rapus95 mentioned this pull request Nov 10, 2020

keys(::Generator) pass through #34678

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support indexing with generators #37648

Support indexing with generators #37648

omus commented Sep 18, 2020 •

edited

Loading

mbauman commented Sep 18, 2020

omus commented Sep 18, 2020

rfourquet commented Sep 18, 2020

N5N3 commented Sep 21, 2020 •

edited

Loading

omus commented Sep 28, 2020

tkf Sep 28, 2020

omus Sep 28, 2020

mbauman Sep 28, 2020

tkf Sep 29, 2020

omus Sep 29, 2020

omus Sep 29, 2020

rapus95 Dec 17, 2020 •

edited

Loading

omus commented Sep 28, 2020

mcabbott commented Sep 30, 2020

omus commented Oct 9, 2020

omus commented Oct 9, 2020

rapus95 commented Dec 17, 2020 •

edited

Loading

Support indexing with generators #37648

Are you sure you want to change the base?

Support indexing with generators #37648

Conversation

omus commented Sep 18, 2020 • edited Loading

mbauman commented Sep 18, 2020

omus commented Sep 18, 2020

rfourquet commented Sep 18, 2020

N5N3 commented Sep 21, 2020 • edited Loading

omus commented Sep 28, 2020

tkf Sep 28, 2020

Choose a reason for hiding this comment

omus Sep 28, 2020

Choose a reason for hiding this comment

mbauman Sep 28, 2020

Choose a reason for hiding this comment

tkf Sep 29, 2020

Choose a reason for hiding this comment

omus Sep 29, 2020

Choose a reason for hiding this comment

omus Sep 29, 2020

Choose a reason for hiding this comment

rapus95 Dec 17, 2020 • edited Loading

Choose a reason for hiding this comment

omus commented Sep 28, 2020

mcabbott commented Sep 30, 2020

omus commented Oct 9, 2020

omus commented Oct 9, 2020

rapus95 commented Dec 17, 2020 • edited Loading

Reasoning 1

Reasoning 2

Example

Reasoning 3

Overall question

omus commented Sep 18, 2020 •

edited

Loading

N5N3 commented Sep 21, 2020 •

edited

Loading

rapus95 Dec 17, 2020 •

edited

Loading

rapus95 commented Dec 17, 2020 •

edited

Loading