Support indexing with generators #37648
I have a branch that begins work on adding indexing to the ProductIterator. I've been out of the loop for a while now, but if that's still not supported I think it should be before doing this.
Adding indexing to
And for reference, I also have an old PR (#22489) which adds indexing/setindexing to a bunch of iterators from
Should these be included in
Some benchmarks comparing indexing into generators versus having to collect the generator before indexing into it:
julia> g = (string(x) for x in [1 3 5; 2 4 6])
Base.Generator{Matrix{Int64}, typeof(string)}(string, [1 3 5; 2 4 6])
julia> @btime g[end]
67.197 ns (2 allocations: 96 bytes)
"6"
julia> @btime collect(g)[end]
286.318 ns (13 allocations: 704 bytes)
"6"
julia> @btime g[1:6]
292.162 ns (14 allocations: 832 bytes)
6-element Vector{String}:
"1"
"2"
"3"
"4"
"5"
"6"
julia> @btime collect(g)[1:6]
311.695 ns (14 allocations: 832 bytes)
6-element Vector{String}:
"1"
"2"
"3"
"4"
"5"
"6"
However, I did notice worse performance when handling colon:
julia> @btime g[:]
471.658 ns (16 allocations: 864 bytes)
6-element Vector{String}:
"1"
"2"
"3"
"4"
"5"
"6"
julia> @btime collect(g)[:]
307.751 ns (14 allocations: 832 bytes)
6-element Vector{String}:
"1"
"2"
"3"
"4"
"5"
"6"
@@ -52,7 +52,16 @@ size(g::Generator) = size(g.iter)
axes(g::Generator) = axes(g.iter)
ndims(g::Generator) = ndims(g.iter)

getindex(g::Generator, I...) = map(g.f, g.iter[I...])
function getindex(g::Generator, I...)
    I′ = to_indices(g.iter, I)
to_indices doesn't work on non-array indexables like NamedTuple and Dict. As I commented earlier, I don't think we should support vectorized indexing.
You can use LazyArrays.jl or MappedArrays.jl to get a full array API. If we were to add vectorized indexing for lazy mapping in Base, Broadcasted is probably a better interface to do it since it's already very array-like.
As I commented earlier, I don't think we should support vectorized indexing.
The tricky part of your statement is how would we limit getindex to just use scalar indexing?
Just define getindex(::Generator, ::Integer...) and nothing else. I also think this is the right thing to do.
how would we limit getindex to just use scalar indexing?
Ah, good point. How about introducing
function to_scalar_indices(A, I)
    J = to_indices(A, I)
    J isa Tuple{Vararg{Integer}} || error("expected scalar indices. got: ", I)
    return J
end
scalar_getindex(A::AbstractArray, I...) = getindex(A, to_scalar_indices(A, I)...)
scalar_getindex(A, I...) = getindex(A, I...)
and use it for Generator?
Just define getindex(::Generator, ::Integer...)
Isn't it too restrictive and a half-way solution if it were to restrict the API? For example, it'd mean x[1] for x = Iterators.map(string, (a=1, b=2)) works but x[:a] doesn't.
Also, I guess we need a different implementation for AbstractDict or error out? Applying f after getindex is not correct for AbstractDict.
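To make the proposal above concrete, here is a minimal sketch of how the two helpers could be wired up. The helpers are copied from the comment; generator_getindex is a made-up name used so the sketch does not extend Base.getindex, and none of this is claimed to be what the PR ended up doing.

```julia
# Helpers as proposed in the comment above.
function to_scalar_indices(A, I)
    J = to_indices(A, I)
    J isa Tuple{Vararg{Integer}} || error("expected scalar indices. got: ", I)
    return J
end
scalar_getindex(A::AbstractArray, I...) = getindex(A, to_scalar_indices(A, I)...)
scalar_getindex(A, I...) = getindex(A, I...)

# Hypothetical entry point (assumption, not part of Base): index the wrapped
# iterator with scalar indices only, then apply the generator's function once.
generator_getindex(g::Base.Generator, I...) = g.f(scalar_getindex(g.iter, I...))

g = (string(x) for x in [1 3 5; 2 4 6])
generator_getindex(g, 2, 3)   # "6" (Cartesian index)
generator_getindex(g, 6)      # "6" (linear index)

# Non-array indexables are passed straight to their own getindex,
# so symbol keys on a NamedTuple work.
x = Iterators.map(string, (a=1, b=2))
generator_getindex(x, :a)     # "1"

# A range is not a scalar index, so this call would raise an error:
# generator_getindex(g, 1:2)
```

Note that the fallback method still forwards to an AbstractDict's key lookup, so the concern about dictionaries raised above is not addressed by this sketch.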
Just define getindex(::Generator, ::Integer...)
Mostly that works but we get into some strange corner cases with AbstractDict:
julia> function Base.getindex(g::Base.Generator, I::Integer...)
           g.f(g.iter[I...])
       end
julia> g = (x * y for (x, y) in Dict(3 => 4))
Base.Generator{Dict{Int64,Int64},var"#1#2"}(var"#1#2"(), Dict(3 => 4))
julia> collect(g)[1]
12
julia> g[1]
ERROR: KeyError: key 1 not found
Stacktrace:
[1] getindex at ./dict.jl:467 [inlined]
[2] getindex(::Base.Generator{Dict{Int64,Int64},var"#1#2"}, ::Int64) at ./REPL[1]:2
[3] top-level scope at REPL[3]:1
This was the main reason I ended up restricting the logic to AbstractArray.
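One way to express that restriction is to dispatch on the generator's first type parameter, so that only generators wrapping an AbstractArray accept integer indices. The following is a sketch only; lazy_index is a hypothetical name chosen to avoid redefining Base.getindex, and the error type and message are assumptions.

```julia
# Hypothetical helper (not in Base): scalar indexing, and only when the
# generator wraps an AbstractArray.
lazy_index(g::Base.Generator{<:AbstractArray}, I::Integer...) = g.f(g.iter[I...])

# Everything else is rejected up front instead of being forwarded to the
# wrapped collection's own getindex (which for a Dict means key lookup).
lazy_index(g::Base.Generator, I...) =
    throw(ArgumentError("only scalar indexing into generators over arrays is supported"))

ga = (x^2 for x in [1, 2, 3])
lazy_index(ga, 3)    # 9

gd = (x * y for (x, y) in Dict(3 => 4))
# lazy_index(gd, 1) now throws an ArgumentError instead of the KeyError above.
```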
I guess we need a different implementation for AbstractDict or error out
That was also my conclusion. A first pass at the AbstractDict version looks like:
function Base.getindex(g::Base.Generator{<:AbstractDict}, I...)
    subset = collect(pairs(g.iter))[I...]
    if subset isa Pair
        g.f(subset)
    else
        map(g.f, subset)
    end
end
As in my #37648 (comment) to this whole PR, why do you want to support AbstractDict in general?
IMO Generator should behave like a lazy map. And for map we have:
julia> map(x->println(x), Dict(:a=>2, :c=>"h"))
ERROR: map is not defined on dictionaries
Stacktrace:
[1] error(::String) at .\error.jl:33
[2] map(::Function, ::Dict{Symbol,Any}) at .\abstractarray.jl:2190
[3] top-level scope at REPL[10]:1
So I'd definitely error out, ideally with the same error.
As pointed out supporting
Should this also define
Adding
See: #37648 (comment)
91f4da0 to bad925e
I found a reasonable fallback for "handling" index with iterators. As
julia> g = (string(p) for p in Dict(Pair.('a':'z', 1:26)));
julia> @btime g[2:5]
1.481 μs (64 allocations: 3.03 KiB)
4-element Vector{String}:
"'f' => 6"
"'w' => 23"
"'d' => 4"
"'e' => 5"
julia> @btime collect(g)[2:5]
8.895 μs (409 allocations: 15.81 KiB)
4-element Vector{String}:
"'f' => 6"
"'w' => 23"
"'d' => 4"
"'e' => 5"
julia> g = (p for p in Dict(Pair.('a':'z', 1:26)));
julia> @btime g[2:5]
187.209 ns (3 allocations: 784 bytes)
4-element Vector{Pair{Char, Int64}}:
'f' => 6
'w' => 23
'd' => 4
'e' => 5
julia> @btime collect(g)[2:5]
172.204 ns (2 allocations: 640 bytes)
4-element Vector{Pair{Char, Int64}}:
'f' => 6
'w' => 23
'd' => 4
'e' => 5
Suggestion: Change lines 61 and 71 to their lazy equivalents, i.e.
Reasoning 1
We shouldn't mix lazy and eager evaluation if the user already decided on lazy evaluation by supplying a generator in the first place.
Reasoning 2
There are two things that a) To me b) seems to be favourable. In that case we must not call
Example
Reasoning 3
Given that This wouldn't cost us any possibilities because anybody who wants could still apply a manual explicit
Overall question
I know, I'm reasking what already has been asked, but I'm not convinced by the arguments. Why not just stick to
Indexing into generators is something I found myself wanting as part of JuliaTime/TimeZones.jl#291. Basically, I had a function which contained code similar to:
In some cases I would need access to all of the elements but in the most performance critical case I only needed access to a single element for which the index was already known. Using an index-able generator was a good solution for this.
I realize that not all generators will be index-able but this allows a generator to use indexing if the underlying iterator used by the generator also supports indexing.
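The single-element case described above can be illustrated by counting how often the generator's function runs. This is only a sketch: calls and f are illustrative names, and the lazy lookup is spelled out as g.f(g.iter[end]) because it assumes a Julia session in which getindex is not defined for generators.

```julia
calls = Ref(0)                          # counts invocations of f
f(x) = (calls[] += 1; string(x))

A = [1 3 5; 2 4 6]
g = Base.Generator(f, A)                # equivalent to (f(x) for x in A)

# Lazy lookup: index the wrapped array first, then apply f to that one element.
calls[] = 0
last_lazy = g.f(g.iter[end])
lazy_calls = calls[]                    # f ran once

# Eager lookup: collect applies f to every element before indexing.
calls[] = 0
last_eager = collect(g)[end]
eager_calls = calls[]                   # f ran once per element of A

println((last_lazy, lazy_calls, last_eager, eager_calls))
```

Both lookups return the same element; the difference is how many times f is evaluated, which is where the gap in the g[end] benchmark earlier in the thread comes from.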