Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Slices array type for eachslice/eachrow/eachcol #32310

Merged
merged 22 commits into from
May 17, 2022
Merged

Conversation

simonbyrne
Copy link
Contributor

@simonbyrne simonbyrne commented Jun 12, 2019

eachslice, eachrow, eachcol (introduced in #29749) now return an SliceArray object (along with Rows/Columns aliases). The main benefit is that it will allow dispatch on the iterator to provide more efficient methods, e.g.

sum(A::Rows) = vec(sum(parent(A), dims=1))

This will encourage the use of eachcol/eachrow to resolve ambiguities in user-facing APIs, in particular, the "observations as rows vs columns" problem in the statistics/ML packages.

This also makes eachslice work over multiple dimensions.

TODOs:

  • Figure out how to handle eltype: I'd appreciate some help on how best to do this.

cc: @nalimilan @mschauer @ararslan @kleinschmidt @mbauman @andyferris @bramtayl @arnavs

@simonbyrne simonbyrne added the arrays [a, r, r, a, y, s] label Jun 12, 2019
@bramtayl
Copy link
Contributor

The way I handle eltype in JuliennedArrays is with typeof(@inbounds view(whole, first_slice_dimensions)...). @inbounds gives the advantage of allowing it to work with empty arrays. If the view function is unstable on whole, it should just error when a slice doesn't match the type of the first slice. I think the long term solution is to allow for eltype-less AbstractArrays by making AbstractArray just a collection of traits (2.0?).

@bramtayl
Copy link
Contributor

Also, instead of sum(A::EachRow) = vec(sum(parent(A), dims=1)), it probably makes more sense to dispatch to mapreducedims for any reducing dims function, so this would be helpful:

struct Reduce{Operator}
    operator::Operator
end

sum = Reduce(+)

@simonbyrne
Copy link
Contributor Author

Also, instead of sum(A::EachRow) = vec(sum(parent(A), dims=1)), it probably makes more sense to dispatch to mapreducedims for any reducing dims function

Yes, this was just meant to be an illustrative example.

base/abstractarraymath.jl Outdated Show resolved Hide resolved
@ararslan ararslan requested a review from mbauman June 13, 2019 00:38
@simonbyrne
Copy link
Contributor Author

Figuring out the appropriate SubArray type looks complicated. I'll fallback on the EltypeUnknown machinery for now (which matches the current Generator behaviour), perhaps figure out something better in future.

@bramtayl
Copy link
Contributor

bramtayl commented Jun 13, 2019

Honestly, though, maybe it makes sense just to copy-paste the code from JuliennedArrays and define

eachcol(A) = Slices(A, True(), False())
eachrow(A) = Slices(A, False(), True())

? I've seen an uptick in attention to it.

Maybe its at long last time to use a generated function to support numbered dims... I'll give it a shot.

@andyferris
Copy link
Member

Cool.

I think there are advantages in preserving the AbstractArray nature of the container - e.g. splitting a matrix into view which is a vector of vectors. (Doing the reverse is also super useful sometimes).

@bramtayl
Copy link
Contributor

bramtayl commented Jun 13, 2019

Well, just added the capability of processing dims::Int... to JuliennedArrays. Should be type stable under constant propagation.

@simonbyrne
Copy link
Contributor Author

simonbyrne commented Jun 13, 2019

Honestly, though, maybe it makes sense just to copy-paste the code from JuliennedArrays and define

eachcol(A) = Slices(A, True(), False())
eachrow(A) = Slices(A, False(), True())

? I've seen an uptick in attention to it.

I did consider it, but there were a few things that made me a bit uneasy: e.g. the use of @inbounds does open up potential problems if the type can't actually be inferred without creating an instance. It was also quite a bit more code for what I wanted to achieve: as I said in the PR, my immediate aim was simply to be able to dispatch on the return type of eachrow/eachcol and be able access the underlying array easily

I think there are advantages in preserving the AbstractArray nature of the container - e.g. splitting a matrix into view which is a vector of vectors. (Doing the reverse is also super useful sometimes).

I agree: the main challenge in doing that is figuring out the SubArray element type. That said, I think that can also be done later (it would technically involve changing the type hierarchy, but not in a terribly breaking way).

@simonbyrne
Copy link
Contributor Author

One thing that is perhaps worth noting about my PR is that the order of dims is important:

julia> M = reshape([(1:8)...], 2, 2, 2)
2×2×2 Array{Int64,3}:
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 5  7
 6  8

julia> collect(eachslice(M,dims=(2,3)) )
2×2 Array{Any,2}:
 [1, 2]  [5, 6]
 [3, 4]  [7, 8]

julia> collect(eachslice(M,dims=(3,2)) )
2×2 Array{Any,2}:
 [1, 2]  [3, 4]
 [5, 6]  [7, 8]

happy to hear people's thoughts on this.

@bramtayl
Copy link
Contributor

bramtayl commented Jun 13, 2019

the use of @inbounds does open up potential problems if the type can't actually be inferred without creating an instance

Can you give more detail? I'll need to figure out a way to fix JuliennedArrays if so.

@simonbyrne
Copy link
Contributor Author

Can you give more detail? I'll need to figure out a way to fix JuliennedArrays if so.

Well, you're using @inbounds when the object itself might not be "in bounds", so there is the potential for undefined behaviour. I know you're not actually accessing elements of the view, so it is probably okay, but I'm personally not 100% sure it's safe, so don't want to be the person adding that code to Julia.

@nickrobinson251
Copy link
Contributor

bump? :)

@simonbyrne
Copy link
Contributor Author

@mbauman any thoughts?

@bramtayl
Copy link
Contributor

Ref #32940

@mbauman
Copy link
Member

mbauman commented Sep 30, 2019

This looks great. My only hesitation is that we worked quite a bit on performance in #29749 — this looks like it should be good, but have you done any spot checks?

@@ -449,9 +480,10 @@ See also [`eachrow`](@ref), [`eachcol`](@ref), and [`selectdim`](@ref).
This function requires at least Julia 1.1.
"""
@inline function eachslice(A::AbstractArray; dims)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to have an intermediate step to allow overloading of dispatch on dims? Motivated by dims=:μ type things in NamedDims.

Suggested change
@inline function eachslice(A::AbstractArray; dims)
eachslice(A::AbstractArray; dims) = _eachslice(A::AbstractArray, dims)
@inline function _eachslice(A::AbstractArray, dims)

@MasonProtter
Copy link
Contributor

MasonProtter commented May 7, 2020

Is there any interest in reviving this?

base/abstractarraymath.jl Outdated Show resolved Hide resolved
@Tokazama
Copy link
Contributor

Proposal: I implement the interface in an unregistered SlicedArrayInterface.jl (should have this done in a few days), with a demo use case for StatsBase, Distributions and Bijectors. I think I can make a convincing demo for each of them. I'll invite everyone here and the ArrayInterface maintainers (@ChrisRackauckas, @Tokazama ?) for comments on that sliced array interface. And we'll ask the maintainers of packages like the ones mentioned if ArrayInterfaceCore.jl would be an acceptable dependency.

Once you have the concept in place I would gladly review it. I can see it being beneficial to have this be in a more general interface package, but I can't make any promises as to whether it will be integrated directly into the ArrayInterface repo until everyone has a chance to review the actual code.

@oschulz
Copy link
Contributor

oschulz commented May 20, 2022

Once you have the concept in place I would gladly review it [...] until everyone has a chance to review the actual code.

Thanks - I definitely want to include everyone interested in the design/discussion, so we end up with something with broad support in the Julia community.

@oschulz
Copy link
Contributor

oschulz commented May 20, 2022

@torfjelde and @ToucheSir, I hope such a common sliced array interface can also be very useful regarding https://github.com/TuringLang/Bijectors.jl/discussions/178 .

@RoyiAvital
Copy link

Maybe add to documentation an example to use the new multiple dimensions option which was added?

@oschulz
Copy link
Contributor

oschulz commented May 21, 2022

Once you have the concept in place I would gladly review it.

ArrayInterfaceCore is released, so instead of prototyping the sliced array interface as a standalong temporary package, I'll prototype it as an ArrayInterfaceCore PR - Ok @Tokazama?

@Tokazama
Copy link
Contributor

Tokazama commented May 21, 2022

That seems fine. We still need to set some standards for what goes in core vs other modules because right now there are some things that are left out because of load time issues and not because they don't fit conceptually there.

@oschulz
Copy link
Contributor

oschulz commented May 21, 2022

Thanks @Tokazama I'll give it a go and try to keep it extremely lightweight.

@Tokazama
Copy link
Contributor

Tokazama commented May 21, 2022

If we have to reconsider which part of the repo it goes in or rework it a bit that shouldn't be too big of a deal. There are a lot of moving pieces right now and I would hate for that complexity to discourage meaningful contributions.

@oschulz
Copy link
Contributor

oschulz commented May 31, 2022

@Tokazama I'll try to have a PR draft ready within the next days.

@oschulz
Copy link
Contributor

oschulz commented Jun 6, 2022

The current implementation of Slices doesn't check indices on single-element dimensions generated by drop = false:

julia> A = eachslice(rand(2,3), dims = 2, drop = false); axes(A)
(Base.OneTo(1), Base.OneTo(3))

julia> A[2,3]
2-element view(::Matrix{Float64}, :, 3) with eltype Float64:
 0.5964895828663271
 0.4755380804162148

Should we add @boundscheck checkbounds(...) to getindex(::Slices, ...)?

@mcabbott
Copy link
Contributor

mcabbott commented Jun 6, 2022

Relatedly, should the new axis be unitaxis(::AbstractArray) = Base.OneTo(1), or should it inherit from the parent the way reductions like sum do:

julia> using OffsetArrays

julia> oa = OffsetArray([1 2 3; 4 5 6], 7, 8)
2×3 OffsetArray(::Matrix{Int64}, 8:9, 9:11) with eltype Int64 with indices 8:9×9:11:
 1  2  3
 4  5  6

julia> sum(oa, dims=1)  # trivial axis is 8:8
1×3 OffsetArray(::Matrix{Int64}, 8:8, 9:11) with eltype Int64 with indices 8:8×9:11:
 5  7  9

julia> eachslice(oa, dims=2, drop=false)  # trivial axis is 1:1
1×3 eachslice(OffsetArray(::Matrix{Int64}, 8:9, 9:11), dims = 2, drop = false) of 2-element slices with eltype Int64 with indices Base.OneTo(1)×OffsetArrays.IdOffsetRange(values=9:11, indices=9:11):
 [1, 4]  [2, 5]  [3, 6]

@simonbyrne
Copy link
Contributor Author

Should we add @boundscheck checkbounds(...) to getindex(::Slices, ...)?

Ah, good catch! Can you open an issue (or a PR)?

@simonbyrne
Copy link
Contributor Author

Relatedly, should the new axis be unitaxis(::AbstractArray) = Base.OneTo(1), or should it inherit from the parent the way reductions like sum do:

It should probably inherit from the parent?

@oschulz
Copy link
Contributor

oschulz commented Jun 6, 2022

Relatedly, should the new axis be unitaxis(::AbstractArray) = Base.OneTo(1)
It should probably inherit from the parent?

I was wondering about that too - it would be more natural to inherit indexing from the parent, I think.

@oschulz
Copy link
Contributor

oschulz commented Jun 6, 2022

Another thing, while we're taking fixes and changes: I'm currently building the generic sliced array interface we've been talking about (CC @Tokazama).

The idea is for code to be able to flatten a sliced arrays in either it's "native" or a specific order of dimensions (e.g. inner dims first or outer dims first), so that efficient computations (e.g large linalg ops) can be performed on it, before (possibly) re-slicing the result.

On thing that I'm struggling with is that the current slices implementation is not fully type stable in respect to dimension order: We currently have

typeof(eachslice(A, dims = (1, 2))) != typeof(eachslice(A, dims = (1, 3)))

but

typeof(eachslice(A, dims = (1, 2))) == typeof(eachslice(A, dims = (2, 1)))

So it's impossible to decide whether a permutedims/adjoint is necessary based on the type of A::Slices, which means unslicing operations that need to unslice to a specified order of dimensions can't be type stable (the only workaround I see would be to make a copy afterwards in all cases, which would be wasteful). This is a problem because code that aims to take advantage of sliced arrays will often need to request a specific order of dimensions (e.g. to use linear algebra on the flat array).

While Julia v0.9 isn't branched yet - could we still change Slices and make the slicemap a Val or so? I think that would open up a lot more possibilities. I'd volunteer to do the PR if that's acceptable.

@Tokazama
Copy link
Contributor

Tokazama commented Jun 6, 2022

In the ArrayInterface code you're using you can use StaticInt to get around this sort of issue.

@oschulz
Copy link
Contributor

oschulz commented Jun 6, 2022

In the ArrayInterface code you're using you can use StaticInt to get around this sort of issue.

WIthin ArrayInterface, yes, but it won't be in the Slices object passed by the user, resp. some other code. I don't think a type-stable flatten-as-specified is possible unless the dimension order in Slices can be fully inferred. Dimension order will rarely be dynamically chosen during when Slices are created, so I think there's no reason not to just encode it in it's type.

@simonbyrne , this is your baby though, what do you think?

@oschulz
Copy link
Contributor

oschulz commented Jun 10, 2022

@simonbyrne: Can you open an issue (or a PR)?

Sure! @simonbyrne , would you be fine with me making the slicemap part of the Slices type in that PR as well (see above)?

@oschulz
Copy link
Contributor

oschulz commented Jun 28, 2022

making the slicemap part of the Slices type in that PR as well (see above)

#44538 would present another way to make the slicemap inferable (CC @Tokazama ).

@simonbyrne, do you have any preferences here?

@Tokazama
Copy link
Contributor

I'm assuming it would work to put the slicemap in the type just as well as it works to put the permutation in PermutedDimsArray. I do think it's odd that we keep building systems where constant propagation is the only way to get something like PermutedDimsArray(x, perm) to be type stable, when we know perm is a type parameter (not dynamic). We would think it's weird if the official way to construct an array was Array(T::DataType, sz::Tuple), but we keep making exceptions for indexing into dimensions.

@oschulz
Copy link
Contributor

oschulz commented Jun 29, 2022

we keep building systems where constant propagation is the only way to get something like PermutedDimsArray(x, perm) to be type stable

Indeed, especially since we want to be able to pass such things around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s]
Projects
None yet
Development

Successfully merging this pull request may close these issues.