
array reductions (sum, mean, etc.) and dropping dimensions #16606

Open
ak168421 opened this issue May 27, 2016 · 55 comments
Labels: arrays, breaking (This change will break code)

Comments

@ak168421

ak168421 commented May 27, 2016

Following this discussion on julia-users: https://groups.google.com/forum/#!topic/julia-users/V-meqUZtb7k

I find the fact that sum(A,1) and A[1,:] have different shapes in 0.5 counterintuitive. I am not aware of any other language in which the equivalent expressions have different shapes. It leads to unexpected results when computing e.g. A[1,:] ./ sum(A,1). At least in the case of sum, it seems like sum(A,1) should be equivalent to A[1,:] + A[2,:] + A[3,:] + ...

However, as @timholy pointed out, this behavior can be useful for broadcasting with expressions like A ./ sum(A,1).

For me, having the default slicing and array reduction behavior be consistent seems the natural choice, meaning that sum(A,1) should drop the first dimension. But I understand the attractiveness of not dropping dimensions for broadcasting. What are others' thoughts?
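To make the shape mismatch concrete, here is a minimal illustration, written with the modern sum(A; dims=1) keyword spelling in place of the old positional sum(A, 1):

```julia
A = [1 2 3;
     4 5 6]

size(A[1, :])         # (3,)   -- scalar indexing drops the first dimension
size(sum(A, dims=1))  # (1, 3) -- the reduction keeps it as a size-1 axis

# Broadcasting then treats the 3-vector as a column against the 1x3 row,
# so this yields a 3x3 matrix rather than the 3-vector one might expect:
size(A[1, :] ./ sum(A, dims=1))  # (3, 3)
```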

@BobPortmann
Contributor

I agree with you. Perhaps the default behavior could be to drop (for consistency with A[1,:]) and have a keyword argument on reduction functions to keep dims.

@StefanKarpinski
Member

Having a keyword argument affect the type (i.e. the number of dimensions) of the result would make the function type-unstable.
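The type-stability concern can be sketched as follows; mysum and its drop keyword are hypothetical, not Base APIs:

```julia
# Hypothetical reducer with a drop keyword: the number of dimensions of the
# result depends on the *runtime value* of drop, so inference has to return
# a Union of array types (unless the keyword happens to constant-propagate).
mysum(A; drop=false) = drop ? dropdims(sum(A, dims=1), dims=1) : sum(A, dims=1)

A = ones(2, 3)
typeof(mysum(A))             # Matrix{Float64}
typeof(mysum(A; drop=true))  # Vector{Float64}
```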

@andreasnoack
Member

The way to keep the dimensions is to use a range, i.e. A[1:1,:].

@ak168421
Author

I am aware that A[1:1,:] does not drop dimensions, but it seems clear that the "default" behavior now is dropping dimensions APL-style with A[1,:], and, intuitively, it seems that the "default" behavior of sum should be compatible.

@andreasnoack
Member

My comment was a reply to @BobPortmann's comment, not to the general issue.

@mschauer
Contributor

mschauer commented May 27, 2016

I think this was settled in #13612

Edit: "settled" is a bit too strong

@BobPortmann
Contributor

I am aware of how A[1:1,:] vs. A[1,:] works and think it is nice. My comment was about the reduction functions, which can be seen as inconsistent with slicing by a scalar. There was much discussion in #3262, but that was before scalar dimension dropping was introduced. I thought sentiment might have changed since then, but it appears not.

@timholy
Member

timholy commented Aug 29, 2016

#18271 is the strongest argument I've seen yet in favor of dropping dimensions from reductions. It's a pity we don't have an analog of A[1:1, :] to indicate that a dimension should be kept. What if we introduced NoDrop, and used mean(A, NoDrop(1)) and mean(A, NoDrop(1:2))?

Alternatively, we could switch the reduction functions to require a boolean tuple for dims, i.e., mean(A, (true,false)) and mean(A, (true,true)) rather than mean(A, 1) and mean(A, 1:2) (true = "reduce over the dimension in this slot"). Then we could also specify that a dimension be retained with mean(A, (true:true,false)) and mean(A, (true:true, true:true)), a direct analogy to the getindex syntax. I'm a little less excited about this variant, though.
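A rough sketch of the first variant; NoDrop and mymean are hypothetical names used only for illustration:

```julia
# Hypothetical marker type: wrap dims to say "reduce, but keep the axis".
struct NoDrop{T}
    dims::T
end

# With the marker, the reduced dimension stays as a size-1 axis:
mymean(A::AbstractArray, nd::NoDrop) =
    sum(A, dims=nd.dims) ./ prod(size(A, d) for d in nd.dims)

# With plain dims, it is dropped:
mymean(A::AbstractArray, dims) =
    dropdims(sum(A, dims=dims) ./ prod(size(A, d) for d in dims), dims=Tuple(dims))

A = ones(4, 3)
size(mymean(A, NoDrop(1)))  # (1, 3)
size(mymean(A, 1))          # (3,)
```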

@JeffBezanson
Member

This seems not-fully-decided, so moving to 1.0.

@tknopp
Contributor

tknopp commented Feb 3, 2017

+1 for making reductions and getindex consistent.


@antoine-levitt
Contributor

antoine-levitt commented May 20, 2017

FWIW, just got bitten by this: I was expecting something like sum(A.*x',2) to be the same type as A*x, and it wasn't.
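In the current dims-keyword spelling, that mismatch looks like this:

```julia
A = ones(2, 3)
x = ones(3)

A * x                 # 2-element Vector{Float64}
sum(A .* x', dims=2)  # 2x1 Matrix{Float64} -- same numbers, different type
```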

@mbauman
Member

mbauman commented May 30, 2017

Half-baked idea prompted by #22000 (comment): we could pass reduction functions through the indexing syntax in order to specify which dimensions get reduced. E.g., b[sum, :, sum] == sum(b, (1,3)). This would intuitively drop dimensions.

@mbauman added the arrays label May 30, 2017
@nalimilan
Member

That's an appealing approach, which reminds me of the X_{i+} notation for denoting column sums.

@StefanKarpinski
Member

Huh, that's a pretty cool syntax idea.

@rapus95
Contributor

rapus95 commented Jul 9, 2017

I guess it's a dumb question, but why don't we differentiate the behaviour the same way we do for array indexing? sum(A, 3) would drop the dimension while sum(A, 3:3) would not.

Edit: rethinking my idea showed me why: for indexing, the colon style is used per dimension, while for sum the colon style is used across dimensions.

Edit 2: #16606 (comment) indeed sounds very clever

@StefanKarpinski
Member

I have proposed elsewhere that squeeze(f, A, n) = squeeze(f(A, n), n), so that you can write squeeze(sum, A, 2) instead of squeeze(sum(A, 2), 2).
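In post-1.0 Julia, squeeze was renamed dropdims and takes a dims keyword, so the proposed composition could be sketched as follows (squeezereduce is a stand-in name for the proposed method, not an actual API):

```julia
# The proposed composition, spelled with the modern dropdims name:
squeezereduce(f, A, n) = dropdims(f(A, dims=n), dims=n)

A = reshape(collect(1:6), 2, 3)  # [1 3 5; 2 4 6]
squeezereduce(sum, A, 2)         # [9, 12] -- a Vector, not a 2x1 Matrix
```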

@StefanKarpinski added the needs decision label Aug 24, 2017
mbauman added a commit that referenced this issue Aug 29, 2017
This simple definition makes it easier to write reductions that drop the dimensions over which they reduce. Fixes #16606, addresses part of the root issue in #22000.
@StefanKarpinski
Member

Since we're unlikely to change this in 1.0 and adding squeeze is non-breaking, this is no longer release-blocking.

@StefanKarpinski removed this from the 1.0 milestone Oct 12, 2017
@StefanKarpinski
Member

StefanKarpinski commented Aug 13, 2019

The argument against the keyword form is not that keyword arguments are slow (back when that was true, dims wasn't a keyword either, so I don't think that was ever the problem); it's that you have to add that keyword to every single reducer everywhere. We can do this in Base, but any custom reducers out there will not support the same pattern. It's just weird to have to implement the same completely identical functionality over and over again when there's a completely composable solution that requires no code changes at all, just a new function.

Also, no one has addressed the type-stability argument: changing the return type based on a keyword argument is not generally a good idea.

@nickrobinson251
Contributor

Yeah, I think the arguments for dropdims(f, args...; dims, kwargs...) are pretty good. I think @oxinabox was just being greedy (which is good!), and pointing out that we can currently use a do-block to sum

sum(A; dims=1) do x
    x > 0 ? log(x) : x
end 

and in some ways it would be nice to be able to write (something like)

@dropdims sum(A; dims=1) do x
    x > 0 ? log(x) : x
end 

rather than e.g.

dropdims(sum, x -> x > 0 ? log(x) : x, A, dims=1)

or

g(x) = x > 0 ? log(x) : x
dropdims(sum, g, A, dims=1)

But all are an improvement over the current

dropdims(
    sum(X; dims=1) do x
        x > 0 ? log(x) : x
    end,
    dims=1
)
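The dropdims(sum, g, A, dims=1) method shown above does not exist in Base; a sketch of what it would do, under the hypothetical name reducedrop:

```julia
# Hypothetical composing method: apply the reducer f with the mapped
# function h, then drop the reduced dimensions.
reducedrop(f, h, A; dims) = dropdims(f(h, A; dims=dims); dims=dims)

h(x) = x > 0 ? log(x) : x
reducedrop(sum, h, [1.0 2.0; 3.0 4.0]; dims=1)  # 2-element Vector
```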

@tkf
Member

tkf commented Aug 13, 2019

In #32860 we are discussing possible syntaxes for reduce/foldl/foreach. One idea for supporting multidimensional reduce is to add a syntax (say)

a[., ., :]

which is lowered to

Axed{(1, 2)}(a)

where

struct Axed{dims, T}
    x::T
end

(probably it should be Axed <: AbstractArray)

This way,

dropdims(sum(a[., ., :]))

can implement a type-stable version of

dropdims(sum(a; dims=(1, 2)); dims=(1, 2))
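A minimal runnable sketch of the type-stability claim, restating the Axed struct from above (the sum and dropdims methods here are illustrative, not a proposed final API):

```julia
struct Axed{dims, T}
    x::T
end
Axed{dims}(x::T) where {dims, T} = Axed{dims, T}(x)

# Reduce over the marked dims, keeping the marker around the result:
Base.sum(a::Axed{dims}) where {dims} = Axed{dims}(sum(a.x; dims=dims))

# Drop exactly the marked dims; dims is a compile-time constant here,
# so the rank of the output is inferrable.
Base.dropdims(a::Axed{dims}) where {dims} = dropdims(a.x; dims=dims)

a = ones(2, 3, 4)
dropdims(sum(Axed{(1, 2)}(a)))  # 4-element Vector{Float64}
```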

@mcabbott
Contributor

The argument that adding drop=true to all methods would be a pain also applies to things like sum(A, squeeze=1) from #16606 (comment), unfortunately.

Here's a quick attempt at a macro which allows sum(A; dims=2) do ... end, to try it out:

using MacroTools

macro dropdims(ex) # doesn't handle reduce(...; dims=..., init=...) etc.
    MacroTools.postwalk(ex) do x
        if @capture(x, red_(args__, dims=d_)) || @capture(x, red_(args__; dims=d_)) 
            :( dropdims($x; dims=$d) )
        elseif @capture(x, dropdims(red_(args__, dims=d1_); dims=d2_) do z_ body_ end) || 
               @capture(x, dropdims(red_(args__; dims=d1_); dims=d2_) do z_ body_ end)
            :( dropdims($red($z -> $body, $(args...); dims=$d1); dims=$d2) )
        else
            x
        end
    end |> esc
end

@dropdims begin
    a = sum(ones(3,7), dims=2)
    b = sum(10 .* randn(2,10); dims=2) do x                      
        trunc(Int, x)
    end
    (a isa Vector, b isa Vector) # (true, true)
end

@tkf
Member

tkf commented Aug 17, 2019

Thinking about "dims syntax" #16606 (comment) a bit more, sum(a[., ., :]) should probably return a lazy object so that y .= sum(a[., ., :]) does not have to allocate anything. Note that copyto!(::Array{N}, ::Axed{dims, <:Array{M}}) where N === M - length(dims) can automatically drop dimensions.

@mcabbott
Contributor

If this a[., ., :] object is something like what eachslice makes, then I suspect reductions should always drop dimensions, like prod.(eachcol(x)) does now (although more efficiently, without breaking y .= prod.(eachcol(x))). Notice that prod(eachcol(x)) is an error.

@tkf
Member

tkf commented Aug 17, 2019

If this a[., ., :] object is something like what eachslice makes

I should have clarified that it is not the case. My model of Axed is just "an array paired with dims" and not an iterator-of-iterators. I'd even define simple forwarding methods like:

ndims(a::Axed) = ndims(a.x)
size(a::Axed) = size(a.x)
getindex(a::Axed, I...) = getindex(a.x, I...)

Axed acts as a type-stable way to mark dimensions that can be dropped when there is no ambiguity (i.e., N === M - length(dims)).

@mcabbott
Contributor

Ah, I didn't realise you were thinking of Axed as behaving just as a label, while I was thinking of it as being like Slices (views of the array). Both ways allow something like dropdims(sum(A, dims=(2,4)), dims=(2,4)) to be written more visually. Are you picturing also using it for other things, perhaps like this?

sum(p .* log.(p[:,.] ./ q))

Here log needs to act on the scalars of Axed, which under Slices you'd have to write more like sum((p .* log.(p ./ q))[:,.])

In the other thread I suggested that the Slices view might allow all sorts of generalised mapslices! operations to be done neatly:

@views b[:, .] .= f.(a[:, .])  # foreach((x,y) -> x .= f(y), eachcol(b), eachcol(a))
c[:, .] := f.(a[:, .])         # c = mapslices(f, a, dims=1)

@tkf
Member

tkf commented Aug 18, 2019

I implemented a function ReduceDims.along to demonstrate the idea. It's somewhat inspired by bramtayl's JuliennedArrays.jl and @mcabbott's comments. The syntax is something like

sum(along(x, &, &, :))

which is equivalent to sum(x; dims=(1, 2)). [1] Here is a demo

julia> using ReduceDims

julia> x = ones(2, 3, 4);

julia> y = sum(along(x, &, :, :))  # lazy sum(x; dims=1)
mapreduce(identity, add_sum, along(::Array{Float64,3}, (1,)))

julia> copy(y) isa Array{Float64, 3}
true

julia> dropdims(y) isa Array{Float64, 2}  # type stable
true

julia> zeros(1, 3, 4) .= y;  # non-allocating

julia> zeros(3, 4) .= y;  # automatically drop dims

With #31020, you can also fuse it with broadcasting:

julia> using LazyArrays: @~  # need LazyArrays until #19198 is resolved

julia> y = sum(along(@~(x .+ 1), :, &, :))
mapreduce(identity, add_sum, along(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{3},Tuple{Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64}},typeof(+),Tuple{Array{Float64,3},Int64}}, (2,)))

julia> zeros(1, 3, 4) .= y;

[1] There is also + to indicate "repeat previous marker" (like regexp):

along(x::Array{<:Any, 2}, &, +, :) === along(x, &, :)
along(x::Array{<:Any, 3}, &, +, :) === along(x, &, &, :)
along(x::Array{<:Any, 4}, &, +, :) === along(x, &, &, &, :)
along(x::Array{<:Any, 2}, :, +, &) === along(x, :, &)
along(x::Array{<:Any, 3}, :, +, &) === along(x, :, :, &)
along(x::Array{<:Any, 4}, :, +, &) === along(x, :, :, :, &)

@tkf
Member

tkf commented Aug 18, 2019

@mcabbott I think something like sum((p .* log.(p ./ q))[:,.]) should work with dims-as-label approach. In fact y .= sum(along(@~(p .* log.(p ./ q)), :, &)) should already work.

The reason behind the dims-as-label approach is that it also supports the non-dropping API. IIUC, your nested approach would drop dimensions unless you explicitly mark the non-dropping axes in the destination (as in @views b[:, .] .= f.(a[:, .]))?

But I agree that iterator-of-iterators approach is quite useful especially because it lets you use functions that are not aware of dims. I think it's useful to have both while sharing the "axes specifier" syntax. For example, it could be implemented using syntax like eachslice(x, &, &, :).

nickrobinson251 pushed a commit to invenia/julia that referenced this issue Aug 31, 2019
This simple definition makes it easier to write reductions that drop the dimensions over which they reduce. Fixes JuliaLang#16606, addresses part of the root issue in JuliaLang#22000.
@AzamatB
Contributor

AzamatB commented Nov 2, 2020

Can we do something about this? It would be great if we could replace the dropdims(sum(A; dims=1); dims=1) that I have all over my codebase with something less ugly.

@StefanKarpinski added the triage label Nov 30, 2020
@oscardssmith
Member

The better approach here is probably to use sum(eachslice(A,...)). In general, dims=... was probably a design mistake, and rather than working on supporting it more, we should work on making more composable methods to get the same functionality.
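For example (note the dims conventions differ between the two spellings: eachslice(A, dims=1) iterates rows, so mapping sum over it corresponds to sum(A, dims=2)):

```julia
A = reshape(collect(1:12), 3, 4)

map(sum, eachslice(A, dims=1))  # [22, 26, 30] -- a Vector, dims dropped
sum(A, dims=2)                  # 3x1 Matrix with the same values
```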

@dorn-gerhard

From a performance standpoint:

  1. sum(eachslice(A, ...)) seems to be slow compared to
  2. dropdims(sum(A, dims=3), dims=3), or
  3. a function (found on https://stackoverflow.com/questions/53076017/sum-elements-of-an-array-over-a-given-dimension) like:
function f(x::AbstractArray{<:Number, 3})
  nrows, ncols, n = size(x)
  result = zeros(eltype(x), nrows, ncols)
  for k in 1:n
    for j in 1:ncols
      for i in 1:nrows
        result[i,j] += x[i,j,k]
      end
    end
  end
  result
end

For A = rand(3,3,1000) I get using @time:

  1. 0.000104 seconds (1.00 k allocations: 156.219 KiB)
  2. 0.000025 seconds (15 allocations: 496 bytes)
  3. 0.000021 seconds (1 allocation: 160 bytes)

@DilumAluthge removed this from the 1.x milestone Mar 13, 2022
@brenhinkeller added the breaking label and removed the triage label Nov 16, 2022
@ParadaCarleton
Contributor

ParadaCarleton commented Dec 3, 2022

  1. 0.000104 seconds (1.00 k allocations: 156.219 KiB)
  2. 0.000025 seconds (15 allocations: 496 bytes)
  3. 0.000021 seconds (1 allocation: 160 bytes)

eachslice is currently slow because it's type-unstable. However, #32310 fixes this, so sum(eachslice(...)) should be just as fast as the other two methods starting with Julia 1.9.

2.0 should probably remove mapslices and the dims keyword arguments entirely, since they can be replaced by the more consistent map(f, eachslice(...)).

@mcabbott
Contributor

mcabbott commented Dec 3, 2022

sum(eachslice(...)) should be just as fast as the other two methods

Timing this:

julia> let A = rand(1000, 1000);
         B = @btime sum($A; dims=2)
         C = @btime sum(eachcol($A))  # adds whole slices, won't work for prod
         D = @btime map(sum, eachslice($A; dims=1, drop=false))
         B ≈ reshape(C,:,1) ≈ D
       end
  min 256.542 μs, mean 264.496 μs (5 allocations, 8.02 KiB)
  min 681.833 μs, mean 13.722 ms (1000 allocations, 7.74 MiB)
  min 1.453 ms, mean 1.464 ms (1 allocation, 7.94 KiB)
true

julia> VERSION
v"1.10.0-DEV.100"

@ParadaCarleton
Contributor

sum(eachslice(...)) should be just as fast as the other two methods

Timing this:

julia> let A = rand(1000, 1000);
         B = @btime sum($A; dims=2)
         C = @btime sum(eachcol($A))  # adds whole slices, won't work for prod
         D = @btime map(sum, eachslice($A; dims=1, drop=false))
         B ≈ reshape(C,:,1) ≈ D
       end
  min 256.542 μs, mean 264.496 μs (5 allocations, 8.02 KiB)
  min 681.833 μs, mean 13.722 ms (1000 allocations, 7.74 MiB)
  min 1.453 ms, mean 1.464 ms (1 allocation, 7.94 KiB)
true

julia> VERSION
v"1.10.0-DEV.100"

Hmm, that's very weird. The relative performance seems worse now that the type instability is fixed, despite using way fewer allocations?

@mcabbott
Contributor

mcabbott commented Dec 4, 2022

(Why the quote?)

You'll see different timings on different sizes & dimensions. One issue my example highlights is that some ways of slicing are very cache-unfriendly.

You could overload, say, map(::typeof(sum), ::Slices) to magically use the fast path. One problem is this change of types:

julia> sum(CUDA.rand(10, 10), dims=1)
1×10 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 5.94849  5.52325  5.60857  5.23081  5.4453  3.76833  6.61417  2.87759  5.43322  4.84982

julia> map(sum, eachslice(rand(Float32, 10, 10), dims=2, drop=false))
1×10 Matrix{Float32}:
 5.65775  5.5501  5.40247  3.77431  4.48232  5.90293  6.42368  5.52671  5.20836  5.30931

Another is how this ought to be written:

julia> sum(ones(2, 7); dims=2, init=0.0+3im)
2×1 Matrix{ComplexF64}:
 7.0 + 3.0im
 7.0 + 3.0im

There is an issue a bit like this with the magic fast path for reduce(vcat, xs), where supplying init breaks the spell.

@ParadaCarleton
Contributor

ParadaCarleton commented Dec 4, 2022

We could make sum in particular go fast by overloading, but we want mapping slices to be fast regardless of the function.

Is this a problem with Slices having overhead that causes poor performance? Does map need to be specialized to work more efficiently with Slices? Or is it just that sum does some extra-special magic to go faster that wouldn't work for other functions? (Maybe that's just because it's a reducing function, in which case we should probably add something to the performance tips explaining that reduce(x; dims=...) is faster than mapping over slices.) I'll have to benchmark more details later today.
