Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliding SubArray construction (getting greedy) #10899

Closed
timholy opened this issue Apr 19, 2015 · 9 comments
Closed

Eliding SubArray construction (getting greedy) #10899

timholy opened this issue Apr 19, 2015 · 9 comments
Labels
performance Must go faster

Comments

@timholy
Copy link
Member

timholy commented Apr 19, 2015

With the tuple overhaul merged, I wonder if we're getting closer to being able to hope that SubArray construction might someday become a no-op in certain circumstances. Here's my "elide-friendly" test function:

function sumcols(A)
    s = 0.0
    for j = 1:size(A,2)
        Aj = slice(A, :, j)
        for i = 1:size(A,1)  # length(Aj)
            @inbounds s += A[i,j]  # Aj[i]
        end
    end
    s
end

Note that this is written in a way that emphasizes that escape-analysis is not needed to make this non-allocating, because Aj is not actually used.

I also took the liberty of locally sprinkling @_inline_meta()s (and equivalent) in all relevant places and rebuilding julia.

For a Matrix{Float64} A, I get this from @time:

julia> @time sumcols(A)
elapsed time: 0.000669812 seconds (859 kB allocated)
24920.401360340726

and this from @code_typed sumcols(A):

1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:A], Any[Any[:s,symbol("#s2"),:j,:Aj,symbol("#s3"),:i,symbol("######f#8031#8033#8035"),symbol("######s#8032#8034#8036")],Any[Any[:A,Array{Float64,2},0],Any[:s,Float64,2],Any[symbol("#s2"),Int64,2],Any[:j,Int64,18],Any[:Aj,SubArray{Float64,1,Array{Float64,2},Tuple{Colon,Int64},2},18],Any[symbol("#s3"),Int64,2],Any[:i,Int64,18],Any[symbol("######f#8031#8033#8035"),Int64,2],Any[symbol("######s#8032#8034#8036"),Int64,2]],Any[],Any[UnitRange{Int64},Tuple{Int64,Int64},UnitRange{Int64},Tuple{Int64,Int64},Int64,Colon,Int64,Int64,Tuple{Int64},Int64,Int64,Int64,Int64,Int64,Int64]], :(begin  # none, line 2:
        s = 0.0 # line 3:
        GenSym(4) = (top(arraysize))(A::Array{Float64,2},2)::Int64
        GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, :(((top(getfield))(Intrinsics,:select_value))((top(sle_int))(1,GenSym(4))::Bool,GenSym(4),(top(box))(Int64,(top(sub_int))(1,1)))::Int64)))
        #s2 = (top(getfield))(GenSym(0),:start)::Int64
        unless (top(box))(Bool,(top(not_int))(#s2::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) goto 1
        2: 
        GenSym(11) = #s2::Int64
        GenSym(12) = (top(box))(Int64,(top(add_int))(#s2::Int64,1))
        j = GenSym(11)
        #s2 = GenSym(12) # line 4:
        GenSym(5) = (:)
        (top(arraysize))(A::Array{Float64,2},1)::Int64
        nothing
        checkbounds((top(arraysize))(A::Array{Float64,2},2)::Int64,j::Int64)::Bool
        ######f#8031#8033#8035 = 1
        ######s#8032#8034#8036 = 1
        ######f#8031#8033#8035 = (top(box))(Int64,(top(add_int))(######f#8031#8033#8035::Int64,(top(box))(Int64,(top(mul_int))((top(box))(Int64,(top(sub_int))(1,1)),######s#8032#8034#8036::Int64))))
        GenSym(6) = (top(arraysize))(A::Array{Float64,2},1)::Int64
        ######s#8032#8034#8036 = (top(box))(Int64,(top(mul_int))(######s#8032#8034#8036::Int64,GenSym(6)))
        ######f#8031#8033#8035 = (top(box))(Int64,(top(add_int))(######f#8031#8033#8035::Int64,(top(box))(Int64,(top(mul_int))((top(box))(Int64,(top(sub_int))(j::Int64,1)),######s#8032#8034#8036::Int64))))
        GenSym(7) = (top(arraysize))(A::Array{Float64,2},2)::Int64
        ######s#8032#8034#8036 = (top(box))(Int64,(top(mul_int))(######s#8032#8034#8036::Int64,GenSym(7)))
        GenSym(9) = (top(arraysize))(A::Array{Float64,2},1)::Int64
        Aj = $(Expr(:new, SubArray{Float64,1,Array{Float64,2},Tuple{Colon,Int64},2}, :(A::Array{Float64,2}), :(tuple(GenSym(5),j::Int64)::Tuple{Colon,Int64}), :(tuple(GenSym(9))::Tuple{Int64}), :(######f#8031#8033#8035::Int64), :((top(box))(Int64,(top(mul_int))(1,1))))) # line 5:
        GenSym(10) = (top(arraysize))(A::Array{Float64,2},1)::Int64
        GenSym(2) = $(Expr(:new, UnitRange{Int64}, 1, :(((top(getfield))(Intrinsics,:select_value))((top(sle_int))(1,GenSym(10))::Bool,GenSym(10),(top(box))(Int64,(top(sub_int))(1,1)))::Int64)))
        #s3 = (top(getfield))(GenSym(2),:start)::Int64
        unless (top(box))(Bool,(top(not_int))(#s3::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(GenSym(2),:stop)::Int64,1))::Bool)) goto 5
        6: 
        GenSym(13) = #s3::Int64
        GenSym(14) = (top(box))(Int64,(top(add_int))(#s3::Int64,1))
        i = GenSym(13)
        #s3 = GenSym(14) # line 6:
        s = (top(box))(Float64,(top(add_float))(s::Float64,(top(arrayref))((top(getfield))(Aj::SubArray{Float64,1,Array{Float64,2},Tuple{Colon,Int64},2},:parent)::Array{Float64,2},(top(box))(Int64,(top(sub_int))((top(box))(Int64,(top(add_int))((top(getfield))(Aj::SubArray{Float64,1,Array{Float64,2},Tuple{Colon,Int64},2},:first_index)::Int64,i::Int64)),1)))::Float64))
        7: 
        unless (top(box))(Bool,(top(not_int))((top(box))(Bool,(top(not_int))(#s3::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(GenSym(2),:stop)::Int64,1))::Bool)))) goto 6
        5: 
        4: 
        3: 
        unless (top(box))(Bool,(top(not_int))((top(box))(Bool,(top(not_int))(#s2::Int64 === (top(box))(Int64,(top(add_int))((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)))) goto 2
        1: 
        0:  # line 9:
        return s::Float64
    end::Float64))))

From what I can tell, this demonstrates that everything was indeed inlined (a search for call turns up empty).

The corresponding LLVM IR has several alloca/allocobj calls in it, which I presume is the source of the allocation.

Users-list discussion: https://groups.google.com/d/msg/julia-users/sq5gj-3TdQU/NB0bjmqWjZ0J

@sebastiang
Copy link

This code as written seems more like just a test of dead code elimination, no? Each of the slices you allocate goes unused, but something about the annotations or the optimization passes fail to figure that out.

@carnaval
Copy link
Contributor

If a reference is stored on the gc frame LLVM probably can't help you. If it is not then I think there is some flag we can turn on for the various gc allocation functions which tells llvm they have essentially malloc semantics : you can remove if unused and assume different calls can"t alias. Can't be bad to add this flag anyway.

@timholy
Copy link
Member Author

timholy commented Apr 19, 2015

Yes, it could be handled by dead-code elimination, but I was taking advantage of the fact that it isn't 😄 to prove a point.

@JeffBezanson
Copy link
Member

See also #8134

@sebastiang
Copy link

#8134 would probably have to look pretty deep to help typical SubArray construction, but it would be really nice. A lot of these constructions are going to inline 4 or 5 functions deep and then turn into a one or two instruction operation on the data structure in question. If that structure were on the stack, and it were allocated 'close' enough to its use, it could even be possible that LLVM would then drop the parts of the structure that weren't even used. But I don't see much work recently on #8134, and I guess like so many optimizations, it's probably quite sensitive to where it gets executed, i.e. earlier or later in the pipeline.

@andreasnoack
Copy link
Member

Didn't #15259 fix this? I don't see the allocations anymore on 0.6.

@yuyichao
Copy link
Contributor

Yes, it is fixed, as mentioned in #23240, which is why this was not a dup of #16190.

The issue dup with this is #12415 and that's fixed by #16021 (the last round of #23240).
The "more greedy" version of this mentioned in #23240 could be dup of #16190, depending on how greedy you are.

Currently, on #23240.

julia> function sumcols(A)
           s = 0.0
           for j = 1:size(A,2)
               Aj = @view A[:, j]
               for i = 1:length(Aj)
                   @inbounds s += Aj[i]
               end
           end
           s
       end
sumcols (generic function with 1 method)

julia> @code_warntype sumcols([1.0 2.0; 3.0 4.0])
Variables:
  #self#::#sumcols
  A::Array{Float64,2}
  s::Float64
  #temp#@_4::Int64
  j::Int64
  Aj::Any
  #temp#@_7::Int64
  i::Int64
  #temp#@_9::Any
  J::Tuple{Base.Slice{Base.OneTo{Int64}},Int64}
  stride1::Int64
  #temp#@_12::Base.OneTo{Int64}
  IP::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}
  Δi@_14::Int64
  Δi@_15::Int64
  #104::Base.##104#105
  r::Float64

Body:
  begin
      s::Float64 = 0.0
      #= line 3 =#
      $(Expr(:inbounds, false))
      # meta: location array.jl size 94
      SSAValue(4) = (Base.arraysize)(A::Array{Float64,2}, 2)::Int64
      # meta: pop location
      $(Expr(:inbounds, :pop))
      $(Expr(:inbounds, false))
      # meta: location range.jl colon 5
      # meta: location range.jl Type 150
      # meta: location range.jl unitrange_last 155
      # meta: location operators.jl >= 270
      # meta: location int.jl <= 376
      SSAValue(8) = (Base.sle_int)(1, SSAValue(4))::Bool
      # meta: pop location
      # meta: pop location
      # meta: location int.jl - 42
      SSAValue(6) = (Base.sub_int)(1, 1)::Int64
      # meta: pop location
      # meta: location operators.jl ifelse 292
      SSAValue(9) = (Base.select_value)(SSAValue(8), SSAValue(4), SSAValue(6))::Int64
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      $(Expr(:inbounds, :pop))
      #temp#@_4::Int64 = 1
      28:
      $(Expr(:inbounds, false))
      # meta: location range.jl done 465
      # meta: location int.jl + 43
      SSAValue(15) = (Base.add_int)(SSAValue(9), 1)::Int64
      # meta: pop location
      # meta: location promotion.jl == 351
      SSAValue(16) = (#temp#@_4::Int64 === SSAValue(15))::Bool
      # meta: pop location
      # meta: pop location
      $(Expr(:inbounds, :pop))
      $(Expr(:inbounds, false))
      # meta: location bool.jl ! 38
      SSAValue(18) = (Base.not_int)(SSAValue(16))::Bool
      # meta: pop location
      $(Expr(:inbounds, :pop))
      unless SSAValue(18) goto 336
      $(Expr(:inbounds, false))
      # meta: location range.jl next 464
      # meta: location int.jl + 43
      SSAValue(19) = (Base.add_int)(#temp#@_4::Int64, 1)::Int64
      # meta: pop location
      SSAValue(242) = #temp#@_4::Int64
      # meta: pop location
      $(Expr(:inbounds, :pop))
      #temp#@_4::Int64 = SSAValue(19)
      #= line 4 =#
      $(Expr(:inbounds, false))
      # meta: location subarray.jl view 112
      # meta: location indices.jl to_indices 213
      # meta: location abstractarray.jl indices 69
      # meta: location array.jl size 96
      SSAValue(22) = (Base.arraysize)(A::Array{Float64,2}, 1)::Int64
      SSAValue(23) = (Base.arraysize)(A::Array{Float64,2}, 2)::Int64
      # meta: pop location
      # meta: location tuple.jl map 149
      # meta: location range.jl Type 177
      # meta: location range.jl Type 175
      # meta: location promotion.jl max 362
      # meta: location int.jl < 39
      SSAValue(30) = (Base.slt_int)(SSAValue(22), 0)::Bool
      # meta: pop location
      SSAValue(31) = (Base.select_value)(SSAValue(30), 0, SSAValue(22))::Int64
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: location multidimensional.jl to_indices 489
      # meta: location multidimensional.jl uncolon 499
      # meta: location indices.jl Type 233
      # meta: location indices.jl Type 233
      # meta: location range.jl convert 780
      # meta: location range.jl Type 175
      # meta: location promotion.jl max 362
      # meta: location int.jl < 39
      SSAValue(42) = (Base.slt_int)(SSAValue(31), 0)::Bool
      # meta: pop location
      SSAValue(43) = (Base.select_value)(SSAValue(42), 0, SSAValue(31))::Int64
      # meta: pop location
      SSAValue(45) = $(Expr(:new, Base.OneTo{Int64}, SSAValue(43)))
      # meta: pop location
      # meta: pop location
      SSAValue(48) = $(Expr(:new, Base.Slice{Base.OneTo{Int64}}, SSAValue(45)))
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      #= line 113 =#
      # meta: location abstractarray.jl checkbounds 360
      # meta: location abstractarray.jl checkbounds 340
      # meta: location abstractarray.jl indices 69
      # meta: location array.jl size 96
      (Base.arraysize)(A::Array{Float64,2}, 1)::Int64
      SSAValue(54) = (Base.arraysize)(A::Array{Float64,2}, 2)::Int64
      # meta: pop location
      # meta: location tuple.jl map 149
      # meta: location range.jl Type 177
      # meta: location range.jl Type 175
      # meta: location promotion.jl max 362
      # meta: location int.jl < 39
      SSAValue(56) = (Base.slt_int)(SSAValue(54), 0)::Bool
      # meta: pop location
      SSAValue(57) = (Base.select_value)(SSAValue(56), 0, SSAValue(54))::Int64
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: location abstractarray.jl checkbounds_indices 387
      # meta: location abstractarray.jl checkbounds_indices 400
      # meta: location abstractarray.jl checkindex 453
      # meta: location int.jl <= 376
      SSAValue(74) = (Base.sle_int)(1, SSAValue(242))::Bool
      # meta: pop location
      # meta: location int.jl <= 376
      SSAValue(73) = (Base.sle_int)(SSAValue(242), SSAValue(57))::Bool
      # meta: pop location
      # meta: location bool.jl & 42
      SSAValue(75) = (Base.and_int)(SSAValue(74), SSAValue(73))::Bool
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: location bool.jl & 42
      SSAValue(78) = (Base.and_int)(true, SSAValue(75))::Bool
      # meta: pop location
      # meta: pop location
      # meta: pop location
      unless SSAValue(78) goto 139
      goto 142
      139:
      SSAValue(177) = (Core.tuple)(SSAValue(48), SSAValue(242))::Tuple{Base.Slice{Base.OneTo{Int64}},Int64}
      $(Expr(:invoke, MethodInstance for throw_boundserror(::Array{Float64,2}, ::Tuple{Base.Slice{Base.OneTo{Int64}},Int64}), :(Base.throw_boundserror), :(A), SSAValue(177)))::Union{}
      142:
      # meta: pop location
      #= line 114 =#
      # meta: location subarray.jl unsafe_view 119
      # meta: location subarray.jl Type 22
      # meta: location subarray.jl Type 31
      # meta: location subarray.jl compute_stride1 241
      # meta: location abstractarray.jl indices 69
      # meta: location array.jl size 96
      (Base.arraysize)(A::Array{Float64,2}, 1)::Int64
      (Base.arraysize)(A::Array{Float64,2}, 2)::Int64
      # meta: pop location
      # meta: pop location
      # meta: pop location
      #= line 32 =#
      # meta: location subarray.jl compute_offset1 267
      # meta: location subarray.jl compute_offset1 269
      # meta: location subarray.jl compute_linindex 276
      # meta: location abstractarray.jl indices 69
      # meta: location array.jl size 96
      SSAValue(138) = (Base.arraysize)(A::Array{Float64,2}, 1)::Int64
      (Base.arraysize)(A::Array{Float64,2}, 2)::Int64
      # meta: pop location
      # meta: location tuple.jl map 149
      # meta: location range.jl Type 177
      # meta: location range.jl Type 175
      # meta: location promotion.jl max 362
      # meta: location int.jl < 39
      SSAValue(146) = (Base.slt_int)(SSAValue(138), 0)::Bool
      # meta: pop location
      SSAValue(147) = (Base.select_value)(SSAValue(146), 0, SSAValue(138))::Int64
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      #= line 277 =#
      # meta: location subarray.jl compute_linindex 286
      # meta: location int.jl - 42
      SSAValue(153) = (Base.sub_int)(1, 1)::Int64
      # meta: pop location
      #= line 287 =#
      # meta: location int.jl * 44
      SSAValue(161) = (Base.mul_int)(SSAValue(153), 1)::Int64
      # meta: pop location
      # meta: location int.jl + 43
      SSAValue(162) = (Base.add_int)(1, SSAValue(161))::Int64
      # meta: pop location
      # meta: location int.jl * 44
      SSAValue(160) = (Base.mul_int)(1, SSAValue(147))::Int64
      # meta: pop location
      # meta: location subarray.jl compute_linindex 281
      # meta: location int.jl - 42
      SSAValue(165) = (Base.sub_int)(SSAValue(242), 1)::Int64
      # meta: pop location
      #= line 282 =#
      # meta: location int.jl * 44
      SSAValue(169) = (Base.mul_int)(SSAValue(165), SSAValue(160))::Int64
      # meta: pop location
      # meta: location int.jl + 43
      SSAValue(170) = (Base.add_int)(SSAValue(162), SSAValue(169))::Int64
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: location abstractarray.jl indices 51
      # meta: location int.jl <= 376
      SSAValue(112) = (Base.sle_int)(1, 2)::Bool
      # meta: pop location
      unless SSAValue(112) goto 219
      # meta: location abstractarray.jl indices 69
      # meta: location array.jl size 96
      (Base.arraysize)(A::Array{Float64,2}, 1)::Int64
      (Base.arraysize)(A::Array{Float64,2}, 2)::Int64
      # meta: pop location
      # meta: pop location
      goto 220
      219:
      220:
      # meta: pop location
      # meta: location int.jl * 44
      SSAValue(132) = (Base.mul_int)(1, 1)::Int64
      # meta: pop location
      # meta: location int.jl - 42
      SSAValue(171) = (Base.sub_int)(SSAValue(170), SSAValue(132))::Int64
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      $(Expr(:inbounds, :pop))
      goto 236
      236:
      #= line 5 =#
      $(Expr(:inbounds, false))
      # meta: location abstractarray.jl length 126
      # meta: location subarray.jl size 56
      # meta: location subarray.jl indices 319
      # meta: location subarray.jl _indices_sub 324
      # meta: location indices.jl unsafe_indices 236
      SSAValue(189) = (Core.getfield)(SSAValue(48), :indices)::Base.OneTo{Int64}
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: location tuple.jl map 148
      # meta: location subarray.jl #104 56
      # meta: location range.jl unsafe_length 384
      SSAValue(195) = (Core.getfield)(SSAValue(189), :stop)::Int64
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      $(Expr(:inbounds, :pop))
      $(Expr(:inbounds, false))
      # meta: location range.jl colon 5
      # meta: location range.jl Type 150
      # meta: location range.jl unitrange_last 155
      # meta: location operators.jl >= 270
      # meta: location int.jl <= 376
      SSAValue(201) = (Base.sle_int)(1, SSAValue(195))::Bool
      # meta: pop location
      # meta: pop location
      # meta: location int.jl - 42
      SSAValue(199) = (Base.sub_int)(1, 1)::Int64
      # meta: pop location
      # meta: location operators.jl ifelse 292
      SSAValue(202) = (Base.select_value)(SSAValue(201), SSAValue(195), SSAValue(199))::Int64
      # meta: pop location
      # meta: pop location
      # meta: pop location
      # meta: pop location
      $(Expr(:inbounds, :pop))
      #temp#@_7::Int64 = 1
      278:
      $(Expr(:inbounds, false))
      # meta: location range.jl done 465
      # meta: location int.jl + 43
      SSAValue(208) = (Base.add_int)(SSAValue(202), 1)::Int64
      # meta: pop location
      # meta: location promotion.jl == 351
      SSAValue(209) = (#temp#@_7::Int64 === SSAValue(208))::Bool
      # meta: pop location
      # meta: pop location
      $(Expr(:inbounds, :pop))
      $(Expr(:inbounds, false))
      # meta: location bool.jl ! 38
      SSAValue(211) = (Base.not_int)(SSAValue(209))::Bool
      # meta: pop location
      $(Expr(:inbounds, :pop))
      unless SSAValue(211) goto 334
      $(Expr(:inbounds, false))
      # meta: location range.jl next 464
      # meta: location int.jl + 43
      SSAValue(212) = (Base.add_int)(#temp#@_7::Int64, 1)::Int64
      # meta: pop location
      SSAValue(241) = #temp#@_7::Int64
      # meta: pop location
      $(Expr(:inbounds, :pop))
      #temp#@_7::Int64 = SSAValue(212)
      #= line 6 =#
      $(Expr(:inbounds, true))
      $(Expr(:inbounds, false))
      # meta: location subarray.jl getindex 201
      308:
      309:
      #= line 202 =#
      $(Expr(:inbounds, true))
      $(Expr(:inbounds, false))
      # meta: location int.jl + 43
      SSAValue(228) = (Base.add_int)(SSAValue(171), SSAValue(241))::Int64
      # meta: pop location
      $(Expr(:inbounds, :pop))
      $(Expr(:inbounds, false))
      # meta: location array.jl getindex 586
      SSAValue(229) = (Base.arrayref)(A::Array{Float64,2}, SSAValue(228))::Float64
      # meta: pop location
      $(Expr(:inbounds, :pop))
      $(Expr(:inbounds, :pop))
      # meta: pop location
      $(Expr(:inbounds, :pop))
      $(Expr(:inbounds, false))
      # meta: location float.jl + 376
      SSAValue(237) = (Base.add_float)(s::Float64, SSAValue(229))::Float64
      # meta: pop location
      $(Expr(:inbounds, :pop))
      s::Float64 = SSAValue(237)
      $(Expr(:inbounds, :pop))
      332:
      goto 278
      334:
      goto 28
      336:
      #= line 9 =#
      return s::Float64
  end::Float64

The AST is much longer due to the linearized IR so it is left as an exercise for the reader to make sure that there isn't any allocation of SubArrays ;-p

@andreasnoack
Copy link
Member

But I guess this issue should then be closed

@yuyichao
Copy link
Contributor

Sure... I just wonder if @timholy is now more greedy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

7 participants