This repository has been archived by the owner on May 4, 2019. It is now read-only.

sum() slower than custom summing function #120

Open · nalimilan opened this issue Jun 18, 2016 · 2 comments
nalimilan (Member) commented:

Despite being specialized for NullableArray, sum is slower than a loop written by hand, both on current master and on my branch improving operations on Nullable. This is most likely due to branching and to some unexpected allocations. We should check whether the compiler is able to generate optimal code when calling the operators naively, or whether specializing reduce for NullableArray can still be useful.

On a recent Julia git master, with my operations branch:

julia> using NullableArrays

julia> using BenchmarkTools

julia> x = NullableArray(rand(10000000), rand(Bool, 10000000));

julia> @benchmark sum(x, skipnull=true)
BenchmarkTools.Trial: 
  samples:          48
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  3.00 mb
  allocs estimate:  163823
  minimum time:     102.94 ms (0.00% GC)
  median time:      103.75 ms (0.00% GC)
  mean time:        104.18 ms (0.38% GC)
  maximum time:     108.24 ms (3.42% GC)

julia> function f{T}(x::NullableArray{T})
           r = Nullable(zero(T))
           @inbounds for el in x
               !isnull(el) && (r += el)
           end
           r
       end
f (generic function with 1 method)

julia> @benchmark f(x)
BenchmarkTools.Trial: 
  samples:          62
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  32.00 bytes
  allocs estimate:  1
  minimum time:     80.41 ms (0.00% GC)
  median time:      80.96 ms (0.00% GC)
  mean time:        80.98 ms (0.00% GC)
  maximum time:     81.57 ms (0.00% GC)

aaowens commented Mar 27, 2017:

Either this is pretty much fixed on the latest master and 0.6, or it's hardware-specific. When I run your example, I get:

julia> @benchmark sum(x, skipnull=true)
BenchmarkTools.Trial: 
  memory estimate:  3.00 MiB
  allocs estimate:  163822
  --------------
  minimum time:     73.413 ms (0.00% GC)
  median time:      74.191 ms (0.00% GC)
  mean time:        74.510 ms (0.28% GC)
  maximum time:     77.393 ms (0.00% GC)
  --------------
  samples:          68
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark f(x)
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     71.422 ms (0.00% GC)
  median time:      71.747 ms (0.00% GC)
  mean time:        71.871 ms (0.00% GC)
  maximum time:     75.335 ms (0.00% GC)
  --------------
  samples:          70
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

However, it is still as you said when the array consists mostly of non-missing values.

julia> bob = NullableArray(rand(10000000), [zeros(Bool, 10000000 - 1 ); true])
10000000-element NullableArrays.NullableArray{Float64,1}:
 0.907998 
 0.0410982
 0.869887 
 0.689121 
 0.347948 
 0.369682 
 0.428783 
 0.613532 
 0.490274 
 0.356395 
 ⋮        
 0.846593 
 0.506439 
 0.645877 
 0.0634062
 0.245713 
 0.644798 
 0.607075 
 0.108837 
 #NULL    

julia> @benchmark sum(bob; skipnull = true)

BenchmarkTools.Trial: 
  memory estimate:  3.00 MiB
  allocs estimate:  163822
  --------------
  minimum time:     34.577 ms (0.00% GC)
  median time:      35.300 ms (0.00% GC)
  mean time:        35.707 ms (0.69% GC)
  maximum time:     42.227 ms (5.81% GC)
  --------------
  samples:          140
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark f(bob)
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     26.046 ms (0.00% GC)
  median time:      26.156 ms (0.00% GC)
  mean time:        26.830 ms (0.00% GC)
  maximum time:     33.703 ms (0.00% GC)
  --------------
  samples:          186
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

I also tried it with bob containing no missing values, and then sum is much faster than the loop. It must be calling a specialized method.
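
For reference, a specialized no-null path could simply hand the raw data vector to Base's optimized (pairwise) sum. The following is only a hedged sketch, assuming the values and isnull fields that NullableArray carries internally; the name sum_nonull is illustrative, not the package's actual API:

function sum_nonull{T}(x::NullableArray{T})
    # Only valid when no element is null; a real implementation would
    # fall back to the skipnull path instead of throwing.
    any(x.isnull) && throw(ArgumentError("array contains null values"))
    return Nullable(sum(x.values))
end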

nalimilan (Member, Author) commented:

Looks like the timings improved in 0.6. I still see a relatively small difference on 0.5.

As you noted, the strategy used by _mapreduce_skipnull is still generally less efficient than the manual loop when there are few missing values. It requires going over the input once to check whether there are any nulls (missingdata): this is slow when there are no nulls among the first values, since the check cannot bail out early. If there are nulls, it counts them, which is quite slow: we don't actually need the exact count, we just need to know whether there is at least one or two non-null values. Finally, it goes over the input again to compute the result in the general case. We should be able to replace these multiple null-checking passes with a single loop, as sketched below.
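
Concretely, a single-pass version could look like the following sketch, written against the values and isnull fields of NullableArray; the function name and the choice to return a null result when every value is null are illustrative assumptions, not the package's actual implementation:

function sum_skipnull_onepass{T}(x::NullableArray{T})
    s = zero(T)
    n = 0  # number of non-null values accumulated
    @inbounds for i in eachindex(x)
        if !x.isnull[i]
            s += x.values[i]
            n += 1
        end
    end
    # Illustrative: a reduction over only nulls yields a null result.
    return n == 0 ? Nullable{T}() : Nullable(s)
end

This does a single traversal, needs no null count, and allocates only the returned Nullable.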
