Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: use pairwise summation for sum, cumsum, and cumprod #4039

Merged
merged 3 commits into from
Aug 13, 2013

Conversation

stevengj
Copy link
Member

This patch modifies the sum and cumsum functions (and cumprod) to use pairwise summation for summing AbstractArrays, in order to achieve much better accuracy at a negligible computational cost.

Pairwise summation recursively divides the array in half, sums the halves recursively, and then adds the two sums. As long as the base case is large enough (here, n=128 seemed to suffice), the overhead of the recursion is negligible compared to naive summation (a simple loop). The advantage of this algorithm is that it achieves O(sqrt(log n)) mean error growth, versus O(sqrt(n)) for naive summation, which is almost indistinguishable from the O(1) error growth of Kahan compensated summation.

For example:

A = rand(10^8)
es = sum_kbn(A)
(oldsum(A) - es)/es, (newsum(A) - es)/es

where oldsum and newsum are the old and new implementations, respectively, gives (-1.2233732622777777e-13,0.0) on my machine in a typical run: the old sum loses three significant digits, whereas the new sum loses none. On the other hand, their execution time is nearly identical:

@time oldsum(A);
@time newsum(A);

gives

elapsed time: 0.116988841 seconds (105816 bytes allocated)
elapsed time: 0.110502884 seconds (64 bytes allocated)

(The difference is within the noise.)

@JeffBezanson and @StefanKarpinski, I think I mentioned this possibility to you at Berkeley.

@stevengj
Copy link
Member Author

(We could use the old definitions for AbstractArray{T<:Integer}, as pairwise summation has no accuracy advantage there, although it doesn't really hurt either.)

@GunnarFarneback
Copy link
Contributor

If you do image processing with single precision, this is an enormously big deal. E.g.

julia> mean(fill(1.5f0, 3000, 4000))
1.766983f0

(Example chosen for its striking effect. Unfortunately the mean implementation is disjoint from the sum implementation that this patch modifies.)

@stevengj
Copy link
Member Author

@GunnarFarneback, yes, it is a pain to modify all of the different sum variants, although it would be possible in principle. But it would help with the case you mentioned:

julia> newsum(fill!(Array(Float32, 3000*4000), 1.5f0)) / (3000*4000) 
1.5f0

@stevengj
Copy link
Member Author

Of course, in single precision, the right thing is to just do summation and other accumulation in double precision, rounding back to single precision at the end.

@GunnarFarneback
Copy link
Contributor

To clarify my example was meant show the insufficiency of the current state of summing. That pairwise summation is an effective solution I considered as obvious, so I'm all in favor of this patch. That mean and some other functions fail to take advantage of it is a separate issue.

@GunnarFarneback
Copy link
Contributor

The right thing with respect to precision, not necessarily with respect to speed.

@StefanKarpinski
Copy link
Member

I'm good with this as a first step and think we should, as @GunnarFarneback points out, integrate even further so that mean and other stats functions also use pairwise summation.

@stevengj
Copy link
Member Author

Updated to use pairwise summation for mean, var, and varm as well. At least when looking at the whole array; for sums etc. along a subset of the dimensions of a larger array, we need a pairwise variant of reducedim (for associative operations).

The variance computation is also more efficient now because (at least when operating on the whole array) it no longer constructs a temporary array of the same size. (Would be even better if abs2 were inlined, but that will be easy once something like #3796 lands.)

Also, I noticed that var was buggy for complex matrices, since it used x.*x instead of abs2(x). Our varm function used dot, which did the complex conjugation, but it returned a complex number with zero imaginary part instead of a real number. Now these both should work and return a real value; I added a test case.

@stevengj
Copy link
Member Author

Also added an associative variant of mapreduce which uses pairwise reduction.

_Question:_ Although reduce and mapreduce currently do left-associative reduction ("fold left"), this isn't documented in the manual. Is this intentional, i.e. are these operations supposed unconstrained in their association ordering? (The docs should really be explicit on this point.) If so, then we don't need a separate mapreduce_associative function, right?

@StefanKarpinski
Copy link
Member

Another approach would be to have a Associativity type that can be passed into reduce and mapreduce as a keyword (defaulting to pairwise?) and then we can specialize the internals of the reductions on that type, which will minimize the overhead. I.e. similar to the sorting infrastructure.

@pao
Copy link
Member

pao commented Aug 13, 2013

Although reduce and mapreduce currently do left-associative reduction ("fold left"), this isn't documented in the manual. Is this intentional, i.e. are these operations supposed unconstrained in their association ordering? (The docs should really be explicit on this point.)

That would be a good idea--for instance, https://github.com/pao/Monads.jl/blob/master/src/Monads.jl#L54 relies on the fold direction.

@stevengj
Copy link
Member Author

The only sensible associativities to have in Base are left, right, and unspecified. A Pairwise associativity makes no sense because we don't even want to implement a fully pairwise case (because, for performance, the base case should be a simple left-associative loop) and we want to be free to change the details as needed. Note also that for parallel operations you really want to have unspecified associativity so that more efficient tree algorithms can be used.

My suggestion would be that mapreduce and reduce should be explicitly documented as having implementation-dependent associativity, and then to have separate reduce_left and mapreduce_left functions which enforce left-associativity. (And maybe right-associative variants? But those can't easily be implemented for iterators, and I'm not sure how useful they are; I would just punt on those until there is demand.) Since there are only three useful cases, defining a type for this doesn't make sense to me. But probably this should be a separate issue.

@stevengj
Copy link
Member Author

@StefanKarpinski, should I go ahead and merge this?

@ViralBShah
Copy link
Member

This is great, and I would love to see this merged. Also, we should probably not export sum_kbn anymore, given this patch.

@stevengj
Copy link
Member Author

sum_kbn is still more accurate in some cases. e.g. sum([1 1e-100 -1]) still gives 0.0, but sum_kbn gives 1.0e-100. But I agree that the need for sum_kbn is greatly reduced.

@StefanKarpinski
Copy link
Member

I'm good with merging this. @JeffBezanson?

JeffBezanson added a commit that referenced this pull request Aug 13, 2013
RFC: use pairwise summation for sum, cumsum, and cumprod
@JeffBezanson JeffBezanson merged commit c8f89d1 into JuliaLang:master Aug 13, 2013
stevengj added a commit that referenced this pull request Aug 13, 2013
@timholy
Copy link
Member

timholy commented Aug 14, 2013

This is great. I had played with my own variant of this idea, breaking it up into blocks of size sqrt(N), but for small arrays the sqrt was a performance-killer. This is much better, thanks for contributing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants