faster circshift! for SparseMatrixCSC #30317

abraunst · 2018-12-08T12:46:46Z

So this is another (much faster) version of the circshift! implementation for SparseMatrixCSC. Unfortunately, the code is much less legible than the two-liner of #30300, but it

* avoids allocations ~~almost~~ completely
* has exactly two mod calls

In its current state, for SparseMatrixCSC matrices, it seems comparable to the dense version for small dense matrices, faster for larger ones and much much faster for both sparse and larger ones.

Question1: is it possible to replace the O .= similar(X) in the first line and just check/assert that the allocated memory is compatible (i.e. colptr, rowval and nzval have the same lengths and m, n are equal)? I suppose not, because it would be a breaking change. This would avoid a double allocation in the call from circshift. Otherwise, I suppose I could move everything to a helper function and implement circshift directly instead of relying on the generic one. If this is the way to go, I would replace O .= similar(X) by just allocating the memory (no need to copy colptr and rowval).

EDIT: I realized that this question is kind of stupid... I just resize! colptr, rowval and nzval -- this does not involve allocation if they are already of the right size. It also makes things work when the output has a different type than the input -- which didn't work before. I added some tests that would have caught it.
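A minimal sketch of the point about resize!, using plain vectors rather than the actual fields of a SparseMatrixCSC. The names `src`, `dst` and `same_buffer` are made up for illustration and are not from the PR. It shows that resizing a vector to the length it already has leaves the underlying buffer in place, and that the subsequent copy converts element types, which is what lets the output type differ from the input type.

```julia
# Hypothetical buffers standing in for e.g. the `nzval` of input and output.
src = [1.0, 2.0, 3.0]            # Vector{Float64}
dst = zeros(Float32, 3)          # different element type than `src`

before = pointer(dst)
resize!(dst, length(src))        # already the right length, so nothing is grown
same_buffer = pointer(dst) == before

copyto!(dst, src)                # converts Float64 -> Float32 element by element
```

If `dst` had a different length, resize! would grow or shrink it, and only in the growing case might the buffer have to be reallocated.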

~~Question2: even without O .= similar(X), there seems to be some very small allocation reported by `@benchmark`... I don't know where it comes from (but I may be misinterpreting).~~

I believe that this is a good improvement with respect to the current situation. The implementation is reasonably straightforward. Thoughts?

Updated benchmarks:

Summary of mean times, raw data here

| x | MASTER | BRANCH |
| --- | --- | --- |
| sprand(10,10,1.0) | 1.211 μs | 1.218 μs |
| sprand(10,10,0.1) | 789.932 ns | 765.545 ns |
| sprand(1000,1000,1.0) | 71.192 ms | 3.904 ms |
| sprand(1000,1000,0.1) | 37.105 ms | 196.435 μs |
| sprand(1000,1000,0.01) | 11.689 ms | 26.761 μs |

Updated benchmarks with some @inbounds, raw data here

| x | MASTER | BRANCH |
| --- | --- | --- |
| sprand(10,10,1.0) | 1.211 μs | 1.172 μs |
| sprand(10,10,0.1) | 789.932 ns | 725.866 ns |
| sprand(1000,1000,1.0) | 71.192 ms | 3.656 ms |
| sprand(1000,1000,0.1) | 37.105 ms | 167.782 μs |
| sprand(1000,1000,0.01) | 11.689 ms | 20.948 μs |

EDIT3: Found the solution to Question2 above: I think it is the splat in the call to (dense) circshift! (in multidimensional.jl):

julia> x=rand(10); y=similar(x); @btime circshift!($y,$x,1);
  307.100 ns (3 allocations: 144 bytes)

julia> x=rand(10); y=similar(x); @btime circshift!($y,$x,(1,));
  35.272 ns (0 allocations: 0 bytes)

julia> @btime (1...,);
  263.556 ns (3 allocations: 144 bytes)

I thought that the temporary tuple would have been optimized out... can this be solved there?

For the moment, I will just solve it locally. This gives a huge improvement for small matrices!
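One way such a local workaround could look, as a sketch only: build the shift tuple with a small dispatch helper instead of splatting. The names `shift_tuple` and `circshift_nosplat!` are hypothetical and are not the code in this PR; the only Base call relied on is the tuple method of circshift!, as in the second `@btime` line above.

```julia
# Hypothetical helpers: turn the shift argument into a tuple without `(s...,)`.
shift_tuple(s::Integer) = (s,)   # wrap a single integer directly
shift_tuple(s::Tuple) = s        # tuples pass through unchanged

# Forward to the tuple method of Base.circshift!.
function circshift_nosplat!(dest::AbstractVector, src::AbstractVector, s)
    return circshift!(dest, src, shift_tuple(s))
end

x = collect(1.0:5.0)
y = similar(x)
circshift_nosplat!(y, x, 1)      # y is now x rotated by one position
```

Whether this removes the allocation on a given Julia version is best checked with `@btime`, as in the measurements above.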

Updated benchmarks, raw data here

| x | MASTER | BRANCH |
| --- | --- | --- |
| sprand(10,10,1.0) | 1.211 μs | 463.476 ns |
| sprand(10,10,0.1) | 789.932 ns | 110.585 ns |
| sprand(1000,1000,1.0) | 71.192 ms | 3.669 ms |
| sprand(1000,1000,0.1) | 37.105 ms | 171.386 μs |
| sprand(1000,1000,0.01) | 11.689 ms | 20.705 μs |

mauro3 · 2018-12-08T15:56:55Z

(Always quote code, in particular macros. Above you just pinged the github user "benchmark"!)

abraunst · 2018-12-08T16:05:47Z

> (Always quote code, in particular macros. Above you just pinged the github user "benchmark"!)

dang, I'm sorry about that. Just corrected it.

stevengj · 2018-12-17T18:33:29Z

stdlib/SparseArrays/src/sparsematrix.jl

+    r = mod(r, X.m)
+    @inbounds for i=1:O.n
+        subvector_shifter!(O.rowval, O.nzval, O.colptr[i], O.colptr[i+1]-1, O.m, r)
+    end


Skip this loop if iszero(r). Similarly the code above can be replaced with a copy if iszero(c).

Thank you @stevengj. I implemented the suggestions, let me know if I interpreted correctly. I also moved subvector_shifter! to sparsevector.jl, it seemed more appropriate (should I prepend the name with _ given that it's a helper, or better not, as it is used also by sparsematrix.jl?).
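To make the row-shifting step and the iszero(r) shortcut concrete, here is a rough sketch of rotating the stored entries of a single column. It is not the PR's subvector_shifter!: the function name `shift_rows_sorted` is hypothetical, and unlike the PR it allocates new vectors instead of working in place. It assumes the row indices are sorted, as they are within a CSC column.

```julia
# Hypothetical helper: shift the sorted row indices of one column by `r`
# (circularly, modulo the number of rows `m`) and keep them sorted.
function shift_rows_sorted(rows::Vector{Int}, vals::Vector{T}, m::Int, r::Integer) where {T}
    r = mod(r, m)                                  # one mod call, so 0 <= r < m
    iszero(r) && return copy(rows), copy(vals)     # nothing to shift
    # Entries with row > m - r wrap around and become the first entries.
    k = searchsortedfirst(rows, m - r + 1)
    wrapped = k:length(rows)
    kept = 1:k-1
    newrows = vcat(rows[wrapped] .- (m - r), rows[kept] .+ r)
    newvals = vcat(vals[wrapped], vals[kept])
    return newrows, newvals
end

# A column of length 5 with stored entries at rows 1, 3 and 5.
rows, vals = [1, 3, 5], [10, 30, 50]
newrows, newvals = shift_rows_sorted(rows, vals, 5, 2)
```

Because the wrapped entries end up with the smallest row indices, a rotation of the stored entries is enough and no sorting is needed.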

stevengj · 2018-12-17T18:36:25Z

stdlib/SparseArrays/test/sparse.jl

+    for i=1:20
+        m,n = 17,15
+        A = sprand(m, n, rand())
+        shifts = rand(-m:m), rand(-n:n)


I would make this deterministic to make sure we always exercise the corner cases.

    for rshift in (-1, 0, 1, 10), cshift in (-1, 0, 1, 10)
        shifts = (rshift, cshift)

and have a separate loop for the sparse vector case.

Thanks @stevengj. I've done it. I moved the sparse vector tests to test/sparsevector.jl (which already existed).
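A sketch of what a deterministic version of this test could look like, assuming the dense result is used as the reference. This is not the test that was committed: the matrix size comes from the diff above, while the density 0.3 and the testset name are arbitrary choices made here.

```julia
using SparseArrays, Test

A = sprand(17, 15, 0.3)   # the entries are random, but the shifts below are fixed

@testset "sparse circshift matches dense" begin
    for rshift in (-1, 0, 1, 10), cshift in (-1, 0, 1, 10)
        shifts = (rshift, cshift)
        # Compare against the dense implementation as a reference.
        @test circshift(A, shifts) == circshift(Matrix(A), shifts)
    end
end
```

The fixed shift values cover the zero-shift shortcuts as well as negative and larger shifts on every run.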

stevengj · 2018-12-21T16:34:12Z

stdlib/SparseArrays/src/sparsevector.jl

@@ -2008,7 +2008,7 @@ end


 function circshift!(O::SparseVector, X::SparseVector, (r,)::Base.DimsInteger{1})
-    copy!(O, X)
+    O .= X


Is there a bug in copy! for this case?

Yes, I opened a separate issue #30443

ViralBShah · 2018-12-22T21:55:21Z

@stevengj Can you merge it when you think it is ready?

stevengj · 2018-12-24T05:55:16Z

Are the 32-bit appveyor failures unrelated?

abraunst · 2018-12-24T09:42:05Z

> Are the 32-bit appveyor failures unrelated?

Seem unrelated to me. OTOH, I have no idea of what they are :-)

stevengj · 2018-12-24T13:40:41Z

Maybe rebase and force-push to see if the appveyor problem is something that was recently fixed.

abraunst · 2018-12-24T15:09:40Z

> Maybe rebase and force-push to see if the appveyor problem is something that was recently fixed.

Sure. Should I squash everything into a single commit while I am at it?

stevengj · 2018-12-25T13:07:02Z

No need to squash — github allows us to squash when merging. Thanks!

* implement circshift! for SparseMatrixCSC
* factor helper function shifter!, implement efficient circshift! for SparseVector
* add some `@inbounds` for improved performance
* remove allocations completely, giving a large improvement for small matrices
* some renaming to avoid polluting the module namespace
* remove useless reallocation and fix bug with different in/out types, better tests
* avoid action if iszero(r) and/or iszero(c), move sparse vector shifting helpers to sparsevector.jl
* Make shift amounts deterministic in tests, move sparse vector tests into sparsevector.jl
* comment fix
* for some reason, copy!(a::SparseVector, b::SparseVector) does not work

(cherry picked from commit 94993e9)

abraunst mentioned this pull request Dec 8, 2018

implement circshift and circshift! for SparseMatrixCSC #30300

Closed

abraunst changed the title ~~implement circshift! for SparseMatrixCSC~~ [WIP] implement circshift! for SparseMatrixCSC Dec 9, 2018

abraunst changed the title ~~[WIP] implement circshift! for SparseMatrixCSC~~ Implement circshift! for SparseMatrixCSC Dec 9, 2018

abraunst mentioned this pull request Dec 10, 2018

slow unidimensional circshift on dense (small) vectors #30336

Closed

abraunst force-pushed the circshift2 branch 2 times, most recently from 30cbcab to 0fa85f1 Compare December 12, 2018 17:08

kshyatt added the sparse (Sparse arrays) label Dec 12, 2018

abraunst changed the title ~~Implement circshift! for SparseMatrixCSC~~ faster circshift! for SparseMatrixCSC Dec 14, 2018

stevengj reviewed Dec 17, 2018

stevengj reviewed Dec 21, 2018

stevengj approved these changes Dec 21, 2018

abraunst added 10 commits December 24, 2018 16:07

* 1815a13: implement circshift! for SparseMatrixCSC
* cf8cd90: factor helper function shifter!, implement efficient circshift! for SparseVector
* 9cc9583: remove useless reallocation and fix bug with different in/out types, better tests
* 31d54c2: add some `@inbounds` for improved performance
* 04485c5: remove allocations completely, giving a large improvement for small matrices
* 17cd9f9: some renaming to avoid polluting the module namespace
* 18968ce: avoid action if iszero(r) and/or iszero(c), move sparse vector shifting helpers to sparsevector.jl
* 0ae61fc: Make shift amounts deterministic in tests, move sparse vector tests into sparsevector.jl
* 288d2ef: comment fix
* 0c8d70e: for some reason, copy!(a::SparseVector, b::SparseVector) does not work

abraunst force-pushed the circshift2 branch from 228f055 to 0c8d70e Compare December 24, 2018 15:43

stevengj merged commit 94993e9 into JuliaLang:master Dec 25, 2018

abraunst deleted the circshift2 branch December 25, 2018 15:43

ViralBShah added the backport pending 1.0 and performance (Must go faster) labels Dec 30, 2018

KristofferC mentioned this pull request Jan 11, 2019

Backports for 1.0.4 #30536

Closed

53 tasks

StefanKarpinski added the triage (This should be discussed on a triage call) and backport 1.0 labels and removed the triage label Jan 31, 2019

JeffBezanson removed the backport 1.0 and triage (This should be discussed on a triage call) labels Jan 31, 2019
