
implement circshift and circshift! for SparseMatrixCSC #30300

Closed
abraunst wants to merge 1 commit into master from circshift

Conversation

@abraunst (Contributor) commented Dec 7, 2018

This implements circshift and circshift! for the SparseMatrixCSC type (otherwise it falls back to the generic, suboptimal method). I added a couple of tests.

https://discourse.julialang.org/t/circshift-of-a-sparsematrix/18420
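
For context, a quick usage sketch (the size, density, and shift amounts below are just illustrative):

julia> using SparseArrays

julia> A = sprand(10, 10, 0.2);

julia> B = circshift(A, (3, 2));   # rows shifted down by 3, columns shifted right by 2

julia> Array(B) == circshift(Array(A), (3, 2))
true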

@fredrikekre (Member)

Do you have any benchmarks to share?

@fredrikekre added the performance (Must go faster) and sparse (Sparse arrays) labels on Dec 7, 2018
function circshift!(O::SparseMatrixCSC, X::SparseMatrixCSC, (r,c)::Base.DimsInteger{2})
I, J, V = findnz(X)
O .= sparse(mod1.(I .+ r, X.n), mod1.(J .+ c, X.m), V, X.n, X.m)
end
Member

Very minor style thing: I'd like to see return O here.

@mbauman (Member) commented Dec 7, 2018

This is great, thank you! Would you be willing to also implement circshift for SparseVector? It should be similarly straightforward.

I'm sure the benchmarks are impressive.

@abraunst (Contributor, Author) commented Dec 7, 2018

> This is great, thank you! Would you be willing to also implement circshift for SparseVector? It should be similarly straightforward.

Sure!
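
For illustration, a sketch of what the SparseVector method could look like, mirroring the matrix version in this PR (the actual commit may differ):

using SparseArrays
import Base: circshift!   # extend Base, as the matrix method in this PR does

function circshift!(O::SparseVector, X::SparseVector, r::Integer)
    I, V = findnz(X)
    n = length(X)
    # shift the stored indices circularly and rebuild; sparsevec re-sorts them
    O .= sparsevec(mod1.(I .+ r, n), V, n)
    return O
end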


function circshift!(O::SparseMatrixCSC, X::SparseMatrixCSC, (r,c)::Base.DimsInteger{2})
I, J, V = findnz(X)
O .= sparse(mod1.(I .+ r, X.n), mod1.(J .+ c, X.m), V, X.n, X.m)
Member

Why isn't this X.m, X.n?

Contributor Author

Sorry, I've been sloppy and didn't notice because I used square matrices. I modified the code and tests to catch this.
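
For reference, the corrected method would presumably use X.m (rows) for the row indices and X.n (columns) for the column indices, plus the return O suggested above:

function circshift!(O::SparseMatrixCSC, X::SparseMatrixCSC, (r,c)::Base.DimsInteger{2})
    I, J, V = findnz(X)
    O .= sparse(mod1.(I .+ r, X.m), mod1.(J .+ c, X.n), V, X.m, X.n)
    return O
end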

Member

(Make sure the non-square case has test coverage.)
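
A non-square check along these lines (sizes and shift amounts are arbitrary; it assumes the corrected method above) could compare against the dense reference:

using SparseArrays, Test

X = sprand(6, 11, 0.3); O = similar(X)
@test Array(circshift!(O, X, (4, 7))) == circshift(Array(X), (4, 7))
@test Array(circshift!(O, X, (-2, 13))) == circshift(Array(X), (-2, 13))   # negative and oversized shifts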

@stevengj (Member) commented Dec 7, 2018

The approach in this PR allocates 9 temporary arrays of length O(nnz) (3 for the findnz output, 2 more for the mod1 outputs, 2 more temporary arrays inside sparse, and 2 for the output of sparse), so it is wasting a factor of 4–5 in memory.

It would be much more efficient to operate in-place directly on the CSC output, so that no temporary O(nnz) arrays are required. Even if X === O (is that allowed?), you should be able to do that relatively simply in two passes: one to re-order the rows, and one to re-order the columns.

It also kind of defeats the purpose of circshift! if it is equivalent to O .= circshift(X, (r,c)).
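
For illustration, here is a rough sketch of that two-pass idea (this is not the PR's code, and circshift_csc is a made-up name): it fills the output CSC buffers directly, handling the column shift by copying whole columns in their new order and the row shift by emitting each column's wrapped entries first so the row indices stay sorted.

using SparseArrays

# Sketch only: out-of-place, but allocates nothing beyond the three output buffers.
function circshift_csc(X::SparseMatrixCSC, (r, c)::Base.DimsInteger{2})
    m, n = size(X)
    r, c = mod(r, m), mod(c, n)
    colptr = similar(X.colptr)           # n + 1 entries
    rowval = similar(X.rowval)           # nnz(X) entries
    nzval  = similar(X.nzval)            # nnz(X) entries
    colptr[1] = 1
    ptr = 1
    for j in 1:n
        src = mod1(j - c, n)             # column pass: output column j comes from input column src
        rng = X.colptr[src]:(X.colptr[src+1] - 1)
        # row pass: entries with row index > m - r wrap to the top of the column;
        # emitting them first keeps the output row indices sorted within the column.
        split = first(rng)
        while split <= last(rng) && X.rowval[split] <= m - r
            split += 1
        end
        for k in split:last(rng)         # wrapped entries land in rows 1:r
            rowval[ptr] = X.rowval[k] + r - m
            nzval[ptr]  = X.nzval[k]
            ptr += 1
        end
        for k in first(rng):split-1      # remaining entries land in rows r+1:m
            rowval[ptr] = X.rowval[k] + r
            nzval[ptr]  = X.nzval[k]
            ptr += 1
        end
        colptr[j+1] = ptr
    end
    return SparseMatrixCSC(m, n, colptr, rowval, nzval)
end

A quick sanity check is Array(circshift_csc(X, (r, c))) == circshift(Array(X), (r, c)) on a few non-square inputs.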

@abraunst (Contributor, Author) commented Dec 7, 2018

> Do you have any benchmarks to share?

Here are some quick benchmarks.

I'm using

julia> function mycircshift!(O, X::SparseMatrixCSC, (r,c)::Base.DimsInteger{2})
           I, J, V = findnz(X)
           O .= sparse(mod1.(I .+ r, X.n), mod1.(J .+ c, X.m), V, X.n, X.m)
           return O
       end
mycircshift! (generic function with 1 method)

First, an N×N matrix with N=10:



julia> A=sprand(10,10,0.1); O1=similar(A); O2=similar(A);

julia> @benchmark circshift!(O1,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.046 μs (0.00% GC)
  median time:      2.078 μs (0.00% GC)
  mean time:        2.130 μs (0.00% GC)
  maximum time:     6.731 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> @benchmark mycircshift!(O1,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  1.67 KiB
  allocs estimate:  15
  --------------
  minimum time:     1.204 μs (0.00% GC)
  median time:      1.271 μs (0.00% GC)
  mean time:        2.407 μs (35.97% GC)
  maximum time:     6.339 ms (99.95% GC)
  --------------
  samples:          10000
  evals/sample:     10

For a larger matrix:


A=sprand(1000,1000,0.1); O1=similar(A); O2=similar(A);

julia> @benchmark circshift!(O1,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     88.042 ms (0.00% GC)
  median time:      88.259 ms (0.00% GC)
  mean time:        89.128 ms (0.00% GC)
  maximum time:     94.913 ms (0.00% GC)
  --------------
  samples:          57
  evals/sample:     1

julia> O2=similar(A);

julia> @benchmark mycircshift!(O2,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  6.90 MiB
  allocs estimate:  26
  --------------
  minimum time:     6.434 ms (0.00% GC)
  median time:      6.820 ms (0.00% GC)
  mean time:        7.263 ms (3.55% GC)
  maximum time:     70.468 ms (88.76% GC)
  --------------
  samples:          687
  evals/sample:     1

For a large, but sparser matrix:

julia> A=sprand(1000,1000,0.01); O1=similar(A); O2=similar(A);

julia> @benchmark circshift!(O1,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     33.831 ms (0.00% GC)
  median time:      33.910 ms (0.00% GC)
  mean time:        34.206 ms (0.00% GC)
  maximum time:     46.266 ms (0.00% GC)
  --------------
  samples:          147
  evals/sample:     1

julia> @benchmark mycircshift!(O1,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  742.06 KiB
  allocs estimate:  26
  --------------
  minimum time:     630.000 μs (0.00% GC)
  median time:      667.545 μs (0.00% GC)
  mean time:        702.915 μs (3.42% GC)
  maximum time:     66.595 ms (98.96% GC)
  --------------
  samples:          7069
  evals/sample:     1

So the generic (dense-style) version does not allocate, but the sparse version still seems preferable for large matrices, and it gets better the sparser the matrix is.

I think one can get rid of the allocations, but the code would not be as simple. I can try that if you think it's important.

@mbauman (Member) commented Dec 7, 2018

Let's not let perfect be the enemy of the good here. Sure, this could be improved further, but:

  • It improves the return type
  • It's O(nnz) instead of O(m*n).

Sure, the multiplier on the O(nnz) is a little high, and this might actually be a slight perf regression for completely full SparseMatrixCSCs, but it's a huge improvement for the most common use-cases for sparse matrices.

@abraunst (Contributor, Author) commented Dec 7, 2018

> The approach in this PR allocates 9 arrays of length O(nnz), rather than the 2 that are required for the output, so it is wasting a factor of more than 4 in memory.
>
> It would be much more efficient to operate directly on the CSC input and directly write the CSC output, so that no temporary O(nnz) arrays are required.

You're right... I'll see how it turns out and possibly open another PR.

@abraunst (Contributor, Author) commented Dec 8, 2018

>> The approach in this PR allocates 9 arrays of length O(nnz), rather than the 2 that are required for the output, so it is wasting a factor of more than 4 in memory.
>> It would be much more efficient to operate directly on the CSC input and directly write the CSC output, so that no temporary O(nnz) arrays are required.
>
> You're right... I'll see how it turns out and possibly open another PR.

done in #30317

@abraunst force-pushed the circshift branch 2 times, most recently from 6e22a8b to 6f890ac on December 11, 2018 21:05
@ViralBShah (Member)

Bump. Is this good to merge?

@ViralBShah (Member)

Should this be closed in favour of #30317?

@abraunst (Contributor, Author)

> Should this be closed in favour of #30317?

I suppose this question is not directed at me. In my opinion #30317 is uniformly better than master performance-wise, but of course it's harder to review. This PR is a bit slower than master for small, dense matrices. If there is anything I can do to help the review process of #30317, please let me know!

@stevengj stevengj closed this Dec 21, 2018
@abraunst abraunst deleted the circshift branch December 23, 2018 09:22
Labels: performance (Must go faster), sparse (Sparse arrays)
5 participants