
implement circshift and circshift! for SparseMatrixCSC #30300

Closed
abraunst wants to merge 1 commit into master from circshift

Conversation

@abraunst (Contributor) commented Dec 7, 2018

This implements circshift and circshift! for the SparseMatrixCSC type (otherwise it falls back to the generic, suboptimal method). I added a couple of tests.

https://discourse.julialang.org/t/circshift-of-a-sparsematrix/18420
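
For context, a quick usage sketch (the size, density, and shift amounts below are just illustrative):

julia> using SparseArrays

julia> A = sprand(10, 10, 0.2);

julia> B = circshift(A, (3, 2));   # rows shifted down by 3, columns shifted right by 2

julia> Array(B) == circshift(Array(A), (3, 2))
true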

@fredrikekre (Member)

Do you have any benchmarks to share?

@fredrikekre added the performance (Must go faster) and sparse (Sparse arrays) labels on Dec 7, 2018
function circshift!(O::SparseMatrixCSC, X::SparseMatrixCSC, (r,c)::Base.DimsInteger{2})
I, J, V = findnz(X)
O .= sparse(mod1.(I .+ r, X.n), mod1.(J .+ c, X.m), V, X.n, X.m)
end
Member

Very minor style thing: I'd like to see return O here.

@mbauman (Member) commented Dec 7, 2018

This is great, thank you! Would you be willing to also implement circshift for SparseVector? It should be similarly straightforward.

I'm sure the benchmarks are impressive.

@abraunst (Contributor, Author) commented Dec 7, 2018

> This is great, thank you! Would you be willing to also implement circshift for SparseVector? It should be similarly straightforward.

Sure!
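
For illustration, a sketch of what the SparseVector method could look like, mirroring the matrix version in this PR (the actual commit may differ):

using SparseArrays
import Base: circshift!   # extend Base, as the matrix method in this PR does

function circshift!(O::SparseVector, X::SparseVector, r::Integer)
    I, V = findnz(X)
    n = length(X)
    # shift the stored indices circularly and rebuild; sparsevec re-sorts them
    O .= sparsevec(mod1.(I .+ r, n), V, n)
    return O
end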


function circshift!(O::SparseMatrixCSC, X::SparseMatrixCSC, (r,c)::Base.DimsInteger{2})
I, J, V = findnz(X)
O .= sparse(mod1.(I .+ r, X.n), mod1.(J .+ c, X.m), V, X.n, X.m)
Member

Why isn't this X.m, X.n?

Contributor Author

Sorry, I've been sloppy and didn't notice because I used square matrices. I modified the code and tests to catch this.
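
For reference, the corrected method would presumably use X.m (rows) for the row indices and X.n (columns) for the column indices, plus the return O suggested above:

function circshift!(O::SparseMatrixCSC, X::SparseMatrixCSC, (r,c)::Base.DimsInteger{2})
    I, J, V = findnz(X)
    O .= sparse(mod1.(I .+ r, X.m), mod1.(J .+ c, X.n), V, X.m, X.n)
    return O
end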

Member

(Make sure the non-square case has test coverage.)
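
A non-square check along these lines (sizes and shift amounts are arbitrary; it assumes the corrected method above) could compare against the dense reference:

using SparseArrays, Test

X = sprand(6, 11, 0.3); O = similar(X)
@test Array(circshift!(O, X, (4, 7))) == circshift(Array(X), (4, 7))
@test Array(circshift!(O, X, (-2, 13))) == circshift(Array(X), (-2, 13))   # negative and oversized shifts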

@stevengj (Member) commented Dec 7, 2018

The approach in this PR allocates 9 temporary arrays of length O(nnz) (3 for the findnz output, 2 more for the mod1 outputs, 2 more temporary arrays inside sparse, and 2 for the output of sparse), so it is wasting a factor of 4–5 in memory.

It would be much more efficient to operate in-place directly on the CSC output, so that no temporary O(nnz) arrays are required. Even if X === O (is that allowed?), you should be able to do that relatively simply in two passes: one to re-order the rows, and one to re-order the columns.

It also kind of defeats the purpose of circshift! if it is equivalent to O .= circshift(X, (r,c)).
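
For illustration, here is a rough sketch of that two-pass idea (this is not the PR's code, and circshift_csc is a made-up name): it fills the output CSC buffers directly, handling the column shift by copying whole columns in their new order and the row shift by emitting each column's wrapped entries first so the row indices stay sorted.

using SparseArrays

# Sketch only: out-of-place, but allocates nothing beyond the three output buffers.
function circshift_csc(X::SparseMatrixCSC, (r, c)::Base.DimsInteger{2})
    m, n = size(X)
    r, c = mod(r, m), mod(c, n)
    colptr = similar(X.colptr)           # n + 1 entries
    rowval = similar(X.rowval)           # nnz(X) entries
    nzval  = similar(X.nzval)            # nnz(X) entries
    colptr[1] = 1
    ptr = 1
    for j in 1:n
        src = mod1(j - c, n)             # column pass: output column j comes from input column src
        rng = X.colptr[src]:(X.colptr[src+1] - 1)
        # row pass: entries with row index > m - r wrap to the top of the column;
        # emitting them first keeps the output row indices sorted within the column.
        split = first(rng)
        while split <= last(rng) && X.rowval[split] <= m - r
            split += 1
        end
        for k in split:last(rng)         # wrapped entries land in rows 1:r
            rowval[ptr] = X.rowval[k] + r - m
            nzval[ptr]  = X.nzval[k]
            ptr += 1
        end
        for k in first(rng):split-1      # remaining entries land in rows r+1:m
            rowval[ptr] = X.rowval[k] + r
            nzval[ptr]  = X.nzval[k]
            ptr += 1
        end
        colptr[j+1] = ptr
    end
    return SparseMatrixCSC(m, n, colptr, rowval, nzval)
end

A quick sanity check is Array(circshift_csc(X, (r, c))) == circshift(Array(X), (r, c)) on a few non-square inputs.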

@abraunst (Contributor, Author) commented Dec 7, 2018

> Do you have any benchmarks to share?

Here are some quick benchmarks.

I'm using

julia> function mycircshift!(O, X::SparseMatrixCSC, (r,c)::Base.DimsInteger{2})
           I, J, V = findnz(X)
           O .= sparse(mod1.(I .+ r, X.n), mod1.(J .+ c, X.m), V, X.n, X.m)
           return O
       end
mycircshift! (generic function with 1 method)

First, an N×N matrix with N=10:



julia> A=sprand(10,10,0.1); O1=similar(A); O2=similar(A);

julia> @benchmark circshift!(O1,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.046 μs (0.00% GC)
  median time:      2.078 μs (0.00% GC)
  mean time:        2.130 μs (0.00% GC)
  maximum time:     6.731 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> @benchmark mycircshift!(O1,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  1.67 KiB
  allocs estimate:  15
  --------------
  minimum time:     1.204 μs (0.00% GC)
  median time:      1.271 μs (0.00% GC)
  mean time:        2.407 μs (35.97% GC)
  maximum time:     6.339 ms (99.95% GC)
  --------------
  samples:          10000
  evals/sample:     10

For a larger matrix:


A=sprand(1000,1000,0.1); O1=similar(A); O2=similar(A);

julia> @benchmark circshift!(O1,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     88.042 ms (0.00% GC)
  median time:      88.259 ms (0.00% GC)
  mean time:        89.128 ms (0.00% GC)
  maximum time:     94.913 ms (0.00% GC)
  --------------
  samples:          57
  evals/sample:     1

julia> O2=similar(A);

julia> @benchmark mycircshift!(O2,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  6.90 MiB
  allocs estimate:  26
  --------------
  minimum time:     6.434 ms (0.00% GC)
  median time:      6.820 ms (0.00% GC)
  mean time:        7.263 ms (3.55% GC)
  maximum time:     70.468 ms (88.76% GC)
  --------------
  samples:          687
  evals/sample:     1

For a large, but sparser matrix:

julia> A=sprand(1000,1000,0.01); O1=similar(A); O2=similar(A);

julia> @benchmark circshift!(O1,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     33.831 ms (0.00% GC)
  median time:      33.910 ms (0.00% GC)
  mean time:        34.206 ms (0.00% GC)
  maximum time:     46.266 ms (0.00% GC)
  --------------
  samples:          147
  evals/sample:     1

julia> @benchmark mycircshift!(O1,A,(7,5))
BenchmarkTools.Trial: 
  memory estimate:  742.06 KiB
  allocs estimate:  26
  --------------
  minimum time:     630.000 μs (0.00% GC)
  median time:      667.545 μs (0.00% GC)
  mean time:        702.915 μs (3.42% GC)
  maximum time:     66.595 ms (98.96% GC)
  --------------
  samples:          7069
  evals/sample:     1

So the generic (dense-style) version does not allocate, but the sparse version still seems preferable for large matrices, and it gets better the sparser the matrix is.

I think one can get rid of the allocations, but the code would not be as simple. I can try that if you think it's important.

@mbauman (Member) commented Dec 7, 2018

Let's not let perfect be the enemy of the good here. Sure, this could be improved further, but:

  • It improves the return type
  • It's O(nnz) instead of O(m*n).

Sure, the multiplier on the O(nnz) is a little high, and this might actually be a slight perf regression for completely full SparseMatrixCSCs, but it's a huge improvement for the most common use-cases for sparse matrices.

@abraunst (Contributor, Author) commented Dec 7, 2018

> The approach in this PR allocates 9 arrays of length O(nnz), rather than the 2 that are required for the output, so it is wasting a factor of more than 4 in memory.
>
> It would be much more efficient to operate directly on the CSC input and directly write the CSC output, so that no temporary O(nnz) arrays are required.

You're right... I'll see how it turns out and possibly open another PR.

@abraunst (Contributor, Author) commented Dec 8, 2018

>> The approach in this PR allocates 9 arrays of length O(nnz), rather than the 2 that are required for the output, so it is wasting a factor of more than 4 in memory.
>> It would be much more efficient to operate directly on the CSC input and directly write the CSC output, so that no temporary O(nnz) arrays are required.
>
> You're right... I'll see how it turns out and possibly open another PR.

done in #30317

@abraunst force-pushed the circshift branch 2 times, most recently from 6e22a8b to 6f890ac on December 11, 2018 21:05
@ViralBShah (Member)

Bump. Is this good to merge?

@ViralBShah (Member)

Should this be closed in favour of #30317?

@abraunst (Contributor, Author)

> Should this be closed in favour of #30317?

I suppose this question is not directed at me. In my opinion #30317 is uniformly better than master performance-wise, but of course it's harder to review. This PR is a bit slower than master for small, dense matrices. If there is anything I can do to help the review process of #30317, please let me know!

@stevengj stevengj closed this Dec 21, 2018
@abraunst abraunst deleted the circshift branch December 23, 2018 09:22
Labels: performance (Must go faster), sparse (Sparse arrays)
5 participants