-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
faster circshift! for SparseMatrixCSC #30317
Conversation
(Always quote code, in particular macros. Above you just pinged the github user "benchmark"!) |
dang, I'm sorry about that. Just corrected it. |
30cbcab
to
0fa85f1
Compare
r = mod(r, X.m) | ||
@inbounds for i=1:O.n | ||
subvector_shifter!(O.rowval, O.nzval, O.colptr[i], O.colptr[i+1]-1, O.m, r) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip this loop if iszero(r)
. Similarly the code above can be replace with a copy if iszero(c)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @stevengj. I implemented the suggestions, let me know if I interpreted correctly. I also moved subvector_shifter! to sparsevector.jl, it seemed more appropriate (should I prepend the name with _ given that it's a helper, or better not, as it is used also by sparsematrix.jl?).
stdlib/SparseArrays/test/sparse.jl
Outdated
for i=1:20 | ||
m,n = 17,15 | ||
A = sprand(m, n, rand()) | ||
shifts = rand(-m:m), rand(-n:n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this deterministic to make sure we always exercise the corner cases.
for rshift in (-1, 0, 1, 10), cshift in (-1, 0, 1, 10)
shifts = (rshift, cshift)
and have a separate loop for the sparse vector case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @stevengj. I've done it. I moved the sparse vector tests to test/sparsevector.jl (which already existed).
@@ -2008,7 +2008,7 @@ end | |||
|
|||
|
|||
function circshift!(O::SparseVector, X::SparseVector, (r,)::Base.DimsInteger{1}) | |||
copy!(O, X) | |||
O .= X |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a bug in copy!
for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I opened a separate issue #30443
@stevengj Can you merge it when you think it is ready? |
Are the 32-bit appveyor failures unrelated? |
Seem unrelated to me. OTOH, I have no idea of what they are :-) |
Maybe rebase and force-push to see if the appveyor problem is something that was recently fixed. |
…ng helpers to sparsevector.jl
…nto sparsevector.jl
Sure. Should I squash everything into a single commit while I am it? |
No need to squash — github allows us to squash when merging. Thanks! |
* implement circshift! for SparseMatrixCSC * factor helper function shifter!, implement efficient circshift! for SparseVector * add some @inbounds for improved performance * remove allocations completely, giving a large improvement for small matrices * some renaming to avoid polluting the module namespace * remove useless reallocation and fix bug with different in/out types, better tests * avoid action if iszero(r) and/or iszero(c), move sparse vector shifting helpers to sparsevector.jl * Make shift amounts deterministic in tests, move sparse vector tests into sparsevector.jl * comment fix * for some reason, copy!(a::SparseVector, b::SparseVector) does not work
* implement circshift! for SparseMatrixCSC * factor helper function shifter!, implement efficient circshift! for SparseVector * add some @inbounds for improved performance * remove allocations completely, giving a large improvement for small matrices * some renaming to avoid polluting the module namespace * remove useless reallocation and fix bug with different in/out types, better tests * avoid action if iszero(r) and/or iszero(c), move sparse vector shifting helpers to sparsevector.jl * Make shift amounts deterministic in tests, move sparse vector tests into sparsevector.jl * comment fix * for some reason, copy!(a::SparseVector, b::SparseVector) does not work
* implement circshift! for SparseMatrixCSC * factor helper function shifter!, implement efficient circshift! for SparseVector * add some @inbounds for improved performance * remove allocations completely, giving a large improvement for small matrices * some renaming to avoid polluting the module namespace * remove useless reallocation and fix bug with different in/out types, better tests * avoid action if iszero(r) and/or iszero(c), move sparse vector shifting helpers to sparsevector.jl * Make shift amounts deterministic in tests, move sparse vector tests into sparsevector.jl * comment fix * for some reason, copy!(a::SparseVector, b::SparseVector) does not work (cherry picked from commit 94993e9)
So this is another (much faster) version of the circshift! implementation for SparseMatrixCSC. Unfortunately, the code is much less legible than the two-liner of #30300, but
almostcompletelymod
callsIn its current state, for SparseMatrixCSC matrices, it seems comparable to the dense version for small dense matrices, faster for larger ones and much much faster for both sparse and larger ones.
Question1: is it possible to replace theO .= similar(X)
in the first line and just check/assert that allocated memory is compatible (i.e. colptr, rowval and nzval are of the same length and n,m are ==)? I suppose no, because it would be a breaking change. This would avoid a double allocation in the call from circshift. Otherwise, I suppose I could move everything to a helper function and implement circshift instead of relying on the generic one. If this is the way to go, I would replaceO .= similar(X)
by just allocating the memory (no need to copy colptr and rowval)EDIT: I realized that this question is kind of stupid... I just
resize!
colptr, rowval and nzval -- this does not involve allocation if they are already of the right size. It also makes things work when the output has different type than the input -- which didn't work before. I added some tests that would have catched it.Question2:even withoutthere seems to be some very small allocation reported byO .= similar(X)
,@benchmark
... I don't know where it comes from (but I may be missinterpreting).I believe that this is a good improvement wrt to the current situation. The implementation is reasonably straightforward. Thoughts?
Updated benchmarks:
Summary of mean times, raw data here
Updated benchmarks with some
@inbounds
, raw data hereEDIT3: Found the solution to Question2 above: I think it the splat in the call to (dense)
circshift!
(in multidimensional.jl):I thought that the temporary tuple would have been optimized out... can this be solved there?
For the moment, I will just solve it localy. This gives a huge improvement for small matrices!
Updated benchmarks, raw data here