Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added 3- and 5- argument LinearAlgebra.mul! and copy functions #37

Merged
merged 10 commits into from
Mar 7, 2020
Merged

Added 3- and 5- argument LinearAlgebra.mul! and copy functions #37

merged 10 commits into from
Mar 7, 2020

Conversation

matteoacrossi
Copy link
Contributor

No description provided.

@codecov
Copy link

codecov bot commented Mar 5, 2020

Codecov Report

Merging #37 into master will increase coverage by 0.67%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #37      +/-   ##
==========================================
+ Coverage    94.9%   95.58%   +0.67%     
==========================================
  Files           4        4              
  Lines         157      181      +24     
==========================================
+ Hits          149      173      +24     
  Misses          8        8
Impacted Files Coverage Δ
src/blockdiagonal.jl 83.72% <100%> (+2.63%) ⬆️
src/linalg.jl 98.03% <100%> (+0.6%) ⬆️
src/base_maths.jl 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 234a99c...ab7f8fe. Read the comment docs.

Copy link
Contributor

@nickrobinson251 nickrobinson251 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this!

Please can you post what the current behaviour of mul! with 3 BlockDiagonals is? Does it error, get the wrong results, or is it just not as performant as possible?
If the last one, please can you post some benchmarks indicative of the speed up from the change?

src/blockdiagonal.jl Outdated Show resolved Hide resolved
src/blockdiagonal.jl Outdated Show resolved Hide resolved
src/linalg.jl Outdated Show resolved Hide resolved
@matteoacrossi
Copy link
Contributor Author

Currently mul! falls back to the standard LinearAlgebra one and I think it converts the BlockDiagonal matrices to regular Matrix. If the matrices are both BlockDiagonal and have the same block structure, it is much faster to perform a block-by-block multiplication:

using BlockDiagonals
using LinearAlgebra
using BenchmarkTools

# 3-Argument mul!
mul2!(C::BlockDiagonal, A::BlockDiagonal, B::BlockDiagonal) = mul2!(C, A, B, true, false)

# 5-Argument mul!
function mul2!(C::BlockDiagonal, A::BlockDiagonal, B::BlockDiagonal, α::Number, β::Number)
    BlockDiagonals.isequal_blocksizes(A, B) || throw(DimensionMismatch("A and B have different block sizes"))
    BlockDiagonals.isequal_blocksizes(C, A) || throw(DimensionMismatch("C has incompatible block sizes"))
    for i in 1:length(blocks(C))
        LinearAlgebra.mul!(C.blocks[i], A.blocks[i], B.blocks[i], α, β)
    end
    return C
end

N1, N2, N3 = 30, 40, 50
b1 = BlockDiagonal([rand(N1, N1), rand(N2, N2), rand(N3, N3)])
b2 = BlockDiagonal([rand(N1, N1), rand(N2, N2), rand(N3, N3)])
b3 = similar(b1);

This is with the current master of BlockDiagonals

@benchmark mul!(b3, b1, b2)

BenchmarkTools.Trial: 
  memory estimate:  39.06 MiB
  allocs estimate:  388823
  --------------
  minimum time:     19.184 ms (0.00% GC)
  median time:      28.789 ms (11.61% GC)
  mean time:        29.916 ms (10.99% GC)
  maximum time:     52.644 ms (11.49% GC)
  --------------
  samples:          168
  evals/sample:     1

This is with the specialized function

@benchmark mul2!(b3, b1, b2)

BenchmarkTools.Trial: 
  memory estimate:  768 bytes
  allocs estimate:  17
  --------------
  minimum time:     20.210 μs (0.00% GC)
  median time:      21.957 μs (0.00% GC)
  mean time:        27.255 μs (0.00% GC)
  maximum time:     834.732 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

@matteoacrossi
Copy link
Contributor Author

But I do realize now that maybe the specialized code should fall back to the standard matrix multiplication if the block sizes don't match, but the overall matrix sizes do, instead of throwing an error.

@nickrobinson251 nickrobinson251 mentioned this pull request Mar 5, 2020
@matteoacrossi
Copy link
Contributor Author

I implemented the proposed changes. After all, the mul!(C::BlockDiagonal, A::BlockDiagonal, B::BlockDiagonal) should error if the block structures do not match. If C is not BlockDiagonal, then the method automatically falls back to regular matrix multiplication (although there well be a lot of allocations).

Copy link
Contributor

@nickrobinson251 nickrobinson251 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks a lot.

I bunch of very minor style suggestions.

Please can you bump the version in the Project.toml? Then I will release the new version as soon as this is merged :) Since this is just a performance improvement, and not breaking, we can just bump the last version number. Thanks again!

src/linalg.jl Outdated Show resolved Hide resolved
src/linalg.jl Outdated Show resolved Hide resolved
src/linalg.jl Outdated Show resolved Hide resolved
src/linalg.jl Outdated Show resolved Hide resolved
test/linalg.jl Outdated Show resolved Hide resolved
test/linalg.jl Outdated Show resolved Hide resolved
src/linalg.jl Outdated Show resolved Hide resolved
src/blockdiagonal.jl Outdated Show resolved Hide resolved
src/blockdiagonal.jl Outdated Show resolved Hide resolved
matteoacrossi and others added 2 commits March 7, 2020 19:25
Co-Authored-By: Nick Robinson <npr251@gmail.com>
@nickrobinson251 nickrobinson251 merged commit 9fdcb28 into JuliaArrays:master Mar 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants