Implement multiply-add interface in LinearAlgebra #29634

tkf · 2018-10-13T23:46:24Z

I started working on a "multiply-add" interface for computing C = αAB + βC in LinearAlgebra. The details of the API are currently being discussed in JuliaLang/LinearAlgebra.jl#473. I went ahead and implemented the feature under the name addmul!. I use this name only because it is easy to switch to another one with a simple search-and-replace. Let's do (and finish 😄) the naming discussion in JuliaLang/LinearAlgebra.jl#473.

Remaining TODOs:

decide and implement NaN behavior

EDIT: I updated the summary in place because some of the comments in it had become irrelevant. (You can click edited ▼ above to see the old version.)

jebej · 2018-10-15T13:46:22Z

Wouldn't the two functions defined do exactly the same thing for the default arguments of alpha and beta (which should be true and false) since gemm_wrapper! already has those defaults? If so, what would be the point, except for those two functions to have a different name?

tkf · 2018-10-15T19:12:22Z

The idea is that the ternary mul! and muladd! compute C = A * B and C += A * B, respectively, which makes muladd! analogous to the scalar muladd. Of course, the exact signature may differ from the one I wrote above. Maybe only the 5-ary muladd! will be defined. Let's keep the API discussion in JuliaLang/LinearAlgebra.jl#473.
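For readers skimming the thread, the difference between overwriting and accumulating can be sketched with the 5-argument mul!(C, A, B, α, β) spelling that the later timeline entries refer to (#32900). This is a minimal sketch, assuming Julia 1.3 or later; the matrices are made-up sample data and are not taken from the PR.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
B = [5.0 6.0; 7.0 8.0]

# 3-argument form: C is overwritten with A * B.
C3 = ones(2, 2)
mul!(C3, A, B)

# 5-argument form: C is updated in place to A * B * α + C * β.
C0 = ones(2, 2)          # keep a copy of the initial contents
C5 = copy(C0)
α, β = 2.0, 3.0
mul!(C5, A, B, α, β)

# With α = 1 and β = 1 this is the "C += A * B" case from the comment above.
Cacc = copy(C0)
mul!(Cacc, A, B, true, true)
```

Passing `true`/`false` for the scalars mirrors the defaults mentioned earlier in the thread.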


andreasnoack · 2019-08-14T21:31:51Z

I'd forgotten that. It's quite rare that there are no changes detected at all.

Then let's get this merged so that we can finally start using the feature. It's been in demand for quite some time.

JeffBezanson · 2019-08-16T11:46:56Z

The tests for this are failing sometimes.

fredrikekre · 2019-08-16T12:22:00Z

See JuliaLang/LinearAlgebra.jl#655

Jutho · 2019-10-30T14:09:56Z

I hope that by mentioning the following issue/inconsistency here, the relevant people are directly informed:

The function

mul!(C::StridedMatrix{T}, A::StridedVecOrMat{T}, B::StridedVecOrMat{T}, α::Number, β::Number)

tries to call BLAS/gemm whenever the scalars can be converted to type T.

However, for e.g. adjoint A and complex eltype, there is

mul!(C::StridedMatrix{T}, adjA::Adjoint{<:Any,<:StridedVecOrMat{T}}, B::StridedVecOrMat{T}, alpha::Union{T, Bool}, beta::Union{T, Bool}) where {T<:BlasComplex}

which requires the scalars to be of type T or of type Bool, while other values of the scalars (e.g. I wanted to use alpha=-1) fall back to

mul!(C::AbstractMatrix, adjA::Adjoint{<:Any,<:AbstractVecOrMat}, B::AbstractVecOrMat, alpha::Number, beta::Number)

which directly calls

generic_matmatmul!

instead of BLAS/gemm. The same seems to be true for other combinations of adjoint and transpose.
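Why alpha = -1 misses the specialized method can be checked directly against the `Union{T, Bool}` annotation quoted above. The snippet below is only a sketch of that type check; `fits_blas_signature` is a hypothetical helper introduced for illustration, and the method tables in later Julia versions may differ from the ones quoted in this comment.

```julia
using LinearAlgebra

T = ComplexF64                      # one of the BlasComplex element types

# Hypothetical helper: does a scalar satisfy the `Union{T, Bool}` annotation?
fits_blas_signature(x, ::Type{S}) where {S} = x isa Union{S, Bool}

int_alpha       = -1                # an Int, as in the use case described above
converted_alpha = convert(T, -1)    # the same value, converted to the eltype

fits_int       = fits_blas_signature(int_alpha, T)        # Int is neither T nor Bool
fits_converted = fits_blas_signature(converted_alpha, T)  # now of type T
fits_bool      = fits_blas_signature(true, T)             # Bool is always accepted
```

Under the signatures quoted above, converting the scalars to the element type before the call is therefore one way to stay on the specialized method.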

Jutho · 2019-10-30T14:27:15Z

Also, I noticed that in code that performs many small matrix multiplications, the construction of the MulAddMul object takes a significant fraction of the time. I first assumed this was an issue with profiling, but replacing the mul! calls with the corresponding BLAS.gemm! calls confirms that these profiles are telling the truth.

Here is an example profile (image not preserved):

tkf · 2019-10-30T21:15:31Z

As for Adjoint/Transpose, I think we just need something like #33229 (as you've already discovered).

MulAddMul construction takes time when alpha and beta are not compile-time constants. An MWE is

using LinearAlgebra, Profile

N = 10
A = randn(N, N)
B = randn(N, N)
C = similar(A)

function demo(C, A, B, alpha, beta, n)
    for _ in 1:n
        mul!(C, A, B, alpha, beta)
    end
end

demo(C, A, B, 1, 0, 1)
@time @profile demo(C, A, B, 1, 0, 10^5)

while MulAddMul vanishes from the profile if I do

function demo(C, A, B, n)
    for _ in 1:n
        mul!(C, A, B, 1, 0)
    end
end

Maybe something like #29634 (comment) would be the solution? Maybe we can also add an early branch in mul! to directly call BLAS.gemm! without constructing MulAddMul?

Jutho · 2019-10-30T21:25:05Z

Thanks for the quick response and the pointers, @tkf.

For the MulAddMul issue, I had constant values in the code, but possibly a few levels higher up the call chain, so I don't know whether they were propagated correctly. It also seemed to be worse for the adjoint(A)*B case than for the A*B case, which may be related to the first issue. I can profile further if needed, but for now I have worked around these issues by calling gemm! directly, and it seems you are already aware of them. That's also why I didn't open new issues right away.
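The workaround described here can be sketched as follows. This is not the code from the affected project: `adjmul_blas!` is a hypothetical helper name and the matrices are sample data. It relies only on `BLAS.gemm!(tA, tB, alpha, A, B, beta, C)`, where the character `'C'` requests the conjugate transpose of the first factor.

```julia
using LinearAlgebra

# Hypothetical helper: C = alpha * A' * B + beta * C via a direct BLAS call.
# The scalars are converted to the element type so that gemm! accepts them.
function adjmul_blas!(C::Matrix{T}, A::Matrix{T}, B::Matrix{T},
                      alpha::Number, beta::Number) where {T<:LinearAlgebra.BlasFloat}
    BLAS.gemm!('C', 'N', convert(T, alpha), A, B, convert(T, beta), C)
    return C
end

A  = ComplexF64[1+2im 3-1im; 0+1im 2+0im]
B  = ComplexF64[1 2; 3 4]
C0 = ones(ComplexF64, 2, 2)

C = copy(C0)
adjmul_blas!(C, A, B, -1, 1)    # integer scalars, as in the alpha = -1 use case
```

The result should agree with `mul!(C, A', B, -1, 1)`; the only difference is which code path performs the multiplication.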

tkf · 2019-10-31T05:20:39Z

Yeah, propagating constants for long call chains is not super robust... FYI, one option may be to use StaticNumbers.jl. This calls gemm! and compiles away MulAddMul construction:

using StaticNumbers
demo(C, A, B, static(1), static(0), 1)
@time @profile demo(C, A, B, static(1), static(0), 10^5)

I also tried #29634 (comment) and it works (i.e., MulAddMul vanishes from the profile). But, IIUC, it works by duplicating the function body. I'm not sure we want that code bloat, especially for the methods implemented in pure Julia.
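To make the trade-off concrete, here is a minimal sketch of the "branch manually in the constructor" idea using a stand-in type. `AlphaBeta` and its layout are assumptions made for illustration and are not the actual internal MulAddMul definition.

```julia
# Hypothetical stand-in for the internal type: the two Bool type parameters
# record whether alpha is one and whether beta is zero.
struct AlphaBeta{ais1,bis0,TA,TB}
    alpha::TA
    beta::TB
end

# Explicit branches: every `return` yields a concrete type, so the inferred
# result is a small union of four types instead of a type that depends on
# runtime values.
function AlphaBeta(alpha::TA, beta::TB) where {TA,TB}
    if isone(alpha)
        if iszero(beta)
            return AlphaBeta{true,true,TA,TB}(alpha, beta)
        else
            return AlphaBeta{true,false,TA,TB}(alpha, beta)
        end
    else
        if iszero(beta)
            return AlphaBeta{false,true,TA,TB}(alpha, beta)
        else
            return AlphaBeta{false,false,TA,TB}(alpha, beta)
        end
    end
end

# Combine a freshly computed product entry `x` with the old entry `y` of C.
(::AlphaBeta{true,true})(x, _)   = x
(p::AlphaBeta{false,true})(x, _) = x * p.alpha
(p::AlphaBeta{true,false})(x, y) = x + y * p.beta
(p::AlphaBeta{false,false})(x, y) = x * p.alpha + y * p.beta
```

Each of the four branches can lead to a separately compiled copy of whatever kernel consumes the object, which is the code duplication mentioned above.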

* Construct MulAddMul at gemm_wrapper! call sites
* Add branches manually in MulAddMul constructor
  This is suggested by chethega in: #29634 (comment)
* Update stdlib/LinearAlgebra/src/generic.jl

Co-authored-by: Kristoffer Carlsson <kristoffer.carlsson@chalmers.se>

(cherry picked from commit 2da42e0)

tkf mentioned this pull request Oct 13, 2018

Matrix Multiplication API JuliaLang/LinearAlgebra.jl#473

Closed

chriscoey mentioned this pull request Oct 14, 2018

use in-place mul+add (GEMM style) when Julia #29634 is merged jump-dev/Hypatia.jl#87

Closed

dkarrasch mentioned this pull request Oct 29, 2018

avoid allocation for length-1 linear combinations JuliaLinearAlgebra/LinearMaps.jl#34

Merged

tkf added 6 commits November 18, 2018 14:53

Multiply-add interface for BLAS.gemm!

9cef41a

Multiply-add interface for BLAS.syrk!

4013d8d

Multiply-add interface for BLAS.herk!

93aa9b6

Multiply-add interface for gemv!

95deaaf

Fix UndefRefError from C[i,j]

302396b

It may not be defined for Matrix{BigFloat}.

Do not assume *(::Bool, ::eltype(C)) exists

3b72e3b

tkf force-pushed the matmuladd branch from dd9bb8e to cad8082 Compare November 19, 2018 00:27

tkf added 12 commits November 18, 2018 21:28

Implement mul! in terms of addmul!

a27f5f5

Test multiply-add interface

04333aa

Document addmul!

ae38931

Use lmul! for beta * C; eltype may not be commutative

87b41b8

Add _lmul_or_fill!

8f41412

Add multiply-add interface for symmetric matrices

b97af31

Add multiply-add interface for Number and UniformScaling

cd169b3

Add multiply-add interface for diagonal matrices

a655ff8

Add multiply-add interface for bi- and tri-diagonal matrices

84f009b

Add multiply-add interface for triangular matrices

b0ab7b2

Test multiply-add interface in test/generic.jl

92a7e86

Fix addmul!(C, s::Number, X, alpha, beta)

00cfc91

and addmul!(C, X, s::Number, alpha, beta)

tkf force-pushed the matmuladd branch from cad8082 to 00cfc91 Compare November 19, 2018 09:35

tkf added 4 commits November 19, 2018 18:32

Special-case alpha=1 beta=0 using type parameter

ed6821f

Test multiply-add interface in test/uniformscaling.jl

602fb7b

Test multiply-add interface in test/diagonal.jl

c33767b

Use addmul! in SparseArrays

1309cc7

tkf force-pushed the matmuladd branch from 39b9a39 to 1309cc7 Compare November 20, 2018 05:44

andreasnoack merged commit 7615d4c into JuliaLang:master Aug 14, 2019

This was referenced Aug 14, 2019

Add news and compat for mul!(C, A, B, α, β) #32900

Merged

5-arg mul! bug fixes #32901

Merged

KristofferC added and then removed the "needs news" label (A NEWS entry is required for this change) Aug 15, 2019

tkf mentioned this pull request Aug 16, 2019

Refactoring: add at-sc macro to improve readability of mul! code #32922

Closed

dkarrasch mentioned this pull request Nov 1, 2019

Dispatch even more to BLAS #33743

Merged

This was referenced Dec 4, 2019

mul! performance regression on master JuliaLang/LinearAlgebra.jl#684

Closed

Backports release 1.3.1 #33979

Merged

daviehh mentioned this pull request Jan 15, 2020

Fix mul! performance regression for 2x2 and 3x3 matrices #34384

Closed

tkf mentioned this pull request Jan 16, 2020

alternative fix for mul! #34394

Closed

tkf added a commit to tkf/julia that referenced this pull request Jan 29, 2020

Add branches manually in MulAddMul constructor

ec824ad

This is suggested by chethega in: JuliaLang#29634 (comment)

tkf added a commit to tkf/julia that referenced this pull request Jan 31, 2020

Add branches manually in MulAddMul constructor

5d8f7af

This is suggested by chethega in: JuliaLang#29634 (comment)

blegat mentioned this pull request Dec 1, 2021

Fix ambiguity with new mul! methods in Julia v1.7 jump-dev/MutableArithmetics.jl#127

Merged

amilsted mentioned this pull request Oct 4, 2022

dense-matrix mul!(C, A, B, alpha, beta) allocates #46865

Closed
