Fix mul! performance regression for 2x2 and 3x3 matrices #34384
I can confirm these benchmarks:

    using LinearAlgebra
    using BenchmarkTools
    for ndim = 2:3
        m1 = rand(ComplexF64, ndim, ndim)
        m2 = rand(ComplexF64, ndim, ndim)
        ou = similar(m1)
        @btime mul!($ou, $m1, $m2)
    end

on master:
on PR branch:
Version info:
|
One additional thing: it appears while this
a bit better if one also incorporates #29634 (comment)
|
How about just copy pasting two versions of 2x2 and 3x3 mul? One for 5 arg and one for 3 arg? Relying on constant propagation seems brittle. |
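For readers unfamiliar with the two call forms mentioned above, here is a minimal sketch of how 3-arg and 5-arg `mul!` relate, using only the public `LinearAlgebra.mul!` API. The matrices and the values of `α` and `β` are arbitrary illustration values, not taken from this PR.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
B = [5.0 6.0; 7.0 8.0]

# 3-arg form: C3 is overwritten with A*B.
C3 = zeros(2, 2)
mul!(C3, A, B)

# 5-arg form: C5 is overwritten with A*B*α + C5*β.
α, β = 2.0, 0.5            # arbitrary illustration values
C0 = ones(2, 2)
C5 = copy(C0)
mul!(C5, A, B, α, β)

# With α = true and β = false the 5-arg form gives the same result
# as the 3-arg form, regardless of what C1 held before.
C1 = ones(2, 2)
mul!(C1, A, B, true, false)
```

Since both forms end up in the same small-matrix kernels, the question in this thread is whether the compiler can cheaply specialize the shared path for the `α = true, β = false` case.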
There are many places where the 2x2 and 3x3 mul is called inside the various wrappers, so copy-pasting to all may introduce bloat/make it less maintainable for future changes... one can do the short-circuit |
I think the branching trick #29634 (comment) is an easier solution. |
This is suggested by chethega in: JuliaLang#29634 (comment)
Inlining all of

    if mA == 2 && nA == 2 && nB == 2
        return matmul2x2!(C, tA, tB, A, B, MulAddMul(α, β))
    end
    if mA == 3 && nA == 3 && nB == 3
        return matmul3x3!(C, tA, tB, A, B, MulAddMul(α, β))
    end

It should be further streamlined to something like this:

    if 2 == size(A, 1) && 2 == size(A, 2) && 2 == size(B, 1) && 2 == size(B, 2)
        return matmul2x2!(C, tA, tB, A, B, MulAddMul(α, β))
    elseif 3 == size(A, 1) && 3 == size(A, 2) && 3 == size(B, 1) && 3 == size(B, 2)
        return matmul3x3!(C, tA, tB, A, B, MulAddMul(α, β))
    end

After that there can be a non-inlined call to a function that checks dimension compatibility, looks at the stride arguments, etc. and does the relatively slow, expensive GEMM call. |
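The branching idea described above can be sketched in isolation as follows. This is only an illustration under simplifying assumptions: `fast_mul!`, `small_mul2x2!` and `generic_fallback!` are hypothetical names, not the internals of LinearAlgebra, and the sketch ignores the transpose flags (`tA`, `tB`), the `MulAddMul(α, β)` scaling and the 3x3 case.

```julia
using LinearAlgebra

# Hypothetical hand-unrolled 2x2 kernel (not LinearAlgebra's matmul2x2!).
# All inputs are loaded before any store, so C may alias A or B.
function small_mul2x2!(C, A, B)
    @inbounds begin
        a11, a21, a12, a22 = A[1, 1], A[2, 1], A[1, 2], A[2, 2]
        b11, b21, b12, b22 = B[1, 1], B[2, 1], B[1, 2], B[2, 2]
        C[1, 1] = a11 * b11 + a12 * b21
        C[2, 1] = a21 * b11 + a22 * b21
        C[1, 2] = a11 * b12 + a12 * b22
        C[2, 2] = a21 * b12 + a22 * b22
    end
    return C
end

# Slow path kept behind a function barrier: dimension checks, stride
# inspection and the BLAS call all happen inside the generic mul!.
@noinline generic_fallback!(C, A, B) = mul!(C, A, B)

# Only the cheap size test is inlined into the caller.
@inline function fast_mul!(C::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix)
    if size(A) == (2, 2) && size(B) == (2, 2) && size(C) == (2, 2)
        return small_mul2x2!(C, A, B)
    end
    return generic_fallback!(C, A, B)
end

A = [1.0 2.0; 3.0 4.0]
B = [5.0 6.0; 7.0 8.0]
C = similar(A)
fast_mul!(C, A, B)          # takes the unrolled 2x2 branch

A3, B3 = rand(3, 3), rand(3, 3)
C3 = similar(A3)
fast_mul!(C3, A3, B3)       # falls through to the non-inlined generic path
```

The design intent is that callers pay only for a few integer comparisons before reaching the small kernel, while the larger generic code stays out of line and does not bloat each call site.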
Great! Does this version still fix the performance issue? |
Seems to have some CI problems:
|
This is a quick patch to fix JuliaLang/LinearAlgebra.jl#684.
On my laptop, the result of the benchmark
@btime mul!($ou, $m1, $m2)
from JuliaLang/LinearAlgebra.jl#684 is: