
Fix mul! performance regression for 2x2 and 3x3 matrices #34384

Closed · wants to merge 7 commits

Conversation

@tkf (Member) commented Jan 15, 2020

This is a quick patch to fix JuliaLang/LinearAlgebra.jl#684.

On my laptop, the result of the benchmark @btime mul!($ou, $m1, $m2) from JuliaLang/LinearAlgebra.jl#684 is:

  • Before (07a16d6): 335.059 ns (1 allocation: 16 bytes)
  • After (9274757): 37.268 ns (0 allocations: 0 bytes)

@goggle (Contributor) commented Jan 15, 2020

I can confirm these results with the following benchmark:

using LinearAlgebra
using BenchmarkTools
for ndim = 2:3
    m1 = rand(ComplexF64, ndim, ndim)
    m2 = rand(ComplexF64, ndim, ndim)
    ou = similar(m1)
    @btime mul!($ou, $m1, $m2)
end

on master:

364.115 ns (1 allocation: 16 bytes)
387.926 ns (1 allocation: 16 bytes)

on PR branch:

18.957 ns (0 allocations: 0 bytes)
35.137 ns (0 allocations: 0 bytes)

Version info:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, ivybridge)

@ViralBShah added the linear algebra label · Jan 15, 2020
@daviehh (Contributor) commented Jan 15, 2020

One additional thing: it appears that while this @inline fixes the 3-argument mul!, the constant-propagation issue with the 5-argument mul! may still remain:

using LinearAlgebra
using BenchmarkTools


ndim = 3

al = .5;
bt = .7im;

m1 = rand(ComplexF64,ndim,ndim);
m2 = rand(ComplexF64,ndim,ndim);
ou = rand(ComplexF64,ndim,ndim);

@btime mul!($ou, $m1, $m2);

@btime mul!($ou, $m1, $m2, $al, $bt);

@btime mul!($ou, $m1, $m2, .5, .7im);

41.007 ns (0 allocations: 0 bytes)
334.309 ns (3 allocations: 80 bytes)
54.090 ns (0 allocations: 0 bytes)

It gets a bit better if one also incorporates #29634 (comment):

40.915 ns (0 allocations: 0 bytes)
77.679 ns (1 allocation: 32 bytes)
54.872 ns (0 allocations: 0 bytes)
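For context, here is a hedged illustration of why the interpolated-α/β case above is slow, assuming (as in the LinearAlgebra internals of this era) that MulAddMul's type parameters record isone(α) and iszero(β). When α and β are only known at runtime, the constructed type cannot be fully inferred unless constant propagation kicks in, which forces allocation and dynamic dispatch inside mul!:

using LinearAlgebra: MulAddMul
using InteractiveUtils  # for @code_warntype

# α and β are runtime values here, so the Bool type parameters of the
# returned MulAddMul depend on their *values*, not just their types.
construct(α, β) = MulAddMul(α, β)

@code_warntype construct(0.5, 0.7im)  # return type is not fully concrete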

@KristofferC (Member) commented Jan 15, 2020

How about just copy-pasting two versions of the 2x2 and 3x3 mul, one for the 5-argument form and one for the 3-argument form? Relying on constant propagation seems brittle.
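Concretely, the suggestion would look roughly like the sketch below (hypothetical function names; the real matmul2x2! also handles the transpose/adjoint flags and takes a MulAddMul argument):

# 3-argument version: C = A*B, no alpha/beta handling at all.
function matmul2x2_3arg!(C, A, B)
    A11, A21, A12, A22 = A[1,1], A[2,1], A[1,2], A[2,2]
    B11, B21, B12, B22 = B[1,1], B[2,1], B[1,2], B[2,2]
    C[1,1] = A11*B11 + A12*B21; C[1,2] = A11*B12 + A12*B22
    C[2,1] = A21*B11 + A22*B21; C[2,2] = A21*B12 + A22*B22
    return C
end

# 5-argument version: C = alpha*A*B + beta*C, with the scaling written out.
function matmul2x2_5arg!(C, A, B, alpha, beta)
    A11, A21, A12, A22 = A[1,1], A[2,1], A[1,2], A[2,2]
    B11, B21, B12, B22 = B[1,1], B[2,1], B[1,2], B[2,2]
    C[1,1] = alpha*(A11*B11 + A12*B21) + beta*C[1,1]
    C[1,2] = alpha*(A11*B12 + A12*B22) + beta*C[1,2]
    C[2,1] = alpha*(A21*B11 + A22*B21) + beta*C[2,1]
    C[2,2] = alpha*(A21*B12 + A22*B22) + beta*C[2,2]
    return C
end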

@daviehh (Contributor) commented Jan 15, 2020

The 2x2 and 3x3 mul is called in many places inside the various wrappers, so copy-pasting it everywhere may introduce bloat and make future changes harder to maintain... Alternatively, one could short-circuit MulAddMul as in #34394; would it be easier to improve upon that?

@tkf (Member, Author) commented Jan 15, 2020

The MulAddMul approach was useful for improving performance with structured matrices; see e.g. #29634 (comment).

I think the branching trick from #29634 (comment) is an easier solution.
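For reference, the "branching trick" amounts to enumerating the special values of α and β at the call site, so that each branch constructs a fully concrete object that the kernel can specialize on. A self-contained sketch of the idea, using a hypothetical ScaleAdd stand-in rather than LinearAlgebra's internal MulAddMul:

# ScaleAdd's type parameters record whether alpha == 1 and beta == 0.
struct ScaleAdd{ais1,bis0,TA,TB}
    alpha::TA
    beta::TB
end
ScaleAdd(alpha, beta) =
    ScaleAdd{isone(alpha),iszero(beta),typeof(alpha),typeof(beta)}(alpha, beta)

# Without branching, the concrete type depends on the runtime *values* of
# alpha and beta. Enumerating the cases makes each branch type-stable.
function apply_scaled!(f!, C, A, B, α, β)
    if isone(α) && iszero(β)
        f!(C, A, B, ScaleAdd{true,true,Bool,Bool}(true, false))
    elseif isone(α)
        f!(C, A, B, ScaleAdd{true,false,Bool,typeof(β)}(true, β))
    elseif iszero(β)
        f!(C, A, B, ScaleAdd{false,true,typeof(α),Bool}(α, false))
    else
        f!(C, A, B, ScaleAdd{false,false,typeof(α),typeof(β)}(α, β))
    end
    return C
end

Each branch builds a concrete type known at compile time, so f! specializes on it and the value-dependent instability disappears; this mirrors what the linked comment does with MulAddMul.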

@tkf mentioned this pull request · Jan 16, 2020
@StefanKarpinski (Member) commented:

Inlining all of gemm_wrapper! seems very excessive: all we really want to inline is the part that checks if the 2x2 and 3x3 cases should be called. If you want to inline this, it should be refactored so that the only part that's actually inlined is this:

if mA == 2 && nA == 2 && nB == 2
    return matmul2x2!(C, tA, tB, A, B, MulAddMul(α, β))
end
if mA == 3 && nA == 3 && nB == 3
    return matmul3x3!(C, tA, tB, A, B, MulAddMul(α, β))
end

It should be further streamlined to something like this:

if 2 == size(A, 1) && 2 == size(A, 2) && 2 == size(B, 1) && 2 == size(B, 2)
    return matmul2x2!(C, tA, tB, A, B, MulAddMul(α, β))
elseif 3 == size(A, 1) && 3 == size(A, 2) && 3 == size(B, 1) && 3 == size(B, 2)
    return matmul3x3!(C, tA, tB, A, B, MulAddMul(α, β))
end

After that there can be a non-inlined call to a function that checks dimension compatibility, looks at the stride arguments, etc. and does the relatively slow, expensive GEMM call.
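A hedged sketch of that shape is below; the names _matmul_small! and _gemm_slowpath! are placeholders for illustration, not the actual internal functions:

using LinearAlgebra: MulAddMul, matmul2x2!, matmul3x3!

# Only this thin wrapper is inlined; it contains nothing but the small-size check.
@inline function _matmul_small!(C, tA, tB, A, B, α, β)
    if 2 == size(A, 1) && 2 == size(A, 2) && 2 == size(B, 1) && 2 == size(B, 2)
        return matmul2x2!(C, tA, tB, A, B, MulAddMul(α, β))
    elseif 3 == size(A, 1) && 3 == size(A, 2) && 3 == size(B, 1) && 3 == size(B, 2)
        return matmul3x3!(C, tA, tB, A, B, MulAddMul(α, β))
    end
    return _gemm_slowpath!(C, tA, tB, A, B, α, β)
end

# Deliberately not inlined: dimension/stride validation and the expensive
# BLAS gemm! call would live here, out of the caller's hot path.
@noinline function _gemm_slowpath!(C, tA, tB, A, B, α, β)
    # ... checks and the actual GEMM call ...
    return C
end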

@tkf (Member, Author) commented Jan 31, 2020

I think 1c614e3 implements what you suggested.

I also created another PR, #34601, that does not add a new @inline.

@StefanKarpinski (Member) commented:

Great! Does this version still fix the performance issue?

@KristofferC (Member) commented:

Seems to have some CI problems:

Test Failed at /buildworker/worker/tester_linux64/build/share/julia/stdlib/v1.5/LinearAlgebra/test/matmul.jl:586
  Expression: mul!(copy(C), A, B, α, 1.0) == C
   Evaluated: [NaN NaN NaN; NaN NaN NaN; NaN NaN NaN] == [1.0 1.0 1.0; 1.0 1.0 1.0; 1.0 1.0 1.0]

@StefanKarpinski (Member) commented:

[image: "Nananana" reaction GIF]

Successfully merging this pull request may close: mul! performance regression on master.

6 participants