Add templated implementations of BCSR matrix operations #293

A-CGray · 2024-02-13T18:56:53Z

I was looking into implementing transpose multiplication for BCSR matrices when I discovered that the BCSR operations have separate hand-written implementations for each supported block size. I think we could get the same performance for most of the defined operations by templating on the block size, which would drastically reduce the amount of duplicated code.
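Here is a minimal sketch of the idea. It assumes a conventional BCSR layout: a row pointer, block-column indices, and dense row-major blocks. The names bcsrMatVecMult and bcsrMatVecMultDispatch are hypothetical and are not the functions added in this PR. Because bsize is a template parameter, the inner loops have constant trip counts. That is what should let the compiler generate code comparable to a hand-unrolled kernel.

```cpp
#include <cassert>

// Hypothetical sketch (names and storage layout are assumptions, not TACS code):
// y = A*x for a BCSR matrix whose block size is a compile-time constant.
//   rowp : nbrows + 1 offsets into cols/A, one range per block row
//   cols : block-column index of each stored block
//   A    : values, one bsize x bsize row-major block per entry of cols
template <int bsize>
void bcsrMatVecMult(int nbrows, const int *rowp, const int *cols,
                    const double *A, const double *x, double *y) {
  static_assert(bsize > 0, "block size must be positive");
  constexpr int b2 = bsize * bsize;
  for (int i = 0; i < nbrows; i++) {
    double yi[bsize] = {};  // zero-initialized accumulator for block row i
    for (int jp = rowp[i]; jp < rowp[i + 1]; jp++) {
      const double *a = &A[b2 * jp];
      const double *xj = &x[bsize * cols[jp]];
      // The trip counts are compile-time constants, so the compiler is free
      // to fully unroll and vectorize these loops.
      for (int r = 0; r < bsize; r++) {
        for (int c = 0; c < bsize; c++) {
          yi[r] += a[bsize * r + c] * xj[c];
        }
      }
    }
    for (int r = 0; r < bsize; r++) {
      y[bsize * i + r] = yi[r];
    }
  }
}

// One runtime switch replaces the family of per-block-size functions.
void bcsrMatVecMultDispatch(int bsize, int nbrows, const int *rowp,
                            const int *cols, const double *A,
                            const double *x, double *y) {
  switch (bsize) {
    case 1: bcsrMatVecMult<1>(nbrows, rowp, cols, A, x, y); break;
    case 2: bcsrMatVecMult<2>(nbrows, rowp, cols, A, x, y); break;
    case 3: bcsrMatVecMult<3>(nbrows, rowp, cols, A, x, y); break;
    case 4: bcsrMatVecMult<4>(nbrows, rowp, cols, A, x, y); break;
    case 5: bcsrMatVecMult<5>(nbrows, rowp, cols, A, x, y); break;
    case 6: bcsrMatVecMult<6>(nbrows, rowp, cols, A, x, y); break;
    default:
      // A generic (e.g. BLAS-based) fallback would go here.
      assert(false && "no templated kernel for this block size");
  }
}
```

With this structure, supporting another block size means adding one case to the switch rather than writing and maintaining another kernel.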

So far I've added templated implementations of the two basic MatVec operations. I'm opening this PR to get @gjkennedy's opinion on this before I continue converting more functions.
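The PR text doesn't name the two operations. Assume, as a guess, that they are the plain product y = A x and an accumulating form z = y + A x. In that case the second differs from the first only in how the per-row accumulator is seeded. The function name bcsrMatVecMultAdd below is hypothetical, and the BCSR layout (row pointer, block-column indices, row-major blocks) is likewise an assumption.

```cpp
#include <cassert>

// Hypothetical sketch: z = y + A*x for a BCSR matrix with compile-time block size.
// Assumed layout: rowp (nbrows + 1 offsets), cols (block-column indices),
// A (row-major bsize x bsize blocks, one per entry of cols).
// z may alias y, because each block row reads its slice of y before writing z.
template <int bsize>
void bcsrMatVecMultAdd(int nbrows, const int *rowp, const int *cols,
                       const double *A, const double *x, const double *y,
                       double *z) {
  assert(rowp[0] == 0);  // debug-only sanity check on the row pointer
  constexpr int b2 = bsize * bsize;
  for (int i = 0; i < nbrows; i++) {
    double zi[bsize];
    for (int r = 0; r < bsize; r++) {
      zi[r] = y[bsize * i + r];  // seed the accumulator with y instead of zero
    }
    for (int jp = rowp[i]; jp < rowp[i + 1]; jp++) {
      const double *a = &A[b2 * jp];
      const double *xj = &x[bsize * cols[jp]];
      for (int r = 0; r < bsize; r++) {
        for (int c = 0; c < bsize; c++) {
          zi[r] += a[bsize * r + c] * xj[c];
        }
      }
    }
    for (int r = 0; r < bsize; r++) {
      z[bsize * i + r] = zi[r];
    }
  }
}
```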

A-CGray · 2024-02-23T22:41:21Z

Timing suggests that the templated implementation of the MatVec product is about as fast as the handwritten version. Its best time is slightly higher, but the mean times are essentially identical. Both are roughly twice as fast as the existing generic implementation that uses BLAS.

The timings below are for MatVec products computed on a single core, using the stiffness matrix from one of my wingbox cases with ~420 kDOF. I'm assuming parallelisation won't change the comparison, since the code being changed only affects the local block operations.

All builds were compiled with EXTRA_CC_FLAGS = -fPIC -O3 -march=core-avx2 -mtune=core-avx2 -Wall

Using generic implementation (BCSRMatVecMult):

Timed for: 278 loops, best of 5
    time per loop: best=17.039 ms, mean=17.388 ± 0.3 ms

Using blocksize=6 specific implementation (BCSRMatVecMult6):

Timed for: 516 loops, best of 5
    time per loop: best=8.406 ms, mean=9.354 ± 1.0 ms

Using templated function (BCSRBlockMatVecMult<6>):

Timed for: 522 loops, best of 5
    time per loop: best=8.784 ms, mean=9.342 ± 0.2 ms
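The numbers above appear to come from a timeit-style "N loops, best of 5" measurement. The C++ helper below is only a hypothetical sketch of that methodology, not the script that produced these results. The name bestTimePerLoopMs and its default of 5 repeats are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical benchmarking helper: runs `repeats` batches of `loops` calls to f
// and returns the best (smallest) mean time per call in milliseconds. Taking the
// minimum over batches filters out slowdowns caused by other activity on the machine.
double bestTimePerLoopMs(const std::function<void()> &f, int loops,
                         int repeats = 5) {
  assert(loops > 0 && repeats > 0);
  double best = std::numeric_limits<double>::infinity();
  for (int rep = 0; rep < repeats; rep++) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < loops; i++) {
      f();
    }
    auto t1 = std::chrono::steady_clock::now();
    double perLoop =
        std::chrono::duration<double, std::milli>(t1 - t0).count() / loops;
    best = std::min(best, perLoop);
  }
  return best;
}
```

To time a kernel, wrap the MatVec call in a lambda and pass it as f. For kernels taking several milliseconds, the std::function call overhead is negligible.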

Add templated implementations of bmult bmultadd

8e1c4f5
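The commit message names bmult and bmultadd but doesn't show their signatures. Suppose, as an assumption, that they are dense block-times-block kernels (C = A B and C += A B on bsize x bsize blocks). Templated versions could then look like the following. blockMult and blockMultAdd are hypothetical names.

```cpp
#include <cassert>

// Hypothetical sketch: dense block products with a compile-time block size.
// Blocks are stored row-major as bsize*bsize contiguous doubles.

// C = A*B. C must not alias A or B, since C is overwritten while they are read.
template <int bsize>
void blockMult(const double *A, const double *B, double *C) {
  assert(C != A && C != B);
  for (int i = 0; i < bsize; i++) {
    for (int j = 0; j < bsize; j++) {
      double s = 0.0;
      for (int k = 0; k < bsize; k++) {
        s += A[bsize * i + k] * B[bsize * k + j];
      }
      C[bsize * i + j] = s;
    }
  }
}

// C += A*B, with the same aliasing restriction.
template <int bsize>
void blockMultAdd(const double *A, const double *B, double *C) {
  assert(C != A && C != B);
  for (int i = 0; i < bsize; i++) {
    for (int j = 0; j < bsize; j++) {
      double s = 0.0;
      for (int k = 0; k < bsize; k++) {
        s += A[bsize * i + k] * B[bsize * k + j];
      }
      C[bsize * i + j] += s;
    }
  }
}
```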

A-CGray requested a review from gjkennedy February 13, 2024 18:56
