CSysMatrix cleanup and performance improvements #700

pcarruscag · 2019-06-05T15:08:58Z

Proposed Changes

I made this in micro steps and the commit messages are fairly detailed, so here I will give only the highlights. Every change is mathematically equivalent.
I claim performance improvements based on the type of cases I run (steady RANS implicit, elasticity). Results may vary, so please give this a try.

Cleanup

Iterative smoothers (Jacobi, ILU, etc.) are now implemented in CSysSolve as one generic method with the same interface as the Krylov solvers.
Dead code was removed from CSysMatrix (as requested in #653, "Increased performance of the discrete adjoint solver by using Templates for Linear Solvers").
Duplicated code, mostly dealing with block-block/vector multiplication, was merged.
Row/column elimination tasks are now implemented in CSysMatrix (as requested in #658, "Fix given displacement BC's of FEA solver and CElasticityMovement").
Linelet uses fewer working variables.
Many methods were made private (the diff of matrix_structure.hpp is not going to be pretty...).
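
To illustrate the row/column elimination item, here is a minimal sketch of imposing a Dirichlet value on a small dense system. It assumes a dense row-major matrix for brevity, whereas the PR implements the task inside CSysMatrix; the helper name `enforce_dirichlet` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Impose x[node] = value on the dense system A x = b (A is row-major, n x n)
// by eliminating the row and the column of that node.
// Hypothetical helper for illustration only, not SU2 code.
void enforce_dirichlet(std::size_t n, std::vector<double>& A,
                       std::vector<double>& b, std::size_t node, double value) {
  assert(A.size() == n * n && b.size() == n && node < n);
  for (std::size_t i = 0; i < n; ++i) {
    if (i == node) continue;
    b[i] -= A[i * n + node] * value;  // move the known term to the right-hand side
    A[i * n + node] = 0.0;            // eliminate the column
    A[node * n + i] = 0.0;            // eliminate the row
  }
  A[node * n + node] = 1.0;           // identity on the diagonal
  b[node] = value;                    // the row now reads x[node] = value
}
```

Eliminating the column as well as the row keeps a symmetric matrix symmetric, which matters for solvers that rely on symmetry.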

Performance

Inlined small methods.
No calls to "Get/SetBlock" when the block location is already known (helps ILU and LU_SGS).
Block inversion done in one go instead of multiple Gaussian eliminations (which used to require repeated upper matrix transformations) (helps Jacobi, Linelet, and ILU).
Inverted ILU diagonal blocks are stored during "build" for use in "compute".
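
As a rough illustration of inverting a block "in one go", the sketch below runs a single Gauss-Jordan pass on a small dense block. It assumes row-major contiguous storage and partial pivoting; `invert_block` is a hypothetical name and this is not the routine from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Invert a dense n x n block (row-major, contiguous) with one Gauss-Jordan
// pass using partial pivoting. Returns false if a zero pivot is found.
// Hypothetical helper for illustration only, not SU2 code.
bool invert_block(std::size_t n, const std::vector<double>& block,
                  std::vector<double>& inverse) {
  assert(block.size() == n * n);
  std::vector<double> a(block);  // working copy, reduced to the identity
  inverse.assign(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i) inverse[i * n + i] = 1.0;

  for (std::size_t col = 0; col < n; ++col) {
    // Pick the largest entry of this column (on or below the diagonal) as pivot.
    std::size_t piv = col;
    for (std::size_t row = col + 1; row < n; ++row)
      if (std::fabs(a[row * n + col]) > std::fabs(a[piv * n + col])) piv = row;
    if (a[piv * n + col] == 0.0) return false;
    if (piv != col) {
      for (std::size_t k = 0; k < n; ++k) {
        std::swap(a[piv * n + k], a[col * n + k]);
        std::swap(inverse[piv * n + k], inverse[col * n + k]);
      }
    }
    // Scale the pivot row so that the pivot becomes 1.
    const double scale = 1.0 / a[col * n + col];
    for (std::size_t k = 0; k < n; ++k) {
      a[col * n + k] *= scale;
      inverse[col * n + k] *= scale;
    }
    // Eliminate this column from every other row, above and below.
    for (std::size_t row = 0; row < n; ++row) {
      if (row == col) continue;
      const double f = a[row * n + col];
      if (f == 0.0) continue;
      for (std::size_t k = 0; k < n; ++k) {
        a[row * n + k] -= f * a[col * n + k];
        inverse[row * n + k] -= f * inverse[col * n + k];
      }
    }
  }
  return true;
}
```

Once the inverse of a diagonal block is stored, applying it later is a plain block matrix-vector product instead of a fresh elimination, which is the idea behind keeping the inverted ILU diagonal blocks from "build" for use in "compute".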

Bugs

Linelet preconditioner was not working for multiple wall markers.

ToDo

I will try to make the MKL optimizations work for the discrete adjoint (I say try because it may require too much creativity with the templates...).
Get some benchmarks (I don't want you to just take my word for it).

Related Work

#650, #653, #658

PR Checklist

I am submitting my contribution to the develop branch.
My contribution generates no new compiler warnings (try with the '-Wall -Wextra -Wno-unused-parameter -Wno-empty-body' compiler flags).
My contribution is commented and consistent with SU2 style.
I have added a test case that demonstrates my contribution, if necessary.

economon

First comments in the review.. very promising so far!

SU2_CFD/src/solver_direct_elasticity.cpp

Common/src/matrix_structure.cpp

Common/src/linear_solvers_structure.cpp

Common/include/matrix_structure.hpp

Common/src/matrix_structure.cpp

pcarruscag

Ok MKL works with the discrete adjoint now, it did get a bit creative...
@economon I added more comments.

Now to find out why the residuals changed for unsteady regressions.

Should I move some files to folders? Or move inlines to the hpp?

Common/src/linear_solvers_structure.cpp

pcarruscag · 2019-06-06T19:28:45Z

Common/include/matrix_structure.inl

+}
+
+#define __MATVECPROD_SIGNATURE__(TYPE,NAME) \
+inline void CSysMatrix<TYPE>::NAME(const TYPE *matrix, const TYPE *vector, TYPE *product)


I hope this does not look too bad, but this MKL stuff does work very well for the discrete adjoint (maybe 10% faster than native code).
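
A later commit in this PR ("implement gemv and gemm in a template instead of using macros") replaces this macro approach. A minimal sketch of what a templated block matrix-vector product might look like, assuming row-major contiguous blocks; `block_mat_vec` and its parameters are illustrative names, not the ones used in SU2:

```cpp
#include <cassert>
#include <cstddef>

// Generic dense block product, product = matrix * vector, for a row-major
// n_rows x n_cols block. One template covers every scalar type, which is
// what it buys over stamping out one definition per type with a macro.
// An optimized (e.g. MKL) path could be added as a specialization for double.
// Hypothetical sketch for illustration only, not SU2 code.
template <class ScalarType>
inline void block_mat_vec(std::size_t n_rows, std::size_t n_cols,
                          const ScalarType* matrix, const ScalarType* vector,
                          ScalarType* product) {
  assert(matrix != nullptr && vector != nullptr && product != nullptr);
  for (std::size_t i = 0; i < n_rows; ++i) {
    ScalarType sum = ScalarType(0);
    for (std::size_t j = 0; j < n_cols; ++j)
      sum += matrix[i * n_cols + j] * vector[j];
    product[i] = sum;
  }
}
```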

talbring · 2019-06-11T12:11:39Z

Thanks @pcarruscag! This is a nice cleanup! 👍

Since you are already at it, you could also separate it into folders/files and remove the .inl files (if everyone agrees with these changes). Adding a linear algebra subfolder might make sense.

pcarruscag · 2019-06-19T15:42:15Z

Thanks @talbring,
I moved the preconditioner and matrix-vector product wrapper classes to separate files. They are so lightweight that I was thinking of leaving them in one file.
I also moved some inlines to the hpp, but I kept the private inlines in the inl file. These are only needed in the cpp, so by including the inl from the cpp (instead of at the bottom of the hpp) we might avoid triggering recompilation of more units when working on implementation details of CSysMatrix.
Finally, I would like to move/rename the larger files in a separate PR; that way it will be easier to track changes.
What do you think?

pcarruscag · 2019-06-19T16:55:47Z

The case with large change of residuals (3 orders) is the rotating cylinders case (sliding interface).
With this PR I get the following flow field at the last iteration:

With develop I get this one at time iter 2:

And I think it is fair to say the case was actually blowing up before:

If I change the linear preconditioner to the ILU the develop results are basically the same as with this PR. Why does the previous LU_SGS diverge and the current doesn't? No idea, maybe for a different limit condition it would go the other way...

talbring · 2019-06-19T17:45:39Z

@pcarruscag Good that you mention it. I saw exactly the same behavior while working on #715. If I disable grid movement for the first zone (where grid velocity is 0 anyway because there is no rotation) it works. In my opinion having grid movement disabled or enabled and specifying zero movement should result in exactly the same convergence behavior ...

pcarruscag · 2019-06-24T13:49:37Z

I tested the MKL integration with the discrete adjoint and everything looks OK. It turns out to be only about 5% faster on a per-iteration basis (i.e. excluding recording times). I had to grab some work from my other ongoing PRs to run a case with reasonable CFL settings, so I am not going to upload files for this test.

@economon , @talbring , if you do not mind me moving the files with significant changes on a separate PR, I think this is ready to go.

economon · 2019-06-24T21:28:48Z

@talbring : as you know, the difference between disabled and active grid movement with 0 velocity is that the former case does not even allocate the memory for the grid velocity at each node, and many conditionals checking for grid movement throughout the solver (fluxes, BCs) are avoided. This was to save memory and effort when grid motion is not needed. However, maybe we now need to change the strategy for multizone problems, which may have both fixed and moving zones (perhaps always active with 0 as default).

I am a little surprised they are not the same as well, but somewhere in the code path there must be an issue with this.. my guess is something related to BC_Fluid_Interface() or the transfer structure when grid movement is active on both sides but has a value of 0 on one of the interfaces.

talbring · 2019-06-25T14:44:20Z

@pcarruscag For me it's fine if you do it in a separate PR.

@economon I also assume that it has something to do with how the interfaces are handled. We definitely have to check it. For now in PR #715 I enabled grid movement (in the config) also in fixed zones.

pcarruscag · 2019-06-25T15:17:35Z

If that case is so sensitive as to be affected by the changes in this PR (it started converging without any change of options), it does not surprise me that other sources of numerical noise can have a similar influence.

economon · 2019-06-27T21:25:31Z

Ok for me to wait for another PR to break up the other files.

Would be good to get some additional reviews on this one from the community if folks have some time.

pcarruscag · 2019-07-02T13:35:05Z

Common/src/matrix_structure.cpp


-          if ((kPoint >= jPoint) && (jPoint < (long)nPointDomain)) {
+          if (kPoint > jPoint) {



This was not a bug, but we overwrite the ij block completely so it is not necessary to update it here.

pcarruscag · 2019-07-12T13:58:31Z

@economon no one seems to have time, let's get going please.

WallyMaier

The changes look good to me.

EduardoMolina

Thanks for this clean PR! I appreciate that you also update the config files in the TestCases folder.
Please revert the travis.yml before merge.
LGTM!

Eduardo

EduardoMolina · 2019-07-13T20:29:01Z

.travis.yml

 before_script:
    # Get the test cases
-    - git clone -b develop https://github.com/su2code/TestCases.git ./TestData
+    - git clone -b feature_refactor_lin_solvers https://github.com/su2code/TestCases.git ./TestData


Please revert before we merge.

EduardoMolina · 2019-07-13T20:29:58Z

Common/include/linear_solvers_structure.hpp

+  unsigned long Smoother_LinSolver(const VectorType & b, VectorType & x, ProductType & mat_vec,
+                                   PrecondType & precond, ScalarType tol, unsigned long m,
+                                   ScalarType *residual, bool monitoring, CConfig *config);
+


Thanks for the comments.
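
For context, the declaration above gives the smoother the same kind of arguments as the Krylov solvers (a matrix-vector product and a preconditioner). A minimal sketch of how such a routine could work, assuming a relaxed Richardson iteration; the names `smoother`, `MatVec`, `Precond` and `omega` are hypothetical and this is not the SU2 implementation:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
// Hypothetical stand-ins for the product and preconditioner wrappers.
using MatVec  = std::function<void(const Vec& x, Vec& Ax)>;
using Precond = std::function<void(const Vec& r, Vec& z)>;

// Relaxed Richardson iteration: x <- x + omega * M^{-1} (b - A x).
// The preconditioner M decides which smoother this is, e.g. a Jacobi
// preconditioner turns it into a (damped) Jacobi smoother.
// Returns the number of iterations performed.
std::size_t smoother(const Vec& b, Vec& x, const MatVec& mat_vec,
                     const Precond& precond, double tol,
                     std::size_t max_iter, double omega = 1.0) {
  assert(b.size() == x.size());
  Vec r(b.size()), z(b.size());
  for (std::size_t it = 0; it < max_iter; ++it) {
    mat_vec(x, r);                      // r = A x
    double norm2 = 0.0;
    for (std::size_t i = 0; i < r.size(); ++i) {
      r[i] = b[i] - r[i];               // r = b - A x
      norm2 += r[i] * r[i];
    }
    if (std::sqrt(norm2) < tol) return it;
    precond(r, z);                      // z = M^{-1} r
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += omega * z[i];
  }
  return max_iter;
}
```

Because the loop only sees the product and the preconditioner through their call interfaces, swapping Jacobi for ILU or another preconditioner changes the smoother without touching this routine.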

pcarruscag · 2019-07-13T20:51:16Z

Thank you @WallyMaier and @EduardoMolina for the reviews.

pcarruscag added 17 commits June 1, 2019 14:56

replace iterative smoothers in CSysMatrix by generic implementation in CSysSolve the type of preconditioner defines the type of smoother

a5ec710

remove some dead or duplicated code from CSysMatrix

ebb485b

inline and re-use Gaussian elimination

e48af55

remove determinant-based routines because:

6d0ec4a

- not in use anywhere
- exponential complexity, and so slower for most common nVar range
- using an incompatible matrix format (** vs. *)

make private what does not need to be public, inline small methods

a310517

more efficient block inversion method, clearer names and consistent argument order of block methods

8669928

const correct block methods, signed type in Gauss elim. to avoid underflow (after having removed an seemingly redundant exit condition)

734dbde

performance tweaks, by when possible:

4953689

- work on dest. vector instead of temporaries
- using the optimized (mkl) block routines
- Gaussian elimination instead of product with inverse

LU SGS optimizations:

ac71229

- avoid use of GetBlock in XXXProduct routines as that requires a search
- work directly on ouput vector to avoid copy operations

make more methods private as they do not output anything

53ca46f

fix linelet bug, multiple wall markers would cause segfault

54d492b

cleanup Linelet and let it run in parallel

2e72f47

- fewer working structures in CSysMatrix
- stl containers and contiguous storage

avoid Get/SetBlock_ILU when the index to the data is already known

886e987

stored inverse ILU diagonal entries for re-use in calls to associated "Compute" method

3f0bb4f

minor cleanup of redundant casts and invM only needs to be sized for nPointDomain

868fe38

row/column elimination method in CSysMatrix (Dirichlet boundaries) applied to CElasticityMovement and CFEASolver

47db3e3

Merge branch 'develop' into feature_refactor_lin_solvers

93448ed

economon reviewed Jun 5, 2019

pcarruscag and others added 8 commits June 6, 2019 00:53

fix EnforceSolutionAtNode (wrong order of operations)

35ceff4

MKL optimizations working with discrete adjoint

2c63bcb

implement gemv and gemm in a template instead of using macros

3c2c256

cleanup

43e3559

Merge branch 'feature_refactor_lin_solvers_local' into feature_refactor_lin_solvers

c00e4ec

Merge branch 'develop' into feature_refactor_lin_solvers

8fea873

Merge branch 'develop' into feature_refactor_lin_solvers

e7698a3

more comments, improve template config, smoother relaxation from config

eb718c7

pcarruscag commented Jun 6, 2019

Merge branch 'develop' into feature_refactor_lin_solvers

981d8bb

Merge branch 'develop' into feature_refactor_lin_solvers

0d07c57

pcarruscag added 3 commits June 19, 2019 12:31

move CMatrixVectorProduct and CPreconditioner to separate files

3876646

move public inline methods to hpp

5087c79

update file diff testcases

6c36734

update residuals for rotating cylinder case (large change as before it was diverging...)

818e143

pcarruscag added 5 commits June 20, 2019 16:07

move inlines of CSysVector to .hpp, delete .inl

3b55b29

condense SMOOTHER_XXX into a single option, getting XXX from the preconditioner option

c476ab2

no need for vector of boolean when computing ILU

9429ae3

move CSysSolve inlines to hpp

56818c5

remove include of .inl file

0362485

pcarruscag mentioned this pull request Jun 27, 2019

Refactoring of convective numerics classes #691

Merged

4 tasks

talbring and others added 2 commits June 28, 2019 11:05

Merge branch 'develop' into feature_refactor_lin_solvers

41e67fa

minor tweak to BuildILU

4a757f8

pcarruscag commented Jul 2, 2019

Merge branch 'develop' into feature_refactor_lin_solvers

d03cd97

WallyMaier reviewed Jul 12, 2019

EduardoMolina approved these changes Jul 13, 2019

revert travis

f273bcf

pcarruscag merged commit 27882a4 into su2code:develop Jul 13, 2019

pcarruscag mentioned this pull request Jul 15, 2019

Move/rename linear algebra files #729

Merged

4 tasks

talbring added the changelog:chore label Nov 7, 2019

