Skip to content

Conversation

@pcarruscag
Copy link
Member

Proposed Changes

I made this in micro steps and the commit messages are fairly detailed, so here I will give only the highlights. Every change is mathematically equivalent.
I claim performance improvements based on the type of cases I run (steady RANS implicit, elasticity), results may vary, so please give this a try.

Cleanup

Performance

  • Inlined small methods.
  • No calls to "Get/SetBlock" when the block location is already known (helps ILU and LU_SGS).
  • Block inversion done in one go instead of multiple Gaussian eliminations (which used to require repeated upper matrix transformations) (helps Jacobi, Linelet, and ILU).
  • Inverted ILU diagonal blocks are stored during "build" for use in "compute".

Bugs

  • Linelet preconditioner was not working for multiple wall markers.

ToDo

  • I will try to make the MKL optimizations work for the discrete adjoint (I say try because it may require too much creativity with the templates...).
  • Get some benchmarks (I don't want you to just take my word for it).

Related Work

#650, #653, #658

PR Checklist

  • I am submitting my contribution to the develop branch.
  • My contribution generates no new compiler warnings (try with the '-Wall -Wextra -Wno-unused-parameter -Wno-empty-body' compiler flags).
  • My contribution is commented and consistent with SU2 style.
  • I have added a test case that demonstrates my contribution, if necessary.

pcarruscag added 17 commits June 1, 2019 14:56
…n CSysSolve

the type of preconditioner defines the type of smoother
- not in use anywhere
- exponential complexity, and so slower for most common nVar range
- using an incompatible matrix format (** vs. *)
…rflow

(after having removed an seemingly redundant exit condition)
- work on dest. vector instead of temporaries
- using the optimized (mkl) block routines
- Gaussian elimination instead of product with inverse
- avoid use of GetBlock in XXXProduct routines as that requires a search
- work directly on ouput vector to avoid copy operations
- fewer working structures in CSysMatrix
- stl containers and contiguous storage
Copy link
Member

@economon economon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First comments in the review.. very promising so far!

Copy link
Member Author

@pcarruscag pcarruscag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok MKL works with the discrete adjoint now, it did get a bit creative...
@economon I added more comments.

Now to find out why the residuals changed for unsteady regressions.

Should I move some files to folders? Or move inlines to the hpp?

}

#define __MATVECPROD_SIGNATURE__(TYPE,NAME) \
inline void CSysMatrix<TYPE>::NAME(const TYPE *matrix, const TYPE *vector, TYPE *product)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this does not look too bad, but this MKL stuff does work very well for the discrete adjoint (maybe 10% faster than native code).

@talbring
Copy link
Member

Thanks @pcarruscag! This is a nice cleanup! 👍

So while you are already at it, you can already separate it in folders/files and remove the *inl (if all agree with these changes). Maybe adding a linear algebra subfolder makes sense.

@pcarruscag
Copy link
Member Author

Thanks @talbring,
I moved the preconditioner and matrix-vector product wrapper classes to separate files, these are so light weight that I was thinking of leaving them in one file.
I also moved some inlines to the hpp but I kept the private inlines in the inl file, these are only needed in the cpp and so by including the inl from the cpp (instead of bottom of hpp) we might avoid triggering recompilation of more units when working on implementation details of CSysMatrix.
Finally I would like to move/rename the larger files on a separate PR, that way it will be easier to track changes.
What do you think?

@pcarruscag
Copy link
Member Author

The case with large change of residuals (3 orders) is the rotating cylinders case (sliding interface).
With this PR I get the following flow field at the last iteration:
image
With develop I get this one at time iter 2:
image
And I think it is fair to say the case was actually blowing up before:
image
If I change the linear preconditioner to the ILU the develop results are basically the same as with this PR. Why does the previous LU_SGS diverge and the current doesn't? No idea, maybe for a different limit condition it would go the other way...

@talbring
Copy link
Member

@pcarruscag Good that you mention it. I saw exactly the same behavior while working on #715. If I disable grid movement for the first zone (where grid velocity is 0 anyway because there is no rotation) it works. In my opinion having grid movement disabled or enabled and specifying zero movement should result in exactly the same convergence behavior ...

@pcarruscag
Copy link
Member Author

I tested the MKL integration with the discrete adjoint, and everything looks ok, turns out to be only about 5% faster on a per iteration basis (i.e. excluding recording times). I had to grab some work from my other ongoing PR's to run a case with reasonable CFL settings so I am not going to upload files for this test.

@economon , @talbring , if you do not mind me moving the files with significant changes on a separate PR, I think this is ready to go.

@economon
Copy link
Member

@talbring : as you know, the difference between disabled and active grid movement with 0 velocity is that the former case does not even allocate the memory for the grid velocity at each node, and many conditionals checking for grid movement throughout the solver (fluxes, BCs) are avoided. This was to save memory and effort when grid motion is not needed, however, maybe we need to now change the strategy for multizone problems which may have both fixed and moving zones (perhaps always active with 0 as default).

I am a little surprised they are not the same as well, but somewhere in the code path there must be an issue with this.. my guess is something related to BC_Fluid_Interface() or the transfer structure when grid movement is active on both sides but has a value of 0 on one of the interfaces.

@talbring
Copy link
Member

@pcarruscag For me its fine if you do it in a separate PR.

@economon I also assume that is has something to do with how the interfaces are handled. We definitely have to check it. For now in PR #715 I enabled grid movement (in the config) also in fixed zones.

@pcarruscag
Copy link
Member Author

If that case is so sensitive to be affected by the changes in this PR (it started converging without any change of options), it does not surprise me that other sources of numerical noise can have a similar influence.

@economon
Copy link
Member

Ok for me to wait for another PR to break the other files.

Would be good to get some additional reviews on this one from the community if folks have some time.


if ((kPoint >= jPoint) && (jPoint < (long)nPointDomain)) {
if (kPoint > jPoint) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not a bug, but we overwrite the ij block completely so it is not necessary to update it here.

@pcarruscag
Copy link
Member Author

@economon no one seems to have time, let's get going please.

Copy link
Contributor

@WallyMaier WallyMaier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me.

Copy link
Contributor

@EduardoMolina EduardoMolina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this clean PR! I appreciate that you also update the config files in the TestCases folder.
Please revert the travis.yml before merge.
LGTM!

Eduardo

.travis.yml Outdated
before_script:
# Get the test cases
- git clone -b develop https://github.com/su2code/TestCases.git ./TestData
- git clone -b feature_refactor_lin_solvers https://github.com/su2code/TestCases.git ./TestData
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert before we merge.

unsigned long Smoother_LinSolver(const VectorType & b, VectorType & x, ProductType & mat_vec,
PrecondType & precond, ScalarType tol, unsigned long m,
ScalarType *residual, bool monitoring, CConfig *config);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments.

@pcarruscag
Copy link
Member Author

Thank you @WallyMaier and @EduardoMolina for the reviews.

@pcarruscag pcarruscag merged commit 27882a4 into su2code:develop Jul 13, 2019
@pcarruscag pcarruscag mentioned this pull request Jul 15, 2019
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants