Quasi-Newton convergence acceleration/stabilization of discrete adjoints #1020
Conversation
```cpp
 * \file CQuasiNewtonDriver.hpp
 * \brief Implements a method to accelerate and stabilize the convergence
 * of fixed-point iterations. The history of past iterates is used to compute
 * a least-squares approximation of the inverse of the Jacobian, which is then
 * used to correct the natural solution of the fixed-point iteration.
 * \note Based on the IQN-ILS method, see DOI 10.1007/s11831-013-9085-5 and
 * references therein.
```
All the implementation, explanation and references are in this file.
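For readers who do not want to open the reference right away, this is roughly the scheme (my notation, condensed from the IQN-ILS literature; H stands for the fixed-point operator, i.e. one adjoint iteration):
```latex
% One fixed-point step and its residual:
\tilde{x}^k = H(x^k), \qquad r^k = \tilde{x}^k - x^k
% Differences of past residuals/iterates w.r.t. the current ones:
V = [\, r^0 - r^k \;\cdots\; r^{k-1} - r^k \,], \qquad
W = [\, \tilde{x}^0 - \tilde{x}^k \;\cdots\; \tilde{x}^{k-1} - \tilde{x}^k \,]
% Least-squares coefficients and the corrected iterate:
\alpha = \operatorname*{arg\,min}_{\alpha} \| V\alpha + r^k \|_2, \qquad
x^{k+1} = \tilde{x}^k + W\alpha
```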
```cpp
/* DESCRIPTION: Number of samples for quasi-Newton methods. */
addUnsignedShortOption("QUASI_NEWTON_NUM_SAMPLES", nQuasiNewtonSamples, 0);
```
The feature is activated by setting a number of samples greater than 1.
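For example, in the config file (20 samples is the number suggested in the PR description below; 0 or 1 leaves the feature off):
```
% Quasi-Newton (IQN-ILS) stabilization of the discrete adjoint
QUASI_NEWTON_NUM_SAMPLES= 20
```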
```cpp
if (QNDriver.size()) {
  /*--- Feed the result of the latest fixed-point (adjoint) iteration
   * to the driver, then overwrite it with the corrected iterate. ---*/
  GetAllSolutions(ZONE_0, true, QNDriver.FPresult());
  SetAllSolutions(ZONE_0, true, QNDriver.compute());
}
```
And this is basically how it is used after running one discrete adjoint iteration.
```cpp
void shiftHistoryLeft() {
  for (Index i = 1; i < X.size(); ++i) {
    /*--- Swap instead of moving to re-use the memory of the first sample.
     * This is why X and R are not stored as contiguous blocks of mem. ---*/
    std::swap(X[i-1], X[i]);
    std::swap(R[i-1], R[i]);
  }
}
```
For the benefit of those less familiar with C++11: there are these things called move operations. This operation of shifting the matrices to the left is nearly zero cost, because swapping two su2matrices only swaps their pointers and sizes, it does not copy the actual data.
That is why, in this case, X and R are not stored as a single chunk of memory.
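A minimal sketch of why the swap is cheap (the Matrix type here is a hypothetical stand-in for su2matrix, not the actual class):
```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SU2 matrix: it owns a heap buffer.
struct Matrix {
  double* data = nullptr;
  std::size_t rows = 0, cols = 0;
};

// std::swap exchanges the three members (one pointer, two sizes);
// the heap buffers themselves are never touched, so each swap is O(1)
// no matter how large the matrices are.
void shiftLeft(std::vector<Matrix>& hist) {
  for (std::size_t i = 1; i < hist.size(); ++i)
    std::swap(hist[i-1], hist[i]);
}
```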
Alright, swapping them X.size() times is in fact the obvious replacement for std::deque... :)
```cpp
/*--- Tiled part of the loop. ---*/
Index begin = 0;
while (end-begin >= BLOCK_SIZE) {
  computeNormalEquations<BLOCK_SIZE>(mat, rhs, begin);
  begin += BLOCK_SIZE;
}
```
For those interested in learning about performance techniques, I use something here called loop tiling (also known as loop blocking or strip mining).
We need to compute A^T A where A is tall and narrow. The natural "row dot column" algorithm is very inefficient because it makes poor use of the CPU cache: after doing the dot product of column 0 with itself, we go back to the beginning of column 0 to dot it with column 1, but by then the beginning of column 0 has been evicted from cache (and getting data from main memory is orders of magnitude slower than getting it from cache).
By tiling the loop we go over all the different combinations of columns without evicting anything from the cache, since we work on much smaller chunks of rows at a time.
For general matrix multiplication we would need to tile both rows and columns.
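A self-contained sketch of the technique on plain arrays (BLOCK plays the role of BLOCK_SIZE above and would be tuned to the cache size; this is an illustration, not the SU2 implementation):
```cpp
#include <algorithm>
#include <vector>

// Accumulate B = A^T * A for a tall, narrow, row-major A (n x m, n >> m),
// one block of rows at a time so the active rows stay resident in cache.
constexpr int BLOCK = 1024;

void normalEquationsMatrix(const std::vector<double>& A, int n, int m,
                           std::vector<double>& B) {
  B.assign(m * m, 0.0);
  for (int i0 = 0; i0 < n; i0 += BLOCK) {
    const int i1 = std::min(i0 + BLOCK, n);
    /*--- All column-pair combinations are formed from rows [i0, i1) only;
     * by the time we revisit a column, its block is still in cache. ---*/
    for (int j = 0; j < m; ++j)
      for (int k = j; k < m; ++k) {   // B is symmetric, fill upper triangle
        double sum = 0.0;
        for (int i = i0; i < i1; ++i)
          sum += A[i*m + j] * A[i*m + k];
        B[j*m + k] += sum;
      }
  }
  /*--- Mirror the upper triangle into the lower one. ---*/
  for (int j = 1; j < m; ++j)
    for (int k = 0; k < j; ++k)
      B[j*m + k] = B[k*m + j];
}
```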
```cpp
template<Index StaticSize>
void computeNormalEquations(su2vector<Scalar>& mat,
                            su2vector<Scalar>& vec,
                            Index start,
                            Index dynSize = 0) const {
  /*--- Select either the static or dynamic size, optimizes inner loop. ---*/
  const auto blkSize = StaticSize? StaticSize : dynSize;
```
Because we tiled the loop, we have many work chunks with a static size and one remainder chunk with a dynamic size (not known during compilation).
This method is templated on the static size so that we can "generate" two versions. For the version where the compiler knows the size of the loop, it can generate much more efficient code: the size is a multiple of the SIMD length (and likely also of the loop-unrolling count), which means only the necessary instructions are generated (better use of the instruction cache) and size checks are not needed (less branching).
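Putting the two pieces together, the call pattern is roughly as follows (a sketch based on the tiled loop shown earlier, not the exact source):
```cpp
/*--- Full tiles: the size is a compile-time constant, so the inner loop
 * is vectorized/unrolled with no remainder handling. ---*/
Index begin = 0;
while (end - begin >= BLOCK_SIZE) {
  computeNormalEquations<BLOCK_SIZE>(mat, rhs, begin);
  begin += BLOCK_SIZE;
}
/*--- Remainder chunk: StaticSize = 0 makes blkSize fall back to the
 * runtime argument, so the generic code path runs at most once. ---*/
if (end > begin)
  computeNormalEquations<0>(mat, rhs, begin, end - begin);
```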
Great stuff, thanks for taking the first step to integrate quasi-Newton techniques into the adjoint fixed-point iterations. I'm reviewing this over the next couple of days. Just one question right away so that I'm on the right track: it seems that this implementation is based on equation (130) from Degroote's paper, right? Though you're not using any QR decomposition to solve it, so what's the approach then?
Yes, equation (130). The approach to solve it is to form its normal equations (the ones mentioned in the code above) and solve those directly, no QR decomposition needed.
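That is, the correction coefficients come from the standard normal-equations form of the small least-squares problem (my notation, consistent with the sketch above):
```latex
\min_{\alpha} \| V\alpha + r \|_2
\;\;\Longleftrightarrow\;\;
(V^{\mathsf{T}} V)\,\alpha = -V^{\mathsf{T}} r
```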
oleburghardt left a comment:
Hi Pedro, just some small comments below. In general, I think it's good to be directly merged into the code. I like that the functionalities are well split, so that one can use this as a basis for other quasi-Newton stabilizations (that require keeping track of the solution and residual history).
Thanks for the review @oleburghardt, corrections coming up.
```cpp
  q.compute();
}

TEST_CASE("QN-ILS", "[Toolboxes]") {
```
@clarkpede @talbring, I've joined the unit-test bandwagon, let me know if the naming and so on is appropriate (I borrowed inspiration from the other tests, so I guess it's alright).
Proposed Changes
So I took a common method that people use to converge multiphysics problems expressed as fixed-point iterations (IQN-ILS) and applied it to the inner iterations of the discrete adjoint drivers (single and multizone). It seems to work well: I was having some issues in optimization edge cases where the primal solver does not converge so well, and this keeps the adjoint from diverging.
I'll post some results at some point.
Other than storing a number of solution snapshots (20 seems like a good number), the overhead is minimal (provided you have LAPACK/MKL, and compile with OpenMP, fast-math, and AVX support).
Related Work
Already includes #1015
Resolves #1021
Resolves #1025
Resolves #1029
PR Checklist