Skip to content

Conversation

@pcarruscag
Copy link
Member

@pcarruscag pcarruscag commented Jun 8, 2020

Proposed Changes

So I took a common method that folks use to converge multi physics problems expressed as fixed point iterations (the IQN-ILS) and applied it to the inner iterations of the discrete adjoint drivers (single and multizone), it seems to work well... I was having some issues when the primal solver does not converge so well for optimization edge cases, and this keeps the adjoint from diverging.
I'll post some results at some point.

Other than storing a number of solution snapshots (20 seems like a good number) the overhead is minimal (and also provided you have Lapack/MKL, compile for OpenMP, with fast-math, and AVX support).

Related Work

Already has #1015
Resolves #1021
Resolves #1025
Resolves #1029

PR Checklist

  • I am submitting my contribution to the develop branch.
  • My contribution generates no new compiler warnings (try with the '-Wall -Wextra -Wno-unused-parameter -Wno-empty-body' compiler flags).
  • My contribution is commented and consistent with SU2 style.
  • I have added a test case that demonstrates my contribution, if necessary.
  • I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp) , if necessary.

Comment on lines 2 to 8
* \file CQuasiNewtonDriver.hpp
* \brief Implements a method to accelerate and stabilize the convergence
* of fixed point iterations, the history of past iterates is used to compute
* a least squares approximation of the inverse of the Jacobian, which is then
* used to correct the natural solution of the fixed point iteration.
* \note Based on the IQN-ILS method, see DOI 10.1007/s11831-013-9085-5 and
* references therein.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the implementation, explanation and references are in this file.

Comment on lines +1574 to +1575
/* DESCRIPTION: Number of samples for quasi-Newton methods. */
addUnsignedShortOption("QUASI_NEWTON_NUM_SAMPLES", nQuasiNewtonSamples, 0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature is activated by setting a number of samples greater than 1.

Comment on lines 210 to 213
if (QNDriver.size()) {
GetAllSolutions(ZONE_0, true, QNDriver.FPresult());
SetAllSolutions(ZONE_0, true, QNDriver.compute());
}
Copy link
Member Author

@pcarruscag pcarruscag Jun 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is basically how it is used after running one discrete adjoint iteration.

Comment on lines 65 to 72
void shiftHistoryLeft() {
for (Index i = 1; i < X.size(); ++i) {
/*--- Swap instead of moving to re-use the memory of the first sample.
* This is why X and R are not stored as contiguous blocks of mem. ---*/
std::swap(X[i-1], X[i]);
std::swap(R[i-1], R[i]);
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the benefit of those less familiar with c++11, there are these things called move operations, this operation of shifting matrices to the left is nearly 0 cost because swapping two su2matrices only swaps the pointers and sizes and does not copy the actual data.
That is why in this case X and R are not stored as a single chunk of memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, swapping them X.size() times is in fact the obvious replacement for std::deque... :)

Comment on lines 80 to 85
/*--- Tiled part of the loop. ---*/
Index begin = 0;
while (end-begin >= BLOCK_SIZE) {
computeNormalEquations<BLOCK_SIZE>(mat, rhs, begin);
begin += BLOCK_SIZE;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For those interested in learning about performance techniques, I use something here called loop tiling, or loop blocking, or strip mining.
We need to compute A^T A where A is narrow, doing the natural "row dot column" algorithm is very inefficient because it makes poor use of the CPU cache, i.e. after doing the dot product of column 0 with itself, we go back to the beginning of column 0 to dot it with column 1, but by now the beginning of column 0 was evicted from cache (getting data from main memory is orders of magnitude slower than getting it from cache).
By tiling the loop we go over all the different combinations of columns and rows without evicting anything from the cache (since we work on much smaller sizes).
For general matrix multiplication we would need to tile both rows and columns.

Comment on lines 107 to 113
template<Index StaticSize>
void computeNormalEquations(su2vector<Scalar>& mat,
su2vector<Scalar>& vec,
Index start,
Index dynSize = 0) const {
/*--- Select either the static or dynamic size, optimizes inner loop. ---*/
const auto blkSize = StaticSize? StaticSize : dynSize;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we tiled the loop, we have many work chunks with a static size, and one remainder chunk with dynamic size (not known during compilation).
This method is templated on the static size, so that we can "generate" two versions, for the version where the compiler knows the size of the loop it can generate much more efficient code because:
The size is a multiple of the SIMD length, and likely also of the loop unrolling count, which means only the necessary instructions are generated (better use of instruction cache), and size checks are not needed (less branching).

@oleburghardt
Copy link
Contributor

Great stuff, thanks for taking the first step to integrate quasi-Newton techniques to the adjoint fixed point iterations. I'm reviewing this the next couple of days.

Just one question right away so that I'm on the right track. It seems that this implementation is based on equation 130 from Degroote's paper. Right? Though you're not using any QR decomposition for solving it? What's the approach then (assumingly in computeNormalEquations)?

And you're not using std::deque like I did for storing a window of some past residuals (i.e. V^k), looking forward to see what kind of solution you came up with.

@pcarruscag
Copy link
Member Author

Yes equation 130, the approach to solve equation 130, is to solve equation 130 (which are the normal equations mentioned in computeNormalEquations).
What I used is better than deque, as explained in one of my comments it allows shifting the vectors at zero cost.

Copy link
Contributor

@oleburghardt oleburghardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Pedro, just some small comments below. In gerenal, I think it's good to be directly merged into the code. I like that the functionalities are well split so that one can use this as a basis for other quasi-Newton stabilizations (that require keeping track of the solution and residual history).

Comment on lines 65 to 72
void shiftHistoryLeft() {
for (Index i = 1; i < X.size(); ++i) {
/*--- Swap instead of moving to re-use the memory of the first sample.
* This is why X and R are not stored as contiguous blocks of mem. ---*/
std::swap(X[i-1], X[i]);
std::swap(R[i-1], R[i]);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, swapping them X.size() times is in fact the obvious replacement for std::deque... :)

@pcarruscag
Copy link
Member Author

Thanks for the review @oleburghardt , corrections coming up.

q.compute();
}

TEST_CASE("QN-ILS", "[Toolboxes]") {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clarkpede @talbring , I've joined the unit test bandwagon, let me known if the naming and so on is appropriate (I borrowed inspiration from the other tests so I guess it's alright).

@pcarruscag pcarruscag merged commit 4acbdf8 into develop Jun 29, 2020
@pcarruscag pcarruscag deleted the feature_quasi_newton_adjoint branch June 29, 2020 15:22
@snow54 snow54 mentioned this pull request Jul 19, 2021
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants