Hybrid parallel MPI #1009
Conversation
void PostP2PRecvs(CGeometry *geometry, CConfig *config, unsigned short commType,
                  unsigned short countPerPoint, bool val_reverse) const;
A consequence of this is that the current count-per-point needs to be explicitly passed to a few routines, whereas before we could always rely on the maximum value.
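For illustration, a hypothetical call-site fragment (not taken from SU2; the function name is invented and it assumes SU2's headers and the `COMM_TYPE_DOUBLE` enum) showing the count being passed explicitly instead of relying on the maximum:

```cpp
// Hypothetical fragment: the caller now supplies the per-point count for the
// quantity being exchanged (e.g. nVar for the solution) rather than falling
// back to the buffer's maximum per-point width.
void PostSolutionRecvs(CGeometry* geometry, CConfig* config, unsigned short nVar) {
  const unsigned short COUNT_PER_POINT = nVar;  // assumed example quantity
  geometry->PostP2PRecvs(geometry, config, COMM_TYPE_DOUBLE, COUNT_PER_POINT, false);
}
```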
This is the sort of scalability (in terms of time to solution, not per iteration) we get now. Edit: the results at 192c are actually better; it depends on the position of the cluster nodes in the network. The updated comparison is apples to apples.
talbring left a comment:
Thanks @pcarruscag! Just two small comments below.
CGeometry::CGeometry(void) :
  size(SU2_MPI::GetSize()),
  rank(SU2_MPI::GetRank()) {
}
The requested spring cleaning is done; I think I'll stop here.
economon left a comment:
LGTM

Proposed Changes
This makes the routines InitiateComms and CompleteComms (and their periodic counterparts) safe to call in parallel; until now they had to be guarded inside SU2_OMP_MASTER sections.
The MPI calls are still made only by the master thread (funneled communication), but the buffers are packed and unpacked by all threads.
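A minimal sketch of that funneled pattern with plain OpenMP and MPI (not SU2's actual InitiateComms/CompleteComms; the function and variable names are invented, and it assumes MPI was initialized with at least MPI_THREAD_FUNNELED):

```cpp
#include <mpi.h>
#include <omp.h>
#include <vector>

// All threads help pack/unpack the buffers, only the master thread talks to MPI.
void ExchangeWithNeighbor(const std::vector<double>& localData,
                          std::vector<double>& haloData,
                          int neighbor, int count) {
  std::vector<double> sendBuf(count), recvBuf(count);
  MPI_Request reqs[2];
  haloData.resize(count);

  #pragma omp parallel
  {
    // 1) Pack in parallel; the loop is shared among the threads, and the
    //    implicit barrier of "omp for" guarantees packing is complete.
    #pragma omp for schedule(static)
    for (int i = 0; i < count; ++i) sendBuf[i] = localData[i];

    // 2) Only the master thread posts the MPI calls (funneled communication).
    #pragma omp master
    {
      MPI_Irecv(recvBuf.data(), count, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Isend(sendBuf.data(), count, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[1]);
      MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }
    // "omp master" has no implicit barrier, so wait for the data to arrive
    // before any thread starts unpacking.
    #pragma omp barrier

    // 3) Unpack in parallel.
    #pragma omp for schedule(static)
    for (int i = 0; i < count; ++i) haloData[i] = recvBuf[i];
  }
}
```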
I also made a slight change that seems to make the communications more efficient. We were always communicating the entire buffer, which is sized for the maximum number of variables per point, because the data was packed like this:
`o o o o _ _ _ _ o o o o _ ...` (count = 4, maxCount = 8)
I changed it to:
`o o o o o o o o ... _ _ _ _ ...`
which allows only the used part of the buffer to be communicated.
(The maximum size `nPrimVarGrad*nDim*2` is actually quite large compared to the median `nVar`.) In the process I also had to make a few more CGeometry routines thread-safe.
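The layout change above can be sketched as follows (illustrative only; the names and loops are not SU2's actual buffer code):

```cpp
#include <vector>

// "count" is the number of values actually needed per point for the current
// quantity, "maxCount" the per-point capacity the buffer was allocated with.

// New layout: o o o o o o o o ... values are packed contiguously by count, so
// only the first nPoint*count entries of the buffer need to be communicated.
void PackContiguous(const std::vector<std::vector<double>>& field,
                    std::vector<double>& buffer, int count) {
  const int nPoint = static_cast<int>(field.size());
  for (int iPoint = 0; iPoint < nPoint; ++iPoint)
    for (int iVar = 0; iVar < count; ++iVar)
      buffer[iPoint*count + iVar] = field[iPoint][iVar];
}

// Old layout: o o o o _ _ _ _ ... each point occupies a maxCount-wide slot,
// so the full nPoint*maxCount buffer had to be sent even when count < maxCount.
void PackStrided(const std::vector<std::vector<double>>& field,
                 std::vector<double>& buffer, int count, int maxCount) {
  const int nPoint = static_cast<int>(field.size());
  for (int iPoint = 0; iPoint < nPoint; ++iPoint)
    for (int iVar = 0; iVar < count; ++iVar)
      buffer[iPoint*maxCount + iVar] = field[iPoint][iVar];
}
```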
Related Work
#789
Resolves #1011
PR Checklist