Improving "not cyclic" solver #2314
base: next
By default all processors are used, resulting in an all-to-all communication pattern. An optional third argument to the constructor now specifies how many processors should be used. This reduces the number of messages, at the cost of increasing load imbalance. The processors used are evenly distributed, in the hope that this will minimise network traffic on any given node.
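As a rough illustration of the "evenly distributed" choice described above, the sketch below picks the gathering ranks by scaling an index by `nprocs / ngather` and truncating, mirroring the `pinterval` expression visible in the diff. The function name `gatheringRanks` and the handling of `ngather` values larger than `nprocs` are assumptions made for this example, not taken from the PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: choose `ngather` ranks spread evenly over `nprocs`.
// ngather == 0 means "use all processors" (as documented in the PR);
// treating ngather > nprocs the same way is an assumption of this sketch.
std::vector<int> gatheringRanks(int nprocs, int ngather) {
  assert(nprocs > 0);
  if (ngather <= 0 || ngather > nprocs) {
    ngather = nprocs;
  }
  // Spacing between gathering ranks; always >= 1, so ranks are distinct
  const double pinterval = static_cast<double>(nprocs) / ngather;

  std::vector<int> ranks;
  ranks.reserve(ngather);
  for (int i = 0; i < ngather; i++) {
    ranks.push_back(static_cast<int>(pinterval * i));
  }
  return ranks;
}
```

With 8 processors and `ngather = 4` this selects ranks 0, 2, 4 and 6; with 6 processors and `ngather = 4` it selects 0, 1, 3 and 4.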
Input option to control the number of processors (in X) to gather systems onto. This can be used to tune performance: Smaller ngather leads to fewer messages, but more load imbalance. Modified test, to cover more variations of nxpe and ngather. That test found some bugs, which are fixed here.
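The load imbalance mentioned here can be seen in a minimal sketch of how systems might be split between gathering processors: each gets `nsys / ngatherprocs` systems, and the first `nsys % ngatherprocs` get one extra, which matches the `ns` / `nsextra` variables in the diff. The `SystemRange` struct and `distributeSystems` function are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of which systems one gathering processor solves
struct SystemRange {
  int first; // Index of the first system assigned
  int count; // Number of systems assigned
};

// Split `nsys` systems over `ngatherprocs` gathering processors.
// Non-gathering processors solve nothing, which is the source of imbalance.
std::vector<SystemRange> distributeSystems(int nsys, int ngatherprocs) {
  assert(nsys >= 0 && ngatherprocs > 0);
  const int base = nsys / ngatherprocs;    // Systems every gatherer gets
  const int nsextra = nsys % ngatherprocs; // Gatherers with one extra

  std::vector<SystemRange> ranges;
  ranges.reserve(ngatherprocs);
  int start = 0;
  for (int i = 0; i < ngatherprocs; i++) {
    const int count = base + (i < nsextra ? 1 : 0);
    ranges.push_back({start, count});
    start += count;
  }
  return ranges;
}
```

For 10 systems on 4 gathering processors the counts are 3, 3, 2, 2, so a smaller `ngather` concentrates more solves on fewer ranks.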
…nto next-cyclic-comms
clang-tidy made some suggestions
include/cyclic_reduction.hxx (outdated)
 #include "output.hxx"
 #include "bout/openmpwrap.hxx"

-template <class T> class CyclicReduce {
+template <class T>
+class CyclicReduce {
warning: constructor does not initialize these fields: comm, myns, sys0 [cppcoreguidelines-pro-type-member-init]
class CyclicReduce {
^
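One common way to address this kind of warning is to give each field a default member initializer, so that every constructor leaves the object in a defined state. The sketch below is not the PR's code: `ReducerSketch`, `CommHandle` (a stand-in for `MPI_Comm`) and the accessor functions are hypothetical, and only the field names `comm`, `myns` and `sys0` come from the warning.

```cpp
#include <cassert>

using CommHandle = int; // Hypothetical stand-in for MPI_Comm in this sketch

template <class T>
class ReducerSketch {
public:
  ReducerSketch() = default;
  ReducerSketch(CommHandle comm_in, int size) : comm(comm_in), N(size) {
    assert(size >= 0);
  }

  int rows() const { return N; }
  int localSystems() const { return myns; }
  int firstSystem() const { return sys0; }

private:
  // Default member initializers: used by any constructor that does not
  // set the field explicitly, so no field is left indeterminate
  CommHandle comm{0};
  int N{0};
  int myns{0}; // Number of systems handled by this processor
  int sys0{0}; // Index of the first system handled by this processor
};
```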
warning: class 'CyclicReduce' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator [cppcoreguidelines-special-member-functions]
clang-tidy made some suggestions
Clang-tidy and @ZedThree suggestions
clang-tidy made some suggestions
warning: class 'CyclicReduce' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator [cppcoreguidelines-special-member-functions]
class CyclicReduce {
^
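This check is typically satisfied by declaring all five special member functions explicitly, either defaulted or deleted. The sketch below deletes copying and defaults moving; whether `CyclicReduce` should be copyable is a design decision for the PR authors, so that choice is purely an assumption here, and `OwnerSketch` is a hypothetical class name.

```cpp
#include <type_traits>

template <class T>
class OwnerSketch {
public:
  OwnerSketch() = default;
  ~OwnerSketch() = default;

  // Assumption for this sketch: copying is disabled, moving is allowed
  OwnerSketch(const OwnerSketch&) = delete;
  OwnerSketch& operator=(const OwnerSketch&) = delete;
  OwnerSketch(OwnerSketch&&) noexcept = default;
  OwnerSketch& operator=(OwnerSketch&&) noexcept = default;
};

// Compile-time checks that the declarations behave as intended
static_assert(!std::is_copy_constructible<OwnerSketch<double>>::value,
              "copy construction should be disabled");
static_assert(std::is_move_constructible<OwnerSketch<double>>::value,
              "move construction should be available");
```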
clang-tidy made some suggestions
`if` applied to entire string rather than just last part
Failing some tests for what looks like an unrelated change (bad merge?). Doesn't need to go into |
clang-tidy made some suggestions
-CyclicReduce(MPI_Comm c, int size) : comm(c), N(size) {
-  MPI_Comm_size(c, &nprocs);
-  MPI_Comm_rank(c, &myproc);
+CyclicReduce(MPI_Comm c, int size, int ngather = 0)
warning: parameter name 'c' is too short, expected at least 3 characters [readability-identifier-length]
CyclicReduce(MPI_Comm c, int size, int ngather = 0)
^
 /// @param[in] c The communicator of all processors involved in the solve
 /// @param[in] size The number of rows on this processor
-void setup(MPI_Comm c, int size) {
+/// @param[in] gather The number of processors to gather onto. If 0, use all processors
+void setup(MPI_Comm c, int size, int ngather = 0) {
warning: parameter name 'c' is too short, expected at least 3 characters [readability-identifier-length]
void setup(MPI_Comm c, int size, int ngather = 0) {
^
BoutReal pinterval = static_cast<BoutReal>(nprocs) / ngatherprocs;

{
  int ns =
warning: variable name 'ns' is too short, expected at least 3 characters [readability-identifier-length]
int ns =
^
int ns =
    Nsys / ngatherprocs; // Number of systems to assign to all gathering processors
int nsextra = Nsys % ngatherprocs; // Number of processors with 1 extra
int s0 = 0; // Starting system number
warning: variable name 's0' is too short, expected at least 3 characters [readability-identifier-length]
int s0 = 0; // Starting system number
^
// Loop over gathering processors
for (int i = 0; i < ngatherprocs; i++) {

  int p = i; // Gathering onto all processors
warning: variable name 'p' is too short, expected at least 3 characters [readability-identifier-length]
int p = i; // Gathering onto all processors
^
if (myproc < nsextra) {
  myns++;
  sys0 += myproc;
int ns = nsys / ngatherprocs; // Number of systems to assign to all processors
warning: variable name 'ns' is too short, expected at least 3 characters [readability-identifier-length]
int ns = nsys / ngatherprocs; // Number of systems to assign to all processors
^

// Calculate which processors these are
for (int i = 0; i < ngatherprocs; i++) {
  int proc = static_cast<int>(pinterval * i);
warning: variable 'proc' of type 'int' can be declared 'const' [misc-const-correctness]
-int proc = static_cast<int>(pinterval * i);
+int const proc = static_cast<int>(pinterval * i);
@@ -96,16 +96,16 @@ LaplaceCyclic::LaplaceCyclic(Options* opt, const CELL_LOC loc, Mesh* mesh_in,
 xcmplx.reallocate(nmode, n);
 bcmplx.reallocate(nmode, n);

 int ngather =
warning: variable 'ngather' of type 'int' can be declared 'const' [misc-const-correctness]
-int ngather =
+int const ngather =
int mype, npe;
MPI_Comm_rank(BoutComm::get(), &mype);
MPI_Comm_size(BoutComm::get(), &npe);

int ngather =
warning: variable 'ngather' of type 'int' can be declared 'const' [misc-const-correctness]
-int ngather =
+int const ngather =
options["ngather"].doc("The number of processors to gather onto").withDefault(npe);

// Create a cyclic reduction object, operating on Ts
auto cr = std::make_unique<CyclicReduce<T>>(BoutComm::get(), n, ngather);
warning: variable name 'cr' is too short, expected at least 3 characters [readability-identifier-length]
auto cr = std::make_unique<CyclicReduce<T>>(BoutComm::get(), n, ngather);
^