Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable MPI_RSEND by default #1199

Merged
merged 1 commit into from
Jan 19, 2017

Conversation

worleyph
Copy link
Contributor

Recent versions of the Cray MPI library are causing MPI communication
algorithms in PIO that use MPI_RSEND to hang. This is not the first time
that MPI_RSEND (and MPI_IRSEND) support has been broken in vendor MPI
libraries. Currently, CPP defines in CAM, MCT, and PIO will redefine
MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND when the CPP token
_NO_MPI_RSEND is defined during the build process. For improved
reliability, this modification makes this the default, always redefining
MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND, unless the CPP
token _USE_MPI_RSEND is defined during the build. Separate PRs have been
submitted for MCT and PIO. This one includes the changes for CAM.

BFB
Addresses Issue #1157

Recent versions of the Cray MPI library are causing MPI communication
algorithms in PIO that use MPI_RSEND to hang. This is not the first time
that MPI_RSEND (and MPI_IRSEND) support has been broken in vendor MPI
libraries. Currently, CPP defines in CAM, MCT, and PIO will redefine
MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND when the CPP token
_NO_MPI_RSEND is defined during the build process. For improved
reliability, this modification makes this the default, always redefining
MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND, unless the CPP
token _USE_MPI_RSEND is defined during the build. Separate PRs have been
submitted for MCT and PIO. This one includes the changes for CAM.
@worleyph worleyph added Atmosphere BFB PR leaves answers BFB labels Dec 27, 2016
@worleyph
Copy link
Contributor Author

I have tested this in master using A_WCYCL2000 ne30_oEC on Titan with PGI. For CAM, this really only comes into play when using physics load balancing, so I enabled this and tested. It is BFB. I tested both building with _USE_MPI_RSEND in the Macros file and in the Depends.titan.pgi file. Both methods worked (to enable MPI_RSEND in CAM). There also seems to be some performance advantage to enabling MPI_RSEND, and we may want to add this to the Depends.titan.XXX for the two relevant CAM files sometime in the future, after evaluating again. I wish I knew why ocean history writes (when using MPI_RSEND) and ONLY ocean history writes, exhibit the hang with the latest Cray MPI libraries.

@rljacob rljacob added this to the v1.0beta1 milestone Jan 10, 2017
@singhbalwinder
Copy link
Contributor

@rljacob : Is it okay if I merge this to next today? This is a BFB PR and the other PR which went to next today (PR #1219) is also BFB.

@worleyph : Should I change "Addresses Issue" to "Fixes" in your PR message, or Github doesn't care about it?

@worleyph
Copy link
Contributor Author

Since CAM did not indicate having any problems with this, it doesn't really fix anything - just a precaution, allowing Depends to turn it back on selectively. Almost second guessing myself now ... but go-ahead. Leave as Addresses, since Issue won't be resolved until PIO has the same logic added (and it matters there).

@rljacob
Copy link
Member

rljacob commented Jan 18, 2017

go ahead

singhbalwinder added a commit that referenced this pull request Jan 18, 2017
…#1199)

Disable MPI_RSEND by default

Recent versions of the Cray MPI library are causing MPI communication
algorithms in PIO that use MPI_RSEND to hang. This is not the first time
that MPI_RSEND (and MPI_IRSEND) support has been broken in vendor MPI
libraries. Currently, CPP defines in CAM, MCT, and PIO will redefine
MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND when the CPP token
_NO_MPI_RSEND is defined during the build process. For improved
reliability, this modification makes this the default, always redefining
MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND, unless the CPP
token _USE_MPI_RSEND is defined during the build. Separate PRs have been
submitted for MCT and PIO. This one includes the changes for CAM.

[BFB]
Addresses Issue #1157

* worleyph/atm/mpi_rsend_disable-by-default:
  Disable MPI_RSEND by default
@singhbalwinder
Copy link
Contributor

Merged to next.

@singhbalwinder singhbalwinder merged commit e5054e8 into master Jan 19, 2017
singhbalwinder added a commit that referenced this pull request Jan 19, 2017
Disable MPI_RSEND by default

Recent versions of the Cray MPI library are causing MPI communication
algorithms in PIO that use MPI_RSEND to hang. This is not the first time
that MPI_RSEND (and MPI_IRSEND) support has been broken in vendor MPI
libraries. Currently, CPP defines in CAM, MCT, and PIO will redefine
MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND when the CPP token
_NO_MPI_RSEND is defined during the build process. For improved
reliability, this modification makes this the default, always redefining
MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND, unless the CPP
token _USE_MPI_RSEND is defined during the build. Separate PRs have been
submitted for MCT and PIO. This one includes the changes for CAM.

BFB
Addresses Issue #1157

* worleyph/atm/mpi_rsend_disable-by-default:
  Disable MPI_RSEND by default
jayeshkrishna added a commit that referenced this pull request Mar 27, 2017
At least on titan MPI_[I]Rsends cause hang during I/O due to
bugs in the MPI library. This commit changes the default from
*rsends to *sends.

Also see related PR #1199

Fixes #1157
agsalin pushed a commit that referenced this pull request Apr 13, 2017
Big refactor of CaseStatus management

Write a special handler in utils.py in order to be able to
easily encapsulate the concept that a function needs to log
its status to CaseStatus and be robust in the face of exceptions.

Rearrange and rename some functions in build.py in order to make
it more clear what is the public API and what are internal functions.

Test suite: scripts_regression_tests
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes #1038

User interface changes?: CaseStatus will look slightly different

Code review: @jedwards4b @bertinia
@worleyph worleyph deleted the worleyph/atm/mpi_rsend_disable-by-default branch July 26, 2017 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Atmosphere BFB PR leaves answers BFB
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants