-
Notifications
You must be signed in to change notification settings - Fork 380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable MPI_RSEND by default #1199
Disable MPI_RSEND by default #1199
Conversation
Recent versions of the Cray MPI library are causing MPI communication algorithms in PIO that use MPI_RSEND to hang. This is not the first time that MPI_RSEND (and MPI_IRSEND) support has been broken in vendor MPI libraries. Currently, CPP defines in CAM, MCT, and PIO will redefine MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND when the CPP token _NO_MPI_RSEND is defined during the build process. For improved reliability, this modification makes this the default, always redefining MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND, unless the CPP token _USE_MPI_RSEND is defined during the build. Separate PRs have been submitted for MCT and PIO. This one includes the changes for CAM.
I have tested this in master using A_WCYCL2000 ne30_oEC on Titan with PGI. For CAM, this really only comes into play when using physics load balancing, so I enabled this and tested. It is BFB. I tested both building with _USE_MPI_RSEND in the Macros file and in the Depends.titan.pgi file. Both methods worked (to enable MPI_RSEND in CAM). There also seems to be some performance advantage to enabling MPI_RSEND, and we may want to add this to the Depends.titan.XXX for the two relevant CAM files sometime in the future, after evaluating again. I wish I knew why ocean history writes (when using MPI_RSEND) and ONLY ocean history writes, exhibit the hang with the latest Cray MPI libraries. |
Since CAM did not indicate having any problems with this, it doesn't really fix anything - just a precaution, allowing Depends to turn it back on selectively. Almost second guessing myself now ... but go-ahead. Leave as Addresses, since Issue won't be resolved until PIO has the same logic added (and it matters there). |
go ahead |
…#1199) Disable MPI_RSEND by default Recent versions of the Cray MPI library are causing MPI communication algorithms in PIO that use MPI_RSEND to hang. This is not the first time that MPI_RSEND (and MPI_IRSEND) support has been broken in vendor MPI libraries. Currently, CPP defines in CAM, MCT, and PIO will redefine MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND when the CPP token _NO_MPI_RSEND is defined during the build process. For improved reliability, this modification makes this the default, always redefining MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND, unless the CPP token _USE_MPI_RSEND is defined during the build. Separate PRs have been submitted for MCT and PIO. This one includes the changes for CAM. [BFB] Addresses Issue #1157 * worleyph/atm/mpi_rsend_disable-by-default: Disable MPI_RSEND by default
Merged to next. |
Disable MPI_RSEND by default Recent versions of the Cray MPI library are causing MPI communication algorithms in PIO that use MPI_RSEND to hang. This is not the first time that MPI_RSEND (and MPI_IRSEND) support has been broken in vendor MPI libraries. Currently, CPP defines in CAM, MCT, and PIO will redefine MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND when the CPP token _NO_MPI_RSEND is defined during the build process. For improved reliability, this modification makes this the default, always redefining MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND, unless the CPP token _USE_MPI_RSEND is defined during the build. Separate PRs have been submitted for MCT and PIO. This one includes the changes for CAM. BFB Addresses Issue #1157 * worleyph/atm/mpi_rsend_disable-by-default: Disable MPI_RSEND by default
Big refactor of CaseStatus management Write a special handler in utils.py in order to be able to easily encapsulate the concept that a function needs to log its status to CaseStatus and be robust in the face of exceptions. Rearrange and rename some functions in build.py in order to make it more clear what is the public API and what are internal functions. Test suite: scripts_regression_tests Test baseline: Test namelist changes: Test status: bit for bit Fixes #1038 User interface changes?: CaseStatus will look slightly different Code review: @jedwards4b @bertinia
Recent versions of the Cray MPI library are causing MPI communication
algorithms in PIO that use MPI_RSEND to hang. This is not the first time
that MPI_RSEND (and MPI_IRSEND) support has been broken in vendor MPI
libraries. Currently, CPP defines in CAM, MCT, and PIO will redefine
MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND when the CPP token
_NO_MPI_RSEND is defined during the build process. For improved
reliability, this modification makes this the default, always redefining
MPI_RSEND as MPI_SEND and MPI_IRSEND as MPI_ISEND, unless the CPP
token _USE_MPI_RSEND is defined during the build. Separate PRs have been
submitted for MCT and PIO. This one includes the changes for CAM.
BFB
Addresses Issue #1157