
E3SM hangs in MPI finalize on Summit #2847

Closed
jayeshkrishna opened this issue Apr 10, 2019 · 3 comments · Fixed by #2856
Labels
BFB PR leaves answers BFB Machine Files Summit

Comments

@jayeshkrishna
Contributor

E3SM hangs on Summit when running an F case at ne4_ne4 resolution. This problem was reported by Tahsin Kurc (@tkurc) while testing E3SM+PIO2+ADIOS on Summit.

I observed the same issue with PIO2 tests on Summit.

This issue can be reproduced with a simple MPI hello world program on Summit.

@jayeshkrishna jayeshkrishna self-assigned this Apr 10, 2019
@jayeshkrishna
Contributor Author

Tahsin Kurc also noted that updating the IBM Spectrum MPI library version resolved the issue (a hang in MPI_Finalize is a known issue on Summit: https://www.olcf.ornl.gov/for-users/system-user-guides/summit/summit-user-guide/#known-issues).

I verified that the MPI hello world program no longer hangs after updating the MPI library version locally (from spectrum-mpi/10.2.0.10-20181214 to spectrum-mpi/10.2.0.11-20190201).

@jayeshkrishna
Contributor Author

I will create a PR soon to fix this issue.

@jayeshkrishna
Contributor Author

The upcoming PR will also update the essl module (essl/6.1.0-20180406 is not available, so we need to use essl/6.1.0-2 instead).

jayeshkrishna added a commit that referenced this issue Apr 15, 2019
Upgrading the summit essl and MPI modules.

With the older version of MPI modules, MPI_Finalize call hangs.
The older version of essl module is no longer available.

Fixes #2847
jayeshkrishna added a commit that referenced this issue Apr 15, 2019
Upgrading the summit cmake, essl and MPI modules.

With the older version of MPI modules, MPI_Finalize call hangs.
The older version of essl module is no longer available.

Fixes #2847
jayeshkrishna added a commit that referenced this issue Apr 22, 2019
Upgrading the summit cmake, essl and MPI modules.

Also updating the ROMIO version to prevent OOM errors.

Fixes #2847

[BFB]
jayeshkrishna added a commit that referenced this issue Apr 22, 2019
…e_fixes

Upgrading the summit cmake, essl and MPI modules.

Also updating the ROMIO version to prevent OOM errors.

Fixes #2847

[BFB]
jgfouca pushed a commit that referenced this issue Jun 25, 2019
Upgrading the summit cmake, essl and MPI modules.

With the older version of MPI modules, MPI_Finalize call hangs.
The older version of essl module is no longer available.

Fixes #2847
jgfouca pushed a commit that referenced this issue Jun 25, 2019
…e_fixes

Upgrading the summit cmake, essl and MPI modules.

Also updating the ROMIO version to prevent OOM errors.

Fixes #2847

[BFB]
rljacob pushed a commit that referenced this issue Apr 21, 2021
Maint 5.6 merge
Merge maint-5.6 into master, conflicts are resolved on this branch.
Clean up xml
Test suite: scripts_regression_tests.py
Test baseline:
Test namelist changes:
Test status: bit for bit,
Fixes

User interface changes?:

Update gh-pages html (Y/N)?:

Code review: jgfouca
rljacob pushed a commit that referenced this issue May 6, 2021
Maint 5.6 merge
Merge maint-5.6 into master, conflicts are resolved on this branch.
Clean up xml
Test suite: scripts_regression_tests.py
Test baseline:
Test namelist changes:
Test status: bit for bit,
Fixes

User interface changes?:

Update gh-pages html (Y/N)?:

Code review: jgfouca
jgfouca pushed a commit that referenced this issue Jun 26, 2024