MPI process hangs when using MPIR in OpenMPI v3.0.x #5349
Comments
We have seen a similar problem when running the reproducer on SLES12 and Ubuntu 16.04 systems. In those cases, both MPI processes appear to be stuck calling
Please see #5321 - might be the same problem.
#5321 fix works.
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
v3.0.0 and v3.0.2
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
from source tarball with GCC 7.1
Please describe the system on which you are running
Details of the problem
An MPI process appears to be stuck recursively calling OPAL_MCA_PMIX2X_PMIx_Init () when attaching to it in GDB after having run to MPIR_Breakpoint in mpirun. It's only the last MPI process for which this occurs. Other processes are stopped at PMPI_Init ().
A simple reproducer is available here: gdb-only.zip
Here's how the reproducer can be used
(the backtrace has been limited to 1000 lines)
The reproducer starts mpirun under gdb, sets MPIR_being_debugged=1 and runs to MPIR_Breakpoint. It then attaches to the second (last) hello_c process and prints the backtrace to logfile, which can then be inspected.
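For readers without the zip at hand, the steps described above could be sketched roughly as follows. This is not the contents of gdb-only.zip, only a guess at an equivalent flow: the helper names `attach_command` and `mpirun_session` are made up for illustration, stopping at `main` before setting the variable is an assumption, and so is using `pgrep -n` to pick the last hello_c process. The script only builds and prints the gdb command line; it does not run anything.

```python
import shlex


def attach_command(process_name="hello_c", frames=1000, logfile="logfile"):
    """Shell command that attaches a second gdb to the newest matching
    process and writes a bounded backtrace to a log file.

    Assumption: the newest process named `process_name` (pgrep -n) is the
    last MPI rank. The real reproducer may locate the PID differently.
    """
    backtrace = shlex.quote(f"backtrace {frames}")
    return (
        f"gdb -batch -p $(pgrep -n -x {shlex.quote(process_name)}) "
        f"-ex {backtrace} > {shlex.quote(logfile)} 2>&1"
    )


def mpirun_session(np=2, program="./hello_c"):
    """argv for a gdb session that runs mpirun to MPIR_Breakpoint and then
    shells out to the attach command while mpirun is still stopped."""
    return [
        "gdb", "-batch",
        # Assumption: stop at main first so the variable can be written.
        "-ex", "break main",
        "-ex", "run",
        "-ex", "set var MPIR_being_debugged = 1",
        "-ex", "break MPIR_Breakpoint",
        "-ex", "continue",
        # gdb's `shell` command runs the attach step from inside the session.
        "-ex", "shell " + attach_command(),
        "--args", "mpirun", "-np", str(np), program,
    ]


if __name__ == "__main__":
    # Print the command line for inspection instead of executing it.
    print(shlex.join(mpirun_session()))
```

Running the printed command on an affected system should leave the backtrace of the attached process in logfile, assuming gdb is allowed to attach to it.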