An PRTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
@ggouaillardet I could use your help here, if you have a little time. I honestly am having no luck digging into why this launch is failing. In my case, the srun cmd to launch the prteds just hangs. Yet I can copy/paste that same cmd string to launch any other app without problem. Any assistance debugging the problem would be much appreciated.
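One way to tell a hang apart from an early failure when reproducing this is to run the launcher command under a timeout. The sketch below is only an illustration: `run_with_timeout` is a hypothetical helper, not part of PRRTE, PMIx, or Slurm, and the command passed to it is a placeholder for whatever srun command line is being tested. Output is discarded rather than captured so that child processes holding the pipes open cannot block the check.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) and report whether it exited or hung.

    Returns ("exited", returncode) if the command finished within
    timeout_s seconds, or ("hung", None) if it had to be killed.
    Hypothetical helper for illustration only.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard output so pipes cannot block
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # subprocess.run kills the child on timeout
        )
    except subprocess.TimeoutExpired:
        return ("hung", None)
    return ("exited", proc.returncode)


if __name__ == "__main__":
    # Placeholder command: substitute the launcher command line under test.
    print(run_with_timeout([sys.executable, "-c", "pass"], timeout_s=10))
```

Running the same wrapper once with the daemon launch command and once with an ordinary application gives a direct comparison of the two cases described above.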
Background information
What version of the PMIx Reference Server are you using? (e.g., v1.0, v2.1, git master @ hash, etc.)
Open MPI 5.0.x nightly snapshot openmpi-v5.0.x-202203030340-563c565.tar.gz
What version of PMIx are you using? (e.g., v1.2.5, v2.0.3, v2.1.0, git branch name and hash, etc.)
Open MPI 5.0.x nightly snapshot openmpi-v5.0.x-202203030340-563c565.tar.gz
Please describe the system on which you are running
Details of the problem
The context of the issue is indirect launch of a job under control of a debugger.
Broadly following the indirect.c example,
gives output
Changing the command to use
allows the job to complete, so this issue appears to be specific to the Slurm integration.