MPI Continues to Run After Timeseries should terminate #1

Open
jnickla1 opened this issue Nov 1, 2023 · 0 comments

jnickla1 (Owner) commented Nov 1, 2023

All 36 MPI subprocesses were still running hours after the ./timeseries script had successfully finished. Log file:

************************************************************
Successfully completed generating variable time-series files
Total Time: 18680.984241962433 seconds
************************************************************
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  utley.local
  System call: unlink(2) /var/folders/rb/z2h_lx454f73ttnwhkkjdk8h0000h3/T//ompi.utley.515/pid.37919/1/vader_segment.utley.515.8ee40001.26
  Error:       No such file or directory (errno 2)
--------------------------------------------------------------------------
[utley.local:37919] 1 more process has sent help message help-opal-shmem-mmap.txt / sys call fail
[utley.local:37919] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

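A possible fix is to run export TMPDIR=/tmp before launching the script. Open MPI would then create its session files under /tmp instead of the macOS per-user temporary directory (/var/folders/...) that appears in the failing unlink(2) call above.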
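
When the job is started from a wrapper rather than an interactive shell, the same workaround could look like the minimal, untested Python sketch below. The helper name run_with_tmpdir is hypothetical and not part of this project. The assumption that Open MPI places its session directory under TMPDIR rests only on the ompi.utley.515/pid.37919 path in the log above.

```python
import os
import subprocess


def run_with_tmpdir(cmd, tmpdir="/tmp"):
    """Run cmd with TMPDIR overridden, leaving the rest of the environment unchanged.

    Hypothetical helper. It assumes Open MPI builds its session directory
    (shared-memory segment files included) under $TMPDIR by default.
    """
    env = dict(os.environ)   # copy, so the caller's own environment is not modified
    env["TMPDIR"] = tmpdir   # same effect as `export TMPDIR=/tmp` before launching
    return subprocess.run(cmd, env=env, check=False)
```

For example, run_with_tmpdir(["./timeseries"]) would start the script with TMPDIR=/tmp, which matches exporting the variable in the shell first.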

This only happened on one of the machines I installed this code on. On the other, the job terminated normally and released the CPUs.
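
Until the root cause is found, one hedged way to stop leftover ranks from holding the CPUs is to launch the job in its own session and, once the top-level command exits, send SIGTERM to anything still in its process group. The Python sketch below does that. run_and_reap is a hypothetical helper. It also assumes that the MPI ranks stay in the launcher's process group, which may not hold if Open MPI moves them into groups of their own.

```python
import os
import signal
import subprocess


def run_and_reap(cmd, env=None):
    """Run cmd in a new session, then SIGTERM any processes left in its process group.

    Returns (returncode, leftovers_signalled). Hypothetical helper.
    """
    # start_new_session=True makes the child call setsid(), so its PID is also
    # the process-group ID that its descendants inherit unless they change it.
    proc = subprocess.Popen(cmd, env=env, start_new_session=True)
    returncode = proc.wait()
    try:
        # The group leader has exited and been reaped; this signals any members
        # still alive. Assumption: stray MPI ranks are still in this group, so
        # ranks that called setpgid()/setsid() themselves are NOT reached.
        os.killpg(proc.pid, signal.SIGTERM)
        leftovers_signalled = True
    except ProcessLookupError:
        # ESRCH: no process remains in the group, so nothing was left running.
        leftovers_signalled = False
    return returncode, leftovers_signalled
```

A True second value means something was still running after the main command finished, which is the symptom described in this issue.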
