After mpirun jobs fail, orphaned orted processes are often left behind, each consuming 100% CPU. I am using version 4.0.1, and this appears to be the first version to show this kind of issue.
The pstack output of a hanging orted process is as follows:
Thread 2 (Thread 0x148556eb2700 (LWP 12288)):
#0 0x000014855739467d in poll () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000148558465d26 in poll_dispatch ()
#2 0x000014855845bf5d in opal_libevent2022_event_base_loop ()
#3 0x000014855840566e in progress_engine ()
#4 0x000014855765b494 in start_thread (arg=0x148556eb2700) at pthread_create.c:333
#5 0x000014855739dacf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
Thread 1 (Thread 0x148558d6f040 (LWP 12285)):
#0 __GI___pthread_mutex_lock (mutex=0x23b0020) at ../nptl/pthread_mutex_lock.c:75
#1 0x0000148558459350 in opal_libevent2022_event_del ()
#2 0x000014855845ea45 in opal_libevent2022_event_base_free ()
#3 0x000014855854173e in tracker_destructor ()
#4 0x0000148558541c81 in pmix_progress_thread_stop ()
#5 0x0000148558555791 in OPAL_MCA_PMIX3X_PMIx_server_finalize ()
#6 0x00001485584b97a8 in pmix3x_server_finalize ()
#7 0x0000148558887494 in pmix_server_finalize ()
#8 0x00001485588adf17 in orte_ess_base_orted_finalize ()
#9 0x00001485588b56c9 in rte_finalize ()
#10 0x000014855885e2a0 in orte_finalize ()
#11 0x000014855887dbd5 in orte_daemon ()
#12 0x00000000004007b9 in main ()
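To collect candidates for pstack on a node after a failed job, something like the following Python sketch may help. It is not part of Open MPI. The helper names (`parse_stat`, `read_proc_table`, `find_orphaned`) are hypothetical. It assumes a Linux `/proc` filesystem, and that an orphaned orted has been reparented to PID 1. On systems with a subreaper the parent PID may differ, and a healthy daemonized orted can also have PID 1 as its parent, so treat the result as a list to cross-check against running jobs rather than a definitive answer.

```python
import os


def parse_stat(stat_text):
    """Parse the contents of /proc/<pid>/stat into pid, name, and ppid.

    The format is "pid (comm) state ppid ...". The command name may
    contain spaces or parentheses, so split on the last ')'.
    """
    lpar = stat_text.find("(")
    rpar = stat_text.rfind(")")
    fields = stat_text[rpar + 2:].split()  # fields[0] = state, fields[1] = ppid
    return {
        "pid": int(stat_text[:lpar].strip()),
        "name": stat_text[lpar + 1:rpar],
        "ppid": int(fields[1]),
    }


def read_proc_table(proc_root="/proc"):
    """Return a list of process records read from a Linux /proc tree."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                procs.append(parse_stat(f.read()))
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
    return procs


def find_orphaned(procs, name="orted", init_pid=1):
    """Return PIDs of processes called `name` whose parent is `init_pid`.

    Assumption: an orphan has been reparented to `init_pid`.
    """
    return [p["pid"] for p in procs
            if p["name"] == name and p["ppid"] == init_pid]
```

On a compute node, `find_orphaned(read_proc_table())` would return the PIDs to inspect with pstack or top before deciding whether to kill them.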
No, it is not related to a known issue. I'm just asking you to update to the latest release in the series to see whether it has been fixed as a matter of course. Sorry. ☹️