various fixes for singleton support
Some recent changes broke singleton support, twice in a month.

First, remove a problematic PMIX_RELEASE of jdata that ran before the jdata was ready to be released.

For some reason this showed up in singleton mode with debug enabled.
Various asserts would fail when this PMIX_RELEASE was invoked.
The jdata had been put on a list of jdata objects, so the opal_list
destructor asserted when asked to release a jdata that was still
in a list.

It turns out this jdata is already released by the code starting at
line 95 of prte_finalize.c. I assume that with debug disabled the
jdata is simply released twice, rather than failing the assert reached
from prted_comm.c.
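
The ownership problem described above can be illustrated with a minimal, self-contained sketch. It is not PRRTE or PMIx code: `job_t`, `registry_add`, `registry_remove`, `release_would_corrupt`, `job_release`, and `teardown_demo` are hypothetical names invented for this illustration, and the `on_list` flag only stands in for whatever bookkeeping the real list class does. It assumes the list holds the only reference to the object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted job object that records
 * whether a list still links to it. Not the real prte_job_t. */
typedef struct job {
    int refcount;
    bool on_list;
    struct job *next;
} job_t;

static job_t *registry = NULL;   /* hypothetical global list of jobs */

job_t *job_new(void)
{
    job_t *j = calloc(1, sizeof(*j));
    if (NULL == j) {
        abort();
    }
    j->refcount = 1;             /* the creator holds the only reference */
    return j;
}

/* The registry takes over the creator's reference; no retain is done. */
void registry_add(job_t *j)
{
    j->next = registry;
    j->on_list = true;
    registry = j;
}

/* Unlink the job; the caller gets back the reference the list was holding. */
void registry_remove(job_t *j)
{
    job_t **p = &registry;
    while (NULL != *p && *p != j) {
        p = &(*p)->next;
    }
    if (NULL != *p) {
        *p = j->next;
        j->next = NULL;
        j->on_list = false;
    }
}

/* True when dropping one reference would destroy an object that a list
 * still points at: the condition a debug-build assert would catch. */
bool release_would_corrupt(const job_t *j)
{
    return 1 == j->refcount && j->on_list;
}

/* Drop one reference; returns true if the object was destroyed. */
bool job_release(job_t *j)
{
    assert(!release_would_corrupt(j));
    if (--j->refcount > 0) {
        return false;
    }
    free(j);
    return true;
}

/* Walk through the sequence; returns 0 if every step behaves as described. */
int teardown_demo(void)
{
    job_t *j = job_new();
    registry_add(j);

    /* A command handler that merely looked the job up must not release it. */
    if (!release_would_corrupt(j)) {
        return 1;
    }

    /* Finalize path: unlink first, then release exactly once. */
    registry_remove(j);
    if (release_would_corrupt(j)) {
        return 2;
    }
    return job_release(j) ? 0 : 3;
}
```

In this sketch the early release is the one a handler would make while the object is still linked. With asserts compiled out, that release would free the object and the later finalize pass would release it a second time, which matches the double release suspected above.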

Some work to add session IDs for tracking allocations also broke
singleton support.

This patch restores the singleton functionality.
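
The session half of the fix can be sketched in isolation as follows. This is an illustrative reduction, not the PRRTE source: `session_t`, `job_t`, `default_session`, `prep_singleton_job`, `inherit_session`, and the `DEMO_*` codes are hypothetical names. The sketch assumes that a child job inherits its parent's session and that a singleton's job object previously had no session assigned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced types; the real objects carry far more state. */
typedef struct session { int id; } session_t;
typedef struct job { session_t *session; } job_t;

enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_FOUND = -1 };

static session_t default_session = { 0 };

/* Singleton setup: give the job a session before it is registered. */
void prep_singleton_job(job_t *j)
{
    j->session = &default_session;
}

/* Default a child job to its parent's session, and report an error
 * instead of passing a NULL session further down the launch path. */
int inherit_session(const job_t *parent, session_t **out)
{
    assert(NULL != out);
    if (NULL == parent || NULL == parent->session) {
        *out = NULL;
        return DEMO_ERR_NOT_FOUND;
    }
    *out = parent->session;
    return DEMO_SUCCESS;
}
```

The two pieces are complementary: assigning the default session at singleton setup removes the NULL case in normal operation, while the explicit check turns any remaining NULL session into a reported error rather than a later crash.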

Related to issue open-mpi/ompi#12307

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
hppritcha authored and rhc54 committed Feb 23, 2024
1 parent cb87ece commit cfe71a7
Showing 3 changed files with 5 additions and 2 deletions.
5 changes: 4 additions & 1 deletion src/mca/plm/base/plm_base_receive.c
@@ -271,7 +271,10 @@ void prte_plm_base_recv(int status, pmix_proc_t *sender,
 /* try defaulting to parent session */
 if (NULL != (parent = prte_get_job_data_object(nptr->nspace))) {
     session = parent->session;
-
+    if (NULL == session) {
+        rc = PRTE_ERR_NOT_FOUND;
+        goto ANSWER_LAUNCH;
+    }
     // (RHC) This next clause merits some thought - not sure I fully
     // understand the conditionals
 } else if (!prte_pmix_server_globals.scheduler_connected ||
1 change: 0 additions & 1 deletion src/prted/prted_comm.c
@@ -502,7 +502,6 @@ void prte_daemon_recv(int status, pmix_proc_t *sender,
 PMIX_LOAD_PROCID(&pname, job, PMIX_RANK_WILDCARD);
 prte_pmix_server_clear(&pname);
 
-PMIX_RELEASE(jdata);
 break;
 
 /**** REPORT TOPOLOGY COMMAND ****/
1 change: 1 addition & 0 deletions src/tools/prte/prte.c
@@ -1401,6 +1401,7 @@ static int prep_singleton(const char *name)
 jdata = PMIX_NEW(prte_job_t);
 PMIX_LOAD_NSPACE(jdata->nspace, ptr);
 free(ptr);
+jdata->session = prte_default_session;
 rc = prte_set_job_data_object(jdata);
 if (PRTE_SUCCESS != rc) {
     PRTE_UPDATE_EXIT_STATUS(PRTE_ERR_FATAL);
