
Singleton MPI initialization and spawn #10590

Closed
dalcinl opened this issue Jul 20, 2022 · 18 comments · Fixed by #10688

@dalcinl
Contributor

dalcinl commented Jul 20, 2022

It looks like singleton MPI init and spawn are broken in the main branch.

Look at this reproducer:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  MPI_Comm parent, intercomm;

  MPI_Init(NULL, NULL);

  MPI_Comm_get_parent(&parent);
  if (MPI_COMM_NULL != parent)
    MPI_Comm_disconnect(&parent);
  
  if (argc > 1) {
    printf("Spawning '%s' ... ", argv[1]);
    MPI_Comm_spawn(argv[1], MPI_ARGV_NULL,
                   1, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                   &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Comm_disconnect(&intercomm);
    printf("OK\n");
  }

  MPI_Finalize();
}

First I run that code using Open MPI v4.1.2 (the system package from Fedora 36) in the following two ways:

$ mpiexec -n 1 ./a.out ./a.out 
Spawning './a.out' ... OK

$ ./a.out ./a.out 
Spawning './a.out' ... OK

Note that the second way does not use mpiexec (that is, what the MPI standard calls singleton MPI initialization).

Next I run the code with ompi/main. I've configured with:

./configure \
    --without-ofi \
    --without-ucx \
    --with-pmix=internal \
    --with-prrte=internal \
    --with-libevent=internal \
    --with-hwloc=internal \
    --enable-debug \
    --enable-mem-debug \
    --disable-man-pages \
    --disable-sphinx

The first way (using mpiexec) seems to work just fine. The second way (singleton MPI init) fails:

$ mpiexec -n 1 ./a.out ./a.out 
Spawning './a.out' ... OK

$ ./a.out ./a.out 
[kw61149:1105609] OPAL ERROR: Error in file ../../ompi/dpm/dpm.c at line 2122
[kw61149:00000] *** An error occurred in MPI_Comm_spawn
[kw61149:00000] *** reported by process [440139776,0]
[kw61149:00000] *** on communicator MPI_COMM_SELF
[kw61149:00000] *** MPI_ERR_UNKNOWN: unknown error
[kw61149:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[kw61149:00000] ***    and MPI will try to terminate your MPI job as well)

PS: The lack of singleton MPI initialization complicates things for Python users who want to dynamically spawn MPI processes as needed via mpi4py, without requiring the parent process to be launched through mpiexec.

@jsquyres jsquyres added this to the v5.0.0 milestone Jul 21, 2022
@dalcinl
Contributor Author

dalcinl commented Aug 17, 2022

@awlauria After your recent update of submodule pointers, things got even worse. Now I cannot even call MPI_Init_thread() if mpiexec is not used:

$ mpiexec -n 1 python -c "from mpi4py import MPI"

$ python -c "from mpi4py import MPI"
--------------------------------------------------------------------------
It looks like MPI runtime init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during RTE init; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  local peers
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_mpi_instance_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and MPI will try to terminate your MPI job as well)
[kw61149:1130097] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

@awlauria
Contributor

@dalcinl sigh

I didn't expect it to solve this issue, but good to know. Thanks

@rhc54
Contributor

rhc54 commented Aug 18, 2022

FWIW: the error appears to be in the MPI layer, and not in PMIx or PRRTE. A quick look identifies the following culprit code (taken from main branch):

    val = NULL;
    OPAL_MODEX_RECV_VALUE(rc, PMIX_LOCAL_PEERS,
                          &pname, &val, PMIX_STRING);
    if (PMIX_SUCCESS == rc && NULL != val) {
        peers = opal_argv_split(val, ',');
        free(val);
    } else {
        ret = opal_pmix_convert_status(rc);
        error = "local peers";
        goto error;
    }
    /* if we were unable to retrieve the #local peers, set it here */
    if (0 == opal_process_info.num_local_peers) {
        opal_process_info.num_local_peers = opal_argv_count(peers) - 1;
    }

You can see that not finding "local peers" (which you won't find in the case of a singleton) incorrectly results in returning an error, even though the code that follows knows how to deal with that situation.
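
For illustration only, here is a minimal sketch of how the lookup could treat the missing key as benign. This assumes the `OPAL_MODEX_RECV_VALUE_OPTIONAL` variant of the macro and is not necessarily the exact fix that was committed:

    /* Hypothetical sketch, not the committed fix: treat a missing
     * PMIX_LOCAL_PEERS key (the singleton case) as "no local peers"
     * instead of a fatal RTE-init error. */
    val = NULL;
    peers = NULL;
    OPAL_MODEX_RECV_VALUE_OPTIONAL(rc, PMIX_LOCAL_PEERS,
                                   &pname, &val, PMIX_STRING);
    if (PMIX_SUCCESS == rc && NULL != val) {
        peers = opal_argv_split(val, ',');
        free(val);
    }
    /* if we were able to retrieve the local peers, set the count here;
     * otherwise leave the default (a singleton has no local peers) */
    if (NULL != peers && 0 == opal_process_info.num_local_peers) {
        opal_process_info.num_local_peers = opal_argv_count(peers) - 1;
    }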

@jjhursey jjhursey self-assigned this Aug 18, 2022
@jjhursey
Member

I took a crack at this and got singleton without spawn working, but with spawn I'm getting:

[jjhursey@f5n17 mpi] ./hello_c 
  0/  1) [f5n17] 2998639 Hello, world!
[jjhursey@f5n17 mpi]  ./simple_spawn ./simple_spawn
[f5n17:2999204] OPAL ERROR: Error in file dpm/dpm.c at line 2122
[f5n17:00000] *** An error occurred in MPI_Comm_spawn
[f5n17:00000] *** reported by process [1890451456,0]
[f5n17:00000] *** on communicator MPI_COMM_SELF
[f5n17:00000] *** MPI_ERR_UNKNOWN: unknown error
[f5n17:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[f5n17:00000] ***    and MPI will try to terminate your MPI job as well)

I have a lead on what needs fixing (something with starting the prte environment). I'm looking into it.

jjhursey added a commit to jjhursey/ompi that referenced this issue Aug 18, 2022
 * Fixes open-mpi#10590
 * Singletons will not have a PMIx value for `PMIX_LOCAL_PEERS`
   so make that optional instead of required.
 * `&` is being confused as an application argument in `prte`
   instead of the background character
   * Replace with `--daemonize` which is probably better anyway

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
@jjhursey
Member

jjhursey commented Aug 18, 2022

Two changes were required to fix this issue: making `PMIX_LOCAL_PEERS` optional for singletons, and fixing how a singleton launches `prte` (see the commit referenced above).

With those two fixes I was able to run a singleton without MPI_Comm_spawn and a singleton with MPI_Comm_spawn (the same program as provided in this issue):

[jjhursey@f5n17 mpi] ./hello_c 
  0/  1) [f5n17] 3347380 Hello, world!
[jjhursey@f5n17 mpi] ./simple_spawn ./simple_spawn
Spawning './simple_spawn' ... OK
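
For illustration only (an editor's sketch, not the actual Open MPI/PRRTE patch), the idea behind the second change is to let `prte` put itself in the background via its own `--daemonize` option instead of relying on a trailing shell-style `&`, which `prte` parses as an application argument:

    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical sketch, NOT the committed fix: start a detached prte
     * DVM for a singleton via prte's --daemonize option instead of a
     * trailing "&", which prte would treat as an application argument. */
    void start_prte_for_singleton(void)
    {
        char *const prte_argv[] = { "prte", "--daemonize", NULL };
        pid_t pid = fork();
        if (0 == pid) {
            execvp(prte_argv[0], prte_argv);
            _exit(1);   /* only reached if exec failed */
        }
        /* parent continues; prte daemonizes itself, so no "&" is needed */
        (void)pid;
    }

    int main(void)
    {
        start_prte_for_singleton();
        return 0;
    }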

@rhc54
Contributor

rhc54 commented Aug 18, 2022

PRRTE change has been committed - thanks Josh! Will now port it over to PRRTE v3.0.

Note: the fix was done in the schizo/prrte component and therefore only applies to prterun with the PRRTE personality. You might want to check to see if it also needs to go into the OMPI personality.

@jjhursey
Member

Open MPI main now seems to have fully working singleton spawn after the relevant PRs were merged:

shell$ date
Thu Aug 25 09:55:37 EDT 2022
shell$  git rev-parse HEAD
96fadd9d6860bd1dc89f15e88a472f310ff13c89
shell$ git submodule status
 ac7abc6e432cd3fe2d5a72809a987f180133d668 3rd-party/openpmix (v1.1.3-3604-gac7abc6e)
 0a7547330050854f8164c4805fafd8e32a2786cd 3rd-party/prrte (psrvr-v2.0.0rc1-4420-g0a754733)
shell$ git branch
* main
shell$ cd /tmp/ompi-tests-public/singleton
shell$ make clean all
rm -f hello_c simple_spawn simple_spawn_multiple *.o
mpicc hello_c.c -Wall -g -O0 -o hello_c
mpicc simple_spawn.c -Wall -g -O0 -o simple_spawn
mpicc simple_spawn_multiple.c -Wall -g -O0 -o simple_spawn_multiple
shell$ ./run.sh 
=====================
Testing: Hello with mpirun
=====================
  0/  1) [f5n18] 367194 Hello, world!
=====================
Testing: Hello as a singleton
=====================
  0/  1) [f5n18] 367300 Hello, world!
=====================
Testing: MPI_Comm_spawn with mpirun
=====================
Spawning './simple_spawn' ... OK
=====================
Testing: MPI_Comm_spawn as a singleton
=====================
Spawning './simple_spawn' ... OK
=====================
Testing: MPI_Comm_spawn_multiple with mpirun
=====================
Hello from a Child (B)
Hello from a Child (A)
Hello from a Child (B)
Spawning Multiple './simple_spawn_multiple' ... OK
=====================
Testing: MPI_Comm_spawn_multiple as a singleton
=====================
Hello from a Child (B)
Hello from a Child (A)
Hello from a Child (B)
Spawning Multiple './simple_spawn_multiple' ... OK
=====================
Success
=====================

jjhursey added a commit to jjhursey/ompi that referenced this issue Aug 25, 2022
 * Fixes open-mpi#10590
 * Singletons will not have a PMIx value for `PMIX_LOCAL_PEERS`
   so make that optional instead of required.
 * `&` is being confused as an application argument in `prte`
   instead of the background character
   * Replace with `--daemonize` which is probably better anyway

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 16a1fa6)
@jjhursey
Member

Using the singleton test suite

| Version | Singleton | Singleton MPI_Comm_spawn | Singleton MPI_Comm_spawn_multiple |
| --- | --- | --- | --- |
| main | PASS | PASS | PASS |
| v5.0.x | PASS | FAILURE | FAILURE |
| v5.0.x (patched) | PASS | PASS | PASS (FAILURE with UCX in both Singleton and mpirun) |
| 4.1.4 | PASS | PASS | PASS |
| 4.0.7 | PASS | PASS | PASS |
| 3.1.6 | PASS | FAILURE (Singleton and mpirun) | FAILURE (Singleton and mpirun) |

v5.0.x (patched) includes additional patches; see PR #10716, which includes the first three of them.

The MPI_Comm_spawn_multiple test failed with UCX (passes with ob1) just on the v5.0.x branch (works on main).

mpirun --np 1 ./simple_spawn_multiple ./simple_spawn_multiple

@open-mpi/ucx Can you all check to see if we are missing a commit from main to v5.0.x for ucx?

@janjust
Contributor

janjust commented Aug 25, 2022

@hoopoepg Hey Sergey, can we look into why singleton Comm_spawn_multiple() runs fail with UCX? Are we missing something that's currently in main but not in v5.0.x?

@hoopoepg
Contributor

It is really strange - we didn't do anything specific for spawn functionality.
Will take a look.

@brminich
Member

@karasevb, can you please take a look?

@karasevb
Member

...

mpirun --np 1 ./simple_spawn_multiple ./simple_spawn_multiple

@open-mpi/ucx Can you all check to see if we are missing a commit from main to v5.0.x for ucx?

I cannot reproduce the singleton MPI_Comm_spawn_multiple failure with UCX. I've built v5.0.x (647d793) plus the patches that @jjhursey mentioned, and it works well with both the ob1 and ucx PMLs.
Perhaps this requires a special OMPI configure line?

@jjhursey
Member

jjhursey commented Aug 30, 2022

@karasevb I created a fresh build of Open MPI v5.0.x with UCX 1.13.0 and it is working fine now. So I'm not sure what happened in that test. If I see it again I'll try to track it down and file a bug.

Below is the current state of Singletons on the various branches/releases (using ob1 and ucx - tested separately):

| Version | Singleton | Singleton MPI_Comm_spawn | Singleton MPI_Comm_spawn_multiple |
| --- | --- | --- | --- |
| main branch | PASS | PASS | PASS |
| v5.0.x branch | PASS | PASS | PASS |
| v4.1.x branch | PASS | PASS | PASS |
| 4.1.4 release | PASS | PASS | PASS |
| v4.0.x branch | PASS | PASS | PASS |
| 4.0.7 release | PASS | PASS | PASS |
| 3.1.6 release | PASS | FAILURE (Singleton and mpirun) | FAILURE (Singleton and mpirun) |

Odd data point. With the v4.1.x branch I had to set OMPI_MCA_osc=^ucx otherwise it would segv in MPI_Finalize:

(gdb) bt
#0  0x00007fd4d26f0acf in uw_frame_state_for () from /lib64/libgcc_s.so.1
#1  0x00007fd4d26f2758 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x00007fd4dee04c56 in backtrace () from /lib64/libc.so.6
#3  0x00007fd4d1d16a9e in ucs_debug_backtrace_create (strip=2, bckt=0x7fd4df72b190) at debug/debug.c:596
#4  ucs_debug_backtrace_create (bckt=0x7fd4df72b190, strip=2) at debug/debug.c:585
#5  0x00007fd4d1d16db3 in ucs_debug_print_backtrace (stream=0x7fd4df089600 <_IO_2_1_stderr_>, strip=strip@entry=2) at debug/debug.c:654
#6  0x00007fd4d1d19414 in ucs_handle_error (message=0x7fd4d1d3908b "address not mapped to object") at debug/debug.c:1081
#7  0x00007fd4d1d195cc in ucs_debug_handle_error_signal (signo=signo@entry=11, cause=0x7fd4d1d3908b "address not mapped to object", 
    fmt=fmt@entry=0x7fd4d1d3912d " at address %p") at debug/debug.c:1033
#8  0x00007fd4d1d19878 in ucs_error_signal_handler (signo=11, info=0x7fd4df72b5f0, context=<optimized out>) at debug/debug.c:984
#9  <signal handler called>
#10 0x00007fd4d24e301f in ?? ()
#11 0x00007fd4de68e6b5 in opal_mem_hooks_release_hook (buf=0x7fd4df6e7000, length=139264, from_alloc=true) at memoryhooks/memory.c:129
#12 0x00007fd4de74fb21 in _intercept_munmap (start=0x7fd4df6e7000, length=139264) at memory_patcher_component.c:184
#13 0x00007fd4de74fb8e in intercept_munmap (start=0x7fd4df6e7000, length=139264) at memory_patcher_component.c:198
#14 0x00007fd4de751b88 in mca_mpool_default_free (mpool=0x7fd4de9aaae0 <mca_mpool_malloc_module>, addr=0x7fd4df6e8000)
    at base/mpool_base_default.c:70
#15 0x00007fd4de67fd5c in opal_free_list_allocation_release (fl=0x7fd4cb1414f0 <mca_osc_rdma_component+496>, fl_mem=0x249dba0)
    at class/opal_free_list.c:70
#16 0x00007fd4de67fe9c in opal_free_list_destruct (fl=0x7fd4cb1414f0 <mca_osc_rdma_component+496>) at class/opal_free_list.c:103
#17 0x00007fd4caf1f960 in opal_obj_run_destructors (object=0x7fd4cb1414f0 <mca_osc_rdma_component+496>)
    at ../../../../opal/class/opal_object.h:483
#18 0x00007fd4caf20e01 in ompi_osc_rdma_component_finalize () at osc_rdma_component.c:359
#19 0x00007fd4df3e1788 in ompi_osc_base_finalize () at base/osc_base_frame.c:72
#20 0x00007fd4df3114e5 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:323
#21 0x00007fd4df34fce2 in PMPI_Finalize () at pfinalize.c:54
#22 0x00000000004009a0 in main (argc=1, argv=0x7ffe1a15b598) at hello_c.c:24
(gdb) up

I'll see if I can track that down.

@dalcinl
Contributor Author

dalcinl commented Aug 31, 2022

@jjhursey I'm not sure you saw my request for advice on a related issue in mpi4py/mpi4py#247. I'm just asking for your comment on whether this is a known issue that can somehow be worked around via MCA params, or whether I should just disable the tests as a known failure.

@karasevb
Member

Odd data point. With the v4.1.x branch I had to set OMPI_MCA_osc=^ucx otherwise it would segv in MPI_Finalize:

@janjust do you have any idea how to fix this?

@jjhursey
Member

@jjhursey I'm not sure you saw my request for advice on a related issue in mpi4py/mpi4py#247. I'm just asking for your comment on whether this is a known issue that can somehow be worked around via MCA params, or whether I should just disable the tests as a known failure.

Sorry I haven't gotten to it just yet. I'm planning on looking into it today.

@janjust
Contributor

janjust commented Aug 31, 2022

Odd data point. With the v4.1.x branch I had to set OMPI_MCA_osc=^ucx otherwise it would segv in MPI_Finalize:

@janjust do you have any idea how to fix this?

Hm, weird. I'm not really sure why it's segfaulting in Finalize, but I can take a look; it could be a double free.

@jjhursey
Member

jjhursey commented Sep 2, 2022

@janjust I spent some time today tracking this down. I suspected that it was due to the environment I was running in for testing (isolated Docker container).

I found the fix and posted a PR: #10758

I posted a summary of my investigation to the PR. Locally, I applied a similar fix to v4.1.x (minor code movement changes due to the drift between branches), and it resolved the issue.

MamziB pushed a commit to MamziB/ompi that referenced this issue Oct 26, 2022
 * Fixes open-mpi#10590
 * Singletons will not have a PMIx value for `PMIX_LOCAL_PEERS`
   so make that optional instead of required.
 * `&` is being confused as an application argument in `prte`
   instead of the background character
   * Replace with `--daemonize` which is probably better anyway

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
MamziB pushed a commit to MamziB/ompi that referenced this issue Oct 26, 2022
 * Fixes open-mpi#10590
 * Singletons will not have a PMIx value for `PMIX_LOCAL_PEERS`
   so make that optional instead of required.
 * `&` is being confused as an application argument in `prte`
   instead of the background character
   * Replace with `--daemonize` which is probably better anyway

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
yli137 pushed a commit to yli137/ompi that referenced this issue Jan 10, 2024
 * Fixes open-mpi#10590
 * Singletons will not have a PMIx value for `PMIX_LOCAL_PEERS`
   so make that optional instead of required.
 * `&` is being confused as an application argument in `prte`
   instead of the background character
   * Replace with `--daemonize` which is probably better anyway

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>