
Signal: Segmentation fault (11) #11867

Closed
YarShev opened this issue Aug 21, 2023 · 4 comments
YarShev commented Aug 21, 2023

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

$ mpiexec --version
mpiexec (OpenRTE) 4.1.5

Report bugs to http://www.open-mpi.org/community/help/

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

$ conda install -c conda-forge mpi4py openmpi

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: Ubuntu 20.04.4 LTS
  • Computer hardware: Intel(R) Xeon(R) Platinum 8276L CPU @ 2.20GHz
  • Network type: Ethernet

Details of the problem

mpiexec -n 2 python foo.py
# foo.py
from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = numpy.arange(1000000, dtype='i')
    r = comm.Isend([data, MPI.INT], dest=1, tag=77)
    r.Wait()
elif rank == 1:
    data = numpy.empty(1000000, dtype='i')
    r = comm.Irecv([data, MPI.INT], source=0, tag=77)
    r.Free()
    print(data)

mpiexec noticed that process rank 1 with PID 0 on node FOO exited on signal 11 (Segmentation fault)

The issue is probably similar to pmodels/mpich#6584, which MPICH fixed recently.

devreal commented Aug 21, 2023

The code will not produce what you expect:

r = comm.Irecv([data, MPI.INT], source=0, tag=77)
r.Free()
print(data)

Assuming that r.Free() simply calls MPI_Request_free, it does not wait for the operation to complete. The print may or may not show the data you expect, depending on whether the receive has already completed. I am not sure this is the cause of the segfault, but can you try waiting on the request first?

YarShev commented Aug 22, 2023

If I wait for the request, everything works well. However, suppose I don't need the data on the receiver side and just want to free the request and, for instance, issue another irecv, or even just quit the program. Even removing print(data) from the original example above still raises the error.

devreal commented Aug 22, 2023

The code you posted ends without either calling MPI_Finalize or waiting for the communication to complete, i.e., the communication is left dangling until the implicit finalization occurs. I can only speculate here, but it could well be that the numpy array gets destroyed before all data is written to it. You have no control over the order in which finalization and destruction happen. What happens if you add a call to MPI.Finalize at the end? If that doesn't help, a stack trace would help pinpoint where things go wrong.

YarShev commented Aug 23, 2023

If I add MPI.Finalize, there is no error. When printing data I get the output data = [0 0 0 ... 0 0 0], which is probably the expected behavior, so the issue can be closed. Thanks.

YarShev closed this as completed Aug 23, 2023