MPI_Waitany does not appear to be thread-safe #6004

Closed
gonnetp opened this issue Oct 31, 2018 · 2 comments

gonnetp commented Oct 31, 2018

Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

v3.1.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Downloaded, compiled, and installed from the source tarball.

Please describe the system on which you are running


Details of the problem

Calling MPI_Waitany from multiple threads on the same array of MPI_Request objects segfaults reliably.

I have attached a reproducible example. You can compile and run it with:

mpicc testParallelMpiWaitany.c -g -Wall
mpirun --oversubscribe -n 100 ./a.out

This usually fails with the following output:

[pedro@laika tests]$ mpirun --oversubscribe -np 100 ./a.out 
Rank 01 is ready.
Rank 06 is ready.
Rank 07 is ready.
Rank 09 is ready.
Rank 11 is ready.
Rank 42 is ready.
Rank 55 is ready.
Rank 64 is ready.
Rank 08 is ready.
Rank 27 is ready.
Rank 13 is ready.
Rank 62 is ready.
Rank 87 is ready.
Rank 91 is ready.
Rank 40 is ready.
Rank 65 is ready.
Rank 33 is ready.
Rank 00 is ready.
Rank 15 is ready.
Rank 16 is ready.
Rank 17 is ready.
Rank 20 is ready.
Rank 24 is ready.
Rank 26 is ready.
Rank 29 is ready.
Rank 31 is ready.
Rank 35 is ready.
Rank 36 is ready.
Rank 37 is ready.
Rank 41 is ready.
Rank 47 is ready.
Rank 51 is ready.
Rank 53 is ready.
Rank 56 is ready.
Rank 58 is ready.
Rank 61 is ready.
Rank 63 is ready.
Rank 70 is ready.
Rank 80 is ready.
Rank 88 is ready.
Rank 90 is ready.
Rank 94 is ready.
Rank 22 is ready.
Rank 12 is ready.
Rank 10 is ready.
Rank 66 is ready.
Rank 03 is ready.
Rank 04 is ready.
Rank 14 is ready.
Rank 18 is ready.
Rank 21 is ready.
Rank 23 is ready.
Rank 25 is ready.
Rank 28 is ready.
Rank 30 is ready.
Rank 34 is ready.
Rank 38 is ready.
Rank 39 is ready.
Rank 43 is ready.
Rank 44 is ready.
Rank 45 is ready.
Rank 50 is ready.
Rank 52 is ready.
Rank 54 is ready.
Rank 57 is ready.
Rank 59 is ready.
Rank 60 is ready.
Rank 67 is ready.
Rank 68 is ready.
Rank 69 is ready.
Rank 71 is ready.
Rank 72 is ready.
Rank 75 is ready.
Rank 77 is ready.
Rank 79 is ready.
Rank 82 is ready.
Rank 84 is ready.
Rank 85 is ready.
Rank 86 is ready.
Rank 92 is ready.
Rank 95 is ready.
Rank 97 is ready.
Rank 99 is ready.
Rank 05 is ready.
Rank 19 is ready.
Rank 32 is ready.
Rank 46 is ready.
Rank 48 is ready.
Rank 49 is ready.
Rank 81 is ready.
Rank 73 is ready.
Rank 02 is ready.
Rank 83 is ready.
Rank 96 is ready.
Rank 76 is ready.
Rank 78 is ready.
Rank 74 is ready.
Rank 93 is ready.
Rank 98 is ready.
Rank 89 is ready.
[laika:21276] *** Process received signal ***
[laika:21276] Signal: Segmentation fault (11)
[laika:21276] Signal code: Address not mapped (1)
[laika:21276] Failing at address: 0x38
[laika:21276] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f3282a8d890]
[laika:21276] [ 1] /usr/lib/libmpi.so.40(ompi_request_default_wait_any+0x14f)[0x7f3282ce332f]
[laika:21276] [ 2] /usr/lib/libmpi.so.40(MPI_Waitany+0xab)[0x7f3282d28a1b]
[laika:21276] [ 3] ./a.out(+0xe30)[0x55af6da20e30]
[laika:21276] [ 4] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f3282a826db]
[laika:21276] [ 5] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f32827ab88f]
[laika:21276] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 94 with PID 0 on node laika exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

testParallelMpiWaitany.c.gz

gonnetp changed the title from "MPI_Waitany is not thread-safe" to "MPI_Waitany does not appear to be thread-safe" on Oct 31, 2018

bosilca (Member) commented Oct 31, 2018

According to the MPI standard, it is illegal to wait on the same requests in multiple threads concurrently.
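
One way to stay within that rule is to give each thread its own disjoint slice of the request array, so that no request is ever passed to two concurrent MPI_Waitany calls. The following is only a minimal sketch of that idea and is not the attached reproducer. The names NUM_THREADS, REQS_PER_THREAD, and wait_on_slice, as well as the choice to have each rank send messages to itself, are assumptions made for illustration. It also assumes the MPI library grants MPI_THREAD_MULTIPLE.

```c
/* Sketch: each thread waits only on its own, disjoint slice of the requests. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4     /* hypothetical thread count */
#define REQS_PER_THREAD 2 /* hypothetical number of requests owned per thread */
#define NUM_REQS (NUM_THREADS * REQS_PER_THREAD)

static MPI_Request recv_reqs[NUM_REQS];
static int recv_buf[NUM_REQS];

/* Worker: completes the requests in its slice, one MPI_Waitany at a time. */
static void *wait_on_slice(void *arg) {
  int tid = *(int *)arg;
  MPI_Request *mine = &recv_reqs[tid * REQS_PER_THREAD];

  for (int done = 0; done < REQS_PER_THREAD; done++) {
    int index; /* position of the completed request within this slice */
    MPI_Waitany(REQS_PER_THREAD, mine, &index, MPI_STATUS_IGNORE);
  }
  return NULL;
}

int main(int argc, char *argv[]) {
  int provided, rank;

  /* Concurrent MPI calls from several threads need MPI_THREAD_MULTIPLE. */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE) {
    fprintf(stderr, "MPI_THREAD_MULTIPLE is not available.\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* Assumption: every rank messages itself, so any rank count works. */
  int send_buf[NUM_REQS];
  MPI_Request send_reqs[NUM_REQS];
  for (int i = 0; i < NUM_REQS; i++) {
    send_buf[i] = i;
    MPI_Irecv(&recv_buf[i], 1, MPI_INT, rank, i, MPI_COMM_WORLD, &recv_reqs[i]);
    MPI_Isend(&send_buf[i], 1, MPI_INT, rank, i, MPI_COMM_WORLD, &send_reqs[i]);
  }

  /* Start the workers; thread t owns recv_reqs[t*REQS_PER_THREAD ...]. */
  pthread_t threads[NUM_THREADS];
  int tids[NUM_THREADS];
  for (int t = 0; t < NUM_THREADS; t++) {
    tids[t] = t;
    pthread_create(&threads[t], NULL, wait_on_slice, &tids[t]);
  }
  for (int t = 0; t < NUM_THREADS; t++) {
    pthread_join(threads[t], NULL);
  }

  /* The sends are completed by the main thread only. */
  MPI_Waitall(NUM_REQS, send_reqs, MPI_STATUSES_IGNORE);
  printf("Rank %02d: all %d receives completed.\n", rank, NUM_REQS);

  MPI_Finalize();
  return 0;
}
```

The sketch can be built and launched the same way as the reproducer above, with -pthread added to the mpicc line. If several threads really need to drain one shared pool of requests, another option is to serialize the completion calls behind a mutex, for example by polling with MPI_Testany while holding the lock.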

bosilca closed this as completed on Oct 31, 2018

gonnetp (Author) commented Nov 1, 2018

OK, thanks for clarifying!

It would probably be a good idea to mention this in the MPI_Wait* man pages.
