ompi/request: change semantics of ompi request callbacks #1325
This commit changes the semantics of ompi request callbacks. If a request's callback has freed or re-posted (using start) the request, the callback must return 1 instead of OMPI_SUCCESS. This indicates to ompi_request_complete that the request should not be modified further. This fixes a race condition in osc/pt2pt that could leave req_state inconsistent if a request is freed between the callback running and the request being marked complete.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit open-mpi/ompi@6aa658a)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
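The return-value contract described above can be sketched roughly as follows. This is a simplified, single-threaded illustration and not the Open MPI source: `demo_request_t`, `demo_request_complete`, `DEMO_SUCCESS`, and both callbacks are hypothetical stand-ins for the real request type, `ompi_request_complete`, and `OMPI_SUCCESS`, and the `freed` flag only simulates handing the request back to an allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for OMPI_SUCCESS; not the real definition. */
#define DEMO_SUCCESS 0

typedef struct demo_request demo_request_t;
typedef int (*demo_complete_fn_t)(demo_request_t *req);

/* Hypothetical, heavily reduced request object. */
struct demo_request {
    demo_complete_fn_t complete_cb; /* completion callback, may be NULL */
    bool complete;                  /* set once the request is marked complete */
    bool freed;                     /* simulates "returned to the free list" */
};

/*
 * Sketch of the completion path. Returns true if this function marked
 * the request complete, false if the callback took ownership and the
 * request was left untouched.
 */
static bool demo_request_complete(demo_request_t *req)
{
    int rc = DEMO_SUCCESS;

    if (NULL != req->complete_cb) {
        rc = req->complete_cb(req);
    }

    if (DEMO_SUCCESS != rc) {
        /* The callback freed or restarted the request, so another
         * thread may already own it. Do not modify it any further. */
        return false;
    }

    req->complete = true;
    return true;
}

/* Callback that leaves the request alone: returns success. */
static int cb_plain(demo_request_t *req)
{
    (void) req;
    return DEMO_SUCCESS;
}

/* Callback that releases the request: must return 1. */
static int cb_frees(demo_request_t *req)
{
    req->freed = true;
    return 1;
}
```

With `cb_plain` installed, `demo_request_complete` marks the request complete. With `cb_frees` installed, it returns without writing to the request, which is the window the description says was racy in osc/pt2pt.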
@hppritcha I marked this for 2.0.2 because it is not 100% necessary for 2.0.1 but it is a nice to have. The problem leads to a hang or a crash with lots of threads and osc/pt2pt.
👍
@hppritcha We should consider moving this back to v2.0.1. It's a real threaded race condition.
Test PASSed.
@jsquyres I have one more commit for this. If possible please consider for 2.0.1. I am testing the last fix now. So far I have not been able to get it to hang.
@hjelmn I think we missed the boat on this one for v2.0.1, especially since there's another commit coming. Specifically: there's always going to be one more commit from a random developer -- we have to close the door and actually do a release at some point. Sorry.
Yeah, not unexpected. I think osc/pt2pt is clean now. Spent most of yesterday tracking down a hang on Cray systems. Looks like that one is a btl/ugni bug. Will have that fixed for 2.0.2.
@hppritcha and I talked -- approved.