-
Notifications
You must be signed in to change notification settings - Fork 871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PML/UCX: improved error processing in MPI_Recv #8140
PML/UCX: improved error processing in MPI_Recv #8140
Conversation
ompi/mca/pml/ucx/pml_ucx.c
Outdated
@@ -627,15 +630,15 @@ int mca_pml_ucx_recv(void *buf, size_t count, ompi_datatype_t *datatype, int src | |||
MCA_COMMON_UCX_PROGRESS_LOOP(ompi_pml_ucx.ucp_worker) { | |||
status = ucp_request_test(req, &info); | |||
if (status != UCS_INPROGRESS) { | |||
mca_pml_ucx_set_recv_status_safe(mpi_status, status, &info); | |||
mca_pml_ucx_set_recv_status(mpi_status, status, &info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add return value for mca_pml_ucx_set_recv_status(), instead of extra branch in lines 615-616?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
branch on 615-616 needed to call mca_pml_ucx_set_recv_status. I can move it in to mca_pml_ucx_set_recv_status_safe
@yosefe ok to squash? |
ompi/mca/pml/ucx/pml_ucx_request.h
Outdated
ompi_status_public_t _local_status; | ||
ompi_status_public_t *mpi_status = (_mpi_status != MPI_STATUS_IGNORE) ? | ||
_mpi_status : &_local_status; | ||
|
||
return mca_pml_ucx_set_recv_status(mpi_status, ucp_status, info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe avoid setting temp mpi_status field if not needed
if (mpi_status != MPI_STATUS_IGNORE) {
return mca_pml_ucx_set_recv_status(mpi_status, ucp_status, info)
} else if (status == UCS_OK || status == CANCELED) {
return UCS_OK;
} else if (status == TRUNCATED)
reutrn ERR_TRUNCATE
} else {
return ERR_INTERN
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
@@ -627,15 +628,15 @@ int mca_pml_ucx_recv(void *buf, size_t count, ompi_datatype_t *datatype, int src | |||
MCA_COMMON_UCX_PROGRESS_LOOP(ompi_pml_ucx.ucp_worker) { | |||
status = ucp_request_test(req, &info); | |||
if (status != UCS_INPROGRESS) { | |||
mca_pml_ucx_set_recv_status_safe(mpi_status, status, &info); | |||
result = mca_pml_ucx_set_recv_status_safe(mpi_status, status, &info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this also in other places mca_pml_ucx_set_recv_status_safe is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed for MPI_Mrecv call, other locations are non-blocking MPI calls - not relevant
72cde39
to
4ddfab2
Compare
@yosefe ok to squash? |
ompi/mca/pml/ucx/pml_ucx_request.h
Outdated
{ | ||
if (mpi_status != MPI_STATUS_IGNORE) { | ||
mca_pml_ucx_set_recv_status(mpi_status, ucp_status, info); | ||
return mca_pml_ucx_set_recv_status(mpi_status, ucp_status, info); | ||
} else if ((ucp_status == UCS_OK) || (ucp_status == UCS_ERR_CANCELED)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add opal_likely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, added
2946abd
to
4ef5fc2
Compare
@yosefe ok to squash? |
- improved error processing in MPI_Recv implementation of pml UCX - added error handling for pml_ucx_mrecv call Signed-off-by: Sergey Oblomov <sergeyo@nvidia.com>
4ef5fc2
to
eb9405d
Compare
@yosefe ok to merge? |
of pml UCX
Signed-off-by: Sergey Oblomov sergeyo@mellanox.com