PML/UCX: improved error processing in MPI_Recv #8140

Merged

hoopoepg
Contributor

  • improved error processing in MPI_Recv implementation
    of pml UCX

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>

@@ -627,15 +630,15 @@ int mca_pml_ucx_recv(void *buf, size_t count, ompi_datatype_t *datatype, int src
     MCA_COMMON_UCX_PROGRESS_LOOP(ompi_pml_ucx.ucp_worker) {
         status = ucp_request_test(req, &info);
         if (status != UCS_INPROGRESS) {
-            mca_pml_ucx_set_recv_status_safe(mpi_status, status, &info);
+            mca_pml_ucx_set_recv_status(mpi_status, status, &info);
Contributor

maybe add return value for mca_pml_ucx_set_recv_status(), instead of extra branch in lines 615-616?

Contributor Author

the branch on lines 615-616 is needed to call mca_pml_ucx_set_recv_status. I can move it into mca_pml_ucx_set_recv_status_safe

Contributor Author

@yosefe ok to squash?

Comment on lines 197 to 201
ompi_status_public_t _local_status;
ompi_status_public_t *mpi_status = (_mpi_status != MPI_STATUS_IGNORE) ?
_mpi_status : &_local_status;

return mca_pml_ucx_set_recv_status(mpi_status, ucp_status, info);
Contributor

maybe avoid setting temp mpi_status field if not needed

if (mpi_status != MPI_STATUS_IGNORE) {
    return mca_pml_ucx_set_recv_status(mpi_status, ucp_status, info);
} else if (ucp_status == UCS_OK || ucp_status == UCS_ERR_CANCELED) {
    return OMPI_SUCCESS;
} else if (ucp_status == UCS_ERR_MESSAGE_TRUNCATED) {
    return MPI_ERR_TRUNCATE;
} else {
    return MPI_ERR_INTERN;
}

Contributor Author

updated
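
For readers following this thread, the suggested control flow can be sketched in isolation. This is a minimal sketch, not the merged code: every `sk_` type, constant, and function below is a hypothetical stand-in (with arbitrary numeric values) so that the example compiles without the UCX or Open MPI headers. It shows the "safe" wrapper returning an error code directly when the caller passed the ignore sentinel, and delegating to the status-filling function otherwise.

```c
#include <stddef.h>

/* Hypothetical stand-ins for UCX completion codes; values are arbitrary. */
typedef enum {
    SK_UCS_OK = 0,
    SK_UCS_ERR_CANCELED = -1,
    SK_UCS_ERR_MESSAGE_TRUNCATED = -2,
    SK_UCS_ERR_OTHER = -3
} sk_ucs_status_t;

/* Hypothetical stand-ins for MPI error codes; values are arbitrary. */
enum { SK_SUCCESS = 0, SK_ERR_TRUNCATE = 15, SK_ERR_INTERN = 16 };

/* Hypothetical stand-in for the public MPI status object. */
typedef struct {
    int error;
    int cancelled;
} sk_status_t;

/* Hypothetical stand-in for MPI_STATUS_IGNORE. */
#define SK_STATUS_IGNORE ((sk_status_t *)NULL)

/* Fills the caller's status object and returns the resulting error code. */
int sk_set_recv_status(sk_status_t *st, sk_ucs_status_t ucp_status)
{
    st->cancelled = (ucp_status == SK_UCS_ERR_CANCELED);
    if (ucp_status == SK_UCS_OK || ucp_status == SK_UCS_ERR_CANCELED) {
        st->error = SK_SUCCESS;
    } else if (ucp_status == SK_UCS_ERR_MESSAGE_TRUNCATED) {
        st->error = SK_ERR_TRUNCATE;
    } else {
        st->error = SK_ERR_INTERN;
    }
    return st->error;
}

/* Same mapping, but never writes through the ignore sentinel and never
 * needs a temporary status object. */
int sk_set_recv_status_safe(sk_status_t *st, sk_ucs_status_t ucp_status)
{
    if (st != SK_STATUS_IGNORE) {
        return sk_set_recv_status(st, ucp_status);
    } else if (ucp_status == SK_UCS_OK || ucp_status == SK_UCS_ERR_CANCELED) {
        return SK_SUCCESS;
    } else if (ucp_status == SK_UCS_ERR_MESSAGE_TRUNCATED) {
        return SK_ERR_TRUNCATE;
    }
    return SK_ERR_INTERN;
}
```

In this sketch both paths yield the same error code for the same completion status; the ignore path simply skips filling a status object that nobody will read.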

@@ -627,15 +628,15 @@ int mca_pml_ucx_recv(void *buf, size_t count, ompi_datatype_t *datatype, int src
     MCA_COMMON_UCX_PROGRESS_LOOP(ompi_pml_ucx.ucp_worker) {
         status = ucp_request_test(req, &info);
         if (status != UCS_INPROGRESS) {
-            mca_pml_ucx_set_recv_status_safe(mpi_status, status, &info);
+            result = mca_pml_ucx_set_recv_status_safe(mpi_status, status, &info);
Contributor

do we need this also in the other places where mca_pml_ucx_set_recv_status_safe is used?

Contributor Author

fixed for MPI_Mrecv call, other locations are non-blocking MPI calls - not relevant
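
The reason this matters only for blocking calls can be illustrated with a small polling loop. The following is a sketch under stated assumptions, not the Open MPI implementation: the `demo_` request type, its countdown-based test function, and the numeric codes are all hypothetical stand-ins. A blocking receive waits for completion inside the call, so it is the place where the completion status can be turned into the function's return value.

```c
/* Hypothetical completion states; values are arbitrary. */
typedef enum {
    DEMO_OK = 0,
    DEMO_INPROGRESS = 1,
    DEMO_ERR_TRUNCATED = -1,
    DEMO_ERR_OTHER = -2
} demo_state_t;

/* Hypothetical error codes returned to the caller; values are arbitrary. */
enum { DEMO_SUCCESS = 0, DEMO_RC_TRUNCATE = 15, DEMO_RC_INTERN = 16 };

/* Hypothetical request: reports "in progress" for a number of polls,
 * then reports its final state. */
typedef struct {
    int polls_left;
    demo_state_t final_state;
} demo_request_t;

/* Stand-in for a non-blocking completion test. */
demo_state_t demo_request_test(demo_request_t *req)
{
    if (req->polls_left > 0) {
        req->polls_left--;
        return DEMO_INPROGRESS;
    }
    return req->final_state;
}

/* Maps a final completion state to a caller-visible error code. */
int demo_map_state(demo_state_t state)
{
    if (state == DEMO_OK) {
        return DEMO_SUCCESS;
    } else if (state == DEMO_ERR_TRUNCATED) {
        return DEMO_RC_TRUNCATE;
    }
    return DEMO_RC_INTERN;
}

/* Blocking receive: poll until completion, then propagate the mapped
 * result instead of discarding it and returning success. */
int demo_blocking_recv(demo_request_t *req)
{
    for (;;) {
        demo_state_t state = demo_request_test(req);
        if (state != DEMO_INPROGRESS) {
            return demo_map_state(state);
        }
        /* A real implementation would drive network progress here. */
    }
}
```

A non-blocking call returns before completion, so in this model there is no final state to map at return time; the error surfaces later when the request is tested or waited on.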

hoopoepg force-pushed the topic/pml-ucx-recv-improved-errhandling branch from 72cde39 to 4ddfab2 on October 29, 2020 11:30
Contributor Author

@yosefe ok to squash?

 {
     if (mpi_status != MPI_STATUS_IGNORE) {
-        mca_pml_ucx_set_recv_status(mpi_status, ucp_status, info);
+        return mca_pml_ucx_set_recv_status(mpi_status, ucp_status, info);
     } else if ((ucp_status == UCS_OK) || (ucp_status == UCS_ERR_CANCELED)) {
Contributor

maybe add opal_likely?

Contributor Author

ok, added
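
As background on the hint requested here, a branch-likelihood macro can be sketched as below. This is an illustration only and does not reproduce Open MPI's definition: `HINT_LIKELY`, `hint_map_status`, and the numeric codes are hypothetical names and values, and the sketch assumes a GCC- or Clang-compatible compiler for `__builtin_expect`, with a plain fallback elsewhere.

```c
/* Hypothetical branch hint: tells the compiler the condition is usually
 * true. The !! normalizes any non-zero value to 1. */
#if defined(__GNUC__) || defined(__clang__)
#define HINT_LIKELY(expr) __builtin_expect(!!(expr), 1)
#else
#define HINT_LIKELY(expr) (!!(expr))
#endif

/* Hypothetical status values and error codes; all values are arbitrary. */
enum { HINT_ST_OK = 0, HINT_ST_CANCELED = -1, HINT_ST_TRUNCATED = -2 };
enum { HINT_RC_SUCCESS = 0, HINT_RC_TRUNCATE = 15, HINT_RC_INTERN = 16 };

/* The success path is marked as the expected one; the hint affects only
 * code layout, never the result. */
int hint_map_status(int status)
{
    if (HINT_LIKELY(status == HINT_ST_OK || status == HINT_ST_CANCELED)) {
        return HINT_RC_SUCCESS;
    } else if (status == HINT_ST_TRUNCATED) {
        return HINT_RC_TRUNCATE;
    }
    return HINT_RC_INTERN;
}
```

The hint is worthwhile here because successful completion is by far the common case in a receive path, so error branches can be moved out of the hot path.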

hoopoepg force-pushed the topic/pml-ucx-recv-improved-errhandling branch from 2946abd to 4ef5fc2 on November 2, 2020 07:38
Contributor Author

hoopoepg commented Nov 2, 2020

@yosefe ok to squash?

- improved error processing in MPI_Recv implementation
  of pml UCX
- added error handling for pml_ucx_mrecv call

Signed-off-by: Sergey Oblomov <sergeyo@nvidia.com>
hoopoepg force-pushed the topic/pml-ucx-recv-improved-errhandling branch from 4ef5fc2 to eb9405d on November 2, 2020 09:26
Contributor Author

hoopoepg commented Nov 2, 2020

@yosefe ok to merge?

yosefe merged commit 1f3e334 into open-mpi:master on Nov 3, 2020
yosefe deleted the topic/pml-ucx-recv-improved-errhandling branch on November 3, 2020 11:08