prov/efa: Fix the ibv cq error handling. #9652

Merged: 1 commit, Dec 19, 2023


@shijin-aws (Contributor) commented Dec 18, 2023

Currently, efa_rdm_ep_poll_ibv_cq cannot
handle errors for IBV_WC_RECV_RDMA_WITH_IMM
and IBV_WC_RDMA_READ. This patch fixes that.

It also removes the failed_send/write_comps
counters from the debug build, because these
symbols are never used.

@shijin-aws requested a review from a team on December 18, 2023 21:05
Comment on lines 473 to 477
+case IBV_WC_SEND:
 #if ENABLE_DEBUG
-if (opcode == IBV_WC_SEND)
 ep->failed_send_comps++;
-else
+#endif
+efa_rdm_pke_handle_tx_error(pkt_entry, FI_EIO, prov_errno);
+break;
+case IBV_WC_RDMA_WRITE:
+#if ENABLE_DEBUG
 ep->failed_write_comps++;
 #endif
 efa_rdm_pke_handle_tx_error(pkt_entry, FI_EIO, prov_errno);
-} else {
-assert(opcode == IBV_WC_RECV);
+break;
+case IBV_WC_RDMA_READ:
+#if ENABLE_DEBUG
+ep->failed_read_comps++;
+#endif
+efa_rdm_pke_handle_tx_error(pkt_entry, FI_EIO, prov_errno);
+break;
Contributor

Nit: we can also use fallthrough, e.g.

case IBV_WC_SEND:
#if ENABLE_DEBUG
    ep->failed_send_comps++;
#endif
case IBV_WC_RDMA_WRITE:
#if ENABLE_DEBUG
    ep->failed_write_comps++;
#endif
case IBV_WC_RDMA_READ:
#if ENABLE_DEBUG
    ep->failed_read_comps++;
#endif
    efa_rdm_pke_handle_tx_error(pkt_entry, FI_EIO, prov_errno);
    break;

@shijin-aws (Contributor, Author) commented Dec 18, 2023

Doesn't that increment failed_write_comps for send as well?
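The concern can be checked with a small standalone sketch. The enum and struct below are hypothetical stand-ins for the ibv_wc_opcode values and the endpoint counters, not the provider's real types; only the control flow mirrors the fallthrough suggestion.

```c
#include <assert.h>

/* Hypothetical stand-ins for the ibv_wc_opcode values discussed above. */
enum wc_op { WC_SEND, WC_RDMA_WRITE, WC_RDMA_READ };

/* Hypothetical stand-in for the per-endpoint debug counters. */
struct fail_counters {
    int send;
    int write;
    int read;
};

/* Mirrors the fallthrough suggestion: no break between the cases,
 * so every case below the matching one also runs. */
static void count_with_fallthrough(struct fail_counters *c, enum wc_op op)
{
    switch (op) {
    case WC_SEND:
        c->send++;
        /* fall through */
    case WC_RDMA_WRITE:
        c->write++;
        /* fall through */
    case WC_RDMA_READ:
        c->read++;
        break;
    default:
        assert(0 && "unexpected opcode");
    }
}
```

Calling count_with_fallthrough with WC_SEND on zeroed counters bumps all three fields, which is the over-counting raised in the question.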

Contributor

I see. The counters do add extra code. We could find a more concise approach, but I don't have a strong preference.

@shijin-aws (Contributor, Author) commented Dec 18, 2023

Yes. Ideally we should declare failed_comps as an array and keep a map from the ibv wc opcode to the ofi op code, so we can increment the counter at the right index.
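A minimal sketch of that idea, under stated assumptions: the enums, the wc_to_op table, struct ep_stats, and count_failed_comp are all hypothetical names invented for illustration. They stand in for the real ibv_wc_opcode values and OFI operation codes and are not taken from the provider.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ibv_wc_opcode values (not the real verbs enum). */
enum wc_op { WC_SEND, WC_RDMA_WRITE, WC_RDMA_READ, WC_RECV, WC_OP_MAX };

/* Hypothetical counter indices, standing in for the OFI operation codes. */
enum op_idx { OP_SEND, OP_WRITE, OP_READ, OP_RECV, OP_MAX };

/* Lookup table from completion opcode to counter index. */
static const enum op_idx wc_to_op[WC_OP_MAX] = {
    [WC_SEND]       = OP_SEND,
    [WC_RDMA_WRITE] = OP_WRITE,
    [WC_RDMA_READ]  = OP_READ,
    [WC_RECV]       = OP_RECV,
};

/* One array replaces the separate failed_*_comps fields. */
struct ep_stats {
    size_t failed_comps[OP_MAX];
};

/* Increment exactly one counter, selected through the map. */
static void count_failed_comp(struct ep_stats *stats, enum wc_op op)
{
    assert((int)op >= 0 && (int)op < (int)WC_OP_MAX);
    stats->failed_comps[wc_to_op[op]]++;
}
```

With this shape the error path needs a single count_failed_comp call regardless of opcode, so no per-case #if ENABLE_DEBUG block is required.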

Contributor Author

Talked offline with @wenduwan; the new revision removes the unused failed_send/write_comps to reduce code duplication.
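Once the per-opcode counters are gone, the cases that share a handler can simply share case labels. The sketch below is illustrative only: the enums and classify_failed_comp are hypothetical, and routing the two receive opcodes to a single receive-side path is an assumption, since the thread only shows the transmit-side handler.

```c
#include <assert.h>

/* Hypothetical stand-ins for the ibv_wc_opcode values named in this PR. */
enum wc_op {
    WC_SEND,
    WC_RDMA_WRITE,
    WC_RDMA_READ,
    WC_RECV,
    WC_RECV_RDMA_WITH_IMM
};

/* Hypothetical labels for which error handler a failed completion reaches. */
enum err_path { ERR_PATH_TX, ERR_PATH_RX, ERR_PATH_UNKNOWN };

/* With no per-case side effects, stacked case labels are safe:
 * each opcode reaches exactly one handler. */
static enum err_path classify_failed_comp(enum wc_op op)
{
    switch (op) {
    case WC_SEND:
    case WC_RDMA_WRITE:
    case WC_RDMA_READ:
        return ERR_PATH_TX;
    case WC_RECV:
    case WC_RECV_RDMA_WITH_IMM: /* assumed to share the receive path */
        return ERR_PATH_RX;
    default:
        assert(0 && "unexpected opcode");
        return ERR_PATH_UNKNOWN;
    }
}
```

In the provider, the transmit branch would presumably call efa_rdm_pke_handle_tx_error(pkt_entry, FI_EIO, prov_errno) as in the reviewed diff; the receive-side call is not shown in this thread.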

Currently, efa_rdm_ep_poll_ibv_cq cannot
handle errors for IBV_WC_RECV_RDMA_WITH_IMM
and IBV_WC_RDMA_READ. This patch fixes that.

It also removes the failed_send/write/read_comps
counters from the debug build, because these
symbols are never used.

Signed-off-by: Shi Jin <sjina@amazon.com>
@shijin-aws merged commit 52b24eb into ofiwg:main on Dec 19, 2023
11 checks passed