prov/efa: Make fi_cancel return EOPNOTSUPP for zero copy receive mode.
A receive cannot be safely cancelled in zero-copy receive mode
because the recv cannot be cancelled at the HW level. Make fi_cancel return
EOPNOTSUPP instead of adding hacks that do not faithfully emulate cancellation.

Signed-off-by: Shi Jin <sjina@amazon.com>
shijin-aws committed Jul 31, 2024
1 parent 8e21c90 commit 01c0f6c
Showing 6 changed files with 45 additions and 1 deletion.
2 changes: 2 additions & 0 deletions man/fi_efa.7.md
@@ -84,6 +84,8 @@ No support for counters for the DGRAM endpoint.

No support for inject.

No support for `fi_cancel()` for the [zero-copy receive mode](https://github.com/ofiwg/libfabric/blob/main/prov/efa/docs/efa_rdm_protocol_v4.md#48-user-receive-qp-feature--request-and-zero-copy-receive).

When using FI_HMEM for AWS Neuron or Habana SynapseAI buffers, the provider
requires peer to peer transaction support between the EFA and the FI_HMEM
device. Therefore, the FI_HMEM_P2P_DISABLED option is not supported by the EFA
4 changes: 4 additions & 0 deletions prov/efa/docs/efa_rdm_protocol_v4.md
@@ -1599,6 +1599,10 @@ If the receiver supports it, sender will then send packets with user data to the
there is no ordering or tagging requirement, and the receiver already knows the sender, sender can
send packets without any headers in the payload. If the receiver doesn't support this extra feature,
the sender will continue send packets with headers to the receiver's default QP.

On the receiver side, it will post the user recv buffer to the user recv QP directly when the user
calls fi_recv(). Currently such receive cannot be cancelled and fi_cancel() is not supported in
zero-copy receive mode.
If a receiver gets RTM packets delivered to its default QP, it raises an error
because it requests all RTM packets must be delivered to its user recv QP.

5 changes: 5 additions & 0 deletions prov/efa/src/rdm/efa_rdm_ep_fiops.c
@@ -1313,6 +1313,11 @@ ssize_t efa_rdm_ep_cancel(fid_t fid_ep, void *context)
struct efa_rdm_ep *ep;

ep = container_of(fid_ep, struct efa_rdm_ep, base_ep.util_ep.ep_fid.fid);
if (ep->use_zcpy_rx) {
EFA_WARN(FI_LOG_EP_CTRL, "fi_cancel is not supported in zero-copy receive mode.\n");
return -FI_EOPNOTSUPP;
}

return ep->peer_srx_ep->ops->cancel(&ep->peer_srx_ep->fid, context);
}
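
The guard above changes what a caller of `fi_cancel()` has to handle: in zero-copy receive mode the buffer stays posted, so the caller must keep it alive rather than assume the receive was withdrawn. The following is a minimal stand-alone sketch of that calling pattern, not provider code. Every `demo_*` / `DEMO_*` name is hypothetical and introduced only for illustration; `DEMO_EOPNOTSUPP` stands in for `FI_EOPNOTSUPP`, and the behavior of the non-zero-copy path is a simplifying assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for FI_EOPNOTSUPP so the sketch builds without libfabric headers. */
#define DEMO_EOPNOTSUPP EOPNOTSUPP

/* Hypothetical endpoint model; this is NOT the real struct efa_rdm_ep. */
struct demo_ep {
	bool use_zcpy_rx;    /* mirrors the ep->use_zcpy_rx check in the diff */
	size_t posted_recvs; /* receives currently posted */
};

/* Models the guard added to efa_rdm_ep_cancel(): refuse in zero-copy mode. */
static int demo_cancel(struct demo_ep *ep)
{
	if (ep->use_zcpy_rx)
		return -DEMO_EOPNOTSUPP;
	/* Assumption: outside zero-copy mode a posted receive can be withdrawn. */
	if (ep->posted_recvs == 0)
		return -ENOENT;
	ep->posted_recvs--;
	return 0;
}

enum demo_cancel_outcome {
	DEMO_CANCELLED, /* receive withdrawn; the buffer may be reused */
	DEMO_MUST_WAIT, /* cancel unsupported; the buffer is still posted */
	DEMO_FAILED     /* any other error */
};

/* Caller-side handling: treat "not supported" as "buffer is still in use". */
static enum demo_cancel_outcome demo_try_cancel(struct demo_ep *ep)
{
	int ret = demo_cancel(ep);

	if (ret == 0)
		return DEMO_CANCELLED;
	if (ret == -DEMO_EOPNOTSUPP)
		return DEMO_MUST_WAIT;
	return DEMO_FAILED;
}
```

In a real application the `DEMO_MUST_WAIT` branch would correspond to leaving the receive buffer and its memory registration untouched until the receive completes or the endpoint is closed, which is also why the unit test in this commit ignores the result of closing the MR.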

33 changes: 32 additions & 1 deletion prov/efa/test/efa_unit_test_ep.c
@@ -1017,4 +1017,35 @@ void test_efa_rdm_ep_close_discard_posted_recv(struct efa_resource **state)

/* Reset to NULL to avoid test reaper closing again */
resource->ep = NULL;
}
}

void test_efa_rdm_ep_zcpy_recv_cancel(struct efa_resource **state)
{
struct efa_resource *resource = *state;
struct fi_context cancel_context = {0};
struct efa_unit_test_buff recv_buff;

resource->hints = efa_unit_test_alloc_hints(FI_EP_RDM);
assert_non_null(resource->hints);

resource->hints->tx_attr->msg_order = FI_ORDER_NONE;
resource->hints->rx_attr->msg_order = FI_ORDER_NONE;
resource->hints->caps = FI_MSG;

/* enable zero-copy recv mode in ep */
test_efa_rdm_ep_use_zcpy_rx_impl(resource, true);

/* Construct a recv buffer with mr */
efa_unit_test_buff_construct(&recv_buff, resource, 16);

assert_int_equal(fi_recv(resource->ep, recv_buff.buff, recv_buff.size, fi_mr_desc(recv_buff.mr), FI_ADDR_UNSPEC, &cancel_context), 0);

assert_int_equal(fi_cancel((struct fid *)resource->ep, &cancel_context), -FI_EOPNOTSUPP);

/**
* the buf is still posted to rdma-core, so unregistering mr can
* return non-zero. Currently ignore this failure.
*/
(void) fi_close(&recv_buff.mr->fid);
free(recv_buff.buff);
}
1 change: 1 addition & 0 deletions prov/efa/test/efa_unit_tests.c
@@ -109,6 +109,7 @@ int main(void)
cmocka_unit_test_setup_teardown(test_efa_rdm_ep_user_zcpy_rx_happy, efa_unit_test_mocks_setup, efa_unit_test_mocks_teardown),
cmocka_unit_test_setup_teardown(test_efa_rdm_ep_user_zcpy_rx_unhappy_due_to_sas, efa_unit_test_mocks_setup, efa_unit_test_mocks_teardown),
cmocka_unit_test_setup_teardown(test_efa_rdm_ep_close_discard_posted_recv, efa_unit_test_mocks_setup, efa_unit_test_mocks_teardown),
cmocka_unit_test_setup_teardown(test_efa_rdm_ep_zcpy_recv_cancel, efa_unit_test_mocks_setup, efa_unit_test_mocks_teardown),
cmocka_unit_test_setup_teardown(test_dgram_cq_read_empty_cq, efa_unit_test_mocks_setup, efa_unit_test_mocks_teardown),
cmocka_unit_test_setup_teardown(test_ibv_cq_ex_read_empty_cq, efa_unit_test_mocks_setup, efa_unit_test_mocks_teardown),
cmocka_unit_test_setup_teardown(test_ibv_cq_ex_read_failed_poll, efa_unit_test_mocks_setup, efa_unit_test_mocks_teardown),
1 change: 1 addition & 0 deletions prov/efa/test/efa_unit_tests.h
@@ -117,6 +117,7 @@ void test_efa_rdm_ep_enable_qp_in_order_aligned_128_bytes_bad();
void test_efa_rdm_ep_user_zcpy_rx_happy();
void test_efa_rdm_ep_user_zcpy_rx_unhappy_due_to_sas();
void test_efa_rdm_ep_close_discard_posted_recv();
void test_efa_rdm_ep_zcpy_recv_cancel();
void test_dgram_cq_read_empty_cq();
void test_ibv_cq_ex_read_empty_cq();
void test_ibv_cq_ex_read_failed_poll();