osc/rdma: use btl/self for self communication as last resort
ompi_osc_rdma_peer_btl_endpoint() is used to select btl and endpoint
to communicate with a peer.

This patch changes ompi_osc_rdma_peer_btl_endpoint() so that, if no
btl/endpoint has been selected for self communication and the bml has
btl/self, btl/self is used for self communication.

It also changes ompi_osc_rdma_new_peer():

Previously, if no btl could be found and the peer was self, the function
still continued. With this patch the function fails in that case,
because the ability to do self communication is essential for osc/rdma.

Signed-off-by: Wei Zhang <wzam@amazon.com>
wzamazon committed Sep 27, 2021
1 parent 6c4cb75 commit b0197d9
Showing 1 changed file with 14 additions and 4 deletions.
18 changes: 14 additions & 4 deletions ompi/mca/osc/rdma/osc_rdma_peer.c
@@ -73,7 +73,19 @@ static int ompi_osc_rdma_peer_btl_endpoint (struct ompi_osc_rdma_module_t *modul
         }
     }

     /* unlikely but can happen when creating a peer for self */
+    if (peer_id == ompi_comm_rank (module->comm)) {
+        for (int btl_index = 0 ; btl_index < num_btls ; ++btl_index) {
+            struct mca_btl_base_module_t *btl;
+
+            btl = bml_endpoint->btl_rdma.bml_btls[btl_index].btl;
+            if (strcmp(btl->btl_component->btl_version.mca_component_name, "self")==0) {
+                *btl_out = btl;
+                *endpoint = bml_endpoint->btl_eager.bml_btls[btl_index].btl_endpoint;
+                return OMPI_SUCCESS;
+            }
+        }
+    }

     return OMPI_ERR_UNREACH;
 }

@@ -86,9 +98,7 @@ int ompi_osc_rdma_new_peer (struct ompi_osc_rdma_module_t *module, int peer_id,

     /* find a btl/endpoint to use for this peer */
     int ret = ompi_osc_rdma_peer_btl_endpoint (module, peer_id, &btl, &endpoint);
-    if (OPAL_UNLIKELY(OMPI_SUCCESS != ret &&
-                      !(module->selected_btls[0]->btl_atomic_flags & MCA_BTL_ATOMIC_SUPPORTS_GLOB) &&
-                      (peer_id != ompi_comm_rank (module->comm)))) {
+    if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
         return ret;
     }
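
The two behaviors described in the commit message can be illustrated outside of Open MPI with a small standalone sketch. Everything named `sketch_*` or `SKETCH_*` below is hypothetical and exists only for this illustration; none of it is the Open MPI API, and the sketch models only the last-resort lookup, not the primary btl selection that precedes it in the real function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch; these are not Open MPI definitions. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_UNREACH = -1 };

struct sketch_btl {
    const char *component_name;   /* e.g. "self" or "tcp" */
};

/* Last-resort lookup: when the peer is the caller itself, pick the
 * transport whose component is named "self", if one is available. */
static int sketch_find_self_btl(struct sketch_btl *btls, size_t num_btls,
                                int peer_id, int my_rank,
                                struct sketch_btl **btl_out)
{
    assert(btl_out != NULL);
    *btl_out = NULL;

    if (peer_id != my_rank) {
        return SKETCH_ERR_UNREACH;
    }

    for (size_t i = 0; i < num_btls; ++i) {
        if (strcmp(btls[i].component_name, "self") == 0) {
            *btl_out = &btls[i];
            return SKETCH_SUCCESS;
        }
    }

    return SKETCH_ERR_UNREACH;
}

/* Peer setup fails on any lookup error, with no exemption for the
 * self peer. */
static int sketch_new_peer(struct sketch_btl *btls, size_t num_btls,
                           int peer_id, int my_rank)
{
    struct sketch_btl *btl = NULL;
    int ret = sketch_find_self_btl(btls, num_btls, peer_id, my_rank, &btl);

    if (ret != SKETCH_SUCCESS) {
        return ret;
    }

    /* A real implementation would allocate and initialize the peer here. */
    return SKETCH_SUCCESS;
}
```

With a transport list that contains a "self" entry, `sketch_new_peer()` succeeds for the caller's own rank; without one it returns the error instead of continuing, which mirrors the simplified check in the second hunk.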
