btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
The EFA provider implements FI_DELIVERY_COMPLETE incorrectly in libfabric
versions earlier than 1.12.0. Although the provider advertises
FI_DELIVERY_COMPLETE, it reports completions too early because it does
not account for bounce buffers on the receive side. As a result, the BTL
receives early completions, which leads to correctness issues.

This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
wckzhang committed Jul 30, 2020
1 parent 41df122 commit a7dcfd9
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions opal/mca/btl/ofi/btl_ofi_component.c
@@ -59,6 +59,17 @@ static int validate_info(struct fi_info *info, uint64_t required_caps)

    BTL_VERBOSE(("validating device: %s", info->domain_attr->name));

    /* EFA does not fulfill FI_DELIVERY_COMPLETE requirements in prior libfabric
     * versions. The prov version is set as:
     * FI_VERSION(FI_MAJOR_VERSION * 100 + FI_MINOR_VERSION, FI_REVISION_VERSION * 10)
     * Thus, FI_VERSION(112,0) corresponds to libfabric 1.12.0
     */
    if (!strncasecmp(info->fabric_attr->prov_name, "efa", 3)
        && FI_VERSION_LT(info->fabric_attr->prov_version, FI_VERSION(112,0))) {
        BTL_VERBOSE(("unsupported libfabric efa version"));
        return OPAL_ERROR;
    }

    /* we need exactly all the required bits */
    if ((info->caps & required_caps) != required_caps) {
        BTL_VERBOSE(("unsupported caps"));
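
The version arithmetic described in the comment above can be checked in isolation. The following is a minimal sketch, not part of the commit: the `SK_*` macros are local stand-ins that assume libfabric's packing (major in the upper 16 bits, minor in the lower 16 bits), and `efa_prov_version` and `efa_too_old` are hypothetical helper names introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <strings.h>

/* Local stand-ins (assumption): mirror the libfabric FI_VERSION family,
 * which packs major into the upper 16 bits and minor into the lower 16. */
#define SK_VERSION(major, minor) (((uint32_t)(major) << 16) | (uint32_t)(minor))
#define SK_MAJOR(v) ((uint32_t)(v) >> 16)
#define SK_MINOR(v) ((uint32_t)(v) & 0xFFFFu)
#define SK_VERSION_LT(a, b)                \
    ((SK_MAJOR(a) < SK_MAJOR(b)) ||        \
     (SK_MAJOR(a) == SK_MAJOR(b) && SK_MINOR(a) < SK_MINOR(b)))

/* Hypothetical helper: build the provider version the way the diff comment
 * describes it, i.e. (major * 100 + minor, revision * 10). */
static uint32_t efa_prov_version(unsigned lf_major, unsigned lf_minor,
                                 unsigned lf_revision)
{
    return SK_VERSION(lf_major * 100 + lf_minor, lf_revision * 10);
}

/* Hypothetical helper: same predicate as the check added in the diff.
 * True when the provider name starts with "efa" (case-insensitive) and its
 * version is older than the encoding of libfabric 1.12.0. */
static bool efa_too_old(const char *prov_name, uint32_t prov_version)
{
    return strncasecmp(prov_name, "efa", 3) == 0
        && SK_VERSION_LT(prov_version, SK_VERSION(112, 0));
}
```

Under these assumptions, libfabric 1.11.1 encodes as (111, 10) and is rejected, while 1.12.0 encodes as (112, 0) and passes. Providers whose names do not start with "efa" are never affected by this check.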