prov/verbs;ofi_rxm iov_limit has a hard limit of 4 #10641

Open
mlefebvre1 opened this issue Dec 17, 2024 · 2 comments

mlefebvre1 commented Dec 17, 2024

Describe the bug
Hi, I want to use the verbs;ofi_rxm provider with iov_limit set to 16 for both rx and tx. Unfortunately, when I set the hints to 16, no providers are returned. It seems the rxm provider has a hard limit of 4, but if I query the RDMA device using libibverbs, I see that it supports up to 30 scatter-gather entries.

To Reproduce

#include <stdio.h>
#include <string.h>

#include <rdma/fabric.h>
#include <rdma/fi_errno.h>

int main() {
  struct fi_info *hints = NULL;
  struct fi_info *info = NULL;

  hints = fi_allocinfo();
  /* Check the allocation before dereferencing hints. */
  if (hints == NULL) {
    printf("failed to allocate memory for hints\n");
    return 1;
  }

  hints->ep_attr->type = FI_EP_RDM;
  /* strdup so that fi_freeinfo() can release prov_name safely. */
  hints->fabric_attr->prov_name = strdup("verbs;ofi_rxm");
  hints->tx_attr->iov_limit = 16;
  hints->rx_attr->iov_limit = 16;

  int ret = fi_getinfo(FI_VERSION(2, 0), "0.0.0.0", "8080", FI_SOURCE,
                       hints, &info);
  fi_freeinfo(hints);
  if (ret != FI_SUCCESS) {
    /* fi_getinfo returns a negative error code; pass the positive value. */
    printf("failed to get any provider reason=%s\n", fi_strerror(-ret));
    return 1;
  }

  printf("%s\n", fi_tostr(info, FI_TYPE_INFO));
  fi_freeinfo(info);
  return 0;
}

Output
If I run with FI_LOG_LEVEL=info I get the following errors:

libfabric:1180674:1734402138::ofi_rxm:core:ofi_check_rx_attr():914<info> iov_limit too large
libfabric:1180674:1734402138::ofi_rxm:core:ofi_check_rx_attr():915<info> Supported: 4
libfabric:1180674:1734402138::ofi_rxm:core:ofi_check_rx_attr():915<info> Requested: 16

Expected behavior
I would expect that a user can set the hints for the iov limits up to what the RDMA device can support (in my case, at least up to 30).

Environment:
Linux, Ubuntu 20.04
libfabric version 2.0.0

Thank you.

@mlefebvre1 mlefebvre1 added the bug label Dec 17, 2024
@aingerson
Contributor

@mlefebvre1 Hi there and welcome to the libfabric community!
Many providers hard-code their limits, and 4 is a reasonable one. Since rxm is layered on top of verbs and often has to copy the iovs over from the user, making this limit dynamic is difficult and could waste a lot of space. So this isn't a bug, but rather an optimization. We can look into the possibility of supporting more depending on the device limit, but it's probably not going to be high on the list, if I'm being honest.
I would recommend trying to get your application to need fewer iovs (i.e., sending 4 at a time) if possible. Alternatively, you could try increasing the hardcoded limit to 16, but you would need your own libfabric build, so it wouldn't be a long-term solution.
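
As a rough illustration of the "sending 4 at a time" suggestion, here is a minimal sketch in plain C that walks an iovec array and hands it to a callback in batches no larger than a given limit. The names send_in_batches, submit_batch_fn, batch_stats, and record_batch are hypothetical and not part of libfabric; in a real application the callback would presumably wrap a call such as fi_sendv(). Note that each batch would then arrive as a separate message, so the receiver would need to post a matching number of receives.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical callback type: stands in for a real send call such as
 * fi_sendv(). Returns 0 on success or a negative error code. */
typedef int (*submit_batch_fn)(const struct iovec *iov, size_t count,
                               void *ctx);

/* Submit `count` iovec entries in batches of at most `limit` entries.
 * Returns the number of batches submitted, or a negative error code. */
static int send_in_batches(const struct iovec *iov, size_t count,
                           size_t limit, submit_batch_fn submit, void *ctx)
{
    int batches = 0;

    if (limit == 0)
        return -1;

    for (size_t off = 0; off < count; off += limit) {
        /* The last batch may hold fewer than `limit` entries. */
        size_t n = (count - off < limit) ? count - off : limit;
        int ret = submit(&iov[off], n, ctx);
        if (ret < 0)
            return ret;
        batches++;
    }
    return batches;
}

/* Stand-in callback for experimentation: records what it was given
 * instead of sending anything. */
struct batch_stats {
    size_t batches;
    size_t total_bytes;
    size_t max_entries;
};

static int record_batch(const struct iovec *iov, size_t count, void *ctx)
{
    struct batch_stats *st = ctx;

    st->batches++;
    if (count > st->max_entries)
        st->max_entries = count;
    for (size_t i = 0; i < count; i++)
        st->total_bytes += iov[i].iov_len;
    return 0;
}
```

With a limit of 4, a 16-entry array would be submitted as four batches of four entries each. The limit is a parameter rather than a constant so that it could be taken from the iov_limit value the provider reports.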

@aingerson aingerson added enhancement and removed bug labels Dec 17, 2024
@mlefebvre1
Author

Gotcha, thanks for the quick reply.
