Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: handle dmabuf for ofi_mr functions #9499

Merged
merged 3 commits into from
Nov 6, 2023

Conversation

shijin-aws
Copy link
Contributor

@shijin-aws shijin-aws commented Oct 31, 2023

When mr is registered with FI_MR_DMABUF bit in the flag,
ofi_mr_attr_update should pass the fi_mr_dmabuf object
from the attr user passed.

When inserting the mr to mr map, libfabric
needs to translate the dmabuf object to mr_iov.

When constructing ofi_mr_info from mr_attr for ofi_mr_cache_search.
Convert dmabuf to iov if the flags has FI_MR_DMABUF.

@j-xiong
Copy link
Contributor

j-xiong commented Oct 31, 2023

Intel CI error is probably transient, will retry.

12:08:33  Unexpected output in rqfreeb: APPLICATION TIMED OUT, TIMEOUT = 180s
12:08:33  Program rqfreeb exited without No Errors

@sunkuamzn
Copy link
Contributor

What about ofi_mr_cache* functions?

@shijin-aws
Copy link
Contributor Author

What about ofi_mr_cache* functions?

mr cache is currently skipped for dmabuf reg

@shijin-aws
Copy link
Contributor Author

@j-xiong do you think we should a similar convert from dmabuf to iov for ofi_mr_cache path now?

@j-xiong
Copy link
Contributor

j-xiong commented Nov 1, 2023

@shijin-aws Yes, I think it makes sense to support this in mr cache for the case that virtual address mapping is available (i.e. base_addr != 0).

@shijin-aws
Copy link
Contributor Author

@shijin-aws Yes, I think it makes sense to support this in mr cache for the case that virtual address mapping is available (i.e. base_addr != 0).

Will update, thanks

@shijin-aws
Copy link
Contributor Author

shijin-aws commented Nov 3, 2023

@j-xiong please see the updated commits. I didn't want to check whether base_addr != 0 inside the mr_cache functions but think it should be provider's responsibility to do that before using mr cache for dmabuf reg.

Copy link
Contributor

@j-xiong j-xiong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly fine. Some nitpick about the line length.

include/ofi_mr.h Outdated
@@ -427,9 +438,9 @@ int ofi_mr_cache_search(struct ofi_mr_cache *cache,
* with the cache.
*/
struct ofi_mr_entry *ofi_mr_cache_find(struct ofi_mr_cache *cache,
const struct fi_mr_attr *attr);
const struct fi_mr_attr *attr, uint64_t flags);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long. Break after the comma.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

{
cur_abi_attr->mr_iov = (struct iovec *) user_attr->mr_iov;
if (FI_VERSION_GE(user_version, FI_VERSION(1, 20)) && (flags & FI_MR_DMABUF))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line too long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

include/ofi_mr.h Outdated
int i;

for (i = 0; i < count; i++) {
iov[i].iov_base = (void *) ((uintptr_t) dmabuf[i].base_addr + dmabuf[i].offset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line too long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -634,7 +634,7 @@ int efa_mr_is_cuda_memory_freed(struct efa_mr *efa_mr, bool *freed)
* negative libfabric error code on failure.
*/
static
int efa_mr_update_domain_mr_map(struct efa_mr *efa_mr, struct fi_mr_attr *mr_attr)
int efa_mr_update_domain_mr_map(struct efa_mr *efa_mr, struct fi_mr_attr *mr_attr, uint64_t flags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line too long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -750,14 +750,14 @@ int efa_mr_update_domain_mr_map(struct efa_mr *efa_mr, struct fi_mr_attr *mr_att
}
#else /* HAVE_CUDA */
static
int efa_mr_update_domain_mr_map(struct efa_mr *efa_mr, struct fi_mr_attr *mr_attr)
int efa_mr_update_domain_mr_map(struct efa_mr *efa_mr, struct fi_mr_attr *mr_attr, uint64_t flags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line too long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

When a mr is registered with FI_MR_DMABUF, mr_iov
is replaced by the fi_mr_dmabuf object in the
mr_attr. When inserting the mr to mr map, libfabric
needs to translate the dmabuf object to mr_iov.

Signed-off-by: Shi Jin <sjina@amazon.com>
When mr is registered with FI_MR_DMABUF bit in the flag,
ofi_mr_attr_update should pass the fi_mr_dmabuf object
from the attr user passed

Signed-off-by: Shi Jin <sjina@amazon.com>
Convert dmabuf to iov for ofi_mr_info in ofi_mr_cache_find
and ofi_mr_cache_reg.

When constructing ofi_mr_info from mr_attr for ofi_mr_cache_search.
Convert dmabuf to iov if the flags has FI_MR_DMABUF.

Signed-off-by: Shi Jin <sjina@amazon.com>
@j-xiong
Copy link
Contributor

j-xiong commented Nov 6, 2023

bot:aws:retest

@shijin-aws
Copy link
Contributor Author

@j-xiong ok to merge?

@j-xiong j-xiong merged commit 07b396f into ofiwg:main Nov 6, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants