btl/vader: when using single-copy emulation fragment large rdma #6961

Merged: 1 commit into open-mpi:master on Sep 7, 2019

@hjelmn hjelmn commented Sep 6, 2019

This commit changes how the single-copy emulation in the vader btl
operates. Before this change the BTL set its put and get limits
based on the max send size. After this change the limits are unset
and the put or get operation is fragmented internally.

References #6568

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
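
For readers unfamiliar with the idea, internal fragmentation of a large get can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from this commit: `emulated_copy_fragment`, `fragmented_get`, and the `max_frag` parameter are hypothetical names, and a plain `memcpy` stands in for the emulated single-copy transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one emulated single-copy transfer.
 * It moves at most max_frag bytes and returns the number of bytes moved.
 * This is NOT an Open MPI function. */
size_t emulated_copy_fragment(void *dst, const void *src,
                              size_t len, size_t max_frag)
{
    size_t n = len < max_frag ? len : max_frag;
    memcpy(dst, src, n);
    return n;
}

/* Hypothetical get that accepts any size and fragments internally,
 * so the caller does not have to respect a get limit.
 * Returns the number of fragments that were issued. */
size_t fragmented_get(void *local, const void *remote,
                      size_t size, size_t max_frag)
{
    assert(max_frag > 0); /* a zero fragment size would never make progress */

    size_t offset = 0;
    size_t nfrags = 0;
    while (offset < size) {
        /* Each pass copies the next piece, capped at max_frag bytes. */
        offset += emulated_copy_fragment((char *) local + offset,
                                         (const char *) remote + offset,
                                         size - offset, max_frag);
        ++nfrags;
    }
    return nfrags;
}
```

In this sketch the upper layer hands over the whole transfer in one call, for example `fragmented_get(dst, src, size, 4096)`, and the loop takes care of splitting it.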

hjelmn commented Sep 6, 2019

@jsquyres Seems to be working. Ran with local ompi-tests ibm test suite and it is passing. Also passing the reproducer from #6568.

jsquyres commented Sep 6, 2019

@hjelmn is this an alternate workaround to the OB1 PUT issue?

hjelmn commented Sep 6, 2019

This is a workaround for issues with fragmented gets in ob1.

hjelmn commented Sep 6, 2019

Probably helps with puts too.

@bosilca bosilca merged commit d7f6dd0 into open-mpi:master Sep 7, 2019

bosilca commented Sep 7, 2019

For the sake of completeness, this provides a nice partial solution to the PUT issue by ensuring that the vader BTL accepts very large fragments, which removes the PML's need to fragment messages. However, it does not fix the PUT protocol, which remains broken for all multi-fragment messages.

@al-rigazzi

@bosilca can you be more specific? What protocols and components are affected?