-
Notifications
You must be signed in to change notification settings - Fork 871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osc/rdma: modify attach to check for region overlap #7387
osc/rdma: modify attach to check for region overlap #7387
Conversation
@DmitryLyakh should fix the issue. Keep in mind to run the app you will have to increase the maximum number of attachments. The default was 32 (now 64) but in order to run the app from #7384 you need to bump that to 128 ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the dynamic attachment doc again, I agree that MPI-3.1 appears to allow (2). I guess that makes sense, but also makes my head hurt.
@@ -218,7 +219,7 @@ static int ompi_osc_rdma_component_register (void) | |||
MCA_BASE_VAR_SCOPE_LOCAL, &mca_osc_rdma_component.buffer_size); | |||
free(description_str); | |||
|
|||
mca_osc_rdma_component.max_attach = 32; | |||
mca_osc_rdma_component.max_attach = 64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call out this change in the commit log? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should probably break it out into a different commit. I realized that 32, while safer for Cray systems, is probably too low for most use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
ec331c7
to
96ed630
Compare
This commit addresses two issues in osc/rdma: 1) It is erroneous to attach regions that overlap. This was being allowed but the standard does not allow overlapping attachments. 2) Overlapping registration regions (4k alignment of attachments) appear to be allowed. Add attachment bases to the bookeeping structure so we can keep better track of what can be detached. It is possible that the standard did not intend to allow #2. If that is the case then #2 should fail in the same way as #1. There should be no technical reason to disallow #2 at this time. References open-mpi#7384 Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit increaes the osc_rdma_max_attach variable from 32 to 64. The new default is kept low due to the small number of registration resources on some systems (Cray Aries). A larger max attachement value can be set by the user on other systems. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
96ed630
to
54c8233
Compare
This commit addresses two issues in osc/rdma:
It is erroneous to attach regions that overlap. This was being
allowed but the standard does not allow overlapping attachments.
Overlapping registration regions (4k alignment of attachments)
appear to be allowed. Add attachment bases to the bookeeping
structure so we can keep better track of what can be detached.
It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.
References #7384
Signed-off-by: Nathan Hjelm hjelmn@google.com