-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove all non-trivial view handling in Distributor #1103
Conversation
Testing failure in |
@masterleinad Could I ask you to review this PR? You are probably the person most familiar with the code, and may spot the stupid stuff I missed. |
8db61db
to
29ada0e
Compare
29ada0e
to
35bbe1d
Compare
auto imports_comm = Kokkos::create_mirror_view(Kokkos::WithoutInitializing, | ||
MirrorSpace{}, imports); | ||
#else | ||
auto exports_comm = permuted_exports; | ||
auto imports_comm = imports; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the exports_comm
size differs based on ARBORX_ENABLE_GPU_AWARE_MPI
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it doesn't. It's always the size of permuted_exports
, which is the size of exports
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. We just omit copying based on that option.
requests.emplace_back(); | ||
MPI_Irecv(receive_buffer_ptr, message_size, MPI_BYTE, _sources[i], 123, | ||
_comm, &requests.back()); | ||
} | ||
} | ||
|
||
// make sure the data in dest_buffer has been copied before sending it. | ||
if (permutation_necessary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need an extra fence now if no permutation is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is for both GPU-aware and non-GPU aware paths. For non-GPU aware, we need to make sure the (parts of) export data are finished copying to the host. I think the right check for this would be "(if gpu-aware and permutation is necessary) or (not gpu-aware)".
Edit: additionally, it may be even different worse than that. If we launched a kernel using space
that fills the exports
prior to calling doPostsAndWaits
, and we are in the GPU-aware mode, we can't guarantee that kernel completed. So, we have to call fence even if gpu-aware and permutation is unnecessary.
@masterleinad Thank you for the review! |
The rationale: we only need to deal with simple 1D views internally. All the complexity was stemming from trying to handle multiple dimensions. It really is unnecessary.
This patch does the following:
sendAcrossNetwork
No longer necessary as it does the same thing as
doPostsAndWaits
doPostsAndWaits
Straightforward handling of views
doPostsAndWaits
Only parts that need to be sent are copied to the host (when running in non-GPU aware MPI)
doPostsAndWaits
fromtstDetailsDistributedTreeUtils.cpp
into a separatetstDetailsDistributor.cpp