Remove all non-trivial view handling in Distributor #1103

Merged: 2 commits, Jun 11, 2024

Conversation

Contributor

@aprokop aprokop commented Jun 6, 2024

The rationale: we only need to deal with simple 1D views internally. All the complexity was stemming from trying to handle multiple dimensions. It really is unnecessary.

This patch does the following:

  • Gets rid of sendAcrossNetwork
    No longer necessary as it does the same thing as doPostsAndWaits
  • Simplifies doPostsAndWaits
    Straightforward handling of views
  • Optimizes doPostsAndWaits
    Only the parts that need to be sent are copied to the host (when running without GPU-aware MPI)
  • Moves testing doPostsAndWaits from tstDetailsDistributedTreeUtils.cpp into a separate tstDetailsDistributor.cpp
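
The "only parts that need to be sent" optimization can be illustrated with a minimal sketch in plain C++, without Kokkos or MPI. This is not the ArborX implementation. It assumes that the parts to skip are the ranges destined for the calling rank itself, and the names `stageForSend`, `destinations`, `offsets`, and `my_rank` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not ArborX code.
// exports[offsets[i] .. offsets[i+1]) is the range destined for destinations[i].
// Assumption: a range destined for my_rank never crosses the network, so it
// does not need to be staged on the host.
std::vector<int> stageForSend(std::vector<int> const &exports,
                              std::vector<int> const &destinations,
                              std::vector<std::size_t> const &offsets,
                              int my_rank)
{
  // The staging buffer keeps the full size of exports; only the copies differ.
  std::vector<int> host(exports.size());
  for (std::size_t i = 0; i < destinations.size(); ++i)
  {
    if (destinations[i] == my_rank)
      continue; // skip the self-destined range
    auto const first = static_cast<std::ptrdiff_t>(offsets[i]);
    auto const last = static_cast<std::ptrdiff_t>(offsets[i + 1]);
    std::copy(exports.begin() + first, exports.begin() + last,
              host.begin() + first);
  }
  return host;
}
```

With exports `{1, 2, 3, 4, 5}`, destinations `{0, 1, 2}`, offsets `{0, 2, 3, 5}`, and `my_rank == 1`, only the ranges for ranks 0 and 2 are copied, and the slot for rank 1 stays value-initialized.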

@aprokop aprokop added the performance (Something is slower than it should be) and refactoring (Code reorganization) labels Jun 6, 2024
Contributor Author

aprokop commented Jun 7, 2024

The test failure in CUDA-11.1.1-NVCC is unrelated to this change.

Contributor Author

aprokop commented Jun 7, 2024

@masterleinad Could I ask you to review this PR? You are probably the person most familiar with the code, and may spot the stupid stuff I missed.

@aprokop aprokop force-pushed the remove_all_nd_distributed branch 2 times, most recently from 8db61db to 29ada0e, June 7, 2024 20:58
@aprokop aprokop force-pushed the remove_all_nd_distributed branch from 29ada0e to 35bbe1d, June 8, 2024 14:43
Comment on lines +354 to +358
auto imports_comm = Kokkos::create_mirror_view(Kokkos::WithoutInitializing,
MirrorSpace{}, imports);
#else
auto exports_comm = permuted_exports;
auto imports_comm = imports;

Collaborator

So the exports_comm size differs based on ARBORX_ENABLE_GPU_AWARE_MPI?

Contributor Author

No, it doesn't. It's always the size of permuted_exports, which is the size of exports.

Collaborator

OK. We just omit copying based on that option.

requests.emplace_back();
MPI_Irecv(receive_buffer_ptr, message_size, MPI_BYTE, _sources[i], 123,
_comm, &requests.back());
}
}

// make sure the data in dest_buffer has been copied before sending it.
if (permutation_necessary)

Collaborator

Why do we need an extra fence now if no permutation is necessary?

Contributor Author

@aprokop aprokop Jun 11, 2024

Because the fence serves both the GPU-aware and the non-GPU-aware paths. For the non-GPU-aware path, we need to make sure the (parts of the) export data have finished copying to the host. I think the right check for this would be "(if gpu-aware and permutation is necessary) or (not gpu-aware)".

Edit: additionally, it may be even worse than that. If we launched a kernel using space that fills the exports prior to calling doPostsAndWaits, and we are in the GPU-aware mode, we can't guarantee that the kernel has completed. So, we have to call fence even if gpu-aware and permutation is unnecessary.
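
The two fence conditions discussed in this comment can be compared with a small truth-table sketch. It is plain C++ for illustration only, not the code in the PR, and the function names `fenceFirstProposal` and `fenceAfterEdit` are hypothetical.

```cpp
// Hypothetical predicates illustrating the discussion; not ArborX code.

// Condition first proposed in the comment:
// "(gpu-aware and permutation is necessary) or (not gpu-aware)".
constexpr bool fenceFirstProposal(bool gpu_aware, bool permutation_necessary)
{
  return (gpu_aware && permutation_necessary) || !gpu_aware;
}

// Condition after the edit: a kernel filling the exports may still be running
// on the execution space, so the fence is needed in every case.
constexpr bool fenceAfterEdit(bool /*gpu_aware*/, bool /*permutation_necessary*/)
{
  return true;
}

// The two conditions differ only when GPU-aware MPI is used and no
// permutation is necessary.
static_assert(!fenceFirstProposal(true, false), "first proposal skips the fence");
static_assert(fenceAfterEdit(true, false), "edited reasoning keeps the fence");
```

The only input where the two predicates disagree is the GPU-aware path without permutation, which is exactly the case the edit addresses.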

src/details/ArborX_DetailsDistributor.hpp (resolved)
@aprokop aprokop merged commit dda1230 into arborx:master Jun 11, 2024
1 of 2 checks passed
@aprokop aprokop deleted the remove_all_nd_distributed branch June 11, 2024 17:27
Contributor Author

aprokop commented Jun 11, 2024

@masterleinad Thank you for the review!

@aprokop aprokop mentioned this pull request Jun 26, 2024
Labels: performance (Something is slower than it should be), refactoring (Code reorganization)
2 participants