We always launch Thrust algorithms asynchronously on a CUDA stream, but the Thrust calls still perform an extra `cudaStreamSynchronize` internally, e.g.:
- wholegraph/cpp/src/wholememory_ops/functions/exchange_ids_nccl_func.cu, line 63 (commit 9f290c4)
- wholegraph/cpp/src/wholegraph_ops/unweighted_sample_without_replacement_func.cuh, line 340 (commit 9f290c4)
It would be better to switch to `thrust::cuda::par_nosync`, which would make it easier to overlap these calls with other operations.
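A minimal sketch of the suggested change, assuming a Thrust version that provides `thrust::cuda::par_nosync` (it was added in Thrust 1.16). The helper `squares_on_stream` is hypothetical and is not the code from the WholeGraph files listed above: it enqueues two Thrust calls on one stream with the no-sync policy and synchronizes only once, right before the host reads the result.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/sequence.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/transform.h>
#include <vector>

// Hypothetical helper for illustration: fills [0, n) and squares each value.
// Both Thrust calls are enqueued on `stream` with par_nosync, so Thrust may
// skip its internal stream synchronization after each call.
std::vector<int> squares_on_stream(int n, cudaStream_t stream) {
  thrust::device_vector<int> d(n);
  auto policy = thrust::cuda::par_nosync.on(stream);

  // Work on the same stream is ordered, so the transform sees the sequence
  // output without any host-side wait in between.
  thrust::sequence(policy, d.begin(), d.end());
  thrust::transform(policy, d.begin(), d.end(), d.begin(), d.begin(),
                    thrust::multiplies<int>());

  // With par_nosync the caller owns synchronization: copy back on the same
  // stream and wait once, right before the host touches the data.
  std::vector<int> h(n);
  cudaMemcpyAsync(h.data(), thrust::raw_pointer_cast(d.data()),
                  n * sizeof(int), cudaMemcpyDeviceToHost, stream);
  cudaError_t err = cudaStreamSynchronize(stream);
  assert(err == cudaSuccess && "kernel or copy on stream failed");
  (void)err;
  return h;
}
```

The design point is that `par_nosync` moves synchronization to the caller: Thrust still orders the work on `stream`, but the host thread no longer blocks after every call. Algorithms that must return a value to the host (for example `thrust::reduce`) still synchronize under this policy.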
Sorry for the late reply. The WholeGraph 24.04 release is closing; is it OK if we fix this in 24.06?
Commit f8cadcf: remove unnecessary sync between thrust ops and host threads (#160)
Fix for issue 148 (https://github.com/rapidsai/wholegraph/issues/148): remove unnecessary syncs between Thrust ops and host CPU threads. Authors: https://github.com/linhu-nv. Approvers: Chuang Zhu (https://github.com/chuangz0). URL: #160