-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use same GPU stream for all kernels #296
Conversation
One thing I didn't think too deeply about while making this PR is what to do with calls to cudaMemcpy (non-async). I believe those API calls run on stream 0 which isn't a correctness problem as the RAJA default stream is synchronous with stream 0, but it isn't really in the spirit of the change in this PR either. I'm thinking about adding a resource argument to some of our memory helper functions to work around this. |
@MrBurmark the mem copy calls are outside of the kernel timing regions, so does it matter? |
By default the default resource from camp is used. This uses a different stream by default.
4f84161
to
120c5f2
Compare
Most of them don't matter, but there are some that are used in the timed loop in reduction kernels. I've been trying it out and it looks like there is a performance penalty for using implicitly synchronized streams compared to a single stream. I'm going to rewrite those memory calls to explicitly call cuda|hipMemcpyAsync and streamSynchronize. |
I made the change for the kernels using memcpy and synchronize calls in the timed portion. They now use memcpyAsync+streamSynchronize or streamSynchronize to use their stream. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of changes for something we should have thought about earlier, huh? 😄 Thank you for working through this.
…f into feature/burmark1/gpu_stream
46fd529
to
5517fc0
Compare
89fb9f5
to
39b748f
Compare
fc2ce08
to
962bac4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful!
@MrBurmark let's merge this one next when it gets through tioga CI. Also, we need to make sure that all new kernels are following this pattern. |
Use the same GPU stream for all kernels
Use a specific GPU stream for all Cuda/Hip kernels. This is done by using the same resource for all kernels. By default this is the RAJA default stream, but can be changed to stream 0 with the
--gpu_stream_0
command line argument.