-
Notifications
You must be signed in to change notification settings - Fork 750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][CUDA][HIP][PI] Fix barrier #6490
Conversation
I think none of these failures are actually related to the changes in this PR. |
@@ -381,8 +381,25 @@ pi_result cuda_piEventRetain(pi_event event); | |||
|
|||
/// \endcond | |||
|
|||
void _pi_queue::compute_stream_wait_for_barrier_if_needed(CUstream stream, | |||
pi_uint32 stream_i) { | |||
if (barrier_event_ && !compute_applied_barrier_[stream_i]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to check if the barrier event has finished prior to this so we can potentially clear it and skip the cuStreamWaitEvent call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are proposing an additional CUDA API call for potentially every operation that might let us eliminate additional calls in the future. Whether this makes sense to do or not depends on whether we expect most streams will have work enqueued to them before or after all the work before the barrier is finished. If the answer is before, the current code will be more performant. If the answer is after, we want to do what you suggest.
My gut feeling says current implementation will be better for most use cases. However the reasoning from the previous paragraph assumes that cuStreamWaitEvent
and cuEventQuery
take roughly the same time. Looking into that could give us more information that could inform this decision. However, for now I would leave this as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is potentially more work up-front, though I would expect cuEventQuery
to be somewhat lightweight, but doing some benchmarking for it may make sense before a final decision is made on this. I am okay to keep as-is. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@steffenlarsen @smaslov-intel is this okay to merge now or are there further reviews we should request? I also see there is a failure for the ESIMD job, I assume this is unrelated. |
ESIMD failures seem to happen on other PRs as well. Lets merge this. |
Improves the test for barrier to make it actually fail if the barrier implementation does not work. Tests intel/llvm#6490
…m-test-suite#1122) Improves the test for barrier to make it actually fail if the barrier implementation does not work. Tests intel#6490
Fixes a bug in barrier implementation in CUDA and HIP plugins that often caused barrier not to work. The new implementation is also faster.
Tests in: intel/llvm-test-suite#1122