-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenCL] Fix OpenCL get_valid_counts errors due to intrinsic atomic_add #5857
Conversation
93c2d33
to
dc6b48e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any unit test?
We would need to add Currently the CI doesn't test anything for opencl which is why we don't find out about these errors until much later. Do we know why we don't test opencl? |
9a19371
to
8f657a0
Compare
@wpan11nv Any more comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@kazum can you take a look and manage the PR? Thanks. |
# get_valid_count for cuda doesn't do data rearrangement | ||
if target == 'cuda': | ||
# get_valid_count for cuda, opencl doesn't do data rearrangement | ||
if target in ['cuda', 'opencl']: | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning here looks wrong to me. The test in the below link doesn't work for OpenCL too because we don't do data rearrangement for GPU nms implementation.
https://discuss.tvm.ai/t/nms-compile-fails-for-cuda-target-but-works-fine-for-llvm-target/7045/2
Probably, we should fix non_max_suppression for GPU first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenCL uses the same implementation as CUDA. The CUDA implementation of get_valid_counts
was changed to no longer rearrange the output of get_valid_counts
because it will be rearranged by NMS later anyway. This gives the correct output for NMS. See #5339
That issue with NMS looks to be a separate issue where the CUDA implementation wasn't fully updated to match changes to CPU implementation by #4312
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your explanation. Actually, I've successfully build NMS if I revert the change in #4312.
8f657a0
to
fa183ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I'll merge this after CI is passed.
Thanks! |
@trevor-m please rebase against the master |
fa183ce
to
1db9f1a
Compare
…dd (apache#5857) * [OpenCL] Fix atomic add used by get_valid_counts * Rename l -> load, add flag to enable atomics * Opencl doesn't do data rearrangement
…dd (apache#5857) * [OpenCL] Fix atomic add used by get_valid_counts * Rename l -> load, add flag to enable atomics * Opencl doesn't do data rearrangement
Some fixes a few months ago to the
get_valid_counts
CUDA implementation broke OpenCL because of the atomic add intrinsic which was added.This PR fixes
get_valid_counts
for OpenCL with the following changes:intrinsic::tvm_address_of
to include storage scope (e.g.__global
).cl_khr_global_int32_base_atomics
. This isn't required for OpenCL 1.1+ because atomic_add became a core feature. I'm happy to remove this if we don't care about OpenCL 1.0. Alternatively we can overrideop->call_type == CallNode::PureExtern
and set a flag to enable this only whenatomic_add
is actually used.Original error messages before this fix: