[OpenCL] Fix OpenCL get_valid_counts errors due to intrinsic atomic_add #5857

Merged: 3 commits, Jun 30, 2020

Conversation

Contributor

@trevor-m commented Jun 19, 2020

Changes made a few months ago to the CUDA implementation of get_valid_counts broke OpenCL, because they introduced an atomic add intrinsic that the OpenCL backend could not resolve.

This PR fixes get_valid_counts for OpenCL with the following changes:

  1. Register intrinsic atomic add for OpenCL.
  2. Override intrinsic::tvm_address_of to include storage scope (e.g. __global).
  3. Enable cl_khr_global_int32_base_atomics. This isn't required for OpenCL 1.1+, where atomic_add is a core feature, so I'm happy to remove it if we don't care about OpenCL 1.0. Alternatively, we can override the handling of calls with op->call_type == CallNode::PureExtern and set a flag that enables the extension only when atomic_add is actually used.

Original error messages before this fix:

  1. During compilation:
     Unresolved intrinsic atomic_add with return type int32
  2. During runtime:
     <source>:6922:43: error: casting '__global void *' to type 'int *' changes address space of pointer
           atomic_add_return[(0)] = atomic_add(((int *)get_valid_counts_v0 + 0), 1);
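
The runtime error above comes from a pointer cast that drops the __global qualifier. As a rough illustration of changes 2 and 3, the Python sketch below builds an OpenCL source string the way a code generator might: it keeps the storage scope in the pointer cast and emits the extension pragma only when an atomic add was actually generated. The class TinyOpenCLEmitter and its methods are hypothetical names invented for this example; this is not TVM's codegen API.

```python
# Hypothetical mini code generator; names are illustrative, not TVM's API.

ATOMICS_PRAGMA = (
    "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable"
)


class TinyOpenCLEmitter:
    def __init__(self):
        self.uses_atomics = False  # set once an atomic_add call is emitted
        self.body = []

    def address_of(self, buffer, index, dtype="int", scope="__global"):
        # Keep the storage scope in the cast so the address space is unchanged.
        return f"(({scope} {dtype} *){buffer} + {index})"

    def atomic_add(self, result, buffer, index, value):
        self.uses_atomics = True
        ptr = self.address_of(buffer, index)
        self.body.append(f"{result} = atomic_add({ptr}, {value});")

    def finish(self):
        # Emit the pragma only if an atomic was actually used.
        header = [ATOMICS_PRAGMA] if self.uses_atomics else []
        return "\n".join(header + self.body)


emitter = TinyOpenCLEmitter()
emitter.atomic_add("atomic_add_return[(0)]", "get_valid_counts_v0", 0, 1)
source = emitter.finish()
print(source)
```

With the scope kept in the cast, the generated call casts to '__global int *' instead of 'int *', which is the mismatch the compiler complained about.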

@trevor-m
Contributor Author

@Laurawly @kazum @wpan11nv Could you please review? Thanks!

Contributor

@wpan11nv left a comment

Any unit test?

@trevor-m
Contributor Author

> Any unit test?

RELAY_TEST_TARGETS=opencl python3 tests/python/relay/test_op_level5.py will test this.

To have this run by default, we would need to add opencl to ctx_list: https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/testing/config.py#L28
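
For illustration only, here is a minimal sketch of how an environment variable such as RELAY_TEST_TARGETS could select test targets, with a fallback to a default list. The function select_targets and the default list are assumptions made for this example; they are not copied from config.py.

```python
import os

# Assumed defaults, for illustration only.
DEFAULT_TARGETS = ("llvm", "cuda")


def select_targets(environ=os.environ, defaults=DEFAULT_TARGETS):
    """Return target names from RELAY_TEST_TARGETS, or the defaults if unset."""
    raw = environ.get("RELAY_TEST_TARGETS", "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return names if names else list(defaults)


# Explicit override, as in the command line above.
print(select_targets({"RELAY_TEST_TARGETS": "opencl"}))
# No override: only the defaults run, so OpenCL is skipped unless it is added.
print(select_targets({}))
```

Under this scheme, OpenCL is exercised only when someone sets the variable by hand or when opencl is added to the default list.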

The CI currently doesn't test anything for OpenCL, which is why we don't find out about these errors until much later. Do we know why OpenCL isn't tested?

@trevor-m force-pushed the fix-getvalidcounts-opencl branch 2 times, most recently from 9a19371 to 8f657a0, June 23, 2020 16:02
@trevor-m
Contributor Author

trevor-m commented Jun 23, 2020

@wpan11nv Any more comments?

Contributor

@wpan11nv left a comment

LGTM.

@zhiics
Member

zhiics commented Jun 24, 2020

@kazum can you take a look and manage the PR? Thanks.

@kazum self-assigned this Jun 25, 2020

-    # get_valid_count for cuda doesn't do data rearrangement
-    if target == 'cuda':
+    # get_valid_count for cuda, opencl doesn't do data rearrangement
+    if target in ['cuda', 'opencl']:
         return
Contributor

Returning here looks wrong to me. The test in the link below doesn't work for OpenCL either, because the GPU NMS implementation doesn't do data rearrangement.
https://discuss.tvm.ai/t/nms-compile-fails-for-cuda-target-but-works-fine-for-llvm-target/7045/2

Perhaps we should fix non_max_suppression for GPU first?

Contributor Author

@trevor-m Jun 25, 2020

OpenCL uses the same implementation as CUDA. The CUDA implementation of get_valid_counts was changed so that it no longer rearranges its output, because NMS rearranges the data later anyway. NMS still produces the correct output. See #5339

That NMS problem looks like a separate issue: the CUDA implementation wasn't fully updated to match the changes made to the CPU implementation in #4312
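
To make the difference concrete, here is a small pure-Python sketch of the two behaviors discussed: counting valid boxes while leaving the data in place, versus counting and moving valid boxes to the front. The function names, the box layout [class_id, score, x1, y1, x2, y2], and the -1 padding are assumptions for illustration; this is not the TVM implementation.

```python
def count_valid_in_place(boxes, score_threshold, score_index=1):
    """Count boxes above the threshold; return the data unchanged."""
    count = sum(1 for box in boxes if box[score_index] > score_threshold)
    return count, [list(box) for box in boxes]


def count_valid_rearranged(boxes, score_threshold, score_index=1):
    """Count valid boxes and move them to the front, padding the rest with -1."""
    valid = [list(box) for box in boxes if box[score_index] > score_threshold]
    width = len(boxes[0]) if boxes else 0
    padding = [[-1.0] * width for _ in range(len(boxes) - len(valid))]
    return len(valid), valid + padding


# Assumed layout per box: [class_id, score, x1, y1, x2, y2]
boxes = [
    [0.0, 0.9, 10.0, 10.0, 20.0, 20.0],
    [1.0, 0.1, 0.0, 0.0, 5.0, 5.0],
    [0.0, 0.6, 30.0, 30.0, 40.0, 40.0],
]

count_a, data_a = count_valid_in_place(boxes, 0.5)
count_b, data_b = count_valid_rearranged(boxes, 0.5)
print(count_a, count_b)
```

Both variants agree on the count and differ only in the order of the returned data, which is why a test that compares the output tensor element by element has to treat the two cases differently.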

Contributor

Thanks for your explanation. In fact, I can build NMS successfully if I revert the change in #4312.

Contributor

@kazum left a comment

Looks good to me. I'll merge this after CI passes.

@trevor-m
Contributor Author

> Looks good to me. I'll merge this after CI passes.

Thanks!

@tqchen
Member

tqchen commented Jun 28, 2020

@trevor-m please rebase against master

@tqchen added the "status: need update" label Jun 29, 2020
@trevor-m
Contributor Author

@kazum @tqchen Rebased and CI passed. Thanks!

@kazum merged commit b3d3ff2 into apache:master Jun 30, 2020
@kazum
Contributor

kazum commented Jun 30, 2020

Thanks @trevor-m @wpan11nv !

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 30, 2020
…dd (apache#5857)

* [OpenCL] Fix atomic add used by get_valid_counts

* Rename l -> load, add flag to enable atomics

* Opencl doesn't do data rearrangement
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Jul 2, 2020
…dd (apache#5857)

* [OpenCL] Fix atomic add used by get_valid_counts

* Rename l -> load, add flag to enable atomics

* Opencl doesn't do data rearrangement
Labels: status: need review, status: need update