NMS with CUDA only #1824

Merged (5 commits) on Apr 15, 2022

grimoire
Member

This PR adds a CUDA kernel for NMS to avoid computation on the CPU.
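
For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what greedy NMS computes. It is not the kernel added by this PR and not mmcv's API: the names `iou` and `nms_reference`, the `(x1, y1, x2, y2)` box layout, and the strict `>` threshold comparison are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_reference(boxes, scores, iou_threshold):
    """Greedy NMS (hypothetical helper): indices of kept boxes, best score first."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms_reference(boxes, scores, iou_threshold=0.5))  # prints [0, 2]
```

The second box overlaps the first with IoU 81/119 (about 0.68), so it is suppressed at a threshold of 0.5; the third box overlaps nothing and is kept.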

I am not sure whether to call this an "optimization", because on small inputs the CPU can even outperform the GPU. The data distribution also affects performance.

Environment:

  • Device: RTX 2070 SUPER
  • CUDA: 11.3
  • NVIDIA driver: 465.19.01
  • CPU: Intel(R) Core(TM) i7-9700KF CPU @ 3.60GHz

Test data comes from Faster R-CNN and the demo image. NMS is tested in both the RPN and the bbox head.

| Data | Old | New |
| --- | --- | --- |
| rpn data (4741 boxes) | 0.98435 ms | 0.88798 ms |
| rcnn data (583 boxes) | 0.23121 ms | 0.22827 ms |
| random data (500 boxes) | 0.23056 ms | 0.29762 ms |
| random data (1000 boxes) | 0.26279 ms | 0.41685 ms |
| random data (5000 boxes) | 1.27405 ms | 1.34129 ms |
| random data (10000 boxes) | 3.62377 ms | 2.74194 ms |
| random data (20000 boxes) | 21.07619 ms | 4.80819 ms |
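
To make the crossover easier to read, the old/new ratio can be computed for each row. The snippet below is only arithmetic on the timings in the table above; the names `timings_ms` and `speedup` are illustrative, and a ratio above 1 means the new code is faster.

```python
# Timings in milliseconds, copied from the table above as (old, new).
timings_ms = {
    "rpn data (4741 boxes)": (0.98435, 0.88798),
    "rcnn data (583 boxes)": (0.23121, 0.22827),
    "random data (500 boxes)": (0.23056, 0.29762),
    "random data (1000 boxes)": (0.26279, 0.41685),
    "random data (5000 boxes)": (1.27405, 1.34129),
    "random data (10000 boxes)": (3.62377, 2.74194),
    "random data (20000 boxes)": (21.07619, 4.80819),
}


def speedup(old_ms, new_ms):
    """Ratio of old to new time; values above 1 mean the new code is faster."""
    return old_ms / new_ms


for name, (old_ms, new_ms) in timings_ms.items():
    print(f"{name}: {speedup(old_ms, new_ms):.2f}x")
```

On these numbers the new code is slower for random data up to 5000 boxes, about 1.3x faster at 10000 boxes, and about 4.4x faster at 20000 boxes.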

Real data tends to contain clustered bboxes, which reduces the GPU computation. I guess that is why performance on real data is better than on random data.
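
One way to picture that guess is to count IoU evaluations in a greedy NMS that skips boxes which are already suppressed. This is a serial toy model, not the CUDA kernel from this PR, which may organize its work quite differently; `count_iou_evaluations` and the two sample box sets are made up for illustration, and the boxes are assumed to be sorted by descending score.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_iou_evaluations(boxes, iou_threshold):
    """Count IoU calls made by greedy NMS when suppressed boxes are skipped."""
    suppressed = [False] * len(boxes)
    evaluations = 0
    for i in range(len(boxes)):
        if suppressed[i]:
            continue  # a suppressed box never suppresses anything else
        for j in range(i + 1, len(boxes)):
            if suppressed[j]:
                continue  # already removed, no need to compare again
            evaluations += 1
            if iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed[j] = True
    return evaluations


# Four heavily overlapping boxes versus four disjoint boxes.
clustered = [(0, 0, 10, 10), (0, 0, 10, 11), (0, 0, 11, 10), (0, 0, 11, 11)]
spread = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10), (60, 0, 70, 10)]

print(count_iou_evaluations(clustered, 0.5))  # prints 3
print(count_iou_evaluations(spread, 0.5))     # prints 6
```

In the clustered set the first box suppresses the other three, so only 3 comparisons are needed; in the spread set nothing is suppressed and all 6 pairs are compared.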

@zhouzaida assigned ZwwWayne and teamwong111 and unassigned ZwwWayne on Apr 13, 2022
@ZwwWayne merged commit 74031cc into open-mmlab:master on Apr 15, 2022