[TOPI] Improve memory layout inside GPU NMS kernel #7257
- I'm glad this speeds up NMS, that confirms some of our suspicions from [Torch] Restore class-aware NMS for detection models by graph rewrite #7154
- I don't love that after this PR, every framework but MXNet will have an op of the form: concat->split->nms->concat->split. Can we talk about moving the split and concat steps out of this op and reworking the frameworks for a new API?
Even after this, I think we will still need a loop over classes for ONNX and TF, since ONNX explicitly and TF implicitly need max_output_boxes_per_class, while this op, even with class id, will return max_output_boxes for all classes.
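To illustrate the per-class limit mentioned above, here is a minimal NumPy sketch of a loop over classes. It is not the TVM op or any frontend's converter; the helper names (`iou`, `nms_single_class`, `nms_per_class`) and the `[x1, y1, x2, y2]` box format are assumptions made for the example.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms_single_class(boxes, scores, iou_threshold, max_output_boxes):
    """Greedy NMS over boxes that are assumed to share one class."""
    order = np.argsort(-scores, kind="stable")  # highest score first
    keep = []
    for idx in order:
        if len(keep) >= max_output_boxes:
            break
        # Keep the box only if it does not overlap too much with any kept box.
        if all(iou(boxes[idx], boxes[[k]])[0] <= iou_threshold for k in keep):
            keep.append(int(idx))
    return keep


def nms_per_class(boxes, scores, class_ids, iou_threshold, max_output_boxes_per_class):
    """Run NMS once per class so the output limit applies to each class separately."""
    selected = []
    for cls in np.unique(class_ids):
        members = np.flatnonzero(class_ids == cls)
        keep = nms_single_class(
            boxes[members], scores[members], iou_threshold, max_output_boxes_per_class
        )
        selected.extend(int(members[k]) for k in keep)
    return selected
```

With a single global limit, a class with many high-scoring boxes could use up the whole output budget; applying the limit inside the loop is what gives each class its own budget.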
I don't quite follow, maybe you are missing something? First, this PR doesn't change our NMS API, it only changes the buffer layouts used internally. Second, the final concat is only required for MXNet, which uses (The valid entries are supposed to move to the top, if tvm/python/tvm/topi/cuda/nms.py Lines 762 to 763 in 4e8cc4f
If |
I see, yep.
We don't seem to have a test for invalid_to_bottom, it's False by default:
tvm/python/tvm/relay/op/vision/nms.py
Line 70 in 4e8cc4f
    invalid_to_bottom=False,
tvm/python/tvm/relay/frontend/mxnet.py
Line 1050 in 4e8cc4f
    invalid_to_bottom=True,
It might be a bug because it never runs outside of the gluon ssd tutorial, which doesn't check box order:
tvm/tutorials/frontend/deploy_ssd_gluoncv.py
Lines 111 to 127 in 926a315
    for target in ["llvm", "cuda"]:
        ctx = tvm.context(target, 0)
        if ctx.exist:
            lib = build(target)
            class_IDs, scores, bounding_boxs = run(lib, ctx)

    ######################################################################
    # Display result

    ax = utils.viz.plot_bbox(
        img,
        bounding_boxs.asnumpy()[0],
        scores.asnumpy()[0],
        class_IDs.asnumpy()[0],
        class_names=block.classes,
    )
    plt.show()
We should probably open an issue for it, or add a unit test and fix the kernel.
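As a starting point for such a unit test, here is a NumPy-only sketch of the ordering property a test could check. It does not call TVM. It assumes that invalid rows are marked by a negative score (for example, rows filled with -1) and that the score sits at column 1; the helper names are hypothetical.

```python
import numpy as np


def is_valid(rows, score_index=1):
    # Assumption: a row is invalid when its score is negative (e.g. filled with -1).
    return rows[:, score_index] >= 0


def move_invalid_to_bottom(rows, score_index=1):
    """Reference behaviour: valid rows first, relative order preserved."""
    valid = is_valid(rows, score_index)
    # A stable sort on "not valid" keeps the original order within each group.
    order = np.argsort(~valid, kind="stable")
    return rows[order]


def valid_rows_come_first(rows, score_index=1):
    """True if no valid row appears after an invalid one."""
    valid = is_valid(rows, score_index)
    num_valid = int(valid.sum())
    return bool(valid[:num_valid].all())
```

A test along these lines would run the kernel with invalid_to_bottom=True on an input that contains suppressed boxes, then compare the result against the reference ordering rather than only checking that the output can be plotted.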
I'd still like to see us change the NMS API to better reflect what's going on internally (i.e., remove the concat/split for TF/ONNX/Pytorch), but that doesn't need to be this PR.
Yes, ideally I want to update our NMS to be closer to TF/ONNX/PyTorch, and let the MXNet frontend handle split and concat, rather than the other way around (what we have now). The current API is overcomplicated due to the need to support both styles. If we can assume that Supporting
commit fe8fda81774c2e1a4d434179f62e3a299e084cb7 Author: Masahiro Masuda <masahi129@gmail.com> Date: Wed Dec 30 20:31:29 2020 +0900 fix write by a single thread
commit 0c21e36 Author: Masahiro Masuda <masahi129@gmail.com> Date: Tue Dec 29 04:32:18 2020 +0900 minor improvement when topk is available
commit 68c6866 Author: Masahiro Masuda <masahi129@gmail.com> Date: Tue Dec 29 04:10:24 2020 +0900 finish concat output
commit 37d7a19 Author: Masahiro Masuda <masahi129@gmail.com> Date: Tue Dec 29 03:59:28 2020 +0900 fixed topk handling
commit 1913f97 Author: Masahiro Masuda <masahi129@gmail.com> Date: Mon Dec 28 21:34:24 2020 +0900 more refactoring
commit 70c65f0 Author: Masahiro Masuda <masahi129@gmail.com> Date: Mon Dec 28 21:27:15 2020 +0900 unpack input data
commit 3a27397 Author: Masahiro Masuda <masahi129@gmail.com> Date: Mon Dec 28 21:22:16 2020 +0900 slight change to initialization
commit 9b42008 Author: Masahiro Masuda <masahi129@gmail.com> Date: Mon Dec 28 19:50:36 2020 +0900 add some comments, remove check the check on negative class id
commit 0aa375d Author: Masahiro Masuda <masahi129@gmail.com> Date: Mon Dec 28 19:39:49 2020 +0900 leave a TODO on write by only one thread
commit d75ee0a Author: Masahiro Masuda <masahi129@gmail.com> Date: Mon Dec 28 19:13:04 2020 +0900 temp disable write by only thread 0
commit 20b5630 Author: Masahiro Masuda <masahi129@gmail.com> Date: Sat Dec 26 10:06:43 2020 +0900 use one block two avoid global sync issue
commit dd1e230 Author: Masahiro Masuda <masahi129@gmail.com> Date: Sat Dec 26 07:59:19 2020 +0900 make NMS inner loop parallel
fix write by a single thread
Force-pushed from 48beb10 to 03fd7ba.
Thanks @mbrookhart
Motivation
Currently, our NMS API expects input and output data to be in the format [batch, num_anchors, 6], where 6 covers the 4 bbox coordinates, the score, and the class id. This format is highly inefficient for class-aware NMS, because to check whether two bboxes belong to the same class, we need to load the other 5 values as well. More discussion in #7154 (comment) and other comments in that PR.
This PR improves the memory layout of the buffers used in the NMS kernel. Specifically, it unpacks the [batch, num_anchors, 6] input into one [batch, num_anchors, 4] buffer for bboxes, and two [batch, num_anchors] buffers for scores and class ids.
Speed up
This change is expected to give a good speed up for cases where the number of valid boxes is large and class ids are involved. For example, although the change in #7154 brings only a modest speed up, combined with this PR the speed up becomes good.
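As an illustration of the layout change itself (not a benchmark, and not the TVM kernel code), the following NumPy sketch splits a packed [batch, num_anchors, 6] array into the three buffers described under Motivation and packs them back. The column order (class id at 0, score at 1, coordinates starting at 2) and the helper names are assumptions made for the example.

```python
import numpy as np


def unpack_nms_input(data, coord_start=2, score_index=1, id_index=0):
    """Split packed [batch, num_anchors, 6] data into three contiguous buffers.

    The default column positions are an assumption for this sketch.
    """
    boxes = np.ascontiguousarray(data[:, :, coord_start:coord_start + 4])
    scores = np.ascontiguousarray(data[:, :, score_index])
    class_ids = np.ascontiguousarray(data[:, :, id_index])
    return boxes, scores, class_ids


def pack_nms_output(boxes, scores, class_ids):
    """Concatenate the three buffers back into [batch, num_anchors, 6].

    Assumes the column order [class_id, score, 4 coordinates].
    """
    return np.concatenate([class_ids[..., None], scores[..., None], boxes], axis=-1)


def same_class(class_ids, batch, i, j):
    # With the unpacked layout, the class check reads only the class id buffer
    # and never touches scores or coordinates.
    return bool(class_ids[batch, i] == class_ids[batch, j])
```

In the packed layout, the class ids of neighbouring anchors are 6 elements apart in memory; in the unpacked layout they are adjacent, which is the kind of access pattern the class-aware check benefits from.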
Other consideration
For other cases, it is likely that there would be no speed up, but it should be no worse. For example, it doesn't improve the NMS workload from Gluon SSD. When return_indices=False, which I think only applies to MXNet, there is an additional concatenation of three buffers at the end. For Gluon SSD, this concat takes 40 microseconds on a GTX 1070 Ti, while the main NMS kernel takes 200 microseconds. If people think this additional concat is expensive, we really need to rethink our NMS API: the same concat is already happening in all other frontends, because all frameworks other than MXNet give NMS inputs in an unpacked form, and a concat is necessary to satisfy our NMS API.
please review @mbrookhart @Laurawly @vinx13 @trevor-m