
[Feature] Enable exporting to ONNX for PointRend #4977

Closed

Conversation

@DmitriySidnev commented Apr 15, 2021

This PR, together with PR #953 in mmcv, enables exporting PointRend-based models to ONNX. Affected models:

@CLAassistant commented Apr 15, 2021

CLA assistant check: All committers have signed the CLA.

@codecov bot commented Apr 15, 2021

Codecov Report

Merging #4977 (602b06c) into master (6ef9605) will decrease coverage by 0.65%.
The diff coverage is 1.16%.

❗ Current head 602b06c differs from pull request most recent head 0bd53cb. Consider uploading reports for the commit 0bd53cb to get more accurate results

@@            Coverage Diff             @@
##           master    #4977      +/-   ##
==========================================
- Coverage   65.19%   64.54%   -0.66%     
==========================================
  Files         276      267       -9     
  Lines       21265    20618     -647     
  Branches     3534     3484      -50     
==========================================
- Hits        13864    13308     -556     
+ Misses       6647     6546     -101     
- Partials      754      764      +10     
| Flag | Coverage Δ |
|------|------------|
| unittests | 64.54% <1.16%> (-0.62%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|----------------|------------|
| ...det/models/roi_heads/mask_heads/mask_point_head.py | 31.53% <0.00%> (-1.18%) ⬇️ |
| mmdet/models/roi_heads/point_rend_roi_head.py | 13.66% <1.33%> (-5.97%) ⬇️ |
| mmdet/core/evaluation/eval_hooks.py | 51.70% <0.00%> (-20.53%) ⬇️ |
| mmdet/models/roi_heads/test_mixins.py | 50.58% <0.00%> (-9.81%) ⬇️ |
| mmdet/models/dense_heads/rpn_test_mixin.py | 77.41% <0.00%> (-6.46%) ⬇️ |
| mmdet/models/detectors/cornernet.py | 94.87% <0.00%> (-5.13%) ⬇️ |
| mmdet/models/roi_heads/mask_heads/fcn_mask_head.py | 65.88% <0.00%> (-3.89%) ⬇️ |
| mmdet/models/roi_heads/base_roi_head.py | 85.29% <0.00%> (-2.21%) ⬇️ |
| mmdet/core/bbox/coder/yolo_bbox_coder.py | 58.97% <0.00%> (-2.01%) ⬇️ |
| mmdet/core/export/onnx_helper.py | 30.76% <0.00%> (-1.49%) ⬇️ |
| ... and 71 more | |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ef9605...0bd53cb.

@RunningLeon (Collaborator)

@DmitriySidnev Thanks for the contribution. Please list the models you have verified in the PR description.

@QingChuanWS (Contributor) commented Apr 19, 2021

Thank you for the contribution, but I ran into some problems when running your PR. Could you tell me which version of PyTorch you used to test these models?

@DmitriySidnev (Author)

@QingChuanWS, hi! I use PyTorch 1.8.1.

@ZwwWayne changed the base branch from master to onnx April 21, 2021 03:09
@QingChuanWS (Contributor)

Hi, @DmitriySidnev. Have you verified whether the visualized results of the modified point_rend under PyTorch are correct?

@ZwwWayne changed the base branch from onnx to master April 26, 2021 12:12
@DmitriySidnev (Author)

@QingChuanWS, I have verified both metrics and visualized results.

@RunningLeon (Collaborator)

@DmitriySidnev Could you merge with master and, if possible, test whether exporting PointRend models to ONNX works with batched inputs and dynamic shapes?

@DmitriySidnev (Author)

@RunningLeon, I have checked exporting to ONNX with dynamic shapes and batching. After fixing a bug in the mmcv/ops/point_sample/bilinear_grid_sample function, it works. However, the line `if torch.onnx.is_in_onnx_export() and num_imgs == 1:` in my code breaks the batch logic during export. This can be simply worked around by exporting with an input tensor whose first dimension (batch_size) is 2. The resulting ONNX graph is valid.
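A minimal sketch of the guard and the batch-size-2 workaround described above; the `PointRendLikeHead` module and its body are hypothetical, only `torch.onnx.is_in_onnx_export()` and the `num_imgs == 1` condition come from the comment:

```python
import torch


class PointRendLikeHead(torch.nn.Module):
    """Hypothetical stand-in for a head containing the export guard."""

    def forward(self, feats):
        num_imgs = feats.shape[0]
        # The guard from the comment above: when exporting with a single
        # image, the tracer records only this simplified branch, so the
        # resulting graph cannot handle batched inputs.
        if torch.onnx.is_in_onnx_export() and num_imgs == 1:
            return feats[0].sigmoid()
        # Generic batched path.
        return feats.sigmoid()


# Workaround: export with batch_size = 2 so the batched branch is the
# one recorded in the ONNX graph.
model = PointRendLikeHead().eval()
dummy = torch.randn(2, 3, 64, 64)  # first dimension (batch_size) = 2
torch.onnx.export(
    model, dummy, 'head.onnx',
    input_names=['input'],
    dynamic_axes={'input': {0: 'batch'}})
```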

@RunningLeon (Collaborator)

@DmitriySidnev Hi, please allow me to push to this PR.

ERROR: Permission to DmitriySidnev/mmdetection.git denied to RunningLeon.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

@DmitriySidnev (Author)

@RunningLeon, hi! Access permissions granted.

@RangiLyu (Member)

Hi~
I tried to test the point_rend_r50_caffe_fpn_mstrain_1x_coco ONNX model with onnxruntime, but it ran out of memory.

error message:


Non-zero status code returned while running ScatterElements node. Name:'ScatterElements_2702' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:305 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool) Failed to allocate memory for requested buffer of size 1605632000

Traceback (most recent call last):
  File "tools/deployment/test.py", line 132, in <module>
    main()
  File "tools/deployment/test.py", line 111, in main
    args.show_score_thr)
  File "/home/rangilyu/Projects/mmdetection/mmdet/apis/test.py", line 27, in single_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/home/rangilyu/anaconda3/envs/mmonnx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/rangilyu/Projects/mmcv/mmcv/parallel/data_parallel.py", line 42, in forward
    return super().forward(*inputs, **kwargs)
  File "/home/rangilyu/anaconda3/envs/mmonnx/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/rangilyu/anaconda3/envs/mmonnx/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/rangilyu/Projects/mmcv/mmcv/runner/fp16_utils.py", line 95, in new_func
    return old_func(*args, **kwargs)
  File "/home/rangilyu/Projects/mmdetection/mmdet/models/detectors/base.py", line 169, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/rangilyu/Projects/mmdetection/mmdet/core/export/model_wrappers.py", line 74, in forward_test
    self.sess.run_with_iobinding(self.io_binding)
  File "/home/rangilyu/anaconda3/envs/mmonnx/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 229, in run_with_iobinding
    self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running ScatterElements node. Name:'ScatterElements_2702' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:305 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool) Failed to allocate memory for requested buffer of size 1605632000

'ScatterElements_2702' failed to allocate memory for requested buffer of size 1605632000.

Why does this node ScatterElements_2702 need so much memory?

@RangiLyu (Member)

Found a similar issue in onnxruntime (microsoft/onnxruntime#7612). Switching to a lower version of onnxruntime solved the problem.

@RangiLyu (Member)

I have tested the performance of the ONNX model with onnxruntime. Here is the result:

| Model | Type | box mAP | box AP50 | box AP75 | mask mAP | mask AP50 | mask AP75 |
|-------|------|---------|----------|----------|----------|-----------|-----------|
| point_rend_r50_caffe_fpn_mstrain_1x_coco | PyTorch | 38.4 | 59.0 | 41.8 | 36.3 | 56.9 | 38.7 |
| point_rend_r50_caffe_fpn_mstrain_1x_coco | ONNX | 37.9 | 58.4 | 41.5 | 34.9 | 55.9 | 36.8 |

The performance degradation on masks is large.

@DmitriySidnev (Author)

@RangiLyu, could you please share the scripts and commands you used to export and test the model?

@RangiLyu (Member)

> @RangiLyu, could you please share the scripts and commands you used to export and test the model?

python tools/deployment/pytorch2onnx.py configs/point_rend/point_rend_r50_caffe_fpn_mstrain_1x_coco.py checkpoints/point_rend_r50_caffe_fpn_mstrain_1x_coco-1bcb5fb4.pth --output-file checkpoints/point_rend_r50_caffe_fpn_mstrain_1x_coco-1bcb5fb4.onnx --dynamic-export

and

python tools/deployment/test.py configs/point_rend/point_rend_r50_caffe_fpn_mstrain_1x_coco.py checkpoints/point_rend_r50_caffe_fpn_mstrain_1x_coco-1bcb5fb4.onnx --out work_dirs/point_rend_onnx/result.pkl --eval bbox segm

@DmitriySidnev (Author)

@RangiLyu, I could not find the reason for the metrics degradation. The only thing that, in my view, may be related to the problem is the constant spatial resolution of the output masks in the ONNX graph (800x1216 by default), but I am not sure about it. On the other hand, the significant drop in segmentation metrics may simply be due to lower detection quality.

@RunningLeon (Collaborator)

@DmitriySidnev @RangiLyu Please refer here for possible reasons:

> Mask AP of Mask R-CNN drops by 1% for ONNXRuntime. The main reason is that the predicted masks are directly interpolated to the original image in PyTorch, while in ONNXRuntime they are first interpolated to the preprocessed input image of the model and then to the original image.
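A minimal sketch of the two interpolation paths described in the quote; the tensor shapes and sizes below are illustrative assumptions, not the actual mmdet code:

```python
import torch
import torch.nn.functional as F

mask_logits = torch.randn(1, 1, 28, 28)  # predicted mask for one RoI
input_hw = (800, 1216)                   # preprocessed model input size
orig_hw = (480, 640)                     # original image size

# PyTorch path: interpolate the predicted mask directly to the
# original image resolution.
mask_direct = F.interpolate(mask_logits, size=orig_hw,
                            mode='bilinear', align_corners=False)

# ONNXRuntime path: first interpolate to the preprocessed input size,
# then resize to the original image. The extra resampling step is what
# causes the small mask AP drop.
mask_two_step = F.interpolate(mask_logits, size=input_hw,
                              mode='bilinear', align_corners=False)
mask_two_step = F.interpolate(mask_two_step, size=orig_hw,
                              mode='bilinear', align_corners=False)
```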

@RunningLeon (Collaborator)

@DmitriySidnev, @RangiLyu #5197 may be a possible reason.

@AronLin (Contributor) commented May 25, 2021

I merged this PR with the latest master branch, and tested the performance of the ONNX model with onnxruntime.

Here is the result:

| Model | Type | box mAP | box AP50 | box AP75 | mask mAP | mask AP50 | mask AP75 |
|-------|------|---------|----------|----------|----------|-----------|-----------|
| point_rend_r50_caffe_fpn_mstrain_1x_coco | PyTorch | 38.4 | 59.0 | 41.8 | 36.3 | 56.9 | 38.7 |
| point_rend_r50_caffe_fpn_mstrain_1x_coco | ONNX | 38.4 | 59.0 | 41.8 | 35.2 | 56.5 | 37.1 |

It seems all right.

@RunningLeon requested a review from ZwwWayne May 25, 2021 07:21
@RunningLeon (Collaborator)

Kindly ping @ZwwWayne

@RunningLeon (Collaborator)

@DmitriySidnev Hi, could you fix the conflicts and refactor the ONNX export according to #5205? Thanks a lot.

@ZwwWayne mentioned this pull request Jun 3, 2021
@RangiLyu self-requested a review June 9, 2021 05:32
@RangiLyu (Member) left a comment

LGTM

@ZwwWayne (Collaborator) commented Jun 16, 2021

I think the design can be better. Can we put the complete ONNX export logic into an onnx_export function and separate it from the original inference code?
See example here.
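A minimal sketch of the suggested separation; the class, methods, and bodies below are hypothetical, only the idea of a dedicated `onnx_export` function comes from the comment:

```python
import torch


class RoIHead(torch.nn.Module):
    """Hypothetical head with inference and export paths split."""

    def simple_test(self, feats):
        # Plain PyTorch inference path, free of export-only branches.
        return feats.sigmoid()

    def onnx_export(self, feats):
        # All ONNX-specific logic lives in one place, so the tracer
        # never has to walk through is_in_onnx_export() guards
        # scattered inside the inference code.
        masks = feats.sigmoid()
        return masks.unsqueeze(1)

    def forward(self, feats):
        # The only remaining branch point between the two paths.
        if torch.onnx.is_in_onnx_export():
            return self.onnx_export(feats)
        return self.simple_test(feats)
```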

@DmitriySidnev (Author)

@ZwwWayne, my focus was on the functionality, and right now I do not have enough free time to refactor the code. Can anyone else do this?

@ZwwWayne (Collaborator)

> @ZwwWayne, my focus was on the functionality, and right now I do not have enough free time to refactor the code. Can anyone else do this?

OK, we will put effort into that. Thanks for your efforts.

AronLin added a commit to AronLin/mmdetection that referenced this pull request Jun 23, 2021
@ZwwWayne (Collaborator)

Merged in #5440

@ZwwWayne closed this Jun 29, 2021