[ONNX] Exporting the SSD model to ONNX lowers the quality metric. #5576

Closed
SemyonBevzuk opened this issue Jul 9, 2021 · 3 comments · Fixed by #5789

@SemyonBevzuk
Contributor

Describe the bug
The documentation (link) states that the Box AP of the SSD model under ONNX Runtime should be 25.6.
However, if you export the model to ONNX with the latest checkpoint and collect the metric with the tools/deployment/test.py script, the result is lower than expected: 23.3.

If you instead take the previous checkpoint (the one from before #5291) and update it with tools/model_converters/upgrade_ssd_version.py, the metric matches the documented value.

Box AP with ONNX Runtime:

| Before #5291 | Current | With upgraded old checkpoint |
| ------------ | ------- | ---------------------------- |
| 25.6         | 23.3    | 25.6                         |

This looks strange, and I have not found an explanation for it yet.
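
One way to start narrowing this down is to compare the structure of the two checkpoints directly. Below is a minimal sketch (not part of the original report): it diffs the keys and parameter shapes of the current checkpoint and the upgraded old checkpoint. The paths are the ones used in the reproduction steps below, and the "state_dict" key is an assumption about the checkpoint layout.

```python
# Minimal sketch: structural diff of the two checkpoints' state dicts.
# Assumes the checkpoints store their weights under a "state_dict" key.
import torch

def load_state_dict(path):
    ckpt = torch.load(path, map_location="cpu")
    return ckpt.get("state_dict", ckpt)

cur = load_state_dict("/tmp/openmmlab/snapshots/ssd300_coco_20210604_193052-b61137df.pth")
upg = load_state_dict("/tmp/openmmlab/snapshots/ssd300_coco_20200307-a92d2092_upgrade.pth")

for key in sorted(set(cur) | set(upg)):
    if key not in cur:
        print("only in upgraded checkpoint:", key)
    elif key not in upg:
        print("only in current checkpoint:", key)
    elif tuple(cur[key].shape) != tuple(upg[key].shape):
        print("shape mismatch:", key, tuple(cur[key].shape), tuple(upg[key].shape))
```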

Reproduction

  • Export the SSD model with the current checkpoint:
python tools/deployment/pytorch2onnx.py configs/ssd/ssd300_coco.py /tmp/openmmlab/snapshots/ssd300_coco_20210604_193052-b61137df.pth --output-file /tmp/openmmlab/ssd/ssd300_coco/config_current.onnx --dynamic-export
  • Testing the model config_current.onnx:
python tools/deployment/test.py configs/ssd/ssd300_coco.py /tmp/openmmlab/ssd/ssd300_coco/config_current.onnx --backend onnxruntime --eval bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.233
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.398
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.239
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.253
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.389
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.345
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.116
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.387
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.538
  • Update the old checkpoint and export the model:
python tools/model_converters/upgrade_ssd_version.py /tmp/openmmlab/snapshots/ssd300_coco_20200307-a92d2092.pth /tmp/openmmlab/snapshots/ssd300_coco_20200307-a92d2092_upgrade.pth
python tools/deployment/pytorch2onnx.py configs/ssd/ssd300_coco.py /tmp/openmmlab/snapshots/ssd300_coco_20200307-a92d2092_upgrade.pth --output-file /tmp/openmmlab/ssd/ssd300_coco/config_upgrade.onnx --dynamic-export
  • Testing the model config_upgrade.onnx (a direct onnxruntime sanity check is sketched after these results):
python tools/deployment/test.py configs/ssd/ssd300_coco.py /tmp/openmmlab/ssd/ssd300_coco/config_upgrade.onnx --backend onnxruntime --eval bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.256
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.438
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.263
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.071
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.278
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.422
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.375
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.376
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.376
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.125
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.417
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.586
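
As a sanity check independent of tools/deployment/test.py, the exported model can also be run directly with onnxruntime. This is a minimal sketch: the input name is taken from the session, the 1x3x300x300 shape is an assumption based on the SSD300 config, and the output layout depends on how the model was exported.

```python
# Minimal sketch: load the exported model with onnxruntime and run it on a
# dummy input to confirm the graph loads and produces outputs.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("/tmp/openmmlab/ssd/ssd300_coco/config_current.onnx")
input_name = sess.get_inputs()[0].name
dummy = np.random.randn(1, 3, 300, 300).astype(np.float32)  # assumed SSD300 input shape

outputs = sess.run(None, {input_name: dummy})
for out, meta in zip(outputs, sess.get_outputs()):
    print(meta.name, out.shape)
```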

Environment

sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda-10.2
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.9.1
OpenCV: 4.5.3-openvino
MMCV: 1.3.8
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 10.2
MMDetection: 2.14.0+76a9b44

@RunningLeon
Collaborator

@SemyonBevzuk Thanks for the info. @jshilong Could you take a look at this issue?

@jshilong
Collaborator

jshilong commented Aug 1, 2021

Sorry for the late response.
@RangiLyu, would you mind helping check whether #5291 is causing the issue?

@RangiLyu
Member

RangiLyu commented Aug 2, 2021

Hi, we found that the VGG SSD models were trained with a buggy version of mmdet that caused label disordering. That bug was fixed by #5243, but these models were trained before the fix. The val mAP of the .pth model on the dataset with the correct label order is also 0.233, so there is nothing wrong with onnxruntime.
We will retrain the model as soon as possible.
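
For readers unfamiliar with this failure mode, here is a hypothetical illustration (not from this thread) of why a disordered class mapping lowers mAP even when the boxes themselves are fine: a detection with a perfectly localized box but a permuted class id fails to match its ground truth.

```python
# Hypothetical illustration: a perfectly localized box with a permuted class id
# is a false positive for the wrong class and a miss for the right one.
gt = {"bbox": [10, 10, 50, 50], "label": 2}                      # ground truth: class 2
pred = {"bbox": [10, 10, 50, 50], "label": 5, "score": 0.9}      # same box, shuffled class id

def is_match(pred, gt):
    # The IoU of identical boxes is 1.0, but COCO-style matching also requires
    # the predicted category to equal the ground-truth category.
    return pred["label"] == gt["label"]

print(is_match(pred, gt))  # False -> contributes nothing to AP despite perfect overlap
```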

@RangiLyu RangiLyu self-assigned this Aug 2, 2021
@RangiLyu RangiLyu linked a pull request Aug 4, 2021 that will close this issue