[ONNX] Exporting the SSD model to ONNX lowers the quality metric. #5576

Closed
SemyonBevzuk opened this issue Jul 9, 2021 · 3 comments · Fixed by #5789

@SemyonBevzuk
Contributor

Describe the bug
The documentation (link) states that the Box AP of the SSD model under ONNX Runtime should be 25.6.
However, if you export the model to ONNX with the latest checkpoint and collect the metric with the tools/deployment/test.py script, the result is lower than expected: 23.3.

If you instead take the previous checkpoint (the one from before #5291) and update it with tools/model_converters/upgrade_ssd_version.py, the metric matches the documented value.

Box AP with ONNX Runtime:

| Before #5291 | Current | With upgraded old checkpoint |
| ------------ | ------- | ---------------------------- |
| 25.6         | 23.3    | 25.6                         |

This looks strange, and I have not found an explanation for it yet.
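
One way to start narrowing this down is to compare the structure of the two checkpoints directly. Below is a minimal sketch (not part of the original report): it diffs the keys and parameter shapes of the current checkpoint and the upgraded old checkpoint. The paths are the ones used in the reproduction steps below, and the "state_dict" key is an assumption about the checkpoint layout.

```python
# Minimal sketch: structural diff of the two checkpoints' state dicts.
# Assumes the checkpoints store their weights under a "state_dict" key.
import torch

def load_state_dict(path):
    ckpt = torch.load(path, map_location="cpu")
    return ckpt.get("state_dict", ckpt)

cur = load_state_dict("/tmp/openmmlab/snapshots/ssd300_coco_20210604_193052-b61137df.pth")
upg = load_state_dict("/tmp/openmmlab/snapshots/ssd300_coco_20200307-a92d2092_upgrade.pth")

for key in sorted(set(cur) | set(upg)):
    if key not in cur:
        print("only in upgraded checkpoint:", key)
    elif key not in upg:
        print("only in current checkpoint:", key)
    elif tuple(cur[key].shape) != tuple(upg[key].shape):
        print("shape mismatch:", key, tuple(cur[key].shape), tuple(upg[key].shape))
```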

Reproduction

  • Export the SSD model with the current checkpoint:
python tools/deployment/pytorch2onnx.py configs/ssd/ssd300_coco.py /tmp/openmmlab/snapshots/ssd300_coco_20210604_193052-b61137df.pth --output-file /tmp/openmmlab/ssd/ssd300_coco/config_current.onnx --dynamic-export
  • Testing the model config_current.onnx:
python tools/deployment/test.py configs/ssd/ssd300_coco.py /tmp/openmmlab/ssd/ssd300_coco/config_current.onnx --backend onnxruntime --eval bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.233
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.398
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.239
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.253
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.389
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.345
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.116
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.387
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.538
  • Update the old checkpoint and export the model:
python tools/model_converters/upgrade_ssd_version.py /tmp/openmmlab/snapshots/ssd300_coco_20200307-a92d2092.pth /tmp/openmmlab/snapshots/ssd300_coco_20200307-a92d2092_upgrade.pth
python tools/deployment/pytorch2onnx.py configs/ssd/ssd300_coco.py /tmp/openmmlab/snapshots/ssd300_coco_20200307-a92d2092_upgrade.pth --output-file /tmp/openmmlab/ssd/ssd300_coco/config_upgrade.onnx --dynamic-export
  • Testing the model config_upgrade.onnx (a direct onnxruntime sanity check is sketched after these results):
python tools/deployment/test.py configs/ssd/ssd300_coco.py /tmp/openmmlab/ssd/ssd300_coco/config_upgrade.onnx --backend onnxruntime --eval bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.256
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.438
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.263
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.071
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.278
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.422
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.375
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.376
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.376
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.125
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.417
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.586
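
As a sanity check independent of tools/deployment/test.py, the exported model can also be run directly with onnxruntime. This is a minimal sketch: the input name is taken from the session, the 1x3x300x300 shape is an assumption based on the SSD300 config, and the output layout depends on how the model was exported.

```python
# Minimal sketch: load the exported model with onnxruntime and run it on a
# dummy input to confirm the graph loads and produces outputs.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("/tmp/openmmlab/ssd/ssd300_coco/config_current.onnx")
input_name = sess.get_inputs()[0].name
dummy = np.random.randn(1, 3, 300, 300).astype(np.float32)  # assumed SSD300 input shape

outputs = sess.run(None, {input_name: dummy})
for out, meta in zip(outputs, sess.get_outputs()):
    print(meta.name, out.shape)
```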

Environment

sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda-10.2
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.9.1
OpenCV: 4.5.3-openvino
MMCV: 1.3.8
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 10.2
MMDetection: 2.14.0+76a9b44

@RunningLeon
Collaborator

@SemyonBevzuk Thanks for the info. @jshilong Could you take a look at this issue?

@jshilong
Collaborator

jshilong commented Aug 1, 2021

Sorry for the late response.
@RangiLyu, would you mind helping check whether #5291 is causing the issue?

@RangiLyu
Member

RangiLyu commented Aug 2, 2021

Hi, we found that the VGG SSD models were trained with a buggy version of mmdet that caused label disordering. That bug was fixed by #5243, but these models were trained before the fix. The val mAP of the .pth model on the dataset with the correct label order is also 0.233, so there is nothing wrong with onnxruntime.
We will retrain the model as soon as possible.
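
For readers unfamiliar with this failure mode, here is a hypothetical illustration (not from this thread) of why a disordered class mapping lowers mAP even when the boxes themselves are fine: a detection with a perfectly localized box but a permuted class id fails to match its ground truth.

```python
# Hypothetical illustration: a perfectly localized box with a permuted class id
# is a false positive for the wrong class and a miss for the right one.
gt = {"bbox": [10, 10, 50, 50], "label": 2}                      # ground truth: class 2
pred = {"bbox": [10, 10, 50, 50], "label": 5, "score": 0.9}      # same box, shuffled class id

def is_match(pred, gt):
    # The IoU of identical boxes is 1.0, but COCO-style matching also requires
    # the predicted category to equal the ground-truth category.
    return pred["label"] == gt["label"]

print(is_match(pred, gt))  # False -> contributes nothing to AP despite perfect overlap
```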

@RangiLyu RangiLyu self-assigned this Aug 2, 2021
@RangiLyu RangiLyu linked a pull request Aug 4, 2021 that will close this issue