[ONNX export] Fixing spatial export for batchnorm #17711
Conversation
LGTM. |
LGTM too
@szha, can you take a look? |
As far as I understand, What do you think? Edit: The comment says that the default for ONNX is to compute |
…to fix-batchnorm-onnx
…ial=1 is conveyed
This makes sense! I've deprecated the attribute in my recent PR update, and provided more clarity in the comments. |
…to fix-batchnorm-onnx
…to fix-batchnorm-onnx
…to fix-batchnorm-onnx
…to fix-batchnorm-onnx
…to fix-batchnorm-onnx
…ator-mxnet into fix-batchnorm-onnx
@vandanavk - Can you please help review these changes? Thanks! |
LGTM.
- Waiting for Vandana's review.
- Restarted flaky windows build.
@vinitra - Thanks for your contributions. Can you please sync to master? |
…to fix-batchnorm-onnx
@ChaiBapchya - windows-gpu build is failing consistently here. Any suggestions please? Thanks. |
I don't know why it just started failing, but I know how to fix it. The root cause is that you're using a 32-bit compiler for a 64-bit target. A 32-bit program can only have 2 GB of address space, so you may run out of it quickly.
Please add "-A x64 -T host=x64" to your cmake command line args. |
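For readers scripting the build, the suggestion above could be sketched as follows. This is only an illustration: `with_x64_toolchain` and the base command are hypothetical and do not come from the MXNet CI scripts. The flags themselves are the ones quoted in the comment (`-A x64` selects the 64-bit target platform and `-T host=x64` selects the 64-bit host toolset for Visual Studio generators).

```python
def with_x64_toolchain(cmake_args):
    """Return a copy of cmake_args with the 64-bit platform and host-toolset flags appended."""
    return list(cmake_args) + ["-A", "x64", "-T", "host=x64"]


# Hypothetical base command; the real CI command line is not shown in this thread.
base_cmd = ["cmake", ".."]
cmd = with_x64_toolchain(base_cmd)
print(" ".join(cmd))
```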
…to fix-batchnorm-onnx
@mxnet-bot run ci [unix-gpu, windows-gpu] |
Jenkins CI successfully triggered : [unix-gpu, windows-gpu] |
…to fix-batchnorm-onnx
@mxnet-bot run ci [all] |
Jenkins CI successfully triggered : [unix-gpu, centos-cpu, windows-cpu, sanity, centos-gpu, edge, website, unix-cpu, clang, windows-gpu, miscellaneous] |
Waiting on #17962 for CI fixes. |
@vinitra - Sorry for the inconvenience caused. A fix for windows-gpu is now submitted. |
…to fix-batchnorm-onnx
No worries! Thanks for the help. |
@mxnet-bot run ci [unix-gpu] |
Jenkins CI successfully triggered : [unix-gpu] |
…to fix-batchnorm-onnx
None of the jobs entered are supported. |
@vinitra We haven't configured mxnet-bot to trigger the macosx job yet, and it's not mandatory for merge either. If needed, we can extend the bot's scope to include that trigger. |
@mxnet-bot run ci [unix-cpu] |
Jenkins CI successfully triggered : [unix-cpu] |
* fixing spatial export for batchnorm * retrigger CI * fixing broken pylint * retrigger build * deprecating spatial attribute in exporter so default behavior of spatial=1 is conveyed
@sandeep-krishnamurthy This fix was merged into master. I checked: it is in neither v1.7.x nor the previous releases [e.g. 1.6.0]. |
* fixing spatial export for batchnorm * retrigger CI * fixing broken pylint * retrigger build * deprecating spatial attribute in exporter so default behavior of spatial=1 is conveyed Co-authored-by: Vinitra Swamy <vinitras@gmail.com>
Description
In the ONNX model zoo, we noticed that models such as ArcFace and DUC, which were exported from MXNet with batchnorm operators, do not treat spatial mode correctly. Without this PR, the export pipeline for these models is broken.
onnx/models#156
onnx/models#91 (comment)
cc: @ankkhedia @hariharans29 @snnn
Quoting from the MXNet BatchNorm documentation: "One of the most popular normalization techniques is Batch Normalization, usually called BatchNorm for short. We normalize the activations across all samples in a batch for each of the channels independently."
The comment in the exporter refers to mean and variance per feature instead of per channel. Since MXNet normalizes per channel, the spatial attribute in the ONNX export should be 1 instead of 0.
Changing spatial to 1 fixed these models, in accordance with the issue referenced above.
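To make the per-channel versus per-feature distinction concrete, here is a minimal pure-Python sketch of the two ways the mean could be reduced. It assumes an NCHW layout and follows this description's reading of the attribute (spatial=1 gives one statistic per channel, spatial=0 gives one per feature position). The function names are hypothetical and are not part of MXNet or ONNX.

```python
def channel_means(batch):
    """spatial=1 reading: one mean per channel, reduced over N, H and W."""
    n_channels = len(batch[0])
    means = []
    for c in range(n_channels):
        values = [v for sample in batch for row in sample[c] for v in row]
        means.append(sum(values) / len(values))
    return means


def feature_means(batch):
    """spatial=0 reading: one mean per (C, H, W) position, reduced over N only."""
    n = len(batch)
    n_channels = len(batch[0])
    height = len(batch[0][0])
    width = len(batch[0][0][0])
    return [
        [
            [sum(sample[c][h][w] for sample in batch) / n for w in range(width)]
            for h in range(height)
        ]
        for c in range(n_channels)
    ]


# Toy input in NCHW layout: N=2 samples, C=2 channels, H=1, W=2.
batch = [
    [[[1.0, 3.0]], [[10.0, 30.0]]],
    [[[5.0, 7.0]], [[50.0, 70.0]]],
]
per_channel = channel_means(batch)  # [4.0, 40.0]
per_feature = feature_means(batch)  # [[[3.0, 5.0]], [[30.0, 50.0]]]
```

The per-channel result has C entries while the per-feature result has C x H x W entries, so a model trained with one convention would be given statistics of the wrong shape and meaning if exported with the other.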