This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

fix block.export #17970

Merged
merged 9 commits into from
Sep 1, 2020

Conversation

chinakook
Contributor

net.hybridize may optimize out some ops. These ops are still alive in nn.Block (and nn.HybridBlock), but their names are not contained in the symbol's arg_names list. So we ignore these ops, except those whose names end with 'running_mean' or 'running_var'.

Description

(Brief description on what this PR is about)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

`net.hybridize` may optimize out some ops. These ops are still alive in nn.Block (and nn.HybridBlock), but their names are not contained in the symbol's `arg_names` list. So we ignore these ops, except those whose names end with 'running_mean' or 'running_var'.
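The filtering rule described above can be sketched in plain Python. This is a minimal illustration, not MXNet's actual implementation: `params`, `arg_names`, and `aux_names` stand in for `block.collect_params()`, `sym.list_arguments()`, and `sym.list_auxiliary_states()`.

```python
# A minimal sketch of the export filtering rule described above.
# All names are illustrative stand-ins for the real Gluon/Symbol objects.

def select_params_to_save(params, arg_names, aux_names):
    """Return the dict that would be written to the .params file."""
    saved = {}
    for name, value in params.items():
        if name in arg_names:
            saved['arg:' + name] = value
        elif name.endswith(('running_mean', 'running_var',
                            'moving_mean', 'moving_var')):
            # BatchNorm-style auxiliary states must exist in the symbol.
            assert name in aux_names, name
            saved['aux:' + name] = value
        # Any other name belongs to an op that hybridize optimized out,
        # so it is silently skipped instead of tripping an assertion.
    return saved

params = {'conv0_weight': 1.0,
          'bn0_running_mean': 2.0,
          'pruned_conv_weight': 3.0}
saved = select_params_to_save(params,
                              arg_names={'conv0_weight'},
                              aux_names={'bn0_running_mean'})
# saved keeps conv0_weight (arg:) and bn0_running_mean (aux:);
# pruned_conv_weight is dropped
```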
@chinakook chinakook requested a review from szha as a code owner April 4, 2020 06:06
@mxnet-bot

Hey @chinakook , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, centos-gpu, miscellaneous, unix-gpu, centos-cpu, windows-cpu, sanity, unix-cpu, edge, clang, windows-gpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@wkcn
Member

wkcn commented Apr 6, 2020

Thanks for your contribution!
Some user-defined operators may use names other than running_mean and running_var to store auxiliary variables that do not need any gradient, such as the memory feature in a memory network.

Is there any drawback to exporting all auxiliary variables?

@chinakook
Contributor Author

It's complicated for that rare case. I think we can add a standard suffix, or let users register their own suffix for this case.

@chinakook
Contributor Author

chinakook commented Apr 6, 2020

I had this problem when I exported an inference version of ICNet from gluoncv. Its results have four feature maps, but we only use the first one, so the ops and weights related to the other three feature maps are optimized out by the symbol engine.
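The scenario can be illustrated with a small plain-Python sketch. The op and weight names here are made up for illustration; this is not ICNet's real graph.

```python
# Toy model of the pruning: four output branches, only the first is kept
# at export time, so the weights feeding the other three leave the symbol
# but remain in the Gluon parameter dict.

graph = {                        # op -> weights it owns (illustrative names)
    'conv_main': ['conv_main_weight'],
    'conv_aux1': ['conv_aux1_weight'],
    'conv_aux2': ['conv_aux2_weight'],
    'conv_aux3': ['conv_aux3_weight'],
}
outputs_used = ['conv_main']     # export only the first feature map

# arg_names of the exported symbol: weights reachable from the used output
live_arg_names = {w for op in outputs_used for w in graph[op]}
# the Gluon parameter dict still holds everything
all_param_names = {w for ws in graph.values() for w in ws}
pruned = all_param_names - live_arg_names
# pruned holds the three aux-branch weights that are absent from arg_names
```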

Let the user save their extra params.
Add allow_extra to let the user decide whether to save extra parameters or not.
@wkcn
Member

wkcn commented Apr 6, 2020

I think it is related to sym.list_auxiliary_states rather than block.export.

block.export should store all auxiliary states.

If some feature maps are optimized out, their auxiliary states should be removed in sym.list_auxiliary_states.

It seems that the extra auxiliary states are not removed when some ops are removed in the current implementation.

@chinakook
Contributor Author

It's not aux states that are optimized out. It's some normal convolution ops and weights unused by the graph. They are in Gluon's parameter dict, but their names are not in the symbol's sym.list_arguments. The routine cannot find the name in the arg_list and then looks for it in the aux_list, where the parameter can never be. So the assertion fails.
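A plain-Python sketch of the failure mode described here. The function and names are illustrative, not MXNet's actual internals:

```python
# Before the fix, the export routine assumed every Gluon parameter is
# either an argument or an auxiliary state of the symbol, so a weight
# belonging to a pruned op hit the aux-state assertion.

def old_export(params, arg_names, aux_names):
    arg_dict = {}
    for name, value in params.items():
        if name in arg_names:
            arg_dict['arg:' + name] = value
        else:
            # A pruned conv weight is in neither list, so this fires.
            assert name in aux_names, name
            arg_dict['aux:' + name] = value
    return arg_dict

try:
    old_export({'pruned_conv_weight': 0.0}, arg_names=set(), aux_names=set())
    assertion_fired = False
except AssertionError:
    assertion_fired = True
# assertion_fired is True: this is the crash users saw in block.export
```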

@chinakook chinakook mentioned this pull request Apr 6, 2020
Add moving_mean and moving_var when exporting a model with SymbolBlock.
python/mxnet/gluon/block.py (outdated review thread, resolved)
typo

Co-authored-by: Sheng Zha <szha@users.noreply.github.com>
@szha
Member

szha commented Jun 9, 2020

cc @leezu

@szha szha requested a review from leezu August 14, 2020 05:32
@leezu
Contributor

leezu commented Aug 14, 2020

@chinakook will you update the PR to fix the root cause of the issue? See the example by @sxjscience in #17981

@chinakook
Contributor Author

@chinakook will you update the PR to fix the root cause of the issue? See the example by @sxjscience in #17981

I've tested it. It can be exported successfully after this fix.

Whether to remove the amp_cast and amp_multicast operators, before exporting the model.
allow_extra : bool, optional
Whether to save extra parameters whose names are not in the result symbol.
User can set allow_extra to True to load these parameters with old mxnet.mod.Module.set_params API.
Contributor

mxnet.mod.Module.set_params does not exist anymore.


remove_amp_cast : bool, optional
Whether to remove the amp_cast and amp_multicast operators, before exporting the model.
allow_extra : bool, optional
Contributor

What is the use-case to save parameters that are not used by the symbol? Do we need this option?

Contributor Author

I think it's not necessary.

Comment on lines 1278 to 1281
if name.endswith('running_mean') or name.endswith('running_var') \
or name.endswith('moving_mean') or name.endswith('moving_var'):
assert name in aux_names
arg_dict['aux:%s'%name] = param._reduce()
Contributor

@leezu leezu Aug 17, 2020

Where do we ensure that no aux parameter with a name different to running_mean, running_var, etc is missing from the resulting params file if allow_extra is False?
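The property being asked about can be stated as a small check. This is a plain-Python sketch; `saved` and the aux names are illustrative stand-ins, and `custom_memory_state` is a hypothetical aux name without a recognized suffix:

```python
# Sketch of the invariant in question: every auxiliary state the symbol
# expects must end up in the saved dict, regardless of its name suffix.

def missing_aux_states(saved, aux_names):
    """Return aux names the symbol expects but the saved dict lacks."""
    return [n for n in aux_names if 'aux:' + n not in saved]

saved = {'aux:bn0_running_mean': 1.0}
ok = missing_aux_states(saved, ['bn0_running_mean'])
bad = missing_aux_states(saved, ['custom_memory_state'])
# 'custom_memory_state' has no recognized suffix, so a purely
# suffix-based filter would drop it, and this check would flag it
```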

@leezu
Contributor

leezu commented Aug 31, 2020

@chinakook can you help resolve the conflicts?

python/mxnet/gluon/block.py (3 outdated review threads, resolved)
Co-authored-by: Leonard Lausen <leonard@lausen.nl>
@leezu
Contributor

leezu commented Sep 1, 2020

@mxnet-bot run ci [centos-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-gpu]

@szha szha merged commit 5122d32 into apache:master Sep 1, 2020
@szha szha added the Backport 1.x Pending backport to 1.x label Sep 1, 2020
jmracek pushed a commit to jmracek/incubator-mxnet that referenced this pull request Sep 2, 2020
* fix block.export

`net.hybridize` may optimize out some ops. These ops are still alive in nn.Block (and nn.HybridBlock), but their names are not contained in the symbol's `arg_names` list. So we ignore these ops, except those whose names end with 'running_mean' or 'running_var'.

* Update block.py

Let the user save their extra params.

* add allow_extra

Add allow_extra to let the user decide whether to save extra parameters or not.

* Update block.py

Add moving_mean and moving_var when exporting a model with SymbolBlock.

* Update python/mxnet/gluon/block.py

typo

Co-authored-by: Sheng Zha <szha@users.noreply.github.com>

* Update block.py

* Update block.py

* Update python/mxnet/gluon/block.py

Co-authored-by: Leonard Lausen <leonard@lausen.nl>

Co-authored-by: Sheng Zha <szha@users.noreply.github.com>
Co-authored-by: Leonard Lausen <leonard@lausen.nl>
@jmracek jmracek mentioned this pull request Sep 2, 2020
6 tasks
jmracek pushed a commit to jmracek/incubator-mxnet that referenced this pull request Sep 2, 2020
@jmracek jmracek mentioned this pull request Sep 2, 2020
6 tasks
leezu pushed a commit that referenced this pull request Sep 3, 2020
This PR cherry-picks commit 5122d32 into the v1.x branch. This is to enable the export of models where dangling layers are optimized out during symbol export. For more information, see here and here.
chinakook added a commit to chinakook/mxnet that referenced this pull request Nov 17, 2020
Labels
Backport 1.x Pending backport to 1.x
5 participants