This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Use In-place operator to prevent memory spikes in optimizer updates #13960

Merged: 1 commit merged on Feb 15, 2019

Conversation

Member

@anirudhacharya anirudhacharya commented Jan 22, 2019

Description

The update rules in the Nadam, Adadelta, Adamax, and SGLD optimizers have been changed to use in-place operators, preventing transient memory spikes during execution.
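The pattern of the change can be sketched with NumPy (a schematic illustration, not the actual MXNet code; the function names and the simplified Adadelta rule here are illustrative only):

```python
import numpy as np

def adadelta_step_out_of_place(weight, grad, acc_g, acc_delta, rho=0.9, eps=1e-5):
    # Every arithmetic expression below allocates a fresh temporary array,
    # which is what produces the transient memory spikes.
    acc_g = rho * acc_g + (1.0 - rho) * grad * grad
    delta = np.sqrt(acc_delta + eps) / np.sqrt(acc_g + eps) * grad
    acc_delta = rho * acc_delta + (1.0 - rho) * delta * delta
    return weight - delta, acc_g, acc_delta

def adadelta_step_in_place(weight, grad, acc_g, acc_delta, rho=0.9, eps=1e-5):
    # In-place operators (*=, +=, -=) update the existing state buffers
    # instead of allocating new arrays on every step.
    acc_g *= rho
    acc_g += (1.0 - rho) * grad * grad
    delta = np.sqrt(acc_delta + eps) / np.sqrt(acc_g + eps) * grad
    acc_delta *= rho
    acc_delta += (1.0 - rho) * delta * delta
    weight -= delta
    return weight, acc_g, acc_delta
```

Both variants compute the same update; only the allocation behavior differs.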

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it

Member Author

anirudhacharya commented Jan 22, 2019

Some statistics from profiler.dump() from running an MNIST example.

Adadelta Old

Profile Statistics.
	Note that counter items are counter values and not time units.
Device Storage
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
Memory: cpu/0                        1908         627.2000           0.0000        2007.0400        1003.5200

Adadelta New

Profile Statistics.
	Note that counter items are counter values and not time units.
Device Storage
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
Memory: cpu/0                        1764         627.2000           0.0000        1606.1440         803.0720

Adamax Old

Profile Statistics.
	Note that counter items are counter values and not time units.
Device Storage
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
Memory: cpu/0                        1728         627.2000           0.0000        2009.6000        1004.8000

Adamax New

Profile Statistics.
	Note that counter items are counter values and not time units.
Device Storage
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
Memory: cpu/0                        1656         627.2000           0.0000        1606.6560         803.3280
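Reading the two pairs of tables together, a quick back-of-the-envelope check (using only the `Memory: cpu/0` numbers above):

```python
# Allocation counts and max device-storage values copied from the
# profiler tables above (Memory: cpu/0 row).
tables = {
    "Adadelta": {"old": (1908, 2007.0400), "new": (1764, 1606.1440)},
    "Adamax":   {"old": (1728, 2009.6000), "new": (1656, 1606.6560)},
}

for name, t in tables.items():
    (old_count, old_max), (new_count, new_max) = t["old"], t["new"]
    drop = 1.0 - new_max / old_max
    print(f"{name}: {old_count - new_count} fewer allocations, "
          f"{drop:.0%} lower max device storage")
```

Both optimizers show roughly a 20% reduction in the peak device-storage value.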

@anirudhacharya
Member Author

@mxnet-label-bot add [pr-awaiting-review]

@szha @eric-haibin-lin

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Jan 22, 2019
Member Author

anirudhacharya commented Jan 25, 2019

For 5 batches of training with a deep-embeddings example, memory consumption decreases by a factor of ~0.27 for the Nesterov Adam (Nadam) optimizer.

Old Nadam: [screenshot: old_nadam profiler output]

New Nadam (with in-place operators): [screenshot: new_nadam profiler output]
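The kind of peak-memory difference shown in the screenshots can be reproduced in miniature with `tracemalloc` (a standalone NumPy sketch, not the actual MXNet profiler run; the array size and step functions are made up for illustration):

```python
import tracemalloc
import numpy as np

N = 1_000_000  # one million float64 parameters, ~8 MB per array

def step_out_of_place(w, g, lr=0.01):
    # Allocates a temporary for lr * g plus a brand-new array for the result.
    return w - lr * g

def step_in_place(w, g, lr=0.01):
    # Still builds the lr * g temporary, but reuses w's buffer for the result.
    w -= lr * g
    return w

def peak_bytes(step, n_steps=5):
    # Measure peak traced allocation across a few update steps.
    w, g = np.zeros(N), np.ones(N)
    tracemalloc.start()
    for _ in range(n_steps):
        w = step(w, g)
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return peak
```

Running `peak_bytes(step_in_place)` should report a noticeably lower peak than `peak_bytes(step_out_of_place)`; NumPy has reported its allocations to `tracemalloc` since version 1.13.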

Contributor

@vandanavk vandanavk left a comment


LGTM

@vandanavk
Contributor

@mxnet-label-bot update [pr-awaiting-merge]

@marcoabreu marcoabreu added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-review PR is waiting for code review labels Feb 5, 2019
@ankkhedia
Contributor

@sandeep-krishnamurthy Could you please review/merge this PR?

@eric-haibin-lin eric-haibin-lin merged commit a4e249b into apache:master Feb 15, 2019
@anirudhacharya anirudhacharya deleted the opt_mem branch February 15, 2019 01:29
@anirudhacharya
Member Author

Thanks for merging and thanks to @szha for the tip here - #13683 (comment)

stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
jessr92 pushed a commit to jessr92/incubator-mxnet that referenced this pull request Feb 19, 2019
drivanov pushed a commit to drivanov/incubator-mxnet that referenced this pull request Mar 4, 2019
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019