
Clear Scratch Diffs to Prevent Contaminating Backward through Splits #6202

Merged: 3 commits from shelhamer:fix-scratch-bottom-diff into BVLC:master on Jan 29, 2018

Conversation

@shelhamer (Member) commented on Jan 29, 2018

Certain layers (SoftmaxWithLossLayer, SigmoidCrossEntropyLossLayer, and AccuracyLayer) save memory during forward by making use of bottom diffs that are otherwise unused or overwritten during backward. The trouble is that these scratch diffs can be mistakenly propagated by backward through split layers. All of the top diffs of a split are accumulated even when backward is not called for the losses (when their loss weight is zero) or accuracy (which has no backward step). This was missed at first because it requires the interaction of split layers and the backward pruning that prevents computation of unnecessary gradients.
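The mechanism described above can be illustrated with a minimal self-contained sketch (this is a hypothetical stand-in, not actual Caffe code; the function name and types are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a split layer's backward pass: it sums the diffs of *all*
// of its tops into its single bottom diff. If one top feeds a layer whose
// backward is skipped (a loss with weight zero, or AccuracyLayer, which
// has no backward step), any stale scratch values left in that top's diff
// are accumulated anyway, contaminating the gradient.
std::vector<float> split_backward(
    const std::vector<std::vector<float>>& top_diffs) {
  std::vector<float> bottom_diff(top_diffs[0].size(), 0.0f);
  for (const auto& diff : top_diffs) {
    for (std::size_t i = 0; i < diff.size(); ++i) {
      bottom_diff[i] += diff[i];  // unconditional accumulation over all tops
    }
  }
  return bottom_diff;
}
```

Here, if the second top's diff holds leftover scratch values such as {0.5, 0.5} instead of zeros, they are silently added to the true gradient.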

This fix zeros out the scratch diffs to prevent this kind of contamination. It costs a small amount of extra computation, but that is preferable to allocating additional internal buffers, since the extra memory usage could cause an unexpected crash for a previously working configuration.
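The shape of the fix can be sketched as follows (again a hypothetical stand-in, not the actual Caffe implementation; in Caffe the zeroing is done on the real bottom diff buffers at the end of the layer's forward pass):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a layer that borrows its bottom diff as scratch space
// during forward to save memory. Clearing the scratch before returning
// ensures a downstream split backward accumulates zeros rather than
// stale workspace values.
void forward_using_scratch(const std::vector<float>& bottom_data,
                           std::vector<float>& bottom_diff) {
  // Use bottom_diff as temporary workspace (illustrative computation).
  for (std::size_t i = 0; i < bottom_data.size(); ++i) {
    bottom_diff[i] = bottom_data[i] * 2.0f;
  }
  // The fix: zero the scratch so it cannot contaminate backward.
  std::fill(bottom_diff.begin(), bottom_diff.end(), 0.0f);
}
```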

See #2895 (comment) for the first explanation of this issue.


linziyi and others added 3 commits on January 28, 2018:

a few layers make use of otherwise unused diffs to accumulate results, but unless the diffs are cleared in forward this contaminates the gradients when these layers share a bottom and their backward is skipped.
@Noiredd (Member) commented on Jan 29, 2018

This is a neat, systemic solution. The performance impact of caffe_gpu_set is minimal: on a theoretical network making heavy use of Accuracy it doubled the forward pass time, from 0.78 ms to 1.60 ms, which is IMO negligible. The same network used 703 MB of RAM with my hotfix, compared to 526 MB with this PR. On more typical classification setups the differences are unnoticeable.
My vote goes for merging this.

I have added a reference to close #6141, which attempted to solve the Accuracy layer problem by allocating a raw SyncedMemory object to act as an internal buffer. Thanks @sonack!

@shelhamer shelhamer merged commit 08a95a4 into BVLC:master Jan 29, 2018
@shelhamer (Member, Author) commented:

Thanks all for your work on this and thanks again @Noiredd for the final review!

@shelhamer shelhamer deleted the fix-scratch-bottom-diff branch January 29, 2018 19:31
beniz pushed a commit to jolibrain/caffe that referenced this pull request Feb 3, 2018
Clear Scratch Diffs to Prevent Contaminating Backward through Splits
XinYao1994 pushed a commit to XinYao1994/caffe that referenced this pull request Aug 29, 2018
Clear Scratch Diffs to Prevent Contaminating Backward through Splits