Clear Scratch Diffs to Prevent Contaminating Backward through Splits #6202
Certain layers (SoftmaxWithLossLayer, SigmoidCrossEntropyLossLayer, and AccuracyLayer) save memory during forward by using their bottom diffs as scratch space, since those diffs are otherwise unused or overwritten during backward. The trouble is that these scratch diffs can be mistakenly propagated by backward through split layers: all of the top diffs of a split are accumulated, even when backward is not called for the losses (when their loss weight is zero) or for accuracy (which has no backward step). This was missed at first because it requires the interaction of split layers and the backward pruning that skips computation of unnecessary gradients.

This fix zeros out the scratch diffs to prevent this kind of contamination. Zeroing costs a little computation, but not much. It is preferable to allocating separate internal buffers, because the additional memory usage might cause an unexpected crash for a configuration that previously worked.
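The contamination described above can be reproduced in a few lines. The following is a minimal standalone sketch, not Caffe code: `Diff`, `split_backward`, `scratch_forward`, and `run` are hypothetical names invented for illustration, and the scratch value `7.0f` is arbitrary. It assumes a split with two branches, one feeding a real loss and one feeding a layer that has no backward step but borrows its bottom diff during forward.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the diff buffer of a blob (not the Caffe Blob class).
using Diff = std::vector<float>;

// Split backward: the bottom diff is the sum of every top diff,
// whether or not the consumer of that top actually ran backward.
Diff split_backward(const std::vector<Diff>& top_diffs) {
  assert(!top_diffs.empty());
  Diff bottom(top_diffs[0].size(), 0.0f);
  for (const Diff& top : top_diffs) {
    assert(top.size() == bottom.size());
    for (std::size_t i = 0; i < bottom.size(); ++i) {
      bottom[i] += top[i];
    }
  }
  return bottom;
}

// Forward pass of a layer with no backward step that uses its
// bottom diff as scratch memory for intermediate values.
void scratch_forward(Diff& bottom_diff, bool clear_scratch) {
  for (float& v : bottom_diff) {
    v = 7.0f;  // arbitrary intermediate values left in the diff
  }
  // ... the layer's output would be computed from the scratch values here ...
  if (clear_scratch) {
    // The idea of the fix: zero the scratch so nothing stale remains.
    std::fill(bottom_diff.begin(), bottom_diff.end(), 0.0f);
  }
}

// One forward/backward pass; returns the gradient that reaches
// the bottom of the split.
Diff run(bool clear_scratch) {
  Diff loss_branch(3, 0.0f);
  Diff scratch_branch(3, 0.0f);

  // Forward: the scratch layer writes into its bottom diff.
  scratch_forward(scratch_branch, clear_scratch);

  // Backward: only the real loss writes a gradient. The scratch
  // branch is skipped, so its diff keeps whatever forward left there.
  loss_branch = {0.5f, -0.25f, 0.125f};

  return split_backward({loss_branch, scratch_branch});
}
```

Calling `run(false)` yields a gradient polluted by the leftover scratch values, while `run(true)` returns only the loss gradient. The sketch trades one extra pass over the buffer for correctness, which mirrors the small computational cost mentioned above.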
See #2895 (comment) for the first explanation of this issue.
Related:
Credits:
Accuracy layer issue in "New Accuracy Layer on GPU interferes with training" #5981.
Cherry-pick of that fix in this PR.