[quant] Enable fusion for conv modules with bias #36173
Conversation
Summary: Previously we were ignoring the conv bias during training if it existed. This PR adds the bias from the conv op during the conv+bn fusion process. Test Plan: python test/quantization/test_quantization.py
Diff hunk under discussion (training branch). Previous code:
    conv = conv * rescale_factor.reshape([1, -1, 1, 1])
    conv = conv + (self.beta - self.gamma * batch_mean * batch_rstd).reshape([1, -1, 1, 1])
Updated code (first line of the new expression):
    conv = (self.gamma * batch_rstd).reshape([1, -1, 1, 1]) * conv_orig + \
The correct equation is
conv = (self.gamma * batch_rstd).reshape([1, -1, 1, 1]) * (conv_orig - batch_mean).reshape([1, -1, 1, 1]) + self.beta.reshape([1, -1, 1,1])
This would give incorrect results as batch_mean and batch_var need to take the bias into account
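A toy check of this point (hypothetical numbers and names, not the module code): if the batch statistics are computed from the bias-free conv output while the activations being normalized include the bias, the result is shifted by roughly gamma * bias / std.

```python
import torch

torch.manual_seed(0)
gamma, beta, bias, eps = 1.0, 0.0, 3.0, 1e-5

x = torch.randn(10000)   # stand-in for the conv output before the bias add
y = x + bias             # bias-included output that batch norm actually sees

# Stats computed on the bias-included output (consistent with the activations)
ok = gamma * (y - y.mean()) / torch.sqrt(y.var(unbiased=False) + eps) + beta

# Stats computed while ignoring the bias (inconsistent with the activations)
bad = gamma * (y - x.mean()) / torch.sqrt(x.var(unbiased=False) + eps) + beta

print((bad - ok).mean())  # ~ gamma * bias / std, i.e. ~3 here, rather than ~0
```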
Diff hunk under discussion (training branch). Previous code:
    conv = conv + (self.beta - self.gamma * batch_mean * batch_rstd).reshape([1, -1, 1, 1])
Updated code:
    conv = (self.gamma * batch_rstd).reshape([1, -1, 1, 1]) * conv_orig + \
        (self.beta - self.gamma * batch_rstd * batch_mean).reshape([1, -1, 1, 1])
We also need to change the equation in the else branch (when training is false) to:
conv = conv + (self.gamma * (self.bias - self.running_mean) / running_std).reshape([1, -1, 1, 1]) + self.beta.reshape([1, -1, 1, 1])
I don't think this needs to change since the bias is baked into self.running_mean and running_std after training.
That's not true. Think of it this way: if I add a bias to my output during training, the running mean will be the bias (assume the conv output prior to the bias add is zero mean).
So the output will be gamma/sigma * (y + bias - mean) + beta.
If you don't have the bias, the output will be gamma/sigma * (y - mean) + beta, which will be incorrect.
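Roughly in numbers (a toy illustration of the argument above, not the module code; gamma = 1, beta = 0, bias = 2, and a zero-mean pre-bias conv output are assumed):

```python
import torch

torch.manual_seed(0)
gamma, beta, bias = 1.0, 0.0, 2.0

y = torch.randn(10000)            # conv output prior to the bias add (~zero mean)
running_mean = (y + bias).mean()  # tracked on the biased output during training -> ~bias
running_std = (y + bias).std()    # ~1

# Eval output with the bias applied (what the float model computes)
with_bias = gamma / running_std * (y + bias - running_mean) + beta

# Eval output if the fused module drops the bias
without_bias = gamma / running_std * (y - running_mean) + beta

print((with_bias - without_bias).mean())  # ~ gamma * bias / sigma, i.e. ~2, so the outputs diverge
```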
Thanks for the explanation! Updated it.
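For reference, a minimal sketch of the training-branch computation the final equation implements, written as a standalone function with assumed parameter names (x, weight, bias, gamma, beta, eps); this is not the module source:

```python
import torch
import torch.nn.functional as F

def fused_convbn_train(x, weight, bias, gamma, beta, eps=1e-5):
    """Training-time conv+bn fusion with the conv bias folded in (sketch only).

    Batch statistics are computed on the bias-included conv output, and the
    batch norm affine transform is applied as a single per-channel scale and
    shift, matching the fused form in the diff above.
    """
    conv = F.conv2d(x, weight)  # conv without bias
    conv_orig = conv if bias is None else conv + bias.reshape([1, -1, 1, 1])

    batch_mean = conv_orig.mean(dim=[0, 2, 3])
    batch_var = conv_orig.var(dim=[0, 2, 3], unbiased=False)
    batch_rstd = torch.rsqrt(batch_var + eps)

    # y = gamma * (conv_orig - batch_mean) * batch_rstd + beta, in fused form
    return (gamma * batch_rstd).reshape([1, -1, 1, 1]) * conv_orig + \
           (beta - gamma * batch_rstd * batch_mean).reshape([1, -1, 1, 1])
```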
Please see comments, thanks!
This pull request has been merged in 6972c27.
Stack from ghstack:

Summary:
Previously we were ignoring the conv bias during training if it existed. This PR adds the bias from the conv op during the conv+bn fusion process.

Test Plan:
python test/quantization/test_quantization.py

Differential Revision: D20921613
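In the spirit of the test plan, a quick numerical sanity check one could run (a sketch with made-up shapes and parameter values, not the actual test in test_quantization.py): the fused training-time equation with the bias should match an unfused conv (with bias) followed by batch normalization over the same batch.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3, 8, 8)
weight = torch.randn(5, 3, 3, 3)
bias = torch.randn(5)
gamma, beta, eps = torch.rand(5) + 0.5, torch.randn(5), 1e-5

# Reference: conv with bias, then plain batch norm over this batch
ref_conv = F.conv2d(x, weight, bias)
mean = ref_conv.mean(dim=[0, 2, 3])
var = ref_conv.var(dim=[0, 2, 3], unbiased=False)
ref = gamma.reshape(1, -1, 1, 1) * (ref_conv - mean.reshape(1, -1, 1, 1)) \
      / torch.sqrt(var + eps).reshape(1, -1, 1, 1) + beta.reshape(1, -1, 1, 1)

# Fused form from this PR's training-branch equation
rstd = torch.rsqrt(var + eps)
conv_orig = F.conv2d(x, weight) + bias.reshape(1, -1, 1, 1)
fused = (gamma * rstd).reshape(1, -1, 1, 1) * conv_orig + \
        (beta - gamma * rstd * mean).reshape(1, -1, 1, 1)

print(torch.allclose(ref, fused, atol=1e-5))  # expected: True
```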