Added caching of scales for bias in conv2d int8 #36980
Conversation
Thanks for your contribution!
const auto& scale_in_data = ctx.Attr<float>("Scale_in");

bool is_multi_channel = scale_weights_data.size() > 1;
int mask_reorder = is_multi_channel ? 1 << 0 : 1;
What is the reason for bit-shifting "1" by zero bits? Isn't "1 << 0" equal to 1? As it stands the ternary operation is not needed, since no matter what value "is_multi_channel" has, the result will be 1.
@wozna Could you please take this question?
From what I understand, "mask_reorder" is used as a set of flags, but even with that knowledge it isn't easy to understand at first glance.
I discussed this with @wozna and there is probably some problem here that requires separate investigation (testing accuracy etc.). It will not be solved in this PR.
int mask_reorder = is_multi_channel ? 1 << 0 : 1;
const int count =
    is_multi_channel
        ? (groups > 1 ? (weights_tz)[1] * (weights_tz)[0] : (weights_tz)[0])
Nested ternary operators produce code that is not easy to understand at a glance. Could you please change this to a normal if statement to avoid nesting ternary ops?
ok. Will do that
done
bias_scale_tuple =
    std::make_shared<std::tuple<float, std::vector<float>>>(std::make_tuple(
        static_cast<float>(mask_reorder), std::vector<float>(count)));
for (int i = 0; i < count; i++) {
Some time ago I would have complained about "i++", but I have since read that the compiler will optimize it anyway (this knowledge comes from the book about modern CPUs that you recommended, so thank you!).
        ? (groups > 1 ? (weights_tz)[1] * (weights_tz)[0] : (weights_tz)[0])
        : 1;

int count;
Maybe if you initialize with:
int count = 1;
then
else {
  count = 1;
}
won't be necessary. What is your opinion?
Ok, I incorporated your suggestion while keeping the if-else blocks as requested by @jakpiase.
LGTM
LGTM
LGTM
PR types
Performance optimization
PR changes
OPs
Describe
Currently, scaling of the bias data for conv2d int8 happens in every iteration. This is not needed: the scaling can be done once and then stored. This PR caches the once-computed scaled bias.
Perf improvement: Mobilenet_v1 int8, ~2% improvement on the whole model.
processor: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz