Fix to #34554 #37079
Conversation
Thanks for your contribution!
@pawelpiotrowicz, @Silv3S, @tsocha, @zuzg, @piotrekobiIntel Please review.
template <typename T>
std::shared_ptr<std::tuple<float, std::vector<float>>> get_bias_scales(
    const framework::ExecutionContext& ctx,
    const platform::MKLDNNDeviceContext& dev_ctx, const std::string& key) {
  return std::make_shared<std::tuple<float, std::vector<float>>>(
      std::make_tuple(0.0f, std::vector<float>(1, 1.0f)));
}

template <>
std::shared_ptr<std::tuple<float, std::vector<float>>> get_bias_scales<int8_t>(
    const framework::ExecutionContext& ctx,
    const platform::MKLDNNDeviceContext& dev_ctx, const std::string& key) {
  // Get scales int8 bias key
  const std::string key_bs = key + "@bs";

  // Scales for int8 bias are to be cached to avoid
  // computing them each iteration
  auto bias_scale_tuple =
      std::static_pointer_cast<std::tuple<float, std::vector<float>>>(
          dev_ctx.GetBlob(key_bs));
  if (bias_scale_tuple) return bias_scale_tuple;

  const auto* filter = ctx.Input<Tensor>("Filter");
  const auto& weights_tz = framework::vectorize(filter->dims());
  const int groups = std::max(ctx.Attr<int>("groups"), 1);

  const auto& scale_weights_data =
      ctx.Attr<std::vector<float>>("Scale_weights");
  const auto& scale_in_data = ctx.Attr<float>("Scale_in");

  bool is_multi_channel = scale_weights_data.size() > 1;
  int mask_reorder = is_multi_channel ? 1 << 0 : 1;
  const int count =
      is_multi_channel
          ? (groups > 1 ? (weights_tz)[1] * (weights_tz)[0] : (weights_tz)[0])
          : 1;

  bias_scale_tuple =
      std::make_shared<std::tuple<float, std::vector<float>>>(std::make_tuple(
          static_cast<float>(mask_reorder), std::vector<float>(count)));
  for (int i = 0; i < count; i++) {
    std::get<1>(*bias_scale_tuple)[i] = scale_in_data * scale_weights_data[i];
  }

  dev_ctx.SetBlob(key_bs, bias_scale_tuple);

  return bias_scale_tuple;
}
Maybe merging these 2 functions into one and doing something different depending on the result of:
if (std::is_same<T, int8_t>::value || std::is_same<T, uint8_t>::value)
would be a better solution? It would allow getting rid of the remap structs, which seemed quite confusing to me at first glance.
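A minimal sketch of the merged version being suggested, assuming C++14; the signature, caching, and mask handling are simplified for illustration and are not the actual Paddle code:

#include <cstdint>
#include <memory>
#include <tuple>
#include <type_traits>
#include <vector>

// Merged version: one function, a single is_same check selects the int8/uint8
// path. Note that under C++14 both branches must still compile for every T.
template <typename T>
std::shared_ptr<std::tuple<float, std::vector<float>>> get_bias_scales(
    float scale_in, const std::vector<float>& scale_weights) {
  if (std::is_same<T, int8_t>::value || std::is_same<T, uint8_t>::value) {
    std::vector<float> scales(scale_weights.size());
    for (size_t i = 0; i < scales.size(); ++i) {
      scales[i] = scale_in * scale_weights[i];
    }
    return std::make_shared<std::tuple<float, std::vector<float>>>(
        std::make_tuple(scale_weights.size() > 1 ? 1.0f : 0.0f, scales));
  }
  // Non-quantized types keep the neutral scale of the generic version.
  return std::make_shared<std::tuple<float, std::vector<float>>>(
      std::make_tuple(0.0f, std::vector<float>(1, 1.0f)));
}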
I have a different observation: I think the remapping is much easier for me to understand :) Also, the "if..." is a runtime check, as C++14 does not have if constexpr (C++17), while the remapping mechanism makes the check evaluated at compile time.
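For contrast, a C++17-only sketch of how a single function could keep compile-time dispatch via if constexpr (hypothetical helper, not part of this PR; the codebase targets C++14, which is why separate specializations are kept):

#include <cstdint>
#include <type_traits>

// With if constexpr the untaken branch is discarded at compile time, so one
// function would suffice without separate template specializations.
template <typename T>
constexpr int quantization_bits() {
  if constexpr (std::is_same_v<T, int8_t> || std::is_same_v<T, uint8_t>) {
    return 8;  // quantized path
  } else {
    return 0;  // non-quantized types
  }
}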
LGTM
LGTM
} else {
  src_md = platform::MKLDNNMemDesc(src_tz, data_type, chosen_memory_format);
  weights_md = platform::MKLDNNMemDesc(weights_tz, data_type,
                                       MKLDNNMemoryFormat::any);
You are using "chosen_memory_format" everywhere except this place, please unify that.
good catch. Fixed
 * the memory format preferred for best performance
 */
const auto chosen_memory_format = MKLDNNMemoryFormat::any;
const auto weights_format = MKLDNNMemoryFormat::any;
What is the point of having two identical variables here? Every format in conv should use `any`, so I think this redundancy is not necessary.
Some legacy remained. I removed those variables.
scale_tuple =
    std::make_shared<std::tuple<float, std::vector<float>>>(std::make_tuple(
        static_cast<float>(sum_scale), std::vector<float>(count)));
for (int i = 0; i < count; i++) {
Could the for loop use `#pragma omp parallel for`?
This generation of scales is performed only once, as it is now cached. So there is no need to add OpenMP; it would just create multiple threads, each with some resources that would not be used later on. That is why I do not add omp parallel here.
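For reference, the suggestion would have looked roughly like the sketch below; the loop body is illustrative since the actual body is cut off in this excerpt, and per the reply above the loop runs only once because its result is cached:

// Hypothetical: OpenMP-parallel scale computation. Not applied in this PR
// because the scales are computed once and then cached in dev_ctx.
#pragma omp parallel for
for (int i = 0; i < count; i++) {
  std::get<1>(*scale_tuple)[i] = scale_weights_data[i];  // illustrative body
}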
Ok ~
const int groups = ctx.Attr<int>("groups");
const std::string padding_algorithm =
    ctx.Attr<std::string>("padding_algorithm");
const std::string fuse_activation =
Would `const auto&` avoid a copy here and in the following assignments?
I only touched this code due to indentation changes, so I do not know. It sounds like a good idea to check, but I will not do that in this PR. Please add it to our bug tracker.
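For the record, the suggested change would look roughly like the lines below; whether it actually saves a copy depends on whether ExecutionContext::Attr<T> returns by value or by const reference, which was not checked here, and the second attribute name is assumed:

// If Attr<std::string>() returns a const reference, these bindings avoid a
// string copy; if it returns by value, a temporary is created either way and
// its lifetime is extended by the const reference.
const auto& padding_algorithm = ctx.Attr<std::string>("padding_algorithm");
const auto& fuse_activation = ctx.Attr<std::string>("fuse_activation");  // name assumed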
LGTM
LGTM
@chenwhql Please review and approve PR-CI-APPROVAL. The problem is again with the PADDLE_ENFORCE checker, which works only on 2 out of 3 lines of PADDLE_ENFORCE.
@baoachun Please review this.
3731f2f
LGTM
@jczaja @lidanqing-intel, this PR needs to be blocked temporarily because it involves incompatible upgrades.
@baoachun There is no API change in this PR. Let me know if I can merge it.
@baoachun I want to clarify: there is no API change in this PR. SetMkldnnCacheCapacity() still changes the capacity of the cache as it used to. In time we will update SetMkldnnCacheCapacity() to also change the capacity of oneDNN's internal cache. Even then this will not impact the API, i.e. SetMkldnnCacheCapacity() will remain.
@jczaja, then can I understand it like this: what you are modifying is the internal cache of oneDNN, while SetMkldnnCacheCapacity() controls Paddle's own oneDNN cache; they are two types of cache, so the behavior of the SetMkldnnCacheCapacity() API will not change?
@baoachun Yes, this is correct.
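For readers following the discussion, this is the user-facing call in question; a minimal usage sketch only, since the exact header and namespace may differ between Paddle versions:

#include "paddle_inference_api.h"  // header name may differ by version

paddle_infer::Config config("./model_dir");
config.EnableMKLDNN();
// Caps Paddle's per-shape oneDNN cache; this PR changes what Paddle caches
// internally, not this API or its meaning for users.
config.SetMkldnnCacheCapacity(10);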
Hi @jczaja @lidanqing-intel, this PR will cause the performance of quantized models to decrease, so we need to evaluate the risks brought by merging this code.
Sorry to inform you that 326b7fc's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
We are sorry, but after repeated discussion your PR has not yet reached the bar for merging. Please read the PaddlePaddle native operator development specification; you may submit a new PR. We are closing this PR for now. Thank you for your contribution.
PR types
Bug fixes
PR changes
OPs
Describe
This disables caching of oneDNN primitives on the Paddle side so that oneDNN caches its own elements. This change fixes the problems reported in #34554.
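For context, oneDNN exposes controls for its own primitive cache; a minimal sketch below (whether and how Paddle forwards SetMkldnnCacheCapacity to these calls in the follow-up work mentioned above is not part of this PR):

#include "dnnl.hpp"

int main() {
  // oneDNN keeps a global primitive cache; its capacity can be queried and set.
  const int capacity = dnnl::get_primitive_cache_capacity();
  dnnl::set_primitive_cache_capacity(capacity * 2);
  return 0;
}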