Fix to #34554 (#37079)

Closed · wants to merge 15 commits

Conversation

jczaja (Contributor) commented Nov 9, 2021

PR types

Bug fixes

PR changes

OPs

Describe

This disables Paddle's caching of oneDNN primitives so that oneDNN caches its own elements internally. This change fixes the problems reported in #34554.

paddle-bot-old bot commented Nov 9, 2021

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

jczaja added the Intel label Nov 9, 2021
jczaja (Contributor, Author) commented Nov 16, 2021

@pawelpiotrowicz, @Silv3S, @tsocha, @zuzg, @piotrekobiIntel Please review

Comment on lines +75 to +122
template <typename T>
std::shared_ptr<std::tuple<float, std::vector<float>>> get_bias_scales(
    const framework::ExecutionContext& ctx,
    const platform::MKLDNNDeviceContext& dev_ctx, const std::string& key) {
  return std::make_shared<std::tuple<float, std::vector<float>>>(
      std::make_tuple(0.0f, std::vector<float>(1, 1.0f)));
}

template <>
std::shared_ptr<std::tuple<float, std::vector<float>>> get_bias_scales<int8_t>(
    const framework::ExecutionContext& ctx,
    const platform::MKLDNNDeviceContext& dev_ctx, const std::string& key) {
  // Get scales int8 bias key
  const std::string key_bs = key + "@bs";

  // Scales for int8 bias are to be cached to avoid
  // computing them each iteration
  auto bias_scale_tuple =
      std::static_pointer_cast<std::tuple<float, std::vector<float>>>(
          dev_ctx.GetBlob(key_bs));
  if (bias_scale_tuple) return bias_scale_tuple;

  const auto* filter = ctx.Input<Tensor>("Filter");
  const auto& weights_tz = framework::vectorize(filter->dims());
  const int groups = std::max(ctx.Attr<int>("groups"), 1);

  const auto& scale_weights_data =
      ctx.Attr<std::vector<float>>("Scale_weights");
  const auto& scale_in_data = ctx.Attr<float>("Scale_in");

  bool is_multi_channel = scale_weights_data.size() > 1;
  int mask_reorder = is_multi_channel ? 1 << 0 : 1;
  const int count =
      is_multi_channel
          ? (groups > 1 ? (weights_tz)[1] * (weights_tz)[0] : (weights_tz)[0])
          : 1;

  bias_scale_tuple =
      std::make_shared<std::tuple<float, std::vector<float>>>(std::make_tuple(
          static_cast<float>(mask_reorder), std::vector<float>(count)));
  for (int i = 0; i < count; i++) {
    std::get<1>(*bias_scale_tuple)[i] = scale_in_data * scale_weights_data[i];
  }

  dev_ctx.SetBlob(key_bs, bias_scale_tuple);

  return bias_scale_tuple;
}
Maybe merging these 2 functions into one and doing something different depending on the result of
if (std::is_same<T, int8_t>::value || std::is_same<T, uint8_t>::value)
would be a better solution? It would allow getting rid of the remap structs, which seemed quite confusing to me at first glance.

jczaja (Contributor, Author) replied:

I have a different observation, e.g. I think remapping is easier for me to understand :) Also, the "if ..." is a runtime check, as C++14 does not have if constexpr (C++17), while the remapping mechanism makes the check evaluate at compile time.
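
To illustrate the tradeoff discussed here, below is a minimal, self-contained sketch (not code from this PR; all names are hypothetical) contrasting a C++14 runtime type check with dispatch via template specialization, which is resolved at compile time:

#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper standing in for the real int8 scale computation.
inline std::vector<float> compute_int8_scales() { return {0.5f, 0.25f}; }

// Runtime dispatch (valid C++14): the condition is evaluated at run time
// and both branches must compile for every T.
template <typename T>
std::vector<float> get_scales_runtime() {
  if (std::is_same<T, std::int8_t>::value ||
      std::is_same<T, std::uint8_t>::value) {
    return compute_int8_scales();
  }
  return {1.0f};
}

// Compile-time dispatch via specialization (the style kept in this PR):
// only the body selected for a given T is ever instantiated.
template <typename T>
std::vector<float> get_scales_static() {
  return {1.0f};
}

template <>
inline std::vector<float> get_scales_static<std::int8_t>() {
  return compute_int8_scales();
}

With C++17 the runtime "if" could be written as if constexpr, which also discards the untaken branch at compile time; since the codebase is C++14 here, specialization is the compile-time option available.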

ghost previously approved these changes Nov 16, 2021

ghost left a comment: LGTM

Silv3S previously approved these changes Nov 16, 2021

Silv3S (Member) left a comment: LGTM

} else {
  src_md = platform::MKLDNNMemDesc(src_tz, data_type, chosen_memory_format);
  weights_md = platform::MKLDNNMemDesc(weights_tz, data_type,
                                       MKLDNNMemoryFormat::any);
Contributor:

You are using "chosen_memory_format" everywhere except in that one place, please unify that.

jczaja (Contributor, Author) replied:

Good catch. Fixed.

* the memory format preferred for best performance
*/
const auto chosen_memory_format = MKLDNNMemoryFormat::any;
const auto weights_format = MKLDNNMemoryFormat::any;
Contributor:

What is the point of having two identical variables here? Every format in conv should use "any", so I think this redundancy is not necessary.

jczaja (Contributor, Author) commented Nov 17, 2021:

Some legacy code remained; I removed those variables.

jczaja dismissed stale reviews from Silv3S and ghost via 27ee49e on November 17, 2021 15:41
scale_tuple =
    std::make_shared<std::tuple<float, std::vector<float>>>(std::make_tuple(
        static_cast<float>(sum_scale), std::vector<float>(count)));
for (int i = 0; i < count; i++) {
lidanqing-intel (Contributor) commented Nov 18, 2021:

Could the for loop use #pragma omp parallel for?

jczaja (Contributor, Author) replied:

This generation of scales is performed only once, as it is cached now. So there is no need to add OpenMP here; it would just create multiple threads, each with resources that would not be used later on. That is why I do not add omp parallel here.
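
For reference only, here is a small, self-contained sketch (hypothetical names, not PR code) of what the suggested pragma would look like on such a scale loop; as explained above, the loop runs only once per cached key, so the thread-spawning overhead would likely outweigh any benefit:

#include <vector>

// Illustrative only: fills bias scales in parallel using OpenMP.
// Compile with -fopenmp; without it the pragma is simply ignored.
void fill_bias_scales(std::vector<float>& scales, float scale_in,
                      const std::vector<float>& scale_weights) {
  const int count = static_cast<int>(scales.size());
#pragma omp parallel for
  for (int i = 0; i < count; i++) {
    scales[i] = scale_in * scale_weights[i];
  }
}

Because the result is stored with SetBlob and reused on later iterations, such a parallel version would only ever help on the first call.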

Contributor:

Ok ~

const int groups = ctx.Attr<int>("groups");
const std::string padding_algorithm =
    ctx.Attr<std::string>("padding_algorithm");
const std::string fuse_activation =
Contributor:

Will const auto& avoid a copy here and in the following assignments?

jczaja (Contributor, Author) replied:

I only touched this code due to indentation changes, so I do not know. It sounds like a good idea to check, but I will not do that in this PR. Please add it to our bug tracker.
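
As a side note, whether const auto& saves anything depends on what Attr returns, which is not checked here. A generic, self-contained illustration (stand-in functions, not Paddle's API):

#include <string>

// Stand-ins for an accessor returning by reference vs. by value.
const std::string& get_by_ref() {
  static const std::string value = "SAME";
  return value;
}
std::string get_by_value() { return "SAME"; }

void example() {
  const std::string a = get_by_ref();    // copies the referenced string
  const auto& b = get_by_ref();          // no copy, aliases the stored string

  const std::string c = get_by_value();  // object is constructed by the call
  const auto& d = get_by_value();        // same construction; the reference
                                         // only extends the temporary's life
  (void)a; (void)b; (void)c; (void)d;
}

So the question boils down to the return type of Attr<std::string>, which is exactly what the reply above defers to the bug tracker.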

jakpiase previously approved these changes Nov 18, 2021

jakpiase (Contributor) left a comment: LGTM

lidanqing-intel (Contributor) left a comment: LGTM

jczaja (Contributor, Author) commented Nov 19, 2021

@chenwhql Please review and approve PR-CI-APPROVAL. The problem is again with the PADDLE_ENFORCE checker, which works on only 2 out of 3 lines of a PADDLE_ENFORCE.

lidanqing-intel (Contributor) commented:

@baoachun Please review this

jczaja dismissed stale reviews from lidanqing-intel and jakpiase via 3731f2f on November 24, 2021 16:08
lidanqing-intel (Contributor) left a comment: LGTM

baoachun (Contributor) commented:

@jczaja @lidanqing-intel, this PR needs to be blocked temporarily because it involves incompatible upgrades.

jczaja (Contributor, Author) commented Nov 29, 2021

@baoachun There is no API change in this PR. Let me know if I can merge it.

jczaja (Contributor, Author) commented Nov 29, 2021

@baoachun I want to clarify: there is no API change in this PR. SetMkldnnCacheCapacity() still changes the capacity of the cache as it used to. Over time we will update SetMkldnnCacheCapacity to also change the capacity of oneDNN's internal cache. Even then it will not impact the API, e.g. SetMkldnnCacheCapacity will remain.
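
For context, a minimal sketch of what forwarding the capacity to oneDNN's internal cache could look like in that future update. This is an assumption about follow-up work, not code from this PR, and it presumes the linked oneDNN version exposes the primitive-cache capacity API in dnnl.hpp:

#include <dnnl.hpp>

// Hypothetical helpers: forward a user-facing capacity setting to oneDNN's
// internal primitive cache (0 disables it, larger values allow more reuse).
void SetOneDNNPrimitiveCacheCapacity(int capacity) {
  dnnl::set_primitive_cache_capacity(capacity);
}

int GetOneDNNPrimitiveCacheCapacity() {
  return dnnl::get_primitive_cache_capacity();
}

Under this sketch, SetMkldnnCacheCapacity could call such a helper in addition to managing Paddle's own blob cache, keeping the public API unchanged as described above.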

baoachun (Contributor) commented Dec 1, 2021

@jczaja, can I then understand it like this: what you are modifying is oneDNN's internal cache, while SetMkldnnCacheCapacity() controls Paddle's own oneDNN cache; these are two different caches, so the behavior of the SetMkldnnCacheCapacity() API will not change?

jczaja (Contributor, Author) commented Dec 1, 2021

@baoachun Yes, this is correct.

baoachun (Contributor) commented Dec 1, 2021

Hi @jczaja @lidanqing-intel, this PR will cause the performance of quantized models to decrease, so we need to evaluate the risks brought by merging this code.

paddle-bot-old bot commented Dec 4, 2021

Sorry to inform you that 326b7fc's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-bot-old bot commented May 9, 2022

Sorry to inform you that, after our repeated discussions, your PR does not yet meet the merging standard (Reference: Paddle Custom Operator Design Doc). You may submit a new PR; we are closing this one for now. Thank you for your contribution.
