More precise mkldnn kernel rules in GetExpectedKernelType #29840
Conversation
Thanks for your contribution!
Force-pushed from 5c15712 to 235afda
Force-pushed from a912b3e to 3889476
@luotao1 PR-CI-OP-benchmark fails with this PR, although there are no performance changes inside. I restarted it several times, but every time it gives a different list of errors. AFAIK the resulting log from the benchmark machine checks GPU ops; there is no oneDNN verbose info in the log, so it looks like the oneDNN kernels are not run, which could eventually be correlated with the PR. Could you advise on that, please?
After 10 restarts, CI passed :)
@luotao1 Could you please start your review?
@luotao1 Could you please start your review? PR-CI-Approval will not pass because many files were modified, but most of them contain only a single line of change.
Inside a class derived from the OperatorWithKernel class you do not need to prepend calls to the base class' public methods with OperatorWithKernel:: or this-> (this applies to CanMKLDNNBeUsed(...) and IndicateVarDataType(...) in particular). Although there are plenty of such calls in the original code, I would stick to the cleaner approach and skip the redundant prefixes.
If you do not agree, that's fine. LGTM then :-)
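For illustration, a minimal sketch of this style point with hypothetical class and method names (the real ones here are OperatorWithKernel, CanMKLDNNBeUsed, and IndicateVarDataType):

class Base {
 public:
  bool CanBeUsed() const { return true; }
};

class Derived : public Base {
 public:
  void Run() {
    bool a = CanBeUsed();        // unqualified call - the cleaner style suggested above
    bool b = this->CanBeUsed();  // equivalent, but this-> is redundant
    bool c = Base::CanBeUsed();  // equivalent, but the qualification is redundant
    (void)a; (void)b; (void)c;
  }
};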
@@ -93,6 +93,7 @@ framework::OpKernelType GetKernelType(const framework::ExecutionContext& ctx,
                                       const std::string& name) {
   framework::LibraryType library{framework::LibraryType::kPlain};
   framework::DataLayout layout = framework::DataLayout::kAnyLayout;
+  auto data_type = oper.IndicateVarDataType(ctx, name);
> AFAIK the resulting log from the benchmark machine checks GPU ops; there is no oneDNN verbose info in the log, so it looks like the oneDNN kernels are not run, which could eventually be correlated with the PR.
> Could you advise on that, please?

Although op-benchmark-ci checks GPU ops, does line 96 add any extra time cost? How about moving it into line 109?
if (library == framework::LibraryType::kPlain && it != oper.Attrs().end()) {
  auto data_type = oper.IndicateVarDataType(ctx, name);
  if (oper.CanMKLDNNBeUsed(ctx, data_type)) {
    xxxx
  }
}
@luotao1 Thank you for the comment. The data_type variable is also used in line 119:

return framework::OpKernelType(data_type, ctx.GetPlace(), layout, library);

So it will be needed and calculated at the end of the function anyway, whether the condition you mentioned is true or not (i.e. regardless of whether MKL-DNN is to be used). The way it is implemented in the PR, the variable's value is calculated only once per function call, just as it was prior to my changes, with no additional time cost.
The same applies to every occurrence in the other op files.
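To make the cost argument concrete, here is a sketch of the function's overall shape as reconstructed from the diff and the comments above (the #ifdef guard and the exact branch bodies are assumptions, not the verbatim Paddle code):

framework::OpKernelType GetKernelType(const framework::ExecutionContext& ctx,
                                      const framework::OperatorWithKernel& oper,
                                      const std::string& name) {
  framework::LibraryType library{framework::LibraryType::kPlain};
  framework::DataLayout layout = framework::DataLayout::kAnyLayout;
  // Computed exactly once, as it was before this PR ...
  auto data_type = oper.IndicateVarDataType(ctx, name);
#ifdef PADDLE_WITH_MKLDNN
  if (library == framework::LibraryType::kPlain &&
      oper.CanMKLDNNBeUsed(ctx, data_type)) {  // ... consumed here ...
    library = framework::LibraryType::kMKLDNN;
    layout = framework::DataLayout::kMKLDNN;
  }
#endif
  // ... and consumed again here (line 119 in the discussion above), so
  // hoisting the computation to the top adds no extra cost on any path.
  return framework::OpKernelType(data_type, ctx.GetPlace(), layout, library);
}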
@wojtuss Thank you for the comment. Having in mind babysitting the PR-CI-OP-benchmark for more than a week on the same code, I prefer not to refactor the code in this PR. Of course, only if it's OK with you, because I respect your opinion.
> The way it is implemented in the PR, the variable's value is calculated only once per function call, just as it was prior to my changes, with no additional time cost.

Got it.

> I prefer not to refactor the code in this PR. Of course, only if it's OK with you, because I respect your opinion.

@wojtuss What's your opinion?
Totally understandable.
The accuracy issue of the multiply op has been exposed before, and the threshold has been adjusted, so no more errors will be reported.
LGTM
PR types
Others
PR changes
Others
Describe
More precise mkldnn kernel choice in GetExpectedKernelType, based also on the kernel's registered data type.
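In short, the pattern this PR introduces can be sketched as follows (the op name and the input name "X" are placeholders, and the exact code varies per operator; the key point is that the indicated data type now feeds the MKL-DNN eligibility check):

framework::OpKernelType SomeOp::GetExpectedKernelType(
    const framework::ExecutionContext& ctx) const {
  // Determine the data type of the op's input first ...
  auto data_type = IndicateVarDataType(ctx, "X");
#ifdef PADDLE_WITH_MKLDNN
  // ... so the MKL-DNN check can take it into account: a oneDNN kernel
  // is picked only if one is actually registered for this data type.
  if (CanMKLDNNBeUsed(ctx, data_type)) {
    return framework::OpKernelType(data_type, ctx.GetPlace(),
                                   framework::DataLayout::kMKLDNN,
                                   framework::LibraryType::kMKLDNN);
  }
#endif
  return framework::OpKernelType(data_type, ctx.GetPlace());
}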