
More precise mkldnn kernel rules in GetExpectedKernelType #29840

Merged

Conversation


@arlesniak arlesniak commented Dec 22, 2020

PR types

Others

PR changes

Others

Describe

More precise mkldnn kernel choice in GetExpectedKernelType, based also on the kernel's registered data type.

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@jczaja jczaja added the Intel label Dec 22, 2020
@arlesniak arlesniak changed the title More precise mkldnn kernel choice in GetExpectedKernelType More precise mkldnn kernel rules in GetExpectedKernelType Dec 23, 2020
Two review threads on paddle/fluid/framework/operator.cc (outdated, resolved)
@arlesniak arlesniak force-pushed the arlesniak/more_precise_kernel_choice branch from 5c15712 to 235afda Compare January 11, 2021 20:50
@arlesniak arlesniak force-pushed the arlesniak/more_precise_kernel_choice branch from a912b3e to 3889476 Compare January 14, 2021 08:59
@arlesniak
Contributor Author

arlesniak commented Jan 14, 2021

@luotao1 PR-CI-OP-benchmark fails with this PR, although it contains no performance-related changes. I restarted it several times, but each run produces a different list of errors, e.g.:
previous:

```
2021-01-13 02:00:19 [check_op_benchmark_result.py:150] [ERROR] Check speed result with case "minimum_2(backward)" failed.
2021-01-13 02:00:19 [check_op_benchmark_result.py:154] [ERROR] Check accuracy result with case "multiply_7(backward)" failed.
2021-01-13 02:00:19 [check_op_benchmark_result.py:154] [ERROR] Check accuracy result with case "multiply_2(backward)" failed.
2021-01-13 02:00:19 [check_op_benchmark_result.py:154] [ERROR] Check accuracy result with case "divide_4(backward)" failed.
```

latest:

```
2021-01-14 22:19:03 [check_op_benchmark_result.py:150] [ERROR] Check speed result with case "subtract_1(forward)" failed.
2021-01-14 22:19:03 [check_op_benchmark_result.py:150] [ERROR] Check speed result with case "subtract_7(forward)" failed.
2021-01-14 22:19:03 [check_op_benchmark_result.py:150] [ERROR] Check speed result with case "pow_4(forward)" failed.
2021-01-14 22:19:03 [check_op_benchmark_result.py:150] [ERROR] Check speed result with case "pow_2(backward)" failed.
2021-01-14 22:19:03 [check_op_benchmark_result.py:154] [ERROR] Check accuracy result with case "multiply_7(backward)" failed.
2021-01-14 22:19:03 [check_op_benchmark_result.py:154] [ERROR] Check accuracy result with case "multiply_4(backward)" failed.
2021-01-14 22:19:03 [check_op_benchmark_result.py:154] [ERROR] Check accuracy result with case "multiply_2(backward)" failed.
```

AFAIK the log from the benchmark machine contains checks for GPU ops only, and there is no oneDNN verbose output in it, so it looks like the oneDNN kernels are not being run at all; only their behavior could be correlated with this PR.

Could you advise on that, please?

@arlesniak
Contributor Author

After 10 restarts CI passed :)
@luotao1 Could you start your review, please?

@arlesniak
Contributor Author

@luotao1 could you please start your review?

@arlesniak
Contributor Author

@luotao1 Could you please start your review? PR-CI-Approval will not pass because many files were modified, but most of them contain only a single-line change.


@wojtuss wojtuss left a comment


In a class derived from OperatorWithKernel you do not need to prefix calls to the base class' public methods with OperatorWithKernel:: or this-> (this applies to CanMKLDNNBeUsed(...) and IndicateVarDataType(...) in particular). Although there are plenty of such calls in the original code, I would stick to the cleaner approach and skip the redundant prefixes.
If you do not agree, that's fine. LGTM then :-)

```diff
@@ -93,6 +93,7 @@ framework::OpKernelType GetKernelType(const framework::ExecutionContext& ctx,
                                       const std::string& name) {
   framework::LibraryType library{framework::LibraryType::kPlain};
   framework::DataLayout layout = framework::DataLayout::kAnyLayout;
+  auto data_type = oper.IndicateVarDataType(ctx, name);
```
@luotao1
Contributor


> AFAIK the resulting log from the benchmark machine has checks about GPU ops, in the log there is no oneDNN verbose info, so it looks that oneDNN kernels are not run, which would eventually be correlated with the PR.
> Could you advise on that, please?

Although op-benchmark-ci checks GPU ops, does line 96 add any extra time cost? How about moving it into line 109?

```cpp
if (library == framework::LibraryType::kPlain && it != oper.Attrs().end()) {
  auto data_type = oper.IndicateVarDataType(ctx, name);
  if (oper.CanMKLDNNBeUsed(ctx, data_type)) {
    xxxx
  }
}
```

@arlesniak
Contributor Author


@luotao1 Thank you for the comment. The data_type variable is also used in line 119:

```cpp
return framework::OpKernelType(data_type, ctx.GetPlace(), layout, library);
```

So it is needed, and thus computed, at the end of the function regardless of whether the condition you mentioned is true (that is, whether or not mkldnn is used).

As implemented in the PR, the variable's value is computed only once per function, exactly as it was before my changes, so there is no additional time cost.
The same applies to every occurrence in the other op files.

@arlesniak
Contributor Author


@wojtuss Thank you for the comment. Having babysat PR-CI-OP-benchmark for more than a week on the same code, I would prefer not to refactor the code in this PR, if that is OK with you; I do respect your opinion.

@luotao1
Contributor


> In the way it's implemented in PR, the variable value is calculated only once per function as it was prior to my changes, without additional time cost.

Got it.

> I prefer to not refactor the code in the PR. Of course if it's OK with you because I respect you opinion.

@wojtuss What's your opinion?

@wojtuss


Totally understandable.

@luotao1
Contributor

luotao1 commented Jan 25, 2021

@Avin0323 @GaoWei8 Please look into making op-benchmark-ci more stable. The same commit id has produced two different results.
(screenshot of the two differing CI results)

@GaoWei8
Contributor

GaoWei8 commented Jan 25, 2021

> @Avin0323 @GaoWei8 Please look into making op-benchmark-ci more stable. The same commit id has produced two different results.

The accuracy issue in the multiply op was exposed before, and the threshold has since been adjusted, so those errors will no longer be reported.
As for the speed issue: if a rerun passes, it is not considered a real performance problem.


@wojtuss wojtuss left a comment


LGTM

@luotao1 luotao1 merged commit 5bf25d1 into PaddlePaddle:develop Jan 25, 2021