
Mkldnn layout #11040

Merged: 9 commits from mozga-intel/mkldnn-layout into PaddlePaddle:develop on Jun 7, 2018

Conversation

Contributor

@mozga-intel mozga-intel commented May 30, 2018

This PR contains the changes required to support MKLDNN memory layouts, which improve the performance of MKLDNN computations.

This implementation differs from the one proposed in #10291. It is based mainly on Brian Liu's implementation, which is a simplified version of what I proposed previously.

The latest version of the pull request is split into a few parts.

This pull request contains:

  • Implementation of the layout as support for MKLDNN operators (see the sketch below).
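
For orientation, a minimal sketch (illustrative, not this PR's exact diff) of the central idea: DataLayout gains a kMKLDNN value, and each tensor carries a layout_ tag that kernel selection and data transforms can inspect.

// Illustrative only: DataLayout grows an MKLDNN entry so that
// MKLDNN-formatted memory is distinguishable from the plain layouts.
enum class DataLayout { kAnyLayout, kNHWC, kNCHW, kMKLDNN };

struct Tensor {
  // Each tensor records which layout its data is currently in
  // (the default layout is discussed at the end of this review).
  DataLayout layout_ = DataLayout::kNCHW;
};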

This pull request is related to:

return 0; \
}

#define REGISTER_OP_KERNEL_WITH_LAYOUT(op_type, LIBRARY_TYPE, LAYOUT, \
Contributor

Why did you add REGISTER_OP_KERNEL_WITH_LAYOUT? I could not find any usage of it.

Contributor Author

@mozga-intel mozga-intel May 31, 2018

Following your earlier suggestion to split the pull request into a few small parts, this one contains only the changes required to support MKLDNN operators, so it does not include an example of how this #define is used. Could you have a look at this code? (#usage)

@@ -189,6 +206,15 @@ class OpKernelRegistrar : public Registrar {
__attribute__((unused)) = \
TouchOpKernelRegistrar_##op_type##_##LIBRARY_TYPE()

#define USE_OP_DEVICE_KERNEL_EXTEND(op_type, LIBRARY_TYPE, LAYOUT) \
Contributor

Why did you add USE_OP_DEVICE_KERNEL_EXTEND? I could not find any usage of it.

Contributor Author

Please have a look at this code.

Contributor

@tensor-tang tensor-tang Jun 4, 2018

What do you mean by "EXTEND"? Is that specifically for MKLDNN?

Or can you use some existing macro instead?

Do you have any comments about adding #define USE_OP_DEVICE_KERNEL_EXTEND? @jacquesqiao

Contributor Author

@mozga-intel mozga-intel Jun 4, 2018

Yes, it is specific to MKLDNN: it is the way we tell MKLDNN kernels apart from plain CPU kernels. The registration functions are slightly different, so I can't use any of the existing ones. It gives us the ability to register a new MKLDNN operator that supports the layout.
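
A toy, self-contained sketch of the idea (these are not the PR's actual macros; names are invented for illustration): the registry key carries a layout in addition to the op type and library, so an MKLDNN kernel can be registered alongside the plain CPU kernel of the same op.

#include <iostream>
#include <map>
#include <string>

// Global kernel registry keyed by "op/library/layout".
static std::map<std::string, void (*)()> g_kernels;

// Toy stand-in for a registration macro: registration happens at
// static-initialization time, like the real registrar objects.
#define TOY_REGISTER_KERNEL(op, lib, layout, fn) \
  static bool reg_##op##_##lib##_##layout =      \
      (g_kernels[#op "/" #lib "/" #layout] = (fn), true)

void Conv2dPlain() { std::cout << "plain CPU conv2d (NCHW)\n"; }
void Conv2dMKLDNN() { std::cout << "MKLDNN conv2d (MKLDNN layout)\n"; }

TOY_REGISTER_KERNEL(conv2d, CPU, kNCHW, Conv2dPlain);
TOY_REGISTER_KERNEL(conv2d, MKLDNN, kMKLDNN, Conv2dMKLDNN);

int main() {
  // Both kernels coexist; the layout in the key tells them apart.
  g_kernels.at("conv2d/MKLDNN/kMKLDNN")();
}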

Member

@jacquesqiao jacquesqiao Jun 5, 2018

After discussing with @tensor-tang, we think that currently all MKLDNN kernels have only one layout, MKLDNN, so we can use LIBRARY to mark and distinguish them: MKLDNN kernels can be chosen by this library flag, and the data transform can use tensor.layout_ to decide whether it needs to transform.

It seems that currently all our work can be done without registering the layout in op_kernel_registry, so can we just use LIBRARY for now and move forward with our work?
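
A minimal sketch of that proposal (illustrative names, not PaddlePaddle's real API): kernels are keyed by LibraryType only, and tensor.layout_ alone decides whether a data transform runs before dispatch.

#include <functional>
#include <iostream>
#include <map>

enum class LibraryType { kPlain, kMKLDNN, kCUDNN };
enum class DataLayout { kAnyLayout, kNCHW, kMKLDNN };

struct Tensor {
  DataLayout layout_ = DataLayout::kNCHW;
};

// Kernels are distinguished by the library flag only; no layout key.
std::map<LibraryType, std::function<void(Tensor&)>> registry;

void Run(Tensor& t, LibraryType lib, DataLayout expected) {
  // The data-transform step inspects tensor.layout_, not the registry.
  if (t.layout_ != expected) {
    std::cout << "layout transform (reorder data)\n";
    t.layout_ = expected;
  }
  registry.at(lib)(t);
}

int main() {
  registry[LibraryType::kMKLDNN] = [](Tensor&) {
    std::cout << "run MKLDNN kernel\n";
  };
  Tensor t;  // starts in kNCHW
  Run(t, LibraryType::kMKLDNN, DataLayout::kMKLDNN);  // transform, then run
}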

Contributor Author

@jacquesqiao done.

- class Tensor {
+ class Tensor
+ #ifdef PADDLE_WITH_MKLDNN
+     : public MKLDNNTensor
Contributor

I think it's not a good approach to make Tensor : public MKLDNNTensor.

Member

Maybe

class Tensor {
#ifdef PADDLE_WITH_MKLDNN
  // MKLDNN-specific state and methods kept inside Tensor,
  // behind the build flag, instead of a separate base class.
  int mkl_fields_;  // placeholder for MKLDNN-specific fields
  void mkldnn_method();
#endif
};

is a better way.

Contributor Author

@jacquesqiao and @tensor-tang, the inheritance has been removed, and all the methods were moved from mkldnn_tensor into the tensor file.
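
For reference, a minimal sketch of the resulting shape, with illustrative member names (the PR's actual fields and methods may differ):

class Tensor {
 public:
#ifdef PADDLE_WITH_MKLDNN
  // Methods formerly on MKLDNNTensor now live directly on Tensor,
  // guarded by the build flag; they record which MKLDNN memory
  // format the underlying data is currently in.
  int format() const { return format_; }
  void set_format(int format) { format_ = format; }

 private:
  int format_ = 0;  // placeholder for mkldnn::memory::format
#endif
};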

const std::vector<int>& ksize,
const std::vector<int>& strides,
const std::vector<int>& paddings,
const std::string& suffix) {
Contributor

Why did you modify the parameter order of strides and paddings? Could pool_mkldnn_op.cc remain as before?

Contributor Author

@mozga-intel mozga-intel Jun 4, 2018

I modified these lines of code because cpplint reported that they had to be changed. I shouldn't have needed to make these changes; it is probably related to the compiler version I am using.

using mkldnn::primitive;
using mkldnn::reorder;

void* get_data_from_tensor(const Tensor& tensor,
Contributor

How about renaming get_data_from_tensor to GetDataFromTensor? We use CamelCase in naming.

Contributor Author

Done.

}
}

void TransDataLayoutMkldnn(const OpKernelType& kernel_type_for_var,
Contributor

How about renaming TransDataLayoutMkldnn to TransDataLayoutMKLDNN, as @wangkuiyi suggested in #3337 (comment)?

Contributor Author

Done.

}
}

inline MKLDNNDataType to_mkldnn_data_type(const std::type_index type) {
Contributor

How about renaming to_mkldnn_data_type to ToMKLDNNDataType? We use CamelCase in naming.

Contributor Author

Done.

using mkldnn::primitive;
using mkldnn::reorder;

void* get_data_from_tensor(const Tensor& tensor,
Contributor

GetDataFromTensor

Contributor Author

Done.


PADDLE_ENFORCE(
in_layout == DataLayout::kMKLDNN && out_layout != DataLayout::kMKLDNN,
"TransDataLayoutMkldnn only supports transfrom from MKLDNN to "
Contributor

transform

Contributor

It seems the function name is not appropriate, given this error message.

Contributor Author

@mozga-intel mozga-intel Jun 4, 2018

Done. I renamed the function to TransDataLayoutFromMKLDNN.
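
A small self-contained sketch of what the rename conveys (illustrative; a plain assert stands in for PADDLE_ENFORCE): the function handles only the MKLDNN-to-non-MKLDNN direction, and its precondition encodes that.

#include <cassert>

enum class DataLayout { kAnyLayout, kNCHW, kMKLDNN };

// "FromMKLDNN" in the name matches the precondition: the input must be
// in MKLDNN format and the requested output must be a non-MKLDNN layout.
void TransDataLayoutFromMKLDNN(DataLayout in_layout, DataLayout out_layout) {
  assert(in_layout == DataLayout::kMKLDNN &&
         out_layout != DataLayout::kMKLDNN);
  // ... reorder the MKLDNN-formatted memory into out_layout here ...
}

int main() {
  TransDataLayoutFromMKLDNN(DataLayout::kMKLDNN, DataLayout::kNCHW);  // OK
}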

kernel_type_for_var.data_layout_)) {
TransDataLayout(kernel_type_for_var, expected_kernel_type, in, &out);
if (NeedTransformLayout(lout, lin)) {
#ifdef PADDLE_WITH_MKLDNN
Contributor

Is there a better option that does not use #ifdef PADDLE_WITH_MKLDNN, since Paddle itself supports kMKLDNN?

As you can see

enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
};

Contributor Author

@mozga-intel mozga-intel Jun 4, 2018

That's a temporary solution. The reason I had to use this #ifdef flag is that I'm waiting for an MKLDNN flag (i.e., a global flag). The PaddlePaddle team is working on making this a global flag for all MKLDNN operators. Please have a look at the issue where this topic was discussed.

Contributor

@tensor-tang tensor-tang Jun 5, 2018

Thanks @mozga-intel, I understand that it's a temporary solution. But I'm afraid #10765 is not trying to deal with this one.

Maybe you can simply remove the #ifdef here, since CUDNN does not have a global flag either.

Contributor Author

@tensor-tang done.

(l != DataLayout::kAnyLayout && r != DataLayout::kAnyLayout && l != r);
#ifdef PADDLE_WITH_MKLDNN
// Layout transform needed for either non-MKLDNN to MKLDNN or vice versa
ret |= (l != DataLayout::kMKLDNN && r == DataLayout::kMKLDNN);
Contributor

Any better solution? This seems to duplicate some logic with:

if (NeedTransformLayout(lout, lin)) {
#ifdef PADDLE_WITH_MKLDNN
   if (lin == DataLayout::kMKLDNN || lout == DataLayout::kMKLDNN) {
...
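
For reference, a self-contained sketch of the predicate quoted above, mirroring the PR's condition with illustrative asserts; whether the caller's extra kMKLDNN branch can be folded into this predicate is exactly the question raised here.

#include <cassert>

enum class DataLayout { kAnyLayout, kNHWC, kNCHW, kMKLDNN };

bool NeedTransformLayout(DataLayout l, DataLayout r) {
  // Two concrete layouts that differ always need a transform.
  bool ret =
      l != DataLayout::kAnyLayout && r != DataLayout::kAnyLayout && l != r;
#ifdef PADDLE_WITH_MKLDNN
  // Entering MKLDNN format from any non-MKLDNN layout (including
  // kAnyLayout) also needs a transform.
  ret |= (l != DataLayout::kMKLDNN && r == DataLayout::kMKLDNN);
#endif
  return ret;
}

int main() {
  assert(NeedTransformLayout(DataLayout::kNCHW, DataLayout::kNHWC));
  assert(!NeedTransformLayout(DataLayout::kNCHW, DataLayout::kNCHW));
#ifdef PADDLE_WITH_MKLDNN
  assert(NeedTransformLayout(DataLayout::kAnyLayout, DataLayout::kMKLDNN));
#endif
}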

@mozga-intel mozga-intel force-pushed the mozga-intel/mkldnn-layout branch 3 times, most recently from f82e9d1 to ae23bb6 on June 6, 2018 at 09:32
Contributor

@luotao1 luotao1 commented Jun 6, 2018

Please merge the latest code to pass test_listen_and_serv_op in TeamCity.

jianhang-liu and others added 9 commits June 7, 2018 08:08
Add MKLDNN layout in Paddle so that an MKLDNN-friendly memory layout
can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW
was hardcoded in all MKLDNN op kernels. As a result, a non-optimized
execution path was selected inside the MKLDNN primitives, which
brought worse performance.
Besides the framework change, three MKLDNN OP kernels were updated
to use the new MKLDNN layout: conv, pool2d, and batch_norm. Other
MKLDNN OP kernels also need to be updated in a similar way to
achieve the best performance.
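
Background for the commit message above, sketched against the MKLDNN 0.x-era C++ API (generic MKLDNN usage, not code from this PR): leaving the format as memory::format::any lets the primitive choose its preferred blocked layout, the optimized path that a hardcoded NCHW descriptor forfeits.

#include <mkldnn.hpp>

int main() {
  using namespace mkldnn;
  memory::dims shape = {8, 3, 224, 224};  // N, C, H, W

  // Hardcoded plain layout: forces the primitive to consume NCHW,
  // often falling back to a slower reference implementation.
  memory::desc fixed_md(shape, memory::data_type::f32, memory::format::nchw);

  // Layout left to the library: the primitive may pick a blocked
  // format such as nChw16c on AVX-512, enabling the optimized path.
  memory::desc any_md(shape, memory::data_type::f32, memory::format::any);

  (void)fixed_md;
  (void)any_md;
  return 0;
}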
Contributor Author

@luotao1 Done.

Contributor

@tensor-tang tensor-tang left a comment

Thanks for your great work @mozga-intel. One small question: @luotao1, please help check whether this is acceptable.

// Fix me: here just change the default layout to kNCHW
// it doesn't fix the real issue, i.e. feeder should set up tensor layout
// according to actual input data
DataLayout layout_ = DataLayout::kNCHW;
Contributor

Is this acceptable? @luotao1

Contributor

@mozga-intel @tensor-tang We discussed this with @qingqing01; it is acceptable to modify the default layout.

Contributor

@luotao1 luotao1 left a comment

Thanks very much!

@luotao1 luotao1 merged commit 3ff9ba0 into PaddlePaddle:develop Jun 7, 2018
Contributor Author

@luotao1 @tensor-tang @jacquesqiao Thanks very much, great work.
