
Adapting device-specific Extra Attributes for the PHI kernel #46342

Merged: 43 commits merged into PaddlePaddle:develop on Nov 1, 2022

Conversation

@chenwhql (Contributor) commented Sep 21, 2022

PR types

New features

PR changes

Others

Describe

Adapting device-specific Extra Attributes for the PHI kernel

After cleaning up most of the Extra parameters in OpMaker, we also need to remove the parameters belonging to the Extra range from the PHI Kernel parameter list. This is a necessary step for standardizing the Kernel definition, and it also lowers the learning cost for other hardware vendors integrating with the Paddle Kernel. For example:

The function signature of the current conv2d kernel is as follows:

template <typename T, typename Context>
void ConvKernel(const Context& dev_ctx,
                const DenseTensor& input,
                const DenseTensor& filter,
                const std::vector<int>& strides,
                const std::vector<int>& paddings,
                const std::string& padding_algorithm,
                int groups,
                const std::vector<int>& dilations,
                const std::string& data_format,
                bool use_addto,
                int workspace_size_MB,
                bool exhaustive_search,
                DenseTensor* out);

And the Python API signature of conv2d is as follows:

def conv2d(x,
           weight,
           bias=None,
           stride=1,
           padding=0,
           dilation=1,
           groups=1,
           data_format="NCHW",
           name=None):

We can see that the following three parameters in the Kernel signature have nothing to do with the Python API:

                bool use_addto,
                int workspace_size_MB,
                bool exhaustive_search,

In fact, these three parameters are dedicated to the cuDNN kernel of conv2d. However, since a Kernel can only have one signature, the CPU, GPU, and XPU kernels in Paddle, as well as the NPU and MLU kernels in CustomDevice, all have to carry these three parameters in their conv2d kernels. For these devices the parameters are meaningless, which increases the cost of understanding heterogeneous Kernel development, so we need to remove such parameters. If MKLDNN is also counted, conv2d has even more such parameters, as follows:

- op : conv2d
  backward : conv2d_grad
  extra :
    attrs : [bool is_test = false, bool use_cudnn = true, bool fuse_relu_before_depthwise_conv = false, bool use_mkldnn = false,
             bool use_quantizer = false, str mkldnn_data_type = "float32", bool fuse_relu = false,
             str fuse_activation = "", float fuse_alpha = 0.0f, float fuse_beta = 0.0f, bool use_addto = false,
             bool fuse_residual_connection = false, float Scale_in = 1.0f, float Scale_out = 1.0f,
             float Scale_in_eltwise = 1.0f, 'float[] Scale_weights = {1.0f}', bool force_fp32_output = false,
             int workspace_size_MB = platform::GetDefaultConvWorkspaceSizeLimitMB(), bool exhaustive_search = false]

We cannot add such parameters to the public Kernel signature at the cost of the development experience of many hardware kernels. Since these parameters are dedicated to a specific hardware or acceleration library, they should be passed in a dedicated way.

However, this removal must keep the current MKLDNN and CUDNN Kernels working. The MKLDNN and CUDNN Kernels still need these removed parameters at present, so although they are removed from the parameter list, they must be passed into the kernel in some other way. This passing process has the following constraints:

  1. They cannot be passed in through global variables (this goes against the design principles of the PHI kernel functions)
  2. They cannot be passed in by any method that changes the declaration of the kernel function

Given the above restrictions, we can only pass these parameters in through the Kernel's existing parameters. At present, passing them in through the Context dedicated to the Kernel seems most appropriate:

  • MKLDNN-specific parameters are passed in through OneDNNContext
  • CUDNN-specific parameters are passed in through GPUContext (there is no GPUDNNContext for the time being)

This PR adopts this scheme to adapt the existing MKLDNN and CUDNN dedicated parameters.
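
As a rough sketch of what this looks like on the kernel side (illustrative only: the accessor names HasDnnAttr/GetDnnAttr and the PADDLE_GET_CONST usage are assumptions for this sketch and may differ from the actual code), the cuDNN conv kernel keeps the normalized signature and reads its dedicated attributes from the context, falling back to the old defaults when they are not set:

// Hypothetical sketch; accessor names on the context are assumed.
template <typename T, typename Context>
void ConvCudnnKernel(const Context& dev_ctx,
                     const DenseTensor& input,
                     const DenseTensor& filter,
                     const std::vector<int>& strides,
                     const std::vector<int>& paddings,
                     const std::string& padding_algorithm,
                     int groups,
                     const std::vector<int>& dilations,
                     const std::string& data_format,
                     DenseTensor* out) {
  // The extra attributes are no longer parameters; they are queried from
  // the device context instead, with the old defaults as fallback values.
  bool exhaustive_search =
      dev_ctx.HasDnnAttr("exhaustive_search")
          ? PADDLE_GET_CONST(bool, dev_ctx.GetDnnAttr("exhaustive_search"))
          : false;
  bool use_addto =
      dev_ctx.HasDnnAttr("use_addto")
          ? PADDLE_GET_CONST(bool, dev_ctx.GetDnnAttr("use_addto"))
          : false;
  // ... cuDNN convolution implementation using these values ...
}

OneDNN kernels would read their fuse_xxx attributes in the same way, through OneDNNContext.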

Specifically:

  1. Add a thread_local dnn_attrs_ member to OneDNNContext and GPUContext to store these special parameters (a rough standalone sketch follows this list)
  2. Normalize the parameter list of the current conv2d kernel, obtain the cudnn-specific parameters through dev_ctx in the CUDNN Kernel, and verify it through unittests
  3. Following the normalized conv2d kernel signature, migrate mkldnn's conv2d kernel to phi, also obtaining the mkldnn-specific parameters through dev_ctx, and verify it through unittests
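
Item 1 can be pictured with the following standalone sketch (the class and member names here are illustrative, not the actual PHI ones): the attributes live in a thread_local map inside the context, so a globally shared context can be used from multiple threads without the extra attributes of one op leaking into another:

// Standalone illustration of the thread_local dnn_attrs_ idea; the real
// PHI contexts differ in detail.
#include <map>
#include <string>
#include <variant>

using Attribute = std::variant<bool, int, float, std::string>;

class DnnContextSketch {
 public:
  void SetDnnAttr(const std::string& name, Attribute attr) {
    dnn_attrs_[name] = std::move(attr);
  }
  bool HasDnnAttr(const std::string& name) const {
    return dnn_attrs_.count(name) > 0;
  }
  const Attribute& GetDnnAttr(const std::string& name) const {
    return dnn_attrs_.at(name);
  }

 private:
  // static thread_local: each thread sees its own attribute map, so a
  // context shared as a global singleton stays thread safe.
  static thread_local std::map<std::string, Attribute> dnn_attrs_;
};

thread_local std::map<std::string, Attribute> DnnContextSketch::dnn_attrs_;

On the framework side, the extra attributes parsed from the op would be pushed into the context with the setter before the kernel launch, and the kernel reads them back as in the earlier sketch.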

Other changes:

  1. Add TypeInfo to all DeviceContexts so that the concrete Context type can be judged by an enum value (a standalone sketch of the idea follows this list)
  2. Polish the macros in enforce.h and move them to a suitable location
  3. Fix a typo in a conv argument name
  4. The NPU and MLU conv2d kernel signatures also need to be updated: Normalize conv2d kernel signature PaddleCustomDevice#151
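
The TypeInfo change in item 1 can be illustrated with a small standalone sketch (the names below are made up and do not match the PHI classes): each concrete context carries a cheap type tag, so code holding a base context reference can identify the concrete type by an enum value instead of RTTI:

// Standalone sketch of enum-based context type tagging; illustrative only.
enum class ContextKind { kCPU, kGPU, kOneDNN };

class DeviceContextSketch {
 public:
  explicit DeviceContextSketch(ContextKind kind) : kind_(kind) {}
  ContextKind kind() const { return kind_; }

 private:
  ContextKind kind_;
};

class OneDNNContextSketch : public DeviceContextSketch {
 public:
  OneDNNContextSketch() : DeviceContextSketch(ContextKind::kOneDNN) {}
};

// Branch on the concrete context type without dynamic_cast.
inline bool IsOneDNNContext(const DeviceContextSketch& ctx) {
  return ctx.kind() == ContextKind::kOneDNN;
}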

TODO:

  1. Do not remove the original ConvMKLDNN(Grad)Kernel in this PR for the time being; remove it after both depthwise_conv2d and conv3d are migrated

@paddle-bot (bot) commented Sep 21, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

chenwhql and others added 27 commits September 21, 2022 11:40
@zyfncg (Contributor) previously approved these changes Oct 26, 2022 and left a comment:

LGTM

{"fuse_alpha", ExtraAttrProperty::ONEDNN},
{"fuse_beta", ExtraAttrProperty::ONEDNN},
{"fuse_brelu", ExtraAttrProperty::ONEDNN},
{"fuse_brelu_threshold", ExtraAttrProperty::ONEDNN},
A Member commented:

fuse_brelu or fuse_brelu_threshold are no longer used by any oneDNN kernel. Is there any compatibility issue preventing us from deleting these attrs?

@chenwhql (Contributor, Author) replied:

Thanks, I deleted these two attributes. If there are other deprecated attributes, they can also be removed.

The Member replied:

We will update this list along with the second phase of kernel migration.

{"use_cudnn", ExtraAttrProperty::SCHEDULE},
{"use_mkldnn", ExtraAttrProperty::SCHEDULE},
// ONEDNN dedicated attributes
{"Bias", ExtraAttrProperty::ONEDNN},
A Member commented:

Can new extra oneDNN-specific attributes be added in the future?

@chenwhql (Contributor, Author) replied:

During kernel migration, if there are missing attributes, you can add them temporarily. However, in the long run, it is recommended to add a fusion op and kernel, and to replace the standard op with the fusion op in the pass processing stage to support the additional functionality, so that the implementation of the basic Op kernel does not grow increasingly complex.

@Silv3S (Member) commented Oct 26, 2022

We've run the first performance and functional tests, and this PR passed everything with expected results. We will run one more test on another platform to make sure that everything is OK. Tests will be finished by the end of this week.
Meanwhile, we tried to migrate the softmax kernel with changes from this branch and everything is working as expected 👍 #47339

Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
@chenwhql (Contributor, Author) commented:
@Silv3S Hello, are the tests finished?

@Silv3S (Member) commented Oct 31, 2022

LGTM
There are small differences in accuracy (some positive, some negative). We will run more tests after this PR is merged, as they are more stable.

@qili93 (Contributor) left a comment:

LGTM

@XiaoguangHu01 (Contributor) left a comment:

LGTM

@chenwhql chenwhql merged commit c923e6c into PaddlePaddle:develop Nov 1, 2022
Comment on lines +330 to +338
  const std::vector<std::string>& GetInputsName(
      const std::string& input) const {
    auto it = inputs_name_.find(input);
    PADDLE_ENFORCE_NE(it,
                      inputs_name_.end(),
                      phi::errors::NotFound(
                          "OneDnnContext does not have the input %s.", input));
    return it->second;
  }
@sfraczek (Contributor) commented Nov 2, 2022:

I think the name of this function is similar to SetInputsName, but the getter doesn't return what the setter is setting, which might be confusing at first sight. It could have a more distinct or precise name.
The same goes for SetOutputsName and GetOutputsName.

Comment on lines +359 to +367
// Holds some attributes only used by the onednn kernel calculation
// Since original mkldnn op kernel directly adds the operations that require
// fusion to the native kernel operations, and uses the attribute `fuse_xxx`
// to control, for onednn, there will be some attributes that seem to be
// independent of the device are also saved here.
// Here, the operation of fusion needs to be implemented separately as
// a fusion op and kernel, instead of patching it to a basic operation.
// Because DeviceContext is a global singleton, you need to ensure thread
// safety, use the thread_local variable
A Contributor commented:

Suggested change (add a period after "calculation" and replace "mkldnn" with "onednn" in the second line):

// Holds some attributes only used by the onednn kernel calculation.
// Since original onednn op kernel directly adds the operations that require
// fusion to the native kernel operations, and uses the attribute `fuse_xxx`
// to control, for onednn, there will be some attributes that seem to be
// independent of the device are also saved here.
// Here, the operation of fusion needs to be implemented separately as
// a fusion op and kernel, instead of patching it to a basic operation.
// Because DeviceContext is a global singleton, you need to ensure thread
// safety, use the thread_local variable
