ResnetUnitOp implemented by cuDNN fused op(backend code) #35557
Thanks for your contribution!
Updated e6b08e5 to 70976f5.
Updated 0808640 to 083598d.
// get paddle conv2d op results as baseline
template <typename T>
void GetConv2DOp(const std::vector<T> &x, const std::vector<T> &w,
This function doesn't get a conv2d op; it gets the conv2d op's computed results. The function name should reflect what the function actually does.
Changed to Conv2DForwardCompute.
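As the reviewer notes, the helper's job is to produce conv2d results that the test can compare against. To illustrate what such a baseline computes, here is a naive host-side conv2d forward pass. It is a sketch under stated assumptions, not the PR's Conv2DForwardCompute (which runs Paddle's conv2d op): the name Conv2DForwardReference, the NHWC input/output layout, and the [out_c][k][k][in_c] filter layout are all hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a zero-padded, strided 2-D convolution.
// Assumed layouts (not taken from the PR): x is NHWC, w is
// [out_c][k][k][in_c], and the returned output is NHWC.
template <typename T>
std::vector<T> Conv2DForwardReference(const std::vector<T> &x,
                                      const std::vector<T> &w, int n,
                                      int in_h, int in_w, int in_c, int out_c,
                                      int k, int stride, int pad) {
  const int out_h = (in_h + 2 * pad - k) / stride + 1;
  const int out_w = (in_w + 2 * pad - k) / stride + 1;
  std::vector<T> y(static_cast<std::size_t>(n) * out_h * out_w * out_c, T(0));
  for (int b = 0; b < n; ++b)
    for (int oh = 0; oh < out_h; ++oh)
      for (int ow = 0; ow < out_w; ++ow)
        for (int oc = 0; oc < out_c; ++oc) {
          T acc = T(0);
          for (int kh = 0; kh < k; ++kh)
            for (int kw = 0; kw < k; ++kw) {
              const int ih = oh * stride - pad + kh;
              const int iw = ow * stride - pad + kw;
              // Positions outside the input read as zero (zero padding).
              if (ih < 0 || ih >= in_h || iw < 0 || iw >= in_w) continue;
              for (int ic = 0; ic < in_c; ++ic)
                acc += x[((b * in_h + ih) * in_w + iw) * in_c + ic] *
                       w[((oc * k + kh) * k + kw) * in_c + ic];
            }
          y[((b * out_h + oh) * out_w + ow) * out_c + oc] = acc;
        }
  return y;
}
```

For example, a 3x3 all-ones input convolved with a 3x3 all-ones filter at stride 1 and pad 1 gives 4 at the corners, 6 on the edges, and 9 in the center, which makes a quick sanity check before comparing against a device result.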
platform::FilterDescriptor filter_desc_;
platform::TensorDescriptor out_desc_;
platform::TensorDescriptor out_stats_desc_;
platform::ConvolutionDescriptor conv_desc_;
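I found that there is also a cudnn_helper.h file, which is referenced more widely and provides ScopedTensorDescriptor, ScopedFilterDescriptor, and ScopedConvolutionDescriptor. A follow-up PR could check whether those interfaces are usable here.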
I looked at that file. Those interfaces come with more restrictions and don't fully cover conv, so for now let's keep using the interfaces in cudnn_desc.h.
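cudnn_desc.h and cudnn_helper.h overlap in functionality, and I'd prefer to keep only one of them. Please still consider this later; any functionality you need can be added to cudnn_helper.h.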
@@ -0,0 +1,95 @@
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
This file isn't used in this PR, so don't add it here.
Deleted.
void Forward(const platform::CUDADeviceContext &ctx, T *input_ptr,
             T *filter_ptr, T *output_ptr, float *sum_ptr,
             float *sum_of_squares_ptr) {
I'd prefer passing Tensors rather than raw pointers.
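Because the final resnet_unit_op.cu combines three ops, passing Tensors everywhere would duplicate a lot of code, whereas the pointers only need to be defined once and can then be reused.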
I don't quite follow; let's take another look in a follow-up PR.
LGTM. Also, have you confirmed which CI actually runs this unit test?
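Forward() also writes statistics through sum_ptr and sum_of_squares_ptr. Assuming these are per-output-channel sums over the N*H*W positions of an NHWC result (the parameter names suggest this, but the excerpt does not show it), a host-side reference for the unit test might look like the sketch below. ComputeChannelStats and MeanVarFromStats are hypothetical helpers; the second shows how a batch-norm style mean and biased variance follow from the two sums.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-channel statistics, mirroring what sum_ptr and
// sum_of_squares_ptr are assumed to hold.
struct ChannelStats {
  std::vector<float> sum;
  std::vector<float> sum_of_squares;
};

// Reduce an NHWC tensor over N, H and W for each channel.
template <typename T>
ChannelStats ComputeChannelStats(const std::vector<T> &y_nhwc, int channels) {
  const std::size_t c_count = static_cast<std::size_t>(channels);
  ChannelStats stats{std::vector<float>(c_count, 0.0f),
                     std::vector<float>(c_count, 0.0f)};
  for (std::size_t i = 0; i < y_nhwc.size(); ++i) {
    const std::size_t c = i % c_count;  // NHWC: channel is the innermost axis
    const float v = static_cast<float>(y_nhwc[i]);
    stats.sum[c] += v;
    stats.sum_of_squares[c] += v * v;
  }
  return stats;
}

// With m = N*H*W elements per channel:
//   mean = sum / m,  var = sum_of_squares / m - mean^2  (biased variance)
inline void MeanVarFromStats(float sum, float sum_sq, float m, float *mean,
                             float *var) {
  *mean = sum / m;
  *var = sum_sq / m - (*mean) * (*mean);
}
```

Accumulating in float regardless of T matches the float type of sum_ptr and sum_of_squares_ptr in the signature under review.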
#if CUDNN_VERSION >= 8000
template <typename T>
class CudnnNormConvolutionOp {
This class doesn't correspond to a Paddle OP, does it? If so, I'd suggest not putting "Op" in the class name.
kernel_size_ = 1;
stride_ = 1;
pad_ = 0;
}
Is this default constructor really necessary?
output_channels_ = output_channels;
kernel_size_ = kernel_size;
stride_ = stride;
pad_ = (kernel_size_ - 1) / 2;
Are you sure pad should be computed this way? Is this the only supported configuration?
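Only kernel_size = 1 or 3 is supported, and the input and output H and W stay the same, so pad doesn't need to be passed in; computing it internally like this is enough. The ResNet-50 network computes it the same way.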
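This reasoning follows from the standard convolution output-size formula, out = floor((in + 2 * pad - kernel) / stride) + 1. With stride 1 and pad = (kernel - 1) / 2, any odd kernel gives out = in, so 1x1 and 3x3 convolutions preserve H and W. The compile-time checks below sketch this; ConvOutputSize and SamePad are illustrative names, not functions from the PR.

```cpp
#include <cassert>

// Standard convolution output size (integer division acts as floor for
// the non-negative values used here):
//   out = (in + 2 * pad - kernel) / stride + 1
constexpr int ConvOutputSize(int in, int kernel, int stride, int pad) {
  return (in + 2 * pad - kernel) / stride + 1;
}

// "Same" padding as computed in the constructor under review.
constexpr int SamePad(int kernel) { return (kernel - 1) / 2; }

static_assert(SamePad(1) == 0, "1x1 conv needs no padding");
static_assert(SamePad(3) == 1, "3x3 conv pads by one pixel");
static_assert(ConvOutputSize(56, 1, 1, SamePad(1)) == 56, "1x1 keeps size");
static_assert(ConvOutputSize(56, 3, 1, SamePad(3)) == 56, "3x3 keeps size");
// Counter-example: an even kernel truncates pad to 0 and loses one row/col.
static_assert(ConvOutputSize(56, 2, 1, SamePad(2)) == 55, "2x2 shrinks");
```

The last check shows why deriving pad internally is tied to the odd kernel sizes the op supports: for an even kernel the same expression no longer preserves the spatial size.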
float *sum_of_squares_ptr = sum_of_squares_.mutable_data<float>(place_);

std::shared_ptr<op::CudnnNormConvolutionOp<T>> conv_op(
    new op::CudnnNormConvolutionOp<T>());
You could just use op::CudnnNormConvolutionOp<T> conv_op; here.
ctx_->Wait();
}

void Run() {
I'd prefer dev_ctx to be passed in as a parameter.
PR types
New features
PR changes
OPs
Describe
Implements resnet_unit_op using cuDNN's fused op API. This PR contains the backend code.
Because conv is computed in half precision, which carries only about three significant decimal digits, the unit test uses a threshold of 1e-3.
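For context on that threshold: IEEE 754 half precision stores a 10-bit mantissa, so its machine epsilon is 2^-10 ≈ 9.77e-4, roughly three significant decimal digits. A tolerance-based comparison in the style of numpy.allclose could be sketched as follows. The helper names Fp16Epsilon and AllClose, and the choice to combine an absolute and a relative tolerance, are assumptions; the description does not say exactly how the 1e-3 threshold is applied in the test.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gap between 1.0 and the next representable binary16 value: 2^-10.
inline double Fp16Epsilon() { return std::ldexp(1.0, -10); }

// Element-wise check mirroring numpy.allclose:
//   |actual - expected| <= atol + rtol * |expected|
// Default tolerances are illustrative, not taken from the PR's test.
template <typename T>
bool AllClose(const std::vector<T> &actual, const std::vector<T> &expected,
              double atol = 1e-3, double rtol = 1e-3) {
  if (actual.size() != expected.size()) return false;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const double a = static_cast<double>(actual[i]);
    const double e = static_cast<double>(expected[i]);
    if (std::fabs(a - e) > atol + rtol * std::fabs(e)) return false;
  }
  return true;
}
```

Including a relative term keeps the check meaningful for large-magnitude outputs, where a fixed absolute 1e-3 can be tighter than half precision's spacing between representable values.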
The new unit test runs in CI-Py3, with the following results: