Support Add Sub Mul Max Min Pow binary functors in elementwise system #33050
Conversation
Thanks for your contribution!
auto* z = ctx.Output<framework::LoDTensor>("Out");
z->mutable_data<T>(ctx.GetPlace());
int axis = ctx.Attr<int>("axis");
axis = axis == -1 ? std::abs(x->dims().size() - y->dims().size()) : axis;
Could the axis conversion on this line also be moved into LaunchElementwiseCudaKernel?
@@ -35,8 +35,8 @@ class ElementwiseMaxKernel : public framework::OpKernel<T> {
z->mutable_data<T>(ctx.GetPlace());
int axis = ctx.Attr<int>("axis");
ElementwiseComputeEx<MaxFunctor<T>, DeviceContext, T>(ctx, x, y, axis,
                                                      MaxFunctor<T>(), z);
ElementwiseComputeEx<MaxFunctor<T>, platform::CPUDeviceContext, T>(
This is a generic implementation, not a specialized one, so don't change it here. Later, if you want to remove the GPU implementation of ElementwiseComputeEx, you can declare ElementwiseMaxKernel in the .h file, specialize the CPU kernel in the .cc file, and specialize the GPU kernel in the .cu file.
Modified as suggested.
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto x_var = ctx.InputVar("X");
PADDLE_ENFORCE_EQ(x_var != nullptr, true,
This check can be done with PADDLE_ENFORCE_NOT_NULL.
Modified as suggested.
"X's type[%s] is not supported by elementwise_op. X's type should be "
"LoDTensor or SelectedRows.",
framework::ToTypeName(x_var->Type())));
}
Don't copy and paste large blocks of code; wrap L41 - L70 in a helper function.
L41 - L70 could also be encapsulated in the PackTensorsIntoVector function.
paddle/fluid/operators/elementwise/elementwise_op_broadcast.cu.h
Force-pushed from 4f6d870 to fcb5869.
Force-pushed from fcb5869 to cd40092.
LGTM. There are still some code-level optimizations, which can be done in a follow-up PR.
PR types
Performance optimization
PR changes
OPs
Describe
Based on the new elementwise + broadcast system, this PR supports the binary functors below:
Add Sub Mul Max Min Pow
The performance improvements are shown in the table below:
As the table shows, the performance of the supported ops improves substantially in most cases and catches up with PyTorch after adopting the new elementwise + broadcast system. However, the old broadcast subsystem performs better in the 2nd test case. The old subsystem consists of a perf-optimized branch and a common broadcast branch; the former works well when the amount of input tensor data is small and the input tensor's dims meet certain special requirements. The 2nd test case meets exactly those requirements and its data size is relatively small, so the old subsystem beats the new one there. But that extra branch makes the old broadcast-elementwise code less compact and harder to maintain, and it applies only to a narrow range of inputs. Furthermore, the elementwise ops in Paddle are mainly expected to serve NN workloads, where data sizes are usually large, so I suggest accepting the regression in the 2nd test case and adopting the new elementwise + broadcast system for these binary ops.