[PFCC Operator Performance Optimization] SeluKernel Optimization #44490
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
float scale;
float alpha;
double zero = static_cast<double>(0.0f);
The following member variables can be marked with the `private` keyword:
private:
float scale;
float alpha;
double zero = static_cast<double>(0.0f);
float scale;
float alpha;
T zero = static_cast<T>(0.0f);
Same as above.
The code looks fine now. Following the description in this link, please briefly add it to the PR comment:
Done.
LGTM
PR types
Performance optimization
PR changes
OPs
Describe
[PFCC Operator Performance Optimization] Performance optimization of the Selu kernel: the original Eigen-based implementation is dropped and the kernel is moved into the activation_kernel file.
Optimization design doc PR: PaddlePaddle/community#169
Current performance is shown in the table below:
PyTorch performance is as follows:
The computation now goes through Paddle's internal Elementwise Kernel. The operator is optimized via vectorized reads, vectorized writes, and the thread-configuration utilities in gpu_launch_config.h.
After optimization, the performance comparison between Paddle and the pre-optimization Paddle is as follows; the expected speedup (>= 5%) was achieved:
After optimization, the performance comparison between Paddle and PyTorch is as follows: roughly on par with PyTorch in fp32, and a notably larger improvement in fp64: