Implement Fused BN + Add + Relu with cudnnFusedOps API. #35955
Conversation
Thanks for your contribution!
Force-pushed from 99b1a72 to 1aed85f
LGTM; the code can be further polished in follow-up PRs.
    fwd_workspace_byte_);
  }

  void Backward(const platform::CUDADeviceContext &ctx, T *dy_ptr, T *x_ptr,
It might be better to implement Forward and Backward in separate classes, since these two are not exact counterparts of each other.
@@ -0,0 +1,292 @@
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Consider merging cudnn_bn_stats_finalize.cu.h and cudnn_scale_bias_add_relu.cu.h into a single file.
PR types
New features
PR changes
OPs
Describe
Add bn_add_relu test
For the original batchnorm, all intermediate results are computed in float; only the final output is cast once from fp32 to fp16. In the fused computation, although mean, std, and the other statistics are identical to the original, bn_finalize emits an fp16 eq_scale and eq_bias that are used for the final multiply-add, so the final result carries some additional error; the unit-test tolerance is therefore raised to 2e-3.
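The precision gap described above can be reproduced with a small NumPy sketch (a hypothetical stand-in, not the actual Paddle test code): the reference path keeps every intermediate in fp32 and casts once at the end, while the fused path folds the statistics into eq_scale/eq_bias, casts those to fp16, and does the final multiply-add in fp16. The tensor shapes, random data, and eps value here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 64)).astype(np.float32)
z = rng.standard_normal((1000, 64)).astype(np.float32)  # residual "add" input
gamma = rng.standard_normal(64).astype(np.float32)
beta = rng.standard_normal(64).astype(np.float32)
eps = 1e-5

mean = x.mean(axis=0)
var = x.var(axis=0)

# Reference batchnorm + add + relu: all intermediates in fp32,
# a single fp32 -> fp16 cast on the final output.
y_ref = np.maximum((x - mean) / np.sqrt(var + eps) * gamma + beta + z, 0)
y_ref = y_ref.astype(np.float16)

# Fused path: fold the statistics into per-channel scale/bias
# (y = x * eq_scale + eq_bias), then cast them to fp16 *before*
# the final multiply-add, as bn_finalize does.
eq_scale = (gamma / np.sqrt(var + eps)).astype(np.float16)
eq_bias = (beta - mean * gamma / np.sqrt(var + eps)).astype(np.float16)
y_fused = np.maximum(
    x.astype(np.float16) * eq_scale + eq_bias + z.astype(np.float16), 0)

# The two outputs agree only to roughly fp16 resolution, which is why
# the unit-test tolerance needs to be loosened.
max_err = np.abs(y_ref.astype(np.float32) - y_fused.astype(np.float32)).max()
print(max_err)
```

The error is dominated by rounding eq_scale/eq_bias to fp16 before the multiply-add rather than by the statistics themselves, which matches the explanation in the PR description.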