
【PaddlePaddle Hackathon 4 No.53】: Add float16 data type support for Paddle label_smooth #50921

Closed
thunder95 wants to merge 10 commits

Conversation

thunder95 (Contributor)

PR types

Performance optimization

PR changes

OPs

Describe

Add the float16 data type to label_smooth.

Test device: RTX 2070s

Current forward and backward performance results for label_smooth:

| Case No. | input_shape | fp32 (ms) | fp16 (ms) | diff (ms) | relative diff |
|---|---|---|---|---|---|
| 1 | [7, 8, 9] | 0.00617 | 0.0067 | -0.00053 | 7.9% slower |
| 2 | [16, 512, 31, 31] | 0.29677 | 0.15227 | 0.1445 | 94.90% faster |

Chinese API documentation updated for fp16 support: PaddlePaddle/docs#5642
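For context, a minimal usage sketch of what this change enables at the Python API level. The snippet is illustrative and not part of the PR; the shape matches benchmark case 2, and epsilon=0.1 is the documented default of paddle.nn.functional.label_smooth:

```python
# Illustrative sketch only; not code from this PR.
import paddle
import paddle.nn.functional as F

# One-hot-style labels in float16; shape taken from benchmark case 2.
label = paddle.rand([16, 512, 31, 31], dtype='float16')

# label_smooth computes (1 - epsilon) * label + epsilon / num_classes,
# where num_classes is the size of the last axis.
smoothed = F.label_smooth(label, epsilon=0.1)
print(smoothed.dtype)  # paddle.float16 once fp16 support lands
```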

@paddle-bot (bot) commented Feb 26, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot added the contributor (External developers) and status: proposed labels on Feb 26, 2023
@thunder95 changed the title to "【PaddlePaddle Hackathon 4 No.53】: Add float16 data type support for Paddle label_smooth" on Feb 26, 2023
```python
np.testing.assert_allclose(y_np_1, y_np_2, rtol=1e-03)
np.testing.assert_allclose(x_g_np_1, x_g_np_2, rtol=1e-03)
```


Contributor

Please add a unit test for the op following the low-precision unit test guidelines. Also, once the OpTest is added, this case can be removed. https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/amp_precision/amp_test_dev_guide_cn.html#step2
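For illustration, a hedged sketch of the kind of FP16 OpTest the guide calls for; the class name, shapes, and epsilon value here are assumptions rather than the code that was actually merged:

```python
# Illustrative only: a low-precision OpTest in the style of the AMP
# test guide; names and values are assumptions, not the merged code.
import numpy as np
from op_test import OpTest


class TestLabelSmoothFP16Op(OpTest):
    def setUp(self):
        self.op_type = "label_smooth"
        self.dtype = np.float16
        epsilon = 0.1
        label_dim = 12
        # Build one-hot fp16 labels and the smoothed reference output.
        label = np.zeros((10, label_dim)).astype(self.dtype)
        label[np.arange(10), np.random.randint(0, label_dim, 10)] = 1
        smoothed = (1 - epsilon) * label + epsilon / label_dim
        self.inputs = {'X': label}
        self.attrs = {'epsilon': epsilon}
        self.outputs = {'Out': smoothed.astype(self.dtype)}

    def test_check_output(self):
        self.check_output()

    def test_check_grad(self):
        self.check_grad(['X'], 'Out')
```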

Contributor Author

Fixed.

```diff
@@ -28,7 +30,8 @@ struct LabelSmoothGradFunctor {
   }

   __device__ __forceinline__ T operator()(const T x) const {
-    return static_cast<T>(1 - epsilon) * x;
+    using MT = typename phi::dtype::MPTypeTrait<T>::Type;
+    return static_cast<T>((1 - static_cast<MT>(epsilon)) * static_cast<MT>(x));
```
Contributor

Taken together with the code above: epsilon has already been cast to FP16 on line 29 above, and here it is cast back to FP32.

Contributor Author

Adjusted.

```cpp
using MT = typename phi::dtype::MPTypeTrait<T>::Type;
return static_cast<T>((1 - static_cast<MT>(epsilon)) * static_cast<MT>(x) +
                      static_cast<MT>(epsilon) /
                          static_cast<MT>(label_dim));
```
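To make the intent of the MPTypeTrait pattern concrete, here is a hedged numpy rendering of the same arithmetic: inputs and outputs stay fp16 at the boundary while the intermediate math runs in fp32, which is what MT resolves to when T is float16. Values are illustrative, not from the PR:

```python
# Hedged numpy sketch of the mixed-precision pattern used above.
import numpy as np

x = np.float16(0.3333)    # fp16 input element (T)
epsilon = 0.1             # attribute, held in fp32 (MT)
label_dim = 12

# Forward: compute in fp32, cast the final result back to fp16.
out = np.float16((1 - np.float32(epsilon)) * np.float32(x)
                 + np.float32(epsilon) / np.float32(label_dim))

# Backward mirrors it: dx = (1 - epsilon) * dout, again via fp32.
dout = np.float16(1.0)
dx = np.float16((1 - np.float32(epsilon)) * np.float32(dout))
```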
Contributor

Same epsilon issue as above.

Contributor Author

Fixed.

```diff
@@ -137,6 +137,9 @@ void PreluChannelWiseDirectCUDAFunctor<T>::operator()(gpuStream_t stream,
                stream>>>(
         input, alpha, output, channel, numel);
   } else {
+    printf("debug: spatial: %d, ch_num: %d\n",
+           static_cast<int>(numel / batch_size / channel),
+           static_cast<int>(channel));
```
Contributor

Is this unrelated to the functionality of this PR?

Contributor Author

Removed.

@zhangting2020 (Contributor)

Remember to submit the Chinese documentation change and reference it in this PR's description.

@thunder95 (Contributor Author)

> Remember to submit the Chinese documentation change and reference it in this PR's description.

@zhangting2020 The Chinese docs have been updated, and the docs PR is linked in the description.

```diff
   }

   __device__ __forceinline__ T operator()(const T x) const {
-    return static_cast<T>(1 - epsilon) * x;
+    using MT = typename phi::dtype::MPTypeTrait<T>::Type;
```
Contributor

Line 34 can be deleted, since it already exists above.

Contributor Author

Fixed.

```python
if core.is_compiled_with_cuda():
    place = core.CUDAPlace(0)
    if core.is_float16_supported(place):
        self.check_output_with_place(place, atol=1e-3)
```
Contributor

In the OpTest class, the default atol for FP16 is already set to 1e-3, and the test is skipped automatically on devices that do not support it, so test_check_output can simply be deleted in favor of the base-class implementation.

The test_check_grad below can likewise drop its fp16-support check to simplify the code.
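Concretely, the suggestion amounts to something like this hedged sketch; TestLabelSmoothOp and the init_dtype hook are assumed from the existing test file's conventions, not quoted from the PR:

```python
# Hedged sketch of the suggested simplification (class names assumed):
# rely on OpTest's built-in FP16 defaults (atol=1e-3) and its automatic
# skipping on devices without FP16 support.
class TestLabelSmoothFP16Op(TestLabelSmoothOp):
    def init_dtype(self):
        self.dtype = np.float16
    # No custom test_check_output and no is_float16_supported guard:
    # the OpTest base class handles both.
```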

Contributor Author

Fixed.

@luotao1 (Contributor) commented Mar 14, 2023

Closed because the following PR has been merged:

@luotao1 closed this on Mar 14, 2023