fix resnet50 usetime statistics #4838

wanghuancoder · 2020-09-07T13:07:13Z

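After the Fetch optimization, the resnet50 and resnet101 Benchmark numbers jumped sharply. The cause is that the model's run time was being measured incorrectly.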

I. Background on Paddle

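Paddle's exe.run() does not wait for a full training iteration to complete before returning; it returns as soon as the data the user asked for has been fetched. There are two cases, however, in which exe.run() waits via cudaStreamSynchronize for all kernels to finish before it returns:
1) when DropLocalExeScopes cleans up the Scope;
2) when the ParallelExecutor is destructed.
As a result, some exe.run() calls can be fast while others are slow, as shown in the figure below: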

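Note: in the figure, the time of round 1 = the forward time of round 1, and the time of round 10 = the backward time of round 9 + the forward time of round 10 + the backward time of round 10.
The time taken by exe.run() in Paddle can therefore be uneven from one call to the next.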
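The uneven timing described above can be reproduced with a small host/GPU timeline model. This is only a sketch under simplifying assumptions, not Paddle code: `simulate_run_times` and its parameters are hypothetical names, each iteration is reduced to one forward and one backward block of fixed duration, and the only synchronization point modeled is the periodic scope cleanup.

```python
def simulate_run_times(num_iters, forward, backward, drop_every):
    """Return the simulated wall-clock duration of each exe.run() call.

    Assumed model (illustrative only):
      * the forward kernels of an iteration cannot start until the
        backward kernels of the previous iteration have finished;
      * exe.run() returns as soon as the forward pass is done (the
        fetched data is ready), unless this is a scope-cleanup
        iteration, in which case it also waits for the backward pass.
    """
    durations = []
    now = 0.0          # host clock: when the current exe.run() starts
    gpu_free_at = 0.0  # when the GPU finishes everything queued so far
    for i in range(1, num_iters + 1):
        start = now
        forward_done = max(now, gpu_free_at) + forward
        gpu_free_at = forward_done + backward
        if i % drop_every == 0:
            # Scope cleanup synchronizes: wait for the backward pass too.
            now = gpu_free_at
        else:
            # Fetch is satisfied after the forward pass: return early.
            now = forward_done
        durations.append(now - start)
    return durations


# Hypothetical costs: forward = 1 time unit, backward = 2 time units.
times = simulate_run_times(20, forward=1.0, backward=2.0, drop_every=10)
print(times[:10])
```

With these inputs the first call costs only the forward time, calls 2 to 9 each cost forward + backward, and call 10 costs forward + 2 x backward, which matches the note above. The total over the 10 calls is still 10 x (forward + backward), so only the distribution across calls changes, not the overall throughput.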

II. Effect of the Fetch change on Resnet

The Resnet50 timeline before the Fetch change:

After the change:

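Before the Fetch change, Fetch blocked, so exe.run() could not return early. Every exe.run() call therefore took its forward time plus its backward time.
After the change, exe.run() times became uneven.
Because exec_strategy.num_iteration_per_drop_scope is set to 10, in every group of 10 rounds the 1st round is short, the 10th round is long, and the other 8 rounds take roughly the same time.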

III. The flaw in the time statistics

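Benchmark computes speed from the elapsed times it scrapes from the model log.
The Resnet model uses print_step=10, and each time it prints only the elapsed time of the latest round rather than the average over the 10 rounds.
The log happened to capture the short round every time, which produced a spurious jump in performance.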
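To illustrate the sampling problem, the sketch below compares a logger that reports a single round per print interval with one that averages over the whole interval. The function names, the duration pattern, and the batch size are hypothetical, and the logging condition `batch_id % print_step == 0` is an assumption about how the print lines up with the short round; it is not taken from the model code.

```python
def logged_samples(durations, print_step):
    """Durations seen by a logger that prints only the current round
    whenever batch_id % print_step == 0 (assumed logging condition)."""
    return [d for batch_id, d in enumerate(durations)
            if batch_id % print_step == 0]


def windowed_averages(durations, print_step):
    """Mean duration of each complete window of print_step rounds."""
    return [sum(durations[i:i + print_step]) / print_step
            for i in range(0, len(durations) - print_step + 1, print_step)]


# Hypothetical pattern per 10 rounds: 1 short, 8 regular, 1 long.
durations = ([1.0] + [3.0] * 8 + [5.0]) * 3
batch_size = 32  # hypothetical value, used only to turn time into speed

single = logged_samples(durations, print_step=10)
average = windowed_averages(durations, print_step=10)

print("speed from single sample:", batch_size / single[0])
print("speed from window average:", batch_size / average[0])
```

In this toy pattern the single-sample logger only ever sees the short rounds, so the speed derived from it is three times the speed derived from the window average, even though the total time is the same.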

IV. Actual performance before and after the Fetch change

Model	Before (total time for 800 steps)	After	Improvement
Resnet50	84.8940119743	83.2191288471	1.97%
Resnet101	135.613070011	135.286768198	0.24%
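The improvement column can be recomputed from the two total-time columns. The snippet below is a small check, assuming the percentage is defined as (before - after) / before; `improvement_percent` is a hypothetical helper name.

```python
def improvement_percent(before, after):
    """Relative reduction in total time, in percent."""
    return (before - after) / before * 100.0


# Total times for 800 steps, copied from the table above.
totals = {
    "Resnet50": (84.8940119743, 83.2191288471),
    "Resnet101": (135.613070011, 135.286768198),
}

for model, (before, after) in totals.items():
    print(f"{model}: {improvement_percent(before, after):.2f}%")
```

Under that definition the results round to 1.97% and 0.24%, in line with the table.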

shippingwang

Good job!

wanghuancoder added 3 commits September 7, 2020 13:01

fix resnet50 time statistics, test=develop

b750e9f

fix resnet50 time statistics, test=develop

c620e3a

fix resnet50 time statistics, test=develop

1150454

shippingwang approved these changes Sep 8, 2020

shippingwang merged commit a00c8af into PaddlePaddle:develop Sep 8, 2020
