fix resnet50 usetime statistics #4838
Merged
+14
−2
After the Fetch optimization, the ResNet50 and ResNet101 benchmark numbers jumped sharply. The cause is a flaw in how the models' run time is measured.
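I. Background on Paddle
A call to Paddle's exe.run() does not return only after a full training iteration has completed. It returns as soon as the data the user asked for has been fetched successfully. There are two cases in which exe.run() waits, via cudaStreamSynchronize, for all kernels to finish before returning:
1) when DropLocalExeScopes cleans up the Scope;
2) when the ParallelExecutor is destructed.
As a result, some exe.run() calls can be very fast while others are very slow, as shown in the figure below:
Note: in the figure, the time of iteration 1 = forward time of iteration 1, and the time of iteration 10 = backward time of iteration 9 + forward time of iteration 10 + backward time of iteration 10.
So the time taken by individual exe.run() calls in Paddle can be uneven.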
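The uneven timing described above can be reproduced with a toy model. The sketch below is not Paddle code: `simulate_run_times` and its parameters are hypothetical names, and it assumes a single in-order stream, a fetched value that is ready once the forward pass ends, and a full synchronization on every `sync_every`-th call.

```python
def simulate_run_times(num_iters, forward, backward, sync_every):
    """Return the wall time seen by each simulated run() call.

    Toy model (an assumption, not Paddle's scheduler): kernels execute in
    order on one stream; run() returns when the forward pass is done, except
    on every `sync_every`-th call, where it also waits for the backward pass.
    """
    device_free_at = 0.0  # time at which the stream finishes its queued work
    now = 0.0             # host clock
    times = []
    for i in range(1, num_iters + 1):
        start = now
        # The forward pass cannot start before the previous backward pass ends.
        forward_done = max(now, device_free_at) + forward
        device_free_at = forward_done + backward
        if i % sync_every == 0:
            now = device_free_at  # synchronizing call: wait for all kernels
        else:
            now = forward_done    # normal call: return right after the fetch
        times.append(now - start)
    return times


if __name__ == "__main__":
    for step, t in enumerate(simulate_run_times(20, 1.0, 2.0, 10), start=1):
        print(step, t)
```

With these made-up costs, the first call takes only the forward time, calls 2 to 9 take one backward plus one forward, and call 10 takes backward + forward + backward, which matches the note above. The total over the ten calls still equals ten full iterations, so only the per-call split is distorted.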
II. Effect of the Fetch change on ResNet
The ResNet50 timeline before the Fetch change:
[Image]
[Image]
After the change:
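III. The flaw in the time statistics
The benchmark computes speed from the elapsed time it scrapes out of the model's log.
[Image]
The ResNet model uses print_step=10, and each time it prints only the elapsed time of the last iteration, not the average over the 10 iterations.
The logged iteration happened to be a short one every time, which produced a spurious jump in performance: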
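To make the difference concrete, here is a minimal sketch comparing the two ways of turning per-iteration times into a speed. The function names and the sample numbers are hypothetical, and this is one way to avoid the problem, not necessarily the exact change made in this PR.

```python
def speed_from_last_step(step_times, print_step, batch_size):
    """Flawed: use only the last step of each window of `print_step` steps."""
    logged = step_times[print_step - 1::print_step]
    return [batch_size / t for t in logged]


def speed_from_window_mean(step_times, print_step, batch_size):
    """Use the total time of each full window, i.e. the per-step average."""
    speeds = []
    for start in range(0, len(step_times) - print_step + 1, print_step):
        window = step_times[start:start + print_step]
        speeds.append(batch_size * print_step / sum(window))
    return speeds


if __name__ == "__main__":
    # Made-up data: nine slow calls, then one fast call that gets logged.
    step_times = [0.5] * 9 + [0.25]
    print(speed_from_last_step(step_times, 10, 32))
    print(speed_from_window_mean(step_times, 10, 32))
```

On this made-up data the last-step figure is almost twice the window average, even though the work done per window is identical. Averaging over the window is insensitive to which call happens to absorb the synchronization.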
IV. Actual performance before and after the Fetch change