[Hackathon No.37] Optimize the GPU compute performance of the argmin_argmax op for Paddle #256
## 1.1 Current state of Paddle
Current performance is shown in the table below (based on the PaddlePaddle develop branch):
The extra space in the middle of "PaddlePaddle" here needs to be removed.
Removed.
## 1.3 Comparative analysis
At present, the API designs of Paddle and PyTorch are almost identical, and both are implemented on top of the CUB library.
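Can you explain why the performance differs so much even though PyTorch also uses cub underneath? If this is rewritten with reduce, then even after the expected 4.5x speedup there is still a fairly large gap to PyTorch.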
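@ZzSean I found that, apart from cub, the Paddle and PyTorch implementations also differ in some other details; I have added these to the RFC. If I have missed anything, I would appreciate your guidance.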
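@ZzSean In testing, fastdivmod gave no noticeable performance improvement. Tuning the block configuration gave a fairly clear improvement, but there is still a large gap to PyTorch's performance. After rereading the torch code, I found that newer versions of torch use reduce underneath.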
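To make "using reduce" concrete, here is a minimal CPU sketch in Python of argmax written as a reduction over (value, index) pairs. It is not the Paddle or PyTorch kernel; the names `combine` and `argmax_by_reduce` and the `chunk` parameter are hypothetical, and the chunks only stand in for thread blocks. The input is assumed to be non-empty.

```python
from functools import reduce


def combine(a, b):
    """Merge two (value, index) candidates; on ties keep the smaller index."""
    if b[0] > a[0] or (b[0] == a[0] and b[1] < a[1]):
        return b
    return a


def argmax_by_reduce(values, chunk=4):
    """Two-stage argmax over a non-empty sequence (illustrative only)."""
    pairs = list(zip(values, range(len(values))))
    # Stage 1: each chunk (a stand-in for one thread block) reduces its slice.
    partials = [
        reduce(combine, pairs[i:i + chunk])
        for i in range(0, len(pairs), chunk)
    ]
    # Stage 2: reduce the per-chunk winners to a single (value, index) pair.
    return reduce(combine, partials)[1]


print(argmax_by_reduce([3, 9, 2, 9, 1, 7]))
```

Because `combine` is associative and commutative, the pairs can be merged in any grouping, which is what would let a generic reduce kernel handle argmax and argmin with only the comparison swapped.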
[1]. [OP Benchmark User Guide](https://github.com/PaddlePaddle/benchmark/blob/master/api/README.md)
PPYDDDD111
Shouldn't this be deleted?
Written by mistake; deleted. @ZzSean
LGTM
Submit the argmin_argmax OP performance optimization design document