[Hackathon No.52] Add float16 data type support to the Paddle dist operator #50915
Your PR was submitted successfully. Thank you for contributing to this open-source project!
b1ee7df to 9edcfa2
e60a97c to 8ff0da6
1ec1d67 to f7d2bef
@zhangting2020 Most of the CI checks have passed. Could you review this again?
@zhangting2020 I have added the performance data.
val_ret += __shfl_xor(val_ret, mask, warpSize);
#endif
return static_cast<phi::dtype::float16>(val_ret);
}
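The fragment above appears to be the tail of a warp-level sum that exchanges values between lanes with an XOR shuffle. As a minimal host-side sketch of that butterfly pattern (not the PR's code), the following C++ simulates the lanes with an array. `kWarpSize` and `ButterflyReduceSum` are hypothetical names introduced for illustration, and a warp size of 32 is an assumption.

```cpp
#include <array>

// Assumed warp width for this illustration; device code would read warpSize.
constexpr int kWarpSize = 32;

// Host-side simulation of an XOR (butterfly) shuffle reduction.
// At each step every lane adds the value held by lane (lane ^ mask),
// which is what a shuffle-xor instruction provides on the device.
std::array<float, kWarpSize> ButterflyReduceSum(
    std::array<float, kWarpSize> lanes) {
  for (int mask = kWarpSize / 2; mask > 0; mask >>= 1) {
    // Snapshot the lanes so all reads see the values from the previous step.
    const std::array<float, kWarpSize> prev = lanes;
    for (int lane = 0; lane < kWarpSize; ++lane) {
      lanes[lane] = prev[lane] + prev[lane ^ mask];
    }
  }
  return lanes;  // Every lane now holds the sum of all inputs.
}
```

After log2(32) = 5 steps each simulated lane holds the full sum, which is why the device code can return `val_ret` from any lane.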
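Didn't the earlier revision here, which called the original implementation, compile and run correctly?
Is the main difference that the else branch now converts to fp32? This case does not need to be handled at the operator level.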
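The original implementation failed to compile here because the argument passed in is phi::dtype::float16, whereas the CUDA function expects __half for half precision, so I added a template specialization to handle fp16. I am not sure how fp16 and CUDA's half are bridged inside the framework.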
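Changing it this way does get fp16 running, but there is more to consider regarding the operator's numerical precision:
- Operations such as reduce sum lose precision easily in fp16, so they need to keep the computation in fp32 while taking fp16 input and producing fp16 output. Look at this from the caller, ReduceSumWithSubtract: by the time execution reaches this function, the input type should no longer be float16. Adding float16 support here may not be necessary.
If you do want to make this function more general so that it supports float16, you can make a small change to the original interface:
- In paddle/phi/backends/gpu/cuda/cuda_device_function.h you can find the CudaShuffleDownSync interface; using it is the recommended approach.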
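To illustrate why a sum should accumulate in fp32 even when inputs and outputs are fp16, here is a minimal host-side sketch. Standard C++ has no portable fp16 type here, so `RoundToHalfPrecision` is a hypothetical helper that emulates fp16 rounding by keeping an 11-bit significand; it ignores overflow and subnormals, which is an assumption made for brevity. None of these functions are Paddle APIs.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: round a float to the nearest value with an 11-bit
// significand, the precision of IEEE fp16. Overflow and subnormal handling
// are intentionally omitted in this sketch.
float RoundToHalfPrecision(float x) {
  if (x == 0.0f || !std::isfinite(x)) return x;
  int exp = 0;
  const float mant = std::frexp(x, &exp);  // x = mant * 2^exp, 0.5 <= |mant| < 1
  // Scale so that 11 significant bits sit left of the binary point, then
  // round to the nearest integer (ties to even in the default rounding mode).
  const float scaled = std::nearbyint(std::ldexp(mant, 11));
  return std::ldexp(scaled, exp - 11);
}

// Accumulate entirely in emulated fp16: every partial sum is rounded.
float SumInHalf(const std::vector<float>& values) {
  float acc = 0.0f;
  for (float v : values) {
    acc = RoundToHalfPrecision(acc + RoundToHalfPrecision(v));
  }
  return acc;
}

// fp16 inputs, fp32 accumulator, a single rounding to fp16 on output.
float SumInFloatThenCast(const std::vector<float>& values) {
  float acc = 0.0f;
  for (float v : values) {
    acc += RoundToHalfPrecision(v);
  }
  return RoundToHalfPrecision(acc);
}
```

Summing 4096 ones shows the difference: the emulated fp16 accumulator stalls at 2048, because 2049 is not representable with 11 significant bits and rounds back to 2048, while the fp32 accumulator reaches 4096.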
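- With precision in mind: the inf/-inf paths are max/min reductions and have no precision problem, so they keep the fp16 implementation; for -inf<p<inf the computation is done in float32 (this may be a bit naive, comments welcome 😝)
- math_cuda_utils: changed to call CudaShuffleDownSync so that it is compatible with fp16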
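The split described in this reply can be sketched as a host-side reference in C++. This is an illustrative sketch, not the PR's kernel: `DistReference` is a hypothetical name, the inputs are assumed to be non-empty and of equal length, and the p = 0 case is left out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference for the p-norm distance between x and y.
// Assumes x and y are non-empty, have the same size, and p != 0.
float DistReference(const std::vector<float>& x, const std::vector<float>& y,
                    float p) {
  const float inf = std::numeric_limits<float>::infinity();
  if (p == inf) {
    // max |x - y|: a selection, so no rounding error accumulates.
    float result = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
      result = std::max(result, std::fabs(x[i] - y[i]));
    }
    return result;
  }
  if (p == -inf) {
    // min |x - y|: also a selection.
    float result = inf;
    for (std::size_t i = 0; i < x.size(); ++i) {
      result = std::min(result, std::fabs(x[i] - y[i]));
    }
    return result;
  }
  // Finite p: sum |x - y|^p in a float accumulator, then take the 1/p power.
  float acc = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    acc += std::pow(std::fabs(x[i] - y[i]), p);
  }
  return std::pow(acc, 1.0f / p);
}
```

The max and min branches only select one of the input differences, which is why they can stay in the narrower type, whereas the finite-p branch accumulates a sum and benefits from the wider accumulator.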
@zhangting2020 I have made the changes. Is the current approach acceptable?
f888d7c to 4922ad6
…dle#50915)" This reverts commit 9c40653.
PR types
New features
PR changes
OPs
Description
Task: #50658 (comment)
Chinese documentation: PaddlePaddle/docs#5740
OP Performance:
[used AI Studio]