Add is_mean param for mean op #40757
Conversation
Your PR was submitted successfully. Thank you for contributing to this open-source project!
@@ -65,9 +65,10 @@ class MeanCUDAKernel : public framework::OpKernel<T> {
     for (decltype(rank) i = 0; i < rank; ++i) {
       reduce_dims.push_back(i);
     }
     TensorReduceImpl<T, T, kernel_primitives::AddFunctor, Div>(
         context.cuda_device_context(), *input, output, Div(numel), reduce_dims,
Can the `Div` defined earlier be removed now?
It has already been removed; the call now uses `IdentityFunctor`.
@@ -657,6 +658,9 @@ __global__ void ReduceAnyKernel(const Tx* x,
   // the last dim gets involved in reduction
   int store_offset = 0;
   int stride_left = 0;
+  auto Final =
+      is_mean ? kps::DivideFunctor<MPType> : kps::IdentityFunctor<MPType>;
+  auto final_opt = Final(reduce_num);
`opt` is usually short for "optimize", isn't it? If it is meant to stand for output here, I suggest using `out` or `output` directly.
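It is not "out"; it stands for "operate". It either performs the division or returns the value unchanged, and it is applied only to the data at the final store.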
@@ -657,6 +658,9 @@ __global__ void ReduceAnyKernel(const Tx* x,
   // the last dim gets involved in reduction
   int store_offset = 0;
   int stride_left = 0;
+  auto Final =
+      is_mean ? kps::DivideFunctor<MPType> : kps::IdentityFunctor<MPType>;
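For the case where `is_mean` is false, I don't think the value needs to go through `IdentityFunctor` again; just dividing `reduce_var` once at the end should be enough.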
Done, this has been changed.
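The suggestion in this thread can be sketched in plain host-side C++ roughly as follows. This is only an illustration under assumptions: `ReduceSum` and its parameters are hypothetical names, not Paddle's kernel code, and a serial loop stands in for the CUDA reduction. It shows the sum being accumulated with no per-element functor, and a single division applied just before the result is stored when `is_mean` is true.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the final step of the reduction (not Paddle code).
// Accumulates a plain sum, then optionally divides once before returning.
template <typename T>
T ReduceSum(const std::vector<T>& x, bool is_mean) {
  T reduce_var = T(0);
  for (const T& v : x) {
    reduce_var += v;  // no Identity/Divide functor is applied per element
  }
  if (is_mean && !x.empty()) {
    // Single division, applied only to the value that is about to be stored.
    reduce_var = reduce_var / static_cast<T>(x.size());
  }
  return reduce_var;
}
```

With this shape a runtime `bool` is enough to select the behavior, so no functor type has to be chosen per call.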
@@ -33,12 +33,12 @@ void TensorReduceImpl(const platform::CUDADeviceContext& dev_ctx,
                       const framework::Tensor& x, framework::Tensor* y,
                       const TransformOp& transform,
                       const std::vector<int>& origin_reduce_dims,
-                      gpuStream_t stream) {
+                      gpuStream_t stream, bool is_mean = false) {
Should `is_mean` be declared `const`?
OK, I will change that in the next PR.
LGTM
PR types
Others
PR changes
OPs
Describe
Add is_mean param for mean op
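1. Add an `is_mean` parameter for the mean op, so that the division is performed only after all of the data has been summed.
2. Change the grid configuration of reduceHigher: when grid.z > 65536, set reduce_type to reduceAny.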
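Item 2 amounts to a fallback rule, which could be sketched as below. The enum, the function name, and the constant are hypothetical and exist only for this illustration; the threshold is copied from the description above rather than taken from Paddle's source.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not Paddle's actual enum or helper.
enum class ReduceType { kReduceHigherDim, kReduceAny };

// Threshold as stated in the PR description ("grid.z > 65536").
constexpr std::int64_t kGridZLimit = 65536;

// Fall back to the generic reduction when the z-dimension of the launch
// grid would exceed the threshold; otherwise keep the higher-dim path.
inline ReduceType ChooseReduceType(std::int64_t grid_z) {
  return grid_z > kGridZLimit ? ReduceType::kReduceAny
                              : ReduceType::kReduceHigherDim;
}
```

The CUDA programming guide lists 65535 as the maximum z-dimension of a grid, so the exact constant used in the comparison is worth checking against the kernel launch code.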
Background for the change: 1. under fp16, the model produced NaN; 2. the model case [6600,:,:] was computed incorrectly with axis = 1.
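To illustrate item 1, here is a minimal host-side sketch of a mean that sums first, in a wider accumulator, and divides once at the end. The name `MeanSumThenDivide` and the `Acc` template parameter are hypothetical. `float` and `double` stand in for fp16 and a wider accumulation type (assumed here to be the role of `MPType`), because standard C++ has no half type. It does not attempt to reproduce the NaN reported above.

```cpp
#include <vector>

// Hypothetical helper (not Paddle code): accumulate in the wider type Acc,
// then divide once after the full sum and convert back to T.
template <typename Acc, typename T>
T MeanSumThenDivide(const std::vector<T>& x) {
  if (x.empty()) {
    return T(0);  // assumption of this sketch: the mean of no elements is 0
  }
  Acc sum = Acc(0);
  for (const T& v : x) {
    sum += static_cast<Acc>(v);  // every addition happens in the wider type
  }
  return static_cast<T>(sum / static_cast<Acc>(x.size()));
}
```

For example, `MeanSumThenDivide<double>(std::vector<float>{16777216.0f, 1.0f, 1.0f, 1.0f, 1.0f})` keeps the four small addends that a `float` accumulator would round away next to 2^24.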