Optimize nchw MaxPooling #7426
Conversation
@@ -289,20 +294,39 @@ class MaxPool2dKernel final : public user_op::OpKernel {
  const MaxPoolingParams3D& params_3d = pooling_cache->GetParams3D();

  const int64_t elem_num = y->shape().elem_cnt();
  // const int32_t elem_num = y->shape().elem_cnt();
Is this commented-out line still needed?
Removed.
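For context on why an int32_t variant was tried here and then dropped: 32-bit index arithmetic is cheaper on the GPU, but it overflows once a tensor holds more than 2^31 - 1 elements, which matches the commit-list churn below (`use int32_t indice`, then `revert back to use int64_t`). A minimal, hypothetical grid-stride sketch of the tradeoff, not the PR's actual kernel:

```cuda
#include <cstdint>

// IDX = int32_t saves registers and integer ALU work in the hot loop, but is
// only safe while elem_num < 2^31; IDX = int64_t always works, which is why
// the PR ultimately reverted to int64_t.
template<typename IDX>
__global__ void ElemwiseSketch(IDX elem_num, const float* x, float* y) {
  for (IDX i = static_cast<IDX>(blockIdx.x) * blockDim.x + threadIdx.x;
       i < elem_num; i += static_cast<IDX>(gridDim.x) * blockDim.x) {
    y[i] = x[i];  // stand-in for the real pooling-window reduction
  }
}
```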
@@ -50,6 +54,12 @@ struct DeviceAdd {
  };
};

#ifdef WITH_CUDA

OF_DEVICE_FUNC int32_t device_min(int32_t a, int32_t b) { return a <= b ? a : b; }
Is this function used anywhere in this PR?
device_min doesn't seem to be used?
Removed.
Speed stats:
Commit history:

* first debug
* fix maxpool
* fix bug
* remove redundant code
* remove redundant read
* remove redundant data_ptr offset
* use int32 to describe x shape
* Fix cuda input params for maxpool2d
* just for debug
* just for profile
* reduce div
* use int32_t indice
* revert back to use int64_t
* fix maxpool1d 3d
* optimize backward
* fix all optimize. TODO: NHWC
* fix comment

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
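The `reduce div` item above presumably refers to trimming integer divisions in the per-element NCHW index decode: integer division is expensive on GPUs, and a common trick is to reuse each quotient so every dimension costs one division plus a multiply-subtract instead of a separate div/mod pair. A hypothetical sketch of that pattern (helper name and signature are illustrative, not OneFlow's):

```cuda
#include <cstdint>

// Decode a linear NCHW index into (n, c, h, w) with one division per
// dimension; the modulus is recovered from the quotient with a
// multiply-subtract rather than a second division.
__device__ void DecodeNchw(int64_t idx, int64_t c, int64_t h, int64_t w,
                           int64_t* n_out, int64_t* c_out, int64_t* h_out,
                           int64_t* w_out) {
  int64_t tmp = idx / w;
  *w_out = idx - tmp * w;
  idx = tmp;
  tmp = idx / h;
  *h_out = idx - tmp * h;
  idx = tmp;
  tmp = idx / c;
  *c_out = idx - tmp * c;
  *n_out = tmp;
}
```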
Test platform: A100, CUDA 11.4.
For the backward pass, torch uses its own set of reduce-based kernels in the 1d/2d cases and atomic_add in the 3d case. We use atomic_add in all cases, which is why the gap is small in the 3d case.
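As a rough illustration of the atomic_add backward strategy described above (names are illustrative, not OneFlow's actual API; `indice` is assumed to hold the argmax positions recorded during the forward pass):

```cuda
#include <cstdint>

// Scatter each output gradient to the input position that won the max in the
// forward pass. Several output windows can share one input argmax, so the
// accumulation must be atomic; this is the path torch also takes for 3d.
__global__ void MaxPoolBackwardAtomic(int64_t elem_num, const float* dy,
                                      const int64_t* indice, float* dx) {
  for (int64_t i = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       i < elem_num; i += static_cast<int64_t>(gridDim.x) * blockDim.x) {
    atomicAdd(dx + indice[i], dy[i]);
  }
}
```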
TODO: NHWC optimization.