support leaky_relu half kernel #7569

Merged
merged 25 commits into from
Feb 26, 2022


@BBuf (Contributor) commented Feb 22, 2022

As the title says.


Resolved (now outdated) review threads on oneflow/user/kernels/activation_kernels.cu (4 threads) and oneflow/user/kernels/activation_kernels.h (1 thread).
@MARD1NO (Contributor) commented Feb 23, 2022

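TODO: If we ever need to push this to leaderboard-chasing performance, split the gradient computation into two paths: when alpha is positive, compute the gradient from the output; when alpha is negative, compute it from the input.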
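A minimal host-side sketch of the two gradient paths this TODO describes. All function names here are hypothetical, not OneFlow APIs. Branching on the output is only safe when alpha is non-negative, because then y > 0 holds exactly when x > 0. With a negative alpha, a negative x produces a positive y = alpha * x, so the sign of y no longer tells you which branch x took and the input must be kept.

```cpp
#include <cassert>

// Forward pass: y = x for x > 0, otherwise y = alpha * x.
template <typename T>
T LeakyRelu(T x, T alpha) {
  return x > T(0) ? x : x * alpha;
}

// Path A (hypothetical name): gradient from the saved input x.
// Correct for any sign of alpha.
template <typename T>
T LeakyReluGradFromInput(T x, T dy, T alpha) {
  return x > T(0) ? dy : dy * alpha;
}

// Path B (hypothetical name): gradient from the saved output y.
// For alpha >= 0, y > 0 holds exactly when x > 0, so testing y
// selects the same branch as testing x.
template <typename T>
T LeakyReluGradFromOutput(T y, T dy, T alpha) {
  assert(alpha >= T(0) && "output-based gradient needs a non-negative alpha");
  return y > T(0) ? dy : dy * alpha;
}

// Hypothetical per-op decision: may backward keep y instead of x?
inline bool GradCanUseOutput(float alpha) { return alpha >= 0.0f; }
```

For example, with alpha = -0.5 and x = -2, the forward output is y = 1 (positive), while the correct gradient factor is alpha. An output-based test would wrongly pick the pass-through branch.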

@BBuf (Contributor, Author) commented Feb 23, 2022


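@olojuwin 朱望, you may want to keep an eye on this. By special-casing alpha, leaky_relu can save GPU memory and thus support a larger number of classes. We can do this optimization later if it becomes necessary.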
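A hypothetical sketch of where the memory saving could come from. When alpha >= 0, the backward pass needs only y and dy. The forward pass can then overwrite its input buffer with y, so one activation-sized buffer stays alive instead of two. This assumes no other op still needs x. Plain `std::vector` loops stand in for the CUDA kernels, and none of these names are OneFlow APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward pass written in place: the input buffer is overwritten with y.
// Only safe when nothing else still reads x (assumption of this sketch).
template <typename T>
void LeakyReluForwardInplace(std::vector<T>& buf, T alpha) {
  for (T& v : buf) v = v > T(0) ? v : v * alpha;
}

// Backward pass that reads only y and dy, so x never has to be kept alive.
// Requires alpha >= 0, where y > 0 exactly when x > 0.
template <typename T>
std::vector<T> LeakyReluBackwardFromOutput(const std::vector<T>& y,
                                           const std::vector<T>& dy, T alpha) {
  assert(alpha >= T(0));
  assert(y.size() == dy.size());
  std::vector<T> dx(y.size());
  for (std::size_t i = 0; i < y.size(); ++i) {
    dx[i] = y[i] > T(0) ? dy[i] : dy[i] * alpha;
  }
  return dx;
}
```

With a negative alpha, this in-place scheme is not available: x must be preserved for the backward pass, and the memory for both x and y stays allocated.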

@github-actions (Contributor)

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@BBuf requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 23, 2022 03:02
@github-actions (Contributor)

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 24, 2022 10:43
@oneflow-ci-bot self-requested a review February 24, 2022 14:13
@oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 24, 2022 15:34
@github-actions (Contributor)

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.8ms (= 12877.0ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 139.1ms (= 13905.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.08 (= 139.1ms / 128.8ms)

✔️ OneFlow resnet50 time: 75.9ms (= 7593.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.6ms (= 8356.6ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.10 (= 83.6ms / 75.9ms)

OneFlow resnet50 time: 52.4ms (= 10475.4ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.4ms (= 11081.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.06 (= 55.4ms / 52.4ms)

OneFlow resnet50 time: 40.9ms (= 8177.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 48.6ms (= 9725.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.19 (= 48.6ms / 40.9ms)

OneFlow resnet50 time: 39.7ms (= 7936.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.4ms (= 7886.5ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 0.99 (= 39.4ms / 39.7ms)

✔️ OneFlow resnet50 time: 141.7ms (= 14171.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.8ms (= 16382.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 163.8ms / 141.7ms)

OneFlow resnet50 time: 90.2ms (= 9016.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.5ms (= 10253.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.14 (= 102.5ms / 90.2ms)

OneFlow resnet50 time: 60.9ms (= 12179.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.8ms (= 14968.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 74.8ms / 60.9ms)

OneFlow resnet50 time: 51.8ms (= 10353.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 62.7ms (= 12545.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 62.7ms / 51.8ms)

OneFlow resnet50 time: 49.6ms (= 9911.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.1ms (= 13623.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 68.1ms / 49.6ms)
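The report's figures follow from the formulas it prints. Per-iteration time is total time divided by the iteration count. Relative speed is PyTorch time divided by OneFlow time, so values above 1 mean OneFlow was faster. A small sketch of that arithmetic follows. The helper names are illustrative, and like the log's own "(= 139.1ms / 128.8ms)" annotations, it divides the printed per-iteration times.

```cpp
#include <cassert>
#include <cmath>

// Per-iteration time: total wall time divided by the number of iterations.
inline double PerIterationMs(double total_ms, int iterations) {
  assert(iterations > 0);
  return total_ms / iterations;
}

// Relative speed = PyTorch time / OneFlow time; above 1 means OneFlow is faster.
inline double RelativeSpeed(double pytorch_ms, double oneflow_ms) {
  return pytorch_ms / oneflow_ms;
}

// Round to the two decimals the report prints.
inline double Round2(double v) { return std::round(v * 100.0) / 100.0; }
```

For the batch-size-1 single-GPU row, this gives 39.4 / 39.7 ≈ 0.99. In that configuration, OneFlow was marginally slower in this run.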

@github-actions (Contributor)

CI failed when running job: cpu-module. PR label automerge has been removed

@oneflow-ci-bot removed their request for review February 24, 2022 18:36
@github-actions (Contributor)

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@BBuf requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 26, 2022 02:39
@oneflow-ci-bot removed their request for review February 26, 2022 04:35
@BBuf requested a review from oneflow-ci-bot February 26, 2022 04:39
@oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 26, 2022 05:14
@github-actions (Contributor)

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.7ms (= 12865.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.9ms (= 14188.2ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.9ms / 128.7ms)

✔️ OneFlow resnet50 time: 78.4ms (= 7842.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.3ms (= 8531.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.09 (= 85.3ms / 78.4ms)

OneFlow resnet50 time: 53.1ms (= 10618.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 60.7ms (= 12146.0ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.14 (= 60.7ms / 53.1ms)

OneFlow resnet50 time: 43.5ms (= 8699.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.4ms (= 9475.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.09 (= 47.4ms / 43.5ms)

OneFlow resnet50 time: 39.0ms (= 7801.5ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.5ms (= 8298.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.06 (= 41.5ms / 39.0ms)

✔️ OneFlow resnet50 time: 142.8ms (= 14277.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.5ms (= 16151.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 161.5ms / 142.8ms)

OneFlow resnet50 time: 88.3ms (= 8834.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.2ms (= 10423.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 104.2ms / 88.3ms)

OneFlow resnet50 time: 61.1ms (= 12225.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.7ms (= 15131.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 75.7ms / 61.1ms)

OneFlow resnet50 time: 51.8ms (= 10363.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.8ms (= 12965.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 64.8ms / 51.8ms)

OneFlow resnet50 time: 47.6ms (= 9529.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 60.5ms (= 12093.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 60.5ms / 47.6ms)

@oneflow-ci-bot merged commit ec7cbd3 into master Feb 26, 2022
@oneflow-ci-bot deleted the add_half_leaky_relu branch February 26, 2022 08:42
Review thread on this code excerpt:
  OF_DEVICE_FUNC explicit LeakyReluGradFunctor(float alpha) : alpha(alpha) {}
  OF_DEVICE_FUNC T operator()(T x, T dy) const {
    if (alpha > 0) {
      return dy > 0 ? dy : dy * alpha;
Collaborator:
Did you mean to write y > 0 ? dy : dy * alpha here?

Contributor (Author):
Yes. Strange, how did this even pass the unit tests... I'll fix it.

Contributor:
It should be x > 0 ? dy : dy * alpha.

Contributor (Author):
Here, y > 0 and x > 0 are equivalent because of the alpha > 0 special case. That special case is unnecessary, though, so I'll remove it.
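A host-side sketch of the gradient functor after the fix discussed in this thread. It branches on x, not dy, and drops the alpha > 0 special case. OF_DEVICE_FUNC is stubbed out here as an assumption so the sketch compiles as plain C++. Instantiating it with half in a .cu file would also depend on the half comparison and arithmetic operators from cuda_fp16.h, which this sketch does not exercise. As a worked example, take x = -2, dy = 4, and alpha = 0.25. Branching on dy returns 4, but the correct gradient is dy * alpha = 1.

```cpp
#include <cassert>

// OneFlow's host/device qualifier macro, stubbed out for this host-only sketch.
#ifndef OF_DEVICE_FUNC
#define OF_DEVICE_FUNC
#endif

template <typename T>
struct LeakyReluGradFunctor {
  OF_DEVICE_FUNC explicit LeakyReluGradFunctor(float alpha) : alpha(alpha) {}
  // dx = dy where x > 0, and dy * alpha elsewhere. Branching on the input x
  // (not on dy) is correct for any alpha, so no alpha > 0 special case is needed.
  OF_DEVICE_FUNC T operator()(T x, T dy) const {
    return x > static_cast<T>(0) ? dy : dy * static_cast<T>(alpha);
  }
  const float alpha;
};
```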

marigoold pushed a commit that referenced this pull request Mar 15, 2022
* support leaky_relu half kernel

* add half leaky relu

* fix comment

* add half leaky relu impl

* fix comment

* fix comment

* revert

* auto format by CI

* fix error

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

6 participants