
relu forward and backward with vectortype #31869

Merged — 4 commits merged into PaddlePaddle:develop on Mar 29, 2021

Conversation

AnnaTrainingG
Contributor

@AnnaTrainingG AnnaTrainingG commented Mar 25, 2021

PR types

Performance optimization

PR changes

OPs

Describe

Add relu forward and backward kernels using vectorized data types.

forward:

| dtype | test case | paddle (before) | pytorch | paddle (after) | diff | speedup | (pytorch - paddle)/paddle |
|---|---|---|---|---|---|---|---|
| float32 | [16,128,257,257] | 1.4057ms | 1.3111ms | 1.3116ms | 0.0 | 6.69% | -0.04% |
| float32 | [10,128,257,257] | 870.12us | 820.15us | 820.39us | 0.0 | 5.72% | -0.03% |
| float32 | [16,2,257,257] | 22.287us | 22.150us | 21.981us | 0.0 | 1.37% | 0.77% |
| float32 | [128,4,257,257] | 338.39us | 329.02us | 329.04us | 0.0 | 2.76% | -0.01% |
| float32 | [169,333,1,1] | 1.6760us | 1.6250us | 1.5720us | 0.0 | 6.21% | 3.35% |
| float32 | [1,160800,1,10] | 17.249us | 17.445us | 17.169us | 0.0 | 0.46% | 1.61% |
| float16 | [16,128,257,257] | 886.95us | 748.33us | 685.61us | 0.0 | 22.70% | 9.15% |
| float16 | [10,128,257,257] | 528.93us | 468.54us | 428.94us | 0.0 | 18.90% | 9.23% |
| float16 | [16,2,257,257] | 13.892us | 13.548us | 12.191us | 0.0 | 12.24% | 11.13% |
| float16 | [128,4,257,257] | 202.77us | 188.55us | 172.61us | 0.0 | 14.87% | 9.23% |
| float16 | [169,333,1,1] | 1.5700us | 1.6420us | 1.2870us | 0.0 | 18.03% | 27.58% |
| float16 | [1,160800,1,10] | 9.6960us | 10.581us | 9.2800us | 0.0 | 4.29% | 14.02% |

backward:

| case | pytorch fp32 | fp32 old | fp32 new | (old-new)/new | (pytorch-new)/new | pytorch fp16 | fp16 old | fp16 new | (old-new)/new | (pytorch-new)/new |
|---|---|---|---|---|---|---|---|---|---|---|
| [333,127,157,1] | 96.034us | 98.676us | 95.654us | 3.15% | -0.40% | 47.038us | 56.100us | 50.317us | 11.49% | -6.52% |
| [191,2048,127,1] | 701.80us | 738.34us | 701.9us | 5.19% | 0.01% | 315.62us | 410.48us | 361.79us | 13.46% | -12.76% |
| [16,128,157,157] | 713.08us | 751.29us | 712.96us | 5.38% | -0.02% | 320.31us | 418.49us | 368.1us | 13.69% | -12.98% |
| [13,101,15,179] | 51.987us | 53.779us | 51.834us | 3.75% | -0.29% | 26.394us | 31.368us | 27.699us | 13.25% | -4.71% |

A bug in the previous relu code caused a performance regression in the CycleGAN model, so a CycleGAN model test is added here.
CycleGAN performance comparison before and after this commit:

activationVec (this commit):
epoch0: batch410:
                         d_A_loss: 0.11804; g_A_loss: 0.66524; g_A_cyc_loss: 1.37414; g_A_idt_loss: 0.47866;
                         d_B_loss: 0.20311; g_B_loss: 0.61002; g_B_cyc_loss: 1.05193; g_B_idt_loss: 0.53779;
                         batch_cost: 0.10165 sec, reader_cost: 0.00006 sec, ips: 9.83790 images/sec
epoch0: batch420:
                         d_A_loss: 0.38354; g_A_loss: 0.35664; g_A_cyc_loss: 2.50070; g_A_idt_loss: 0.57669;
                         d_B_loss: 0.31999; g_B_loss: 0.48912; g_B_cyc_loss: 1.44046; g_B_idt_loss: 1.08799;
                         batch_cost: 0.10256 sec, reader_cost: 0.00005 sec, ips: 9.75005 images/sec
epoch0: batch430:
                         d_A_loss: 0.12727; g_A_loss: 0.49812; g_A_cyc_loss: 1.14830; g_A_idt_loss: 0.36196;
                         d_B_loss: 0.44708; g_B_loss: 0.44432; g_B_cyc_loss: 0.84907; g_B_idt_loss: 0.45836;
                         batch_cost: 0.10199 sec, reader_cost: 0.00004 sec, ips: 9.80521 images/sec
epoch0: batch440:
                         d_A_loss: 0.13990; g_A_loss: 0.73569; g_A_cyc_loss: 1.45807; g_A_idt_loss: 0.49949;
                         d_B_loss: 0.17014; g_B_loss: 0.43425; g_B_cyc_loss: 1.15211; g_B_idt_loss: 0.62724;
                         batch_cost: 0.10266 sec, reader_cost: 0.00005 sec, ips: 9.74083 images/sec
base:
epoch0: batch410:
                         d_A_loss: 0.11186; g_A_loss: 0.54835; g_A_cyc_loss: 1.53336; g_A_idt_loss: 0.31897;
                         d_B_loss: 0.14459; g_B_loss: 0.41567; g_B_cyc_loss: 0.74853; g_B_idt_loss: 0.71545;
                         batch_cost: 0.10242 sec, reader_cost: 0.00005 sec, ips: 9.76373 images/sec
epoch0: batch420:
                         d_A_loss: 0.22219; g_A_loss: 0.35373; g_A_cyc_loss: 1.89355; g_A_idt_loss: 0.35711;
                         d_B_loss: 0.20821; g_B_loss: 0.45151; g_B_cyc_loss: 1.07841; g_B_idt_loss: 0.83607;
                         batch_cost: 0.10169 sec, reader_cost: 0.00006 sec, ips: 9.83416 images/sec
epoch0: batch430:
                         d_A_loss: 0.33835; g_A_loss: 0.44051; g_A_cyc_loss: 1.86066; g_A_idt_loss: 0.50190;
                         d_B_loss: 0.30145; g_B_loss: 0.28956; g_B_cyc_loss: 1.22524; g_B_idt_loss: 0.77076;
                         batch_cost: 0.10191 sec, reader_cost: 0.00006 sec, ips: 9.81269 images/sec
epoch0: batch440:
                         d_A_loss: 0.20997; g_A_loss: 0.89039; g_A_cyc_loss: 1.08945; g_A_idt_loss: 0.48135;
                         d_B_loss: 0.18823; g_B_loss: 0.37901; g_B_cyc_loss: 1.13232; g_B_idt_loss: 0.46191;
                         batch_cost: 0.10303 sec, reader_cost: 0.00005 sec, ips: 9.70601 images/sec

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@CLAassistant

CLAassistant commented Mar 25, 2021

CLA assistant check
All committers have signed the CLA.

@zhangting2020
Contributor

Please post the log from your test machine from before the bug, as well as the current log.

Contributor

@qili93 qili93 left a comment


LGTM

Contributor

@Xreki Xreki left a comment


LGTM

paddle/fluid/operators/activation_op.cu (comment resolved)
@Xreki Xreki merged commit a71d72d into PaddlePaddle:develop Mar 29, 2021
5 participants