Recompute: fix bug with transformer attention mask #34664
Merged
PR types
Bug fixes
PR changes
OPs
Describe
NOTE: In a Transformer-like network, if the user puts the attention mask into the recompute segment's outputs, `PyLayer` will force the attention mask's `stop_gradient` to `False`, which makes the number of tensors that need gradients mismatch in the backward pass.
`backward_inputs_with_grad` is used to avoid this case, by collecting only the outputs that actually require gradients before invoking backward.
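Below is a minimal sketch of the pattern that triggers the bug, assuming the `paddle.distributed.fleet.utils.recompute` API; the toy layer and tensor names are hypothetical, not from this PR:

```python
import paddle
import paddle.nn as nn
from paddle.distributed.fleet.utils import recompute


class ToyAttentionBlock(nn.Layer):
    """Hypothetical block that returns the attention mask as an output."""

    def __init__(self, hidden):
        super().__init__()
        self.qkv = nn.Linear(hidden, hidden)

    def forward(self, x, attn_mask):
        out = self.qkv(x) + attn_mask
        # Returning attn_mask alongside the activation puts a
        # stop_gradient=True tensor into the recompute segment's outputs.
        return out, attn_mask


block = ToyAttentionBlock(16)
x = paddle.randn([2, 4, 16])
x.stop_gradient = False
mask = paddle.zeros([2, 4, 16])  # attention mask: no gradient needed
mask.stop_gradient = True

# Before this fix, PyLayer forced stop_gradient=False on every segment
# output, so the backward pass expected a gradient for the mask too and
# the count of tensors needing grad no longer matched.
out, mask_out = recompute(block, x, mask)
out.sum().backward()
```

With this fix, the backward of the recompute `PyLayer` collects only the outputs that require gradients (`backward_inputs_with_grad`) before calling backward, so a no-grad tensor such as the attention mask can pass through the segment's outputs without breaking the count.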