Fix LSTM and GRU layers gradient calculations #18203
Conversation
Hey @bgawrych, thanks for submitting the PR
CI supported jobs: [miscellaneous, sanity, clang, website, windows-cpu, centos-gpu, centos-cpu, unix-cpu, unix-gpu, edge, windows-gpu]
Thanks for your contribution. CC @pengzhao-intel @TaoLv @ciyongch @bgawrych Could you please rename the title to a more concise and specific one? When the PR is merged, the title will become the commit message. Thanks.
Of course, I overlooked this. Sorry :)
Thanks @bgawrych, yes, we need to cherry-pick this to 1.x.
@zixuanweeei @ciyongch please help take a review :)
LGTM
@bgawrych let me know when all internal cases and performance tests have passed, and then I will merge the PR
@@ -594,6 +595,10 @@ void LstmBackward(DType* ws,
        x, hx[idx], cx[idx], y, dy, dx, dhx[idx], dcx[idx],
        dhy_cur_ptr, dcy_cur_ptr, w_cur_ptr, dw_cur_ptr, db_cur_ptr,
        req_data, req_params, req_state, req_statecell);

    // Prevent overwriting dy while calculating dx in left2right layer
    const int loop_iteration = (L - 1) - i;
Can you add an additional case with an even number of layers to the UT (the current UT only covers 1 and 3 layers)?
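A sketch of the kind of extra coverage being asked for, outside the PR's actual test suite: sweep the layer count (even values included) of a bidirectional fused LSTM and compare one element of the analytic input gradient against a central-difference estimate. The sizes, epsilon, and use of gluon here are illustrative assumptions, and whether the CPU build dispatches to the fused non-DNNL kernel depends on how MXNet was built.

import numpy as np
from mxnet import autograd, gluon, nd

def analytic_grad(layer, x, idx=(0, 0, 0)):
    # Analytic d(sum(y))/dx at one input position, via autograd.
    x = x.copy()
    x.attach_grad()
    with autograd.record():
        y = layer(x).sum()
    y.backward()
    return x.grad.asnumpy()[idx]

def numeric_grad(layer, x, idx=(0, 0, 0), eps=1e-2):
    # Central-difference estimate of the same derivative.
    delta = np.zeros(x.shape, dtype='float32')
    delta[idx] = eps
    delta = nd.array(delta)
    return ((layer(x + delta).sum() - layer(x - delta).sum()) / (2 * eps)).asscalar()

x = nd.random.uniform(shape=(5, 2, 4))   # (seq_len, batch, input_size), layout TNC
for num_layers in [1, 2, 3, 4]:          # even layer counts included
    layer = gluon.rnn.LSTM(hidden_size=8, num_layers=num_layers, bidirectional=True)
    layer.initialize()
    print(num_layers, analytic_grad(layer, x), numeric_grad(layer, x))

Per the PR description, the bidirectional configurations with more than 2 layers are the ones whose input gradients were wrong before the fix; with the fix in place the two values should agree closely for every layer count.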
Force-pushed from 4fab561 to 34d8947
For bidirectional LSTM with more than 2 layers, the input gradient calculation was incorrect. The wrong results were caused by overwriting the y derivative (dy) tensor with the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculations. The proposed fix uses additional space to avoid the overwrite.
For GRU with more than 2 layers, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.
@mxnet-bot run ci [windows-gpu]
Jenkins CI successfully triggered: [windows-gpu]
@pengzhao-intel
@zixuanweeei @ciyongch @pengzhao-intel Please take another look at the new commits and also make sure your comments are addressed. Thanks!
Thanks @bgawrych for the contribution :)
LGTM
Good job, Bart, merging now.
Please also cherry-pick to 1.x branch.
@bgawrych, please cherry-pick to both v1.7.x and v1.x branch (as these two branches are diverged now), so that this patch will be included in the upcoming 1.7.0 release.
…pache#18203) * Fix input gradient calculation for bidirectional LSTM For bidirectional LSTM with more than 2 layers the input gradient calculation was incorrect. The wrong calculations were caused by overwriting the y derivative (dy) tensor with the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculations. The proposed fix uses additional space to avoid overwriting. * Fix gradient calculation for GRU For GRU with more than 2 layers the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer. * Enable tests for GRU and LSTM gradients * Fix comments * Change loop iteration deduction * Add more test cases for fused rnn layers
…8316) * [Large Tensor] Backport of Fixed RNN op (#17632) * Changed relevant function args to index_t * Added nightly test for RNN * Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh * Using const instead of literals * Added nightly test for RNN ReLU & tanh, LSTM, GRU * Type assertion to force evaluation of output NDArray * Incorporated latest round of comments * [v1.7.x] Backport of Fix LSTM and GRU layers gradient calculations (#18203) * Fix input gradient calculation for bidirectional LSTM For bidirectional LSTM with more than 2 layers the input gradient calculation was incorrect. The wrong calculations were caused by overwriting the y derivative (dy) tensor with the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculations. The proposed fix uses additional space to avoid overwriting. * Fix gradient calculation for GRU For GRU with more than 2 layers the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer. * Enable tests for GRU and LSTM gradients * Fix comments * Change loop iteration deduction * Add more test cases for fused rnn layers Co-authored-by: Connor Goggins <cgoggins0@gmail.com>
Description
Fixes: LSTM and GRU layers without DNNL enabled give wrong gradients (#17898)
For bidirectional LSTM with more than 2 layers, the input gradient calculation was incorrect.
The wrong results were caused by overwriting the y derivative (dy) tensor with the
calculated x derivative (dx) tensor before the right2left layer could use dy for its own
gradient calculations. The proposed fix uses additional space so that writing dx no longer overwrites dy.
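To picture this failure mode, here is a toy NumPy sketch (not the kernel code; the doubling stands in for the real dx computation): if dx is written into storage that still backs the dy the right2left direction has yet to read, that pass consumes corrupted values, whereas routing dx through additional space keeps dy intact.

import numpy as np

# "dy" stands for the incoming output gradient that both directions must read;
# the buggy path wrote the computed dx into storage that still backed dy.
dy = np.arange(6, dtype=np.float32)
dx = dy                          # dx aliases dy's storage
dx[:] = 2.0 * dx                 # left2right pass stores its dx ...
print(dy)                        # ... and right2left now reads clobbered values

# With additional space (the approach the fix takes), dy stays intact:
dy = np.arange(6, dtype=np.float32)
dx = np.empty_like(dy)           # separate workspace receives dx
dx[:] = 2.0 * dy
print(dy)                        # right2left still sees the original dy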
For GRU with more than 2 layers, the i2h_weight gradient for the
middle layers (all except the first and last) was incorrect.
The wrong calculations were caused by assigning the output pointer to the
input instead of calculating a new input pointer.
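A sketch of how that symptom could be probed from Python with gluon (again, not the PR's actual unit test): compare the analytic gradient of a middle layer's i2h_weight against a central-difference estimate. The 'l1_i2h_weight' name filter assumes gluon's usual parameter naming for fused RNN layers; sizes and epsilon are illustrative.

import numpy as np
from mxnet import autograd, gluon, nd

layer = gluon.rnn.GRU(hidden_size=8, num_layers=3, bidirectional=True)
layer.initialize()
x = nd.random.uniform(shape=(5, 2, 4))   # (seq_len, batch, input_size)

with autograd.record():
    y = layer(x).sum()
y.backward()

# Forward-direction i2h_weight of the middle layer (index 1 of 0..2); the
# 'l1_i2h_weight' suffix is an assumption about gluon's parameter naming.
name, param = next((n, p) for n, p in layer.collect_params().items()
                   if n.endswith('l1_i2h_weight'))
analytic = param.grad().asnumpy()[0, 0]

def loss_with_weight(w_np):
    # Re-evaluate the network after substituting the chosen weight matrix.
    param.set_data(nd.array(w_np))
    return layer(x).sum().asscalar()

eps = 1e-2
w = param.data().asnumpy()
w_plus, w_minus = w.copy(), w.copy()
w_plus[0, 0] += eps
w_minus[0, 0] -= eps
numeric = (loss_with_weight(w_plus) - loss_with_weight(w_minus)) / (2 * eps)
param.set_data(nd.array(w))              # restore the original weight
print(name, analytic, numeric)           # should agree once the fix is in place

Checking a middle-layer weight is the point here: per the description, the first and last layers were unaffected, so only the layers in between expose the wrong i2h_weight gradient.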