[1.x] Backport of LSTM and GRU fix (#17898) and RNN op (#17632) #18317
Conversation
Hey @bgawrych, thanks for submitting the PR.
CI supported jobs: [windows-cpu, miscellaneous, centos-gpu, unix-gpu, website, sanity, unix-cpu, centos-cpu, clang, edge, windows-gpu]
@mxnet-bot run ci [edge, unix-gpu]
Jenkins CI successfully triggered : [unix-gpu, edge]
@mxnet-bot run ci [edge]
Jenkins CI successfully triggered : [edge]
@mxnet-bot run ci [edge]
Jenkins CI successfully triggered : [edge]
@mxnet-bot run ci [edge]
Jenkins CI successfully triggered : [edge]
@mxnet-bot run ci [centos-cpu, centos-gpu, unix-cpu, unix-gpu]
Jenkins CI successfully triggered : [centos-cpu, centos-gpu, unix-gpu, unix-cpu]
Force-pushed from 82b9578 to 4ad92b1 (Compare)
@mxnet-bot run ci [centos-cpu, centos-gpu]
Jenkins CI successfully triggered : [centos-cpu, centos-gpu]
@mxnet-bot run ci [centos-cpu, centos-gpu]
Jenkins CI successfully triggered : [centos-cpu, centos-gpu]
* Changed relevant function args to index_t
* Added nightly test for RNN
* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh
* Using const instead of literals
* Added nightly test for RNN ReLU & tanh, LSTM, GRU
* Type assertion to force evaluation of output NDArray
* Incorporated latest round of comments
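For context, a rough idea of what a nightly large-tensor check for the fused RNN kernels could look like (a hedged sketch only, not the actual test added by this commit; the dimensions, helper name, and the `wait_to_read` call are illustrative assumptions, and running it requires an int64-tensor build and a large amount of memory):

```python
import numpy as np
import mxnet as mx
from mxnet import gluon

# Illustrative sizes: seq_len * batch * input_size is chosen so the element
# count exceeds the signed 32-bit range that the index_t changes address.
SEQ_LEN, BATCH, INPUT_SIZE, HIDDEN = 2**22, 2**7, 4, 16


def check_large_fused_lstm():
    net = gluon.rnn.LSTM(HIDDEN, num_layers=1, layout='TNC')
    net.initialize()
    data = mx.nd.random.uniform(shape=(SEQ_LEN, BATCH, INPUT_SIZE))
    out = net(data)
    out.wait_to_read()  # force evaluation of the lazily computed output
    assert out.shape == (SEQ_LEN, BATCH, HIDDEN)
    assert out.dtype == np.float32
```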
[v1.x] Backport of Fix LSTM and GRU layers gradient calculations (apache#18203)
* Fix input gradient calculation for bidirectional LSTM
For a bidirectional LSTM with more than two layers, the input gradient calculation was incorrect. The wrong results were caused by overwriting the y derivative (dy) tensor with the computed x derivative (dx) tensor before the right-to-left layer could use dy for its own gradient calculation. The proposed fix uses additional space to avoid the overwrite.
* Fix gradient calculation for GRU
For a GRU with more than two layers, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of computing a new input pointer.
* Enable tests for GRU and LSTM gradients
* Fix comments
* Change loop iteration deduction
* Add more test cases for fused rnn layers
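To make the bidirectional LSTM part of that description concrete, here is a small, self-contained repro sketch in Python (the sizes and tolerance are assumptions of mine, and the comparison uses a finite-difference estimate rather than the reference implementation used by the project's tests):

```python
import numpy as np
import mxnet as mx
from mxnet import autograd, gluon

# Bidirectional LSTM with more than two layers: the configuration whose input
# gradient (dx) was corrupted in the non-DNNL fused kernel before this fix.
net = gluon.rnn.LSTM(8, num_layers=3, bidirectional=True, layout='TNC')
net.initialize(mx.init.Xavier())
x = mx.nd.random.uniform(shape=(5, 2, 4))

x.attach_grad()
with autograd.record():
    loss = net(x).sum()
loss.backward()
analytic = x.grad.asnumpy()[0, 0, 0]

# Central finite-difference estimate of the same input-gradient element.
eps = 1e-3
xnp = x.asnumpy()
plus, minus = xnp.copy(), xnp.copy()
plus[0, 0, 0] += eps
minus[0, 0, 0] -= eps
loss_plus = net(mx.nd.array(plus)).sum().asscalar()
loss_minus = net(mx.nd.array(minus)).sum().asscalar()
numeric = (loss_plus - loss_minus) / (2 * eps)

assert abs(analytic - numeric) < 1e-2  # loose tolerance for float32
```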
@mxnet-bot run ci [all]
Jenkins CI successfully triggered : [centos-gpu, clang, miscellaneous, sanity, unix-cpu, unix-gpu, windows-cpu, centos-cpu, edge, website, windows-gpu]
@ciyongch Done, but there is only one check now. I tried to retrigger all the jobs, but it had no effect.
@bgawrych, it's good, all the tests passed now :)
@mxnet-bot run ci [unix-gpu]
Jenkins CI successfully triggered : [unix-gpu]
I think it's ready too.
LGTM
[1.x] Backport of LSTM and GRU fix (apache#17898) and RNN op (apache#17632) (apache#18317)
* [v1.x] [Large Tensor] Backport of Fixed RNN op (apache#17632)
  * Changed relevant function args to index_t
  * Added nightly test for RNN
  * Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh
  * Using const instead of literals
  * Added nightly test for RNN ReLU & tanh, LSTM, GRU
  * Type assertion to force evaluation of output NDArray
  * Incorporated latest round of comments
* [v1.x] Backport of Fix LSTM and GRU layers gradient calculations (apache#18203)
  * Fix input gradient calculation for bidirectional LSTM: for a bidirectional LSTM with more than two layers, the input gradient calculation was incorrect. The wrong results were caused by overwriting the y derivative (dy) tensor with the computed x derivative (dx) tensor before the right-to-left layer could use dy for its own gradient calculation. The proposed fix uses additional space to avoid the overwrite.
  * Fix gradient calculation for GRU: for a GRU with more than two layers, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of computing a new input pointer.
  * Enable tests for GRU and LSTM gradients
  * Fix comments
  * Change loop iteration deduction
  * Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <cgoggins0@gmail.com>
Description
Fix for "LSTM and GRU layers without DNNL enabled give wrong gradients" (#17898); a minimal repro sketch is included below.
Backport of "[Large Tensor] Fixed RNN op" (#17632)
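As an illustration of the first item, a miscomputed middle-layer GRU `i2h_weight` gradient can be spotted with a simple finite-difference check like the one below (a hedged sketch only: the parameter-name suffix, sizes, and tolerance are assumptions, not code from this PR's tests):

```python
import numpy as np
import mxnet as mx
from mxnet import autograd, gluon

# Three-layer GRU: before the fix, the i2h_weight gradient of the middle
# layer (l1) was wrong in the non-DNNL fused implementation.
net = gluon.rnn.GRU(8, num_layers=3, layout='TNC')
net.initialize(mx.init.Xavier())
x = mx.nd.random.uniform(shape=(4, 2, 6))

# Pick the middle layer's input-to-hidden weight (assumed name suffix).
param = [p for name, p in net.collect_params().items()
         if name.endswith('l1_i2h_weight')][0]

with autograd.record():
    loss = net(x).sum()
loss.backward()
analytic = param.grad().asnumpy()[0, 0]

# Central finite-difference estimate for that single weight element.
eps = 1e-3
base = param.data().asnumpy()
w_plus, w_minus = base.copy(), base.copy()
w_plus[0, 0] += eps
w_minus[0, 0] -= eps
param.set_data(mx.nd.array(w_plus))
loss_plus = net(x).sum().asscalar()
param.set_data(mx.nd.array(w_minus))
loss_minus = net(x).sum().asscalar()
param.set_data(mx.nd.array(base))  # restore the original weights

numeric = (loss_plus - loss_minus) / (2 * eps)
assert abs(analytic - numeric) < 1e-2  # loose float32 tolerance
```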
Checklist
Essentials
Comments