[1.x] Backport of LSTM and GRU fix (#17898) and RNN op (#17632) #18317
Conversation
Hey @bgawrych, thanks for submitting the PR.
CI supported jobs: [windows-cpu, miscellaneous, centos-gpu, unix-gpu, website, sanity, unix-cpu, centos-cpu, clang, edge, windows-gpu]
@mxnet-bot run ci [edge, unix-gpu]
Jenkins CI successfully triggered : [unix-gpu, edge]
@mxnet-bot run ci [edge]
Jenkins CI successfully triggered : [edge]
@mxnet-bot run ci [edge]
Jenkins CI successfully triggered : [edge]
@mxnet-bot run ci [edge]
Jenkins CI successfully triggered : [edge]
@mxnet-bot run ci [centos-cpu, centos-gpu, unix-cpu, unix-gpu]
Jenkins CI successfully triggered : [centos-cpu, centos-gpu, unix-gpu, unix-cpu]
Force-pushed from 82b9578 to 4ad92b1 (Compare)
@mxnet-bot run ci [centos-cpu, centos-gpu]
Jenkins CI successfully triggered : [centos-cpu, centos-gpu]
@mxnet-bot run ci [centos-cpu, centos-gpu]
Jenkins CI successfully triggered : [centos-cpu, centos-gpu]
* Changed relevant function args to index_t
* Added nightly test for RNN
* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh
* Using const instead of literals
* Added nightly test for RNN ReLU & tanh, LSTM, GRU
* Type assertion to force evaluation of output NDArray
* Incorporated latest round of comments
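For context, a rough idea of what a nightly large-tensor check for the fused RNN kernels could look like (a hedged sketch only, not the actual test added by this commit; the dimensions, helper name, and the `wait_to_read` call are illustrative assumptions, and running it requires an int64-tensor build and a large amount of memory):

```python
import numpy as np
import mxnet as mx
from mxnet import gluon

# Illustrative sizes: seq_len * batch * input_size is chosen so the element
# count exceeds the signed 32-bit range that the index_t changes address.
SEQ_LEN, BATCH, INPUT_SIZE, HIDDEN = 2**22, 2**7, 4, 16


def check_large_fused_lstm():
    net = gluon.rnn.LSTM(HIDDEN, num_layers=1, layout='TNC')
    net.initialize()
    data = mx.nd.random.uniform(shape=(SEQ_LEN, BATCH, INPUT_SIZE))
    out = net(data)
    out.wait_to_read()  # force evaluation of the lazily computed output
    assert out.shape == (SEQ_LEN, BATCH, HIDDEN)
    assert out.dtype == np.float32
```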
[v1.x] Backport of Fix LSTM and GRU layers gradient calculations (apache#18203)
* Fix input gradient calculation for bidirectional LSTM
For a bidirectional LSTM with more than two layers, the input gradient calculation was incorrect. The wrong results were caused by overwriting the y derivative (dy) tensor with the computed x derivative (dx) tensor before the right-to-left layer could use dy for its own gradient calculation. The proposed fix uses additional space to avoid the overwrite.
* Fix gradient calculation for GRU
For a GRU with more than two layers, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of computing a new input pointer.
* Enable tests for GRU and LSTM gradients
* Fix comments
* Change loop iteration deduction
* Add more test cases for fused rnn layers
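To make the bidirectional LSTM part of that description concrete, here is a small, self-contained repro sketch in Python (the sizes and tolerance are assumptions of mine, and the comparison uses a finite-difference estimate rather than the reference implementation used by the project's tests):

```python
import numpy as np
import mxnet as mx
from mxnet import autograd, gluon

# Bidirectional LSTM with more than two layers: the configuration whose input
# gradient (dx) was corrupted in the non-DNNL fused kernel before this fix.
net = gluon.rnn.LSTM(8, num_layers=3, bidirectional=True, layout='TNC')
net.initialize(mx.init.Xavier())
x = mx.nd.random.uniform(shape=(5, 2, 4))

x.attach_grad()
with autograd.record():
    loss = net(x).sum()
loss.backward()
analytic = x.grad.asnumpy()[0, 0, 0]

# Central finite-difference estimate of the same input-gradient element.
eps = 1e-3
xnp = x.asnumpy()
plus, minus = xnp.copy(), xnp.copy()
plus[0, 0, 0] += eps
minus[0, 0, 0] -= eps
loss_plus = net(mx.nd.array(plus)).sum().asscalar()
loss_minus = net(mx.nd.array(minus)).sum().asscalar()
numeric = (loss_plus - loss_minus) / (2 * eps)

assert abs(analytic - numeric) < 1e-2  # loose tolerance for float32
```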
@mxnet-bot run ci [all]
Jenkins CI successfully triggered : [centos-gpu, clang, miscellaneous, sanity, unix-cpu, unix-gpu, windows-cpu, centos-cpu, edge, website, windows-gpu]
@ciyongch Done, but there is only one check now. I tried to retrigger all the jobs, but it had no effect.
@bgawrych, it's good, all the tests passed now :)
@mxnet-bot run ci [unix-gpu]
Jenkins CI successfully triggered : [unix-gpu]
I think it's ready too.
LGTM
[1.x] Backport of LSTM and GRU fix (apache#17898) and RNN op (apache#17632) (apache#18317)
* [v1.x] [Large Tensor] Backport of Fixed RNN op (apache#17632)
  * Changed relevant function args to index_t
  * Added nightly test for RNN
  * Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh
  * Using const instead of literals
  * Added nightly test for RNN ReLU & tanh, LSTM, GRU
  * Type assertion to force evaluation of output NDArray
  * Incorporated latest round of comments
* [v1.x] Backport of Fix LSTM and GRU layers gradient calculations (apache#18203)
  * Fix input gradient calculation for bidirectional LSTM: for a bidirectional LSTM with more than two layers, the input gradient calculation was incorrect. The wrong results were caused by overwriting the y derivative (dy) tensor with the computed x derivative (dx) tensor before the right-to-left layer could use dy for its own gradient calculation. The proposed fix uses additional space to avoid the overwrite.
  * Fix gradient calculation for GRU: for a GRU with more than two layers, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of computing a new input pointer.
  * Enable tests for GRU and LSTM gradients
  * Fix comments
  * Change loop iteration deduction
  * Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <cgoggins0@gmail.com>
Description
Fix for "LSTM and GRU layers without DNNL enabled give wrong gradients" (#17898); a minimal repro sketch is included below.
Backport of "[Large Tensor] Fixed RNN op" (#17632)
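As an illustration of the first item, a miscomputed middle-layer GRU `i2h_weight` gradient can be spotted with a simple finite-difference check like the one below (a hedged sketch only: the parameter-name suffix, sizes, and tolerance are assumptions, not code from this PR's tests):

```python
import numpy as np
import mxnet as mx
from mxnet import autograd, gluon

# Three-layer GRU: before the fix, the i2h_weight gradient of the middle
# layer (l1) was wrong in the non-DNNL fused implementation.
net = gluon.rnn.GRU(8, num_layers=3, layout='TNC')
net.initialize(mx.init.Xavier())
x = mx.nd.random.uniform(shape=(4, 2, 6))

# Pick the middle layer's input-to-hidden weight (assumed name suffix).
param = [p for name, p in net.collect_params().items()
         if name.endswith('l1_i2h_weight')][0]

with autograd.record():
    loss = net(x).sum()
loss.backward()
analytic = param.grad().asnumpy()[0, 0]

# Central finite-difference estimate for that single weight element.
eps = 1e-3
base = param.data().asnumpy()
w_plus, w_minus = base.copy(), base.copy()
w_plus[0, 0] += eps
w_minus[0, 0] -= eps
param.set_data(mx.nd.array(w_plus))
loss_plus = net(x).sum().asscalar()
param.set_data(mx.nd.array(w_minus))
loss_minus = net(x).sum().asscalar()
param.set_data(mx.nd.array(base))  # restore the original weights

numeric = (loss_plus - loss_minus) / (2 * eps)
assert abs(analytic - numeric) < 1e-2  # loose float32 tolerance
```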
Checklist
Essentials
Comments