Conversation
@mxnet-label-bot add [pr-awaiting-review]
20a9bb8 to 5892246
@@ -123,7 +123,7 @@ struct RNNParam : public dmlc::Parameter<RNNParam> {
};

inline int GetRnnParamSize(int num_layer,
int input_size,
index_t input_size,
Are you sure you don't have to change size1, size2, proj_size and param_size:
https://github.com/apache/incubator-mxnet/pull/17632/files#diff-6dfdca409e69cc495f286170fe1e553eR143
Good point, fixing.
size_t? Make sure the API signature doesn't change. If that's the case, then keep it index_t.
inline size_t GetRNNWorkspaceSize(int seq_length,
int batch_size,
inline size_t GetRNNWorkspaceSize(index_t seq_length,
index_t batch_size,
Can batch_size be negative? @apeforest what do you think?
const int input_size,
const index_t seq_length,
const index_t batch_size,
const index_t input_size,
Did you check that seq_length, batch_size, and input_size are index_t in the functions LstmForwardTraining, GruForwardTraining, and VanillaRNNForwardTraining? If so, can you let me know here; otherwise you may need to update them too.
Excellent point, updating now.
const int input_size,
const index_t seq_length,
const index_t batch_size,
const index_t input_size,
Did you check that seq_length, batch_size, and input_size are index_t in the functions LstmBackwardTraining, GruBackwardTraining, and VanillaRNNBackwardTraining? If so, can you let me know here; otherwise you may need to update them too.
Excellent point, updating now.
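A minimal sketch of the concern raised above, with purely illustrative helper names and values (not the actual MXNet kernels): if only the outer signature moves to index_t while a callee still takes int, the large dimension is silently narrowed at the call boundary, which is why the forward/backward training functions have to be updated together.

```cpp
#include <cassert>
#include <cstdint>

using index_t = int64_t;  // assumption: index_t behaves like a signed 64-bit integer here

// Hypothetical callee still taking int: a seq_length of 2^32 would be narrowed on the way in.
inline int64_t NumElementsNarrow(int seq_length, int batch_size, int input_size) {
  return static_cast<int64_t>(seq_length) * batch_size * input_size;
}

// Updated callee: the dimensions survive the call intact and the product stays 64-bit.
inline index_t NumElements(index_t seq_length, index_t batch_size, index_t input_size) {
  return seq_length * batch_size * input_size;
}

int main() {
  const index_t seq_length = static_cast<index_t>(1) << 32;  // hypothetical large first dimension
  assert(NumElements(seq_length, 4, 4) == (static_cast<index_t>(1) << 36));
  // Calling NumElementsNarrow(seq_length, 4, 4) would truncate seq_length before the math even runs.
  return 0;
}
```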
tests/nightly/test_large_array.py
Outdated
out = nd.RNN(data=data, parameters=parameters, state=state, mode=mode,
             state_size=state_size, num_layers=num_layers)

assert out.shape[0] == 268435456
Please use constants for constant values. It's good practice, and you may need to re-use them in future tests too.
b58a51a to 556dbff
@@ -140,14 +140,14 @@ inline int GetRnnParamSize(int num_layer,
size *= 3;
break;
}
int size1 = (input_size + state_size + 2) * size;  // first layer size
int size2 = (state_size * direction + state_size + 2) * size;  // other layers size
index_t size1 = (input_size + state_size + 2) * size;  // first layer size
Let's prefer size_t for sizes. Or do you think these values can be negative too?
Agreed - testing changes now.
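A rough illustration of the overflow being discussed, using made-up numbers and a simplified stand-in for the real GetRnnParamSize arithmetic: with a large input dimension, the (input_size + state_size + 2) * size product already exceeds what a 32-bit int can hold, even though each individual dimension still fits in int, so the accumulators need to be index_t or wider.

```cpp
#include <cstdint>
#include <iostream>
#include <limits>

using index_t = int64_t;  // assumption: signed 64-bit index type

int main() {
  const index_t input_size = 1000000000;     // hypothetical large input dimension (still < INT_MAX)
  const index_t state_size = 512;
  const index_t size       = state_size * 3; // simplified: 3 gates, as in the "size *= 3" branch above

  // First-layer parameter count, evaluated entirely in 64-bit arithmetic.
  const index_t size1 = (input_size + state_size + 2) * size;

  std::cout << "size1 = " << size1 << '\n';
  std::cout << "fits in 32-bit int? "
            << (size1 <= std::numeric_limits<int>::max() ? "yes" : "no") << '\n';
  // Prints "no": the same expression evaluated with int operands would wrap around.
  return 0;
}
```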
int seq_length,
int batch_size,
index_t seq_length,
index_t batch_size,
size_t? If it's not a breaking change.
const index_t seq_length,
const index_t batch_size,
const index_t input_size,
size_t? If it's not a breaking change.
const index_t seq_length,
const index_t batch_size,
const index_t input_size,
size_t? If it's not a breaking change.
const index_t seq_length,
const index_t batch_size,
const index_t input_size,
size_t? If it's not a breaking change.
I think we can keep the signed index_t, since all the functions being called use signed types and the omp loop requires a signed index as well.
Agree with keeping the signed index_t
💯
I agree @apeforest! I believe the omp loop’s required signed index was the root cause of the segfault when I made the size_t changes.
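A small sketch of the constraint being referenced, assuming a toolchain limited to OpenMP 2.x-style rules (e.g. MSVC), where the parallel-for loop counter must be a signed integer type: a signed index_t can be used directly as the loop variable, whereas size_t bounds would force casts or fail to compile/parallelize on such compilers. The function and buffer here are hypothetical, not MXNet code.

```cpp
#include <cstdint>
#include <vector>

using index_t = int64_t;  // assumption: signed 64-bit index type, as discussed in this thread

// Hypothetical element-wise kernel over a flattened (seq_length * batch_size * H) buffer.
void ScaleInPlace(std::vector<float>* buf, float alpha) {
  const index_t n = static_cast<index_t>(buf->size());
  // Signed loop counter: accepted by OpenMP 2.x-era compilers, and comparisons
  // against other signed shape variables need no sign-conversion casts.
  #pragma omp parallel for
  for (index_t i = 0; i < n; ++i) {
    (*buf)[i] *= alpha;
  }
}

int main() {
  std::vector<float> data(1024, 1.0f);
  ScaleInPlace(&data, 2.0f);
  return data[0] == 2.0f ? 0 : 1;
}
```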
@@ -127,9 +127,9 @@ void LstmForwardTraining(DType* ws,
bool state_outputs,
const int L,
const int D,
What are D and L? Can D*L be > 5B?
L is num_layers, and D is direction. I believe we agreed that we would not support > 2**32 layers, so L should be fine as an int. D can have two possible values, 0 or 1, to indicate whether to run the op with bidirectional recurrent layers. Consequently, D*L can never be > 5B.
fair to keep D and L as int then.
src/operator/rnn_impl.h
Outdated
@@ -146,15 +146,15 @@ void LstmForwardTraining(DType* ws,
Tensor<cpu, 3, DType> hx(hx_ptr, Shape3(total_layers, N, H));
Tensor<cpu, 3, DType> cx(cx_ptr, Shape3(total_layers, N, H));
const int b_size = 2 * H * 4;
size_t?
As discussed offline: H represents the LSTM state size, and we are not supporting LSTM states w/ dimension >= 2**32, so b_size should remain an int.
If I understand it correctly, this is also the reason that hidden_size is remaining an int type, right? If so, b_size here, representing the total size of the i2h/h2h biases of the four gates, still has some overflow risk.
@zixuanweeei thanks for your feedback. I’m happy to bump b_size up to index_t here if there are overflow concerns.
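To make the overflow risk concrete (a made-up value for H, and a simplified stand-in for the real code): 2 * H * 4 evaluated in int wraps once H reaches 2^28, even though H itself is still a valid int hidden size, so widening one operand to index_t is enough.

```cpp
#include <cstdint>
#include <iostream>

using index_t = int64_t;  // assumption: signed 64-bit index type

int main() {
  const int H = 300000000;  // hypothetical LSTM hidden/state size, still representable as int

  // Old pattern: every operand is int, so the product is computed in 32 bits and overflows.
  // int b_size = 2 * H * 4;

  // Widening one operand promotes the whole expression to 64-bit arithmetic.
  const index_t b_size = 2 * static_cast<index_t>(H) * 4;  // i2h + h2h biases for the 4 LSTM gates

  std::cout << "b_size = " << b_size << '\n';  // 2400000000, which does not fit in a 32-bit int
  return 0;
}
```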
Apart from the changes pointed out by Rohit, the rest LGTM.
@zixuanweeei Could you please take a look at the changes? It seems we need to coordinate with the changes in #17702.
Let's wait for @connorgoggins to update the type to …
@zixuanweeei thanks for your feedback! After testing the … With these considerations in mind, we are discussing the best way to move forward.
src/operator/rnn-inl.h
Outdated
@@ -123,7 +123,7 @@ struct RNNParam : public dmlc::Parameter<RNNParam> {
};

inline int GetRnnParamSize(int num_layer,
Should it return index_t type? It might overflow with a large input size. And the UT only covers cases with a large sequence length (the first dimension of the RNN input data). Would you mind running some tests with a large input size to see whether this function still works?
@connorgoggins Just curious about the reason for the segfault. I don't have much knowledge about that, but I guess it may have been caused by …
@zixuanweeei you're absolutely right: the segfault was generated on line 2032 of …
Thanks for trying out the …
556dbff to 999328a
@apeforest @zixuanweeei I believe my latest commit incorporates all of the changes we have discussed. The …
Thanks a lot for your contribution!
* Changed relevant function args to index_t
* Added nightly test for RNN
* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh
* Using const instead of literals
* Added nightly test for RNN ReLU & tanh, LSTM, GRU
* Type assertion to force evaluation of output NDArray
* Incorporated latest round of comments
…8316)

* [Large Tensor] Backport of Fixed RNN op (#17632)
  * Changed relevant function args to index_t
  * Added nightly test for RNN
  * Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh
  * Using const instead of literals
  * Added nightly test for RNN ReLU & tanh, LSTM, GRU
  * Type assertion to force evaluation of output NDArray
  * Incorporated latest round of comments
* [v1.7.x] Backport of Fix LSTM and GRU layers gradient calculations (#18203)
  * Fix input gradient calculation for bidirectional LSTM: for a bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The wrong calculations were caused by overwriting the y derivative (dy) tensor with the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculations. The proposed fix uses additional space to avoid the overwriting.
  * Fix gradient calculation for GRU: for a GRU with number of layers > 2, the i2h_weight gradient for the layers in the middle (all except the last and first) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.
  * Enable tests for GRU and LSTM gradients
  * Fix comments
  * Change loop iteration deduction
  * Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <cgoggins0@gmail.com>
* [v1.x] [Large Tensor] Backport of Fixed RNN op (#17632)
  * Changed relevant function args to index_t
  * Added nightly test for RNN
  * Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh
  * Using const instead of literals
  * Added nightly test for RNN ReLU & tanh, LSTM, GRU
  * Type assertion to force evaluation of output NDArray
  * Incorporated latest round of comments
* [v1.x] Backport of Fix LSTM and GRU layers gradient calculations (#18203)
  * Fix input gradient calculation for bidirectional LSTM: for a bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The wrong calculations were caused by overwriting the y derivative (dy) tensor with the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculations. The proposed fix uses additional space to avoid the overwriting.
  * Fix gradient calculation for GRU: for a GRU with number of layers > 2, the i2h_weight gradient for the layers in the middle (all except the last and first) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.
  * Enable tests for GRU and LSTM gradients
  * Fix comments
  * Change loop iteration deduction
  * Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <cgoggins0@gmail.com>
Description
The RNN op was previously breaking on large tensor (dimension >= 2^32) data. With the following input:

the following error was thrown:

To root cause this issue, I ran the previous command in a Python script with GDB, and found that the underlying problem was in several of the function definitions of `rnn-inl.h`. Several of the data variables (`input_size`, `batch_size`, and `seq_length`) used the `int` dtype when they should have been using `index_t` to properly handle long int dimensions. I switched these variables to `index_t` in the relevant function headers and, after rebuilding, the previous input command displayed the correct output:

However, this only accounted for running the RNN op in `relu` mode. The op currently supports 3 other modes: `rnn_tanh`, `lstm`, and `gru`. To ensure that the op worked with large tensor data in each of these three modes in addition to `relu` mode, I made extensive modifications to `rnn_impl.h`. My modifications involved changing the data type of many function parameters and local variables in the forward and backward functions for these three modes.

To ensure completeness and to prevent future breaking changes, I also added a nightly test for the RNN op with large tensor data in `tests/nightly/test_large_array.py`.

Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments
Tested on r5dn.24xl-ubuntu 16.04 and p2.16xl-ubuntu 16.04 with
Results
The key difference between CPU and GPU tests was the instance type (r5dn.24xl for CPU, p2.16xl for GPU). All relevant build flags remain the same, and both were tested using CPU context.
Single operator test - RNN ReLU op (GPU)
Single operator test - RNN ReLU op (CPU)
Single operator test - RNN tanh op (CPU)
Single operator test - RNN LSTM op (CPU)
Single operator test - RNN GRU op (CPU)
Full OpPerf test (CPU)
@apeforest @access2rohit @ChaiBapchya