Conversation
@szha @sbodenstein Please help review.
cc @stephenrawls since the work is related.
@szha @stephenrawls We are not 100% sure about the refactoring of the GPU part. Please help review; suggestions and contributions are highly appreciated. We can add the related folks to the collaborator list for the code refactoring.
@mxnet-label-bot add[pr-awaiting-review]
This will definitely impact my cuDNN var-length PR. It's been on the back burner for the last couple of weeks, but I've been meaning to push it through soon. If this gets merged first, I can just update my PR to work with this change. I'm not entirely familiar with the code base; can you let me know what making RNN a stateful operator means? I'm not sure whether this is something I need to worry about in my PR. Mostly it looks like this is just taking the cuDNN-specific header file and putting all that code in the generic rnn-inl.h header file?
@stephenrawls This is to change the RNN op from a legacy operator to a stateful operator. A stateful operator supports carrying arbitrary state from the forward computation to the backward computation, which makes it easy to pass state such as the cuDNN reserve space from the forward op to the backward op.
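(For context, the registration pattern looks roughly like the sketch below. This is a simplified illustration, not this PR's actual code: RNNOpState and its methods are hypothetical stand-ins, while FCreateOpState, FStatefulCompute, OpStatePtr, and get_state are the MXNet stateful-operator hooks. The state object is created once and the same pointer is handed to both passes, so anything stashed in it, such as the cuDNN reserve space, survives from forward to backward.)

// Hypothetical state object; anything stored in it persists across calls.
struct RNNOpState {
  Storage::Handle reserve_space_;  // written in Forward, read in Backward
  void Forward(const OpContext& ctx, const std::vector<TBlob>& inputs,
               const std::vector<OpReqType>& req,
               const std::vector<TBlob>& outputs);
  void Backward(const OpContext& ctx, const std::vector<TBlob>& inputs,
                const std::vector<OpReqType>& req,
                const std::vector<TBlob>& outputs);
};

NNVM_REGISTER_OP(RNN)
.set_attr<FCreateOpState>("FCreateOpState",
  [](const NodeAttrs& attrs, Context ctx,
     const std::vector<TShape>& in_shape, const std::vector<int>& in_type) {
    return OpStatePtr::Create<RNNOpState>();  // one state per op instance
  })
.set_attr<FStatefulCompute>("FStatefulCompute<gpu>",
  [](const OpStatePtr& state, const OpContext& ctx,
     const std::vector<TBlob>& inputs, const std::vector<OpReqType>& req,
     const std::vector<TBlob>& outputs) {
    // Same state instance on every call: forward stashes the reserve
    // space, backward reads it back.
    state.get_state<RNNOpState>().Forward(ctx, inputs, req, outputs);
  });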
remove print log and make gpu case has same error message as master when USE_CUDA=1 USE_CUDNN=0
sync to latest code
CI is blocked by #14507
Fixes for CI downloads (apache#14504)
src/operator/rnn-inl.h
Outdated
-      DType* cx_ptr = NULL;
-      DType* cy_ptr = NULL;
+      DType * cx_ptr = NULL;
+      DType * cy_ptr = NULL;
DType* cx_ptr looks better and makes the style more consistent.
fixed
Conducted one round of review. Some minor problems.
@lihaofd thanks for the fix. Could you rebase to the current master? The flaky tests might have been disabled on master already.
sync to latest master
sync to latest master
sync code
Had an offline discussion with @eric-haibin-lin regarding the specific point of using the storage handler to manage temporary space. Conclusions:
- This is not good practice, as an operator should not interact with the storage manager directly.
- A good alternative is to create an NDArray for the same purpose; operators can then rely on its RAII property for life-cycle management (sketched below).
However, we realize this bad pattern already exists in the current code, so we would not want to impose the work of correcting it on the PR requester. @lihaofd let me know if you prefer to correct this point (which would be really helpful!) or if you'd prefer to merge as is (which we can understand too).
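(For illustration only, a minimal sketch of that alternative, assuming the state object keeps the array as a member named temp_space_ and that workspace_byte_ and dev_id come from the surrounding code:)

// Hold an NDArray member instead of a raw Storage::Handle. When the
// state object is destroyed, the NDArray releases its memory (RAII),
// so no explicit Storage::Get()->Free() call is needed.
temp_space_ = NDArray(mshadow::Shape1(workspace_byte_),
                      Context::GPU(dev_id), false, mshadow::kUint8);
char* workspace = static_cast<char*>(temp_space_.data().dptr_);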
@szha I suggest we merge the current PR and keep enhancing it in the future. Thanks
Thanks for the contribution. This is a significant amount of work to refactor all of this code. Please check my comments. My intent is not to block this PR from merging, but I hope to get some clarifications. I have no objection to merging the PR if there is urgency or a dependency.
        3,
        dim_w));

// Query weight layout
What are these?
cuDNN provides an interface for querying the layout of the fused weight matrix. The current operator hard-codes the layout instead of relying on this interface, since there isn't much change in the layout requirement.
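(For reference, the query interface in question is cudnnGetRNNLinLayerMatrixParams; a rough sketch of its use follows, with descriptor setup and the surrounding variables (handle, rnn_desc, x_desc, w_desc, w_ptr) assumed rather than shown:)

// Ask cuDNN where one linear-layer matrix lives inside the fused
// weight blob, instead of hard-coding the layout.
cudnnFilterDescriptor_t mat_desc;
CUDNN_CALL(cudnnCreateFilterDescriptor(&mat_desc));
void* mat_ptr = nullptr;
CUDNN_CALL(cudnnGetRNNLinLayerMatrixParams(
    handle, rnn_desc, layer_id, x_desc, w_desc, w_ptr,
    lin_layer_id,          // which gate/linear part of the cell
    mat_desc, &mat_ptr));  // out: filter descriptor and pointer into w_ptr
CUDNN_CALL(cudnnDestroyFilterDescriptor(mat_desc));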
// Allocate the reserve space
reserve_space_ = Storage::Get()->Alloc(reserve_space_byte_, Context::GPU(s->dev_id));
// Allocate the temp space
temp_space_ = Storage::Get()->Alloc(workspace_byte_, Context::GPU(s->dev_id));
If I understand it correctly, temp_space_ can be requested by adding kTempResource to the resource request (checking dev_type in FResourceRequestEx), instead of calling the StorageManager directly?
Yes, it was done that way in the previous implementation. Given that the storage manager is already exposed due to the reserve space, I requested that the temp space be made consistent. It also has the benefit that the space can be shared between forward and backward, since the size doesn't change.
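(For comparison, the resource-request route described above would look roughly like this sketch; note that MXNet's enum is ResourceRequest::kTempSpace, and the lambda signature follows FResourceRequestEx:)

// Request temp space through the resource system so the engine manages
// it, instead of calling the storage manager directly.
.set_attr<FResourceRequestEx>("FResourceRequestEx",
  [](const NodeAttrs& attrs, const int dev_mask,
     const DispatchMode dispatch_mode) {
    std::vector<ResourceRequest> request;
    if (dev_mask == Context::kGPU)
      request.emplace_back(ResourceRequest::kTempSpace);
    return request;
  })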
dep.push_back(out_data[rnn_enum::kStateOut]);
dep.push_back(out_grad[rnn_enum::kStateOut]);
/*
index description
We can probably make some enums like RNNOpInputs for these indices.
Depending on param.mode == rnn_enum::kLstm and param.state_outputs, the index order differs for each RNN type with or without state output, so a single fixed enum would not cover all cases.
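(To make that concrete, here is a hedged sketch of how the dependency list might be assembled, based on the rnn_enum names visible in the diff; the exact list in the PR may differ:)

// The gradient dependency list grows with the configuration, so a fixed
// index N refers to different tensors for different mode/state_outputs.
std::vector<int> dep = {in_data[rnn_enum::kData], in_data[rnn_enum::kParams],
                        in_data[rnn_enum::kState], out_data[rnn_enum::kOut],
                        out_grad[rnn_enum::kOut]};
if (param.state_outputs) {
  dep.push_back(out_data[rnn_enum::kStateOut]);
  dep.push_back(out_grad[rnn_enum::kStateOut]);
}
if (param.mode == rnn_enum::kLstm) {
  dep.push_back(in_data[rnn_enum::kStateCell]);
  if (param.state_outputs) {
    dep.push_back(out_data[rnn_enum::kStateCellOut]);
    dep.push_back(out_grad[rnn_enum::kStateCellOut]);
  }
}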
Squashed commits:
* change RNN OP to stateful
* retrigger the ci
* fix windows compile issue
* move cudnnrnn class into rnn-inl.h
* fix some bugs
* fix bug about gpu case like tests/python/gpu/test_gluon_gpu.test_rnn_layers_fp32 etc
* fix gpu compile issue of unix-gpu and windows-gpu
* print log for test
* fix GPU NO CUDNN for unix-gpu case
* print log
* remove print log and make gpu case has same error message as master when USE_CUDA=1 USE_CUDNN=0
* fix typo bug
* retrigger the ci
* retrigger the ci
* retrigger the ci
* fix comments
* retrigger the ci
* fix comments
* retrigger the ci
* fix comments
* retrigger the ci
* retrigger the ci
* retrigger the ci
* retrigger the ci
* fix comments
Description
This PR refactors the RNN op into a stateful operator using FStatefulCompute.
cc @pengzhao-intel, @TaoLv, @ciyongch