
[MKLDNN] mkldnn RNN operator enhancement #17075

Merged
3 commits merged into apache:master on Dec 16, 2019

Conversation

zixuanweeei (Contributor) commented on Dec 15, 2019

Description

Enhancements to the MKL-DNN RNN operator.

@ciyongch @pengzhao-intel @TaoLv

Changes

- `add` operation support (a usage sketch follows this list)
- Rename AddTo
- Add MXNET_USE_MKLDNN_RNN env
- Add env var for switching to the naive RNN impl and naive add/copy impl
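The `add` support presumably refers to the kAddTo gradient request, i.e., the RNN backward kernel accumulating into an existing gradient buffer rather than overwriting it. A minimal sketch of how a user would exercise that path from Gluon (layer choice and shapes are illustrative, not from this PR):

```python
import mxnet as mx
from mxnet.gluon import rnn

layer = rnn.LSTM(hidden_size=8)  # fused RNN; may use the MKL-DNN kernel on CPU
layer.initialize()

# grad_req='add' asks backward to accumulate (kAddTo) into the existing
# gradient buffer instead of overwriting it.
for p in layer.collect_params().values():
    p.grad_req = 'add'

x = mx.nd.random.uniform(shape=(4, 2, 8))  # (seq_len, batch, input_size)
for _ in range(2):  # two backward passes; gradients accumulate
    with mx.autograd.record():
        loss = layer(x).sum()
    loss.backward()

# With 'add', gradients must be zeroed manually between optimizer steps.
layer.collect_params().zero_grad()
```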
zixuanweeei (Contributor, Author) commented

@pengzhao-intel @TaoLv CI passed. Please review. Thanks.

ciyongch (Contributor) left a comment

Overall looks good.

@@ -349,6 +349,10 @@ If ctypes is used, it must be `mxnet._ctypes.ndarray.NDArrayBase`.
- Values: 0(false) or 1(true) ```(default=1)```
- If this variable is set, MXNet will simplify the computation graph, eliminating duplicated operations on the same inputs.

* MXNET_USE_MKLDNN_RNN
- Values: 0 (false) or 1 (true) ```(default=1)```
- This variable controls whether to use the MKL-DNN backend for the fused RNN operator in the CPU context. MXNet has two fused RNN implementations; the MKL-DNN one performs better, but the naive one is currently more stable in the backward pass.
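A minimal sketch of switching between the two implementations; setting the variable before importing mxnet is the safe way to make sure the backend sees it (layer and shapes here are illustrative only):

```python
import os

# Fall back to the naive fused-RNN implementation (default is 1 = use MKL-DNN).
os.environ["MXNET_USE_MKLDNN_RNN"] = "0"

import mxnet as mx
from mxnet.gluon import rnn

layer = rnn.LSTM(hidden_size=128, num_layers=2)
layer.initialize()
x = mx.nd.random.uniform(shape=(10, 32, 64))  # (seq_len, batch, input_size)
x.attach_grad()
with mx.autograd.record():
    out = layer(x)
out.backward()  # backward runs on the naive implementation
```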
ciyongch (Contributor) commented

Do you mean the MKL-DNN fused kernel is not stable in the backward pass, or that the MKL-DNN version is less flexible than the naive one due to some implementation limitation?

zixuanweeei (Contributor, Author) commented

I think it is not stable in the backward pass. I have trained the bucketing model (https://github.com/apache/incubator-mxnet/tree/master/example/rnn/bucketing) with the MKL-DNN RNN backward backend, and it produced a convergent optimization curve. But it has not been tested on other training applications, so I provided an env variable that lets users switch to the naive implementation.

ciyongch (Contributor) commented

OK, got it. I previously thought the results themselves were not stable :) A claim like this can only be verified against a limited, not a broader, set of test cases.

ciyongch (Contributor) left a comment

LGTM

pengzhao-intel (Contributor) left a comment

Thanks for the improvements. Merging now.

pengzhao-intel merged commit 897f4fa into apache:master on Dec 16, 2019
zixuanweeei added a commit to zixuanweeei/mxnet that referenced this pull request on Jan 6, 2020
* mkldnn rnn operator enhancement

`add` operation support

Rename AddTo

Add MXNET_USE_MKLDNN_RNN env

Add Env var for switching to naive RNN impl and naive add/copy impl

* Re-run CI, op:test_reduce failed on Unix-CPU

* Rerun CI, Python2 CPU on Unix-CPU timeout
TaoLv pushed a commit that referenced this pull request on Jan 7, 2020

* [MKLDNN] mkldnn RNN operator enhancement (#17075)

* mkldnn rnn operator enhancement

`add` operation support

Rename AddTo

Add MXNET_USE_MKLDNN_RNN env

Add Env var for switching to naive RNN impl and naive add/copy impl

* Re-run CI, op:test_reduce failed on Unix-CPU

* Rerun CI, Python2 CPU on Unix-CPU timeout

* MKL-DNN RNN backward path enhancement (#17183)

* Flush memory before RNN backward primitive

* Add gluon rnn unit test for gradients check (a sketch of such a check follows this commit message)

* Cache reorder

* Re-write rnn supporting check

* Update OpSignature.AddSign to avoid potential hash collision for rnn-packed memory

Get the data type from mkldnn memory descriptor when setting grad handle
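The gluon unit test referenced above is not reproduced here, but a minimal sketch in the same spirit — checking the RNN backward pass against a central finite-difference estimate — could look like the following (shapes, tolerance, and the single probed element are illustrative assumptions):

```python
import numpy as np
import mxnet as mx
from mxnet.gluon import rnn

mx.random.seed(0)
layer = rnn.LSTM(hidden_size=4)
layer.initialize()

x = mx.nd.random.uniform(shape=(3, 2, 5))  # (seq_len, batch, input_size)
x.attach_grad()
with mx.autograd.record():
    loss = layer(x).sum()
loss.backward()
analytic = x.grad.asnumpy()[0, 0, 0]  # analytic gradient of one input element

# Central finite difference on that same element.
eps = 1e-3
xn = x.asnumpy()
xp, xm = xn.copy(), xn.copy()
xp[0, 0, 0] += eps
xm[0, 0, 0] -= eps
numeric = (layer(mx.nd.array(xp)).sum().asscalar()
           - layer(mx.nd.array(xm)).sum().asscalar()) / (2 * eps)

assert abs(numeric - analytic) < 1e-2  # loose tolerance for float32
```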