[MKLDNN] Support quantized rnn towards v1.6.x #18028
Conversation
* Add _contrib_quantized_rnn op * Add asymmetric quantization - _contrib_quantized_asym op * Add MXNET_USE_WEIGHT_CACHE to control rnn init behavior * Support data layout in NDArrayIter * Move MKLDNNRnnMemMgr to individual layer
Hey @zixuanweeei , Thanks for submitting the PR
CI supported jobs: [unix-gpu, website, centos-gpu, edge, windows-gpu, sanity, centos-cpu, clang, miscellaneous, unix-cpu, windows-cpu]
@mxnet-label-bot add [mkldnn]
Accuracy and performance: #18001 (comment)
CI checks passed. Please review. @ciyongch @TaoLv @pengzhao-intel
LGTM for the current implementation, but we still need another DNNL patch (or DNNL version upgrade) to mitigate the overhead of …
Sure, let's wait for the patch.
Thanks, we can upgrade DNNL a little later, and I will merge this PR for early testing.
LGTM
Description
Mirror PR of #18001, towards the v1.6.x branch. This PR adds support for the quantization flow of the RNN operator. Currently, only the LSTM mode supports INT8 inference.
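One of the pieces this PR adds is an asymmetric quantization op (`_contrib_quantized_asym`), which maps a float range onto uint8 using a scale plus a zero-point shift rather than a symmetric scale alone. The NumPy sketch below illustrates the general asymmetric-quantization idea only; the function names are ours and this is not the MKL-DNN kernel's actual implementation.

```python
import numpy as np

def quantize_asym(x, qmin=0, qmax=255):
    """Map [x.min(), x.max()] onto [qmin, qmax] with a scale and a
    zero-point shift (illustrative sketch, not the MKL-DNN kernel)."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (qmax - qmin) / (x_max - x_min)
    shift = qmin - x_min * scale          # zero-point offset
    q = np.clip(np.round(x * scale + shift), qmin, qmax).astype(np.uint8)
    return q, scale, shift

def dequantize_asym(q, scale, shift):
    """Invert the affine mapping back to float32."""
    return (q.astype(np.float32) - shift) / scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, shift = quantize_asym(x)
x_hat = dequantize_asym(q, scale, shift)  # close to x, up to rounding error
```

The shift term is what lets a range that is not centered on zero (common for LSTM input data) use the full uint8 range, which is the motivation for an asymmetric scheme over a symmetric one.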
Checklist
Essentials
Changes
NDArrayIter provides data in NCHW layout by default, and there is no way to support other layouts, like the sequential TNC layout. This PR makes some changes to NDArrayIter to leverage that feature (assuming that N represents the batch).
Comments
@ciyongch @TaoLv @pengzhao-intel
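For reference on the layout change above: in MXNet's layout strings each axis is named by a letter (N batch, T sequence length, C feature/channel), so turning a batch-major NTC array into the sequence-major TNC form that RNN kernels consume is a transpose of the first two axes. The NumPy sketch below is purely illustrative; the variable names are ours and it is not the NDArrayIter code.

```python
import numpy as np

# Illustrative shapes: N (batch) = 4, T (sequence) = 10, C (feature) = 8.
batch, seq_len, feat = 4, 10, 8
ntc = np.arange(batch * seq_len * feat, dtype=np.float32)
ntc = ntc.reshape(batch, seq_len, feat)    # batch-major NTC

# RNN kernels typically expect sequence-major TNC input, so the iterator's
# batch-major output only needs its first two axes swapped.
tnc = ntc.transpose(1, 0, 2)               # a view: no data copy
```

Because `transpose` returns a view, supporting an alternative layout in the iterator need not copy the underlying buffer, which keeps the change cheap.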