Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[v1.7.x] ElementWiseSum fix for oneDNN #18777

Merged
merged 5 commits into from
Aug 5, 2020

Conversation

bgawrych
Copy link
Contributor

Description

This PR fixes bug which occurs when training gluonCV deeplab with oneDNN support.
Original issue: dmlc/gluon-cv#1368

To reproduce:

python3 train.py --dataset pascal_aug --model-zoo deeplab_resnet101_coco --aux --lr 0.001 --checkname res101 --no-cuda

where train.py comes from https://gluon-cv.mxnet.io/_downloads/1f67ebbf3e0adc5c6fa863a3bc7672a6/train.py (from https://gluon-cv.mxnet.io/build/examples_segmentation/train_fcn.html )

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

Author of the fix is @anko-intel with my small help

@mxnet-bot
Copy link

Hey @bgawrych , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, unix-cpu, windows-cpu, sanity, edge, miscellaneous, windows-gpu, website, centos-cpu, unix-gpu, centos-gpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

for (const NDArray& in : inputs) {
// if ndarray is in default storage and MKLDNN is available,
// need to make sure cpu layout data is used, instead of MKL layout
if (in.storage_type() == kDefaultStorage) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&& in.IsMKLDNNData()

// if ndarray is in default storage and MKLDNN is available,
// need to make sure cpu layout data is used, instead of MKL layout
if (in.storage_type() == kDefaultStorage) {
in_data.push_back(in.Reorder2Default());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about emplace_back

@wuxun-zhang
Copy link
Contributor

Thanks for the fix. Looks that master branch also needs this fix.

@lanking520 lanking520 added the pr-awaiting-review PR is waiting for code review label Jul 26, 2020
@roy6324
Copy link

roy6324 commented Jul 27, 2020

Thanks for the fix , problem is solved.

@bgawrych
Copy link
Contributor Author

@mxnet-bot run ci [unix-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu]

@bgawrych
Copy link
Contributor Author

@wuxun-zhang @xinyu-intel Can you look again? I've changed fix by reordering conditions in (I think) proper way

@xinyu-intel
Copy link
Contributor

@bgawrych thx, can you please add a simple python test case for this fix? I think it should be more clear.

Copy link
Contributor

@wuxun-zhang wuxun-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@pengzhao-intel pengzhao-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
please add a test case too

@bgawrych
Copy link
Contributor Author

LGTM
please add a test case too

@pengzhao-intel Done :)

@bgawrych bgawrych mentioned this pull request Aug 4, 2020
3 tasks
@pengzhao-intel pengzhao-intel merged commit 3143aab into apache:v1.7.x Aug 5, 2020
@ChaiBapchya
Copy link
Contributor

@pengzhao-intel @bgawrych Missing in v1.x Was it intentional?

@pengzhao-intel
Copy link
Contributor

Yes, it's better to include this PR. @anko-intel

bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request Sep 21, 2020
* Fix ElementwiseSum for DNNL

* Fix sanity and replace push_back with emplace_back

* Change order of the data format conditions

* Add NOLINT to avoid readability error

* Add test for oneDNN ElemwiseSum

Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>
bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request Sep 21, 2020
* Fix ElementwiseSum for DNNL

* Fix sanity and replace push_back with emplace_back

* Change order of the data format conditions

* Add NOLINT to avoid readability error

* Add test for oneDNN ElemwiseSum

Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>
bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request Sep 21, 2020
* Fix ElementwiseSum for DNNL

* Fix sanity and replace push_back with emplace_back

* Change order of the data format conditions

* Add NOLINT to avoid readability error

* Add test for oneDNN ElemwiseSum

Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>
bgawrych added a commit to bgawrych/incubator-mxnet that referenced this pull request Sep 21, 2020
* Fix ElementwiseSum for DNNL

* Fix sanity and replace push_back with emplace_back

* Change order of the data format conditions

* Add NOLINT to avoid readability error

* Add test for oneDNN ElemwiseSum

Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>
samskalicky pushed a commit that referenced this pull request Sep 22, 2020
* Fix ElementwiseSum for DNNL

* Fix sanity and replace push_back with emplace_back

* Change order of the data format conditions

* Add NOLINT to avoid readability error

* Add test for oneDNN ElemwiseSum

Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>

Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>
pengzhao-intel pushed a commit that referenced this pull request Sep 23, 2020
* Fix ElementwiseSum for DNNL

* Fix sanity and replace push_back with emplace_back

* Change order of the data format conditions

* Add NOLINT to avoid readability error

* Add test for oneDNN ElemwiseSum

Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>

Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
MKLDNN pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants