This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[BUGFIX] Fix AmpCast for float16 #19749

Merged
szha merged 2 commits into apache:v1.x on Feb 5, 2021


@anko-intel (Contributor) commented Jan 13, 2021

Description

oneDNN doesn't support the float16 format, so AmpCast needs to fall back to the standard (non-oneDNN) implementation for float16.
It fixes issue #19631.
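The fallback can be pictured as a dtype check placed in front of the oneDNN path. This is a minimal, self-contained sketch, not the MXNet code: `DType`, `CastPath`, `UseOneDNNForAmpCast()` and `DispatchAmpCast()` are hypothetical names, the enum is an illustrative subset, and it assumes the fallback should apply when either the input or the output is float16 (the condition quoted in the review, `data.dtype() != mshadow::kFloat16`, tests only the input).

```cpp
// Hypothetical stand-in for mshadow's type flags (illustrative subset;
// the names mirror mshadow, the underlying values are not meant to match).
enum class DType { kFloat32, kFloat16, kFloat64, kBfloat16 };

// Which implementation handles a given cast.
enum class CastPath { kOneDNN, kStandard };

// Assumption: any cast touching float16 (as input or as output) must avoid
// the oneDNN path, because oneDNN is not used for float16 here.
constexpr bool UseOneDNNForAmpCast(DType in, DType out) {
  return in != DType::kFloat16 && out != DType::kFloat16;
}

// Pick oneDNN when the sketch's check allows it, otherwise fall back to the
// standard kernel.
constexpr CastPath DispatchAmpCast(DType in, DType out) {
  return UseOneDNNForAmpCast(in, out) ? CastPath::kOneDNN
                                      : CastPath::kStandard;
}

// Usage: float32 -> bfloat16 stays on the oneDNN path in this sketch,
// while float32 -> float16 falls back to the standard implementation.
static_assert(DispatchAmpCast(DType::kFloat32, DType::kBfloat16) ==
                  CastPath::kOneDNN, "");
static_assert(DispatchAmpCast(DType::kFloat32, DType::kFloat16) ==
                  CastPath::kStandard, "");
```

Doing the check once, before any oneDNN memory descriptors are built, keeps the unsupported type from ever reaching oneDNN.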

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@mxnet-bot

Hey @anko-intel, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, unix-cpu, unix-gpu, centos-gpu, windows-cpu, windows-gpu, miscellaneous, website, sanity, edge, centos-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@lanking520 lanking520 added the pr-awaiting-testing PR is reviewed and waiting CI build and test label Jan 13, 2021
@lanking520 lanking520 added pr-awaiting-review PR is waiting for code review pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review labels Jan 13, 2021
@anko-intel (Contributor Author)

@rongzha1 - could you review?

mkldnn::memory::dims i_dims = mkldnn::memory::dims(i_ndim);
for (size_t i = 0; i < i_ndim; i++) {
i_dims[i] = static_cast<int>(data.shape()[i]);
if (data.dtype() != mshadow::kFloat16) {
Contributor

Shall we add isValidMKLDNNDataType() to check whether a data type is supported by MKLDNN? mshadow has many data types, and some of them are not supported: https://github.com/apache/incubator-mxnet/blob/64f737cdd59fe88d2c5b479f25d011c5156b6a8a/3rdparty/mshadow/mshadow/base.h#L364:3
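For comparison, the general helper proposed here could look roughly like this sketch. `IsValidMKLDNNDataType()` is only the reviewer's proposal, `CanUseMKLDNNCast()` is a hypothetical caller, and `TypeFlag` is a hypothetical, trimmed-down stand-in for mshadow's type flags. The set treated as supported (float32, bfloat16, int32, int8, uint8, i.e. oneDNN f32/bf16/s32/s8/u8) is an assumption, not something stated in this PR; it should be checked against the oneDNN version actually bundled.

```cpp
// Hypothetical, trimmed-down stand-in for mshadow's type flags (names only;
// the values are not meant to match mshadow/base.h).
enum class TypeFlag {
  kFloat32, kFloat64, kFloat16, kBfloat16,
  kUint8, kInt8, kInt32, kInt64, kBool
};

// Hypothetical helper in the spirit of the suggestion: report whether a dtype
// can be handed to an MKLDNN/oneDNN primitive. The supported set is assumed.
constexpr bool IsValidMKLDNNDataType(TypeFlag dtype) {
  return dtype == TypeFlag::kFloat32 || dtype == TypeFlag::kBfloat16 ||
         dtype == TypeFlag::kInt32 || dtype == TypeFlag::kInt8 ||
         dtype == TypeFlag::kUint8;
}

// Example caller: a cast may use MKLDNN only if both sides are supported.
constexpr bool CanUseMKLDNNCast(TypeFlag in, TypeFlag out) {
  return IsValidMKLDNNDataType(in) && IsValidMKLDNNDataType(out);
}

static_assert(IsValidMKLDNNDataType(TypeFlag::kFloat32), "f32 is supported");
static_assert(!IsValidMKLDNNDataType(TypeFlag::kFloat16), "f16 falls back");
```

A shared helper like this pays off when several call sites (for example storage-type inference) need the same check; a fix confined to one operator can instead test for the single unsupported type directly.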

Contributor Author

I considered that. A shared isValidMKLDNNDataType() function would make sense if it were reused in many places, such as MKLDNNStorageType() for FInferStorageType. In this particular case, however, the amp_cast operator accepts only 3 float types (see https://github.com/apache/incubator-mxnet/blob/v1.x/src/operator/tensor/amp_cast.h#L70 ), so I simply excluded float16 as not supported by MKLDNN.

Contributor

OK. LGTM

@anko-intel (Contributor Author)

@PatricZhao, @szha, could you review and merge if everything is OK?

@szha (Member) left a comment

Thanks for the fix! Could you add a test for verification?

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-review PR is waiting for code review labels Jan 27, 2021
@anko-intel (Contributor Author)

Hi @szha,
Originally, I assumed that float16 was not meant to be passed to amp_cast in the CPU context, so I treated this change only as a fix for #19631.
I have now enabled the existing float32->float16 amp_cast test on CPU.

@lanking520 lanking520 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Jan 27, 2021
@anko-intel (Contributor Author)

@mxnet-bot run ci [centos-cpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-cpu, unix-gpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Jan 27, 2021
@lanking520 lanking520 added the pr-awaiting-review PR is waiting for code review label Jan 27, 2021
@szha szha merged commit 0a65920 into apache:v1.x Feb 5, 2021
anko-intel added a commit to anko-intel/incubator-mxnet that referenced this pull request Mar 10, 2021
* Fix AmpCast for float16

OneDNN doesn't support float16 format so fallback to standard
implementation is needed.
It fixes issue 19631.

* Enable amp_cast test for float16 on CPU context
szha pushed a commit that referenced this pull request Mar 12, 2021
chinakook pushed a commit to chinakook/mxnet that referenced this pull request May 2, 2021
Labels
pr-awaiting-review PR is waiting for code review
5 participants