This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

modifying SyncBN doc for FP16 use case #14041

Merged
merged 1 commit into from
Feb 5, 2019


mseth10
Contributor

@mseth10 mseth10 commented Feb 1, 2019

Description

Updating the docs to tell MXNet users that SyncBN lacks FP16 support and to describe possible workarounds. This fixes #13976.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • SyncBN doc

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@vandanavk
Contributor

@mxnet-label-bot add [pr-awaiting-review, Doc]

@marcoabreu added the Doc and pr-awaiting-review (PR is waiting for code review) labels Feb 4, 2019
@vandanavk
Contributor

@mxnet-label-bot update [pr-awaiting-merge, Doc]

@aaronmarkham for review/merge

@marcoabreu added the pr-awaiting-merge (Review and CI is complete. Ready to Merge) label and removed the pr-awaiting-review (PR is waiting for code review) label Feb 5, 2019
@srochel srochel merged commit 3f6778b into apache:master Feb 5, 2019
- We follow the sync-onece implmentation described in the paper [2]_.
+ We follow the implementation described in the paper [2]_.
+
+ Note: Current implementation of SyncBN does not support FP16 training.
Contributor

So does it not support training or inference?
You say training in this line, and refer to inference in the next line.

Contributor Author

SyncBN does not support FP16 for either training or inference. For FP16 inference, however, SyncBN can be replaced with nn.BatchNorm, since the two have similar functionality.
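
To illustrate why that swap is plausible at inference time, here is a minimal pure-Python sketch. It is not MXNet code and does not reproduce the SyncBN implementation; the activations, the two-device split, and the helper names (`batch_stats`, `normalize`, `infer`) are all hypothetical. It assumes the usual batch-norm behavior: during training the statistics come from the current batch (per device for plain batch norm, across all devices for a synchronized variant), while at inference the layer normalizes with stored running statistics and needs no cross-device communication.

```python
import math


def batch_stats(values):
    """Return the mean and biased variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def normalize(values, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Apply the batch-norm transform with the given statistics."""
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


# Hypothetical activations for one channel, split across two devices.
device_batches = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]

# Training with plain batch norm: each device sees only its own batch.
per_device = [batch_stats(batch) for batch in device_batches]

# Training with a synchronized variant: statistics cover the combined batch.
synced = batch_stats([v for batch in device_batches for v in batch])

# Inference: stored running statistics are used (taken here from the
# synchronized statistics for simplicity), so no synchronization step
# is involved and a plain batch-norm layer can apply the same transform.
running_mean, running_var = synced


def infer(values):
    """Normalize with stored statistics, as either layer would at inference."""
    return normalize(values, running_mean, running_var)
```

In this sketch the two kinds of layer differ only in how the training-time statistics are gathered; `infer` depends solely on the stored values, which is the sense in which a plain batch-norm layer can stand in for a synchronized one at inference.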

stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
@mseth10 mseth10 deleted the fix-doc-syncBN branch June 1, 2020 10:47
Development

Successfully merging this pull request may close these issues.

Problem of exporting FP16 SyncBN model.
6 participants