This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Improvement] Invoke mkldnn and cudnn BatchNorm when axis != 1 #18504

Merged
merged 21 commits on Jul 9, 2020

Conversation

wkcn
Member

@wkcn wkcn commented Jun 6, 2020

Description

Hi there.
In the previous version, the MKLDNN and CUDNN implementations of BatchNorm were invoked only when `axis` was 1; for any other axis, the naive implementation was used.

This PR adds support for calling the MKLDNN and CUDNN BatchNorm implementations with an arbitrary axis.

If the input has shape `shape`, it will be reshaped to (prod(shape[0:axis]), shape[axis], 1, prod(shape[axis+1:len(shape)])).
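
For illustration, a minimal NumPy sketch of this mapping (the helper is illustrative only; the actual reshape is done in C++ inside the operator):

```python
import numpy as np

def batchnorm_4d_shape(shape, axis):
    # Illustrative helper mirroring the formula above:
    # (prod(shape[0:axis]), shape[axis], 1, prod(shape[axis+1:]))
    return (int(np.prod(shape[:axis], dtype=np.int64)),
            shape[axis],
            1,
            int(np.prod(shape[axis + 1:], dtype=np.int64)))

# A 5D input (N, D, C, H, W) with axis=2 maps to (N*D, C, 1, H*W):
print(batchnorm_4d_shape((2, 3, 4, 5, 6), axis=2))  # (6, 4, 1, 30)
```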

This PR contains the changes from PR #18500 and should be merged after #18500.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

The unit test test_symbol.py seems to have been removed in a previous PR.
I added a new parameter mkldnn_off to BatchNorm, since the MKLDNN implementation is not numerically stable when the input is extremely large. This leads to the failure of tests/python/unittest/test_symbol.py:test_load_000800:

[2020-06-11T06:53:50.575Z] Items are not equal:
[2020-06-11T06:53:50.575Z] Error 1.072210 exceeds tolerance rtol=1.000000e-03, atol=1.000000e-03 (mismatch at least 5.500000%).
[2020-06-11T06:53:50.575Z] Location of maximum error: (0, 75), a=-12467010038075455310653024108544.00000000, b=-12480391637972769640987846770688.00000000
[2020-06-11T06:53:50.575Z]  ACTUAL: array([[ 2.9346620e+32,  1.9349947e+33, -1.0458480e+33, ...,
[2020-06-11T06:53:50.575Z]          2.2364499e+33,  1.0379849e+33,  5.1014372e+32]], dtype=float32)
[2020-06-11T06:53:50.575Z]  DESIRED: array([[ 2.9377518e+32,  1.9370348e+33, -1.0469506e+33, ...,
[2020-06-11T06:53:50.575Z]          2.2388062e+33,  1.0390785e+33,  5.1068152e+32]], dtype=float32)
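
For illustration, the instability can be reproduced with a toy float32 example (not the failing test itself): with values of magnitude around 1e33, as in the log above, the squared deviations needed for the batch variance overflow float32, whose maximum is about 3.4e38.

```python
import numpy as np

x = np.array([1e33, -1e33], dtype=np.float32)
mean = x.mean()                 # 0.0
var = ((x - mean) ** 2).mean()  # (1e33)**2 = 1e66 overflows float32
print(mean, var)                # 0.0 inf
```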

@mxnet-bot

Hey @wkcn, Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, edge, miscellaneous, centos-cpu, unix-cpu, clang, sanity, windows-gpu, windows-cpu, unix-gpu, centos-gpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@wkcn
Member Author

wkcn commented Jun 9, 2020

@mxnet-bot run ci [unix-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

@wkcn
Member Author

wkcn commented Jun 9, 2020

@mxnet-bot run ci [unix-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

@wkcn
Member Author

wkcn commented Jun 11, 2020

@mxnet-bot run ci [unix-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

szha
szha previously requested changes Jun 11, 2020
Member

@szha szha left a comment

I'd suggest not adding an mkldnn-specific attribute to the operator. Along the same line, we need better ways of turning off cudnn too.

For the master branch, I'd suggest replacing both the mkldnn_off and cudnn_off op-level attributes with environment variables for now. We could consider creating an MXNET_DISABLE_CUDNN environment variable whose value could be True, False (default), or a list of comma-separated operator names.
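
For concreteness, a rough sketch of what a parser for such a (currently hypothetical) MXNET_DISABLE_CUDNN variable could look like; the semantics here are only a proposal, not existing MXNet code:

```python
import os

def cudnn_disabled_for(op_name):
    # Proposed semantics: "True" disables cuDNN globally, "False"
    # (the default) disables nothing, and a comma-separated list of
    # operator names disables cuDNN only for those operators.
    value = os.environ.get("MXNET_DISABLE_CUDNN", "False").strip()
    if value.lower() == "true":
        return True
    if value.lower() == "false":
        return False
    return op_name in {name.strip() for name in value.split(",")}

# e.g. MXNET_DISABLE_CUDNN="BatchNorm,Convolution" would disable cuDNN
# for exactly those two operators.
```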

@wkcn
Member Author

wkcn commented Jun 11, 2020

@szha
Thanks for your suggestion!
I agree. I will update this PR after replacing the mkldnn_off and cudnn_off attributes with environment variables.

@wkcn wkcn linked an issue Jul 1, 2020 that may be closed by this pull request
@TaoLv
Member

TaoLv commented Jul 1, 2020

@wkcn, do you have any performance number?

@wkcn
Member Author

wkcn commented Jul 1, 2020

@wkcn, do you have any performance number?

Sorry, I do not have any performance table. I only have a laptop with an i7-7500U (2C4T). Could you help me with the benchmark? Thank you!

@wkcn
Member Author

wkcn commented Jul 2, 2020

@mxnet-bot run ci [unix-cpu, centos-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-cpu, unix-cpu]

@wkcn
Member Author

wkcn commented Jul 2, 2020

@mxnet-bot run ci [unix-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

@sandeep-krishnamurthy
Contributor

@mseth10 @access2rohit - Please help review this PR.

@szha szha dismissed their stale review July 2, 2020 16:52

agreed to addressing the options problem in follow-up PR by @wkcn

  shape_[2] = 1;
- shape_[3] = in_data.shape_.ProdShape(2, in_data.ndim());
+ shape_[3] = static_cast<dim_t>(in_data.shape_.ProdShape(param_.axis + 1,
+                                static_cast<int>(in_data.ndim())));
Contributor

Why do we need static_cast here? Why can't we do it like line 276?

Member Author

The return dtype of ProdShape is size_t, an unsigned integer, but the dtype of shape_[i] is dim_t, namely int64_t.
I do not know whether the static_cast is necessary to avoid a potential compiler warning.

Contributor

Can we add static_cast to line 276 as well?

Contributor

in_data.ndims() shouldn’t need a static cast though, right?

Member Author

@wkcn wkcn Jul 3, 2020

Yes, the dtype of .ndim() is int32_t.

| variable | dtype |
| --- | --- |
| axis | int32_t |
| ProdShape | size_t |
| shape_[i] | dim_t (int64_t) |
| ndim() | int32_t |

The signature of ProdShape is `size_t ProdShape(int dimstart, int dimend) const`.

Member Author

Hi @mseth10, do you have any suggestions about whether to use static_cast?

Contributor

IMO you can ignore the compiler warnings (if any) for int32_t to int;
you can keep the static_cast for size_t to dim_t.

Member Author

I see. I have updated it : )

@@ -420,10 +420,14 @@ static bool BatchNormType(const nnvm::NodeAttrs& attrs,

 #if MXNET_USE_MKLDNN == 1
 static inline bool SupportMKLDNNBN(const NDArray &input, const BatchNormParam &param) {
   mxnet::TShape shape = input.shape();
-  return SupportMKLDNN(input) && shape.ndim() == 4
Contributor

We are removing the check for ndim == 4 here, and also a lighter check for ndim == 1 || ndim == 2 || ndim == 4 present in SupportMKLDNN.
Does that mean ndim can be anything > 0? What are the allowed values for ndim?

Member Author

@wkcn wkcn Jul 3, 2020

Yes, ndim could be anything > 0.

If the input has shape `shape`, it will be reshaped to (prod(shape[0:axis]), shape[axis], 1, prod(shape[axis+1:len(shape)])).

| ndim | shape | axis=0 | axis=1 | axis=2 | axis=3 | axis=4 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (N,) | (1, N, 1, 1) | x | x | x | x |
| 2 | (N, C) | (1, N, 1, C) | (N, C, 1, 1) | x | x | x |
| 3 | (N, C, H) | (1, N, 1, CH) | (N, C, 1, H) | (NC, H, 1, 1) | x | x |
| 4 | (N, C, H, W) | (1, N, 1, CHW) | (N, C, 1, HW) | (NC, H, 1, W) | (NCH, W, 1, 1) | x |
| 5 | (N, D, C, H, W) | (1, N, 1, DCHW) | (N, D, 1, CHW) | (ND, C, 1, HW) | (NDC, H, 1, W) | (NDCH, W, 1, 1) |
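
For example, the ndim=5 / axis=2 row corresponds to a call like this (using the existing mx.nd.BatchNorm operator; the shapes are illustrative):

```python
import mxnet as mx

N, D, C, H, W = 2, 3, 4, 5, 6
x = mx.nd.random.uniform(shape=(N, D, C, H, W))
gamma, beta = mx.nd.ones(C), mx.nd.zeros(C)
moving_mean, moving_var = mx.nd.zeros(C), mx.nd.ones(C)

# With this PR, the input is internally reshaped to (N*D, C, 1, H*W)
# so the MKLDNN/CUDNN kernels can be used; the output keeps the 5D shape.
y = mx.nd.BatchNorm(x, gamma, beta, moving_mean, moving_var, axis=2)
print(y.shape)  # (2, 3, 4, 5, 6)
```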

Contributor

Thanks for the explanation! Does the table continue further for ndim > 5? Or should we place a check for that?

Member Author

Yes, it supports ndim > 5 too.

Contributor

@mseth10 mseth10 left a comment

LGTM. Thanks for the contribution @wkcn

@stu1130
Contributor

stu1130 commented Jul 6, 2020

@mxnet-bot run ci [macosx-x86_64]

@mxnet-bot

None of the jobs entered are supported.
Jobs entered by user: [macosx-x86_64]
CI supported Jobs: [centos-gpu, miscellaneous, windows-cpu, windows-gpu, website, clang, unix-gpu, centos-cpu, sanity, unix-cpu, edge]

@stu1130
Contributor

stu1130 commented Jul 6, 2020

@TaoLv could you help merge it if the PR looks good to you?

@sandeep-krishnamurthy
Contributor

@szha @eric-haibin-lin - Can you please help review/merge this PR? Thank you.

@szha szha merged commit beafba7 into apache:master Jul 9, 2020
stu1130 pushed a commit to stu1130/incubator-mxnet that referenced this pull request Jul 9, 2020
…e#18504)

* fix batch norm when fix_gamma is True

* support gradient accumulation for batch norm

* mkldnn batchnorm support grad add

* unittest for bn

* fix bn arg

* fix lint

* fix mkldnn

* fix mkldnn bn

* fix grad when fixing gamma

* fix naive gpu bn

* fix lint

* invoke mkldnn and cudnn batchnorm when axis != 1

* backport 18500

* change condition

* fix

* fix

* add mkldnn_off for bn

* remove mkldnn_off

* recover save_000800.json

* cast
zheyuye added a commit to zheyuye/incubator-mxnet that referenced this pull request Jul 17, 2020
commit a77f774ed179786fc8429d913a2da1d942528de9
Author: Leonard Lausen <lausen@amazon.com>
Date:   Fri Jul 17 05:01:17 2020 +0000

    Remove NNPACK integration (#18722)

commit 3ef00b8840c05c49118705f6fd9663ebb951f3a1
Author: Andrei Ivanov <andrey.ivanov@gmail.com>
Date:   Thu Jul 16 16:57:58 2020 -0700

    Refactoring of Pooled Storage Manager classes (#18582)

    * Refactoring of Pooled Storage Manager classes

    * Adding test for new functionality

    * Fixing compilation problems which appear for MXNET_USE_CUDA=0

    * Fixing compilation problems for WINDOWS and ANDROID

    * Fixing compilation problems which appear for WINDOWS and __APPLE__

    * Fixing lint problems

    * test_dataloader_context(): Bypassing custom_dev_id pinned mem test on system with GPUs < 2.

    * Fixing compilation for Android. Elimination of unused includes.

    * Fixing problems with CPUPinned Storage Manager which appears when MXNET_USE_CUDA = 0

    * Removing test_bucketing.py

    * Imroving CPU_Pinned Pooled Storage Manager case.

    * Fixing lint problem

    * The GPU profiling commands calls moved into mutex area

    * Fixing lint problem

    * Improved reporting regarding the Storage Manager used.

    * Fixing lint problem

    * Trigger CI

    * Removing some comments, as suggested by @szha

    * Trigger CI

    * Trigger CI

    Co-authored-by: andreii <andreii@nvidia.com>

commit 2abf0b8c2b3361c73c9dfdeabdb8a88278b693d0
Author: Leonard Lausen <lausen@amazon.com>
Date:   Thu Jul 16 17:41:22 2020 +0000

    Initialize docker cache in build.py for docker-compose containers (#18724)

commit 37bdf0bf981d11a89bd248b02f473211d57bc9c6
Author: JackieWu <wkcn@live.cn>
Date:   Fri Jul 17 01:25:01 2020 +0800

    [MXNET-1453] Support the intput whose dimension is greater than 6 for Transpose and Rollaxis (#18707)

    * support 6+ dims for transpose

    * test over

    * reorder code

    * fix transposeex

commit 8198442f0c7bde0fc47f507c3f81a0b5cf0a5235
Author: AntiZpvoh <59728467+AntiZpvoh@users.noreply.github.com>
Date:   Thu Jul 16 15:01:59 2020 +0800

    [numpy] symbolic advanced indexing (#18319)

    * add ndarray and boolean indexing for numpy symbol

    * fix sanity and unit test

    * ensure consistency between the imperative and symbolic interface

    * Update python/mxnet/numpy/multiarray.py and add new test
    Co-authored-by: Leonard Lausen <leonard@lausen.nl>

    * Don't rely on indexing_key_expand_implicit_axes for deciding if
    _npi.advanced_indexing_multiple is applicable

    * fix sanity

    Co-authored-by: Leonard Lausen <lausen@amazon.com>

commit 690132516a0a99337625248772fd44930686a82b
Author: 蔡舒起 <867907127@qq.com>
Date:   Thu Jul 16 10:12:20 2020 +0800

    Add the newest mxnet discuss  version. Add d2l.ai (#18663)

    * Add the newest mxnet discuss  version. Add d2l.ai

    * delete [] and insert old version

commit e2366e9102e6862416bf998af52baaa5e9c0a31b
Author: Leonard Lausen <lausen@amazon.com>
Date:   Wed Jul 15 22:01:36 2020 +0000

    Refactor scope functionality in Python API (#18619)

    * Refactor scope functionality in Python API

    - Remove deprecated metaclass functionality
    - Remove global state in naming
    - Switch from threading.local to asyncio compatible contextvars
    - Stop exposing UUIDs in parameter name

    * Fix dependencies

    * Fixes

    * Fixes

    * Fix

    * Fix after merge master

commit 12ec04611c78a603c03707488d66bdbbedf0d536
Author: Chaitanya Prakash Bapat <chai.bapat@gmail.com>
Date:   Wed Jul 15 13:59:34 2020 -0700

    Migrate from private to public jetson toolchain files (#18677)

commit 0dc30a2c170fd0aa369d325a1feae6aad75a52c2
Author: Leonard Lausen <lausen@amazon.com>
Date:   Wed Jul 15 01:02:36 2020 +0000

    Enable GPU Memory profiler tests (#18701)

    * Enable GPU Memory profiler tests

    Previously tests are not run as test_profiler.py was not taken into account on
    GPU CI runs and some tests were marked for being skipped if run on a CPU-only
    machine.

    * Disable broken tests

commit d512814c2981f9bfb23937064634982ca97d0338
Author: Leonard Lausen <lausen@amazon.com>
Date:   Wed Jul 15 00:57:38 2020 +0000

    Disable test coverage in MKL builds (#18443)

    * Disable test coverage in MKL builds

    * Enable test parallelization

    * Set OMP_NUM_THREADS

    * Fix

    * Fix unpack_and_init

commit d8430b6b412e637d07b291dbee1350df7168234d
Author: Leonard Lausen <lausen@amazon.com>
Date:   Wed Jul 15 00:53:49 2020 +0000

    Set CMAKE_CUDA_COMPILER in aarch64-linux-gnu-toolchain.cmake (#18713)

    CMAKE_CUDA_HOST_COMPILER will be reset if CMAKE_CUDA_COMPILER is not set as of cmake 3.17.3

    See https://gitlab.kitware.com/cmake/cmake/-/issues/20826

commit f125f5fd9ff91e9a70e5add3735c32d4e3bf9cd0
Author: Yang Shi <yangshia@amazon.com>
Date:   Tue Jul 14 14:29:14 2020 -0700

    Fix all anchor shifts on website (#18674)

commit 7c9c4fc3d3ef66310537c0bc6810a90af551a63e
Author: Yang Shi <yangshia@amazon.com>
Date:   Tue Jul 14 14:28:17 2020 -0700

    Merge content from numpy.mxnet.io into mxnet official website (#18691)

commit 7f7e1c5a714262e8cd1015716258416e6ce1ff3e
Author: Serge Panev <spanev@nvidia.com>
Date:   Tue Jul 14 14:12:00 2020 -0700

    Add better partial args/aux handling in symbol optimize_for (#18350)

    * Add missing args/aux support in optimize_for and deferred inference option

    Signed-off-by: Serge Panev <spanev@nvidia.com>

    * Add input shape_dict, type_dict and stype_dict to optimize_for

    Signed-off-by: Serge Panev <spanev@nvidia.com>

    * Remove warnings for Werror

    Signed-off-by: Serge Panev <spanev@nvidia.com>

    * Address PR comments

    Signed-off-by: Serge Panev <spanev@nvidia.com>

commit 9d623926d4857a2cfa32515b58cd1398371f97f3
Author: Yang Shi <yangshia@amazon.com>
Date:   Mon Jul 13 15:54:51 2020 -0700

    Fix python micro-site table of content bugs (#18664)

    * update footer style

    * add compiled css of footer styles changes

    * add same style for footer2

    * more fix to the toc

commit 8ebb5372c3ad414cde096fb82de8be14cb748b11
Author: Sheng Zha <szha@users.noreply.github.com>
Date:   Mon Jul 13 13:17:12 2020 -0700

    add 'needs triage' label to new bug reports (#18696)

commit 9c5b95a9c5d6f83a067504fb47fac4e3aed27e81
Author: Serge Panev <spanev@nvidia.com>
Date:   Mon Jul 13 11:45:29 2020 -0700

    Partition API adding and deleting new params to Block and Symbol (#18405)

    * Add deleting of args aux aux to Partition API

    Signed-off-by: Serge Panev <spanev@nvidia.com>

    * Delete args from Block.params

    Signed-off-by: Serge Panev <spanev@nvidia.com>

    * Fix to use arg/auxdict when optimize_for is called in HybridBlock

    Signed-off-by: Serge Panev <spanev@nvidia.com>

    * Address PR comments

    Signed-off-by: Serge Panev <spanev@nvidia.com>

commit 19e373daac76b466cf11b5d31fa5d5e2eb518a21
Author: Leonard Lausen <lausen@amazon.com>
Date:   Sat Jul 11 09:09:51 2020 -0700

    Fix scipy dependency in probability module (#18689)

    * Fix scipy dependency in probability module

    * Fix copy-paste error

    * dtype='float32' for digamma and gammaln

commit a9b16f7024878611b236c9f3734ccd37a5a35d38
Author: JackieWu <wkcn@live.cn>
Date:   Sat Jul 11 02:59:21 2020 +0800

    change bn test (#18688)

commit beafba76395e75c093f99d20ac62e38f48e91012
Author: JackieWu <wkcn@live.cn>
Date:   Thu Jul 9 08:01:35 2020 +0800

    [Improvement] Invoke mkldnn and cudnn BatchNorm when axis != 1 (#18504)

    * fix batch norm when fix_gamma is True

    * support gradient accumulation for batch norm

    * mkldnn batchnorm support grad add

    * unittest for bn

    * fix bn arg

    * fix lint

    * fix mkldnn

    * fix mkldnn bn

    * fix grad when fixing gamma

    * fix naive gpu bn

    * fix lint

    * invoke mkldnn and cudnn batchnorm when axis != 1

    * backport 18500

    * change condition

    * fix

    * fix

    * add mkldnn_off for bn

    * remove mkldnn_off

    * recover save_000800.json

    * cast

commit 348ab4d8d77359bf60d97a0befbd9086fd52ee49
Author: Yang Shi <yangshia@amazon.com>
Date:   Tue Jul 7 15:06:34 2020 -0700

    fix broken installation widget - remove empty entries (#18661)

commit b4b8b805fe94a6df905c6eae7f6c1f83cfea9b73
Author: Xi Wang <xidulu@gmail.com>
Date:   Wed Jul 8 01:22:05 2020 +0800

    Gluon.probability (#18403)

    * package created

    * mvn WIP

    * normal wip, to be tested

    * update

    * docstring added, normal mostly done

    * add test file

    * Bernoulli WIP

    * bernoulli wip

    * bernoulli doc done

    * dense variational WIP

    * add kl infra

    * implement normal kl method

    * refactor kl

    * add not implemented handling, rename kl_storage

    * add  abstract method and Categorical class

    * rewrite logit2prob prob2logit for multiclass support

    * normal broadcast_to implemented

    * categorical mostly done

    * update distributions/utils.py

    * add dot ahead of import

    * fix normal F

    * bernoulli, normal brief tests implemented

    * add hybridize tests

    * transformation infras done

    * affine transformation, implemented tested

    * add tests cases

    * add sum_right_most

    * fix get F bug

    * compose transform implemented, tested

    * fix

    * add event_dim

    * fetch mvn from upstremm

    * clean code, implement normal cdf and tests

    * constraint in bernoulli done

    * fix constraint

    * finish half normal

    * add cached_property

    * add test on cached_property

    * add more features to distribution and constratins

    * change constraint

    * fix bernoulli

    * add independent

    * add independent tests

    * update naming of cached_property

    * revert

    * add constraints

    * add Cat

    * add Stack for imperative mode

    * add Stack for imperative mode

    * add bernoulli entropy

    * categorical WIP

    * categorical sampling implemented

    * finish categorical log_prob, sampling

    * enumerate_support finished

    * polish StochasticBlock, add test

    * add test for stochastic sequential

    * clean loss list in __call__

    * fix affine, implement sigmoid, softmax

    * add gumbel, relaxed bernoulli

    * relaxed one-hot sampling implemented

    * gamma done

    * gamma, dirichlet implemented

    * beta done

    * gumbel softmax log-likelihood implemented

    * refactor tests, implement exponential, fix compose transform

    * weibull implemented, transformed distribution cdf icdf added

    * pareto implemented

    * uniform wip

    * uniform done

    * rewrite lgamma, implement chi2

    * fix chi2 scale

    * F distributiion done

    * t implemented

    * fix tiny problem

    * cauchy done

    * add half cauchy

    * multinomial done, tests to be added

    * add multinomial test

    * MVN done, tests todo

    * mvn polished

    * fix a few precison issues

    * add erf, erfinv unified api and learnable transform

    * fix mvn attribute check

    * MVN done

    * poisson done

    * hack poisson for size support

    * geometric finished

    * negative binomial done

    * binomial done

    * implement some kl

    * add more kl

    * refactor kl test

    * add more kl

    * binomial kl todo

    * change constraint logical op implement

    * implement gamma entropy

    * finish beta dirchlet entropy

    * finishi all entropy

    * kl finished

    * add constraint test

    * domain map done

    * remove bayesian dense

    * fix tiny problems

    * add kl uniform normal

    * add kl tests

    * acquire patch from upstream

    * add some doc

    * finish doc

    * refactor kl test(WIP)

    * add more kl, fix float32 underflow issue

    * make sampling more stable

    * handle inconsistent mode

    * replace boolean idx with np.where

    * fix file name

    * add more doc

    * add constraint check

    * add half_normal/cauchy pdf cdf support check

    * fix import problem

    * change nosetest to pytest

    * remove buggy lines

    * change alias register path

    * attempt to fix ci

    * fix lint, change a few tests

    * fix lint

    * modify hybrid sequential

    * fix lint

    * change import order

    * add test gluon probability v2

    * fix hybridize flag

    * change implementation of stochastic block

    * fix lint

    * fix comments

    * fix block

    * modify domain map

    * add raises for improper add_loss

    * add raises for improper add_loss

    * add extra cases

    * change collectLoss decorator to mandatory

    * skip stochastic block tests

    * remove test cases

    * put gpu tests back

    * add test_gluon_stochastic_block back

    * remove export test

    * put a test back

    * tiny refactor

    * add memory leak flag

    * small changes

    Co-authored-by: Zheng <shzheng@a483e789dd93.ant.amazon.com>

commit 54c0155b7581f5e10b1469a17ddf127d3c75e156
Author: Yang Shi <yangshia@amazon.com>
Date:   Mon Jul 6 17:01:42 2020 -0700

    User Feedback Widget (#18639)

    * user feedback widget implementation

    * add user feedback widget to python docs site

    * update margin

    * add apache license

    * one more license

    * turn off feedback widget on python site

    * update copy

    * format

    * add event value field

    * turn on widget on Python site

commit 646288716cbba482d4ede0fb4f6141b2ea505090
Author: Yiyan66 <57363390+Yiyan66@users.noreply.github.com>
Date:   Sat Jul 4 09:13:41 2020 +0800

    [numpy] Fix less/greater bug with scalar input (#18642)

    * fix ffi

    * fix less/greater error

    * back

    * submodule

    * fixed

    Co-authored-by: Ubuntu <ubuntu@ip-172-31-8-94.us-east-2.compute.internal>

commit d1b0a09669d1fa17b12a9acee887672d1e621523
Author: Yiyan66 <57363390+Yiyan66@users.noreply.github.com>
Date:   Fri Jul 3 15:10:55 2020 +0800

    [numpy] FFI flip, rollaxis, stack (#18614)

    * flip

    * rollaxis

    * stack

    * fixed

    * retrigger ci

    Co-authored-by: Ubuntu <ubuntu@ip-172-31-18-97.us-east-2.compute.internal>

commit c519e0e2db54fb8ad74e0e44d586720bf4023490
Author: Leonard Lausen <lausen@amazon.com>
Date:   Thu Jul 2 18:21:08 2020 -0700

    Mark test_get_symbol as garbage_expected (#18595)

commit d1b2cd9d8ada39ab4f16caff4ac43337476f2efc
Author: Leonard Lausen <lausen@amazon.com>
Date:   Thu Jul 2 18:20:48 2020 -0700

    build.py --no-pull (#18589)

    Add --no-pull option which disables overwriting the local docker cache based on CI docker cache. It is useful when locally changing Dockerfiles.

commit 0c8b6b2405e8313db3cf1a6f374a945d3c871b26
Author: Yang Shi <yangshia@amazon.com>
Date:   Thu Jul 2 13:15:54 2020 -0700

    Clipboard refactor (#18605)

    * refactor clipboard

    * make lang getter more extensible

    * trigger ci

commit a8c8dea67593df7f1d2061893dddfdeee4750d9f
Author: Tao Lv <tao.a.lv@intel.com>
Date:   Wed Jul 1 22:53:54 2020 +0800

    update to onednn v1.4 (#18273)

commit 9a122cac5e1317ccca2dea6884253ce32ac3671a
Author: bgawrych <bartlomiej.gawrych@intel.com>
Date:   Wed Jul 1 16:43:06 2020 +0200

    Fix softmax, logsoftmax failed on empty ndarray (#18602)

    * Fix failing empty array (log_)softmax

    * Modify test for npx (log_)softmax

commit 37bed6e3af794624d651e888101eceb30c27c001
Author: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Date:   Wed Jul 1 16:39:22 2020 +0200

    Fix BatchNorm backward synchronization (#18644)

    * Add test for BatchNorm running variables synchronization

    * Fix BatchNorm backward synchronization

    It fixes issue #18610

commit 21581060d2f967cc2faeb5a76979cdffbf578657
Author: XIAO-XIA <47599701+XIAO-XIA@users.noreply.github.com>
Date:   Tue Jun 30 14:16:20 2020 +0800

    [Numpy] FFI: tril_indices (#18546)

    * add numpy tril_indices ffi

    * Update src/api/operator/numpy/np_matrix_op.cc

    Co-authored-by: Haozheng Fan <hzfan9@outlook.com>

    Co-authored-by: Haozheng Fan <hzfan9@outlook.com>

commit 638622f37dcc4ef4b36dcabfd3d7a695fdb7d4c9
Author: Rohit Kumar Srivastava <srivastava.141@osu.edu>
Date:   Mon Jun 29 14:36:42 2020 -0700

    Improve performance of broadcast_axis on CPU (#17882)

    * adding comments explaining code optimizations

    * fixing broadcast_axis kernel to int32

    * fixing slice_axis kernel to int32

    * combining CPU and GPU implementation method signatures and cleaned up
    code

    * adding new broadcast_axis to np_matmul

    Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>

commit becb9ca694f51fdc0583d58429ccc943e6462810
Author: Sheng Zha <szha@users.noreply.github.com>
Date:   Mon Jun 29 12:16:16 2020 -0700

    Remove mention of nightly in pypi (#18635)

commit b12abbfb356be93f8c24d427c72448f91d1980ec
Author: ciyong <ciyong.chen@intel.com>
Date:   Mon Jun 29 11:14:34 2020 +0800

    Enhance license checker to cover multiple license header and md files (#18633)

commit d6c35785a870ac6e0b42903d7e27de2c9a6efdbe
Author: Shuai Zheng <szhengac@users.noreply.github.com>
Date:   Sat Jun 27 13:25:03 2020 -0700

    Add LANS optimizer (#18620)

    * add lans optimizer

    * fix

    * fix

    Co-authored-by: Zheng <shzheng@a483e789dd93.ant.amazon.com>

commit 8ee460077b8e8f2d7a1dd96efca1751fc337cb63
Author: Yang Shi <yangshia@amazon.com>
Date:   Fri Jun 26 11:22:15 2020 -0700

    fix contrib interleaved_matmul_selfatt_valatt not render correctly (#18621)

commit ecbda07c7bf8ce671744f0e9d361a1e8b5b744da
Author: Yang Shi <yangshia@amazon.com>
Date:   Thu Jun 25 11:11:00 2020 -0700

    fix julia api redirect (#18613)

commit c9dcdd11853e8600879615c8d8be0aa5cdf851cf
Author: Yang Shi <yangshia@amazon.com>
Date:   Thu Jun 25 11:02:09 2020 -0700

    add version check on installation guide (#18587)

commit e4c93e3e3a68559cb38e4ff92c9e0bf9c9cdd0bf
Author: Shuai Zheng <szhengac@users.noreply.github.com>
Date:   Wed Jun 24 22:03:39 2020 -0700

    add epsilon to adamax (#18532)

    Co-authored-by: Ubuntu <ubuntu@ip-172-31-92-136.ec2.internal>

commit 3f555f850f4eef897bbafcb61df726491954ffbb
Author: Leonard Lausen <lausen@amazon.com>
Date:   Wed Jun 24 19:41:34 2020 -0700

    Update disclaimer wording (#18616)

commit 1fcc7ea8b8f5dfebd3f5440ffe9e0c7d4b13b90f
Author: RuRo <andrey.stotskiy@tevian.ru>
Date:   Wed Jun 24 12:03:20 2020 +0300

    use new mxnet.gluon.block APIs (#18601)

commit acf2d27efe583ceb0f6b5253f0ac78ad6bf00e8e
Author: acphile <phile_999@126.com>
Date:   Wed Jun 24 10:25:44 2020 +0800

    Update tutorials (#18609)

    Update docs according to new Block APIs (#18413)

commit 4b86c32832a994e76b97dfc58c8a672db87e721d
Author: mk-61 <56651474+mk-61@users.noreply.github.com>
Date:   Tue Jun 23 13:49:06 2020 -0700

    Allow input reordering duing Gluon / CachedOp graph transformations (#17949)

    * Initial commit of input reordering in Gluon

    * Add test for Gluon input reorder

    * Fix backward in CachedOp for input reordering

    * Fix test_input_reorder for backward pass

    * Fix merge error in NaiveCachedOp

    * Include correct header for std::iota

    Co-authored-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

commit 74fcb9938a14ec80f0c690b5a58a700537a621c5
Author: Yang Shi <yangshia@amazon.com>
Date:   Mon Jun 22 18:54:05 2020 -0700

    redirect api reference on v-master to v1.6 (#18607)

    * redirect api reference on v-master to v1.6

    * update R docs

commit 56cfd9c272e81988682db6fde1b9205becc6a235
Author: Ram Rachum <ram@rachum.com>
Date:   Mon Jun 22 21:23:04 2020 +0300

    Use chain.from_iterable in artifact_repository.py (#18578)

commit 2fbec60e0da8832d71f7e3f93d4407dbca745e51
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date:   Sun Jun 21 23:02:13 2020 -0700

    graph executor c api removal  (#18598)

    * add default ctx to cachedop fwd

    * add test

    * perl fix

    * initial commit

    * update sparse tests

    * add aux_states

    * fix aux-state type

    * fix some tests

    * fix check symbolic forwrad/backward

    * fix symbolic grad check

    * arg_dict fixes

    * support init ops

    * support forward only graph

    * fix check symbolic backward stype

    * add missing file

    * replace extension test bind

    * replace bind with _bind

    * simplify backward_mul implementation

    * small fix

    * drop contrib.sparseembedding

    * remove simple_bind in test sparse ops

    * use simple_bind

    * replave simple bind in quantization

    * fix aux index

    * update amp simple_bind calls

    * drop ifft

    * fix a bug found in subgraph op

    * add aux_array method

    * replace symbols

    * minor fix

    * fix executor default context

    * fix import

    * bug fix for nd.where

    * add subgraph test

    * fix forward grad req

    * fix batch dot dtype

    * remove unused code

    * fix slice dtype

    * fix attach grad

    * remove tests for non-existing sparse ops

    * MXCachedOpGetOptimizedSymbol

    * fix foreach test

    * enhance err msg

    * skip failed test

    * add docs

    * add docs

    * fix lint

    * fix lint, remove quantization

    * fix lint

    * fix lint

    * fix lint

    * fix build and import

    * fix import

    * remove scala, R, julia, perl bindings

    * remove cpp, matlab bindings

    * fix perl call

    * fix test

    * remove perl binding

    * remove reshape test

    * fix profiler, trt

    * remove tensorrt test

    * remove quantization tests

    * fix import

    * fix conflcit

    * fix lint

    * skip buggy test

    * remove clojure

    * remove executor c api

    * remove amalgamation

    * fix build

    * move executor folder

    * fix import

    * fix lint

    * fix cpp pcakge

    * fix predict cpp

    * fix cpp make

    * remove jnilint

    * remove cpp package tset

    * remove julia test pipeline

    * disable numpy tests

    * disable compat test for delete

    Co-authored-by: EC2 Default User <ec2-user@ip-172-31-81-80.ec2.internal>
    Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

commit c1098aa33d6795f84a19601d0319d5bb8e19f317
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date:   Sat Jun 20 14:49:58 2020 -0700

    Switch to cached op in the testing suite (#18579)

    * add default ctx to cachedop fwd

    * add test

    * perl fix

    * initial commit

    * update sparse tests

    * add aux_states

    * fix aux-state type

    * fix some tests

    * fix check symbolic forwrad/backward

    * fix symbolic grad check

    * arg_dict fixes

    * support init ops

    * support forward only graph

    * fix check symbolic backward stype

    * add missing file

    * replace extension test bind

    * replace bind with _bind

    * simplify backward_mul implementation

    * small fix

    * drop contrib.sparseembedding

    * remove simple_bind in test sparse ops

    * use simple_bind

    * replave simple bind in quantization

    * fix aux index

    * update amp simple_bind calls

    * drop ifft

    * fix a bug found in subgraph op

    * add aux_array method

    * replace symbols

    * minor fix

    * fix executor default context

    * fix import

    * bug fix for nd.where

    * add subgraph test

    * fix forward grad req

    * fix batch dot dtype

    * remove unused code

    * fix slice dtype

    * fix attach grad

    * remove tests for non-existing sparse ops

    * MXCachedOpGetOptimizedSymbol

    * fix foreach test

    * enhance err msg

    * skip failed test

    * add docs

    * add docs

    * fix lint

    * fix lint, remove quantization

    * fix lint

    * fix lint

    * fix lint

    * fix build and import

    * fix import

    * fix perl call

    * fix test

    * remove perl binding

    * remove reshape test

    * fix profiler, trt

    * remove tensorrt test

    * remove quantization tests

    * fix import

    * fix conflcit

    * fix lint

    * skip buggy test

    Co-authored-by: EC2 Default User <ec2-user@ip-172-31-81-80.ec2.internal>
    Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

commit c1b96f562f55dfa024ac941d7b104f00e239ee0f
Author: Leonard Lausen <lausen@amazon.com>
Date:   Fri Jun 19 14:46:27 2020 -0700

    cmake: x86 options only on x86 and remove manual specification on CI (#18588)

    Use CMAKE_SYSTEM_PROCESSOR to detect target architecture and make x86 related
    options available only when compiling for x86. Remove the code turning these
    options manually off on CI.

    Remove ANDROID cmake option which was used to decide if -lpthread needs to be
    specified explicitly (on most Linux systems) or not (on Android). Instead
    auto-detect the behavior.

commit 041bd3016375c6bdadddc9e9f43655923ee739bf
Author: RuRo <andrey.stotskiy@tevian.ru>
Date:   Fri Jun 19 21:56:05 2020 +0300

    [MXNET-889] Implement ONNX export for gluon LSTM. (#17734)

    * implement onnx translations for _full type nodes

    * implement onnx translations for _rnn_param_concat

    * implement onnx translations for RNN (LSTM mode)

    * implement node export unittest for gluon.LSTM

commit bf0753702b37cc932baf417be2af2e7abe034bab
Author: Manu Seth <22492939+mseth10@users.noreply.github.com>
Date:   Fri Jun 19 10:20:55 2020 -0700

    Link GluonCV object detection tutorial for Jetson (#18530)

    * add object detection tutorial for Jetson

    * adding GluonCV in title

    * cross reference gluoncv turorial

commit cb54a4a99463b23b8abaa2629661954c4ba3c60b
Author: acphile <phile_999@126.com>
Date:   Fri Jun 19 14:31:08 2020 +0800

    Simplify mxnet.gluon Block APIs (#18413)

    ## Motivations
    Currently the implementation of mxnet.gluon.block is not so pythonic and there are many redundancies

    ### 1. overlaps between Block._params and Block._reg_params
    when we want to self-define a model, we currently need to use the code as follows:
    ```
    class Net(nn.HybridBlock):
        def __init__(self, **kwargs):
            super(HybridNet, self).__init__(**kwargs)
            with self.name_scope():
                self.hidden1 = nn.Dense(256, activation='relu')
                self.a=self.params.get('a', shape=(1, ))
    ```
    There are several shortcomings when using this form of registration:
    a. Adding parameter ‘a’ leads to it being recorded twice, in both self._params and self._reg_params, which is redundant. There is also a discrepancy in Block:
       i. in the method “collect_params”, we use “_params” to get all parameters,
       ii. while in the method “_collect_params_with_prefix” (and the method “load_parameters” accordingly), we use “_reg_params” to get all parameters.
    b. Currently, if we do not use “with self.name_scope():” for child blocks, it leads to wrong name scopes. In the following example, we actually cannot get the parameters of self.hidden1 from the result of collect_params:
    ```
    class HybridNet(nn.HybridBlock):
        def __init__(self, **kwargs):
            super(HybridNet, self).__init__(**kwargs)
            self.hidden1 = nn.Dense(256, activation='relu')
            with self.name_scope():
                self.hidden2 = nn.Dense(10, activation='relu')

        def hybrid_forward(self, F, x):
            x = self.hidden2(self.hidden1(x))
            return x

    >>> net = HybridNet()
    >>> net.initialize()
    >>> print(net.collect_params())
    hybridnet0_ (
      Parameter dense0_weight (shape=(256, -1), dtype=float32)
      Parameter dense0_bias (shape=(256,), dtype=float32)
      Parameter hybridnet0_dense0_weight (shape=(10, -1), dtype=float32)
      Parameter hybridnet0_dense0_bias (shape=(10,), dtype=float32)
    )
    ```
    From the above example we can also find that the parameter names are not related to the attributes’ names, which is not straightforward.

    In all, we find that using name_scope and ParameterDict is not user-friendly. Thus we plan to remove such redundancies and simplify the definitions of child blocks and parameters, like:
    ```
    class Net(nn.HybridBlock):
        def __init__(self, **kwargs):
            super(HybridNet, self).__init__(**kwargs)
            self.hidden1 = nn.Dense(256, activation='relu')
            self.a=gluon.parameter.Parameter(name="a", shape=(1, ))
    ```

    ### 2. parameter sharing
    Currently, we use the parameter “params” in the definition of Block for parameter sharing. It means that before the __init__ of Block, shared parameters are already recorded in self._params.shared. And currently Block forbids overriding parameters.
    We think that this is not convenient. The most common way to share a parameter is what PyTorch does:
    ```
    self.hidden1.weight=self.hidden2.weight
    ```
    But note that in the case where we have a HybridBlock and the block has been hybridized, then we shouldn't allow overriding the parameter but ask the user to unhybridize the Block first.
    To further allow sharing parameters recursively, we plan to add an API:
    ```
        def share_parameters(self, params : Dict):
    ```
    We plan to use the structured based form (like what is used in “_collect_params_with_prefix()”) to represent each parameter recursively. For example, we denote “self.hidden1.weight” as “hidden_weight”

    In all, we plan to make the following improvements:

    1. remove the parameters “prefix” and “params” in the “__init__” function.
    2. remove the use of self._params(ParameterDict) in Block
    3. allow parameter attribute overriding in non-hydridization case.
    4. add the method “share_parameters" to recursively share parameters in children blocks.

    ## Parameter naming
    Once a parameter is created, `param.name` will not be changed by subsequent operations. It is of the form `param_{uuid4}_{name}`, where `name` comes from the `__init__` argument. Here `name` is optional and defaults to `weight`. It is mainly used to denote which default initialization should be used.
    We use `param.name` as the name of a parameter's symbol representation.
    ## collect_params()
    It returns a `dict`, where the keys are structural names of parameters, like
    `{'hidden1.weight': Parameter (shape=(3, -1), dtype=float32), 'hidden1.bias': Parameter (shape=(3,), dtype=float32)}`
    Note that we use `.` as the linking character again because the structured based naming scheme is no longer used in the symbol representation.

    ## Save and Load
    For `HybridBlock`, there are two ways to save and load parameters:
    ### save_parameters() and load_parameters()
    In `save_parameters()`, we use `structural name` to save parameters, and they should be loaded by `load_parameters()`, which loads parameters based on a model's structure.
    ### HybridBlock.export and SymbolBlock.imports
    In `export`, we only save parameters using `param.name` without `structural name`. The param file should be loaded in SymbolBlock.imports.
    ## SymbolBlock
    When using `SymbolBlock.imports`, keys in `self.param` would be the loaded parameters' names `param.name`.
    While in `SymbolBlock(outputs, inputs, params=None)`, if you provide like `params=net.collect_params()`,  keys in `self.param` would be structural names of `net`'s parameters (keys in net.collect_params() ). It is often used in this situation that a `SymbolBlock` is a children block of another `HybridBlock`. Otherwise, keys in `self.param` would be the loaded parameters' names `param.name`.

commit 55856066b4b6242f233cc31da8970c91f06d4bc0
Author: ciyong <ciyong.chen@intel.com>
Date:   Fri Jun 19 06:23:07 2020 +0800

    Add KEY for Ciyong Chen (#18577)

commit e96fbeb3adb78d4300f5f10cc22531583914e590
Author: Leonard Lausen <lausen@amazon.com>
Date:   Thu Jun 18 15:20:14 2020 -0700

    Update cmake/upstream/FindCUDAToolkit.cmake (#18528)

    Previously MXNet includes a hotfix for a cross-compiling bug in upstream FindCUDAToolkit.cmake. Upstream has fixed the bug now in their master branch. Replace MXNet's fix by the upstream fix to avoid diverging from upstream.

    See https://gitlab.kitware.com/cmake/cmake/-/issues/20572

commit 14aeb384a51c9e420c349f42cea001f0a5ef5dfe
Author: RuRo <andrey.stotskiy@tevian.ru>
Date:   Fri Jun 19 01:16:12 2020 +0300

    Add parameter name to AssertionError for deferred shape inference (#18537)

commit 9591436967347cc8e34a01e126b696b3447f8081
Author: Johannes Czech <QueensGambit@users.noreply.github.com>
Date:   Thu Jun 18 07:33:08 2020 +0200

    [Numpy] Bugfix of slice operator export (MXNet to ONNX) v2 (#18535)

    * fixed get_inputs() for onnx slice operator export

    * added unit test for onnx slice operator export

    * implement get_inputs with_shapes helper

    * update slice ops to use with_shapes

    * added verbose parameter for get_outputs()

    Co-authored-by: Andrey Stotskiy <andrey.stotskiy@tevian.ru>

commit 92971b822dd0151aadba965c0c6b8b22cb82bf76
Author: Neutron3529 <qweytr_1@163.com>
Date:   Thu Jun 18 13:30:10 2020 +0800

    fix misbehave of KLDivLoss (#18423)

    * fix misbehave of KLDivLoss

    In the current version of KLDivLoss, the return value is not the same value as that calculated by SoftmaxCrossEntropyLoss, which is not documented. It may be due to an incorrect setting that uses mean rather than sum when computing the return value.
    I provide a fix for this setting, which keeps the return values of `KLDivLoss` and `SoftmaxCrossEntropyLoss` almost the same when `from_logits=False` and `sparse_label=False` are set for these functions respectively.
    Now the behavior of KLDivLoss is exactly what the documentation says.
    ```
    import mxnet as mx
    a=mx.nd.array([[-1,1],[1,-1]])
    b=mx.nd.array([1,0]).one_hot(2)
    TrueLoss=mx.gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False)
    FalseLoss=mx.gluon.loss.KLDivLoss(from_logits=False)
    c=TrueLoss(a,b)
    d=FalseLoss(a,b)*a.shape[-1]
    assert((c-d).abs().sum()==0 and a.shape[-1]==2)
    ```

    * update sdml loss

    the current version of SDMLLoss told us to multiply by the number of labels, but it actually multiplies by `batch_size`. After this PR, there is no need to multiply by `batch_size` or by the number of labels any more.

    * remove outdated comment

commit b9118d9bfa0b34307c53456ea6af3927e57b8635
Author: Yang Shi <ys2843@nyu.edu>
Date:   Wed Jun 17 13:00:04 2020 -0700

    fix contribute page anchor position shifted (#18571)

    Co-authored-by: Yang Shi <yangshia@amazon.com>

commit eddd27d375ee403a026e3262264485c83161787f
Author: Yang Shi <ys2843@nyu.edu>
Date:   Wed Jun 17 11:59:41 2020 -0700

    add FAQ redirect rules (#18552)

    Co-authored-by: Yang Shi <yangshia@amazon.com>

commit 103d839aa8477419ddc82f09e2ddb246e24a8d3d
Author: Manu Seth <22492939+mseth10@users.noreply.github.com>
Date:   Tue Jun 16 16:52:46 2020 -0700

    Test CD mxnet_lib/static and python/pypi stages on CI (#18559)

    * add cd mxnet_lib/static stages to ci

    * add cd pypi packaging stage to ci

    * removing existing cmake static compile stages in favor of other added stages

    * pass mxnet_variant correctly

commit 8039377e6630bcb00c5a95abdaf0851803686bc6
Author: JiangZhaoh <54654391+JiangZhaoh@users.noreply.github.com>
Date:   Wed Jun 17 01:45:30 2020 +0800

    add op npx.index_update (#18545)

    * add op npx.index_update

    * remove debug comment

    * change eps

    * fix stupid error

    * add blank line in docs

    * gpu temporary space request alignment

    * fix test error

    Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-85.us-west-2.compute.internal>

commit 72a54e7a5f427dc73fbd1cb826ff944d9aa82573
Author: andevellicus <762254+andevellicus@users.noreply.github.com>
Date:   Mon Jun 15 22:13:13 2020 -0400

    Julia: fix deprecation in visualize.jl (#18515)

    * Update visualize.jl

    matchall has been deprecated as of Julia 1.3. Changes made to fix.

    * Cleaned

    * Update julia/src/visualize.jl

    * Update julia/src/visualize.jl

    Co-authored-by: Iblis Lin <iblis@hs.ntnu.edu.tw>

commit e8fce62b369dac627dec23d730661624ec79b957
Author: Manu Seth <22492939+mseth10@users.noreply.github.com>
Date:   Mon Jun 15 18:42:51 2020 -0700

    Skip flaky test_gpu_memory_profiler_gluon on cd pipeline (#18565)

commit 1b02225fefd8ccc93bc73223f0d3cde103fad661
Author: Chaitanya Prakash Bapat <chai.bapat@gmail.com>
Date:   Mon Jun 15 11:45:03 2020 -0700

    Add comments to init.py (#18327)

commit cc6c64909afd78c6b5b63ee1215922e8da589c20
Author: Chaitanya Prakash Bapat <chai.bapat@gmail.com>
Date:   Mon Jun 15 08:55:14 2020 -0700

    [OpPerf] Add example of using opperf with internal op locally (#18324)

    * add example of using opperf with internal op locally

    * split diff to old and new code for readability

    * mx.nd.copyto doesnt exist & website title shows ndarray instead of symbol

    * Revert "mx.nd.copyto doesnt exist & website title shows ndarray instead of symbol"

    This reverts commit 118b0900a58586aca84ec5c853d00cf687615853.

commit af1b45ba3590b21014c55c58838c3e04b3f2cea3
Author: Chaitanya Prakash Bapat <chai.bapat@gmail.com>
Date:   Sun Jun 14 22:45:57 2020 -0700

    Create config.yml (#18553)

    Add options for stackoverflow and discuss to issue_template & disable blank issue

commit da252734c70164a0983404de076464ba7a526a60
Author: Manu Seth <22492939+mseth10@users.noreply.github.com>
Date:   Sat Jun 13 18:30:29 2020 -0700

    remove dependency on train_mnist.py script (#18550)

    * remove dependency on train_mnist.py script

    * remove image classification tests from nightly

commit 09cf48a24682e308b552a7fa70a816c024308438
Author: Leonard Lausen <lausen@amazon.com>
Date:   Sat Jun 13 16:31:59 2020 -0700

    Use correct array type for outputs in HybridBlock.forward (#18554)

commit f1f3f44166e2e47afad6c65025fb48dd47efeb65
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date:   Sat Jun 13 10:10:25 2020 -0700

    Remove the deprecated BatchNorm_v1 op (#18538)

    * remove batchnorm_v1

    * fix gpu build

    Co-authored-by: EC2 Default User <ec2-user@ip-172-31-81-80.ec2.internal>
    Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

commit 97d4ba5a133f93ff6075dcde3ef842b23d498a12
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date:   Fri Jun 12 16:52:47 2020 -0700

    Remove XXOutput loss operators  (#18531)

    * remove xxOutput operators used in Module

    * remove SVMOutput

    * remove RegressionOutput in language binding

    * remove more examples

    * fix scala, perl

    * remove spark examples

    * remove softmaxoutput op

    * remove more tests

    * remove more SoftmaxOutput related code

    * remove MAERegression

    * remove symbol.Softmax

    * fix perl test count

    * fix failing tests

    * remove mlp cpu test

    * fix scala test

    * remove tests/examples relying on imagenet-1k pretrained symbolic models

    * fix scala build

    * remove MultiTaskSuite for scala

    * fix cpp build

    * fix scale, clojure test

    * fix scala and python test

    * fix scala and clojure test

    * remove clojure test

    * remove clojure test

    * remove test_forward for python

    * remove clj viz test

    * remove viz tests

    * remove clj tutorail test

    * remove bert test

    * remove clj tests

    * remove clj multi-label test

    * remove module mlp test for clh

    * remove module test for clj

    * rm ./contrib/clojure-package/test/org/apache/clojure_mxnet/ndarray_api_test.clj

    * remove clj tests

    * rm test_mkldnn_model

    Co-authored-by: EC2 Default User <ec2-user@ip-172-31-81-80.ec2.internal>
    Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

commit 1bf881f381f91b157a26d9beddcaa8f4960cc038
Author: Yang Shi <ys2843@nyu.edu>
Date:   Thu Jun 11 14:01:17 2020 -0700

    Fix Slow Site Loading Speed part2 (#18512)

    * host JQuery locally

    * defer time consuming scripts

    * defer more render-blocking script

    * move general version dropdown css from head to scss

    * update quotation mark

    * add cache control

    * add licenses info to jquery

    * remove jquery from github

    # Conflicts:
    #	docs/static_site/src/assets/js/jquery-3.3.1.min.js

    * load jquery based on env

    * update wget jquery command

    Co-authored-by: Yang Shi <yangshia@amazon.com>

commit a361f33497c8e87a4eab48a666fcb4a586a607b1
Author: Manu Seth <22492939+mseth10@users.noreply.github.com>
Date:   Thu Jun 11 09:17:44 2020 -0700

    revert changes causing cd failures (#18533)

    Reverting the following changes to cd_unittest_ubuntu causing CD pipeline failures:

        The first change was using Naive Engine for operator tests, which causes timeout failures in CD
        Added here: 10b6b48

        Second change was running integrationtest_ubuntu_gpu_byteps as part of cu* CD tests, added here: e28e9fe

commit 743bbcbc7c8c85661a146d94ebd3196306650677
Author: Yijun Chen <chenyijun0902@gmail.com>
Date:   Thu Jun 11 23:22:56 2020 +0800

    unify impl (#18523)

commit fb73de7582de4e622299a4ad045e25f771568193
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date:   Wed Jun 10 19:54:25 2020 -0700

    remove mx.module.* APIs for MXNet 2.0 (#18525)

    * remove Module tests

    * remove APIs relying on module

    * remove docs and tools using mx.module

    * remove executor manager

    * remove ssd and ncf examples

    * add back grad compression api doc

    * fix lint

    * add back cpredict exmaple

    * fix resnet memory test

    * remove tests

    * remove tests/python/tensorrt/test_tensorrt_lenet5.py since it depends on a model traiend by mx.Module

    * skip flaky test

    * fix quantization test

    * remove subgraph tests

    Co-authored-by: EC2 Default User <ec2-user@ip-172-31-81-80.ec2.internal>
    Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

commit 26f44b71d8de84bbc88af496ae0aeb7ce535312d
Author: Serge Panev <spanev@nvidia.com>
Date:   Wed Jun 10 10:41:50 2020 -0700

    Add backward Type inference to main NN operators (#18378)

    * Add backward Type inference to main DNN operators

    Signed-off-by: Serge Panev <spanev@nvidia.com>

    * Add comments

    Signed-off-by: Serge Panev <spanev@nvidia.com>

commit b6b40878f0aba2ba5509f3f3a4cd517a654847ce
Author: Leonard Lausen <lausen@amazon.com>
Date:   Tue Jun 9 22:05:16 2020 -0700

    Consolidate installation instructions on website and add disclaimer for non-ASF ressources (#18487)

    * Update website with disclaimer for non-ASF ressources

    * Integrate Windows instructions to build_from_source.md

    * Remove master version from selector

    * Update Download links

    * Update get_started/download.md per Release Download Page policy

commit cf3984bf5c67cb7d1feeb5b3cb55a41ca995e5c8
Author: Yiyan66 <57363390+Yiyan66@users.noreply.github.com>
Date:   Wed Jun 10 05:56:13 2020 +0800

    [numpy] fix op repeat with list input (#18371)

    * except .h

    * except storage

    * repeat

    * change fwd

    * delete

    * codecov

    Co-authored-by: Ubuntu <ubuntu@ip-172-31-18-97.us-east-2.compute.internal>

commit 028d01d5fb4867988a5ca50634562c1f4e75ca6f
Author: Sam Skalicky <samskalicky@gmail.com>
Date:   Mon Jun 8 10:42:09 2020 -0700

    Drop list support in optimize_for (#18483)

    * initial commit

    * fixed typos

    * changed warning to exception

    * updated subgraph_op unittests

commit 2d58ff5512e27e7a12ae9c9335d2554ee0b2bc1f
Author: JackieWu <wkcn@live.cn>
Date:   Tue Jun 9 01:41:35 2020 +0800

    [Bug Fixed] Fix batch norm when grad_req is `add` (#18500)

    * fix batch norm when fix_gamma is True

    * support gradient accumulation for batch norm

    * mkldnn batchnorm support grad add

    * unittest for bn

    * fix bn arg

    * fix lint

    * fix mkldnn

    * fix mkldnn bn

    * fix grad when fixing gamma

    * fix naive gpu bn

    * fix lint

    * fix cudnn bn

    * fix flag

    * fix lint

    * fix testcase

    * fix

    * use @pytest.mark.parametrize

    * combination

    * remove redundant test in batchnorm

    * npx.batch_norm test

    * try to fix test

    * reduce the number of tests for batchnorm

    * fix

commit 992ed3c1ea449fdb1f4f7010dfd05d00ae88a020
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date:   Mon Jun 8 10:39:56 2020 -0700

    remove mx.rnn APIs (#18507)

    * remove mx.rnn APIs

    * fix test

    * update test

    Co-authored-by: Ubuntu <ubuntu@ip-172-31-37-108.ec2.internal>
    Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

commit e3493e7b47ddcaa6974280ee432c82eb89d0f756
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date:   Sun Jun 7 18:20:46 2020 -0700

    remove tools dependent on mx.module APIs (#18508)

    * remove tools depending on mx.module

    * remove caffe converter and coreml tools

    Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

commit 5df002567dd2e9ebcfeb620a9ba55adbded743da
Author: Przemyslaw Tredak <ptredak@nvidia.com>
Date:   Fri Jun 5 19:55:06 2020 -0700

    Fix race condition in FusedOp (#18498)

commit a1db5b29451938e84ade0e768c3b93b8fd71ad15
Author: Leonard Lausen <lausen@amazon.com>
Date:   Fri Jun 5 16:40:22 2020 -0700

    Update .codecov.yml (#18497)

commit 644b69d01e5b037c3d7b0bd61d282f406c01b759
Author: Mosalam Ebrahimi <hesham.ebrahimi@gmail.com>
Date:   Fri Jun 5 13:52:01 2020 -0700

    Fix typo (#18496)

commit deae9b88c1724e056a4e7dc21f04b58c28304111
Author: RuRo <andrey.stotskiy@tevian.ru>
Date:   Fri Jun 5 23:18:16 2020 +0300

    Fix tests for ONNX version 1.5.0 bump (#18054)

    * implement onnx translation helpers

    * bump onnx version to 1.5

    * add export only test cases for topk and slice_axis
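
    For context, the export entry point these test cases exercise (file names hypothetical):

    ```python
    import numpy as np
    from mxnet.contrib import onnx as onnx_mxnet

    # Hypothetical checkpoint files; exports a saved symbol/params pair to ONNX,
    # the path the new export-only topk and slice_axis tests go through.
    onnx_file = onnx_mxnet.export_model('model-symbol.json', 'model-0000.params',
                                        [(1, 3, 224, 224)], np.float32,
                                        'model.onnx')
    ```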

commit 4be095500de74ff95ed18ebdf695eae171375818
Author: ciyong <ciyong.chen@intel.com>
Date:   Sat Jun 6 03:44:04 2020 +0800

    Julia: remove downloading of the non-ASF binary build (#18489)

commit 24d88a2cdec3e0ab8f4fe0e436eb0015e9ccfd47
Author: Manu Seth <22492939+mseth10@users.noreply.github.com>
Date:   Fri Jun 5 09:45:31 2020 -0700

    Update Jetson installation guide (#18485)

    * add config Makefile for jetson

    * modify jetson install guide

commit 7054e42c0786a2b8223b5183b852f68e72822a76
Author: Manu Seth <22492939+mseth10@users.noreply.github.com>
Date:   Fri Jun 5 09:40:44 2020 -0700

    Add image classification tutorial for jetson (#18434)

    * add image classification tutorial for jetson

    * update code to use gluon model zoo; update doc

    * referencing MXNet official website for Jetson installation guide

commit a156ed8e37e17f79cf0383dd9b0e1427309ad127
Author: Yang Shi <ys2843@nyu.edu>
Date:   Fri Jun 5 09:38:02 2020 -0700

    Revert installation dropdown change (#18488)

    This broke the version selector.

    Co-authored-by: Yang Shi <yangshia@amazon.com>

commit b07152244c311b9270b448b6629f8ae470f3fab1
Author: Leonard Lausen <lausen@amazon.com>
Date:   Thu Jun 4 17:44:52 2020 -0700

    Update website instructions for compiling for / on Raspberry Pi. (#18472)

    * Update ci/README.md

    * Update raspberry pi instructions

commit e28e9fec9bba07708ed0093c882b8070a96dfdd5
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date:   Thu Jun 4 14:20:52 2020 -0700

    BytePS trainer + tests (#18032)

    * [MXNET-#16795] Byteps-KVStore: Integrate Byteps into mxnet as a new type of kvstore backend (#17555)

    * Add Byteps backend for kvstore

    * Add a temp launcher for byteps backend

    * adapt the test for the byteps kvstore.

    * final workable test

    * Remove leftover prints and logs

    * correct comment

    * add hostfile for ci test

    * add ci test for byteps kvstore

    * add visible devices for byteps-kvstore ci test

    * add licenses for tools/byteps_launcher.py

    * syntax error

    * pylint error (remove unused import like logging)

    * pylint error

    * pylint error

    * enable launching without hostfile (local byteps)

    * 1. rename byteps_kvstore.py to byteps.py; 2. shorten the launch option  to ; 3. add instruction for -H and -SH options for launch; 4. add documentation for byteps kvstore in kvstore/base.py: create(name='local')

    * edit documentation of KVStoreBase::is_capable(capability); return false for BytePS(KVStoreBase):is_capable(any).

    * pylint error

    * remove an error of arg.byteps

    * use --env option to set workers' environment

    * error in byteps-launcher.py

    * remove the unintended edit in runtime_functions.sh

    * disable cpu support for byteps kvstore.

    * 1. format the document to avoid Julia doc build errors;
    2. small change to nightly test;
    3. add byteps copyright declaration in byteps_launcher.py
    4. if args.byteps == True ===> if args.byteps

    * remove the --scheduler_ip and --scheduler_port options in launch.py

    * 1. maintain the original value of broadcast and pushpull
    2. optimize when out = value or [out]=value
    3. add some missing documentation to avoid doc build errors.

    * Add bytePS to CI

    * add dependency

    * +integrationtest_ubuntu_gpu_byteps

    * add byteps pipeline

    * disable a few tests

    * remove more tests

    * fix permission

    * remove apt-get

    * fix python path

    * improve logging

    * fix prints

    * add back CI

    Co-authored-by: Ubuntu <ubuntu@ip-172-31-39-16.ec2.internal>
    Co-authored-by: Piyush Ghai <ghai.8@osu.edu>
    Co-authored-by: eric-haibin-lin <linhaibin.eric@gmail.com>
    Co-authored-by: eric-haibin-lin <--global>
    Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

    * fix byteps logging and declare tensor

    * check exceptions and return -1

    * print logging in CI

    * Update byteps.py

    * Update runtime_functions.sh

    * add numa dependency

    * pin dependency

    * Update runtime_functions.sh

    * Update Dockerfile.build.ubuntu

    * Update runtime_functions.sh

    * Update runtime_functions.sh

    * Update runtime_functions.sh

    * Update runtime_functions.sh

    * Update Jenkins_steps.groovy

    * remove launcher. use bpslauncher instead.

    Co-authored-by: Chaokun Chang <33217209+ChaokunChang@users.noreply.github.com>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-39-16.ec2.internal>
    Co-authored-by: Piyush Ghai <ghai.8@osu.edu>
    Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-37-108.ec2.internal>
    Co-authored-by: EC2 Default User <ec2-user@ip-172-31-81-80.ec2.internal>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-57-164.ec2.internal>
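
    Assuming the backend registers under the name 'byteps' like other kvstore types (an inference from the bullets above, not confirmed here), usage would look roughly like:

    ```python
    import mxnet as mx
    from mxnet import gluon

    net = gluon.nn.Dense(10)
    net.initialize()
    # 'byteps' as the kvstore name is an assumption based on the commit messages
    # above; BytePS itself must be installed and launched separately.
    kv = mx.kv.create('byteps')
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': 0.1}, kvstore=kv)
    ```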

commit 7cc6700fdd5e9f6837389155b63c2911652d2c91
Author: Yang Shi <ys2843@nyu.edu>
Date:   Thu Jun 4 13:29:08 2020 -0700

    Add Developer Guide Docs to MXNet Website (#18474)

    * init dev guide

    * move dev guide above FAQ

    * update format and images

    * hoist git docs and fix styles

    * use relative urls

    * remove useless code block

    * use consistent url and file name

    * update heading

    * add apache license header

    * init dev guide

    * move dev guide above FAQ

    * update format and images

    * hoist git docs and fix styles

    * use relative urls

    * remove useless code block

    * use consistent url and file name

    * update heading

    * add apache license header

    * update doc - git clone recursive

    * reviewing the dev guide - proof reading and text edits

    Co-authored-by: Yang Shi <yangshia@amazon.com>
    Co-authored-by: Talia Chopra <chopt@amazon.com>
samskalicky pushed a commit that referenced this pull request Aug 10, 2020
…o v1.7.x (#18676)

* [Improvement] Invoke mkldnn and cudnn BatchNorm when axis != 1 (#18504)

* fix batch norm when fix_gamma is True

* support gradient accumulation for batch norm

* mkldnn batchnorm support grad add

* unittest for bn

* fix bn arg

* fix lint

* fix mkldnn

* fix mkldnn bn

* fix grad when fixing gamma

* fix naive gpu bn

* fix lint

* invoke mkldnn and cudnn batchnorm when axis != 1

* backport 18500

* change condition

* fix

* fix

* add mkldnn_off for bn

* remove mkldnn_off

* recover save_000800.json

* cast

* remove  and fix flaky test

Co-authored-by: JackieWu <wkcn@live.cn>
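
A minimal NumPy sketch of the reduction behind "invoke mkldnn and cudnn batchnorm when axis != 1": reshape so the normalized axis becomes the channel dimension that the axis==1 kernels already handle (function name hypothetical):

```python
import numpy as np

def to_axis1_view(x, axis):
    # Collapse the dims before `axis`, keep `axis` as the channel dim, and
    # collapse the dims after it, giving a 4-D NCHW-like view.
    s = x.shape
    return x.reshape(int(np.prod(s[:axis], dtype=np.int64)), s[axis], 1, -1)

x = np.zeros((2, 3, 5, 7))
print(to_axis1_view(x, 2).shape)  # (6, 5, 1, 7)
```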
stu1130 added a commit to stu1130/incubator-mxnet that referenced this pull request Aug 10, 2020
…o v1.7.x (apache#18676)

* [Improvement] Invoke mkldnn and cudnn BatchNorm when axis != 1 (apache#18504)

* fix batch norm when fix_gamma is True

* support gradient accumulation for batch norm

* mkldnn batchnorm support grad add

* unittest for bn

* fix bn arg

* fix lint

* fix mkldnn

* fix mkldnn bn

* fix grad when fixing gamma

* fix naive gpu bn

* fix lint

* invoke mkldnn and cudnn batchnorm when axis != 1

* backport 18500

* change condition

* fix

* fix

* add mkldnn_off for bn

* remove mkldnn_off

* recover save_000800.json

* cast

* remove  and fix flaky test

Co-authored-by: JackieWu <wkcn@live.cn>
szha pushed a commit that referenced this pull request Aug 14, 2020
…o v1.7.x (#18676) (#18890)

* [Improvement] Invoke mkldnn and cudnn BatchNorm when axis != 1 (#18504)

* fix batch norm when fix_gamma is True

* support gradient accumulation for batch norm

* mkldnn batchnorm support grad add

* unittest for bn

* fix bn arg

* fix lint

* fix mkldnn

* fix mkldnn bn

* fix grad when fixing gamma

* fix naive gpu bn

* fix lint

* invoke mkldnn and cudnn batchnorm when axis != 1

* backport 18500

* change condition

* fix

* fix

* add mkldnn_off for bn

* remove mkldnn_off

* recover save_000800.json

* cast

* remove  and fix flaky test

Co-authored-by: JackieWu <wkcn@live.cn>

Co-authored-by: JackieWu <wkcn@live.cn>
chinakook pushed a commit to chinakook/mxnet that referenced this pull request Nov 17, 2020
…e#18504)

* fix batch norm when fix_gamma is True

* support gradient accumulation for batch norm

* mkldnn batchnorm support grad add

* unittest for bn

* fix bn arg

* fix lint

* fix mkldnn

* fix mkldnn bn

* fix grad when fixing gamma

* fix naive gpu bn

* fix lint

* invoke mkldnn and cudnn batchnorm when axis != 1

* backport 18500

* change condition

* fix

* fix

* add mkldnn_off for bn

* remove mkldnn_off

* recover save_000800.json

* cast