This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

flaky test: test_random.test_randint_generator #13446

Closed
zachgk opened this issue Nov 28, 2018 · 3 comments · Fixed by #13498

Comments

@zachgk
Contributor

zachgk commented Nov 28, 2018

test_random.test_randint_generator failed in the "Python 3: GPU Win" CI stage at http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13364/4/pipeline/1096/ on an unrelated PR, #13364. The error log is shown below:

======================================================================
FAIL: test_random.test_randint_generator
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py", line 173, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_random.py", line 878, in test_randint_generator
    verify_generator(generator=generator_mx_same_seed, buckets=buckets, probs=probs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\windows_package\python\mxnet\test_utils.py", line 1966, in verify_generator
    str(buckets), str(probs)))
AssertionError: Generator test fails, Chi-square p=[0.0039778006662674397, 0.0038008926958397801, 0.026007793216137099, 0.00574968961725508, 5.5082198954479213e-05], obs_freq=[array([201090, 200294, 199907, 200047, 198662]), array([199510, 199653, 200738, 201095, 199004]), array([200553, 199797, 200146, 200679, 198825]), array([201138, 199506, 200190, 200285, 198881]), array([200288, 200437, 199925, 201170, 198180])], expected_freq=[array([200000, 200000, 200000, 200000, 200000]), array([200000, 200000, 200000, 200000, 200000]), array([200000, 200000, 200000, 200000, 200000]), array([200000, 200000, 200000, 200000, 200000]), array([200000, 200000, 200000, 200000, 200000])].
buckets=[[-50000000, -40001980], [-40001980, -30003960], [-30003960, -20005940], [-20005940, -10007920], [-10007920, -9900]], probs=[0.2, 0.2, 0.2, 0.2, 0.2]
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1389836864 to reproduce.
--------------------- >> end captured logging << ---------------------
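The failure can be checked directly from the logged counts. Below is a minimal sketch, assuming `verify_generator` computes a Pearson chi-square goodness-of-fit p-value per trial (5 buckets, so 4 degrees of freedom) and raises when fewer than `nrepeat * success_rate` trials have p > `alpha`. The values `alpha=0.05` and `success_rate=0.2` are assumptions about the helper's defaults and do not come from this log. The sketch uses the closed-form chi-square tail for an even number of degrees of freedom, so only the standard library is needed, and the p-values it computes should match the ones in the AssertionError.

```python
import math

# Observed bucket counts copied from the CI log (5 trials x 5 buckets).
OBS = [
    [201090, 200294, 199907, 200047, 198662],
    [199510, 199653, 200738, 201095, 199004],
    [200553, 199797, 200146, 200679, 198825],
    [201138, 199506, 200190, 200285, 198881],
    [200288, 200437, 199925, 201170, 198180],
]
PROBS = [0.2] * 5


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with even df.

    For df = 2m the tail has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!
    """
    assert df > 0 and df % 2 == 0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def chi_square_p(obs, probs):
    """Pearson chi-square goodness-of-fit p-value with len(obs) - 1 df."""
    n = sum(obs)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    return chi2_sf_even_df(stat, len(obs) - 1)


def passes(p_values, alpha=0.05, success_rate=0.2):
    # ASSUMED pass rule: at least nrepeat * success_rate trials need p > alpha.
    successes = sum(p > alpha for p in p_values)
    return successes >= len(p_values) * success_rate


p_values = [chi_square_p(row, PROBS) for row in OBS]
print("p-values:", p_values)
print("passes:", passes(p_values))

# Standardized residual per bucket: (O - E) / sqrt(n * p * (1 - p)).
for row in OBS:
    n = sum(row)
    print([round((o - n * p) / math.sqrt(n * p * (1 - p)), 2)
           for o, p in zip(row, PROBS)])
```

All five p-values are below 0.05, so under the assumed rule no trial passes and the assertion fires. The residuals show that the last bucket, [-10007920, -9900], is below its expected count in every trial, by roughly 2.5 to 4.6 standard deviations; the sketch only reproduces the statistic and does not identify the cause.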
@zachgk
Contributor Author

zachgk commented Nov 28, 2018

@mxnet-label-bot add [Test, Flaky, Python]

@zachgk
Contributor Author

zachgk commented Nov 29, 2018

@zachgk
Contributor Author

zachgk commented Nov 29, 2018

ChaiBapchya added a commit to ChaiBapchya/mxnet that referenced this issue Nov 30, 2018
anirudh2290 pushed a commit that referenced this issue Nov 30, 2018
ChaiBapchya added a commit to ChaiBapchya/mxnet that referenced this issue Nov 30, 2018
sergeykolychev pushed a commit that referenced this issue Dec 5, 2018
…ile (#13478)

* updated to v1.5.0

* Bumped minor version from 1.4.0 to 1.5.0 on master

* added Anirudh as maintainer for R package

... adding something useful and re-trigger PR check

* Updated license file for clojure, onnx-tensorrt, gtest, R-package

* Get the correct include path in pip package (#13452)

* add find_include_path API

* address reviewer comment

* change return type from list to string

* add unit test

* address reviewer comment

* address reviewer comment

* address reviewer comment

* address reviewer comment

* fix include path problem in pip package

* add comment

* fix lint error

* address reviewer comment

* address reviewer comment

* Use ~/.ccache as default ccache directory so the cache is not erased on reboot (#13431)

* Skip flaky test #13446 (#13480)

* Rewrite dataloader with process pool, improves responsiveness and reliability (#13447)

* fix recordio.py

* rewrite dataloader with pool

* fix batch as tuple

* fix prefetching

* fix pylint

* picklable function

* use pickle

* add missing commit

* Fix errors in docstrings for subgraph op; use code directive (#13463)

* [MXNET-1158] JVM Memory Management Documentation (#13105)

* update train_mnist

* Add documentation for JVM Memory Management

* update doc

* address nit picks

* address nit picks

* Grammar and clarity edits for memory management doc

* Edits for scala memory management

* Update memory-management.md

* Update memory-management.md

* Update memory-management.md

* capitalization fix

* Update row_sparse tutorial (#13414)

Update row_sparse tutorial

* Add resiliency to onnx export code (#13426)

* Added resiliency to onnx export code

- With previous infer-shape implementation, if input shape was list instead of tuple or if extra non-existent parameters were provided, the code would still work. The fixes in this commit make sure that behavior is restored to prevent any compatibility issues with existing export code.

* Fixed name of net in unittest

* Fix pylint

* [MXNET-1185] Support large array in several operators (part 1) (#13418)

* fix a few operators with large arrays (# of elements)

* fix bug in broadcast_div and add tests

* address reviewer comment

* add unit test

* add empty line

* retrigger CI

* [MXNET-1210 ] Gluon Audio - Example (#13325)

* Initialized the example

* Addressed PR comments, about existing synset.txt file - no overwrite

* RST - docstring issues fixed

* added README

* Addressed PR comments

* Addressed PR comments, checking Divide by 0

* Raising error if format is not supported.

* changed a line for ndarray of labels

* Trigger CI

* Trigger CI

* PR comments addressed around skip_header argument

* Addressed PR comments around librosa import

* PR Comments

* Passing lazy=lazy from argument

* Added PR comments, labels to README.MD

* Trigger CI

* Addressing PR Comments in README

* Modified README.md

* Added example under audio folder

* Retrigger CI

* Retrigger CI

* ONNX export: Instance normalization, Shape (#12920)

* ONNX import/export: Make backend_rep common

* ONNX export: Instance Normalization

* ONNX export: Shape operator

* Clarify dependency on OpenCV in CNN Visualization tutorial. (#13495)

* clarify ops faq regarding docs strings (#13492)

* Add graph_compact operator. (#13436)

* add graph_compact.

* fix.

* add doc.

* add tests for graph_compact.

* address comments.

* update docs.

* trigger CI

* Deprecate Jenkinsfile (#13474)

* update github location for sampled_block.py (#13508)

Updated to https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py

* #13453 [Clojure] - Add Spec Validations to the Optimizer namespace (#13499)

* ONNX export: Logical operators (#12852)

* Fix cmake options parsing in dev_menu (#13458)

Add GPU+MKLDNN unittests to dev_menu

* Revert "Manually track num_max_thread (#12380)" (#13501)

This reverts commit 7541021.

* Feature/mkldnn static 2 (#13503)

* build mkldnn as static lib

* update makefile to statically build mkldnn

* build static mkldnn

* fix static name

* fix static name

* update static for mac

* rename mkldnn dep in ci

* remove moving mkldnn dynamic lib

* remove commented code

* remove mkldnn dynamic for unittest

* force static for mkldnn lib

* remove dynamic mkldnn bind

* only link windows

* add mkldnn.mk

* try force linking

* remove mkldnn dynamic check

* remove test mkldnn install

* fix spacing

* fix index

* add artifacts

* add comment about windows

* remove static

* update makefile

* fix toctree Sphinx errors (#13489)

* fix toctree errors

* nudging file for CI

* Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (#13527)

* [MXNET-1234] Fix shape inference problems in Activation backward (#13409)

* Provide a failing test for ReLU activation shape inference bug

* Fix Activation backward shape inference

fixes: #13333

* Add softsign Activation to test_gluon.py

* Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now

* Don't disable MKLDNN
zhaoyao73 pushed a commit to zhaoyao73/incubator-mxnet that referenced this issue Dec 13, 2018
zhaoyao73 pushed a commit to zhaoyao73/incubator-mxnet that referenced this issue Dec 13, 2018
cconvey added a commit to NervanaSystems/ngraph-mxnet that referenced this issue Jan 28, 2019
* Support full convention in quantized pooling (#13260)

* fix quantized pooling and enable it in INT8 SqueezeNet

* add test

* fix test

* address review comments

* refine the test for quantized pooling

* Add utility slave (#13383)

* A few operators on graphs stored as CSR (#13290)

* edge_id op csr forward on CPU (#34)

* add node subgraph generator. (#35)

* create DGLSubgraph.

* fix.

* return old eids in node_subgraph.

* accelerate subgraph construction.

* Add neighborhood op (#37)

* add csr_neighborhood op

* update neighborhood sample

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc

* add graph compact operator.

* fix a bug in dgl_subgraph.

* fix a bug in dgl_graph_compact.

* Update csr sample op (#39)

* add csr_neighborhood op

* update neighborhood sample

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc

* Update csr_neighborhood_sample-inl.h

* remove space.

* move to dgl_graph to contrib.

* move code.

* move edge id.

* fix compilation error.

* add test for subgraph.

* cleanup.

* fix.

* fix.

* fix compile error.

* fix compile error.

* fix compile error.

* fix.

* add operator doc.

* remove graph_compact

* update doc.

* address comments.

* retrigger.

* address comments.

* retrigger

* fix a bug in test.

* retrigger

* add check_format

* Fixes #13386 - Refer Warnings (#13387)

* Updated the paths for images for java tutorial (#13361)

* Updated the paths for images

* Empty commit

* Empty commit

* Nudge to CI

* Fix/env disable mkldnn cache map (#13324)

* add flag to disable mkldnn cache

* update docs

* fix typos

* update var name

* fix ordering

* set cache size

* fix log message

* update docs

* fix lint

* fix lint

* fix comparison

* update method name

* fix missing

* fix logging

* remove random item when cache exceeded

* update helper name

* update hash namespace

* make ophash template

* update function params

* fix return

* fix return

* update return for helper

* change class to typename

* add typename

* fix lint

* update doc

* pass ptr to cache

* retrigger

* retrigger

* retrigger

* change env var name to MXNET_MKLDNN_CACHE_NUM

* fix log env name

* retrigger

* Initial website documentation for Java API (#13289)

* Initial website documentation for Java API

* Changing paths to be relative

* Refactoring Java API website landing page

* Update Java web docs based on feedback

* Minor formatting fixes

* Update maven repo to nightly build so that java will be available prior to 1.4.0 release

* Adding java tutorial index to test_sanity_tutorials whitelist

* Fix link to javadocs

* Fix javadoc for infer package and minor install doc fix

* Minor path fix

* Replace mxnetci dockcross with public dockcross due to missing image (#13402)

* Replace mxnetci dockcross with public dockcross due to missing image

* Remove source lists change

* Disable Jetson

* Move to mxnetcipinned

* Correct shapes of images in cifar10 and cifar100 (#13348)

* Correct shapes of images in cifar10 and cifar100

cifar10 and cifar100 have 3 channels

* Retrigger build

* Updated recommenders example (#13041)

* initial modification recommender

* Recommender updates

* fix notebooks

* Update README.md

* trigger build

* Update README.md

* Retrigger build

* Improving multi-processing reliability for gluon DataLoader (#13318)

* improving multi-processing reliability for gluon dataloader

I found some multi-processing-related issues in the Gluon DataLoader.

 1) Each time a _MultiWorkerIter shuts down, it can leave dangling processes behind. The shutdown mechanism does not guarantee that all worker processes are terminated, so after several epochs more and more dangling processes accumulate.

  This problem rarely happens during training, because there is a reasonable time interval between prefetching the last batch and shutting down the _MultiWorkerIter.
  However, it happens frequently 1) when I stop the iterator before the end of an epoch, and 2) when I use the DataLoader as a data-loading service and load data as fast as possible. In both cases, the interval between the most recent prefetch and the iterator shutdown is short. I suspect the _MultiWorkerIter is unable to shut down properly during active prefetching.

  To fix this, I explicitly terminate the worker processes inside the shutdown function.

  2) When loading data quickly (still mostly during testing and data serving), there seems to be a risk of a data race. The data iterator uses a _MultiWorkerIter to cache prefetched data in a dict, but the dict does not seem to be thread-safe for concurrent insertion and deletion, so data occasionally goes missing from the dict.

  To prevent this, I use a scoped lock to guard dict access.

* do not wait for the workers to join, and kill any alive workers as soon as possible
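The scoped-lock fix described in item 2) can be sketched roughly as follows. This illustrates the general pattern only, assuming a prefetch cache keyed by batch index; `PrefetchBuffer` and `produce` are hypothetical names, threads stand in for the DataLoader's worker processes, and this is not the actual Gluon `_MultiWorkerIter` code.

```python
import threading


class PrefetchBuffer:
    """Hypothetical prefetch cache mapping batch index -> batch.

    Every insert and removal happens while holding one lock, so the
    consumer never races with a producer on the shared dict.
    """

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition(threading.Lock())

    def put(self, idx, batch):
        with self._cond:
            self._data[idx] = batch
            self._cond.notify_all()  # wake a consumer waiting for this index

    def get(self, idx):
        with self._cond:
            # Block until some producer has delivered this index.
            while idx not in self._data:
                self._cond.wait()
            return self._data.pop(idx)

    def __len__(self):
        with self._cond:
            return len(self._data)


def produce(buf, indices):
    """Hypothetical worker: fills its share of batch indices."""
    for i in indices:
        buf.put(i, "batch-%d" % i)


buf = PrefetchBuffer()
workers = [
    threading.Thread(target=produce, args=(buf, range(k, 1000, 4)))
    for k in range(4)
]
for w in workers:
    w.start()
# Consume in order while the workers are still inserting.
received = [buf.get(i) for i in range(1000)]
for w in workers:
    w.join()
```

Using a `Condition` rather than a bare `Lock` lets the consumer wait for a specific batch index without polling; the check and the removal happen under the same lock.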

* Onnx multi output (#13390)

* Fix ONNX export to support multi-output graphs

* Add ONNX unit-test

* Added multi-output shape inference.

- Removed unnecessary forward_pass() call
- Modified infer_output_shape to return multiple shapes for multiple outputs as well as output names.

* Fixed pylint

* Change docker login (#13408)

* Fixing doc links and minor edits for Java API (#13405)

Update the main website links

* Fix repeated typo in mxnet_op.h (#13406)

* Use dynamic omp schedule for sparse dot with large matrix (#13398)

* dynamic omp for dot

update heuristic

* add doc

* Update mxnet_op.h

* Update dot-inl.h

* Added proper default value in cpp-package for optional<bool> (#13415)

* Fix infoGan Gluon tutorial errors. (#13416)

- Update notebook to avoid divide by 0 causing a warning.
- Add MXBoard dependency.

* :memo: Fixes #13388 Adds Clojure to MXNet installation docs (#13393)

* Minor fixes to documentation (#13412)

* Minor fixes to documentation

* Updated the Maven Repository URL to point to staging repo

* [Example] fix cpp example inception-bn and training acc issue (#13284)

* fix inception-bn and training acc issue

* add parameter initialization, fix lint

* fix comparison

* change optimizer to sgd

* update sgd and update model name

* add inception_bn in jenkins build

* make max epoch an argument

* remove inception_bn test

* trigger ci

* remove ci test

* trigger ci

* [Example]Fix mlp_csv example (#13273)

* add instruction to get the data and fix typo

* fix typo

* update file name

* trigger CI

* add unit_test for unit_test_mlp_csv

* add mlp_csv to jenkinsfile

* revert jenkinsfile to another PR

* trigger CI

* trigger CI

* Java doc (#13368)

* Fix scaladoc and javadoc errors

* Stop on errors starting on scala 1.3.x build

* Adding Java to ubuntu setup install page and minor fixes to docs (#13420)

* Adding Java to ubuntu setup install page and minor fixes to other java api docs

* Improving javadoc for java-api predictor class

Mostly documentation changes

* [MXNET-1029] Feature request: randint operator (#12749)

* randint operator add along with add optional tag to params

* register param

* lint space issue

* randn issue fix

* uniform_int_distribution doesn't support int8, uint8 fix

* dtype ftype

* ftype to dtype - invalid template arg

* fix template arg issue

* test with int dtype for windows

* removed int8,uint8 from test

* gpu implementation

* gpu engine state diff

* removed gpu support

* empty commit

* temporary fix : batchnorm flaky test skip

* removed randn symbol specific code since other PR is on it

* revert ndarray/randn for compatibility

* added unit test for checking extremes and uniform distribution for sufficient samples

* increased the high val

* int32 to int64 support, indentation fix, check for optype correctly based on type of random function

* gpu support, revert finfertype using template specialization, remove defaults, prints, test other low high val

* fix for invalid template arg by checking for int32,int64

* gpu randint in random_generator

* sample_uniform issue and param, removed old flaky test skip line

* replaced discrete_uniform function by rand_int64 for consistency

* formula update and removed itype

* change ctx to include gpu, randint sample_op.cu typo

* trigger ci

* doc fix, check fix, whitespace remove

* added the without dtype testcase

* Java demo file-path fix (#13358)

* fix on ubuntu

* add readme instruction

* fix intellij Tutorials

* fix intelliJ tutorial

* fix the document

* update demo

* revert the change on intelliJ tutorial

* fix make process

* fix documentation

* Updated README and NEWS with 1.3.1 release information (#13423)

* Be more explicit about the exit status of the container (#13425)

* [MKLDNN]Add quantized concat (#13297)

* Add quantized concat

* Fix non-mkldnn build

* Add size check for MKLDNNQuantizedConcatForward

* use all capital for constant

* Rename constant with Google C++ style.

* Address apeforest comments

* Address apeforest comments

* fix lint

* Add frontend interface.

* Retrigger CI

* Add ARMv7 builds to dev_menu.py (#13432)

* Add ARMv7 builds to dev_menu.py

* Add Python3 CPU Intel MKLDNN unittests to dev_menu

* [MXNET-1110] find the path to include header files (#13359)

* add find_include_path API

* address reviewer comment

* change return type from list to string

* add unit test

* address reviewer comment

* address reviewer comment

* address reviewer comment

* address reviewer comment

* add subgraph adjacency operator. (#13396)

* add adjacency.

* fix lint.

* add GPU impl.

* retrigger

* address comments.

* Update dgl_graph.cc

* Java added to install page (#13404)

* added java install option

* update maven blocks

* update maven button url to snapshot search for java

* add version; remove formatting on dependency

* merge clojure updates

* merge clojure updates - give code some breathing room

* merge clojure updates - give code even more breathing room

* 1.3.1 website updates (#13444)

* 1.3.1 website updates

* Java added to install page (#13404)

* added java install option

* update maven blocks

* update maven button url to snapshot search for java

* add version; remove formatting on dependency

* merge clojure updates

* merge clojure updates - give code some breathing room

* merge clojure updates - give code even more breathing room

* remove redundant link (#13428)

* remove redundant link

* retrigger

* retrigger

* [MXNET-886] ONNX export: HardSigmoid, Less, Greater, Equal (#12812)

* ONNX export: Comparison operators

* ONNX export: Hard sigmoid

* Correct Inception Reference for Pretrained Model (#13360)

I noticed that the symbols and parameters in the model zoo are in fact from https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/symbols/inception-bn.py, which is not Inception v3. It is Inception + batch normalization.

In this commit, I update the documentation and link to the correct research basis.

* exclude the error folder from sphinx toc (#13354)

* exclude the error folder from sphinx toc

* bumping commit for CI

* Update MKL-DNN to fix LSTM perf regression (#13417)

* update MKL-DNN CI id

* fix the reorder perf issue

* bumped version to v0.17.1

* bumped to MKL-DNN v0.17.1

* pin to v0.17.1

* Mitigate #13341 (#13343)

- KL never succeeds so it always goes exponential
- Too many weight matrices were rejected because of zero weights, simplify generation to not include 0 weight edges

* parallelize NDArray::Copy<cpu, cpu> when data size is large (#12926)

* parallelize NDArray::Copy<cpu, cpu> by OpenMP when data size > MXNET_CPU_PARALLEL_COPY_SIZE

* code specification according to reviewer's suggestions

* align with std::memcpy api

* add descriptive error message

* update MXNET_CPU_PARALLEL_COPY_SIZE doc

* update MXNET_CPU_PARALLEL_COPY_SIZE doc again

* fix property not updating bug (#13085)

* [MXNET-1222] Scala Inference enable different shapes input (#13330)

* init commit with Predictor Improvement

* add predictor Example

* change into dArr

* add img config

* add new line and fix code style

important bug fixes

* Fix deconvolution  / PR 13421 (#13433)

* add test case

* revert refactor

* use with seed decorator

* retrigger

* remove seed

* remove iteration

* remove old test

* update deconvolution test to have filter length that triggers mkldnn reorder

* Add DGL subgraph sampling op (#13392)

* add csr sample op

* fix compile error in some platform

* update

* update openmp

* speedup sampling

* update csr

* update csr

* update time seed

* update

* fix compiler error

* update doc

* fix ci error

* fix quantize_graph pass error when there're multiple outputs from a single node (#13000)

* fix quantize_graph pass error when there're multiple outputs from
a single node that need to insert 'contrib_quantize', 'min' and
'max' nodes for these outputs.

* fix lint

* Make the single output align with multiple outputs when inserting contrib_quantize

* Change op comparing from its name to itself

* skip unsupported quantize_concat

* retrigger ci

* Get the correct include path in pip package (#13452)

* add find_include_path API

* address reviewer comment

* change return type from list to string

* add unit test

* address reviewer comment

* address reviewer comment

* address reviewer comment

* address reviewer comment

* fix include path problem in pip package

* add comment

* fix lint error

* address reviewer comment

* address reviewer comment

* Use ~/.ccache as default ccache directory so the cache is not erased on reboot (#13431)

* Skip flaky test https://github.com/apache/incubator-mxnet/issues/13446 (#13480)

* Rewrite dataloader with process pool, improves responsiveness and reliability (#13447)

* fix recordio.py

* rewrite dataloader with pool

* fix batch as tuple

* fix prefetching

* fix pylint

* picklable function

* use pickle

* add missing commit

* Fix errors in docstrings for subgraph op; use code directive (#13463)

* [MXNET-1158] JVM Memory Management Documentation (#13105)

* update train_mnist

* Add documentation for JVM Memory Management

* update doc

* address nit picks

* address nit picks

* Grammar and clarity edits for memory management doc

* Edits for scala memory management

* Update memory-management.md

* Update memory-management.md

* Update memory-management.md

* capitalization fix

* Update row_sparse tutorial (#13414)

Update row_sparse tutorial

* Add resiliency to onnx export code (#13426)

* Added resiliency to onnx export code

- With previous infer-shape implementation, if input shape was list instead of tuple or if extra non-existent parameters were provided, the code would still work. The fixes in this commit make sure that behavior is restored to prevent any compatibility issues with existing export code.

* Fixed name of net in unittest

* Fix pylint

* [MXNET-1185] Support large array in several operators (part 1) (#13418)

* fix a few operators with large arrays (# of elements)

* fix bug in broadcast_div and add tests

* address reviewer comment

* add unit test

* add empty line

* retrigger CI

* [MXNET-1210 ] Gluon Audio - Example (#13325)

* Initialized the example

* Addressed PR comments, about existing synset.txt file - no overwrite

* RST - docstring issues fixed

* added README

* Addressed PR comments

* Addressed PR comments, checking Divide by 0

* Raising error if format is not supported.

* changed a line for ndarray of labels

* Trigger CI

* Trigger CI

* PR comments addressed around skip_header argument

* Addressed PR comments around librosa import

* PR Comments

* Passing lazy=lazy from argument

* Added PR comments, labels to README.MD

* Trigger CI

* Addressing PR Comments in README

* Modified README.md

* Added example under audio folder

* Retrigger CI

* Retrigger CI

* ONNX export: Instance normalization, Shape (#12920)

* ONNX import/export: Make backend_rep common

* ONNX export: Instance Normalization

* ONNX export: Shape operator

* Clarify dependency on OpenCV in CNN Visualization tutorial. (#13495)

* clarify ops faq regarding docs strings (#13492)

* Add graph_compact operator. (#13436)

* add graph_compact.

* fix.

* add doc.

* add tests for graph_compact.

* address comments.

* update docs.

* trigger CI

* Deprecate Jenkinsfile (#13474)

* update github location for sampled_block.py (#13508)

Updated to https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py

* #13453 [Clojure] - Add Spec Validations to the Optimizer namespace (#13499)

* ONNX export: Logical operators (#12852)

* Fix cmake options parsing in dev_menu (#13458)

Add GPU+MKLDNN unittests to dev_menu

* Revert "Manually track num_max_thread (#12380)" (#13501)

This reverts commit 75410210e07a5fab5e044348aee276d578d5857e.

* Feature/mkldnn static 2 (#13503)

* build mkldnn as static lib

* update makefile to statically build mkldnn

* build static mkldnn

* fix static name

* fix static name

* update static for mac

* rename mkldnn dep in ci

* remove moving mkldnn dynamic lib

* remove commented code

* remove mkldnn dynamic for unittest

* force static for mkldnn lib

* remove dynamic mkldnn bind

* only link windows

* add mkldnn.mk

* try force linking

* remove mkldnn dynamic check

* remove test mkldnn install

* fix spacing

* fix index

* add artifacts

* add comment about windows

* remove static

* update makefile

* fix toctree Sphinx errors (#13489)

* fix toctree errors

* nudging file for CI

* Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (#13527)

* [MXNET-1234] Fix shape inference problems in Activation backward (#13409)

* Provide a failing test for ReLU activation shape inference bug

* Fix Activation backward shape inference

fixes: #13333

* Add softsign Activation to test_gluon.py

* Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now

* Don't disable MKLDNN

* Docs & website sphinx errors squished 🌦  (#13488)

* fix scala ndarray docs; remove interpreter style

* fix docs error in kvstore

* remove interpreter format in examples

* remove python indicator for these non-functioning python code blocks; clears a sphinx error

* remove old table that was not being used and was triggering a sphinx error

* get rid of curly braces that were causing a pygments error

* fix ambiguous reference causing sphinx error

* nudging file for CI

* [MXNET-1235] Add a test for AdaMax optimizer (#13467)

* Add a test for AdaMax optimizer

* Replace nested for loop with itertools.product and leave tolerance at default

* Trigger

* Adadelta optimizer test (#13443)

* adadelta test

* comments

* Update java setup docs for 1.4.0 (#13536)

* Update java setup docs for 1.4.0

* Update Java-demo to 1.4.0

* Revert "Feature/mkldnn static 2 (#13503)" (#13540)

This reverts commit 65edc9500b10a3404945d6d79acbae54a2833890.

* doc fix (#13465)

* [MXAPPS-1020] Clean up some Sphinx warnings. (#13539)

* [MXNET-1110] Add header files required by horovod (#13062)

* Add header files required by horovod

* Add symbolic link and cherry picked required header

* add python API to return include path

* update link

* fix windows CI

* fix windows build

* fix dlpack link

* merge with master

* exclude 3rd party header files from license check

* exclude license check

* exclude include directory

* remove commented lines

* Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file (#13478)

* updated to v1.5.0

* Bumped minor version from 1.4.0 to 1.5.0 on master

* added Anirudh as maintainer for R package

... adding something useful and re-trigger PR check

* Updated license file for clojure, onnx-tensorrt, gtest, R-package

* Get the correct include path in pip package (#13452)

* add find_include_path API

* address reviewer comment

* change return type from list to string

* add unit test

* address reviewer comment

* address reviewer comment

* address reviewer comment

* address reviewer comment

* fix include path problem in pip package

* add comment

* fix lint error

* address reviewer comment

* address reviewer comment

* Use ~/.ccache as default ccache directory so the cache is not erased on reboot (#13431)

* Skip flaky test https://github.com/apache/incubator-mxnet/issues/13446 (#13480)

* Rewrite dataloader with process pool, improves responsiveness and reliability (#13447)

* fix recordio.py

* rewrite dataloader with pool

* fix batch as tuple

* fix prefetching

* fix pylint

* picklable function

* use pickle

* add missing commit

* Fix errors in docstrings for subgraph op; use code directive (#13463)

* [MXNET-1158] JVM Memory Management Documentation (#13105)

* update train_mnist

* Add documentation for JVM Memory Management

* update doc

* address nit picks

* address nit picks

* Grammar and clarity edits for memory management doc

* Edits for scala memory management

* Update memory-management.md

* Update memory-management.md

* Update memory-management.md

* capitalization fix

* Update row_sparse tutorial (#13414)

Update row_sparse tutorial

* Add resiliency to onnx export code (#13426)

* Added resiliency to onnx export code

- With the previous infer-shape implementation, the code still worked if the input shape was a list instead of a tuple, or if extra nonexistent parameters were provided. The fixes in this commit restore that behavior to prevent compatibility issues with existing export code.

* Fixed name of net in unittest

* Fix pylint

* [MXNET-1185] Support large array in several operators (part 1) (#13418)

* fix a few operators with large arrays (# of elements)

* fix bug in broadcast_div and add tests

* address reviewer comment

* add unit test

* add empty line

* retrigger CI

* [MXNET-1210 ] Gluon Audio - Example (#13325)

* Initialized the example

* Addressed PR comments, about existing synset.txt file - no overwrite

* RST - docstring issues fixed

* added README

* Addressed PR comments

* Addressed PR comments, checking Divide by 0

* Raising error if format is not supported.

* changed a line for ndarray of labels

* Trigger CI

* Trigger CI

* PR comments addressed around skip_header argument

* Addressed PR comments around librosa import

* PR Comments

* Passing lazy=lazy from argument

* Added PR comments, labels to README.MD

* Trigger CI

* Addressing PR Comments in README

* Modified README.md

* Added example under audio folder

* Retrigger CI

* Retrigger CI

* ONNX export: Instance normalization, Shape (#12920)

* ONNX import/export: Make backend_rep common

* ONNX export: Instance Normalization

* ONNX export: Shape operator

* Fixing a 404 in the ubuntu setup doc (#13542)

* [MXNET-1249] Fix Object Detector Performance with GPU (#13522)

* Reduce post processing time

* fix ssd

* fix the CI

* add comments

* [MXNET-769] Use MXNET_HOME in a tempdir in windows to prevent access denied due t… (#13531)

* Use MXNET_HOME in cwd in windows to prevent access denied due to concurrent data downloads

Fixes #13484

* Revert "Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (#13527)"

This reverts commit 3d499cb3584919b767142c5596211a7f7fb18d50.

* Add a retry to qemu_provision (#13551)

Fixes #13504

* Fix #13521 (#13537)

* fix pool release

* fix

* Simplifications and some fun stuff for the MNIST Gluon tutorial (#13094)

* Simplify mnist Gluon tutorial and add mislabelled sample plotting

* Add mnist Gluon tutorial images

* Gluon MNIST tutorial: Use modern Gluon constructs, fix some wordings

* [Gluon] Move to data loaders and improve wording in MNIST tutorial

* Fix broken links

* Fix spelling of mislabeled

* Final rewordings and code simplifications

* Fix things according to review

- Apply hybrid blocks
- Move outputs outside of code blocks and mark for notebooks
  to ignore
- Remove images, use external link
- Fix a few formulations

* Change activations to sigmoid in MNIST tutorial

* Remove superfluous last layer activations in MNIST tutorial

* Updated docs for randint operator (#13541)

* updated docs for randint

* added randint to __all__ and reordered by category, then alphabetically

* Trigger CI

* minus mxnet.symbol and alphabetical for ndarray,symbol.md

* alphabetical order

* Chi_square_check for discrete distribution fix (#13543)

* check for bucket instead of index

* enumerate instead of range(len())

* count instead of sum to solve attribute error

* revert to sum

* separate discrete and continuous

* Trigger CI

* Revert "Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file" (#13558)

* Revert "Chi_square_check for discrete distribution fix (#13543)"

This reverts commit cf6e8cbd035bf315b3e8280416468a629c780d03.

* Revert "Updated docs for randint operator (#13541)"

This reverts commit e0ff3c36ee171386fef01fb86c54c343e4b04c14.

* Revert "Simplifications and some fun stuff for the MNIST Gluon tutorial (#13094)"

This reverts commit 8bbac827742c21607a863137792f03bd09847419.

* Revert "Fix #13521 (#13537)"

This reverts commit f6b4665995f8f8ff32862a029b2074475d8467eb.

* Revert "Add a retry to qemu_provision (#13551)"

This reverts commit f6f840110d74111f98c20eab5b08d64a46ebf0cd.

* Revert "[MXNET-769] Use MXNET_HOME in a tempdir in windows to prevent access denied due t… (#13531)"

This reverts commit bd8e0f8356676749ecae16ec38a366b4cc00bf15.

* Revert "[MXNET-1249] Fix Object Detector Performance with GPU (#13522)"

This reverts commit 1c8972c3c8f832519364916865541f48597581c7.

* Revert "Fixing a 404 in the ubuntu setup doc (#13542)"

This reverts commit cb0db290adcfd0fce956d02c234f81d453e41013.

* Revert "Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file (#13478)"

This reverts commit 40db61908000ee86d21aac847ff2225807d6c168.

*  #13441 [Clojure] Add Spec Validations for the Random namespace (#13523)

* Adding test for softmaxoutput (#13116)

* Add workspace cleaning after job finished (#13490)

* Add workspace cleaning after job finished

* Update Jenkinsfile_utils.groovy

* Update Jenkinsfile_utils.groovy

* Fix flaky test test_random:test_randint_generator (#13498)

* updated seed, alpha value, comments

* typo in comment fix

* added nrepeat

* removed unused variable, added link for scipy alpha, rephrased the sentence for discrete distribution buckets

* removed fixed seed, alpha

* Update version to v1.5.0 including clojure package (#13566)

* Update DESCRIPTION

* update version to v1.5.0 except for clojure

* update version from 1.4.0 to 1.5.0
- add utility script to help bump versions in future
- fix README to reference the current maven versions

* License update  (#13565)

* Update LICENSE

* update license for Clojure, R, ONNX-TRT and location of 3rd party
dependencies.

* fixed typo

* Fix use-before-assignment in convert_dot (#13511)

* fix the situation where idx didn't align with rec (#13550)

minor fix to image.py

add last_batch_handle for imagedeiter

remove the label type

refactor the imageiter unit test

fix the trailing whitespace

fix coding style

add new line

move helper function to the top of the file

* Update MXNetTutorialTemplate.ipynb (#13568)

Fix typos

* ONNX import/export: Size (#13112)

* fix link for gluon model zoo (#13583)

* Fix exception handling api doc (#13519)

* Fix exception handling api doc

* Update waitall api doc

Co-Authored-By: anirudh2290 <anirudh2290@apache.org>

* add cpp example inception to nightly test (#13534)

* add inception test

* fix max iter for mlp

* rename and add comment

* rename epoch num

* Add notes about debug with libstdc++ symbols (#13533)

* Add imresize and copyMakeBorder to mx.image (#13357)

* Add imresize API to docs

* address comments

* copyMakeBorder

* [MXNET-1253] fix control_flow_op (#13555)

* fix control_flow_op

* change type for M

* add test for sparse where op

* Add Intel MKL blas to Jenkins (#13607)

* add mkl blas to Jenkins

* add mkl install script

* fix bug in mkl script

* remove python2 ut and add cpu-mkl node

*  #13385 [Clojure] - Turn examples into integration tests (#13554)

* fix the Float not showing correctly problem (#13617)

Merge this PR for 1.4.x

* [MXNET-1155] Add scala packageTest utility (#13046)

* [MXNET-1155] Add scala packageTest utility

* Clean up utility

* Safe change directory in Makefile for scala

* mvn install file instructions with details

* [MXNET-1224]: improve scala maven jni build and packing. (#13493)

Major JNI feature changes. Please find more info here: https://cwiki.apache.org/confluence/display/MXNET/Scala+maven+build+improvement

* [MXNET-1225] Always use config.mk in make install instructions (#13364)

* Always use config.mk in make install instructions

* Specify Cuda 0 for ubuntu with mkldnn

* Scala install doc avoid build_from_source

Minor doc fixes

* Fix build_from_source CMake usage

* CPP Install Instruction with CMake

* Use cmake out of source build

* Fix warning in waitall doc (#13618)

* Optimize C++ API (#13496)

* Optimize C++ API

Pass parameters by reference instead of by value.
Also add const, since they are not modified.

* fix docs/architecture/overview.md

Fix BinaryShapeFunction typedef
Add a right brace for SmoothL1Shape_

* fix quantize pass error when quantization-supported ops are excluded in the model (#13596)

* Scripts for building dependency libraries of MXNet (#13282)

* openblas script

* ps-lite dependencies

* USE_S3 dependencies

* image libraries

* license

* add batch norm test (#13625)

* add batch norm test

* fix formatting

* use out_arr as input

* fix typo

* remove const

* use ptr

* eval ptr

* Set install path for libmxnet.so dynamic lib on Mac OS (#13629)

* Fix the bug of BidirectionalCell (#13575)

* Fix the bug of BidirectionalCell

I called hybridize() and passed "valid_length" to the unroll() function of BidirectionalCell, which raised an AssertionError at line 79 because symbol.split() returns a symbol rather than a list of symbols. As a result, the length of inputs does not equal the parameter "length" when unroll() is called to compute r_outputs and r_states.

* add a test for BidirectionalCell

* Fix the bug of BidirectionalCell

I called hybridize() and passed "valid_length" to the unroll() function of BidirectionalCell, which raised an AssertionError at line 79 because symbol.split() returns a symbol rather than a list of symbols. As a result, the length of inputs does not equal the parameter "length" when unroll() is called to compute r_outputs and r_states.

* fix test_bidirectional_unroll_valid_length( )

Fix the error of parameter.

* Fix the bug of BidirectionalCell

I called hybridize() and passed "valid_length" to the unroll() function of BidirectionalCell, which raised an AssertionError at line 79 because symbol.split() returns a symbol rather than a list of symbols. As a result, the length of inputs does not equal the parameter "length" when unroll() is called to compute r_outputs and r_states.

* fix test_bidirectional_unroll_valid_length( )

* Feature/mkldnn static (#13628)

* Revert "Revert "Feature/mkldnn static 2 (#13503)" (#13540)"

This reverts commit a3eca5f5c96eed0bc29bd4e58e470997091a1fb3.

* include headers on mkldnn lib

* retrigger

* retrigger

* build config for maven and pip (#13556)

* config for pip

* symbol whitelist

* maven build config

* Fix for import mxnet taking long time if multiple process launched (#13602)

* https://github.com/apache/incubator-mxnet/issues/12255
Running import mxnet in multiple processes takes a very long time.
Details: #12255
One of the reasons is the OMP tuning code, which iterates to measure the OMP
tuning overhead. We are reducing this iteration count to reduce the
overhead of the tuning code.
We also added an environment variable where users can set the number
of cores that should be used to determine tuning.

* cpplint fix

* Adding new environment variable: MXNET_USE_NUM_CORES_OPERATOR_TUNING to doc

* fixing formatting in doc

* Add reshape op supported by MKL-DNN (#12980)

* Add reshape op supported by MKL-DNN

* fix build issue

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

* fix white space

* add unit test

* merge if blocks

* Improve dev_menu usability, local build and virtualenv (#13529)

* Improve dev_menu, add build command and virtualenv creation with local builds for easy testing

* Update dev_menu.py

Co-Authored-By: larroy <pedro.larroy.lists@gmail.com>

* Cuda off by default, use ccache

* address CR

* [Clojure] Correct the versions in the README so they correspond to the latest maven.org release (#13507)

* Correct the versions so they correspond to the latest maven.org release

* trigger build

* feedback from @kohr-h

* Optimization of metric evaluation (#13471)

* Change argsort to argpartition

* Global statistics in metrics

* Fix lint

* Fixes from review

* Trigger

* Fixes from review, fix to F1, MCC and perplexity metrics,
added test for global stats

* Fix lint

* Fix compatibility with Python 2

* Revert "Feature/mkldnn static (#13628)" (#13638)

This reverts commit 5bcf2bd6e8b48fa27bfcfdafd06401ec2d28978b.

* support mkl log when dtype is fp32 or fp64 (#13150)

* support mkl log when dtype is fp32 or fp64

* remove macro

* ensure data size is less than or equal to MKL_INT_MAX

* code specification

* fix indent

* for retrigger

* [MXNET-1209] Tutorial transpose reshape  (#13208)

* transpose tutorial

* Adding Anirudh's comments

* Update tutorial with some more examples

* Adding links

* Fixing the links, adding more examples

* Update reshape_transpose.md

* Fixing spelling mistakes

* Updating image resolution

* Adding Simon's comments

* Small fixes

* Update reshape_transpose.md

* Update reshape_transpose.md

* empty commit

* empty commit

* updated reference to Apache MXNet (#13645)

* Complimentary gluon DataLoader improvements (#13606)

* init

* add tests

* doc

* lint

* fix openmp

* Improve CCache handling (#13456)

* Remove gitignore entries

* Modify Makefile

* Modify user permissions

* Add new ccache wrapper function

* Change PATH rewrite to a different one to resolve CUDA issues

* Add ccache to gpu cmake

* Enable ccache for every build

* Set permissions for arm dockerfiles

* Disable ccache for ASAN

* Remove g++-8 ccache redirect

* Update Android Dockerfiles for user permissions

* Fix ASAN compiler typo

* Remove sanity for speed

* Move build dir creation in android armv8

* Revert "Remove sanity for speed"

This reverts commit e8386a774dafe96337930b9cac36cb24fc36585e.

* Add ccache for NVCC in Makefile

* [MXNET-918] Random module (#13039)

* introduce random API

* revert useless changes

* shorter types in APIDoc gen code

* fix after merge from master

* Trigger CI

* temp code / diag on CI

* cleanup type-class code

* cleanup type-class code

* fix scalastyle

* Fix incorrect delete in MXExecutorReshape exception handling (#13376)

* Fix bad delete.

Delete the pointed-to handle on cleanup, not the location of the handle itself. Also don't delete it if we didn't set it in the first place.

* Remove unused 'exec' var from MXExecutorBindEX.

* [MXNET-1251] Basic configuration to do static-linking (#13621)

* Basic configuration to do static-linking

* update build script and place it in the install part

* clean up the code further

* revert maven into build-from-source

* add curl to deps

* [MXNET-1195] Cleanup Scala README file (#13582)

* Updated the Scala-Readme with up-to-date information

* Updated the header

* Removed redundant build status

* Minor formatting changes

* Addressed the PR feedback

* Added section on Scala training APIs

* Removed mention of deprecated Model API

* scripts for building libmxnet binary and wheel (#13648)

* add script for making all dependencies

* tools for building pip package

* build scripts for lib and wheel

* [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API (#13294)

* [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API

* [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API

* Updated the code to address the review comments.

* Added the README file for the folder.

* Addressed the review comments

* Addressed the review comments to use argmax and default mean values.

* Update MKLDNN_README.md (#13653)

* Support Quantized Fully Connected by INT8 GEMM (#12922)

* add quantized fully connect support

* disable qfc cpu case since s8u8s32 is only supported by MKL BLAS library

* retrigger to ci testing

* move implementation to cc file and add STORAGE_TYPE_ASSIGN_CHECK

* fix typo bug

* retrigger the ci test

* fix typo bug

* retrigger ci

* retrigger the ci test

* retrigger the ci

* retrigger the ci test

* retrigger ci test

* fix indent issue

* retrigger the ci

* retrigger the ci test

* add verbose message

* update log message

* using range for loop

* using for auto range

* enable MKL BLAS ci test

* fix typo issue

* use TYPE_ASSIGN_CHECK

* retrigger the ci

* add build fix for Scala/Java build (#13655)

* Fix Jetson compilation (#13532)

* remove omp which can cause ssd accuracy variance (#13622)

* Revert "[MXNET-43] Fix Jetson compilation" (#13665)

* Revert "remove omp which can cause ssd accuracy variance (#13622)"

This reverts commit 655f1c6f7a0706dd622f73db9af2e6df895ca213.

* Revert "Fix Jetson compilation (#13532)"

This reverts commit 48e25c4cae355753dd96ea7afe004bf78e0719e4.

* Fix Jetson compilation (#13666)

* turn on Sphinx warnings as errors (#13544)

* turn on warnings as errors

* move warnings as error logic to build_all_version

* fix typo in comment

* add warning as error option for docs pipeline

* bump ci to test again; use this chance to add notes on this feature

* fix bugs in image.py docs

* Update CODEOWNERS, add Pedro Larroy. (#13579)

* Revert "Revert "[MXNET-43] Fix Jetson compilation" (#13665)" (#13672)

This reverts commit 3433776dac7be75928082bbc1d552fca248fb8e8.

* Accelerate DGL csr neighbor sampling (#13588)

* Speedup and fix bug in dgl_csr_sampling op

* Update dgl_graph.cc

* simplify functions.

* avoid adding nodes in the last level in the queue.

* remove a hashtable lookup in neigh_pos.

* reduce a hashtable lookup in sub_ver_mp.

* merge copying vids and layers.

* reduce hashtable lookup when writing to output csr.

* fix a bug.

* limit the number of sampled vertices.

* fix lint.

* fix a compile error.

* fix compile error.

* fix compile.

* remove one hashtable lookup per vertex and hashtable iteration.

* remove queue.

* use vector for neigh_pos.

* fix lint

* avoid init output arrays.

* fix tests.

* fix tests.

* update docs.

* retrigger

* retrigger

* [MXNET-1252][1 of 2] Decouple NNVM to ONNX from NNVM to TensorRT conversion (#13659)

* fix unpicklable transform_first on windows (#13686)

* Move the debug output message into MXNET_MKLDNN_DEBUG (#13662)

* NEWS.md backport from v1.4.x to master (#13693)

* merge NEWS.md from 1.4.x to master

* NEWS.md backport from v1.4.x to master

* Fallback to dense version for grad(reshape), grad(expand_dims) (#13599)

* fallback to dense version for grad(reshape), grad(expand_dims)

* add _backward_reshape gpu version

* reshape test case comments

* fix gpu test

* remove mkldnn support for _backward_reshape

* ONNX export: Add Flatten before Gemm (#13356)

* Add Flatten before Gemm

* ONNX export test: Allow multiple inputs in forward pass

* ONNX export: Test for fully connected

* [MXNET-1164] Generate the document for cpp-package using Doxygen (#12977)

* Adding cpp-package directory to the Doxyfile. Updating the index.md file in c++ api directory.

* Updating the link to classes in C++ API to point to correct html file.

* Updated the links to use relative paths.

* Removed the extra slash character in the url

* Excluded the 3rdparty folder as per the review comment.

* Update git clone location to apache github (#13706)

* Add timeout/retry logic to docker cache download (#13573)

* Added timeout/retry (linear backoff) to docker cache download

* Units changed, as time.sleep takes seconds as argument

* Improved error handling

* Using retry decorator

* Added retry decorator to _login_dockerhub method

* Fixed wrong import

* Fix NDArray ToDLPack Bug (#13698)

* Added javadocs and improved example instructions (#13711)

* Rearrange tests written only for update_on_kvstore = True (#13514)

* Update test_gluon_trainer.py

* Update test_gluon_trainer.py

* test

* Update mshadow to support batch_dot with fp16. (#13716)

* fp16 dot

* update mshadow

* update mshadow

* update mshadow

* Fix the quantization script to support Python2 (#13700)

* fix the quantization script to support python2

* Fix comments, fix similar issue in imagenet_inference.py

* ONNX test code cleanup (#13553)

* ONNX test code cleanup

* Make tests use the common test case list

* Remove import test_cases

* Make Gluon backend rep common

* Partially enable broadcast tests

* Common function to populate tests

* Make backend common

* test models

* Test nodes

* ONNX export: Test for fully connected

* Edit CI scripts mxnet export test cleanup

* Further cleanup backend tests

* README

* Some corrections

* test case format for test_models

* update social media section (#13705)

* script for installing gpu libraries and build tools (#13646)

* Port of scala infer package to clojure (#13595)

* Port of scala infer package to clojure

* Add inference examples

* Fix project.clj

* Update code for integration tests

* Address comments and add unit tests

* Add specs and simplify interface

* Minor nit

* Update README

* update code owner (#13737)

* AdamW operator (Fixing Weight Decay Regularization in Adam) (#13728)

* tests

* remove optimizer and move op to contrib

* rename parameter

* ONNX import/export: Add missing tests, ONNX export: LogSoftMax (#13654)

* Logsoftmax, missing tests

* Support multiple outputs in Gluon backendrep

* Remove repeated unsqueeze test

* Allow multiple output support

* ONNX test code cleanup - part 2 (#13738)

* Common test caller

* Remove incorrect comment

* Make corrections to CI

* fix ci script

* Update basic_layers.py (#13732)

* ONNX import: Hardmax (#13717)

* ONNX import: Hardmax

* Fix lint errors

* add github link for issue with reshape

* gluon docfix (#13631)

* Fixes for trainer with update_on_kvstore=False (#13721)

* add clarification for param_dict

* more tests for dist kvstore

* more unittests

* fix a bug

* more dist exception test

* revert optimizer list

* fix bug and comment

* fix doc rendering and lint

* add invalid sched test

* fix website

* trigger

* update doc

* Reorder module import orders for dist-kvstore (#13742)

* Reorder module import orders for dist-kvstore

* more code comments

* CMake: Enable installation of cpp-package headers (#13339)

* Allow CMake based installation of cpp-package

* Add installation of missing nnvm headers

* Add documentation as to where public headers will be installed

* disable error checking when building old versions (#13725)

* Integrate MKLDNN Conv1d and support 3d layout (#13530)

* add 3d layout support for MKLDNN Conv and Activation

* fix lint

* code refactor

* add testcase for group1 conv and skip quantization for conv1d

* fix lint

* avoid conv1d quantization

* code refactor and add activation ut

* del todo

* Making MKL-DNN default on MXNet master (#13681)

* mkldnn is default in makefile and explicitly turned off for builds

* add endif

* retrigger

* retrigger

* build mkldnn as static lib

* update makefile to statically build mkldnn

* build static mkldnn

* fix static name

* fix static name

* update static for mac

* rename mkldnn dep in ci

* remove moving mkldnn dynamic lib

* retrigger

* remove commented code

* retrigger

* remove mkldnn dynamic for unittest

* retrigger

* retrigger

* force static for mkldnn lib

* turn off mkldnn on arm builds

* remove dynamic mkldnn bind

* update jenkins to use only mkldnn

* remove last flag

* turn mkldnn by default on mac

* move mkldnn files for GPU MKLDNN build

* copy lib mxnet in gpu build

* only link windows

* add mkldnn.mk

* try force linking

* retrigger

* retrigger

* remove mkldnn dynamic check

* use ifndef

* remove test mkldnn install

* fix spacing

* fix index

* remove cp of mkldnn since statically linked

* add libmkldnn.a to list of files to pack

* include mkl_ml

* add mkldnn to pack

* add libiomp to ci pack

* move static libs

* fix typo

* pack mkldnn

* retrigger

* add linux artifacts

* move libmkldnn in gpu cmake build

* move libmkldnn and libiomp5 on gpu workspace

* move linked files

* fix typo

* fix typo

* add artifacts for tensorrt

* move mkldnn lib in scala build

* move mkldnn lib on cpu scala

* create dir for binding

* rename libmkldnn in scala

* move mklml dep in scala builds

* move mkl to another linked folder

* move libmkl to another dir

* add libmklml

* move mkldnn

* move mkldnn on centos

* specify new dynamic path

* retrigger

* remove mkldnn dynamic lib

* remove moving mkldnn artifact

* add ld path

* retrigger

* Revert "remove moving mkldnn artifact"

This reverts commit 16cca196e9e1ad92db74f4e8a01b3b052076d268.

* Revert "remove mkldnn dynamic lib"

This reverts commit d51043622d4ef7fcb95aff6a3e84d91ab71b48c9.

* update makefile

* Revert RPATH change and trigger CI

* correcting use-mkldnn flags for two tests

* mkldnn default on linux for starters

* reverting naming rules of pack_lib

* adding mkldnn=0 flags to centos non mkldnn builds

* adding mkldnn=0 flags to ubuntu gpu non mkldnn builds

* removing mkldnn binary operation for ubuntu gpu cmake non mkldnn build

* removing mkldnn binary operation for centos non-mkldnn unittest

* adding explicit USE_MKLDNN=0 flags for clang builds

* adding explicit USE_MKLDNN=0 flags for cpu ubuntu builds

* removing mkldnn binaries from non mkldnn builds scala gpu

* adding explicit flag mkldnn=0 for tensorrt gpu build

* adding explicit flag mkldnn=0 for ubuntu cmake asan

* adding centos cpu mkldnn tests to CI

* adding CentOS GPU MKLDNN build and unittest

* not keeping mkldnn default for mac os

* setting mkldnn default for x86_64 only

* running docs with mkldnn=0 flag

* removing CentOS CPU Scala MKLDNN test

* setting mkldnn default for x86_64 only

* not making mkldnn default on windows

* removing Centos MKLDNN tests from CI

* retrigger

* retrigger

* retrigger

* use relative links; update links (#13741)

* [MXNET-1231] Allow not using Some in the Scala operators (#13619)

* add initial commit

* update image classifier as well

* create Util class make Some conversion

* add test changes

* address comments

* fix the spacing problem

* fix generator base

* change name to Option

* fix bug in profiler tutorial when using cpu (#13695)

The try/except approach always selects ctx=mx.gpu(), because test_utils.list_gpus() returns an empty list instead of raising an error when no GPU is present.

* local docs build feature (#13682)

* make ROIAlign support position-sensitive pooling (#13088)

* make ROIAlign support position-sensitive pooling

* add unittest for RoIAlign op

* fix cpplint error

* fix python3 compatibility for unittest

* change OMP for better performance

* delete blank line to trigger CI

* add shape check when position_sensitive is true

* fix the typo

* typo: shuold -> should

* remove private() clause in omp statement

* add examples and fix the dependency problem (#13620)

* add examples and fix the dependency problem

* add Nightly run and optimized script

* add explanation for the line

* Update Adam optimizer documentation (#13754)

* Less cudaGet/SetDevice calls in Gluon execution (#13764)

* Remove unnecessary cudaGetDevice/cudaSetDevice calls

* Fixes for the DeviceGuard

* Retrigger CI

* Fix for possible invalid device ordinal when using DeviceStore while
driver is unloading

* Fix for RTC when the driver API call is the first call

* Added DeviceStore to pooled engine

* Scope requests so it's not needed for dev_menu (#13771)

* Fix USE_MKLDNN check in Makefile (#13775)

* fix makefile

* change make/config.mk

* add comments

* retrigger ci

* fix C compiler to clang (#13778)

* Fixed mailing list addresses (#13766)

* [MXNET-1255] update hybridize documentation (#13597)

* update hybridize documentation

* address review comments

* improve doc

* address comments

* address comments

* [MXNET-244] Work around likely compiler bug on nested inlines and temporary acces… (#13535)

* Work around likely compiler bug on nested inlines and temporary access to stream

* Don't compile khatri_rao tests if we don't have LAPACK

* Address CR comment

* Use curl to download sample data instead of wget. (#13761)

* fix bipartite match memory corruption (#13727)

* remove attributes clear on TRT nodes for GetOptimizedSymbol (#13703)

* Add CPU test coverage and refine cmake builds (#13338)

* add license (#13793)

* [MXNET-862] Basic maven jenkins pipeline (#13450)

* Jenkins Publish Nightly Maven

Progress

* Separate Build, Test, and Deploy Stages with parallel

* Re-organize Scala maven build (#13626)

* Re-organize scala maven build

1. Automatically detect which platform to build for Scala.
2. Remove platform-dependent submodules
3. Fix cyclic module dependencies
4. Fix Scala style check
5. Now mvn can be executed in submodules
6. Maven build can be executed from any directory, not only the root project
7. Check in javah header file, and use verify task to detect native API changes
8. Improve incremental build performance
9. Remove unittest and integration-test profile, use proper task instead
10. Delete generated Scala file during maven clean.

* Redo maven deploy related tasks.

1. Removed maven release plugin.
2. Make maven build friendly to CI, allow CLI override of version.
3. Moved gpg signing to deploy stage.
4. Created a separate deploy module.
5. Updated Makefile to new maven build change.
6. Remove unused nexus-staging-plugin
7. Added nightly and staging profile for CI.

* Support mkldnn for Scala.

* Add extra header file to export for error checking (#13795)

* add extra header file to include

* fix sanity check

* fix sanity

* move c_api_common.h to include folder

* fix build error

* keep c_api_common.h internal

* strip out error handling API into a separate header

* consolidate comment into one paragraph per review

* remove unnecessary include

* fix redirection issues; set default version to master (#13796)

* [MXNET-898] ONNX import/export: Sample_multinomial, ONNX export: GlobalLpPool, LpPool (#13500)

* ONNX import/export: Sample_multinomial

* ONNX export: GlobalLpPool, LpPool

* Handle default p_value

* Add tests for multinomial, lppool, globallppool

* add a comment about shape test

* whitelist symbols for using MXNet error handling externally (#13812)

* fix for params with no dims in onnx (#13413)

* fix for params with no dims

* fix

* fix

* retrigger build

* test added

* retrigger CI

* retrigger ci

* Remove semicolon in libmxnet.sym file (#13822)

* Remove semicolon in libmxnet.sym file

* empty commit to trigger CI

* Clojure example for fixed label-width captcha recognition (#13769)

* Clojure example for fixed label-width captcha recognition

* Update evaluation

* Better training and inference (w/ cleanup)

* Captcha generation for testing

* Make simple test work

* Add test and update README

* Add missing consts file

* Follow comments

* Update LICENSE File with subcomponents (#13808)

* Update LICENSE File with subcomponents

* Fix JavaScript licenses

* Dockerfiles for Publish Testing (#13707)

* Add new Maven build for Scala package (#13819)

* clean up build

* fix minor issue and add mkldnn

* fix mx_dist problem

* fix clojure build

* fix skip test

* ONNX ops: norm exported and lpnormalization imported (#13806)

* ReduceL1, l2 export, lpnormalization import added

* fix

* fix

* fix

* fix

* remove useless code (#13777)

* Fixing a symlink issue with R install (#13708)

* fix minor indentation (#13827)

* [MXNET-880] ONNX export: Random uniform, Random normal, MaxRoiPool (#13676)

* ONNX export: Random uniform, Random normal

* ONNX export: MaxRoiPool

* tests for maxroipool, randomnormal, randomuniform

* onnx export ops (#13821)

* onnx export ops

* retrigger ci

* retrigger ci

* fix

* [MXNET-1260] Float64 DType computation support in Scala/Java (#13678)

* Added Float64 as a supported datatype in NDArray

* Added unit tests for Float64 in NDArray

* Fix for failing Clojure unit tests

* Added Float and Double as MX_PRIMITIVES for computation in Scala

* Trying out second approach --> Private Impl methods with generic signature, and public methods calling the Impls

* Fixed errors in *= method

* Added Float64 in IO.scala and DataIter.scala

* Added another testcase for IO.DataDesc creation

* Fixed failing CI

* Added Float64 in Predictor class

* Added Float64 in Classifier class

* Added Double as a possible return type to classifyWithNDArray

* Added unit tests for Classifier and Predictor.scala classes for Float64/Double

* Approach 3 --> Using a trait to mirror Float and Double in Scala

* Added comments on MX_PRIMITIVES.scala

* Added Float64/Double support for inference in ImageClassifier APIs

* Added unary- and compareTo in MX_NUMBER_LIKE

* Renamed MX_NUMBER_LIKE to MX_PRIMITIVE_TYPE

* Fixed linting issue

* Now specifying dType from the available data in copyTo and MXDataIter.scala for creating a new DataIterator

* Add primitives support handling to the generator for proper conversion

* Reduced code duplication in classify method in Classifier.scala

* Fix infer package for new signatures and address some bugs

* Removed code duplication in getPixelsArray

* remove debugging

* Changed classifyWithNDArray method in Classifier.scala

* Removed code duplication in predictImpl

* Satisfying lint god _/\_

* Fixed failing PredictorSuite test

* Renamed MX_FLOAT to Camel case

* Revert "Renamed MX_FLOAT to Camel case"

This reverts commit 9d7c3ce6f9c4d6ed2c46041a02e23c0f1df8dfe5.

* Added an implicit conversion from int--> float to support int operations in NDArrays. (These ops were already supported in the previous versions)

* Added Float64 as a training option to ImClassification Suite. Also added integration tests for it

* Satisfy Lint God _/\_

* Added Float64 support in Java NDArray

* Added Float64 support in Java's Predictor API

* Added yours truly to the Contributors list

* Added method comments on Predictor.predict with Array[Double] as a possible input

* Added method comments explaining what MX_PRIMITIVE_TYPE is

* Fixed errors caused by rebasing with master

* Added licences to the files

* [MXNET-1263] Unit Tests for Java Predictor and Object Detector APIs (#13794)

* Added unit tests for Predictor API in Java

* Added unit tests for ObjectDetectorOutput

* Added unit tests for ObjectDetector API in Java

* Addressed PR comments

* Added Maven SureFire plugin to run the Java UTs

* Pom file clean up -- moved surefire plugin to parent pom.xml

* Renamed skipTests to SkipJavaTests

* Fix scala doc build break for v1.3.1 (#13820)

* Fix doc build break for v1.3.1

* ignore errors on v1.3.x during scala docs gen

* Remove MXNET_STORAGE_FALLBACK_LOG_VERBOSE from test_autograd.py (#13830)

* Add Local test stage and option to jump directly to menu item from commandline (#13809)

* Removes unneeded nvidia driver ppa installation (#13814)

* Improve license_header tool by only traversing files under revision c… (#13803)

* Improve license_header tool by only traversing files under revision control

* use HEAD instead of master for CI

* Disabled flaky test (#13758)

* change to compile time (#13835)

* fix Makefile for rpkg (#13590)

* fix Makefile for rpkg

* update R and roxygen2 requirements

* add roxygen requirement

* add roxygen requirement

* [CI] Prevent timeouts when rebuilding containers with docker. (#13818)

* Prevent timeouts when rebuilding containers with docker.
Increase timeout from 120 to 180 for pipelines

* Increase docker cache timeout

* Increase timeout also for docs

* limit parallel builds to 10

* Code modification for testcases of various network models in directory example (#12498)

* example testcase modified

* rcnn file add

* license add

* license init

* CI test trigger

* rcnn modify give up

* trigger

* modify for better user experience

* change the default parameter to xpu=None

* Update bdk_demo.py

* Update fcn_xs.py

* Update test.py

* Update train.py

* Update bdk_demo.py

* Update bdk_demo.py

* modify review comments

* refine

* modify Readmes according to the changed code.

* finetune READMEs

* re-trigger ci

* re-trigger ci twice

* Add copyrights for third party licenses to license file (#13851)

* Fix Tree Reduction on new instance type p3dn.24xlarge (#13852)

* add fallback for gpu topology detection using CUDA 9.2

* add fallback for gpu topology detection using CUDA 9.2

* add log

* update 3rdparty to master

* add fallback for gpu topology detection using CUDA 9.2

* add log

* update 3rdparty to master

* bring 3rdparty packages to upstream/master

* rebase to master

* Update gpu_topology.h

* [Clojure] package infer tweaks (#13864)

* change object detection prediction to be a map

* change predictions to a map for image-classifiers

* change return types of the classifiers to be a map
- add tests for base classifier and with-ndarray as well

* tweak return types and inputs for predict
- add test for plain predict

* updated infer-classify examples

* adjust the infer/object detections tests

* tweak predictor test

* Feedback from @kedarbellare review

* put scaling back in

* put back predict so it can handle multiple inputs

* restore original functions signatures (remove first)

* Modifying clojure CNN text classification example (#13865)

* Modifying clojure CNN text classification example

* Small fixes

* Another minor fix

* adding tolerance to flaky test (#13850)

* adding tolerance

* retrigger ci

* retrigger ci

* Julia v0.7/1.0 support and drop v0.6 support (#12845)

* Fix cpp examples build on Mac. (#13826)

This is a regression from adding the @rpath name to libmxnet.so on Mac:
the example executables can no longer find libmxnet.so.
Add an @rpath search path to fix this issue.

* Fix launch bounds in spatial transformer (#13188)

* Fix launch bounds in spatial transformer

* Adding explanation in comment.

* Update example scripts classpath. (#13849)

* [MXNET-1177]Adding Scala Demo to be run as a part of Nightly CI (#13823)

* Adding Scala Demo to be run as a part of Nightly CI

* Addressed PR feedback : making a profile to fetch nightly jars only on CI

* Changed name from scalacidemo to scala_ci_demo

* Synchronized the scala-demo and java-demo for nightly CI runs

* Pruned the maven command to simply maven install

* changed running from ./.sh to bash .sh to be consistent

* Add CODEOWNERS for Julia package (#13872)

* fix ssd quantization script error (#13843)

* fix ssd quantization script error

* update readme for ssd

* move quantized SSD instructions from quantization/README.md to ssd/README.md

* update ssd readme and accuracy

* update readme for SSD-VGG16

* Rename to avoid merge conflict with upstream.

* Update submodule versions.

- update mkldnn and mshadow to version used by upstream master
- update ngraph-mxnet-bridge to current master

Renames nGraph README to follow MXNet conventions.

* Fix merge error for nGraph support in CMakeLists.txt

* Fixes CMake file error.