This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Merge in upstream master through 4fe546 #543

Merged

cconvey merged 437 commits into master on Jan 28, 2019

Conversation

@cconvey (Contributor) commented Jan 17, 2019

Description

(Brief description of what this PR is about)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this is a backward-incompatible change, why it must be made
  • Interesting edge cases to note here

TaoLv and others added 30 commits November 22, 2018 19:42
* fix quantized pooling and enable it in INT8 SqueezeNet

* add test

* fix test

* address review comments

* refine the test for quantized pooling
* edge_id op csr forward on CPU (#34)

* add node subgraph generator. (#35)

* create DGLSubgraph.

* fix.

* return old eids in node_subgraph.

* accelerate subgraph construction.

* Add neighborhood op (#37)

* add csr_neighborhood op

* update neighborhood sample

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc

* add graph compact operator.

* fix a bug in dgl_subgraph.

* fix a bug in dgl_graph_compact.

* Update csr sample op (#39)

* add csr_neighborhood op

* update neighborhood sample

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc

* Update csr_neighborhood_sample-inl.h

* remove space.

* move to dgl_graph to contrib.

* move code.

* move edge id.

* fix compilation error.

* add test for subgraph.

* cleanup.

* fix.

* fix.

* fix compile error.

* fix compile error.

* fix compile error.

* fix.

* add operator doc.

* remove graph_compact

* update doc.

* address comments.

* retrigger.

* address comments.

* retrigger

* fix a bug in test.

* retrigger

* add check_format
* Updated the paths for images

* Empty commit

* Empty commit

* Nudge to CI
* add flag to disable mkldnn cache

* update docs

* fix typos

* update var name

* fix ordering

* set cache size

* fix log message

* update docs

* fix lint

* fix lint

* fix comparison

* update method name

* fix missing

* fix logging

* remove random item when cache exceeded

* update helper name

* update hash namespace

* make ophash template

* update function params

* fix return

* fix return

* update return for helper

* change class to typename

* add typename

* fix lint

* update doc

* pass ptr to cache

* retrigger

* retrigger

* retrigger

* change env var name to MXNET_MKLDNN_CACHE_NUM

* fix log env name

* retrigger
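Taken together, the commits above describe a bounded op cache: an MXNET_MKLDNN_CACHE_NUM environment variable caps the number of cached entries, and a random item is evicted when the cap would be exceeded. A minimal Python sketch of that idea (the actual change lives in MXNet's C++ MKL-DNN layer; the class and method names here are illustrative, not MXNet's real API):

```python
import os
import random

class BoundedOpCache:
    """Illustrative bounded cache with random eviction."""

    def __init__(self):
        # MXNET_MKLDNN_CACHE_NUM: -1 (the default) means unbounded;
        # a non-negative value caps the number of cached entries.
        self._limit = int(os.environ.get("MXNET_MKLDNN_CACHE_NUM", "-1"))
        self._cache = {}

    def insert(self, key, op):
        if (self._limit >= 0 and len(self._cache) >= self._limit
                and key not in self._cache):
            # "remove random item when cache exceeded"
            evict = random.choice(list(self._cache))
            del self._cache[evict]
        self._cache[key] = op

    def get(self, key):
        return self._cache.get(key)
```

Random eviction keeps the implementation trivial (no LRU bookkeeping) while still bounding memory, which matches the intent of the commit series.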
* Initial website documentation for Java API

* Changing paths to be relative

* Refactoring Java API website landing page

* Update Java web docs based on feedback

* Minor formatting fixes

* Update maven repo to nightly build so that java will be available prior to 1.4.0 release

* Adding java tutorial index to test_sanity_tutorials whitelist

* Fix link to javadocs

* Fix javadoc for infer package and minor install doc fix

* Minor path fix
…(#13402)

* Replace mxnetci dockcross with public dockcross due to missing image

* Remove source lists change

* Disable Jetson

* Move to mxnetcipinned
* Correct shapes of images in cifar10 and cifar100

cifar10 and cifar100 have 3 channels

* Retrigger build
* initial modification recommender

* Recommender updates

* fix notebooks

* Update README.md

* trigger build

* Update README.md

* Retrigger build
* improving multi-processing reliability for gluon dataloader

I found some multi-processing-related issues in the Gluon DataLoader.

 1) Each time a _MultiWorkerIter shuts down, it can leave dangling processes: the shutdown mechanism does not guarantee that all worker processes terminate. As a result, after running for several epochs, more and more dangling processes accumulate.

  This problem barely happens during training, because there is a decent time interval between the last-batch data prefetching and the _MultiWorkerIter's shutdown.
  But it happens frequently 1) when I stop the iter before the end of an epoch, and 2) when I use the DataLoader for a data-loading service and load data as fast as possible. In both cases, the time interval between the most recent data prefetch and the iter shutdown is short. I suspect the _MultiWorkerIter is unable to shut down properly while data prefetching is still active.

  To fix this, I explicitly terminate the worker processes inside the shutdown function.

  2) When loading data fast (again, mostly during testing and data serving), there is a risk of a data race. The _MultiWorkerIter uses a dict to cache prefetched data, but the dict is not thread-safe for concurrent insertion and deletion, so occasionally data goes missing from it.

  To prevent this, I use a scope lock to guard the dict access.

* do not wait for the workers to join, and kill any alive workers as soon as possible
* Fix ONNX export to support multi-output graphs

* Add ONNX unit-test

* Added multi-output shape inference.

- Removed unnecessary forward_pass() call
- Modified infer_output_shape to return multiple shapes for multiple outputs as well as output names.

* Fixed pylint
* dynamic omp for dot

update heuristic

* add doc

* Update mxnet_op.h

* Update dot-inl.h
- Update notebook to avoid divide by 0 causing a warning.
- Add MXBoard dependency.
* Minor fixes to documentation

* Updated the Maven Repository URL to point to staging repo
* fix inception-bn and training acc issue

* add parameter initialization, fix lint

* fix comparison

* change optimizer to sgd

* update sgd and update model name

* add inception_bn in jenkins build

* make max epoch an argument

* remove inception_bn test

* trigger ci

* remove ci test

* trigger ci
* add instruction to get the data and fix typo

* fix typo

* update file name

* trigger CI

* add unit_test for unit_test_mlp_csv

* add mlp_csv to jenkinsfile

* revert jenkinsfile to another PR

* trigger CI

* trigger CI
* Fix scaladoc and javadoc errors

* Stop on errors starting on scala 1.3.x build
…420)

* Adding Java to ubuntu setup install page and minor fixes to other java api docs

* Improving javadoc for java-api predictor class

Mostly documentation changes
* randint operator add along with add optional tag to params

* register param

* lint space issue

* randn issue fix

* uniform_int_distribution doesn't support int8, uint8 fix

* dtype ftype

* ftype to dtype - invalid template arg

* fix template arg issue

* test with int dtype for windows

* removed int8,uint8 from test

* gpu implementation

* gpu engine state diff

* removed gpu support

* empty commit

* temporary fix : batchnorm flaky test skip

* removed randn symbol specific code since other PR is on it

* revert ndarray/randn for compatibility

* added unit test for checking extremes and uniform distribution for sufficient samples

* increased the high val

* int32 to int64 support, indentation fix, check for optype correctly based on type of random function

* gpu support, revert finfertype using template specialization, remove defaults, prints, test other low high val

* fix for invalid template arg by checking for int32,int64

* gpu randint in random_generator

* sample_uniform issue and param, removed old flaky test skip line

* replaced discrete_uniform function by rand_int64 for consistency

* formula update and removed itype

* change ctx to include gpu, fix randint sample_op.cu typo

* trigger ci

* doc fix, check fix, whitespace remove

* added the without dtype testcase
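The extremes/uniform-distribution check mentioned in these commits can be sketched like this, using NumPy's randint as a stand-in for the new MXNet random.randint operator (the helper name is illustrative):

```python
import numpy as np

def check_randint(low, high, n=100000, seed=0):
    """Check that integer samples stay in [low, high) and cover every value."""
    rng = np.random.RandomState(seed)
    samples = rng.randint(low, high, size=n)   # high is exclusive
    # extremes: no sample may fall outside the half-open interval
    assert samples.min() >= low and samples.max() < high
    # with sufficiently many samples, every value in [low, high) should
    # appear, and the counts should be roughly uniform
    counts = np.bincount(samples - low, minlength=high - low)
    assert (counts > 0).all()
    return counts
```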
* fix on ubuntu

* add readme instruction

* fix intellij Tutorials

* fix intelliJ tutorial

* fix the document

* update demo

* revert the change on intelliJ tutorial

* fix make process

* fix documentation
* Add quantized concat

* Fix non-mkldnn build

* Add size check for MKLDNNQuantizedConcatForward

* use all capital for constant

* Rename constant with Google C++ style.

* Address apeforest comments

* Address apeforest comments

* fix lint

* Add frontend interface.

* Retrigger CI
* Add ARMv7 builds to dev_menu.py

* Add Python3 CPU Intel MKLDNN unittests to dev_menu
lanking520 and others added 20 commits January 10, 2019 19:56
* fix Makefile for rpkg

* update R and roxygen2 requirements

* add roxygen requirement

* add roxygen requirement
* Prevent timeouts when rebuilding containers with docker.
Increase timeout from 120 to 180 for pipelines

* Increase docker cache timeout

* Increase timeout also for docs

* limit parallel builds to 10
…y example (#12498)

* example testcase modified

* rcnn file add

* license add

* license init

* CI test trigger

* rcnn modify give up

* trigger

* modify for better user experience

* change the default parameter to xpu=None

* Update bdk_demo.py

* Update fcn_xs.py

* Update test.py

* Update train.py

* Update bdk_demo.py

* Update bdk_demo.py

* modify review comments

* refine

* modify Readmes according to the changed code.

* finetune READMEs

* re-trigger ci

* re-trigger ci twice
* add fallback for gpu topology detection using CUDA 9.2

* add fallback for gpu topology detection using CUDA 9.2

* add log

* update 3rdparty to master

* add fallback for gpu topology detection using CUDA 9.2

* add log

* update 3rdparty to master

* bring 3rdparty packages to upstream/master

* rebase to master

* Update gpu_topology.h
* change object detection prediction to be a map

* change predictions to a map for image-classifiers

* change return types of the classifiers to be a map
- add tests for base classifier and with-ndarray as well

* tweak return types and inputs for predict
- add test for plain predict

* updated infer-classify examples

* adjust the infer/object detections tests

* tweak predictor test

* Feedback from @kedarbellare review

* put scaling back in

* put back predict so it can handle multiple inputs

* restore original functions signatures (remove first)
* Modifying clojure CNN text classification example

* Small fixes

* Another minor fix
* adding tolerance

* retrigger ci

* retrigger ci
This is a regression from adding an @rpath name to libmxnet.so on Mac:
the example executable is no longer able to find libmxnet.so.
Add an @rpath search path to fix this issue.
* Fix launch bounds in spatial transformer

* Adding explanation in comment.
* Adding Scala Demo to be run as a part of Nightly CI

* Addressed PR feedback : making a profile to fetch nightly jars only on CI

* Changed name from scalacidemo to scala_ci_demo

* Synchronized the scala-demo and java-demo for nightly CI runs

* Pruned the maven command to simply maven install

* changed running from ./.sh to bash .sh to be consistent
* fix ssd quantization script error

* update readme for ssd

* move quantized SSD instructions from quantization/README.md to ssd/README.md

* update ssd readme and accuracy

* update readme for SSD-vGG16
- update mkldnn and mshadow to version used by upstream master
- update ngraph-mxnet-bridge to current master

Renames nGraph README to follow MXNet conventions.
@cconvey requested a review from mbrookhart January 17, 2019 20:51

@mbrookhart (Contributor) left a comment:

Looks like the cmake build got lost in the merge?

(Comment on CMakeLists.txt; outdated, resolved)
@cconvey cconvey merged commit 8b3f6a6 into master Jan 28, 2019
@cconvey cconvey deleted the cconvey/mfi2 branch January 28, 2019 22:46