Basic CPU Kernel OMP selection based upon whether GPU has been used #7854
Conversation
src/engine/threaded_engine.h
Outdated
#if MXNET_USE_CUDA
    if (run_ctx.ctx.dev_mask() == gpu::kDevMask) {
      // Signify to kernel that GPU is being used
      mxnet::op::mxnet_op::KernelState::SetUsingGPU(true);
this should be done in gpu's lazy alloc queue
ok
The queue doesn't know that it's executing on the GPU, right? Are you suggesting we make the queue aware so that it calls this function? That seems kind of messy, right?
This needs to be combined with launching more CPU workers when using the GPU in order to be useful.
The complex logic to determine how many workers are needed is coming in a separate PR.
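The engine-side flag under discussion can be sketched as a process-wide atomic. This is a hedged sketch: the name `KernelState::SetUsingGPU` comes from the diff above, but the body and the memory-ordering choice are assumptions, not the merged implementation.

```cpp
#include <atomic>

// Hypothetical sketch of the GPU-used flag: the engine flips it the first
// time a GPU context executes, and CPU kernels read it to pick an OMP policy.
class KernelState {
 public:
  static void SetUsingGPU(bool using_gpu) {
    using_gpu_.store(using_gpu, std::memory_order_relaxed);
  }
  static bool UsingGPU() {
    return using_gpu_.load(std::memory_order_relaxed);
  }

 private:
  static std::atomic<bool> using_gpu_;
};

std::atomic<bool> KernelState::using_gpu_{false};
```

Relaxed ordering suffices here because the flag is a monotonic hint for a heuristic, not a synchronization point.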
src/operator/mxnet_op.h
Outdated
@@ -221,12 +236,23 @@ template<typename OP>
 struct Kernel<OP, cpu> {
   template<typename ...Args>
   inline static void Launch(mshadow::Stream<cpu> *s, int N, Args... args) {
-#if (MXNET_USE_CUDA == 0)
+#if MXNET_USE_CUDA == 0
we don't even need this, right?
Probably not
* Disabling the test_CSVIter for now: this test causes random failures when running on Windows, so it is disabled until we fix it. A GitHub issue has been created to track it.
* Update test_io.py
* Update test_io.py
src/operator/mxnet_op.cc
Outdated
@@ -0,0 +1,31 @@
+/*
remove file
done
src/engine/threaded_engine.h
Outdated
@@ -293,6 +301,19 @@ class ThreadedEngine : public Engine {
     finished_cv_.notify_all();
   }

   static int DefaultOMPThreadsPerWorker() {
     int cores = std::thread::hardware_concurrency();
physical core or logical core?
I am changing it in another branch to use the OMP number-of-processors call.
src/engine/threaded_engine.h
Outdated
      cores = omp_get_num_threads();
    } else {
      // By default, leave one core to run the engine
      --cores;
we may need to leave more threads
actually for CPU only case 1 is enough
What is "case 1"?
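The default being debated ("leave one core to run the engine") amounts to something like the sketch below. Note `hardware_concurrency()` reports logical cores, which is part of the physical-vs-logical question above; the clamp to a minimum of one thread is my assumption, not the PR's exact code.

```cpp
#include <algorithm>
#include <thread>

// Sketch: reserve one hardware thread for the engine's dispatch loop and
// give the rest to OMP workers, never returning fewer than one thread.
static int DefaultOMPThreadsPerWorker() {
  // hardware_concurrency() counts logical cores and may return 0 if unknown.
  const int cores = static_cast<int>(std::thread::hardware_concurrency());
  return std::max(1, cores - 1);
}
```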
src/operator/mxnet_op.h
Outdated
        OP::Map(i, args...);
      }
    } else {
      #pragma omp parallel for num_threads(omp_cores - 1)
why -1?
I am changing it to use omp_threads only; I was leaving a thread for the engine. However, in this PR I only want to keep the same OMP behavior as before, aside from making behavior consistent between CPU and GPU builds.
Leaving 1 for the engine.
Eventually the OMP thread count will probably be divided by, for example, the number of concurrent ops running. I am also working on tuning, which will likely come into play as well; that's for a later PR early next week.
Please be advised this change is only meant to preserve the previous CPU-build behavior when running a GPU build with the GPU not used. More elegant OMP behavior is forthcoming in a later PR next week.
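The CPU `Kernel::Launch` shape being reviewed, with its serial/OMP split, can be sketched roughly as follows. Everything except `OP::Map` is an invented name for illustration; if the compiler has no OpenMP, the pragma is ignored and the loop simply runs serially, so the result is the same either way.

```cpp
#include <vector>

// Sketch of the CPU kernel launch dispatch: run OP::Map serially when only
// one thread is warranted, otherwise let OMP split the index range.
template <typename OP, typename... Args>
void LaunchCPU(const int N, const int omp_threads, Args... args) {
  if (omp_threads <= 1) {
    for (int i = 0; i < N; ++i) {
      OP::Map(i, args...);
    }
  } else {
    #pragma omp parallel for num_threads(omp_threads)
    for (int i = 0; i < N; ++i) {
      OP::Map(i, args...);
    }
  }
}

// Example op (hypothetical): doubles each element in place.
struct DoubleOp {
  static void Map(int i, float *data) { data[i] *= 2.0f; }
};
```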
…. This is not changed from the master branch. Trying a different format.
src/engine/threaded_engine.h
Outdated
    // TODO(cjolivier01): Programmatically obtain hyperthreading count (if supported)
    // Taking max including omp_get_max_threads() in case this implementation of OMP
    // accounts for hyperthreading
    return std::max(omp_get_max_threads(), omp_get_num_procs());
why max?
It may have been set by the environment variable, or set elsewhere to something lower. A previous call to set_max... in some library may have reduced it, but we want to use either a larger number from the environment (i.e. the user wishes to use the hyperthreading count times the number of procs) or the number of procs.
More OMP tuning is coming in a later PR (including recursion depth, etc.).
How would a user opt to use fewer threads than the number of cores?
Model serving where you have a separate webserver process.
Changed to allow an environment override. omp_get_max_threads() can be implementation-specific and may take hyperthreading into account. Otherwise, we use the number of procs (per Eric).
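The environment override mentioned here might look like the sketch below. The function name is hypothetical; the real change parses the environment directly because OMP_NUM_THREADS may have odd formatting (e.g. "3, 2"), which is why only the leading integer is used.

```cpp
#include <cstdlib>
#include <thread>

// Sketch: honor an explicit OMP_NUM_THREADS (e.g. for model serving next to
// a webserver process), otherwise fall back to the processor count.
static int OMPThreadCountWithOverride() {
  if (const char *env = std::getenv("OMP_NUM_THREADS")) {
    // The value may be oddly formatted (e.g. "3, 2"); atoi stops at the
    // first non-numeric character, keeping only the leading count.
    const int n = std::atoi(env);
    if (n > 0) {
      return n;
    }
  }
  return static_cast<int>(std::thread::hardware_concurrency());
}
```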
This is an example of CI leaving artifacts from previous builds in the workspace on the build machine (i.e. lua-package/).
…to optimize_basic_omp
Trigger build
apache#7854 Unit test framework for C++ timing of generic operators. Activation operator converted to Kernel from mshadow. Performance improves (see below; the rows are labeled "Fully connected" in the benchmark output). Timing: 50 iterations of 10 calls (500 passes each), average ms per call:

| Shape | Forward (old) | Backward (old) | Forward (new) | Backward (new) |
|---|---|---|---|---|
| (1,1,28,28) | 0.11243 | 0.138644 | 0.088222 | 0.00168 |
| (1,3,28,28) | 0.048374 | 0.067596 | 0.032186 | 0.002838 |
| (50,1,18,32) | 0.196438 | 0.071866 | 0.275764 | 0.07789 |
| (50,3,18,32) | 0.693474 | 0.120282 | 0.680322 | 0.136512 |
| (20,3,128,128) | 7.21567 | 0.77545 | 6.93005 | 0.645824 |
Note: I have a couple of PRs stacked up behind this one...
…pache#7854)

* Basic CPU Kernel OMP selection based upon whether GPU has been used
* lint
* Disabling the test_CSVIter for now (apache#7829): this test causes random failures when running on Windows; disabled until we fix it, with a GitHub issue to track it
* Update test_io.py (x2)
* Use OMP thread count as test in Kernel, set count for Kernel loop
* lint
* removed
* Remove assert
* Adjust DefaultOMPThreadsPerWorker
* remove -1 from omp_cores
* Trigger build
* It is not clear why pylint claims that this is re-imported (it is not; this is unchanged from the master branch); trying a different format
* lint (x2)
* Change getter/setter naming style
* allow env override
* check environment directly, since OMP_NUM_THREADS may have odd formatting (e.g. "3, 2")
* CR comments
* Squashed commit of the following (Olivier <coolivie@amazon.com>, Mon Sep 25 2017): fix formatting (ec704f1), splitting unary ops (0218c49), split unary (9abbba1)
* Update mxnet_predict0.cc (x2)
* fix oversight with bracket
* Binary scatter working on CPU and GPU
* return unchanged
* Disable an unreliable test case; I can't even tell what's wrong on the CI build because so many errors come from this test
* inconsequential cleanup
* Update test_kvstore.py
* Update CMakeLists.txt (x2, trigger build)
* force fail
* remove forced error
* test clean every make
* Test
* Copy Jenkinsfile from upstream/master to fix the build
* logic was reversed
* Update threaded_engine.h (trigger build)
* Trigger rebuild
* Trigger build (x2)
#8232)

* GPROF update; also include include/mxnet/*.h as sources for CLion
* Added FindGperftools.cmake
* Add option USE_GPERFTOOLS (x3)
* USE_GPERFTOOLS off by default for now
* Add Apache license to FindGperftools.cmake
* Update CMakeLists.txt: try to use GPerftools or JEmalloc by default
* Update CMakeLists.txt: off by default for now
* internal labeling
* gperftools and jemalloc
* gperftools and jemalloc on by default
* Fixing the Caught error (#8199)
* Temporarily disable some unit tests to fix the build (#8253): test_rms (re-enable once #8230 is fixed) and test_autograd_save_memory (re-enable once #8211 is fixed)
* OMP num threads 0->1
* remove check
* Update documentation links to point to mxnet.incubator.apache.org
* add export to gluon (#8212): add export, fix, add test, fix nnvm, fix
* ReleaseFeedback: License Files (#8247): updating license headers, license changes
* Sequential aug (#8243): add sequentialAug, add type for castaug, modify docs
* Basic CPU Kernel OMP selection based upon whether GPU has been used (#7854), including its full squashed commit history
* Multiplatform docker based builds (#7792): add dockerized multi-architecture build files, add android arm64 build
* Operators for sum(csr, axis=0) and sum(csr, axis=1) (#8174): add infer storage for sparse slice operator, remove unused files, indentation fix and GPU fallback test, change sum builtin to py_sum, add sum_axis(csr,axis=0)=dense and sum(csr,axis=1)=dense operators, documentation changes for sparse, fallback unittest for keepdims and exclude, PR review changes (fix CHECK_NE, change in_stype to int, use const int, initialize mid with the start, generalizing)
* OMP num threads 0->1
* remove check
First iteration for performance enhancements
If GPU isn't used, then use OMP for running CPU kernels
GPU usage is triggered by the ThreadedEngine or NaiveEngine
Currently, the intended net effect of this PR is to allow for normal OMP behavior for GPU builds when the GPU is not used. More robust OMP thread management is forthcoming.
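Putting the PR's pieces together, the intended selection policy is roughly the following. This is a hedged sketch: names and exact thread counts are assumptions, and the "more robust OMP thread management" mentioned above would replace these hard-coded choices.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>

// Sketch of the overall policy: once any GPU op has run, CPU kernels assume
// the engine is running extra CPU workers and keep their OMP fan-out minimal;
// in a pure-CPU run, one worker uses most cores via OMP.
static std::atomic<bool> g_gpu_was_used{false};

static int OMPThreadsForCPUKernel() {
  if (g_gpu_was_used.load(std::memory_order_relaxed)) {
    // GPU in use: the CPU side is a helper, so do not fan out with OMP.
    return 1;
  }
  // CPU-only: use the cores, leaving one for the engine loop.
  const int cores = static_cast<int>(std::thread::hardware_concurrency());
  return std::max(1, cores - 1);
}
```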