[MXNET-729] Scala Examples memory leak fix #12232

lanking520 · 2018-08-17T21:50:25Z

Description

Currently, the Scala integration test runs are strongly affected by insufficient CUDA memory. To address that issue, this PR tries out the NDArrayCollector created by @yzhliu.
@andrewfayres @nswamy

related CI failure:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11753/9/pipeline

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11753/10/pipeline

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

yzhliu · 2018-08-22T01:18:24Z

scala-package/examples/src/test/scala/org/apache/mxnetexamples/rnn/ExampleRNNSuite.scala

@@ -50,26 +49,32 @@ class ExampleRNNSuite extends FunSuite with BeforeAndAfterAll {
      System.getenv("SCALA_TEST_ON_GPU").toInt == 1) {
      ctx = Context.gpu()
    }
-    LstmBucketing.runTraining(tempDirPath + "/RNN/sherlockholmes.train.txt",
+    NDArrayCollector.auto().withScope {
+      LstmBucketing.runTraining(tempDirPath + "/RNN/sherlockholmes.train.txt",


It is better to have the collector inside runTraining, as you do in GanMnist.scala. Otherwise, as the number of loop iterations grows, too many NDArrays will be held temporarily in the collector here.
But if you just mean to get rid of the memory leak in CI, then this is fine.

As you said, this piece of code is only meant to mitigate the memory leak issues in CI.

@lanking520 Can you please make the change that Yizhi is asking for? Let's do it right, because this will become a pattern in other parts of the code.

@nswamy It depends on the purpose of the change:

To improve CI, wrapping the call from the outside is the right action.

To improve the model itself, the change needs to be made on the inside.
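
The trade-off discussed in this thread can be illustrated with a toy model. The sketch below does not use MXNet: `Resource` and `ToyCollector` are hypothetical stand-ins for an NDArray and for the `NDArrayCollector.auto().withScope { ... }` call shown in the diff, and they only count how many allocations are alive at once. Assuming each epoch allocates a fixed number of temporaries, it compares one scope around the whole run with one scope per epoch.

```scala
import scala.collection.mutable.ArrayBuffer

object ScopeDemo {
  // Hypothetical stand-in for a handle to native memory (an NDArray in MXNet).
  final class Resource {
    var disposed = false
    def dispose(): Unit = { disposed = true }
  }

  // Toy collector (not the MXNet class): tracks allocations and disposes
  // everything allocated inside a scope when that scope exits.
  final class ToyCollector {
    private var scopes: List[ArrayBuffer[Resource]] = Nil
    private var live = 0
    var peakLive = 0

    def alloc(): Resource = {
      val r = new Resource
      scopes.headOption.foreach(_ += r) // register with the innermost scope
      live += 1
      peakLive = math.max(peakLive, live)
      r
    }

    def withScope[T](body: => T): T = {
      val buf = ArrayBuffer.empty[Resource]
      scopes = buf :: scopes
      try body
      finally {
        scopes = scopes.tail
        buf.foreach(_.dispose())
        live -= buf.size
      }
    }
  }

  // One scope around the whole run: nothing is freed until training ends.
  def peakWithOuterScope(epochs: Int, perEpoch: Int): Int = {
    val c = new ToyCollector
    c.withScope {
      for (_ <- 0 until epochs; _ <- 0 until perEpoch) c.alloc()
    }
    c.peakLive
  }

  // One scope per epoch: each epoch's allocations are freed before the next starts.
  def peakWithPerEpochScope(epochs: Int, perEpoch: Int): Int = {
    val c = new ToyCollector
    for (_ <- 0 until epochs) c.withScope {
      for (_ <- 0 until perEpoch) c.alloc()
    }
    c.peakLive
  }

  def main(args: Array[String]): Unit = {
    println(peakWithOuterScope(epochs = 5, perEpoch = 3))    // 15
    println(peakWithPerEpochScope(epochs = 5, perEpoch = 3)) // 3
  }
}
```

In this model the outer scope still frees everything once the run finishes, which is enough to stop leaks between CI tests, but its peak grows with the number of epochs. The per-epoch scope keeps the peak bounded by a single epoch's allocations.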

nswamy · 2018-08-22T20:59:13Z

scala-package/examples/src/test/scala/org/apache/mxnetexamples/gan/GanExampleSuite.scala

@@ -44,7 +44,9 @@ class GanExampleSuite extends FunSuite with BeforeAndAfterAll{

        val context = Context.gpu()

-        val output = GanMnist.runTraining(modelDirPath, context, modelDirPath, 5)
+        val output = NDArrayCollector.auto().withScope {


I don't think you need an NDArrayCollector here. It should be sufficient to have it inside the training loop (for each epoch).

lanking520 · 2018-08-22T21:19:27Z

I only added one big {} around every example; I did not change anything inside.

apache#12232)
* initial fix for RNN
* add CI test
* ignore the test due to memory leaks
* release the GAN beast
* enable rnn
* add collector and dispose
* revert the hacky thing after rebase
* rename with inference
* add collector in some examples
* add experimental tag and comments
* change the scope of the NDArrayCollector
* apply final changes...
* fix scalastyle

lanking520 requested review from nswamy and yzhliu as code owners August 17, 2018 21:50

lanking520 force-pushed the example-memory branch from 0b8b18f to 3d1b8c1 Compare August 17, 2018 22:04

lanking520 changed the title ~~[MXNET-729][WIP] Scala Examples memory leak fix~~ [MXNET-729] Scala Examples memory leak fix Aug 20, 2018

lanking520 force-pushed the example-memory branch from f2eaa79 to af25f54 Compare August 21, 2018 18:23

yzhliu reviewed Aug 22, 2018

lanking520 added 10 commits August 22, 2018 13:34

initial fix for RNN

60c2938

add CI test

f377edb

ignore the test due to memory leaks

cf13768

release the GAN beast

5a094c0

enable rnn

395ba35

add collector and dispose

59a90ca

revert the hacky thing after rebase

85d72b4

rename with inference

84771c7

add collector in some examples

1a103f6

add experimental tag and comments

f7e0485

lanking520 force-pushed the example-memory branch from cf82dba to f7e0485 Compare August 22, 2018 20:35

nswamy reviewed Aug 22, 2018

lanking520 added 2 commits August 22, 2018 14:10

change the scope of the NDArrayCollector

9bdebc5

apply final changes...

b1e6a31

nswamy approved these changes Aug 22, 2018

fix scalastyle

60ab6ef

nswamy merged commit 2f177d8 into apache:master Aug 23, 2018

lanking520 deleted the example-memory branch September 19, 2018 23:00
