[MXNET-729] Scala Examples memory leak fix #12232
Conversation
Force-pushed 0b8b18f to 3d1b8c1 (Compare)
Force-pushed f2eaa79 to af25f54 (Compare)
@@ -50,26 +49,32 @@ class ExampleRNNSuite extends FunSuite with BeforeAndAfterAll {
       System.getenv("SCALA_TEST_ON_GPU").toInt == 1) {
       ctx = Context.gpu()
     }
-    LstmBucketing.runTraining(tempDirPath + "/RNN/sherlockholmes.train.txt",
+    NDArrayCollector.auto().withScope {
+      LstmBucketing.runTraining(tempDirPath + "/RNN/sherlockholmes.train.txt",
It is better to have the collector inside runTraining, like what you do in GanMnist.scala. Otherwise, as the number of loop iterations increases, too many NDArrays will be stored temporarily in the collector here.
But if you just mean to get rid of the memory leak in CI, then this is fine.
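The suggestion above, moving the collector inside runTraining so each epoch gets its own scope, might look like the following sketch. Only `NDArrayCollector.auto().withScope` is the actual MXNet Scala API; `trainOneEpoch` and the loop are illustrative stand-ins for the real LstmBucketing logic:

```scala
import org.apache.mxnet.{Context, NDArrayCollector}

object TrainingSketch {
  // Hypothetical per-epoch training step; stands in for the real
  // LstmBucketing code, which is not reproduced here.
  def trainOneEpoch(dataPath: String, epoch: Int, ctx: Context): Unit = ???

  def runTraining(dataPath: String, numEpochs: Int, ctx: Context): Unit = {
    for (epoch <- 0 until numEpochs) {
      // Each epoch gets its own collector scope: NDArrays allocated inside
      // are disposed when the scope exits, so the memory held at any point
      // is bounded by one epoch's allocations rather than the whole run's.
      NDArrayCollector.auto().withScope {
        trainOneEpoch(dataPath, epoch, ctx)
      }
    }
  }
}
```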
As you said, this piece of code is just meant to mitigate the memory leak issues in CI.
@lanking520 Can you please make the change that Yizhi is asking for? Let's do it right, because this will become a pattern in other parts of the code.
@nswamy It depends on the purpose of the change:
- To improve CI, wrapping from outside is the right approach.
- To improve the model itself, the change needs to be made inside.
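For the CI-only purpose, the wrap-from-outside option is a single scope around the entire run, roughly as follows. `runWholeExample` is a hypothetical stand-in for a full example run such as GanMnist.runTraining; only `NDArrayCollector.auto().withScope` is the actual API:

```scala
import org.apache.mxnet.NDArrayCollector

// Hypothetical stand-in for an entire example run (e.g. GanMnist.runTraining).
def runWholeExample(): Float = ???

// One scope around the whole run: every NDArray allocated inside is
// disposed when the block exits, which stops the leak across test cases,
// but peak memory during the run itself is unchanged.
val output = NDArrayCollector.auto().withScope {
  runWholeExample()
}
```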
Force-pushed cf82dba to f7e0485 (Compare)
@@ -44,7 +44,9 @@ class GanExampleSuite extends FunSuite with BeforeAndAfterAll {

       val context = Context.gpu()

-      val output = GanMnist.runTraining(modelDirPath, context, modelDirPath, 5)
+      val output = NDArrayCollector.auto().withScope {
I don't think you need NDArrayCollector here. It should be sufficient to just have it inside the training loop (for each epoch).
I only added one big, all-encompassing {} in every example, without changing anything inside.
apache#12232:
* initial fix for RNN
* add CI test
* ignore the test due to memory leaks
* release the GAN beast
* enable rnn
* add collector and dispose
* revert the hacky thing after rebase
* rename with inference
* add collector in some examples
* add experimental tag and comments
* change the scope of the NDArrayCollector
* apply final changes...
* fix scalastyle
Description
Currently the Scala integration test runs are strongly affected by running out of CUDA memory. To address that issue, this PR gives a test run of the
NDArrayCollector
created by @yzhliu. @andrewfayres @nswamy
related CI failure:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11753/9/pipeline
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11753/10/pipeline
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.