
Conversation

@cloud-fan
Contributor

What changes were proposed in this pull request?

When we clean a closure, if its outermost parent is not a closure, we won't clone and clean it, as cloning a user's objects is dangerous. However, if it's a REPL line object, which may carry a lot of unnecessary references (like the Hadoop conf, Spark conf, etc.), we should clean it, since it's not a user object.

This PR improves the check for user objects to exclude REPL line objects.
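The distinction can be sketched as follows. This is a minimal, hypothetical illustration, not Spark's actual implementation (the object and method names, and the `isClosure` heuristic, are assumptions); the one concrete detail it leans on is that Scala REPL wrapper classes get synthetic names starting with "$line".

```scala
// Hypothetical sketch of a "safe to clone and clean" check.
// Scala REPL line wrapper classes have synthetic names such as
// "$line3.$read$$iw$$iw", so a name-prefix test can distinguish them
// from genuine user objects.
object ReplLineCheck {
  // Assumed stand-in for a closure check on anonymous function classes.
  def isClosure(cls: Class[_]): Boolean =
    cls.getName.contains("$anonfun$") || cls.getName.startsWith("scala.Function")

  // A REPL-generated wrapper is not a user object.
  def isReplLineObject(cls: Class[_]): Boolean =
    cls.getName.startsWith("$line")

  // Before this PR only closures qualified; with the change, REPL line
  // objects qualify too, while arbitrary user objects stay untouched.
  def safeToCleanTransitively(cls: Class[_]): Boolean =
    isClosure(cls) || isReplLineObject(cls)
}
```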

How was this patch tested?

Existing tests.

@cloud-fan
Contributor Author

cc @yhuai @JoshRosen @rxin

  private def getInnerClosureClasses(obj: AnyRef): List[Class[_]] = {
    val seen = Set[Class[_]](obj.getClass)
-   var stack = List[Class[_]](obj.getClass)
+   val stack = Stack[Class[_]](obj.getClass)
Contributor Author

Kind of unrelated, but using a Stack is clearly more efficient here.
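For illustration, here is the traversal pattern in isolation (a generic DFS sketch under assumed names, not the actual `getInnerClosureClasses` body): a mutable `Stack` supports in-place push/pop, whereas the original code rebuilt an immutable `List` on every iteration.

```scala
import scala.collection.mutable.{Set, Stack}

// Generic depth-first traversal using a mutable Stack, mirroring the
// seen/stack pattern used in getInnerClosureClasses.
def reachable(start: Int, edges: Map[Int, List[Int]]): Set[Int] = {
  val seen = Set(start)
  val stack = Stack(start)
  while (stack.nonEmpty) {
    val node = stack.pop()
    for (next <- edges.getOrElse(node, Nil) if !seen(next)) {
      seen += next
      stack.push(next)
    }
  }
  seen
}
```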

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55604 has finished for PR 12327 at commit b78b2ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

LGTM

@liancheng
Contributor

retest this please

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55614 has finished for PR 12327 at commit b78b2ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

cc @andrewor14

@yhuai
Contributor

yhuai commented Apr 13, 2016

@JoshRosen Can you take a look?

-   } else if (outerPairs.size > 0) {
-     logDebug(s" + outermost object is a closure, so we just keep it: ${outerPairs.head}")
+   if (outerPairs.size > 0) {
+     if (isClosure(outerPairs.head._1)) {
Contributor
not really your code, but can you do:

val (outermostClass, outermostObject) = outerPairs.head
if (isClosure(outermostClass)) {
  ...
} else if (outermostClass.getName.startsWith("$line")) {
  ...
} else {
  ...
  parent = outermostObject
  outerPairs = outerPairs.tail
}

so it's more readable.

@andrewor14
Contributor

@cloud-fan Can you add a test to ClosureCleanerSuite2? Otherwise this LGTM.

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55792 has finished for PR 12327 at commit 3db685c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

import scala.collection.mutable.{Map, Set, Stack}
import scala.language.existentials
Contributor

not used

@andrewor14
Contributor

LGTM, merging into master. Thanks!

@asfgit asfgit closed this in 1d04c86 Apr 14, 2016
@a-roberts
Contributor

Hi, with this new test I'm seeing large deviations with IBM JDKs on different platforms, for example:

java.lang.AssertionError: assertion failed: deviation too large: 0.25359712230215825, first size: 26688, second size: 19920

Is the 20% deviation threshold particularly important? Why this number?

FYI here's the comparison between different Java vendors and architectures:

  • OpenJDK on Intel (passes): cacheSize1: 180392 and cacheSize2: 187896 (4.07%)
  • IBM JDK with SUSE on zSystems (fails): cacheSize1: 26688 and cacheSize2: 19920 (29%)
  • IBM JDK with Ubuntu 14.04 on Power 8 LE (fails): cacheSize1: 26688 and cacheSize2: 19920 (29%)
  • IBM JDK with Ubuntu 14.04 on Intel (fails): cacheSize1: 354692 and cacheSize2: 263800 (29.3%)

I'll look into this; it's interesting that with IBM Java it's always a very similar and much larger percentage.

@cloud-fan
Contributor Author

@a-roberts wow, that's interesting. I think the IBM JDK may bring more information into the Scala REPL line object, which increases the cache size. It would be great if you could look into it, thanks!

BTW, if this problem matters to you, feel free to send a PR to increase the threshold (20%).

@a-roberts
Contributor

@cloud-fan I've had a closer look at this and think a more robust method would be to use weak references to identify when an object is out of scope. With IBM Java we see the 29% reduction between cache size 1 and cache size 2, but with OpenJDK we see a 4% increase, suggesting that we can't rely on the sizes being similar across JDK vendors. I'm now thinking this is a test case issue rather than a problem in the ClosureCleaner or IBM Java code.
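The weak-reference idea could look something like the sketch below (the helper name is made up, and a real test would also need to defeat any remaining strong references): hold only a `WeakReference` to the wrapper object, clean the closure, then check whether the wrapper has become collectable. Note that `System.gc()` is only a hint, so such a test has to tolerate non-deterministic collection.

```scala
import java.lang.ref.WeakReference

// Sketch: after cleaning, a line object with no remaining strong
// references should eventually be reclaimable via its weak reference.
def isCollectable(ref: WeakReference[_ <: AnyRef], maxGcAttempts: Int = 5): Boolean = {
  var i = 0
  while (i < maxGcAttempts && ref.get() != null) {
    System.gc() // a hint only; collection is never guaranteed
    i += 1
  }
  ref.get() == null
}
```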

With IBM Java our second cache size (after repartitioning) is much smaller, whereas with OpenJDK it grows; repartitioning uses ContextCleaner. Either we have a bigger memory footprint or the cached size is being calculated incorrectly (it looks fine to me, and we actually have smaller object sizes). The problem on Z was due to using repl/pom.xml instead of the pom.xml in the Spark home directory (we get the same result with the right pom.xml file), so it can be discarded for this discussion.

I'm going to figure out what's in the Scala REPL line objects between vendors. I think the intention of this commit is to test that the REPL line object is being cleaned, but the assertion currently in place doesn't look correct (the size is bigger after the cleaning, and cacheSize2 is the result of cleaning, if I'm understanding the code correctly). Have I missed a trick?

@cloud-fan
Contributor Author

cacheSize1 and cacheSize2 are both sizes after cleaning. The difference is that cacheSize1 is the size after cleaning the data with the line object reference, while cacheSize2 is the size after cleaning the data without the line object reference.

And yes, the test missed the difference in the size of Scala REPL line objects between vendors. Feel free to send a PR to fix it, and thanks for investigating this!

@a-roberts
Contributor

cacheSize1 and cacheSize2 are both sizes after cleaning. The difference is that cacheSize1 is the size after cleaning the data with the line object reference, while cacheSize2 is the size after cleaning the data without the line object reference.

Looking for clarity here: is it true that cleaning "with the reference" should be bigger (cacheSize1) and cleaning "without the reference" should be smaller (cacheSize2)?

OpenJDK, cacheSize1: 180392, cacheSize2: 187896 (bigger without the line object reference)

IBM JDK, cacheSize1: 354692, cacheSize2: 263800 (smaller without the line object reference)

What exactly does "without line object reference" mean and should cacheSize1 be smaller or bigger than cacheSize2?

I know the SizeEstimator overestimates for IBM Java, so our cached footprint is much larger (we're handling this). Because of that larger difference we get this test failing; OpenJDK fails with Kryo and IBM passes with Kryo for this test.

A better check would be to run with and without the closure cleaner change, and to verify that the second result is smaller by the size of the line object. Based on our cacheSize2 being smaller (without the line object reference), I'm thinking that IBM Java functions as expected and OpenJDK doesn't, but this depends on my questions above. Interested to hear what you think, @cloud-fan.

@cloud-fan
Contributor Author

cloud-fan commented May 27, 2016

cacheSize1 and cacheSize2 should be almost the same, because they cache the same data, just with different RDD partition numbers. However, before this PR, cacheSize1 was much larger because it referenced the line object and didn't clean it.
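Judging from the assertion message quoted earlier ("deviation too large: 0.25359712230215825, first size: 26688, second size: 19920"), the check appears to be a relative-deviation comparison of the two cached sizes: |19920 − 26688| / 26688 ≈ 0.2536, which exceeds the 20% threshold under discussion. A sketch of such a check (the function name and signature are made up):

```scala
// Sketch of a relative-deviation assertion between two cache sizes,
// consistent with the quoted failure message.
def checkDeviation(size1: Long, size2: Long, tolerance: Double = 0.2): Unit = {
  val deviation = math.abs(size2 - size1).toDouble / size1
  assert(deviation < tolerance,
    s"deviation too large: $deviation, first size: $size1, second size: $size2")
}
```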

A better check would be to run with and without the closure cleaner change

Yeah, this is what I did locally, but how do we write a test for it?
