[SPARK-2996] Implement userClassPathFirst for driver, yarn. #3233
Conversation
This change adds support for using ChildExecutorURLClassLoader within Yarn, to give user classes preference over system classes while still allowing Spark to load its own version of conflicting classes. This is similar to the existing `spark.files.userClassPathFirst` configuration that standalone mode has.

To avoid breaking the semantics of the existing options, I added two new options (`spark.{driver,executor}.userClassPathFirst`), the second of which replaces the existing option, so that users are not surprised that their driver is now also affected by the option.

One potentially breaking change is that ChildExecutorURLClassLoader now also applies the same logic (user first) to resources, not just classes; this should be fine since it's more in line with the spirit of the option anyway.
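For illustration, here is a minimal sketch of how an application might enable the new behavior; the two config keys are the ones introduced by this change, while the surrounding app code is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical app enabling user-first class loading on both sides.
// "spark.executor.userClassPathFirst" replaces the old
// "spark.files.userClassPathFirst". Master and deploy mode are assumed
// to come from spark-submit.
val conf = new SparkConf()
  .setAppName("user-classpath-first-example")
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")
val sc = new SparkContext(conf)
```

The same keys can also be passed to spark-submit via `--conf`.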
Preparation for changes to come.
This requires plumbing some initial state into the Executor instance, since Yarn distributes some user files before the containers are started, and those need to be added to the user's class path for things to work. Also cleaned up some other things in ClientBase to make the shared code more generic.
Make SparkSubmit respect the new option for driver isolation. This covers client mode for, hopefully, all cluster managers. Added simple SparkSubmit tests and also a yarn-client test to make sure the changes work as expected.
Matches the previous one and is more descriptive of what it actually does.
This allows code to handle only the current config names, instead of having to look for all the deprecated ones too.
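A rough sketch of that idea, with illustrative names (this is not SparkConf's actual internal API); the alias below is the one described in this change:

```scala
// Hypothetical helper for config-name aliasing; names are illustrative.
object ConfigAliases {
  // Deprecated key -> current key.
  private val deprecated = Map(
    "spark.files.userClassPathFirst" -> "spark.executor.userClassPathFirst")

  /** Return the current name for a possibly-deprecated config key,
    * warning when a deprecated name is used. */
  def translate(key: String): String = deprecated.get(key) match {
    case Some(newKey) =>
      Console.err.println(s"Config '$key' is deprecated; use '$newKey' instead.")
      newKey
    case None => key
  }
}
```

With the translation applied where values are set, readers of the config only ever need to look up the current key.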
The class loader could run into a situation where a class managed by the child class loader referenced a still-unloaded class from the parent class loader. Since the child class loader ("userClassLoader" in the previous code) did not have a parent, finding that class would fail, and the child class would fail to load.

A second problem was calling `loadClass()` from a `findClass()` implementation. This could cause the code to try to define the same class more than once.

The fix for both is to override `loadClass()` instead of `findClass()`, and implement the "priority inversion" there.
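To make that concrete, here is a minimal sketch of a child-first loader along those lines; it is not Spark's actual implementation, just an illustration of doing the user-first lookup in `loadClass()` (with per-class locking) instead of `findClass()`:

```scala
import java.net.{URL, URLClassLoader}

// Sketch of a child-first class loader. The super constructor gets a null
// parent so the JVM's default parent-first delegation is disabled; we
// delegate to the real parent explicitly below.
class ChildFirstClassLoader(urls: Array[URL], realParent: ClassLoader)
  extends URLClassLoader(urls, null) {

  override def loadClass(name: String, resolve: Boolean): Class[_] =
    getClassLoadingLock(name).synchronized {
      // Reuse an already-loaded class so nothing is defined twice.
      var cls = findLoadedClass(name)
      if (cls == null) {
        cls =
          try findClass(name) // user classes take priority
          catch {
            case _: ClassNotFoundException =>
              // Fall back to the real parent so classes referenced by
              // user-loaded classes can still be resolved.
              realParent.loadClass(name)
          }
      }
      if (resolve) resolveClass(cls)
      cls
    }

  // Apply the same user-first policy to resources.
  override def getResource(name: String): URL = {
    val url = findResource(name)
    if (url != null) url else realParent.getResource(name)
  }
}
```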
Also, only print a warning when setting configuration values, and use an atomic boolean to tidy up some ugly code.
spark-submit is needed for the class path tests, so just run all tests using it instead of potentially polluting the current VM and using internal APIs. Also remove extra log info from child process since the thread name already contains that information.
Use the (deprecated) spark.yarn.user.classpath.first option to force user jars to show up in the classpath.
Still need to instantiate the user class loader to load the user jars.
Avoid extra copying of files already distributed by Yarn, which could lead to problems (Utils.fetchFile() trying to change permissions of a file managed by the Yarn container). This required slightly changing the code that checks for up-to-date jars, so that it considers the file name rather than the HTTP URL sent by the driver. Also rename a method to be consistent with other uses of the config option.
Need to pass user jars in the command line regardless of the userClassPathFirst configuration.
Seems that I can't really hit the issue when my cluster's clocks are properly synchronized, so remove the hack.
Aside from the unit tests, I also used this code to test things: I submitted the job with guava 11 added in
Test build #23280 has started for PR 3233 at commit
Test build #23280 has finished for PR 3233 at commit
Test FAILed.
Test build #23281 has started for PR 3233 at commit
Test build #23281 has finished for PR 3233 at commit
Test PASSed.
Conflicts: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
Test build #23294 has started for PR 3233 at commit
Test build #23294 has finished for PR 3233 at commit
Test PASSed.
Hey @vanzin, this seems to fix two issues, right? One is SPARK-2996, which is supposed to be simple; the other is that it also adds the feature for the driver. Could you file a separate JIRA for the other one and add it to the title of this PR?

@andrewor14 SPARK-2996 seemed to be simple, but as I tried to explain in the description above, it's really not, since both options work very differently. So this change is actually what it takes to fix SPARK-2996 properly.
A little confused. Is this related to this patch?
Also prefer breaking the line at `orElse` and `foreach`.
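For instance, a hypothetical snippet in the style being requested:

```scala
// Illustrative only: break before orElse and foreach.
val fallback: Option[String] = Some("local")
sys.env.get("EXAMPLE_MODE")
  .orElse(fallback)
  .foreach(mode => println(s"using mode: $mode"))
```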
It's not entirely related to the patch. But without this I just could not run pyspark apps on yarn as part of testing. I couldn't for the life of me find out how SPARK_HOME propagated to executors before.
I think I figured this one out.
The assembly generated by maven includes the python files needed to run pyspark. The assembly generated by sbt doesn't. So if you deploy the sbt assembly, since SPARK_HOME is not propagated, it won't include the py4j and pyspark modules in PYTHONPATH, and pyspark jobs will fail.
In any case, unrelated to this issue and this is probably the wrong fix anyway, so I'll revert this part.
@vanzin thanks for doing this. This looks mostly good to me; most of my comments are fairly minor. @tgravescs did you have more comments or does it look good to you?
Conflicts: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
This uncovered some fun things. To do this, we need to override the executor's class path to make the test actually test what it intends to. But that config was being used by the build system to propagate the test classpath. So change the build (surefire and sbt test runner) to use SPARK_DIST_CLASSPATH for that, like the maven scalatest plugin does. Also, propagate SPARK_DIST_CLASSPATH to the executor's classpath when running on Yarn.
I ran into issues because the sbt-built assembly does not include pyspark files. That is a bug but it's unrelated to this patch, so undo the change.
Test build #27157 has started for PR 3233 at commit
Conflicts: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
Test build #27159 has started for PR 3233 at commit
Test build #27157 has finished for PR 3233 at commit
Test FAILed.
Test build #27159 has finished for PR 3233 at commit
Test PASSed.
Ok @vanzin, I'm merging this into master and 1.3. Thanks a lot for iterating on the reviews quickly. I will remove the random println that I commented on.
Yarn's config option `spark.yarn.user.classpath.first` does not work the same way as `spark.files.userClassPathFirst`; Yarn's version is a lot more dangerous, in that it modifies the system classpath instead of restricting the changes to the user's class loader. So this change implements the behavior of the latter for Yarn, and deprecates the more dangerous choice.

To be able to achieve feature parity, I also implemented the option for drivers (the existing option only applies to executors). So now there are two options, each controlling whether to apply userClassPathFirst to the driver or executors. The old option was deprecated, and aliased to the new one (`spark.executor.userClassPathFirst`).

The existing "child-first" class loader also had to be fixed. It didn't handle resources, and it was also doing some things that ended up causing JVM errors depending on how things were being called.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #3233 from vanzin/SPARK-2996 and squashes the following commits:

9cf9cf1 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a1499e2 [Marcelo Vanzin] Remove SPARK_HOME propagation.
fa7df88 [Marcelo Vanzin] Remove 'test.resource' file, create it dynamically.
a8c69f1 [Marcelo Vanzin] Review feedback.
cabf962 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a1b8d7e [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
3f768e3 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
2ce3c7a [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
0e6d6be [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
70d4044 [Marcelo Vanzin] Fix pyspark/yarn-cluster test.
0fe7777 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
0e6ef19 [Marcelo Vanzin] Move class loaders around and make names more meaningful.
fe970a7 [Marcelo Vanzin] Review feedback.
25d4fed [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
3cb6498 [Marcelo Vanzin] Call the right loadClass() method on the parent.
fbb8ab5 [Marcelo Vanzin] Add locking in loadClass() to avoid deadlocks.
2e6c4b7 [Marcelo Vanzin] Mention new setting in documentation.
b6497f9 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a10f379 [Marcelo Vanzin] Some feedback.
3730151 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
f513871 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
44010b6 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
7b57cba [Marcelo Vanzin] Remove now outdated message.
5304d64 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
35949c8 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
54e1a98 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
d1273b2 [Marcelo Vanzin] Add test file to rat exclude.
fa1aafa [Marcelo Vanzin] Remove write check on user jars.
89d8072 [Marcelo Vanzin] Cleanups.
a963ea3 [Marcelo Vanzin] Implement spark.driver.userClassPathFirst for standalone cluster mode.
50afa5f [Marcelo Vanzin] Fix Yarn executor command line.
7d14397 [Marcelo Vanzin] Register user jars in executor up front.
7f8603c [Marcelo Vanzin] Fix yarn-cluster mode without userClassPathFirst.
20373f5 [Marcelo Vanzin] Fix ClientBaseSuite.
55c88fa [Marcelo Vanzin] Run all Yarn integration tests via spark-submit.
0b64d92 [Marcelo Vanzin] Add deprecation warning to yarn option.
4a84d87 [Marcelo Vanzin] Fix the child-first class loader.
d0394b8 [Marcelo Vanzin] Add "deprecated configs" to SparkConf.
46d8cf2 [Marcelo Vanzin] Update doc with new option, change name to "userClassPathFirst".
a314f2d [Marcelo Vanzin] Enable driver class path isolation in SparkSubmit.
91f7e54 [Marcelo Vanzin] [yarn] Enable executor class path isolation.
a853e74 [Marcelo Vanzin] Re-work CoarseGrainedExecutorBackend command line arguments.
89522ef [Marcelo Vanzin] Add class path isolation support for Yarn cluster mode.

(cherry picked from commit 20a6013)
Signed-off-by: Andrew Or <andrew@databricks.com>