Conversation

@vanzin (Contributor) commented Nov 12, 2014

Yarn's config option `spark.yarn.user.classpath.first` does not work the same way as
`spark.files.userClassPathFirst`; Yarn's version is a lot more dangerous, because it
modifies the system classpath instead of restricting the changes to the user's class
loader. So this change implements the behavior of the latter for Yarn, and deprecates
the more dangerous choice.

To achieve feature parity, I also implemented the option for drivers (the existing
option only applies to executors). So now there are two options, each controlling whether
to apply userClassPathFirst to the driver or to executors. The old option was deprecated
and aliased to the new one (`spark.executor.userClassPathFirst`).

The existing "child-first" class loader also had to be fixed. It didn't handle resources, and it
was also doing some things that ended up causing JVM errors depending on how things
were being called.
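The distinction the description draws can be sketched in plain JVM terms. This is a hypothetical illustration, not Spark's actual code; the class name `IsolatedUserLoader` and the jar name `user-app.jar` are made up. Prepending user jars to the system classpath shadows classes for everything running in the JVM, including Spark itself, while a separate child loader confines the override to code loaded through it:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of the safer approach the description advocates:
// instead of prepending user jars to the JVM's system classpath (which
// changes what Spark's own classes resolve to), keep them in a dedicated
// child loader so only user code sees them.
public class IsolatedUserLoader {
  public static ClassLoader forUserJars(File[] jars) throws Exception {
    URL[] urls = new URL[jars.length];
    for (int i = 0; i < jars.length; i++) {
      urls[i] = jars[i].toURI().toURL();
    }
    // The system classpath stays untouched; the user jars live only here.
    return new URLClassLoader(urls, ClassLoader.getSystemClassLoader());
  }

  public static void main(String[] args) throws Exception {
    ClassLoader userLoader = forUserJars(new File[] { new File("user-app.jar") });
    // Spark's own loader is unaffected by the extra URLs in the child.
    System.out.println(userLoader.getParent() == ClassLoader.getSystemClassLoader());
  }
}
```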

Marcelo Vanzin added 16 commits November 12, 2014 12:59
This change adds support for using ChildExecutorURLClassLoader
within Yarn, to give user classes preference over system classes
while still allowing Spark to load its own version of conflicting
classes. This is similar to the existing
"spark.files.userClassPathFirst" configuration that standalone
mode has.

To avoid breaking the semantics of the existing options, I added
two new options ("spark.{driver,executor}.userClassPathFirst"),
the second of which replaces the existing option, so that users
are not surprised that now their driver is also being affected
by the option.

One potentially breaking change is that ChildExecutorURLClassLoader
now also applies the same logic (user first) to resources, not
just classes; this should be fine since it's more in line with
the spirit of the option anyway.
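Applying "user first" to resources can be sketched as follows; this is a hypothetical illustration (the class name `ChildFirstResources` is made up, and the real Spark loader differs). `findResource()`/`findResources()` search only this loader's own URLs, so consulting them before delegating gives user files priority:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical sketch: resources, like classes, are looked up in the
// user's jars first and only then in the parent loader.
public class ChildFirstResources extends URLClassLoader {

  public ChildFirstResources(URL[] userJars, ClassLoader parent) {
    super(userJars, parent);
  }

  @Override
  public URL getResource(String name) {
    URL url = findResource(name);                          // user jars first
    return (url != null) ? url : super.getResource(name);  // then the parent
  }

  @Override
  public Enumeration<URL> getResources(String name) throws IOException {
    // Return user matches ahead of parent matches.
    List<URL> urls = new ArrayList<>(Collections.list(findResources(name)));
    ClassLoader parent = getParent();
    if (parent != null) {
      urls.addAll(Collections.list(parent.getResources(name)));
    }
    return Collections.enumeration(urls);
  }

  public static void main(String[] args) {
    ChildFirstResources cl =
        new ChildFirstResources(new URL[0], ClassLoader.getSystemClassLoader());
    // Nothing in the (empty) child URLs, so the parent supplies this one.
    System.out.println(cl.getResource("java/lang/Object.class") != null);
  }
}
```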
This requires plumbing some initial state into the Executor instance,
since Yarn distributes some user files before the containers are
started, and those need to be added to the user's class path for
things to work.

Also cleaned up some other things in ClientBase to make the shared
code more generic.
Make SparkSubmit respect the new option for driver isolation. This
covers client mode for, hopefully, all cluster managers.

Added simple SparkSubmit tests and also yarn-client test to make
sure changes work as expected.
Matches the previous one and is more descriptive of what it actually does.
This allows code to just have to handle the current config names
instead of having to look for all the deprecated ones too.
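The aliasing idea can be sketched as a simple translation step; this is a hypothetical illustration, not Spark's `SparkConf` implementation, and the class name `DeprecatedConfigs` is made up. The one alias shown (`spark.files.userClassPathFirst` → `spark.executor.userClassPathFirst`) is the rename this PR describes:

```java
import java.util.Map;

// Hypothetical sketch: translate deprecated config names to their current
// replacements when a value is set, so the rest of the code only ever
// needs to read the current names.
public class DeprecatedConfigs {
  private static final Map<String, String> ALIASES = Map.of(
      // Alias described in this PR; further deprecated keys would go here.
      "spark.files.userClassPathFirst", "spark.executor.userClassPathFirst");

  // Returns the current name for a possibly-deprecated key,
  // warning when a deprecated key is used.
  public static String translate(String key) {
    String current = ALIASES.get(key);
    if (current != null) {
      System.err.println("Warning: " + key + " is deprecated; use " + current + " instead.");
      return current;
    }
    return key;
  }

  public static void main(String[] args) {
    System.out.println(translate("spark.files.userClassPathFirst"));
  }
}
```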
The class loader could run into a situation where a class managed by the
child class loader referenced a still-unloaded class from the parent
class loader. Since the child class loader ("userClassLoader" in the
previous code) did not have a parent, finding that class would fail,
and the child class would fail to load.

A second problem was calling `loadClass()` from a `findClass()`
implementation. This could cause the code to try to define the
same class more than once.

The fix for both is to override `loadClass()` instead of `findClass()`,
and implement the "priority inversion" there.
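A minimal sketch of that fix, assuming a plain `URLClassLoader` over the user jars (the actual Spark class differs, and the name `ChildFirstClassLoader` here is illustrative): the child-first lookup lives in `loadClass()`, guarded by the per-name class-loading lock, so each class is defined at most once and classes only present in the parent remain reachable as a fallback.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a "child-first" (parent-last) class loader.
// The priority inversion is implemented in loadClass(), not findClass().
public class ChildFirstClassLoader extends URLClassLoader {

  public ChildFirstClassLoader(URL[] userJars, ClassLoader parent) {
    super(userJars, parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      // 1. Already defined by this loader? Reuse it; defining the same
      //    class twice is a JVM error (the findClass() pitfall above).
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        try {
          // 2. Try the user's jars first ("child first").
          c = findClass(name);
        } catch (ClassNotFoundException e) {
          // 3. Fall back to the parent, so classes that user classes
          //    reference but don't ship still resolve.
          c = super.loadClass(name, false);
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }

  public static void main(String[] args) throws Exception {
    ChildFirstClassLoader cl =
        new ChildFirstClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
    // JDK classes aren't in the (empty) child URLs, so the parent wins.
    System.out.println(cl.loadClass("java.lang.String") == String.class);
  }
}
```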
Also, only print a warning when setting configuration values, and
use an atomic boolean to tidy up some ugly code.
spark-submit is needed for the class path tests, so just run
all tests using it instead of potentially polluting the current
VM and using internal APIs.

Also remove extra log info from child process since the thread name
already contains that information.
Use the (deprecated) spark.yarn.user.classpath.first option to
force user jars to show up in the classpath.
Still need to instantiate the user class loader to load the
user jars.
Avoid extra copying of files already distributed by Yarn, which could
lead to problems (Utils.fetchFile() trying to change permissions of a
file managed by the Yarn container). This required slightly changing
the code that checks for up-to-date jars, so that it considers the file
name rather than the HTTP URL sent by the driver.

Also rename a method to be consistent with other uses of the config
option.
Need to pass user jars in the command line regardless of the
userClassPathFirst configuration.
Mostly minor, except for ClientBase.scala where I cleaned up the
generated classpath to only include the necessary entries.
Seems that I can't really hit the issue when my cluster's clocks
are properly synchronized, so remove the hack.
@vanzin (Contributor, Author) commented Nov 12, 2014

Aside from the unit tests, I also used this code to test things:
https://gist.github.com/vanzin/f684290e97dfd9294e4a

I submitted the job with guava 11 added in --jars, and ran it in local, standalone/{client,cluster} and yarn/{client,cluster} modes.

@SparkQA commented Nov 12, 2014

Test build #23280 has started for PR 3233 at commit fa1aafa.

  • This patch merges cleanly.

@SparkQA commented Nov 12, 2014

Test build #23280 has finished for PR 3233 at commit fa1aafa.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23280/

@SparkQA commented Nov 12, 2014

Test build #23281 has started for PR 3233 at commit d1273b2.

  • This patch merges cleanly.

@SparkQA commented Nov 12, 2014

Test build #23281 has finished for PR 3233 at commit d1273b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logInfo("Adding " + url + " to class loader")

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23281/

Conflicts:
	core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
@SparkQA commented Nov 13, 2014

Test build #23294 has started for PR 3233 at commit 54e1a98.

  • This patch merges cleanly.

@SparkQA commented Nov 13, 2014

Test build #23294 has finished for PR 3233 at commit 54e1a98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logInfo("Adding " + url + " to class loader")
    • abstract class EdgeRDD[ED, VD](
    • abstract class VertexRDD[VD](

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23294/

@andrewor14 (Contributor)
Hey @vanzin, this seems to fix two issues, right? One is SPARK-2996, which is supposed to be simple; the other is that it also adds the feature for the driver. Could you file a separate JIRA for the second one and add it to the title of this PR?

@vanzin (Contributor, Author) commented Nov 13, 2014

@andrewor14 SPARK-2996 seemed to be simple, but as I tried to explain in the description above, it's really not, since both options work very differently. So this change is actually what it takes to fix SPARK-2996 properly.

Review comment (Contributor):

A little confused. Is this related to this patch?

Review comment (Contributor):

Also, prefer breaking the line at `orElse` and `foreach`.

Review comment (Contributor, Author):

It's not entirely related to the patch. But without this I just could not run pyspark apps on yarn as part of testing. I couldn't for the life of me find out how SPARK_HOME propagated to executors before.

Review comment (Contributor, Author):

I think I figured this one out.

The assembly generated by Maven includes the Python files needed to run pyspark; the assembly generated by sbt doesn't. So if you deploy the sbt assembly and SPARK_HOME is not propagated, the py4j and pyspark modules won't be on PYTHONPATH, and pyspark jobs will fail.

In any case, unrelated to this issue and this is probably the wrong fix anyway, so I'll revert this part.

@andrewor14 (Contributor)
@vanzin thanks for doing this. This looks mostly good to me; most of my comments are fairly minor. @tgravescs did you have more comments or does it look good to you?

Marcelo Vanzin added 4 commits February 9, 2015 10:31
Conflicts:
	core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
This uncovered some fun things. To make the test actually exercise what it
intends to, we need to override the executor's class path. But that config
was being used by the build system to propagate the test classpath, so
change the build (surefire and the sbt test runner) to use
SPARK_DIST_CLASSPATH for that, like the Maven scalatest plugin does.

Also, propagate SPARK_DIST_CLASSPATH to the executor's classpath when
running on Yarn.
I ran into issues because the sbt-built assembly does not include the pyspark
files. That is a bug, but it's unrelated to this patch, so undo the change.
@SparkQA commented Feb 10, 2015

Test build #27157 has started for PR 3233 at commit a1499e2.

  • This patch does not merge cleanly.

Conflicts:
	yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
@SparkQA commented Feb 10, 2015

Test build #27159 has started for PR 3233 at commit 9cf9cf1.

  • This patch merges cleanly.

@SparkQA commented Feb 10, 2015

Test build #27157 has finished for PR 3233 at commit a1499e2.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27157/

@SparkQA commented Feb 10, 2015

Test build #27159 has finished for PR 3233 at commit 9cf9cf1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27159/

@andrewor14 (Contributor)
Ok @vanzin, I'm merging this into master and 1.3. Thanks a lot for iterating on the reviews quickly. I will remove the random println that I commented on.

asfgit pushed a commit that referenced this pull request Feb 10, 2015
Yarn's config option `spark.yarn.user.classpath.first` does not work the same way as
`spark.files.userClassPathFirst`; Yarn's version is a lot more dangerous, in that it
modifies the system classpath, instead of restricting the changes to the user's class
loader. So this change implements the behavior of the latter for Yarn, and deprecates
the more dangerous choice.

To be able to achieve feature-parity, I also implemented the option for drivers (the existing
option only applies to executors). So now there are two options, each controlling whether
to apply userClassPathFirst to the driver or executors. The old option was deprecated, and
aliased to the new one (`spark.executor.userClassPathFirst`).

The existing "child-first" class loader also had to be fixed. It didn't handle resources, and it
was also doing some things that ended up causing JVM errors depending on how things
were being called.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #3233 from vanzin/SPARK-2996 and squashes the following commits:

9cf9cf1 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a1499e2 [Marcelo Vanzin] Remove SPARK_HOME propagation.
fa7df88 [Marcelo Vanzin] Remove 'test.resource' file, create it dynamically.
a8c69f1 [Marcelo Vanzin] Review feedback.
cabf962 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a1b8d7e [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
3f768e3 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
2ce3c7a [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
0e6d6be [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
70d4044 [Marcelo Vanzin] Fix pyspark/yarn-cluster test.
0fe7777 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
0e6ef19 [Marcelo Vanzin] Move class loaders around and make names more meaninful.
fe970a7 [Marcelo Vanzin] Review feedback.
25d4fed [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
3cb6498 [Marcelo Vanzin] Call the right loadClass() method on the parent.
fbb8ab5 [Marcelo Vanzin] Add locking in loadClass() to avoid deadlocks.
2e6c4b7 [Marcelo Vanzin] Mention new setting in documentation.
b6497f9 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a10f379 [Marcelo Vanzin] Some feedback.
3730151 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
f513871 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
44010b6 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
7b57cba [Marcelo Vanzin] Remove now outdated message.
5304d64 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
35949c8 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
54e1a98 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
d1273b2 [Marcelo Vanzin] Add test file to rat exclude.
fa1aafa [Marcelo Vanzin] Remove write check on user jars.
89d8072 [Marcelo Vanzin] Cleanups.
a963ea3 [Marcelo Vanzin] Implement spark.driver.userClassPathFirst for standalone cluster mode.
50afa5f [Marcelo Vanzin] Fix Yarn executor command line.
7d14397 [Marcelo Vanzin] Register user jars in executor up front.
7f8603c [Marcelo Vanzin] Fix yarn-cluster mode without userClassPathFirst.
20373f5 [Marcelo Vanzin] Fix ClientBaseSuite.
55c88fa [Marcelo Vanzin] Run all Yarn integration tests via spark-submit.
0b64d92 [Marcelo Vanzin] Add deprecation warning to yarn option.
4a84d87 [Marcelo Vanzin] Fix the child-first class loader.
d0394b8 [Marcelo Vanzin] Add "deprecated configs" to SparkConf.
46d8cf2 [Marcelo Vanzin] Update doc with new option, change name to "userClassPathFirst".
a314f2d [Marcelo Vanzin] Enable driver class path isolation in SparkSubmit.
91f7e54 [Marcelo Vanzin] [yarn] Enable executor class path isolation.
a853e74 [Marcelo Vanzin] Re-work CoarseGrainedExecutorBackend command line arguments.
89522ef [Marcelo Vanzin] Add class path isolation support for Yarn cluster mode.

(cherry picked from commit 20a6013)
Signed-off-by: Andrew Or <andrew@databricks.com>
asfgit closed this in 20a6013 on Feb 10, 2015
vanzin deleted the SPARK-2996 branch on February 12, 2015