[SPARK-2996] Implement userClassPathFirst for driver, yarn. #3233
Conversation
This change adds support for using ChildExecutorURLClassLoader within Yarn, to give user classes preference over system classes while still allowing Spark to load its own version of conflicting classes. This is similar to the existing `spark.files.userClassPathFirst` configuration that standalone mode has.

To avoid breaking the semantics of the existing options, I added two new options (`spark.{driver,executor}.userClassPathFirst`), the second of which replaces the existing option, so that users are not surprised that their driver is now also affected by the option.

One potentially breaking change is that ChildExecutorURLClassLoader now also applies the same logic (user first) to resources, not just classes; this should be fine since it's more in line with the spirit of the option anyway.
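For illustration, here is a minimal sketch of how an application might enable the new behavior; the two config keys are the ones introduced by this change, while the surrounding app code is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical app enabling user-first class loading on both sides.
// "spark.executor.userClassPathFirst" replaces the old
// "spark.files.userClassPathFirst". Master and deploy mode are assumed
// to come from spark-submit.
val conf = new SparkConf()
  .setAppName("user-classpath-first-example")
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")
val sc = new SparkContext(conf)
```

The same keys can also be passed to spark-submit via `--conf`.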
Preparation for changes to come.
This requires plumbing some initial state into the Executor instance, since Yarn distributes some user files before the containers are started, and those need to be added to the user's class path for things to work. Also cleaned up some other things in ClientBase to make the shared code more generic.
Make SparkSubmit respect the new option for driver isolation. This covers client mode for, hopefully, all cluster managers. Added simple SparkSubmit tests and also a yarn-client test to make sure the changes work as expected.
Matches the previous one and is more descriptive of what it actually does.
This allows code to handle only the current config names, instead of having to look for all the deprecated ones too.
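A rough sketch of that idea, with illustrative names (this is not SparkConf's actual internal API); the alias below is the one described in this change:

```scala
// Hypothetical helper for config-name aliasing; names are illustrative.
object ConfigAliases {
  // Deprecated key -> current key.
  private val deprecated = Map(
    "spark.files.userClassPathFirst" -> "spark.executor.userClassPathFirst")

  /** Return the current name for a possibly-deprecated config key,
    * warning when a deprecated name is used. */
  def translate(key: String): String = deprecated.get(key) match {
    case Some(newKey) =>
      Console.err.println(s"Config '$key' is deprecated; use '$newKey' instead.")
      newKey
    case None => key
  }
}
```

With the translation applied where values are set, readers of the config only ever need to look up the current key.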
The class loader could run into a situation where a class managed by the child class loader referenced a still-unloaded class from the parent class loader. Since the child class loader ("userClassLoader" in the previous code) did not have a parent, finding that class would fail, and the child class would fail to load.

A second problem was calling `loadClass()` from a `findClass()` implementation. This could cause the code to try to define the same class more than once.

The fix for both is to override `loadClass()` instead of `findClass()`, and implement the "priority inversion" there.
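To make that concrete, here is a minimal sketch of a child-first loader along those lines; it is not Spark's actual implementation, just an illustration of doing the user-first lookup in `loadClass()` (with per-class locking) instead of `findClass()`:

```scala
import java.net.{URL, URLClassLoader}

// Sketch of a child-first class loader. The super constructor gets a null
// parent so the JVM's default parent-first delegation is disabled; we
// delegate to the real parent explicitly below.
class ChildFirstClassLoader(urls: Array[URL], realParent: ClassLoader)
  extends URLClassLoader(urls, null) {

  override def loadClass(name: String, resolve: Boolean): Class[_] =
    getClassLoadingLock(name).synchronized {
      // Reuse an already-loaded class so nothing is defined twice.
      var cls = findLoadedClass(name)
      if (cls == null) {
        cls =
          try findClass(name) // user classes take priority
          catch {
            case _: ClassNotFoundException =>
              // Fall back to the real parent so classes referenced by
              // user-loaded classes can still be resolved.
              realParent.loadClass(name)
          }
      }
      if (resolve) resolveClass(cls)
      cls
    }

  // Apply the same user-first policy to resources.
  override def getResource(name: String): URL = {
    val url = findResource(name)
    if (url != null) url else realParent.getResource(name)
  }
}
```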
Also, only print a warning when setting configuration values, and use an atomic boolean to tidy up some ugly code.
spark-submit is needed for the class path tests, so just run all tests using it instead of potentially polluting the current VM and using internal APIs. Also remove extra log info from child process since the thread name already contains that information.
Use the (deprecated) spark.yarn.user.classpath.first option to force user jars to show up in the classpath.
Still need to instantiate the user class loader to load the user jars.
Avoid extra copying of files already distributed by Yarn, which could lead to problems (Utils.fetchFile() trying to change permissions of a file managed by the Yarn container). This required slightly changing the code that checks for up-to-date jars, so that it considers the file name rather than the HTTP URL sent by the driver. Also rename a method to be consistent with other uses of the config option.
Need to pass user jars in the command line regardless of the userClassPathFirst configuration.
Seems that I can't really hit the issue when my cluster's clocks are properly synchronized, so remove the hack.
Aside from the unit tests, I also used this code to test things: I submitted the job with guava 11 added in
Test build #23280 has started for PR 3233 at commit
Test build #23280 has finished for PR 3233 at commit
Test FAILed.
Test build #23281 has started for PR 3233 at commit
Test build #23281 has finished for PR 3233 at commit
Test PASSed.
Conflicts: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
Test build #23294 has started for PR 3233 at commit
Test build #23294 has finished for PR 3233 at commit
Test PASSed.
Hey @vanzin, this seems to fix two issues, right? One is SPARK-2996, which is supposed to be simple; the other is that it also adds the feature for the driver. Could you file a separate JIRA for the other one and add it to the title of this PR?

@andrewor14 SPARK-2996 seemed to be simple, but as I tried to explain in the description above, it's really not, since both options work very differently. So this change is actually what it takes to fix SPARK-2996 properly.
A little confused. Is this related to this patch?
Also prefer breaking the line at `orElse` and `foreach`.
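For instance, a hypothetical snippet in the style being requested:

```scala
// Illustrative only: break before orElse and foreach.
val fallback: Option[String] = Some("local")
sys.env.get("EXAMPLE_MODE")
  .orElse(fallback)
  .foreach(mode => println(s"using mode: $mode"))
```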
It's not entirely related to the patch. But without this I just could not run pyspark apps on yarn as part of testing. I couldn't for the life of me find out how SPARK_HOME propagated to executors before.
I think I figured this one out.
The assembly generated by maven includes the python files needed to run pyspark. The assembly generated by sbt doesn't. So if you deploy the sbt assembly, since SPARK_HOME is not propagated, it won't include the py4j and pyspark modules in PYTHONPATH, and pyspark jobs will fail.
In any case, unrelated to this issue and this is probably the wrong fix anyway, so I'll revert this part.
@vanzin thanks for doing this. This looks mostly good to me; most of my comments are fairly minor. @tgravescs did you have more comments or does it look good to you?
Conflicts: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
This uncovered some fun things. To do this, we need to override the executor's class path to make the test actually test what it intends to. But that config was being used by the build system to propagate the test classpath. So change the build (surefire and sbt test runner) to use SPARK_DIST_CLASSPATH for that, like the maven scalatest plugin does. Also, propagate SPARK_DIST_CLASSPATH to the executor's classpath when running on Yarn.
I ran into issues because the sbt-built assembly does not include pyspark files. That is a bug but it's unrelated to this patch, so undo the change.
Test build #27157 has started for PR 3233 at commit
Conflicts: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
Test build #27159 has started for PR 3233 at commit
Test build #27157 has finished for PR 3233 at commit
Test FAILed.
Test build #27159 has finished for PR 3233 at commit
Test PASSed.
Ok @vanzin, I'm merging this into master and 1.3. Thanks a lot for iterating on the reviews quickly. I will remove the random println that I commented on.
Yarn's config option `spark.yarn.user.classpath.first` does not work the same way as `spark.files.userClassPathFirst`; Yarn's version is a lot more dangerous, in that it modifies the system classpath instead of restricting the changes to the user's class loader. So this change implements the behavior of the latter for Yarn, and deprecates the more dangerous choice.

To be able to achieve feature parity, I also implemented the option for drivers (the existing option only applies to executors). So now there are two options, each controlling whether to apply userClassPathFirst to the driver or executors. The old option was deprecated, and aliased to the new one (`spark.executor.userClassPathFirst`).

The existing "child-first" class loader also had to be fixed. It didn't handle resources, and it was also doing some things that ended up causing JVM errors depending on how things were being called.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #3233 from vanzin/SPARK-2996 and squashes the following commits:

9cf9cf1 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a1499e2 [Marcelo Vanzin] Remove SPARK_HOME propagation.
fa7df88 [Marcelo Vanzin] Remove 'test.resource' file, create it dynamically.
a8c69f1 [Marcelo Vanzin] Review feedback.
cabf962 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a1b8d7e [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
3f768e3 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
2ce3c7a [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
0e6d6be [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
70d4044 [Marcelo Vanzin] Fix pyspark/yarn-cluster test.
0fe7777 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
0e6ef19 [Marcelo Vanzin] Move class loaders around and make names more meaningful.
fe970a7 [Marcelo Vanzin] Review feedback.
25d4fed [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
3cb6498 [Marcelo Vanzin] Call the right loadClass() method on the parent.
fbb8ab5 [Marcelo Vanzin] Add locking in loadClass() to avoid deadlocks.
2e6c4b7 [Marcelo Vanzin] Mention new setting in documentation.
b6497f9 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a10f379 [Marcelo Vanzin] Some feedback.
3730151 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
f513871 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
44010b6 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
7b57cba [Marcelo Vanzin] Remove now outdated message.
5304d64 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
35949c8 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
54e1a98 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
d1273b2 [Marcelo Vanzin] Add test file to rat exclude.
fa1aafa [Marcelo Vanzin] Remove write check on user jars.
89d8072 [Marcelo Vanzin] Cleanups.
a963ea3 [Marcelo Vanzin] Implement spark.driver.userClassPathFirst for standalone cluster mode.
50afa5f [Marcelo Vanzin] Fix Yarn executor command line.
7d14397 [Marcelo Vanzin] Register user jars in executor up front.
7f8603c [Marcelo Vanzin] Fix yarn-cluster mode without userClassPathFirst.
20373f5 [Marcelo Vanzin] Fix ClientBaseSuite.
55c88fa [Marcelo Vanzin] Run all Yarn integration tests via spark-submit.
0b64d92 [Marcelo Vanzin] Add deprecation warning to yarn option.
4a84d87 [Marcelo Vanzin] Fix the child-first class loader.
d0394b8 [Marcelo Vanzin] Add "deprecated configs" to SparkConf.
46d8cf2 [Marcelo Vanzin] Update doc with new option, change name to "userClassPathFirst".
a314f2d [Marcelo Vanzin] Enable driver class path isolation in SparkSubmit.
91f7e54 [Marcelo Vanzin] [yarn] Enable executor class path isolation.
a853e74 [Marcelo Vanzin] Re-work CoarseGrainedExecutorBackend command line arguments.
89522ef [Marcelo Vanzin] Add class path isolation support for Yarn cluster mode.

(cherry picked from commit 20a6013)
Signed-off-by: Andrew Or <andrew@databricks.com>