Conversation

@vanzin (Contributor) commented Oct 17, 2017

Currently SparkSubmit uses system properties to propagate configuration to
applications. This makes it hard to implement features such as SPARK-11035,
which would allow multiple applications to be started in the same JVM. The
current code would cause the config data from multiple apps to get mixed
up.

This change introduces a new trait, currently internal to Spark, that allows
the app configuration to be passed directly to the application, without
having to use system properties. The current "call main() method" behavior
is maintained as an implementation of this new trait. This will be useful
to allow multiple cluster mode apps to be submitted from the same JVM.

As part of this, SparkSubmit was modified to collect all configuration
directly into a SparkConf instance. Most of the changes are to tests so
they use SparkConf instead of an opaque map.

Tested with existing and added unit tests.
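The trait described above might look roughly like this (a minimal sketch; the names `SparkApplication` and `JavaMainApplication` and the exact signatures are assumptions pieced together from this description and the code snippets quoted later in the review):

```scala
import java.lang.reflect.Modifier

import org.apache.spark.SparkConf

// Hypothetical sketch of the internal trait: the launcher hands the
// per-application SparkConf directly to the app instead of publishing
// it through JVM-wide system properties.
private[spark] trait SparkApplication {
  def start(args: Array[String], conf: SparkConf): Unit
}

// The existing "call main() method" behavior, preserved as one
// implementation of the new trait.
private[spark] class JavaMainApplication(klass: Class[_]) extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    val mainMethod = klass.getMethod("main", classOf[Array[String]])
    if (!Modifier.isStatic(mainMethod.getModifiers)) {
      throw new IllegalStateException("The main method in the given main class must be static")
    }

    // Legacy apps still read their config from system properties, so the
    // conf is propagated there before the static main() is invoked.
    conf.getAll.foreach { case (k, v) => sys.props(k) = v }

    mainMethod.invoke(null, args)
  }
}
```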

@SparkQA commented Oct 17, 2017

Test build #82852 has finished for PR 19519 at commit 8f31cf5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • throw new IllegalStateException("The main method in the given main class must be static")

@SparkQA commented Oct 18, 2017

Test build #82863 has finished for PR 19519 at commit d4466f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val childArgs = new ArrayBuffer[String]()
val childClasspath = new ArrayBuffer[String]()
val sysProps = new HashMap[String, String]()
val sparkConf = new SparkConf()
Member:

Hi, @vanzin. Is it intentional that this now loads the default config? Previously, at line 340, it didn't.

vanzin (Contributor, Author):

Yes. Because this conf will now be exposed to apps (once I change the code to extend SparkApplication), it needs to respect system properties.

In fact, the previous version should probably have done that from the get-go too.
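The point above hinges on `SparkConf`'s `loadDefaults` behavior: the no-arg constructor copies any `spark.*` JVM system properties into the conf at construction time, while passing `false` skips that step. A small sketch of the difference:

```scala
import org.apache.spark.SparkConf

// Plant a spark.* system property, as spark-submit historically did.
sys.props("spark.executor.memory") = "2g"

// loadDefaults = true (the default): spark.* system properties are
// copied into the conf when it is constructed.
val withDefaults = new SparkConf()
assert(withDefaults.get("spark.executor.memory") == "2g")

// loadDefaults = false: system properties are ignored entirely.
val noDefaults = new SparkConf(loadDefaults = false)
assert(!noDefaults.contains("spark.executor.memory"))
```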

Member:

Thanks!

@jerryshao (Contributor):

@vanzin, how do we leverage this new trait? Could you please explain more? Thanks!

@jiangxb1987 (Contributor) left a comment:

LGTM

@vanzin (Contributor, Author) commented Oct 19, 2017

how do we leverage this new trait

In a separate future commit I'll change the YARN Client, for example, to extend this new trait. That will allow multiple yarn-cluster apps to be submitted from the same JVM without their configuration options getting mixed up.
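As a rough illustration of that plan (a hypothetical sketch; the actual class names and `Client` constructor used in the follow-up commit may differ):

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch: the YARN Client wrapped as a SparkApplication.
// Each submission receives its own SparkConf argument, so two
// yarn-cluster submissions in one JVM no longer share state through
// JVM-wide system properties.
private[spark] class YarnClusterApplication extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // All per-submission configuration travels in `conf`; nothing is
    // read from or written to global system properties.
    new Client(new ClientArguments(args), conf).run()
  }
}
```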

@jerryshao (Contributor):

LGTM.

}

mainMethod.invoke(null, args)
}
Contributor:

@vanzin , do we need to remove all the system properties after mainMethod is finished?

vanzin (Contributor, Author) commented Oct 23, 2017:

That's dangerous, because you don't know what the user code is doing. What if it spawns a thread, and that separate thread creates the SparkContext? Then you may be clearing system properties before the user app has had a chance to read them.
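The hazard can be sketched with a hypothetical user app whose `main()` returns quickly while a background thread creates the SparkContext later:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical user app: main() returns almost immediately, but the
// SparkContext is created afterwards on a background thread.
object BackgroundApp {
  def main(args: Array[String]): Unit = {
    val worker = new Thread(new Runnable {
      override def run(): Unit = {
        Thread.sleep(5000)
        // If the launcher cleared the spark.* system properties as soon
        // as main() returned, this conf would silently miss the
        // configuration that was submitted with the app.
        val sc = new SparkContext(new SparkConf())
        // ... application logic ...
        sc.stop()
      }
    })
    worker.start()
    // main() returns here while the app is still starting up.
  }
}
```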

Contributor:

But based on your comment "allow multiple applications to be started in the same JVM", will these system properties contaminate follow-up applications? Sorry if I misunderstood your scenario.

vanzin (Contributor, Author):

Yes, and there's not much we can do in that case. The main use case will be to make cluster mode applications do the right thing first, and warn about starting in-process client mode applications through the launcher API.

If you have a better solution I'm all ears.

Contributor:

I see, thanks for the explanation. I can't think of a solution that doesn't break the current semantics of SparkConf; this might be the only choice.

@SparkQA commented Oct 23, 2017

Test build #82992 has finished for PR 19519 at commit 1915135.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • final class OnlineLDAOptimizer extends LDAOptimizer with Logging
  • class PythonUdfType(object):
  • case class WriteToDataSourceV2(writer: DataSourceV2Writer, query: LogicalPlan) extends LogicalPlan
  • case class WriteToDataSourceV2Exec(writer: DataSourceV2Writer, query: SparkPlan) extends SparkPlan
  • class RowToInternalRowDataWriterFactory(
  • class RowToInternalRowDataWriter(rowWriter: DataWriter[Row], encoder: ExpressionEncoder[Row])
  • case class JoinConditionSplitPredicates(

@jerryshao (Contributor):

LGTM, merging to master.

@asfgit asfgit closed this in 3073344 Oct 26, 2017
@vanzin vanzin deleted the SPARK-21840 branch October 26, 2017 18:33

5 participants