Conversation

@vanzin (Contributor) commented Oct 17, 2017

Currently SparkSubmit uses system properties to propagate configuration to
applications. This makes it hard to implement features such as SPARK-11035,
which would allow multiple applications to be started in the same JVM. The
current code would cause the config data from multiple apps to get mixed
up.

This change introduces a new trait, currently internal to Spark, that allows
the app configuration to be passed directly to the application, without
having to use system properties. The current "call main() method" behavior
is maintained as an implementation of this new trait. This will be useful
to allow multiple cluster mode apps to be submitted from the same JVM.

As part of this, SparkSubmit was modified to collect all configuration
directly into a SparkConf instance. Most of the changes are to tests so
they use SparkConf instead of an opaque map.

Tested with existing and added unit tests.
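The trait described above might look roughly like this (a minimal sketch; the names `SparkApplication` and `JavaMainApplication` and the exact signatures are assumptions pieced together from this description and the code snippets quoted later in the review):

```scala
import java.lang.reflect.Modifier

import org.apache.spark.SparkConf

// Hypothetical sketch of the internal trait: the launcher hands the
// per-application SparkConf directly to the app instead of publishing
// it through JVM-wide system properties.
private[spark] trait SparkApplication {
  def start(args: Array[String], conf: SparkConf): Unit
}

// The existing "call main() method" behavior, preserved as one
// implementation of the new trait.
private[spark] class JavaMainApplication(klass: Class[_]) extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    val mainMethod = klass.getMethod("main", classOf[Array[String]])
    if (!Modifier.isStatic(mainMethod.getModifiers)) {
      throw new IllegalStateException("The main method in the given main class must be static")
    }

    // Legacy apps still read their config from system properties, so the
    // conf is propagated there before the static main() is invoked.
    conf.getAll.foreach { case (k, v) => sys.props(k) = v }

    mainMethod.invoke(null, args)
  }
}
```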

@SparkQA commented Oct 17, 2017

Test build #82852 has finished for PR 19519 at commit 8f31cf5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • throw new IllegalStateException("The main method in the given main class must be static")

@SparkQA commented Oct 18, 2017

Test build #82863 has finished for PR 19519 at commit d4466f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val childArgs = new ArrayBuffer[String]()
val childClasspath = new ArrayBuffer[String]()
val sysProps = new HashMap[String, String]()
val sparkConf = new SparkConf()
Member:

Hi, @vanzin. Is it intentional that this now loads the default config? Previously, at line 340, it didn't.

vanzin (Contributor, Author):

Yes. Because this conf will now be exposed to apps (once I change the code to extend SparkApplication), it needs to respect system properties.

In fact, the previous version should probably have done that from the get-go too.
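The point above hinges on `SparkConf`'s `loadDefaults` behavior: the no-arg constructor copies any `spark.*` JVM system properties into the conf at construction time, while passing `false` skips that step. A small sketch of the difference:

```scala
import org.apache.spark.SparkConf

// Plant a spark.* system property, as spark-submit historically did.
sys.props("spark.executor.memory") = "2g"

// loadDefaults = true (the default): spark.* system properties are
// copied into the conf when it is constructed.
val withDefaults = new SparkConf()
assert(withDefaults.get("spark.executor.memory") == "2g")

// loadDefaults = false: system properties are ignored entirely.
val noDefaults = new SparkConf(loadDefaults = false)
assert(!noDefaults.contains("spark.executor.memory"))
```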

Member:

Thanks!

@jerryshao (Contributor):

@vanzin, how do we leverage this new trait? Could you please explain more? Thanks!

@jiangxb1987 (Contributor) left a comment:

LGTM

@vanzin (Contributor, Author) commented Oct 19, 2017

how do we leverage this new trait

In a separate future commit I'll change the YARN Client, for example, to extend this new trait. That will allow multiple yarn-cluster apps to be submitted from the same JVM without their configuration options getting mixed up.
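As a rough illustration of that plan (a hypothetical sketch; the actual class names and `Client` constructor used in the follow-up commit may differ):

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch: the YARN Client wrapped as a SparkApplication.
// Each submission receives its own SparkConf argument, so two
// yarn-cluster submissions in one JVM no longer share state through
// JVM-wide system properties.
private[spark] class YarnClusterApplication extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // All per-submission configuration travels in `conf`; nothing is
    // read from or written to global system properties.
    new Client(new ClientArguments(args), conf).run()
  }
}
```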

@jerryshao (Contributor):

LGTM.

}

mainMethod.invoke(null, args)
}
Contributor:

@vanzin , do we need to remove all the system properties after mainMethod is finished?

vanzin (Contributor, Author) commented Oct 23, 2017:

That's dangerous, because you don't know what the user code is doing. What if it spawns a thread, and that separate thread creates the SparkContext? Then you may be clearing system properties before the user app has had a chance to read them.
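The hazard can be sketched with a hypothetical user app whose `main()` returns quickly while a background thread creates the SparkContext later:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical user app: main() returns almost immediately, but the
// SparkContext is created afterwards on a background thread.
object BackgroundApp {
  def main(args: Array[String]): Unit = {
    val worker = new Thread(new Runnable {
      override def run(): Unit = {
        Thread.sleep(5000)
        // If the launcher cleared the spark.* system properties as soon
        // as main() returned, this conf would silently miss the
        // configuration that was submitted with the app.
        val sc = new SparkContext(new SparkConf())
        // ... application logic ...
        sc.stop()
      }
    })
    worker.start()
    // main() returns here while the app is still starting up.
  }
}
```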

Contributor:

But based on your comment "allow multiple applications to be started in the same JVM", will these system properties contaminate follow-up applications? Sorry if I misunderstood your scenario.

vanzin (Contributor, Author):

Yes, and there's not much we can do in that case. The main use case will be to make cluster mode applications do the right thing first, and warn about starting in-process client mode applications through the launcher API.

If you have a better solution I'm all ears.

Contributor:

I see, thanks for the explanation. I can't think of a solution that doesn't break the current semantics of SparkConf; this might be the only choice.

@SparkQA commented Oct 23, 2017

Test build #82992 has finished for PR 19519 at commit 1915135.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • final class OnlineLDAOptimizer extends LDAOptimizer with Logging
  • class PythonUdfType(object):
  • case class WriteToDataSourceV2(writer: DataSourceV2Writer, query: LogicalPlan) extends LogicalPlan
  • case class WriteToDataSourceV2Exec(writer: DataSourceV2Writer, query: SparkPlan) extends SparkPlan
  • class RowToInternalRowDataWriterFactory(
  • class RowToInternalRowDataWriter(rowWriter: DataWriter[Row], encoder: ExpressionEncoder[Row])
  • case class JoinConditionSplitPredicates(

@jerryshao (Contributor):

LGTM, merging to master.

@asfgit asfgit closed this in 3073344 Oct 26, 2017
@vanzin vanzin deleted the SPARK-21840 branch October 26, 2017 18:33

5 participants