[SPARK-21840][core] Add trait that allows conf to be directly set in application. #19519
Conversation
Currently SparkSubmit uses system properties to propagate configuration to applications. This makes it hard to implement features such as SPARK-11035, which would allow multiple applications to be started in the same JVM. The current code would cause the config data from multiple apps to get mixed up.

This change introduces a new trait, currently internal to Spark, that allows the app configuration to be passed directly to the application, without having to use system properties. The current "call main() method" behavior is maintained as an implementation of this new trait. This will be useful to allow multiple cluster mode apps to be submitted from the same JVM.

As part of this, SparkSubmit was modified to collect all configuration directly into a SparkConf instance. Most of the changes are to tests so they use SparkConf instead of an opaque map.

Tested with existing and added unit tests.
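For readers following along, this is roughly the shape of the new trait (a minimal sketch based on the description above; the real definition lives in Spark's deploy package and details may differ):

```scala
import org.apache.spark.SparkConf

// Internal entry point for applications launched by SparkSubmit: the
// configuration is handed to the app directly instead of being smuggled
// through JVM-wide system properties.
private[spark] trait SparkApplication {
  def start(args: Array[String], conf: SparkConf): Unit
}
```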
Test build #82852 has finished for PR 19519 at commit
Test build #82863 has finished for PR 19519 at commit
    val childArgs = new ArrayBuffer[String]()
    val childClasspath = new ArrayBuffer[String]()
    val sysProps = new HashMap[String, String]()
    val sparkConf = new SparkConf()
Yes. Because this conf will now be exposed to apps (once I change the code to extend SparkApplication), the conf needs to respect system properties.
In fact the previous version should probably have done that from the get-go.
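(Context for the point above, as an illustrative sketch rather than code from this PR: the no-arg SparkConf constructor loads defaults, which includes copying every `spark.*` JVM system property into the conf.)

```scala
import org.apache.spark.SparkConf

// new SparkConf() is equivalent to new SparkConf(loadDefaults = true),
// which copies every "spark.*" system property into the conf.
sys.props("spark.app.name") = "demo"
val conf = new SparkConf()
assert(conf.get("spark.app.name") == "demo")
```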
Thanks!
@vanzin, how do we leverage this new trait? Would you please explain more? Thanks!
jiangxb1987 left a comment:
LGTM
In a separate future commit I'll change the YARN Client, for example, to extend this new trait. That will allow multiple submissions of yarn-cluster apps from the same JVM without configuration options getting mixed up.
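(A hedged sketch of what that future change might look like; the class name and wiring here are hypothetical, since the commit described above doesn't exist yet:)

```scala
import org.apache.spark.SparkConf
import org.apache.spark.deploy.SparkApplication

// Hypothetical: the YARN client as a SparkApplication. Each submission
// receives its own SparkConf, so two cluster-mode submissions from the
// same JVM no longer stomp on shared system properties.
private[spark] class YarnClusterApplication extends SparkApplication {
  override def start(args: Array[String], conf: SparkConf): Unit = {
    // Parse args and run the YARN Client against this conf only, e.g.:
    // new Client(new ClientArguments(args), conf).run()
  }
}
```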
LGTM.
    }

    mainMethod.invoke(null, args)
    }
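For context, the preserved "call main()" path is wrapped in an implementation of the new trait along these lines (a sketch reconstructed from the diff and discussion; the exact code may differ):

```scala
import java.lang.reflect.Modifier
import org.apache.spark.SparkConf

// Adapter that preserves the legacy behavior: export the conf to system
// properties (for apps that read them) and invoke the static main() method.
private[spark] class JavaMainApplication(klass: Class[_]) extends SparkApplication {
  override def start(args: Array[String], conf: SparkConf): Unit = {
    val mainMethod = klass.getMethod("main", classOf[Array[String]])
    if (!Modifier.isStatic(mainMethod.getModifiers)) {
      throw new IllegalStateException("The main method is not static.")
    }
    conf.getAll.foreach { case (k, v) => sys.props(k) = v }
    mainMethod.invoke(null, args)
  }
}
```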
@vanzin, do we need to remove all the system properties after mainMethod finishes?
That's dangerous, because you don't know what the user code is doing. What if it spawns a thread and that separate thread creates the SparkContext? Then you may be clearing system properties before the user app had the chance to read them.
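(To make the race concrete, consider hypothetical user code like this; eagerly clearing the `spark.*` system properties after main() returns could break it:)

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical user app: main() returns right away, but a background
// thread creates the SparkContext afterwards. It relies on the spark.*
// system properties that SparkSubmit set (master, app name, etc.).
object AsyncApp {
  def main(args: Array[String]): Unit = {
    new Thread(new Runnable {
      override def run(): Unit = {
        val sc = new SparkContext(new SparkConf()) // reads system properties here
        try {
          // ... run jobs ...
        } finally {
          sc.stop()
        }
      }
    }).start()
    // main() exits while the thread may not have read the properties yet.
  }
}
```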
But based on your comment "allow multiple applications to be started in the same JVM", will these system properties contaminate follow-up applications? Sorry if I misunderstood your scenario.
Yes, and there's not much we can do in that case. The main use case will be to make cluster mode applications do the right thing first, and warn about starting in-process client mode applications through the launcher API.
If you have a better solution I'm all ears.
I see, thanks for the explanation. I can't figure out a solution that doesn't break the current semantics of SparkConf; this might be the only choice.
Test build #82992 has finished for PR 19519 at commit
LGTM, merging to master.