[SPARK-41944][CONNECT] Pass configurations when local remote mode is on #39463
What changes were proposed in this pull request?
This PR mainly proposes to pass the user-specified configurations to local remote mode.

Previously, all user-specified configurations were ignored in the case of the PySpark shell, such as `./bin/pyspark`, or a plain Python interpreter (the PySpark application submission case was fine). Now, configurations are properly passed to the server side; e.g., with `./bin/pyspark --remote local --conf aaa=bbb`, `aaa=bbb` is properly passed to the server side.
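A hedged sketch of the new behavior (assuming a shell started with the command above, and that `spark` is the session the shell creates):

```python
# Inside a PySpark shell launched as:
#   ./bin/pyspark --remote local --conf aaa=bbb
# `spark` is the SparkSession the shell provides. With this change, the
# user-specified configuration should now be readable from the server side.
assert spark.conf.get("aaa") == "bbb"
```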
For `spark.master` and `spark.plugins`, user-specified configurations are respected. If they are unset, they are automatically set, e.g., `spark.plugins` to `org.apache.spark.sql.connect.SparkConnectPlugin`. If they are set, users have to provide the proper values to overwrite them, meaning that they run either `./bin/pyspark --remote local --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin` or `./bin/pyspark --remote local`.
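To sketch the precedence rule (assuming a shell started with plain `./bin/pyspark --remote local`, and that `spark.conf.get` exposes this entry):

```python
# With spark.plugins left unset on the command line, local remote mode
# should fill it in with the Spark Connect plugin automatically.
plugins = spark.conf.get("spark.plugins")
assert "org.apache.spark.sql.connect.SparkConnectPlugin" in plugins
```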
In addition, this PR fixes the related code as below:
- Adds the `spark.local.connect` internal configuration to be used in Spark Submit (so we don't have to parse and manipulate user-specified arguments in Python in order to remove the `--remote` option or the `spark.remote` configuration).
- Fixes `SparkSubmitCommandBuilder` so that an invalid combination fails fast (e.g., setting both remote and master, like `--master ...` and `--conf spark.remote=...`; see the sketch after this list).
- Does not set `spark.jars` anymore, since it adds the jars into the class path of the JVM through `addJarToCurrentClassLoader`.
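The fail-fast check can be sketched from the outside as below (illustrative only; `app.py` is a hypothetical application file, the script is assumed to run from the Spark home directory, and only a non-zero exit code is asserted since the exact error message is not specified here):

```python
import subprocess

# Setting both a master and a remote should be rejected up front by
# SparkSubmitCommandBuilder rather than silently accepted.
proc = subprocess.run(
    ["./bin/spark-submit",
     "--master", "local",
     "--conf", "spark.remote=local",
     "app.py"],  # hypothetical application file
    capture_output=True,
    text=True,
)
assert proc.returncode != 0, "remote + master combination should fail fast"
```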
Why are the changes needed?
To correctly pass the configurations specified by users.
Does this PR introduce any user-facing change?
No, Spark Connect has not been released yet.
This is a follow-up of #39441 to complete its support.
How was this patch tested?
Manually tested all combinations such as:
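For instance, one such combination can be exercised as below (illustrative only, not the exact test list):

```python
# Launch forms tried manually (illustrative):
#   ./bin/pyspark --remote local --conf spark.abc=abc
#   ./bin/pyspark --conf spark.remote=local --conf spark.abc=abc
# In each shell, confirm the user-specified configuration reached the server:
spark.conf.get("spark.abc")  # expected: 'abc'
```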