Conversation

@ConeyLiu
Contributor

@ConeyLiu ConeyLiu commented May 4, 2017

What changes were proposed in this pull request?

Currently, when we set SPARK_EXECUTOR_INSTANCES in spark-env.sh, the parameter is not parsed by Spark. This patch parses SPARK_EXECUTOR_INSTANCES into the parsed arguments.
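
Roughly, the idea is to fall back to the environment variable when resolving numExecutors in SparkSubmitArguments. The snippet below is only a sketch of that fallback, not the exact diff; it assumes env is the Map[String, String] of environment variables that SparkSubmitArguments already receives:

numExecutors = Option(numExecutors)                        // value from --num-executors, if any
  .orElse(sparkProperties.get("spark.executor.instances")) // then spark-defaults.conf / --conf
  .orElse(env.get("SPARK_EXECUTOR_INSTANCES"))             // finally the environment variable
  .orNull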

How was this patch tested?

Existing tests.

@ConeyLiu
Contributor Author

ConeyLiu commented May 4, 2017

@srowen @HyukjinKwon Could you take a look? Thanks a lot.

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented May 4, 2017

No, I believe that only exists for YARN client mode.

@ConeyLiu
Contributor Author

ConeyLiu commented May 4, 2017

Hi @srowen, below is the test result. The submit script:
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --verbose ../examples/target/scala-2.11/jars/spark-examples_2.11-2.2.0-SNAPSHOT.jar 10
Before:

Parsed arguments:
  master                  yarn
  deployMode              client
  executorMemory          1G
  executorCores           1
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.SparkPi
  primaryResource         file:/Users/lxy/git_repository/spark/bin/../examples/target/scala-2.11/jars/spark-examples_2.11-2.2.0-SNAPSHOT.jar
  name                    org.apache.spark.examples.SparkPi
  childArgs               [10]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

After:

Parsed arguments:
  master                  yarn
  deployMode              client
  executorMemory          1G
  executorCores           1
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            2
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.SparkPi
  primaryResource         file:/Users/lxy/git_repository/spark/bin/../examples/target/scala-2.11/jars/spark-examples_2.11-2.2.0-SNAPSHOT.jar
  name                    org.apache.spark.examples.SparkPi
  childArgs               [10]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Also, this parameter doesn't affect other cluster managers. You can see the following code:
L458

@srowen
Member

srowen commented May 4, 2017

This change causes it to affect all usages of spark-submit, right? If you look at the comment about it in the config template, it suggests it is for YARN client mode only. I don't know the history of this value; it may be obsolete and kept only for backwards compatibility, but I am not clear that it should be used this way, no.

@ConeyLiu
Contributor Author

ConeyLiu commented May 4, 2017

It only affects YARN mode; see the following code:

OptionAssigner(args.numExecutors, YARN, ALL_DEPLOY_MODES, sysProp = "spark.executor.instances"),

But in the latest code, this parameter affects both YARN client and YARN cluster mode; this changed with SPARK-9092, but the comment in the config template was not updated accordingly.

Also, since this parameter is set in spark-env.sh, it can only be fetched via the env.get() method in the SparkSubmitArguments class. However, the parameter is currently resolved by the following code:

numExecutors = Option(numExecutors)
      .getOrElse(sparkProperties.get("spark.executor.instances").orNull)

So it only takes effect if you set it with --num-executors or define it as a configuration property (in spark-defaults.conf or via --conf).
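
A quick way to confirm this from a running application (just an illustration; SparkConf.getOption is standard API):

// In spark-shell on YARN, check whether the setting actually reached the
// Spark configuration. Expect None when the value was only exported as
// SPARK_EXECUTOR_INSTANCES in spark-env.sh, and Some("<n>") when it was
// passed with --num-executors or --conf spark.executor.instances=<n>.
val instances: Option[String] = sc.getConf.getOption("spark.executor.instances")
println(instances.getOrElse("spark.executor.instances is not set"))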

@ConeyLiu
Contributor Author

ConeyLiu commented May 4, 2017

Hi @vanzin, could you take a look? The effect of spark.executor.instances changed after SPARK-9092, but the comments in the config template were not updated correspondingly. Is this parameter deprecated? Thanks a lot.

@vanzin
Contributor

vanzin commented May 4, 2017

This was intentionally removed. Env-based configs have been deprecated for a long time, and some were already removed. Use a proper configuration file (or command line option) for that. See SPARK-17979.
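
For reference, the supported ways to set the executor count are --num-executors, --conf spark.executor.instances=N, an entry in spark-defaults.conf, or setting it on SparkConf programmatically. A minimal Scala sketch of the programmatic route (the app name is arbitrary; this is equivalent to passing --conf spark.executor.instances=2 on the command line):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Configure the executor count through the configuration system instead of
// the deprecated SPARK_EXECUTOR_INSTANCES environment variable.
val conf = new SparkConf()
  .setAppName("SparkPi")
  .set("spark.executor.instances", "2")

val spark = SparkSession.builder().config(conf).getOrCreate()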

@ConeyLiu
Contributor Author

ConeyLiu commented May 5, 2017

@vanzin Thanks a lot for your review. Do we need to remove the comments from the config template? It doesn't work anymore in the current version.

@vanzin
Contributor

vanzin commented May 5, 2017

Do we need to remove the comments from the config template?

Ah, that would be a good idea. I also noticed it's still used in YarnSparkHadoopUtil.scala, so that could be removed too.

I also took a closer look at SPARK-17979, and this particular env variable wasn't removed in that change; it seems it was removed much earlier (SPARK-9092 as far as I can tell), so it looks like it isn't very widely used.

@ConeyLiu
Contributor Author

ConeyLiu commented May 5, 2017

OK, I will open another PR to remove it. Thanks a lot to both of you.

@ConeyLiu ConeyLiu closed this May 5, 2017
@ConeyLiu ConeyLiu deleted the build branch May 5, 2017 01:53