
Conversation

@Ngone51
Member

@Ngone51 Ngone51 commented Jan 12, 2021

What changes were proposed in this pull request?

This PR proposes to rename all decommission configurations to use the same namespace, "spark.decommission.*".

It also refines the config "spark.decommission.storage.fallbackStorage.path" to "spark.decommission.storage.fallbackStoragePath".

Why are the changes needed?

Currently, decommission configurations use different namespaces, e.g.,

spark.decommission.*
spark.storage.decommission.*
spark.executor.decommission.*

which may introduce unnecessary overhead for end-users. It's better to keep them under the same namespace.
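For illustration, the proposed consolidation can be sketched as a plain key mapping. Only the fallbackStoragePath rename appears in the diff in this PR; the other old/new pairs below are assumptions about how existing decommission configs would map into the unified namespace, not the PR's actual rename list.

```scala
// Hypothetical sketch of the proposed rename: old mixed-namespace keys
// mapped to the unified "spark.decommission.*" namespace.
object DecommissionConfigRename {
  // Assumed mapping for illustration; only the last entry is shown in the PR diff.
  val renames: Map[String, String] = Map(
    "spark.storage.decommission.enabled" ->
      "spark.decommission.storage.enabled",
    "spark.executor.decommission.killInterval" ->
      "spark.decommission.executor.killInterval",
    "spark.storage.decommission.fallbackStorage.path" ->
      "spark.decommission.storage.fallbackStoragePath"
  )

  // Translate a config key, leaving keys outside the mapping untouched.
  def translate(key: String): String = renames.getOrElse(key, key)
}
```

A migration shim like this is one way such a rename could stay backward compatible, by translating old keys before lookup.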

Does this PR introduce any user-facing change?

No, since Spark 3.1 has not been officially released yet.

How was this patch tested?

Pass existing tests.

@github-actions github-actions bot added the CORE label Jan 12, 2021
@Ngone51
Member Author

Ngone51 commented Jan 12, 2021

cc @holdenk @cloud-fan Please take a look, thanks!


private[spark] val STORAGE_DECOMMISSION_FALLBACK_STORAGE_PATH =
-  ConfigBuilder("spark.storage.decommission.fallbackStorage.path")
+  ConfigBuilder("spark.decommission.storage.fallbackStoragePath")

@SparkQA

SparkQA commented Jan 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38556/

@SparkQA

SparkQA commented Jan 12, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38556/

@SparkQA

SparkQA commented Jan 12, 2021

Test build #133969 has finished for PR 31151 at commit 6087e59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

cloud-fan commented Jan 12, 2021

I'm fine with it, as it's easier to track all the configs of the decommission feature. But we may need to make the rule clear: how to name feature configs if the feature crosses many components like worker, master, executor, etc.?

@Ngone51
Member Author

Ngone51 commented Jan 12, 2021

I actually once had a short discussion (https://github.com/apache/spark/pull/28053/files#r400238749) with @tgravescs about config naming on the CORE side while working on the "resource" feature. cc @tgravescs for more input.

@tgravescs
Contributor

Right, so this goes against the current naming standards, as was brought up in that discussion. The current format of these configs matches the convention where spark.executor, spark.worker, etc. are the prefixes, and the documentation in many cases categorizes configs into sections by these prefixes. I get that naming them by feature may make things easier in some cases, but I also see where naming them as-is makes things easier in others. When I set up worker configuration, I look for all spark.worker configs, and those might go into a separate file just for workers, whereas I put spark.executor configs in a file to be loaded by all applications; when I am looking at things that might optimize shuffle, I look at all spark.shuffle configs. My point is that one person may think the new way is easier, while others may think it is easier as-is. We have been using spark.executor, spark.worker, etc. for a long time, so changing that will introduce some confusion, especially if it is only done for a single feature.

So I'm against making this change unless we decide in general to change our naming rules for configs, and I think that should be decided on by the devs and documented appropriately. These configs were put in with the feature and reviewed by committers; we should not be changing them at the last minute without good cause.

@cloud-fan
Contributor

I agree with @tgravescs. That said, I believe "spark.storage.decommission.fallbackStorage.path" is still worth fixing. @Ngone51 can you revert the other changes and only fix that config?

@holdenk
Contributor

holdenk commented Jan 12, 2021

-1 on putting this into 3.1, it's not a bug fix and we've started the RC process already.
I'm open to considering a larger refactor in future versions.

@cloud-fan
Contributor

cloud-fan commented Jan 12, 2021

@dongjoon-hyun do you think we will have more sub-configs under the spark.storage.decommission.fallbackStorage namespace? If not, I think it violates the config naming policy, and we should use spark.storage.decommission.fallbackStoragePath instead of spark.storage.decommission.fallbackStorage.path.
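The naming policy being applied here can be sketched as a small check: a dotted sub-namespace such as "...fallbackStorage.path" is only justified when more sub-configs live under it; a single leaf should instead be a camelCase name like "...fallbackStoragePath". The helper below is purely illustrative and not part of Spark.

```scala
// Hypothetical check for the leaf-naming rule: a key violates the rule
// if it is the only config under its immediate dotted namespace.
def violatesLeafRule(allKeys: Set[String], key: String): Boolean = {
  val sep = key.lastIndexOf('.')
  if (sep < 0) false // no namespace component at all
  else {
    val parent = key.substring(0, sep)
    // Count every config under this key's immediate namespace.
    allKeys.count(_.startsWith(parent + ".")) == 1
  }
}
```

For example, with only fallbackStorage.path defined, the rule flags it; adding a second sub-config under fallbackStorage (such as a hypothetical fallbackStorage.cleanUp) would justify keeping the dotted namespace.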

@dongjoon-hyun
Member

@cloud-fan Yes. Actually, I have more configurations internally and have a plan to deliver them to the community.
In the first PR, I intentionally dropped .enabled to simplify the PR.


@dongjoon-hyun
Member

To say explicitly, I also want to give -1 for this.

@HyukjinKwon
Member

HyukjinKwon commented Jan 13, 2021

"I'm against making this change unless we decide in general to change our naming rules for configs, and I think that should be decided on by the devs and documented appropriately."

I agree with this. PySpark has the same issue, and I have been tracking it:

SQL:

  • spark.sql.pyspark.jvmStacktrace.enabled
  • spark.sql.execution.arrow.pyspark.*
  • spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled

CORE

  • spark.executor.pyspark.memory
  • spark.pyspark.driver.python
  • spark.pyspark.python
  • spark.kubernetes.pyspark.pythonVersion
  • spark.python.daemon.module
  • spark.python.worker.*
  • spark.python.use.daemon
  • spark.python.task.killTimeout
  • spark.python.authenticate.socketTimeout
  • spark.python.profile.*

We should also merge the two configurations spark.executor.pyspark.memory and spark.python.worker.memory (SPARK-26679).
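Merging two overlapping configs like these usually means picking a precedence order and falling back from one to the other. The sketch below is an assumption about how such a merge could resolve, with the newer key winning; it is not Spark's actual behavior, and SPARK-26679 may choose a different rule.

```scala
// Hypothetical resolution for the two overlapping PySpark memory settings:
// prefer the newer spark.executor.pyspark.memory, fall back to the legacy
// spark.python.worker.memory. Precedence order is assumed for illustration.
def pysparkMemory(conf: Map[String, String]): Option[String] =
  conf.get("spark.executor.pyspark.memory")
    .orElse(conf.get("spark.python.worker.memory"))
```

Under this rule a job that only sets the legacy key keeps working, while setting both keys makes the newer one win.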

It would be great if we had a general naming convention to keep to. Then we could deprecate and rename the configs. The current naming is kind of confusing.

@Ngone51
Member Author

Ngone51 commented Jan 13, 2021

Thanks for everyone's feedback. I'll close this PR for now, although any ideas about naming rules are still welcome :)

@Ngone51 Ngone51 closed this Jan 13, 2021
@dongjoon-hyun
Member

Thank you for your decision, @Ngone51 .

