[SPARK-34088][CORE] Rename all decommission configurations to use the same namespace "spark.decommission.*" #31151
Conversation
cc @holdenk @cloud-fan Please take a look, thanks!
```diff
 private[spark] val STORAGE_DECOMMISSION_FALLBACK_STORAGE_PATH =
-  ConfigBuilder("spark.storage.decommission.fallbackStorage.path")
+  ConfigBuilder("spark.decommission.storage.fallbackStoragePath")
```
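Since Spark 3.1 had not yet shipped, the PR simply replaces the old key; still, renames like this are often paired with a backward-compatible lookup. A minimal sketch of that idea, assuming a plain `Map`-backed config (the `resolveWithFallback` helper is hypothetical, not Spark's internal `ConfigBuilder` API):

```scala
// Hypothetical helper: prefer the new "spark.decommission.*" key, but
// fall back to the old "spark.storage.decommission.*" key if only the
// old one is set. Illustrative only; Spark's ConfigBuilder works differently.
def resolveWithFallback(
    conf: Map[String, String],
    newKey: String,
    oldKey: String): Option[String] =
  conf.get(newKey).orElse(conf.get(oldKey))

// A user still setting the pre-rename key:
val conf = Map(
  "spark.storage.decommission.fallbackStorage.path" -> "s3a://bucket/fallback")

val path = resolveWithFallback(
  conf,
  "spark.decommission.storage.fallbackStoragePath",
  "spark.storage.decommission.fallbackStorage.path")
```

The new key wins when both are present, so existing deployments keep working while users migrate.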
Kubernetes integration test starting

Kubernetes integration test status success

Test build #133969 has finished for PR 31151 at commit
I'm fine with it, as it's easier to track all the configs of the decommission feature. But we may need to make the rule clear: how do we name feature configs when the feature crosses many components like worker, master, executor, etc.?
I actually once had a short discussion (https://github.com/apache/spark/pull/28053/files#r400238749) with @tgravescs about conf naming on the CORE side when working on the "resource" feature. cc @tgravescs for more input.
Right, so this goes against the current naming standards, as was brought up in that discussion. The current format of these configs matches the convention where spark.executor, spark.worker, etc. are the prefixes, and the documentation in many cases categorizes them into sections by these prefixes. I get that naming them by feature may make things easier in some cases, but I also see where naming them as-is makes things easier in others. I know that when I set up worker configuration I look for all spark.worker configs, and those might go into a separate file just for workers, whereas I put spark.executor configs in a file to be loaded by all applications; or when I am looking at things that might be used to optimize shuffle, I look at all spark.shuffle configs. My point is that one person may find the new way easier, while others may find it easier as-is. We have been using spark.executor, spark.worker, etc. for a long time, so changing that will introduce some confusion, especially if it's only done for a single feature. So I'm against making this change unless we decide in general to change our naming rules for configs, and that should be decided on by the devs and documented appropriately. These configs were put in with the feature and reviewed by committers; we should not be changing them at the last minute without good cause.
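The per-component grouping described above can be sketched in a few lines; the decommission key names below are examples from this feature area, and the grouping helper itself is illustrative, not Spark code:

```scala
// Illustrative only: group config keys by their component prefix
// ("spark.worker", "spark.executor", ...), mirroring how an operator
// might split configs into per-component files.
def componentPrefix(key: String): String =
  key.split('.').take(2).mkString(".")

// Example decommission-related keys spread across component namespaces.
val keys = Seq(
  "spark.worker.decommission.enabled",
  "spark.executor.decommission.killInterval",
  "spark.storage.decommission.enabled")

val byComponent: Map[String, Seq[String]] = keys.groupBy(componentPrefix)
```

Under the current convention the feature's keys land in three buckets; under the proposed `spark.decommission.*` namespace they would all land in one.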
I agree with @tgravescs. I believe
-1 on putting this into 3.1; it's not a bug fix, and we've started the RC process already.
@dongjoon-hyun do you think we will have more sub-configs under
@cloud-fan. Yes. Actually, I have more configurations internally and have a plan to deliver them to the community.
To say it explicitly, I also want to give a -1 for this.
I agree with this. PySpark has the same issue too, and I have been tracking it:
SQL:
CORE:
We should also merge both configurations. It would be great if we had a general naming rule to keep. Then we could deprecate and change the names. The current naming is kind of confusing.
Thanks for everyone's feedback. I'll close this PR for now, though any ideas about naming rules are still welcome :)
Thank you for your decision, @Ngone51.
What changes were proposed in this pull request?
This PR proposes to rename all decommission configurations to use the same namespace "spark.decommission.*".
Besides, it also refines the config "spark.decommission.storage.fallbackStorage.path" to "spark.decommission.storage.fallbackStoragePath".
Why are the changes needed?
Currently, decommission configurations use different namespaces, e.g., "spark.storage.decommission.*",
which may introduce unnecessary overhead for end-users. It's better to keep them under the same namespace.
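The proposed consolidation can be pictured as a key-rename pass over an existing configuration. In the sketch below, only the fallback-storage key pair is taken from this PR's diff; the rename table and `migrate` helper are hypothetical:

```scala
// Hypothetical old-name -> new-name table for the proposed
// "spark.decommission.*" namespace. Only this entry comes from the PR diff.
val renames = Map(
  "spark.storage.decommission.fallbackStorage.path" ->
    "spark.decommission.storage.fallbackStoragePath")

// Rewrite any keys that have a new name; leave all other keys untouched.
def migrate(conf: Map[String, String]): Map[String, String] =
  conf.map { case (k, v) => (renames.getOrElse(k, k), v) }
```

Keeping every decommission key under one prefix means a single rename table (and a single documentation section) covers the whole feature.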
Does this PR introduce any user-facing change?
No, since Spark 3.1 hasn't been officially released yet.
How was this patch tested?
Pass existing tests.