-
Notifications
You must be signed in to change notification settings - Fork 29k
[WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling #28818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
holdenk
wants to merge
10
commits into
apache:master
from
holdenk:SPARK-31198-use-graceful-decommissioning-as-part-of-dynamic-scaling-on-top-of-SPARK-1197
Closed
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
525b335
Shutdown executor once we are done decommissioning
holdenk 63da300
Make Spark's dynamic allocation use decommissioning
holdenk 9f7d752
Track the decommissioning executors in the core dynamic scheduler so …
holdenk efbe9a3
Fix our exiting and cleanup thread for better debugging next time. Cl…
holdenk ccbef6b
Verify executors decommissioned, then killed by external external clu…
holdenk b69784d
Verify some additional calls are not occuring in the executor allocat…
holdenk d4961d9
Dont' close the watcher until the end of the test
holdenk d5c5ef1
Use decommissionExecutors and set adjustTargetNumExecutors to false s…
holdenk 683c83c
bump numparts up to 6
holdenk f921ddd
Revert "bump numparts up to 6"
holdenk File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is
adjustTargetNumExecutorsdefaulting to true here ? This would mean that all schedulers would try to replinish the executor when asked toDecommissionExecutor(...)-- for example by the Master or when an executor gets a SIGPWR.I think it shouldn't be the default -- it should atleast be configurable. It only makes sense to have
adjustTargetNumExecutors=truewhen called fromorg.apache.spark.streaming.scheduler.ExecutorAllocationManager#killExecutor(ie when it is truly called from dynamic allocation codepath and we have decided that we want to replinish the executor).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look above there is a configurable call. This matches how
killExecutoris implemented down on line 124.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please point me to where is the configurable call ? I don't see a config check in the code paths that call this method.
It's fine for killExecutor to unconditionally adjust the target number of executors because it is only called in the dynamic allocation codepath, but decommissionExecutor would be called from many other codepaths as well (for example when the driver gets a DecommissionExecutor message) -- and thus I think it should just assume that it should replenish the executor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look on line 95 of this file. I think we should match the semantics of
killExecutoras much as possible. If there's a place where we don't want it we can usedecommissionExecutorsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, Should we rename decommissionExecutor (singular) to decommissionAndKillExecutor to reflect its purpose better ? It would be too easy to confuse it with decommissionExecutors (on line 95 of this file which allows to not replenish the target number of executors).
Do you want to make the change to the callers of decommissionExecutor in this PR and switch them to
decommissioExecutors(Seq(executorId), false)instead. The ones I am most concerned about are:In both the above cases, I think we may not always want replenishing. For example, in the standalone case, when the Worker gets a SIGPWR -- do we want to replenish the executors on the remaining workers (ie oversubscribe the remaining workers) ? Similarly when an executor gets a SIGPWR, do we want to put that load on the remaining executors ? I think the answer to both should be NO unless we are doing a dynamic allocation.
Personally I am fine with any choice of naming here as long as the semantics are not silently changed under the cover, as is the case presently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a new function, what are we changing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExecutorAllocationClient is a base class of CoarseGrainedSchedulerBackend. We moved decommissionExecutor from the latter class to the former and as such it is not a new function. Since CoarseGrainedSchedulerBackend no longer overrides decommissionExecutor, ExecutorAllocationClient.decommissionExecutor will be called when CoarseGrainedSchedulerBackend gets a DecommissionExecutor message -- and the semantics of that codepath have been changed to unconditionally impose adjustTargetNumExecutors=true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, I'll update the previous calls to
decommissioExecutor