[SPARK-15926] Improve readability of DAGScheduler stage creation methods #13677
This commit renames shuffleToMapStage to shuffleIdToMapStage to make it clear that the shuffle id (and not the shuffle dependency, or the shuffle map stage, or the shuffle map stage id) is the key in the hash map.
I think having it nested made the code harder to read.
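As a rough illustration of the rename and of the lookup-versus-create distinction this pull request aims to make visible, here is a minimal sketch. It is not the DAGScheduler code: `MapStage`, `StageRegistry`, `getMapStage`, and `getOrCreateMapStage` are hypothetical names invented for this example; only `shuffleIdToMapStage` comes from the commit description above.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a shuffle map stage; only the fields needed here.
final case class MapStage(id: Int, shuffleId: Int)

// Hypothetical registry, assumed for illustration only.
class StageRegistry {
  private var nextStageId = 0

  // Keyed by shuffle id (not the shuffle dependency, the stage, or the stage id).
  private val shuffleIdToMapStage = mutable.HashMap.empty[Int, MapStage]

  /** Pure lookup: never creates a stage. */
  def getMapStage(shuffleId: Int): Option[MapStage] =
    shuffleIdToMapStage.get(shuffleId)

  /** May create: returns the existing stage, or registers a new one. */
  def getOrCreateMapStage(shuffleId: Int): MapStage =
    shuffleIdToMapStage.getOrElseUpdate(shuffleId, {
      val stage = MapStage(nextStageId, shuffleId)
      nextStageId += 1
      stage
    })
}
```

With names like these, a reader can tell from the call site alone whether a call might allocate a new stage id.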
|
Test build #60540 has finished for PR 13677 at commit
|
Something that I have been looking at of late, and I know that @squito has looked at some, too. In short, I'm pretty confident that we are doing some silliness around creating new stages instead of reusing already existing stages, then recognizing that all the tasks for the "new" stages are already completed (at least we're smart enough to reuse the map outputs), so the "new" stages just become "skipped". I'll take a closer look at this tomorrow, and may have a follow-on PR in the not too distant future.
      jobId: Int,
      callSite: CallSite): ResultStage = {
    val id = nextStageId.getAndIncrement()
    val stage = new ResultStage(
getOrCreateParentStages should be called before getting the id for the result stage, otherwise the result stage will get numbered below the dependent stages.
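To make the ordering concern concrete, here is a small sketch under simplifying assumptions. `SketchStage`, `IdOrderingSketch`, and `createParents` are hypothetical; `createParents` stands in for `getOrCreateParentStages` and, unlike the real method, always creates fresh parents. Only the `nextStageId.getAndIncrement()` call mirrors the snippet under review.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical, simplified model: a stage is its id plus its parents' ids.
final case class SketchStage(id: Int, parentIds: List[Int])

class IdOrderingSketch {
  private val nextStageId = new AtomicInteger(0)

  // Stand-in for getOrCreateParentStages; assumed to always create `n` new parents.
  private def createParents(n: Int): List[SketchStage] =
    List.fill(n)(SketchStage(nextStageId.getAndIncrement(), Nil))

  // Order flagged in the review: the result stage takes its id before its parents exist.
  def resultStageIdFirst(numParents: Int): SketchStage = {
    val id = nextStageId.getAndIncrement()
    val parents = createParents(numParents)
    SketchStage(id, parents.map(_.id))
  }

  // Suggested order: create (or look up) the parents first, then take the id.
  def resultStageParentsFirst(numParents: Int): SketchStage = {
    val parents = createParents(numParents)
    val id = nextStageId.getAndIncrement()
    SketchStage(id, parents.map(_.id))
  }
}
```

In this model, the first method gives the result stage a lower id than its parents, while the second keeps every parent id below the result stage's id.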
from a quick check, I think that might solve all the failing tests
|
everything proposed here makes sense to me, I think it just needs the one minor reordering I mentioned. about creating extra stages, I looked into this some here: https://issues.apache.org/jira/browse/SPARK-10193. It actually seems fairly straightforward, though I never followed up on profiling the extra memory involved. (happy to let someone else take over, I doubt I'll get back to it anytime soon ...)
|
Thanks for pointing out the re-ordering issue! Fixed. Re: creating extra stages, I agree we could do better with that, because it is confusing. I have another commit with some light cleanup of that (but wanted to keep refactoring separate from functionality changes).
|
Test build #60580 has finished for PR 13677 at commit
|
This test failure is the flaky scheduler test, but I'll hold off on re-running Jenkins since I'm guessing there will be some comments to address on this anyway (so the tests will need to be re-run after that).
|
LGTM, and runs without flakiness for me when rebased onto master with the #13688 HOTFIX.
|
Jenkins, retest this please
|
Test build #60714 has finished for PR 13677 at commit
|
Merged into master (FYI @squito, since I'm guessing there may be minor merge conflicts with your blacklisting work)
What changes were proposed in this pull request?
This pull request refactors parts of the DAGScheduler to improve readability, focusing on the code around stage creation. One goal of this change is to make it clearer which functions may create new stages (as opposed to looking up stages that already exist). There are no functionality changes in this pull request. In more detail:
cc @squito @markhamstra @JoshRosen (who requested more DAG scheduler commenting long ago -- an issue this pull request tries, in part, to address)
FYI @rxin