[SPARK-17171][WEB UI] DAG will list all partitions in the graph #14737
|
Test build #64158 has finished for PR 14737 at commit
|
if (parentIds.size == 0) {
  rootNodeCount < retainedNodes
} else {
  if (ids.size > 0) {
The whole if-else is just ids.isEmpty || parentIds.exists(id => ids.contains(id) || !dropRDDIds.contains(id)) ?
yes, you are right...
|
Test build #64167 has finished for PR 14737 at commit
|
val DEFAULT_POOL_NAME = "default"
val DEFAULT_RETAINED_STAGES = 1000
val DEFAULT_RETAINED_JOBS = 1000
val DEFAULT_RETAINED_NODES = 2
NODES, both here and in spark.ui.retainedNodes, is far too ambiguous and non-specific for this configuration value -- "node" is already overloaded too many times in the existing Spark code and documentation; we don't need or want to add another overload.
Additionally, the default behavior should be the same as current behavior, since the change in behavior would be unexpected and it is far from clear to me that the overwhelming majority of users would prefer the proposed new behavior.
|
Test build #64173 has finished for PR 14737 at commit
|
|
@srowen please review the latest code, thank you.
|
Pardon the dumb question, but your before and after pictures seem like different graphs. Is that really showing the before and after of one display? |
|
I can see the value in this, but it also removes any indication that there are more elements to the graph that are not displayed. I wonder if there is any easy way to do that or do I misunderstand? Also why 2, just arbitrary? |
|
@srowen I set this value to 2 because a "JOIN" action needs 2 elements. Users want to see the relations in the DAG graph; if there are too many elements, it is messy. I have changed the code as @markhamstra suggested, so it will not remove elements by default. Users can set this value as they like, but I recommend 2.
|
OK, another config eh. If there's no default change in behavior, and it's undocumented and it doesn't introduce much complexity, seems OK |
|
@srowen If it is OK, can you merge this PR to master? Thank you.
dropRDDIds.add(rdd.id)
}

if (rdd.parentIds.size == 0) {
Minor stuff like:
if (rdd.parentIds.isEmpty) {
  rootNodeCount += 1
}
|
Test build #65152 has finished for PR 14737 at commit
|
|
Test build #65190 has finished for PR 14737 at commit
|
stage.rddInfos.sortBy(_.id).foreach { rdd =>
  var isAllowed = true
  val parentIds = rdd.parentIds
  if (parentIds.isEmpty) {
Now we're just down to style stuff, but how about...
val isAllowed =
  if (parentIds.isEmpty) {
    rootNodeCount += 1
    rootNodeCount <= retainedNodes
  } else {
    parentIds.exists(...)
  }
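For readers following the thread, the suggested `isAllowed` expression and the `dropRDDIds` bookkeeping seen in the diff can be put together as a standalone function. This is only a minimal sketch under stated assumptions, not the merged Spark code: `RddInfo`, `DagFilterSketch`, `filter`, and `retainedRoots` are hypothetical names, and the `kept` set stands in for the `ids` collection mentioned in the earlier review comment.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's RDD metadata: only the fields the filter needs.
final case class RddInfo(id: Int, parentIds: Seq[Int])

object DagFilterSketch {
  /** Returns (ids of RDDs kept in the graph, edges kept as (parent, child)). */
  def filter(rdds: Seq[RddInfo], retainedRoots: Int): (Set[Int], Seq[(Int, Int)]) = {
    val kept = mutable.HashSet.empty[Int]      // RDDs that will be drawn
    val dropped = mutable.HashSet.empty[Int]   // RDDs that will be hidden
    val edges = mutable.ArrayBuffer.empty[(Int, Int)]
    var rootCount = 0

    // Visit RDDs in id order so that parents are seen before their children.
    rdds.sortBy(_.id).foreach { rdd =>
      val isAllowed =
        if (rdd.parentIds.isEmpty) {
          // Root RDD: keep only the first `retainedRoots` of them.
          rootCount += 1
          rootCount <= retainedRoots
        } else {
          // Non-root RDD: keep it if any parent is kept, or was never dropped
          // (for example, a parent that is not part of this sequence at all).
          rdd.parentIds.exists(id => kept.contains(id) || !dropped.contains(id))
        }

      if (isAllowed) {
        kept += rdd.id
        // Only draw edges from parents that were not dropped.
        edges ++= rdd.parentIds.filterNot(dropped.contains).map(p => (p, rdd.id))
      } else {
        dropped += rdd.id
      }
    }
    (kept.toSet, edges.toSeq)
  }
}
```

With four root RDDs feeding one union-like child and `retainedRoots = 2`, the sketch keeps the first two roots and the child, and hides the other two roots together with their edges. Passing `Int.MaxValue` keeps everything, which matches the "no change by default" behavior requested in review.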
|
Test build #65209 has finished for PR 14737 at commit
|
|
I looked one more time and the only thing that crossed my mind is whether spark.ui.dagGraph.retainedRootRDDs is the right naming convention, but it's a hidden internal property at the moment, and I don't have a better idea, so let's leave it.
|
Merged to master |
|
thank you! |
## What changes were proposed in this pull request?

The DAG visualization lists all partitions in the graph, which makes it slow to render and makes the whole graph hard to read. Usually we don't want to see every partition; we just want to see the relations in the DAG graph. So this PR shows only 2 root nodes for RDDs.

Before this PR, the DAG graph looks like [dag1.png](https://issues.apache.org/jira/secure/attachment/12824702/dag1.png) and [dag3.png](https://issues.apache.org/jira/secure/attachment/12825456/dag3.png). After this PR, the DAG graph looks like [dag2.png](https://issues.apache.org/jira/secure/attachment/12824703/dag2.png) and [dag4.png](https://issues.apache.org/jira/secure/attachment/12825457/dag4.png).

Author: cenyuhai <cenyuhai@didichuxing.com>
Author: 岑玉海 <261810726@qq.com>

Closes apache#14737 from cenyuhai/SPARK-17171.