Skip to content

Conversation

@cenyuhai
Copy link
Contributor

@cenyuhai cenyuhai commented Aug 21, 2016

What changes were proposed in this pull request?

DAG will list all partitions in the graph, it is too slow and hard to see all graph.
Always we don't want to see all partitions,we just want to see the relations of DAG graph.
So I just show 2 root nodes for Rdds.

Before this PR, the DAG graph looks like dag1.png, dag3.png, after this PR, the DAG graph looks like dag2.png,dag4.png

@SparkQA
Copy link

SparkQA commented Aug 21, 2016

Test build #64158 has finished for PR 14737 at commit 595453f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (parentIds.size == 0) {
rootNodeCount < retainedNodes
} else {
if (ids.size > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole if-else is just ids.isEmpty || parentIds.exists(id => ids.contains(id) || !dropRDDIds.contains(id)) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right...

@SparkQA
Copy link

SparkQA commented Aug 21, 2016

Test build #64167 has finished for PR 14737 at commit 770b3de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cenyuhai
Copy link
Contributor Author

@srowen

@cenyuhai cenyuhai changed the title [Spark-17171][WEB UI] DAG will list all partitions in the graph [SPARK-17171][WEB UI] DAG will list all partitions in the graph Aug 21, 2016
val DEFAULT_POOL_NAME = "default"
val DEFAULT_RETAINED_STAGES = 1000
val DEFAULT_RETAINED_JOBS = 1000
val DEFAULT_RETAINED_NODES = 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NODES, both here and in spark.ui.retainedNodes if far too ambiguous and non-specific for this configuration value -- "node" is already overloaded too many times in the existing Spark code and documentation; we don't need or want to add another overload.

Additionally, the default behavior should be the same as current behavior, since the change in behavior would be unexpected and it is far from clear to me that the overwhelming majority of users would prefer the proposed new behavior.

@SparkQA
Copy link

SparkQA commented Aug 22, 2016

Test build #64173 has finished for PR 14737 at commit 4019842.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cenyuhai
Copy link
Contributor Author

@srowen please review the latest codes, thank you.

@srowen
Copy link
Member

srowen commented Aug 25, 2016

Pardon the dumb question, but your before and after pictures seem like different graphs. Is that really showing the before and after of one display?

@cenyuhai
Copy link
Contributor Author

cenyuhai commented Aug 25, 2016

I am very sorry about, the first picture is for stage, the second picture is for job, but it is the same job "select count(1) from partitionedTables ". I upload new pictures. dag1.png vs dag4.png, dag3.png vs dag2.png.

@srowen
Copy link
Member

srowen commented Aug 25, 2016

I can see the value in this, but it also removes any indication that there are more elements to the graph that are not displayed. I wonder if there is any easy way to do that or do I misunderstand? Also why 2, just arbitrary?

@cenyuhai
Copy link
Contributor Author

cenyuhai commented Aug 25, 2016

@srowen Why I set this value 2, because a "JOIN" action needs 2 elements.Users want to know the relations from DAG graphs, if there are too manys elements, it is meesy. I have changed the codes as @markhamstra said, it will not remove elements by default. Users can set this value as they like. But I recommend that 2 is better.

@srowen
Copy link
Member

srowen commented Aug 25, 2016

OK, another config eh. If there's no default change in behavior, and it's undocumented and it doesn't introduce much complexity, seems OK

@cenyuhai
Copy link
Contributor Author

cenyuhai commented Sep 9, 2016

@srowen If it is ok,can you merge this pr to master?thank you.

dropRDDIds.add(rdd.id)
}

if (rdd.parentIds.size == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor stuff like:

if (rdd.parnetIds.isEmpty) {
  rootNoteCount += 1
}

@SparkQA
Copy link

SparkQA commented Sep 9, 2016

Test build #65152 has finished for PR 14737 at commit 40def0f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2016

Test build #65190 has finished for PR 14737 at commit 3c72dde.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

stage.rddInfos.sortBy(_.id).foreach { rdd =>
var isAllowed = true
val parentIds = rdd.parentIds
if (parentIds.isEmpty) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we're just down to style stuff but I how about...

val isAllowed =
  if (parentId.isEmpty) {
    rootNodeCount += 1
    rootNodeCount <= retainedNodes
  } else {
    parentIds.exists(...)
  }

@SparkQA
Copy link

SparkQA commented Sep 10, 2016

Test build #65209 has finished for PR 14737 at commit 5163a51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Sep 12, 2016

I looked one more time and the only thing that crossed my mind is whether spark.ui.dagGraph.retainedRootRDDs is the rightest naming convention, but, it's a hidden internal property at the moment, and I don't have a better idea, so let's leave it.

@srowen
Copy link
Member

srowen commented Sep 12, 2016

Merged to master

@asfgit asfgit closed this in cc87280 Sep 12, 2016
@cenyuhai
Copy link
Contributor Author

thank you!

wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
## What changes were proposed in this pull request?
DAG will list all partitions in the graph, it is too slow and hard to see all graph.
Always we don't want to see all partitions,we just want to see the relations of DAG graph.
So I just show 2 root nodes for Rdds.

Before this PR, the DAG graph looks like [dag1.png](https://issues.apache.org/jira/secure/attachment/12824702/dag1.png), [dag3.png](https://issues.apache.org/jira/secure/attachment/12825456/dag3.png), after this PR, the DAG graph looks like [dag2.png](https://issues.apache.org/jira/secure/attachment/12824703/dag2.png),[dag4.png](https://issues.apache.org/jira/secure/attachment/12825457/dag4.png)

Author: cenyuhai <cenyuhai@didichuxing.com>
Author: 岑玉海 <261810726@qq.com>

Closes apache#14737 from cenyuhai/SPARK-17171.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants