SHS-NG M4.4: Port JobsTab and StageTab to the new backend. #47

vanzin · 2017-08-10T01:34:35Z

This change is a little larger because there's a whole lot of logic
behind these pages, all really tied to internal types and listeners.
There's also a lot of code that was moved to the new module.

- Added missing StageData and ExecutorStageSummary fields which are
  used by the UI. Some json golden files needed to be updated to account
  for new fields.
- Save RDD graph data in the store. This tries to re-use existing types as
  much as possible, so that the code doesn't need to be re-written. So it's
  probably not very optimal.
- Some old classes (e.g. JobProgressListener) still remain, since they're used
  in other parts of the code; they're not used by the UI anymore, though, and
  will be cleaned up in a separate change.
- Save information about active pools in the store. This data is not really used
  in the SHS, but it's not a lot of data so it's still recorded when replaying
  applications.
- Because the new store sorts things slightly differently from the previous
  code, some json golden files had some elements within them shuffled around.
- The retention unit test in UISeleniumSuite was disabled because the code
  to throw away old stages / tasks hasn't been added yet.
- The job description field in the API tries to follow the old behavior, which
  makes it be empty most of the time, even though there's information to fill it
  in. For stages, a new field was added to hold the description (which is basically
  the job description), so that the UI can be rendered in the old way.
- A new stage status ("SKIPPED") was added to account for the fact that the API
  couldn't represent that state before. Without this, the stage would show up as
  "PENDING" in the UI, which is now based on API types.
- The API used to expose "executorRunTime" as the value of the task's duration,
  which wasn't really correct (also because that value was easily available
  from the metrics object); this change fixes that by storing the correct duration,
  which also means a few expectation files needed to be updated to account for
  the new durations and sorting differences due to the changed values.

squito

ok this one is pretty big, will take a few passes.

@cloud-fan @jerryshao @ajbozarth Would also appreciate getting more eyes on this if you want to start looking at this already

squito · 2017-10-27T15:49:39Z

core/src/main/scala/org/apache/spark/status/AppStatusListener.scala

+
+    // Create the graph data for all the job's stages.
+    event.stageInfos.foreach { stage =>
+      val graph = RDDOperationGraph.makeOperationGraph(stage, Int.MaxValue)


it looks like the config for maxNodes was lost, even in M6

squito · 2017-10-27T16:00:31Z

core/src/main/scala/org/apache/spark/status/LiveEntity.scala

@@ -63,6 +64,10 @@ private class LiveJob(
  var activeTasks = 0
  var completedTasks = 0
  var failedTasks = 0
+  val completedIndices = new OpenHashSet[Long]()


can you add a comment explaining this is stageId + taskIndex packed into one Long
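For readers following along, the packing being asked about can be sketched roughly as below. This is only an illustration: the `TaskIndexPacking` object and its method names are hypothetical, and the bit layout (stageId in the upper 32 bits, task index in the lower 32) is an assumption rather than something quoted from the patch. A plain `mutable.HashSet[Long]` stands in for `OpenHashSet[Long]`.

```scala
import scala.collection.mutable

// Hypothetical helper, not part of the patch: packs two Ints into one Long
// so that a single set of Longs can track (stageId, taskIndex) pairs.
object TaskIndexPacking {
  // Assumed layout: stageId in the upper 32 bits, taskIndex in the lower 32.
  def pack(stageId: Int, taskIndex: Int): Long =
    (stageId.toLong << 32) | (taskIndex.toLong & 0xFFFFFFFFL)

  // Recover the stage id from the upper 32 bits.
  def stageId(packed: Long): Int = (packed >>> 32).toInt

  // Recover the task index from the lower 32 bits.
  def taskIndex(packed: Long): Int = packed.toInt

  // Stand-in for the job-level set of completed task indices.
  def newCompletedSet(): mutable.HashSet[Long] = mutable.HashSet.empty[Long]
}
```

With this layout the same task index in two different stages yields two distinct keys, which is what lets one job-level set cover all of the job's stages.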

squito · 2017-10-27T16:52:19Z

core/src/main/scala/org/apache/spark/status/AppStatusStore.scala

@@ -209,6 +240,56 @@ private[spark] class AppStatusStore(store: KVStore) {
    indexed.skip(offset).max(length).asScala.map(_.info).toSeq
  }

+  private def stageWithDetails(stage: v1.StageData): v1.StageData = {
+    // TODO: limit tasks returned.
+    val maxTasks = Int.MaxValue


TODO looks like it's still present in M6 https://github.com/vanzin/spark/blob/90ddd8b529d24ae6133d2fa0ddb16efa41c678e1/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala

vanzin · 2017-10-27

Actually I should just remove this TODO. There's no way to change this without breaking the semantics of the current API endpoint.

What I did is add a new parameter to the API ("details") which controls whether the tasks are returned when you get the stage data.
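A rough sketch of what such a flag could look like, for illustration only. `TaskData`, `StageData`, and `StageLookup` here are simplified stand-ins invented for this example, not the real `v1` API types, and the `loadTasks` callback is a hypothetical substitute for the store lookup.

```scala
// Simplified stand-ins for the API types (assumptions, not Spark's classes).
final case class TaskData(taskId: Long, index: Int)
final case class StageData(
    stageId: Int,
    name: String,
    tasks: Option[Map[Long, TaskData]])

object StageLookup {
  // When `details` is false the stage is returned without its tasks;
  // when true, the (potentially large) task map is loaded and attached.
  def stageData(
      stage: StageData,
      loadTasks: Int => Map[Long, TaskData],
      details: Boolean): StageData = {
    if (details) stage.copy(tasks = Some(loadTasks(stage.stageId))) else stage
  }
}
```

Callers that only need the stage summary skip the task lookup entirely, while callers that pass `details = true` get the tasks attached to the stage.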

squito · 2017-10-27T17:14:24Z

core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala

-              {listener.schedulingMode.map(_.toString).getOrElse("Unknown")}
-            </li>
+    val completedJobs = _completedJobs.toSeq.reverse
+    val failedJobs = _failedJobs.toSeq.reverse


minor, but seems like you could create a Vector instead of a ListBuffer, and then just call reverseIterator.

vanzin · 2017-10-27

Wouldn't that require changing all downstream calls to take an Iterator instead of a Seq?
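The trade-off in this exchange can be seen in a small standalone sketch. The `ReverseOrdering` object and the use of plain `Int` ids are assumptions made for the example; the page code works with job data, not integers.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical illustration of the two approaches discussed above.
object ReverseOrdering {
  // Current approach: accumulate in a ListBuffer, then materialize a
  // reversed Seq that downstream code can consume as-is.
  def viaListBuffer(ids: Seq[Int]): Seq[Int] = {
    val buf = ListBuffer.empty[Int]
    ids.foreach(buf += _)
    buf.toSeq.reverse
  }

  // Suggested approach: accumulate in a Vector and iterate backwards
  // without copying. The result is an Iterator, so callers expecting a
  // Seq would need to change or call toSeq themselves.
  def viaVector(ids: Seq[Int]): Iterator[Int] = {
    val builder = Vector.newBuilder[Int]
    ids.foreach(builder += _)
    builder.result().reverseIterator
  }
}
```

Both produce the same order; the difference is the return type, which is the downstream cost raised in the reply.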

cloud-fan · 2017-10-28T23:27:50Z

hmmm, is it possible to split it into smaller PRs? It's really hard to review...

vanzin · 2017-10-29T02:32:48Z

It is really hard to split this PR into smaller chunks. If I separate the API changes from the UI changes, which is really the only thing that could be done, I will have to write throw-away code to make things work until the second PR is pushed, which is something I'd like to avoid.

This change is a little larger because there's a whole lot of logic behind these pages, all really tied to internal types and listeners. There's also a lot of code that was moved to the new module.

- Added missing StageData and ExecutorStageSummary fields which are used by the UI. Some json golden files needed to be updated to account for new fields.
- Save RDD graph data in the store. This tries to re-use existing types as much as possible, so that the code doesn't need to be re-written. So it's probably not very optimal.
- Some old classes (e.g. JobProgressListener) still remain, since they're used in other parts of the code; they're not used by the UI anymore, though, and will be cleaned up in a separate change.
- Save information about active pools in the store. This data is not really used in the SHS, but it's not a lot of data so it's still recorded when replaying applications.
- Because the new store sorts things slightly differently from the previous code, some json golden files had some elements within them shuffled around.
- The retention unit test in UISeleniumSuite was disabled because the code to throw away old stages / tasks hasn't been added yet.
- The job description field in the API tries to follow the old behavior, which makes it be empty most of the time, even though there's information to fill it in. For stages, a new field was added to hold the description (which is basically the job description), so that the UI can be rendered in the old way.
- A new stage status ("SKIPPED") was added to account for the fact that the API couldn't represent that state before. Without this, the stage would show up as "PENDING" in the UI, which is now based on API types.
- The API used to expose "executorRunTime" as the value of the task's duration, which wasn't really correct (also because that value was easily available from the metrics object); this change fixes that by storing the correct duration, which also means a few expectation files needed to be updated to account for the new durations and sorting differences due to the changed values.
- Implement SPARK-20713 and SPARK-21922.

vanzin force-pushed the shs-ng/M4.4 branch from 93d4fe4 to 23fadf6 Compare August 10, 2017 01:36

vanzin force-pushed the shs-ng/M4.3 branch from 9fdf63b to 7c775bc Compare September 28, 2017 17:55

vanzin force-pushed the shs-ng/M4.4 branch from 23fadf6 to e83ca67 Compare September 28, 2017 17:55

vanzin force-pushed the shs-ng/M4.3 branch from 7c775bc to f7d1766 Compare October 26, 2017 18:28

vanzin force-pushed the shs-ng/M4.4 branch from e83ca67 to d869e71 Compare October 26, 2017 18:29

vanzin force-pushed the shs-ng/M4.3 branch from f7d1766 to 83463c0 Compare October 26, 2017 21:13

vanzin force-pushed the shs-ng/M4.4 branch from d869e71 to 98460bb Compare October 26, 2017 21:13

squito reviewed Oct 27, 2017

vanzin force-pushed the shs-ng/M4.4 branch from 98460bb to 71164ed Compare October 28, 2017 00:18

vanzin force-pushed the shs-ng/M4.4 branch from 71164ed to b7f39fa Compare November 3, 2017 21:01

vanzin force-pushed the shs-ng/M4.3 branch from 83463c0 to 0609a67 Compare November 6, 2017 19:34

vanzin force-pushed the shs-ng/M4.4 branch from b7f39fa to 406c3ad Compare November 6, 2017 19:34

vanzin mentioned this pull request Nov 8, 2017

[SPARK-20648][core] Port JobsTab and StageTab to the new UI backend. apache/spark#19698

Closed

vanzin closed this Nov 8, 2017

vanzin deleted the shs-ng/M4.4 branch April 25, 2019 16:54