SHS-NG M4.4: Port JobsTab and StageTab to the new backend. #47
squito
left a comment
ok this one is pretty big, will take a few passes.
@cloud-fan @jerryshao @ajbozarth Would also appreciate getting more eyes on this if you want to start looking at this already
// Create the graph data for all the job's stages.
event.stageInfos.foreach { stage =>
  val graph = RDDOperationGraph.makeOperationGraph(stage, Int.MaxValue)
it looks like the config for maxNodes was lost, even in M6
@@ -63,6 +64,10 @@ private class LiveJob(
  var activeTasks = 0
  var completedTasks = 0
  var failedTasks = 0
  val completedIndices = new OpenHashSet[Long]()
Can you add a comment explaining that this is stageId + taskIndex packed into one Long?
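A minimal sketch of what such packing could look like, assuming the stage ID occupies the high 32 bits and the task index the low 32 bits. The `TaskKey` object and its methods are hypothetical names for illustration and are not taken from the PR; a standard mutable `HashSet` stands in for Spark's `OpenHashSet`.

```scala
import scala.collection.mutable

// Hypothetical helper, not from the PR: packs two Ints into one Long.
object TaskKey {
  // Assumption: stageId goes in the high 32 bits, taskIndex in the low 32 bits.
  def pack(stageId: Int, taskIndex: Int): Long =
    (stageId.toLong << 32) | (taskIndex.toLong & 0xFFFFFFFFL)

  // Recover the high 32 bits.
  def stageId(key: Long): Int = (key >> 32).toInt

  // Recover the low 32 bits (toInt truncates to the low word).
  def taskIndex(key: Long): Int = key.toInt

  // Count distinct (stageId, taskIndex) pairs using a single set of Longs.
  def distinctCompleted(events: Seq[(Int, Int)]): Int = {
    val completedIndices = mutable.HashSet[Long]()
    events.foreach { case (stage, index) => completedIndices += pack(stage, index) }
    completedIndices.size
  }
}
```

Storing one `Long` per task avoids allocating a tuple per entry, which is presumably why a primitive-specialized set was chosen here.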
@@ -209,6 +240,56 @@ private[spark] class AppStatusStore(store: KVStore) {
    indexed.skip(offset).max(length).asScala.map(_.info).toSeq
  }

  private def stageWithDetails(stage: v1.StageData): v1.StageData = {
    // TODO: limit tasks returned.
    val maxTasks = Int.MaxValue
The TODO looks like it's still present in M6: https://github.com/vanzin/spark/blob/90ddd8b529d24ae6133d2fa0ddb16efa41c678e1/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
Actually I should just remove this TODO. There's no way to change this without breaking the semantics of the current API endpoint.
What I did is add a new parameter to the API ("details") which controls whether the tasks are returned when you get the stage data.
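As a rough illustration of the "details" idea, here is a sketch in which a boolean flag decides whether task data is attached to the returned stage. `StageSketch` and `StageLookup` are hypothetical, simplified stand-ins and do not reflect the actual types or signatures in the PR.

```scala
// Hypothetical, simplified stage type: tasks are optional.
final case class StageSketch(stageId: Int, tasks: Option[Map[Long, String]])

object StageLookup {
  // Made-up task data for illustration only.
  private val allTasks: Map[Long, String] = Map(0L -> "task 0", 1L -> "task 1")

  // Assumption: callers opt in to task data, so the default response stays small.
  def stage(stageId: Int, details: Boolean = false): StageSketch =
    StageSketch(stageId, if (details) Some(allTasks) else None)
}
```

With this shape, `StageLookup.stage(1)` returns no tasks, while `StageLookup.stage(1, details = true)` includes them.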
  {listener.schedulingMode.map(_.toString).getOrElse("Unknown")}
</li>
val completedJobs = _completedJobs.toSeq.reverse
val failedJobs = _failedJobs.toSeq.reverse
minor, but seems like you could create a Vector instead of a ListBuffer, and then just call reverseIterator.
Wouldn't that require changing all downstream calls to take an Iterator instead of a Seq?
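The trade-off in this exchange can be sketched as follows. The `JobOrdering` object, its `render` helper, and the job names are hypothetical; only `Vector.reverse` and `Vector.reverseIterator` are real standard-library calls.

```scala
object JobOrdering {
  // Hypothetical stand-in for downstream helpers that accept a Seq.
  def render(jobs: Seq[String]): String = jobs.mkString(", ")

  val completed: Vector[String] = Vector("job 0", "job 1", "job 2")

  // Option 1: materialize a reversed copy; the result is already a Seq.
  val newestFirst: Seq[String] = completed.reverse

  // Option 2: reverseIterator avoids the copy but yields an Iterator,
  // so a Seq-based consumer needs a conversion or a changed signature.
  def newestFirstLazy: Iterator[String] = completed.reverseIterator
}
```

Passing `JobOrdering.newestFirstLazy` to `render` does not compile without a conversion such as `.toSeq`, which is the downstream cost raised in the reply.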
hmmm, is it possible to split it into smaller PRs? It's really hard to review...

It is really hard to split this PR into smaller chunks. If I separate the API changes from the UI changes, which is really the only thing that could be done, I will have to write throw-away code to make things work until the second PR is pushed, which is something I'd like to avoid.
This change is a little larger because there's a whole lot of logic
behind these pages, all really tied to internal types and listeners.
There's also a lot of code that was moved to the new module.
- Added missing StageData and ExecutorStageSummary fields which are
used by the UI. Some json golden files needed to be updated to account
for new fields.
- Save RDD graph data in the store. This tries to re-use existing types as
much as possible so that the code doesn't need to be rewritten, so it's
probably not optimal.
- Some old classes (e.g. JobProgressListener) still remain, since they're used
in other parts of the code; they're not used by the UI anymore, though, and
will be cleaned up in a separate change.
- Save information about active pools in the store. This data is not really used
in the SHS, but it's not a lot of data so it's still recorded when replaying
applications.
- Because the new store sorts things slightly differently from the previous
code, some json golden files had some elements within them shuffled around.
- The retention unit test in UISeleniumSuite was disabled because the code
to throw away old stages / tasks hasn't been added yet.
- The job description field in the API tries to follow the old behavior, which
leaves it empty most of the time, even though there's information to fill it
in. For stages, a new field was added to hold the description (which is basically
the job description), so that the UI can be rendered in the old way.
- A new stage status ("SKIPPED") was added to account for the fact that the API
couldn't represent that state before. Without this, the stage would show up as
"PENDING" in the UI, which is now based on API types.
- The API used to expose "executorRunTime" as the value of the task's duration,
which wasn't really correct (also because that value was easily available
from the metrics object); this change fixes that by storing the correct duration,
which also means a few expectation files needed to be updated to account for
the new durations and sorting differences due to the changed values.
- Implement SPARK-20713 and SPARK-21922.
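
To illustrate the duration item above, here is a minimal sketch of why a task's duration and its "executorRunTime" metric can differ. It assumes the corrected duration is measured from launch to finish, which the description does not spell out. The `TaskSketch` and `TaskMetricsSketch` types are hypothetical and are not Spark's classes.

```scala
// Hypothetical, simplified types for illustration only.
final case class TaskMetricsSketch(executorRunTime: Long)

final case class TaskSketch(
    launchTime: Long,
    finishTime: Long,
    metrics: TaskMetricsSketch) {
  // Assumption: duration is wall-clock time from launch to finish, in milliseconds.
  def duration: Long = finishTime - launchTime
}

object DurationDemo {
  // A made-up task launched at t=1000 ms and finished at t=1500 ms,
  // of which 420 ms is reported as executor run time.
  val task: TaskSketch = TaskSketch(1000L, 1500L, TaskMetricsSketch(420L))
}
```

In this made-up case the duration is 500 ms while the metric reports 420 ms, so exposing the metric as the duration would change both the displayed value and any sorting based on it.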