-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32446][SHS] Add percentile distribution REST API of peak memory metrics for all executors #29247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@gengliangwang Could you help take a look? |
1 similar comment
|
@gengliangwang Could you help take a look? |
|
Add to whitelist |
|
ok to test |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #130353 has finished for PR 29247 at commit
|
|
@gengliangwang Could you help retest? The failed sparkr tests should be unrelated with this change. |
|
retest this please. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #130394 has finished for PR 29247 at commit
|
|
@gengliangwang Any ideas about sparkr build failure? |
|
Kubernetes integration test starting |
|
Test build #130441 has finished for PR 29247 at commit
|
|
Kubernetes integration test status success |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #130549 has finished for PR 29247 at commit
|
|
Test build #130560 has finished for PR 29247 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #130571 has finished for PR 29247 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
@tgravescs @gengliangwang Do you have more comments for this? |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #132700 has finished for PR 29247 at commit
|
…y metrics for all executors
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #133031 has finished for PR 29247 at commit
|
|
Test build #133036 has finished for PR 29247 at commit
|
|
@tgravescs @gengliangwang Could you help merge this? |
|
I'm not against this but this is a one-off additional API endpoint for some summary stats of metrics available elsewhere? |
@srowen I will send another PR to add Web UI of this feature in executors page. |
@warrenzhu25 Many Spark users like to look at the executorSummary information using Web UI. it is a good idea to keep the feature's web UI and REST API consistent. |
| def executorSummary( | ||
| @QueryParam("activeOnly") @DefaultValue("true") activeOnly: Boolean, | ||
| @DefaultValue("0.05,0.25,0.5,0.75,0.95") @QueryParam("quantiles") quantileString: String) | ||
| : ExecutorMetricsDistributions = withUI { ui => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should be correct style? I follow same style as https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L77
| def executorSummary( | ||
| @QueryParam("activeOnly") @DefaultValue("true") activeOnly: Boolean, | ||
| @DefaultValue("0.05,0.25,0.5,0.75,0.95") @QueryParam("quantiles") quantileString: String) | ||
| : ExecutorMetricsDistributions = withUI { ui => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent
| @Path("executorMetricsDistribution") | ||
| def executorSummary( | ||
| @QueryParam("activeOnly") @DefaultValue("true") activeOnly: Boolean, | ||
| @DefaultValue("0.05,0.25,0.5,0.75,0.95") @QueryParam("quantiles") quantileString: String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 0.05,0.25,0.5,0.75,0.95? In stage page, Spark shows the quantiles of Min/Max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep consistency with /taskSummary api in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L77, I used this default quantiles.
And it seems stage page provided its own percentiles as below in https://github.com/apache/spark/blob/master/core/src/main/resources/org/apache/spark/ui/static/stagepage.js#L617
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI. In the corresponding web UI, the quantiles of Min/Max are displayed in the table "Summary Metrics for Completed Tasks" for a given stage page. In a parallel system, the duration of a stage is often determined by the slowest task/executor. To monitor/debug a skew issue, the maximal value (or 100% percentile value) is more useful than the 95% percentile value. On the other hand, 95% percentile value has been used in the past. One wise man once told me: Consistency means to repeat yesterday's mistake.
|
@warrenzhu25 Sorry for the late reply. My major concern is how this info shows in the executor page. The current summary section is more like an aggregated one. |
We could add a new section like stage page, but this issue could be discussed further in another pr. Do you have more comments for this pr? |
|
Just general comment; it'd be easier to pursue reviewers if you have some UI page the REST API is leveraged. (Doesn't need to be a code, screenshot would be sufficient.) It's not easy to see values in REST API without the actual use case. |
| def executorList(): Seq[ExecutorSummary] = withUI(_.store.executorList(true)) | ||
|
|
||
| @GET | ||
| @Path("executorMetricsDistribution") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
executorPeakMemoryMetricsDistribution?
|
Any update for this PR? |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
Add REST API for summary of executor peak memory metrics
Why are the changes needed?
Help users understand executor peak memory metrics distributions
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Added UT in HistoryServerSuite