@warrenzhu25
Contributor

What changes were proposed in this pull request?

Add REST API for summary of executor peak memory metrics

Why are the changes needed?

Help users understand executor peak memory metrics distributions

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Added a unit test in HistoryServerSuite.

@warrenzhu25
Contributor Author

@gengliangwang Could you help take a look?

@gengliangwang
Member

Add to whitelist

@gengliangwang
Member

ok to test

@SparkQA

SparkQA commented Oct 28, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34956/

@SparkQA

SparkQA commented Oct 28, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34956/

@SparkQA

SparkQA commented Oct 28, 2020

Test build #130353 has finished for PR 29247 at commit 1854e74.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@warrenzhu25
Contributor Author

@gengliangwang Could you help retest? The failed SparkR tests should be unrelated to this change.

@gengliangwang
Member

retest this please.

@SparkQA

SparkQA commented Oct 29, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34997/

@SparkQA

SparkQA commented Oct 29, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34997/

@SparkQA

SparkQA commented Oct 29, 2020

Test build #130394 has finished for PR 29247 at commit 1854e74.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@warrenzhu25
Contributor Author

@gengliangwang Any ideas about the SparkR build failure?

@SparkQA

SparkQA commented Oct 30, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35046/

@SparkQA

SparkQA commented Oct 30, 2020

Test build #130441 has finished for PR 29247 at commit 9028552.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 30, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35046/

@SparkQA

SparkQA commented Nov 3, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35148/

@SparkQA

SparkQA commented Nov 3, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35148/

@SparkQA

SparkQA commented Nov 3, 2020

Test build #130549 has finished for PR 29247 at commit 4ba4a15.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 3, 2020

Test build #130560 has finished for PR 29247 at commit 71a8e50.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 3, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35160/

@SparkQA

SparkQA commented Nov 3, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35160/

@SparkQA

SparkQA commented Nov 3, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35173/

@SparkQA

SparkQA commented Nov 3, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35173/

@SparkQA

SparkQA commented Nov 3, 2020

Test build #130571 has finished for PR 29247 at commit 32a55a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 4, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35182/

@SparkQA

SparkQA commented Nov 4, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35182/

@warrenzhu25
Contributor Author

@tgravescs @gengliangwang Do you have any more comments on this?

@SparkQA

SparkQA commented Dec 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37303/

@SparkQA

SparkQA commented Dec 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37303/

@SparkQA

SparkQA commented Dec 12, 2020

Test build #132700 has finished for PR 29247 at commit c12b25f.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37630/

@SparkQA

SparkQA commented Dec 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37630/

@SparkQA

SparkQA commented Dec 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37635/

@SparkQA

SparkQA commented Dec 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37635/

@SparkQA

SparkQA commented Dec 18, 2020

Test build #133031 has finished for PR 29247 at commit 7acc98f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 18, 2020

Test build #133036 has finished for PR 29247 at commit 4443f1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@warrenzhu25
Contributor Author

@tgravescs @gengliangwang Could you help merge this?

@srowen
Member

srowen commented Dec 19, 2020

I'm not against this but this is a one-off additional API endpoint for some summary stats of metrics available elsewhere?

@warrenzhu25
Contributor Author

> I'm not against this but this is a one-off additional API endpoint for some summary stats of metrics available elsewhere?

@srowen I will send another PR to add a Web UI for this feature on the executors page.

@ron8hu
Contributor

ron8hu commented Jan 2, 2021

> > I'm not against this but this is a one-off additional API endpoint for some summary stats of metrics available elsewhere?
>
> @srowen I will send another PR to add a Web UI for this feature on the executors page.

@warrenzhu25 Many Spark users like to look at the executorSummary information in the Web UI. It is a good idea to keep the feature's Web UI and REST API consistent.

def executorSummary(
@QueryParam("activeOnly") @DefaultValue("true") activeOnly: Boolean,
@DefaultValue("0.05,0.25,0.5,0.75,0.95") @QueryParam("quantiles") quantileString: String)
: ExecutorMetricsDistributions = withUI { ui =>
Member

nit: indent

@Path("executorMetricsDistribution")
def executorSummary(
@QueryParam("activeOnly") @DefaultValue("true") activeOnly: Boolean,
@DefaultValue("0.05,0.25,0.5,0.75,0.95") @QueryParam("quantiles") quantileString: String)
Member

Why 0.05,0.25,0.5,0.75,0.95? On the stage page, Spark shows the Min/Max quantiles.
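
For readers following the default-value question, here is a minimal sketch of how a comma-separated `quantiles` string such as the default above might be parsed and validated. `QuantileParam.parse` is a hypothetical helper written for illustration; it is not the code in this PR, and the error wording is an assumption. Note that this sketch accepts `0` and `1`, so a caller could request Min and Max explicitly.

```scala
import scala.util.Try

object QuantileParam {
  // Hypothetical helper (not from the PR): parse "0.05,0.25,..." into doubles in [0, 1].
  def parse(quantileString: String): Either[String, IndexedSeq[Double]] = {
    val parsed = quantileString.split(",").toIndexedSeq.map { raw =>
      val s = raw.trim
      Try(s.toDouble).toOption
        .filter(q => q >= 0.0 && q <= 1.0)
        .toRight(s"Bad value for quantiles: '$s' is not a number in [0, 1]")
    }
    // Report the first invalid entry; otherwise return all values in request order.
    parsed.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None => Right(parsed.collect { case Right(q) => q })
    }
  }
}
```

With this sketch, `QuantileParam.parse("0,0.25,0.5,0.75,1")` would yield the Min/quartile/Max set that the comment compares against.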

Contributor

FYI: in the corresponding web UI, the Min/Max quantiles are displayed in the "Summary Metrics for Completed Tasks" table on a stage's page. In a parallel system, the duration of a stage is often determined by the slowest task or executor. To monitor or debug a skew issue, the maximum (the 100th percentile) is more useful than the 95th percentile. On the other hand, the 95th percentile has been used in the past. A wise man once told me: consistency means repeating yesterday's mistake.
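
To make the skew argument concrete, here is a small sketch with made-up peak memory values: 99 executors at 1024 MiB and a single outlier at 8192 MiB. The `quantiles` helper is hypothetical and uses a simple sorted-index rule, `min(floor(q * n), n - 1)`, which is an assumption and may differ from what this PR computes.

```scala
object SkewDemo {
  // Hypothetical helper: take the value at index min(floor(q * n), n - 1) of the sorted data.
  def quantiles(values: Seq[Long], qs: Seq[Double]): Seq[Long] = {
    require(values.nonEmpty, "need at least one value")
    val sorted = values.sorted.toIndexedSeq
    qs.map { q =>
      val idx = math.min((q * sorted.size).toInt, sorted.size - 1)
      sorted(idx)
    }
  }

  def main(args: Array[String]): Unit = {
    // Made-up data: 99 similar executors plus one outlier (values in MiB).
    val peaks = Seq.fill(99)(1024L) :+ 8192L
    val result = quantiles(peaks, Seq(0.95, 1.0))
    println(s"p95 = ${result(0)} MiB, max = ${result(1)} MiB")
  }
}
```

On this data the 95th percentile is 1024 MiB while the maximum is 8192 MiB, so a summary that stops at 0.95 would not show the outlier at all.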

@gengliangwang
Member

@warrenzhu25 Sorry for the late reply. My major concern is how this info will be shown on the executor page. The current summary section is more like an aggregated one.
The peak memory isn't displayed by default, so it becomes tricky to add a new section.

@warrenzhu25
Contributor Author

> @warrenzhu25 Sorry for the late reply. My major concern is how this info will be shown on the executor page. The current summary section is more like an aggregated one.
> The peak memory isn't displayed by default, so it becomes tricky to add a new section.

We could add a new section like the one on the stage page, but that could be discussed further in another PR. Do you have any more comments on this PR?

@HeartSaVioR
Contributor

Just a general comment: it'd be easier to persuade reviewers if you have a UI page that leverages the REST API. (It doesn't need to be code; a screenshot would be sufficient.) It's not easy to see the value of a REST API without an actual use case.

def executorList(): Seq[ExecutorSummary] = withUI(_.store.executorList(true))

@GET
@Path("executorMetricsDistribution")
Contributor

executorPeakMemoryMetricsDistribution?

@AngersZhuuuu
Contributor

Any update for this PR?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
