@AngersZhuuuu
Contributor

AngersZhuuuu commented Nov 24, 2021

What changes were proposed in this pull request?

This PR continues the work of #29247, since the original author has not replied for a long time.
The original author will be added as co-author.

Over the lifetime of an application, users may want to know each executor's peak memory usage in order to assess resource utilization. The distribution of peak memory metrics across all executors helps users see whether there is skew or a bottleneck in executor resource utilization in a given stage.

We define the activeOnly and quantiles query parameters in the REST API for the peak memory metrics distribution of all executors:

applications/<application_id>/<application_attempt>/executorPeakMemoryMetricsDistribution?activeOnly=[true (default) | false]&quantiles=0.05,0.25,0.5,0.75,0.95
  1. withSummaries: default is false. Defines whether to show the current stage's taskMetricsDistribution and executorMetricsDistribution.
  2. quantiles: default is 0.0,0.25,0.5,0.75,1.0. Only takes effect when withSummaries=true. Defines the quantiles used when calculating metrics distributions.

When withSummaries=true, both the task metrics percentile distribution and the executor metrics percentile distribution are included in the REST API output. The default value of withSummaries is false, i.e. no metrics percentile distribution is included in the REST API output.
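As a rough illustration of what a percentile distribution over executor peak memory values looks like, here is a minimal sketch. The helper name quantilesOf, the sample values, and the floor-based index rule are assumptions made for illustration; the implementation in this PR may index or interpolate differently.

```javascript
// Hypothetical helper (not from the PR): returns the value at each requested
// quantile of a list of numbers. The index rule floor(q * n), clamped to the
// last element, is an assumption; other quantile definitions exist.
function quantilesOf(values, quantiles) {
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...values].sort((a, b) => a - b);
  if (sorted.length === 0) {
    return quantiles.map(() => null);
  }
  return quantiles.map((q) => {
    const idx = Math.min(Math.floor(q * sorted.length), sorted.length - 1);
    return sorted[idx];
  });
}

// Made-up peak memory values (in bytes) for five executors.
const peakMemoryBytes = [512, 256, 1024, 768, 2048];
// The default quantiles named in the description above.
const defaultQuantiles = [0.0, 0.25, 0.5, 0.75, 1.0];

console.log(quantilesOf(peakMemoryBytes, defaultQuantiles));
// prints [ 256, 512, 768, 1024, 2048 ]
```

A large gap between the 0.5 and 1.0 entries in such a result would suggest that a few executors use far more memory than the rest.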

 

Why are the changes needed?

Users often care about the distribution of executor peak memory usage. This PR helps users understand executor peak memory metrics distributions.

Does this PR introduce any user-facing change?

Users can use the REST API below to get the peak memory metrics distribution for all executors:

applications/<application_id>/<application_attempt>/executorPeakMemoryMetricsDistribution
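For illustration, a request URL for this endpoint could be assembled as in the sketch below. The helper name buildDistributionUrl, the base URL, and the application and attempt IDs are made-up values; the quantiles shown are the example values from the description, not necessarily the server defaults.

```javascript
// Hypothetical helper (not from the PR): builds the endpoint URL from the
// path and query parameters given in the PR description.
function buildDistributionUrl(baseUrl, appId, attemptId, options = {}) {
  const {
    activeOnly = true,
    quantiles = [0.05, 0.25, 0.5, 0.75, 0.95],
  } = options;
  const path =
    `applications/${encodeURIComponent(appId)}/${encodeURIComponent(attemptId)}` +
    "/executorPeakMemoryMetricsDistribution";
  const query = `activeOnly=${activeOnly}&quantiles=${quantiles.join(",")}`;
  // Strip trailing slashes from the base URL to avoid a double slash.
  return `${baseUrl.replace(/\/+$/, "")}/${path}?${query}`;
}

// Assumed base URL and IDs, for illustration only.
const exampleUrl = buildDistributionUrl(
  "http://localhost:4040/api/v1/",
  "app-20211124000000-0000",
  "1",
  { activeOnly: false }
);
console.log(exampleUrl);
```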

How was this patch tested?

Screenshot 2022-01-07 2:09:09 PM

image

@AngersZhuuuu
Contributor Author

AngersZhuuuu commented Nov 24, 2021

@sarutak @Patil Hi, I am trying to build this PR's UI table, but I found that when I write code such as

$('#executorSummaryMetricsTitle').html("Summary Metrics for " + "<a href='#executorsTitle'>" + allExecCnt + " Executors" + "</a>");
$('#executorsTitle').html("Executors (" + allExecCnt + ")");

it does not change the HTML content. How can I get this to work? Any suggestions would be appreciated.

@AngersZhuuuu
Contributor Author

Also ping @pgandhi999

@SparkQA

SparkQA commented Nov 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50035/

@SparkQA

SparkQA commented Nov 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50035/

@SparkQA

SparkQA commented Nov 24, 2021

Test build #145563 has finished for PR 34695 at commit fa685bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pgandhi999

I shall definitely review the PR, thank you.

@pgandhi999

Could you please post screenshots of the UI that was tested with your code changes in the PR description? Thank you.

return "Mapped Pool Memory";

case "ProcessTreeJVMVMemory":
return "Process Tree JVM Memory";

This should be VMemory

Contributor Author

But there are ProcessTreeJVMVMemory and ProcessTreePythonVMemory

I am talking about the column heading, it should be Process Tree JVM VMemory.

Contributor Author

I am talking about the column heading, it should be Process Tree JVM VMemory.

Hmmm, sorry for my mistake...

Contributor Author

I am talking about the column heading, it should be Process Tree JVM VMemory.

updated

@AngersZhuuuu
Contributor Author

Could you please post screenshots of the UI that was tested with your code changes in the PR description? Thank you.

Done

@AngersZhuuuu AngersZhuuuu changed the title from "[WIP][SPARK-32446][CORE] Add percentile distribution REST API & UI of peak memory metrics for all executors" to "[SPARK-32446][CORE] Add percentile distribution REST API & UI of peak memory metrics for all executors" on Dec 7, 2021
@SparkQA

SparkQA commented Dec 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50461/

@SparkQA

SparkQA commented Dec 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50460/

@SparkQA

SparkQA commented Dec 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50461/

@SparkQA

SparkQA commented Dec 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50460/

@SparkQA

SparkQA commented Dec 7, 2021

Test build #145985 has finished for PR 34695 at commit b01750e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 7, 2021

Test build #145987 has finished for PR 34695 at commit b2a4289.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50468/

@SparkQA

SparkQA commented Dec 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50468/

@SparkQA

SparkQA commented Dec 8, 2021

Test build #145992 has finished for PR 34695 at commit 5e12e0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member

I feel that there are too many metrics in the new table. I am using an external monitor, but I still can't see the details of each executor when I open the executor page:
image

Shall we remove or hide some of the minor metrics?

@AngersZhuuuu
Contributor Author

I feel that there are too many metrics in the new table..I am using external monitor but I can't see the details of each executor when I open the executor page: image

shall we remove/hide some minor metrics?

How about collapsing this table by default and supporting click-to-expand?

@gengliangwang
Member

How about just collapse this table by default? and support click to expand this?

Still, the table is quite long after expanding.
Shall we put these metrics behind the following checkboxes (we can include GC count as well)?
image
Many of the metrics are already there, and the UI doesn't show them by default.

@AngersZhuuuu
Contributor Author

@gengliangwang Updated. I think the current code meets your requirement.

@AngersZhuuuu
Contributor Author

Gentle ping @gengliangwang

@AngersZhuuuu
Contributor Author

ping @gengliangwang

@gengliangwang
Member

@AngersZhuuuu I will check this one over the weekend.

@gengliangwang
Member

image

@AngersZhuuuu Shall we hide this by default if no additional metrics are selected?

@gengliangwang
Member

cc @rednaxelafx @jasonli-db as well

@AngersZhuuuu
Contributor Author

image

@AngersZhuuuu shall we hide this by default if no additional metrics is selected?

Updated
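The behavior requested above (optional metric columns toggled by checkboxes, with the summary table hidden while nothing is selected) can be sketched as plain state logic. This is only an illustration: createMetricSelection and its methods are hypothetical names, and the sketch leaves out the DOM and table wiring that the actual page needs.

```javascript
// Hypothetical state holder (not the PR's code): tracks which optional
// metric columns are currently checked.
function createMetricSelection() {
  const selected = new Set();
  return {
    // Flip one checkbox; returns the new checked state for that metric.
    toggle(metricKey) {
      if (selected.has(metricKey)) {
        selected.delete(metricKey);
      } else {
        selected.add(metricKey);
      }
      return selected.has(metricKey);
    },
    // A column is shown only while its checkbox is checked.
    isColumnVisible(metricKey) {
      return selected.has(metricKey);
    },
    // The summary table stays hidden until at least one metric is selected.
    isSummaryTableVisible() {
      return selected.size > 0;
    },
  };
}

const selection = createMetricSelection();
console.log(selection.isSummaryTableVisible()); // false
selection.toggle("ProcessTreeJVMVMemory");
console.log(selection.isSummaryTableVisible()); // true
selection.toggle("ProcessTreeJVMVMemory");
console.log(selection.isSummaryTableVisible()); // false
```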

}

function getColumnNameForExecutorMetricSummary(columnKey) {
switch(columnKey) {
Member

Shall we use a string map instead of switch?

Contributor Author

Shall we use a string map instead of switch?

Hmm, this follows the same approach as the corresponding method in stagepage.js.

Contributor Author

Shall we use a string map instead of switch?

Updated
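A map-based version of the lookup suggested in this thread might look like the sketch below. The function name and two of the headings come from the snippets quoted in this review; the MappedPoolMemory key, the Python heading, the constant name, and the "NA" fallback are assumptions, and the real function covers more metrics.

```javascript
// Lookup table from metric key to column heading. Only a few entries are
// shown; the Python heading is extrapolated from the JVM one (assumption).
const EXECUTOR_METRIC_COLUMN_NAMES = {
  MappedPoolMemory: "Mapped Pool Memory",
  ProcessTreeJVMVMemory: "Process Tree JVM VMemory",
  ProcessTreePythonVMemory: "Process Tree Python VMemory",
};

function getColumnNameForExecutorMetricSummary(columnKey) {
  // Own-property check so inherited keys such as "toString" do not match.
  if (Object.prototype.hasOwnProperty.call(EXECUTOR_METRIC_COLUMN_NAMES, columnKey)) {
    return EXECUTOR_METRIC_COLUMN_NAMES[columnKey];
  }
  return "NA"; // assumed fallback for unknown keys
}

console.log(getColumnNameForExecutorMetricSummary("ProcessTreeJVMVMemory"));
// prints "Process Tree JVM VMemory"
```

Compared with a switch, the table keeps each key next to its heading, which makes a mismatch like the missing "V" in "VMemory" easier to spot.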

sumCol.visible(!sumCol.visible());
}
var para = thisBox.attr('exec-sum-idx');
if(para != '') {
Member

use !==

Contributor Author

Done
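To illustrate why !== is the safer comparison, the sketch below contrasts the two operators on a few sample values. The helper names are hypothetical. For string or undefined inputs the two checks happen to agree, so in that case the change is mostly about avoiding implicit type coercion.

```javascript
// Loose inequality coerces types before comparing.
function isNonEmptyLoose(value) {
  return value != '';
}

// Strict inequality compares without any type coercion.
function isNonEmptyStrict(value) {
  return value !== '';
}

// The number 0 is coerced to equal '' under loose comparison.
console.log(isNonEmptyLoose(0));          // false
console.log(isNonEmptyStrict(0));         // true

// Strings and undefined give the same answer with both operators.
console.log(isNonEmptyLoose(''));         // false
console.log(isNonEmptyStrict(''));        // false
console.log(isNonEmptyLoose('2'));        // true
console.log(isNonEmptyStrict('2'));       // true
console.log(isNonEmptyLoose(undefined));  // true
console.log(isNonEmptyStrict(undefined)); // true
```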

@gengliangwang
Member

@AngersZhuuuu This LGTM overall.
My only concern is whether we need all of these metrics. cc @rednaxelafx
image

@AngersZhuuuu
Contributor Author

@gengliangwang There has been no reply for a long time. How about merging first and then collecting feedback from users?

@AngersZhuuuu
Contributor Author

Gentle ping

@AngersZhuuuu
Contributor Author

gentle ping @rednaxelafx @jasonli-db

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Sep 29, 2022
@github-actions github-actions bot closed this Sep 30, 2022

6 participants