
[SPARK-26928][CORE] Add driver CPU Time to the metrics system #23838

Closed
wants to merge 5 commits

Conversation

LucaCanali
Contributor

What changes were proposed in this pull request?

This proposes to add instrumentation for the driver's JVM CPU time via the Spark Dropwizard/Codahale metrics system. It follows directly from previous work SPARK-25228 and shares similar motivations: it is intended as an improvement to be used for Spark performance dashboards and monitoring tools/instrumentation.

Implementation details: this PR takes the code introduced in SPARK-25228 and moves it into a new, separate Source, JVMCPUSource, which is then used to register the jvmCpuTime gauge metric for both the executor and the driver.
The registration of the jvmCpuTime metric for the driver is conditional: a new configuration parameter, spark.metrics.cpu.time.driver.enabled (proposed default: false), is introduced for this purpose.
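The mechanism underlying such a jvmCpuTime gauge can be illustrated with a small standalone JVM sketch. This is hypothetical illustrative code, not the PR's actual implementation: the Gauge interface below is a simplified stand-in for Dropwizard's, and readJvmCpuTime mirrors the guarded access to the platform MXBean (returning -1 on JVMs that do not expose process CPU time, an error condition the PR discussion mentions).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Simplified stand-in for Dropwizard's Gauge interface (hypothetical,
// used here only to keep the sketch free of external dependencies).
interface Gauge<T> {
    T getValue();
}

public class JvmCpuTimeSketch {
    // Returns the cumulative CPU time used by this JVM process, in
    // nanoseconds, or -1 if the platform MXBean does not implement
    // getProcessCpuTime (e.g. on JVMs without the com.sun.management
    // extension).
    public static long readJvmCpuTime() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return -1L; // JVM does not expose ProcessCpuTime
    }

    public static void main(String[] args) {
        // A gauge is just a no-argument read of the current value; the
        // metrics system polls it on its own schedule.
        Gauge<Long> jvmCpuTime = JvmCpuTimeSketch::readJvmCpuTime;
        System.out.println("jvmCpuTime (ns): " + jvmCpuTime.getValue());
    }
}
```

In the actual PR the equivalent gauge is registered against a Dropwizard MetricRegistry inside the JVMCPUSource, so the same value is exported through whatever metrics sinks are configured.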

How was this patch tested?

Manually tested, using local mode and using YARN.

@@ -568,6 +569,10 @@ class SparkContext(config: SparkConf) extends Logging {
_taskScheduler.postStartHook()
_env.metricsSystem.registerSource(_dagScheduler.metricsSource)
_env.metricsSystem.registerSource(new BlockManagerSource(_env.blockManager))
// Register JVMCPUSource to the metrics system if spark.metrics.cpu.time.driver.enabled==true
Contributor

The comment just repeats the code.

Why would you not enable this source? It seems to be always on for executors, what would be the drawback of doing the same for the driver?

Contributor Author

Good point. Actually, I would not mind having this metric gauge always on for the driver too; I don't think the overhead is high. However, I was just being cautious here, following previous discussions on the implementation of this gauge for the executor instance [SPARK-25228]. I would like to ping @srowen for comment, as he helped back then with [SPARK-25228].

@@ -499,6 +499,13 @@ package object config {
.stringConf
.createOptional

private[spark] val METRICS_CPU_TIME_DRIVER_ENABLED =
Member

Does this need a flag -- is there a drawback to enabling it?

Contributor Author

On second thought, adding a flag/configuration parameter is probably not necessary and can be removed, as the JVMCPU source should be quite safe. We have been using it for the executor instance for a while. Also, the implementation takes care of various error conditions that can occur, for example with JVMs that don't implement "ProcessCpuTime".

@LucaCanali force-pushed the addCPUTimeMetricDriver branch from c665fb5 to 5068836 on February 28, 2019 21:55
@LucaCanali force-pushed the addCPUTimeMetricDriver branch from 5146f55 to b50bcb6 on March 1, 2019 19:38

private[spark] class JVMCPUSource extends Source {

override implicit val metricRegistry = new MetricRegistry()
Contributor

Doesn't need to be implicit?

Contributor Author

Indeed. I'll correct this, thanks.

@vanzin (Contributor) commented Mar 4, 2019

ok to test

@SparkQA commented Mar 4, 2019

Test build #103004 has finished for PR 23838 at commit 11ac210.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 5, 2019

Test build #103009 has finished for PR 23838 at commit 3c86ea4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor) commented Mar 5, 2019

Merging to master.
