[SPARK-35215][SQL] Update custom metric per certain rows and at the end of the task #32330

viirya · 2021-04-25T06:47:57Z

What changes were proposed in this pull request?

This patch changes custom metric updating so that metrics are updated once every certain number of rows (currently 100) instead of on every row.

Why are the changes needed?

Based on the previous discussion in #31451 (comment), we should update custom metrics only once every certain number of rows (e.g. 100) and also at the end of the task. Updating on every row does not bring much benefit.
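To illustrate the idea, here is a minimal, self-contained sketch, not the code in this patch. All names (`PeriodicMetricUpdateSketch`, `MetricUpdatingIterator`, `countUpdates`) are hypothetical, and the final update is triggered when the input is exhausted, which only stands in for the end-of-task update.

```scala
object PeriodicMetricUpdateSketch {
  // Assumption: mirrors the interval of 100 rows mentioned in this PR.
  val NumRowsPerUpdate = 100

  /**
   * Wraps an iterator and calls `update` once every NumRowsPerUpdate rows,
   * plus one final time when the underlying iterator is exhausted.
   */
  final class MetricUpdatingIterator[T](underlying: Iterator[T], update: () => Unit)
      extends Iterator[T] {
    private var numRow = 0L
    private var finished = false

    override def hasNext: Boolean = {
      val more = underlying.hasNext
      if (!more && !finished) {
        finished = true
        update() // final update; stands in for the end-of-task update
      }
      more
    }

    override def next(): T = {
      val row = underlying.next()
      numRow += 1
      // Periodic update instead of one update per row.
      if (numRow % NumRowsPerUpdate == 0) update()
      row
    }
  }

  /** Consumes `n` rows and returns how many times `update` was called. */
  def countUpdates(n: Int): Int = {
    var updates = 0
    val it = new MetricUpdatingIterator[Int](Iterator.range(0, n), () => updates += 1)
    it.foreach(_ => ())
    updates
  }
}
```

With this sketch, consuming 250 rows triggers three updates (two periodic and one final) rather than 250.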

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit test.

SparkQA · 2021-04-25T08:05:40Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42432/

SparkQA · 2021-04-25T08:05:41Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42432/

SparkQA · 2021-04-25T11:40:57Z

Test build #137908 has finished for PR 32330 at commit 1a98660.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2021-04-26T03:25:59Z

cc @cloud-fan

cloud-fan · 2021-04-26T06:39:19Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala

-          s"${metric.name()}")
-      customMetrics(metric.name()).set(metric.value())
+    if (numRow % CustomMetrics.numRowsPerUpdate == 0) {
+      reader.currentMetricsValues.foreach { metric =>


Can we move this into a method so the code can be reused?

Added a shared method.
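For readers following along, a shared helper of this kind might look roughly like the sketch below. `TaskMetric`, `MetricCell`, and `updateMetrics` are hypothetical stand-ins chosen for illustration; they are not the Spark types or the method added in this PR.

```scala
object SharedUpdateSketch {
  // Hypothetical stand-in for a metric value reported by a reader.
  final case class TaskMetric(name: String, value: Long)

  // Hypothetical stand-in for the mutable metric that receives the value.
  final class MetricCell {
    var value: Long = 0L
    def set(v: Long): Unit = { value = v }
  }

  /** Copies each reported value into the metric registered under the same name. */
  def updateMetrics(current: Seq[TaskMetric], customMetrics: Map[String, MetricCell]): Unit =
    current.foreach(m => customMetrics(m.name).set(m.value))
}
```

Both call sites (the batch and the continuous reader paths in the diff) could then call the same helper instead of repeating the loop.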

cloud-fan · 2021-04-26T06:40:06Z

sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala

 object CustomMetrics {
  private[spark] val V2_CUSTOM = "v2Custom"

+  private[spark] val numRowsPerUpdate = 100L


Does it need to be a long?

numRow is a long, so I guess this can just be an int.

Changed it to an int.
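As a small illustration of why an int is enough here: in Scala, `Long % Int` widens the right-hand operand to `Long`, so the check still works for row counts beyond `Int.MaxValue`. The object and method names below are hypothetical.

```scala
object ModuloWideningSketch {
  // An Int constant, as suggested in the review.
  val numRowsPerUpdate: Int = 100

  /** The Int divisor is widened to Long before the remainder is taken. */
  def shouldUpdate(numRow: Long): Boolean = numRow % numRowsPerUpdate == 0
}
```

For example, `shouldUpdate(3000000000L)` is true even though the row count no longer fits in an Int.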

cloud-fan · 2021-04-26T06:41:41Z

...main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousDataSourceRDD.scala

-          customMetrics(metric.name()).set(metric.value())
+        if (numRow % CustomMetrics.numRowsPerUpdate == 0) {
+          partitionReader.currentMetricsValues.foreach { metric =>
+            assert(customMetrics.contains(metric.name()),


I'm not sure how useful the assert is here. It only guards against internal errors, and customMetrics(metric.name()) will fail too.

I can remove it. I also thought it was unnecessary and only added it in response to an earlier comment.
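The point about the lookup failing on its own can be seen in a minimal sketch, assuming `customMetrics` is a plain Scala `Map` (the value type here is simplified to `Long`, and the names are hypothetical):

```scala
import scala.util.Try

object MissingKeySketch {
  // Simplified stand-in: metric name -> current value.
  val customMetrics: Map[String, Long] = Map("bytesRead" -> 0L)

  /** Map.apply throws NoSuchElementException when the key is absent. */
  def lookup(name: String): Try[Long] = Try(customMetrics(name))
}
```

Without the assert, a missing metric name still surfaces as an exception from `customMetrics(name)`; only the exception type and message differ.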

dongjoon-hyun · 2021-04-26T14:22:36Z

#32348 is merged. Do we need to rebase this PR, @viirya ?

viirya · 2021-04-26T16:19:37Z

@dongjoon-hyun Yes, I will rebase this PR. Thanks.

viirya · 2021-05-04T21:06:29Z

Rebased and updated for the comments. @cloud-fan @dongjoon-hyun

SparkQA · 2021-05-05T03:52:39Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42682/

SparkQA · 2021-05-05T03:52:40Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42682/

SparkQA · 2021-05-05T05:46:55Z

Test build #138161 has finished for PR 32330 at commit 4f0be6c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-05-06T07:09:20Z

sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala

 object CustomMetrics {
  private[spark] val V2_CUSTOM = "v2Custom"

+  private[spark] val numRowsPerUpdate = 100


nit: NUM_ROWS_PER_UPDATE since it's a constant?

SparkQA · 2021-05-06T08:43:06Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42719/

SparkQA · 2021-05-06T08:43:07Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42719/

SparkQA · 2021-05-06T12:20:09Z

Test build #138198 has finished for PR 32330 at commit 5f6cf5a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-05-06T13:21:04Z

thanks, merging to master!

Update metric per certain rows and at the end of the task.

1a98660

github-actions bot added SQL STRUCTURED STREAMING labels Apr 25, 2021

cloud-fan reviewed Apr 26, 2021

viirya added 2 commits May 4, 2021 13:24

Merge remote-tracking branch 'upstream/master' into metric-update

31262d7

For review comment.

4f0be6c

cloud-fan reviewed May 6, 2021

cloud-fan approved these changes May 6, 2021

For review comment.

5f6cf5a

cloud-fan closed this in 6cd5cf5 May 6, 2021

viirya deleted the metric-update branch December 27, 2023 18:25
