Conversation

@davies (Contributor) commented Apr 15, 2016

What changes were proposed in this pull request?

  1. Fix the "spill size" metric of TungstenAggregate and Sort
  2. Rename "data size" to "peak memory" to match its actual meaning (and stay consistent with task metrics)
  3. Add "data size" for ShuffleExchange and BroadcastExchange
  4. Add timing for Sort, Aggregate and BroadcastExchange (this requires another patch to work)

How was this patch tested?

Existing tests.
[screenshot: metrics]
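
For context, point 1 above amounts to reporting a per-operator delta of the task-level spill counter instead of its cumulative value. A minimal sketch, assuming a `spillSize` SQLMetric on the operator (names illustrative):

// Hedged sketch of the spill-size fix: snapshot the task's cumulative
// memoryBytesSpilled before the operator runs, then report only the delta
// attributable to this operator.
val taskMetrics = TaskContext.get().taskMetrics()
val spillSizeBefore = taskMetrics.memoryBytesSpilled
// ... the sort/aggregate runs here, possibly spilling ...
spillSize += taskMetrics.memoryBytesSpilled - spillSizeBefore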

@davies (Author) commented Apr 15, 2016

cc @zsxwing

@SparkQA commented Apr 15, 2016

Test build #55954 has finished for PR 12425 at commit a9b29e2.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Remember the spill data size of this task before executing this operator so
// that we can figure out how many bytes we spilled for this operator.
val spillSizeBefore = metrics.memoryBytesSpilled
val beforeSort = System.currentTimeMillis()
Contributor commented:

Should we use nanoTime() instead of currentTimeMillis(), which is not guaranteed to be monotonic?
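
A sketch of the suggested change, with the nanosecond-to-millisecond conversion spelled out:

// System.nanoTime() is monotonic (immune to wall-clock adjustments), which makes
// it the right choice for elapsed-time measurement; currentTimeMillis() can jump.
val beforeSort = System.nanoTime()
// ... sort runs ...
val sortTimeMs = (System.nanoTime() - beforeSort) / 1000000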

@SparkQA commented Apr 15, 2016

Test build #55957 has finished for PR 12425 at commit 1378dd2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ericl (Contributor) commented Apr 15, 2016

Should we also add a metric back for dataSize, since peak memory usage might not be quite the same? Though maybe adding up peak memory and spill size can approximate it.
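
If no separate metric is added, the approximation mentioned here is just the sum of the two values this PR already reports; as a one-line sketch (metric names illustrative):

// rough per-operator data-size proxy: bytes held in memory plus bytes spilled
val approxDataSize = peakMemory + spillSize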

@SparkQA commented Apr 16, 2016

Test build #55970 has finished for PR 12425 at commit 696aafe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies (Author) commented Apr 16, 2016

@ericl Exchange has dataSize; should that be enough?

val rdd = child.execute().mapPartitionsInternal { iter =>
  val localDataSize = dataSize.localValue
  iter.map { row =>
    localDataSize.add(row.asInstanceOf[UnsafeRow].getSizeInBytes)
    row
  }
}
Contributor commented:

Isn't iterating over each row a significant added overhead? It seems it would be better to count the data size in bulk where the sort is done.
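
One way to read "in bulk" here: keep a plain local counter in the partition closure and flush it to the metric once per partition. A hedged sketch (the CompletionIterator flush and metric API are illustrative, not what the PR does):

// Hedged sketch: avoid touching the metric per row by summing into a local var
// and adding it to dataSize once the partition's iterator is exhausted.
// (requires org.apache.spark.util.CompletionIterator)
val rdd = child.execute().mapPartitionsInternal { iter =>
  var localBytes = 0L
  val counted = iter.map { row =>
    localBytes += row.asInstanceOf[UnsafeRow].getSizeInBytes
    row
  }
  CompletionIterator[InternalRow, Iterator[InternalRow]](counted, dataSize += localBytes)
}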

@davies (Author) replied:

I'm also worried about the overhead added here. Alternatively, we could remove the iterator and count the size in UnsafeRowSerializer (I tried that at the beginning, but it was less clear than the current approach).

@ericl (Contributor) commented Apr 16, 2016

I slightly prefer to have dataSize in the following stage so all the relevant metrics are together, but having it in Exchange seems ok too.

Also, I think it would be nice to have some basic tests for the metrics; otherwise they are likely to become inaccurate, since it's easy to break them without noticing.
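
For reference, a minimal sketch of such a test; the node lookup and the "peakMemory" key are assumptions for illustration, not the actual SQLMetricsSuite helpers:

// Hedged sketch of a basic metrics test: run a sorted query, locate the Sort
// node in the executed plan, and assert its peak-memory metric was populated.
// (metric value access is simplified here for illustration)
val df = sqlContext.range(100).sort("id")
df.collect()
val sortNode = df.queryExecution.executedPlan.collect {
  case plan if plan.nodeName.contains("Sort") => plan
}.head
assert(sortNode.metrics("peakMemory").value > 0)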

@davies (Author) commented Apr 21, 2016

cc @sameeragarwal Could you also take a look?

val sortedIterator = sorter.sort(iter.asInstanceOf[Iterator[UnsafeRow]])

dataSize += sorter.getPeakMemoryUsage
sortingTime += (System.nanoTime() - beforeSort) >> 20
Member commented:

">> 20"? I think it should be / 1000000.

@davies (Author) commented Apr 22, 2016

@zsxwing Addressed your comments.

@SparkQA commented Apr 22, 2016

Test build #56654 has finished for PR 12425 at commit cc65830.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 22, 2016

Test build #56656 has finished for PR 12425 at commit 1076c75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Commit: remove over counting
@sameeragarwal (Member) commented:

LGTM. +1 on having tests around metrics.

@SparkQA commented Apr 22, 2016

Test build #56712 has finished for PR 12425 at commit faeb593.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies (Author) commented Apr 22, 2016

Merging this into master, thanks!

@asfgit closed this in 0dcf9db Apr 22, 2016
asfgit pushed a commit that referenced this pull request Sep 19, 2016
## What changes were proposed in this pull request?

Currently, the SQL metrics look like `number of rows: 111111111111`; it's very hard to tell how large the number is. A separator was added by #12425 but removed by #14142 because it looked weird in some locales (for example, pl_PL). This PR adds the separator back, but always uses "," since the SQL UI is English-only.

## How was this patch tested?

Existing tests.
![metrics](https://cloud.githubusercontent.com/assets/40902/14573908/21ad2f00-030d-11e6-9e2c-c544f30039ea.png)

Author: Davies Liu <davies@databricks.com>

Closes #15106 from davies/metric_sep.

(cherry picked from commit e063206)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
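
A minimal illustration of the locale behavior this commit message describes, using plain java.text formatting (not necessarily the helper the patch itself uses):

import java.text.NumberFormat
import java.util.Locale

// Locale-dependent grouping is what made the separator look weird in pl_PL,
// which groups digits with (non-breaking) spaces rather than commas:
NumberFormat.getIntegerInstance(new Locale("pl", "PL")).format(111111111111L)
// => "111 111 111 111"

// Pinning Locale.US always yields commas, matching the English-only SQL UI:
NumberFormat.getIntegerInstance(Locale.US).format(111111111111L)
// => "111,111,111,111"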
asfgit pushed a commit that referenced this pull request Sep 19, 2016
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016