Conversation

@davies (Contributor) commented Apr 15, 2016

What changes were proposed in this pull request?

  1. Fix the "spill size" metric of TungstenAggregate and Sort
  2. Rename "data size" to "peak memory" to match its actual meaning (and stay consistent with task metrics)
  3. Add "data size" for ShuffleExchange and BroadcastExchange
  4. Add timing for Sort, Aggregate and BroadcastExchange (this requires another patch to work)

How was this patch tested?

Existing tests.
[screenshot: metrics]
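
For context, point 1 above amounts to reporting a per-operator delta of the task-level spill counter instead of its cumulative value. A minimal sketch, assuming a `spillSize` SQLMetric on the operator (names illustrative):

// Hedged sketch of the spill-size fix: snapshot the task's cumulative
// memoryBytesSpilled before the operator runs, then report only the delta
// attributable to this operator.
val taskMetrics = TaskContext.get().taskMetrics()
val spillSizeBefore = taskMetrics.memoryBytesSpilled
// ... the sort/aggregate runs here, possibly spilling ...
spillSize += taskMetrics.memoryBytesSpilled - spillSizeBefore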

@davies (Author) commented Apr 15, 2016

cc @zsxwing

@SparkQA commented Apr 15, 2016

Test build #55954 has finished for PR 12425 at commit a9b29e2.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Remember the spill data size of this task before executing this operator so
// that we can figure out how many bytes we spilled for this operator.
val spillSizeBefore = metrics.memoryBytesSpilled
val beforeSort = System.currentTimeMillis()
Contributor commented:

Should we use nanoTime() instead of currentTimeMillis(), which is not guaranteed to be monotonic?
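
A sketch of the suggested change, with the nanosecond-to-millisecond conversion spelled out:

// System.nanoTime() is monotonic (immune to wall-clock adjustments), which makes
// it the right choice for elapsed-time measurement; currentTimeMillis() can jump.
val beforeSort = System.nanoTime()
// ... sort runs ...
val sortTimeMs = (System.nanoTime() - beforeSort) / 1000000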

@SparkQA commented Apr 15, 2016

Test build #55957 has finished for PR 12425 at commit 1378dd2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ericl (Contributor) commented Apr 15, 2016

Should we also add a metric back for dataSize, since peak memory usage might not be quite the same? Though maybe adding up peak memory and spill size can approximate it.
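
If no separate metric is added, the approximation mentioned here is just the sum of the two values this PR already reports; as a one-line sketch (metric names illustrative):

// rough per-operator data-size proxy: bytes held in memory plus bytes spilled
val approxDataSize = peakMemory + spillSize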

@SparkQA commented Apr 16, 2016

Test build #55970 has finished for PR 12425 at commit 696aafe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies (Author) commented Apr 16, 2016

@ericl Exchange has dataSize; should that be enough?

val rdd = child.execute().mapPartitionsInternal { iter =>
  val localDataSize = dataSize.localValue
  iter.map { row =>
    localDataSize.add(row.asInstanceOf[UnsafeRow].getSizeInBytes)
    row
  }
}
Contributor commented:

Isn't iterating over each row a significant added overhead? It seems it would be better to count the data size in bulk where the sort is done.
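
One way to read "in bulk" here: keep a plain local counter in the partition closure and flush it to the metric once per partition. A hedged sketch (the CompletionIterator flush and metric API are illustrative, not what the PR does):

// Hedged sketch: avoid touching the metric per row by summing into a local var
// and adding it to dataSize once the partition's iterator is exhausted.
// (requires org.apache.spark.util.CompletionIterator)
val rdd = child.execute().mapPartitionsInternal { iter =>
  var localBytes = 0L
  val counted = iter.map { row =>
    localBytes += row.asInstanceOf[UnsafeRow].getSizeInBytes
    row
  }
  CompletionIterator[InternalRow, Iterator[InternalRow]](counted, dataSize += localBytes)
}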

@davies (Author) replied:

I'm also worried about the overhead added here. Alternatively, we could remove the iterator and count the size in UnsafeRowSerializer (I tried that at the beginning, but it was less clear than the current approach).

@ericl (Contributor) commented Apr 16, 2016

I slightly prefer to have dataSize in the following stage so all the relevant metrics are together, but having it in Exchange seems ok too.

Also, I think it would be nice to have some basic tests for the metrics; otherwise they are likely to become inaccurate, since it's easy to break them without noticing.
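
For reference, a minimal sketch of such a test; the node lookup and the "peakMemory" key are assumptions for illustration, not the actual SQLMetricsSuite helpers:

// Hedged sketch of a basic metrics test: run a sorted query, locate the Sort
// node in the executed plan, and assert its peak-memory metric was populated.
// (metric value access is simplified here for illustration)
val df = sqlContext.range(100).sort("id")
df.collect()
val sortNode = df.queryExecution.executedPlan.collect {
  case plan if plan.nodeName.contains("Sort") => plan
}.head
assert(sortNode.metrics("peakMemory").value > 0)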

@davies (Author) commented Apr 21, 2016

cc @sameeragarwal Could you also take a look?

val sortedIterator = sorter.sort(iter.asInstanceOf[Iterator[UnsafeRow]])

dataSize += sorter.getPeakMemoryUsage
sortingTime += (System.nanoTime() - beforeSort) >> 20
Member commented:

">> 20"? I think it should be / 1000000.

@davies (Author) commented Apr 22, 2016

@zsxwing Addressed your comments.

@SparkQA commented Apr 22, 2016

Test build #56654 has finished for PR 12425 at commit cc65830.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 22, 2016

Test build #56656 has finished for PR 12425 at commit 1076c75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Commit: remove over counting
@sameeragarwal (Member) commented:

LGTM. +1 on having tests around metrics.

@SparkQA commented Apr 22, 2016

Test build #56712 has finished for PR 12425 at commit faeb593.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies (Author) commented Apr 22, 2016

Merging this into master, thanks!

@asfgit closed this in 0dcf9db Apr 22, 2016
asfgit pushed a commit that referenced this pull request Sep 19, 2016
## What changes were proposed in this pull request?

Currently, the SQL metrics look like `number of rows: 111111111111`; it's very hard to tell how large the number is. A separator was added by #12425 but removed by #14142 because it looked weird in some locales (for example, pl_PL). This PR adds the separator back, but always uses "," since the SQL UI is English-only.

## How was this patch tested?

Existing tests.
![metrics](https://cloud.githubusercontent.com/assets/40902/14573908/21ad2f00-030d-11e6-9e2c-c544f30039ea.png)

Author: Davies Liu <davies@databricks.com>

Closes #15106 from davies/metric_sep.

(cherry picked from commit e063206)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
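
A minimal illustration of the locale behavior this commit message describes, using plain java.text formatting (not necessarily the helper the patch itself uses):

import java.text.NumberFormat
import java.util.Locale

// Locale-dependent grouping is what made the separator look weird in pl_PL,
// which groups digits with (non-breaking) spaces rather than commas:
NumberFormat.getIntegerInstance(new Locale("pl", "PL")).format(111111111111L)
// => "111 111 111 111"

// Pinning Locale.US always yields commas, matching the English-only SQL UI:
NumberFormat.getIntegerInstance(Locale.US).format(111111111111L)
// => "111,111,111,111"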
asfgit pushed a commit that referenced this pull request Sep 19, 2016
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016