[SPARK-14669] [SQL] Fix some SQL metrics in codegen and added more #12425
cc @zsxwing
Test build #55954 has finished for PR 12425 at commit
```scala
// Remember the spill data size of this task before executing this operator so that we can
// figure out how many bytes we spilled for this operator.
val spillSizeBefore = metrics.memoryBytesSpilled
val beforeSort = System.currentTimeMillis()
```
Should we use nanoTime() instead of currentTimeMillis(), which is not guaranteed to be monotonic?
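A small sketch of what the reviewer is suggesting. `System.currentTimeMillis()` reads the wall clock, which can be adjusted (for example by NTP), whereas `System.nanoTime()` is documented for measuring elapsed time only: the difference between two readings is meaningful, and a single reading is not. `ElapsedTime.timeMillis` is a hypothetical helper, not a Spark API, and it assumes the metric should be reported in milliseconds.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical helper (not part of Spark): runs `body` and returns its result
// together with the elapsed time in milliseconds, measured with System.nanoTime().
object ElapsedTime {
  def timeMillis[T](body: => T): (T, Long) = {
    val start = System.nanoTime() // intended for elapsed-time measurement, not wall-clock time
    val result = body
    val elapsedNanos = System.nanoTime() - start // only the difference of two readings is meaningful
    (result, TimeUnit.NANOSECONDS.toMillis(elapsedNanos))
  }

  def main(args: Array[String]): Unit = {
    val (sorted, millis) = timeMillis { Vector(3, 1, 2).sorted }
    println(s"sorted $sorted in $millis ms")
  }
}
```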
Test build #55957 has finished for PR 12425 at commit
Should we also add a metric back for
Test build #55970 has finished for PR 12425 at commit
@ericl Exchange has
```scala
val rdd = child.execute().mapPartitionsInternal { iter =>
  val localDataSize = dataSize.localValue
  iter.map { row =>
    localDataSize.add(row.asInstanceOf[UnsafeRow].getSizeInBytes)
```
Isn't iterating over each row here a significant added overhead? It seems better to count the data size in bulk, where the sort is done.
I was also worried about the overhead added here. Or should we remove the iterator here and count the size in UnsafeRowSerializer? (I tried that at the beginning, but it was less clear than the current approach.)
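A minimal sketch of the two options discussed in this thread, under stated assumptions: `SizedRow`, `LongCounter`, and `ToySorter` are hypothetical stand-ins for `UnsafeRow`, a SQL metric, and Spark's sorter, not the real classes. The per-row version adds one closure call for every row; the bulk version lets the component that already touches every row (here the toy sorter) keep a running total, so the metric is updated once per partition.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins, not Spark classes.
final case class SizedRow(sizeInBytes: Int)

final class LongCounter {
  var value: Long = 0L
  def add(v: Long): Unit = value += v
}

// Toy sorter: it must buffer every row anyway, so it can keep a byte total cheaply.
final class ToySorter {
  private val buffer = ArrayBuffer.empty[SizedRow]
  private var bytes = 0L

  def insertAll(rows: Iterator[SizedRow]): Unit =
    rows.foreach { row =>
      buffer += row
      bytes += row.sizeInBytes
    }

  def totalBytes: Long = bytes
  def sortedIterator: Iterator[SizedRow] = buffer.sortBy(_.sizeInBytes).iterator
}

object DataSizeCounting {
  // Per-row: wraps the iterator, adding one metric update per row.
  def countPerRow(rows: Iterator[SizedRow], metric: LongCounter): Iterator[SizedRow] =
    rows.map { row =>
      metric.add(row.sizeInBytes)
      row
    }

  // Bulk: read the total the sorter already tracked and update the metric once.
  def countInBulk(rows: Iterator[SizedRow], metric: LongCounter): Iterator[SizedRow] = {
    val sorter = new ToySorter
    sorter.insertAll(rows)
    metric.add(sorter.totalBytes)
    sorter.sortedIterator
  }
}
```

Note that the per-row version is lazy: the counter only advances as the returned iterator is consumed, whereas the bulk version has the total as soon as the sorter has consumed its input.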
I slightly prefer to have … Also, I think it would be nice to have at least some basic tests for the metrics; otherwise they are likely to become inaccurate, since it's easy to break them without noticing.
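As a sketch of the kind of basic metric test suggested here (not Spark's actual test suite): run an operator over known input and assert that the metric agrees with what the operator really produced. `Metric` and `filterWithMetric` are hypothetical names used only for illustration.

```scala
// Hypothetical sketch of a basic metric test, not Spark's test suite.
object MetricsTestSketch {
  final class Metric { var value: Long = 0L }

  // Toy filter operator that records "number of output rows" as rows are emitted.
  def filterWithMetric[T](in: Iterator[T], keep: T => Boolean, numOutputRows: Metric): Iterator[T] =
    in.filter(keep).map { x =>
      numOutputRows.value += 1
      x
    }

  def main(args: Array[String]): Unit = {
    val metric = new Metric
    val out = filterWithMetric(Iterator.range(0, 10), (i: Int) => i % 2 == 0, metric).toList
    // The metric must match what the operator actually produced.
    assert(out == List(0, 2, 4, 6, 8))
    assert(metric.value == out.size.toLong)
    println("metric test passed")
  }
}
```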
cc @sameeragarwal, could you also take a look?
```scala
val sortedIterator = sorter.sort(iter.asInstanceOf[Iterator[UnsafeRow]])

dataSize += sorter.getPeakMemoryUsage
sortingTime += (System.nanoTime() - beforeSort) >> 20
```
">> 20"? I think it should be / 1000000.
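To make the review comment concrete: `>> 20` divides by 2^20 = 1,048,576 rather than 10^6, so it under-reports milliseconds by roughly 4.6%. The sketch below compares the shift, plain division, and `java.util.concurrent.TimeUnit`, which names the conversion explicitly; the object and method names are illustrative only.

```scala
import java.util.concurrent.TimeUnit

object NanosToMillis {
  def viaShift(ns: Long): Long = ns >> 20           // divides by 2^20 = 1,048,576
  def viaDivision(ns: Long): Long = ns / 1000000L   // divides by 10^6
  def viaTimeUnit(ns: Long): Long = TimeUnit.NANOSECONDS.toMillis(ns)

  def main(args: Array[String]): Unit = {
    val tenSeconds = 10000000000L // 10 s expressed in nanoseconds
    println(viaShift(tenSeconds))    // 9536
    println(viaDivision(tenSeconds)) // 10000
    println(viaTimeUnit(tenSeconds)) // 10000
  }
}
```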
@zsxwing Addressed your comments.
Test build #56654 has finished for PR 12425 at commit
Test build #56656 has finished for PR 12425 at commit
remove over counting
LGTM. +1 on having tests around metrics.
Test build #56712 has finished for PR 12425 at commit
Merging this into master, thanks!
## What changes were proposed in this pull request?

Currently, the SQL metrics look like `number of rows: 111111111111`, which makes it very hard to read how large the number is. A separator was added by #12425 but removed by #14142, because the separator looks odd in some locales (for example, pl_PL). This PR adds it back, but always uses "," as the separator, since the SQL UI is entirely in English.

## How was this patch tested?

Existing tests.

Author: Davies Liu <davies@databricks.com>

Closes #15106 from davies/metric_sep.

(cherry picked from commit e063206)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
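A minimal sketch of locale-independent grouping, assuming `java.text.NumberFormat` with a fixed `Locale.US` is acceptable; it is illustrative and not necessarily how #15106 implements the change. `MetricFormat.withSeparator` is a hypothetical helper.

```scala
import java.text.NumberFormat
import java.util.Locale

object MetricFormat {
  // Always format with Locale.US so the grouping separator is "," regardless of
  // the JVM's default locale (pl_PL, for example, groups digits with a space-like character).
  def withSeparator(value: Long): String =
    NumberFormat.getIntegerInstance(Locale.US).format(value)

  def main(args: Array[String]): Unit = {
    println(s"number of rows: ${withSeparator(111111111111L)}") // number of rows: 111,111,111,111
    // For contrast, a locale-dependent formatter; its output depends on the JDK's locale data.
    println(NumberFormat.getIntegerInstance(Locale.forLanguageTag("pl-PL")).format(111111111111L))
  }
}
```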
