[SPARK-9646][SQL]Add metrics for all join and aggregate operators #8060
cc @rxin
Test build #40268 has finished for PR 8060 at commit
Test build #40271 has finished for PR 8060 at commit
It'd be great if we could avoid the extra iterator `map`, because an iterator wrapper introduces a lot of overhead.
@zsxwing, my main feedback here is to get rid of the extra iterator overhead.
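A minimal sketch of the difference being discussed, in plain Scala rather than Spark code. `LongCounter` is a hypothetical stand-in for a SQL metric (it is not Spark's `LongSQLMetric` API). The first function adds an `Iterator` layer whose only job is counting; the second folds the counting into the predicate of a filter the operator already applies, so no additional wrapper is created.

```scala
// Hypothetical stand-in for a SQL metric accumulator (not Spark's LongSQLMetric API).
final class LongCounter {
  private var count = 0L
  def +=(n: Long): Unit = count += n
  def value: Long = count
}

// Pattern to avoid: an extra Iterator layer that exists only to bump a counter,
// so every row goes through one more hasNext()/next() indirection.
def countWithWrapper[T](input: Iterator[T], numRows: LongCounter): Iterator[T] =
  input.map { row =>
    numRows += 1
    row
  }

// Alternative: a Filter-like operator already wraps its input once to apply the
// predicate, so both counters are updated inside that same closure.
def filterAndCount[T](
    input: Iterator[T],
    predicate: T => Boolean,
    numInputRows: LongCounter,
    numOutputRows: LongCounter): Iterator[T] =
  input.filter { row =>
    numInputRows += 1
    val keep = predicate(row)
    if (keep) numOutputRows += 1
    keep
  }
```

Counting still costs a little work per row either way; the point of the second form is only that it avoids allocating and delegating through an additional iterator.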
Removed this because there are only two places using it, and it's not worth adding an abstract method to the parent class for that.
Test build #40293 has finished for PR 8060 at commit
Now that I think about it, since most operators right after this one already check the number of input rows, we can just remove this to avoid the iterator overhead.
However, if there is no filter, aggregate, or join in a query, it won't display any numbers.
Let's make sure we add it for Project / TungstenProject.
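To illustrate why Project is a natural place to count rows without reintroducing the iterator overhead discussed earlier: a Project-like operator already maps every row through its projection, so the output-row counter can be bumped inside that same closure. This is a sketch in plain Scala, not Spark's actual `Project` or `TungstenProject` code; `LongCounter` and the `Person` row type are hypothetical.

```scala
// Hypothetical stand-in for a SQL metric accumulator (not Spark's LongSQLMetric API).
final class LongCounter {
  private var count = 0L
  def +=(n: Long): Unit = count += n
  def value: Long = count
}

// Hypothetical input row type, for illustration only.
final case class Person(id: Int, name: String, age: Int)

// Sketch of a Project-like operator: the projection is applied in a map the
// operator needs anyway, so counting output rows there adds no extra Iterator.
def project[A, B](input: Iterator[A], projection: A => B, numOutputRows: LongCounter): Iterator[B] =
  input.map { row =>
    numOutputRows += 1
    projection(row)
  }

// A query like "SELECT name FROM people" has no filter, aggregate, or join;
// in this sketch, counting in the projection is what gives it a row count.
val people = Seq(Person(1, "Ann", 31), Person(2, "Bo", 25), Person(3, "Cy", 40))
val numOutputRows = new LongCounter
val names = project(people.iterator, (p: Person) => p.name, numOutputRows).toList
```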
…icalRDD and LocalTableScan
retest this please
Test build #40382 has finished for PR 8060 at commit
Test build #40402 has finished for PR 8060 at commit
Aggregation-related changes look good. By the way, what is the overhead of this? Also, since this PR has lots of changes and may easily conflict with master, let's get it in as soon as possible.
retest this please
Test build #1453 has finished for PR 8060 at commit
I am merging it to master and branch 1.5. We can continue our review and open a follow-up PR to address the comments.
This PR added metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two cases:

1. If the iterator is not fully consumed, the metric values will be smaller.
2. Recreating the iterators will make metric values look bigger than the size of the input source, for example in `CartesianProduct`.

Author: zsxwing <zsxwing@gmail.com>

Closes #8060 from zsxwing/sql-metrics and squashes the following commits:

40f3fc1 [zsxwing] Mark LongSQLMetric private[metric] to avoid using incorrectly and leak memory
b1b9071 [zsxwing] Merge branch 'master' into sql-metrics
4bef25a [zsxwing] Add metrics for SortMergeOuterJoin
95ccfc6 [zsxwing] Merge branch 'master' into sql-metrics
67cb4dd [zsxwing] Add metrics for Project and TungstenProject; remove metrics from PhysicalRDD and LocalTableScan
0eb47d4 [zsxwing] Merge branch 'master' into sql-metrics
dd9d932 [zsxwing] Avoid creating new Iterators
589ea26 [zsxwing] Add metrics for all join and aggregate operators

(cherry picked from commit 5831294)
Signed-off-by: Yin Huai <yhuai@databricks.com>
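The two confusing cases in the commit message can be reproduced with plain Scala iterators. This is a sketch, not Spark code: `LongCounter` is a hypothetical stand-in for a SQL metric, `counted` models an operator that records a row only when a consumer pulls it, `take(2)` stands in for a consumer that stops early (such as a limit), and the nested loop models a product that builds a fresh right-side iterator for every left row, the pattern the commit message attributes to `CartesianProduct`.

```scala
// Hypothetical stand-in for a SQL metric accumulator (not Spark's LongSQLMetric API).
final class LongCounter {
  private var count = 0L
  def +=(n: Long): Unit = count += n
  def value: Long = count
}

// Models an operator that counts each row only when a consumer pulls it.
def counted[T](rows: Seq[T], numOutputRows: LongCounter): Iterator[T] =
  rows.iterator.map { row =>
    numOutputRows += 1
    row
  }

// Case 1: the consumer stops after two rows, so the metric reports 2
// even though the source holds 10 rows.
val scanRows = new LongCounter
val firstTwo = counted(1 to 10, scanRows).take(2).toList

// Case 2: a nested-loop product creates a new right-side iterator per left row,
// so the right side's metric reports left.size * right.size (12),
// larger than the 4 rows the right side actually contains.
val left = Seq("a", "b", "c")
val right = Seq(1, 2, 3, 4)
val rightRows = new LongCounter
val pairs = (for {
  l <- left.iterator
  r <- counted(right, rightRows)
} yield (l, r)).toList
```

In both cases the counter is accurate about how many rows were pulled through the operator; it only diverges from the size of the underlying input.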