Skip to content

Conversation

@zsxwing
Copy link
Member

@zsxwing zsxwing commented Aug 9, 2015

This PR added metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two case:

  1. The iterator is not totally consumed and the metric values will be less.
  2. Recreating the iterators will make metric values look bigger than the size of the input source, such as CartesianProduct.

@zsxwing
Copy link
Member Author

zsxwing commented Aug 9, 2015

cc @rxin

@SparkQA
Copy link

SparkQA commented Aug 9, 2015

Test build #40268 has finished for PR 8060 at commit 263ff83.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 9, 2015

Test build #40271 has finished for PR 8060 at commit 589ea26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be great if we can avoid the extra iterator map, because an iterator wrapper actually introduces a lot of overhead.

@rxin
Copy link
Contributor

rxin commented Aug 9, 2015

@zsxwing my main feedback here is to get rid of the extra iterator overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this because there are only two places using it and it's not worth to add an abstract method to the parent class.

@SparkQA
Copy link

SparkQA commented Aug 10, 2015

Test build #40293 has finished for PR 8060 at commit dd9d932.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now i think about it - since most operators right after this one already checks the number of input rows, we can just remove this to avoid the iterator overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, if there is no filter, aggregate or join in a query, it won't display any number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure we add it for Project / TungstenProject.

@rxin
Copy link
Contributor

rxin commented Aug 11, 2015

@davies can you take a look at the join changes?

@yhuai can you take a quick look at the aggregation changes?

@zsxwing
Copy link
Member Author

zsxwing commented Aug 11, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40382 has finished for PR 8060 at commit 67cb4dd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40402 has finished for PR 8060 at commit 4bef25a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SortMergeOuterJoin(

@yhuai
Copy link
Contributor

yhuai commented Aug 11, 2015

Aggregation related changes look good.

btw, how is the overhead of it?

Also, since this one has lots of changes and may easily get conflicted with master. Let's get it in as soon as possible.

@zsxwing
Copy link
Member Author

zsxwing commented Aug 11, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #1453 has finished for PR 8060 at commit 40f3fc1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Aug 11, 2015

I am merging it to master and branch 1.5. We can continue our review and have a follow up pr to address the comments.

asfgit pushed a commit that referenced this pull request Aug 11, 2015
This PR added metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two case:
1. The iterator is not totally consumed and the metric values will be less.
2. Recreating the iterators will make metric values look bigger than the size of the input source, such as `CartesianProduct`.

Author: zsxwing <zsxwing@gmail.com>

Closes #8060 from zsxwing/sql-metrics and squashes the following commits:

40f3fc1 [zsxwing] Mark LongSQLMetric private[metric] to avoid using incorrectly and leak memory
b1b9071 [zsxwing] Merge branch 'master' into sql-metrics
4bef25a [zsxwing] Add metrics for SortMergeOuterJoin
95ccfc6 [zsxwing] Merge branch 'master' into sql-metrics
67cb4dd [zsxwing] Add metrics for Project and TungstenProject; remove metrics from PhysicalRDD and LocalTableScan
0eb47d4 [zsxwing] Merge branch 'master' into sql-metrics
dd9d932 [zsxwing] Avoid creating new Iterators
589ea26 [zsxwing] Add metrics for all join and aggregate operators

(cherry picked from commit 5831294)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@asfgit asfgit closed this in 5831294 Aug 11, 2015
@zsxwing zsxwing deleted the sql-metrics branch August 11, 2015 23:59
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
This PR added metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two case:
1. The iterator is not totally consumed and the metric values will be less.
2. Recreating the iterators will make metric values look bigger than the size of the input source, such as `CartesianProduct`.

Author: zsxwing <zsxwing@gmail.com>

Closes apache#8060 from zsxwing/sql-metrics and squashes the following commits:

40f3fc1 [zsxwing] Mark LongSQLMetric private[metric] to avoid using incorrectly and leak memory
b1b9071 [zsxwing] Merge branch 'master' into sql-metrics
4bef25a [zsxwing] Add metrics for SortMergeOuterJoin
95ccfc6 [zsxwing] Merge branch 'master' into sql-metrics
67cb4dd [zsxwing] Add metrics for Project and TungstenProject; remove metrics from PhysicalRDD and LocalTableScan
0eb47d4 [zsxwing] Merge branch 'master' into sql-metrics
dd9d932 [zsxwing] Avoid creating new Iterators
589ea26 [zsxwing] Add metrics for all join and aggregate operators
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants