-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: add rule to fold two grouping operators into one #49627
Conversation
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! I left a few questions for my own understanding.
It's outside the scope of this PR, but I think there's another case that can be folded: when there is an outer aggregate function (like max or min) on the grouped inner column. For example:
SELECT max(a), sum(b) FROM (SELECT x, sum(y) FROM r GROUP BY x) AS s(a, b)
=>
SELECT max(x), sum(y) FROM r
Not sure if this case is worth handling in the future, but thought I'd point it out.
Reviewed 1 of 7 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/operator.go, line 374 at r1 (raw file):
case CountOp, CountRowsOp: return outer == SumIntOp
So you're only checking for SumIntOp
here because outer
could never be a SumOp
because CountOp
and CountRowsOp
always result in integers. Is my understanding correct?
pkg/sql/opt/norm/groupby_funcs.go, line 321 at r1 (raw file):
innerAgg := innerAggs[i].Agg if !opt.IsAggregateOp(innerAgg) { return false
When would an innerAgg or outerAgg be something other than an AggregateOp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/operator.go, line 374 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
So you're only checking for
SumIntOp
here becauseouter
could never be aSumOp
becauseCountOp
andCountRowsOp
always result in integers. Is my understanding correct?
Yes, exactly.
pkg/sql/opt/norm/groupby_funcs.go, line 321 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
When would an innerAgg or outerAgg be something other than an AggregateOp?
They could be AggFilters or AggDistincts. These could be allowed in the future but I wanted to keep things simple for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. I wonder what the motivation for this was, did you encounter a case like this?
Also, I want to commend you on your use of "effects" haha
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)
pkg/sql/opt/norm/groupby_funcs.go, line 321 at r1 (raw file):
Previously, DrewKimball (Andrew Kimball) wrote…
They could be AggFilters or AggDistincts. These could be allowed in the future but I wanted to keep things simple for this PR.
It's worth adding a comment, mentioning AggFilters/AggDistinct (or checking for those particular ops) to make things more clear.
pkg/sql/opt/norm/rules/groupby.opt, line 378 at r1 (raw file):
# ignore orderings). [FoldGroupingOperators, Normalize] (GroupBy | ScalarGroupBy | DistinctOn
We document FirstAgg as only being used in DistinctOn. I don't think we can execbuild an expression that has a FirstAgg inside a GroupBy. Conversely, DistinctOn doesn't support anything other than ConstAgg and FirstAgg. I think we should remove FirstAgg and DistinctOn altogether from this rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/norm/groupby_funcs.go, line 321 at r1 (raw file):
Previously, RaduBerinde wrote…
It's worth adding a comment, mentioning AggFilters/AggDistinct (or checking for those particular ops) to make things more clear.
Done.
pkg/sql/opt/norm/rules/groupby.opt, line 378 at r1 (raw file):
Previously, RaduBerinde wrote…
We document FirstAgg as only being used in DistinctOn. I don't think we can execbuild an expression that has a FirstAgg inside a GroupBy. Conversely, DistinctOn doesn't support anything other than ConstAgg and FirstAgg. I think we should remove FirstAgg and DistinctOn altogether from this rule.
Done.
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Once the joins are removed from the query at line 593 in xform/testdata/external/trading, we end up with a ScalarGroupBy on top of a GroupBy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
That's a good idea, I hadn't thought of that. Now that you mention it, I think you could also fold any outer aggregate on the grouped inner column if it has an AggDistinct. |
I think that makes sense. Do you mean something like this:
|
Yes, exactly. Although it's possible that there's some rule that would remove the AggDistinct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Just a few nits/questions.
Reviewed 3 of 7 files at r1, 3 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
pkg/sql/opt/operator.go, line 358 at r2 (raw file):
// // SELECT sum(s) FROM (SELECT sum(y) FROM xy GROUP BY x) AS f(s); // SELECT sum(y) FROM xyz;
[nit] xyz -> xy (here and below)
pkg/sql/opt/operator.go, line 374 at r2 (raw file):
case CountOp, CountRowsOp: return outer == SumIntOp
Is there a reason it can't also be SumOp
?
pkg/sql/opt/norm/groupby_funcs.go, line 357 at r2 (raw file):
// MergeAggs returns an AggregationsExpr that is equivalent to the two given // AggregationsExprs.
[nit] add a note that this will panic if CanMergeAggs
is false.
pkg/sql/opt/norm/groupby_funcs.go, line 370 at r2 (raw file):
// For each outer aggregate, construct a new aggregate that takes the Agg // field of the referenced inner aggregate and the Col field of the outer // aggregate.
[nit] can you explain why you need the Agg from the inner and Col from the input to the outer?
pkg/sql/opt/norm/rules/groupby.opt, line 357 at r2 (raw file):
# 6 6 5 # # Its functional dependencies: key(r), r->(a, b, c), a->(b, c)
c is not included in the table -- these FDs seem wrong
pkg/sql/opt/norm/testdata/rules/groupby, line 3023 at r2 (raw file):
# Case with a GroupBy on a GroupBy. norm expect=FoldGroupingOperators SELECT sum_int(s) FROM (SELECT y, sum_int(x) AS s FROM xy GROUP BY y) GROUP BY y
Do you have any tests where the inner and outer group by columns are different but the inner still determines the outer? How about a test with more than one group by column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 7 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
pkg/sql/opt/operator.go, line 358 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] xyz -> xy (here and below)
Done.
pkg/sql/opt/operator.go, line 374 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Is there a reason it can't also be
SumOp
?
CountOp and CountRowsOp always produce ints, while (I believe) SumOp produces a decimal number. So, replacing a SumOp on a CountOp would change the outputted type from decimal to int. I'll add a comment.
pkg/sql/opt/norm/groupby_funcs.go, line 357 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] add a note that this will panic if
CanMergeAggs
is false.
Done.
pkg/sql/opt/norm/groupby_funcs.go, line 370 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] can you explain why you need the Agg from the inner and Col from the input to the outer?
Done.
pkg/sql/opt/norm/rules/groupby.opt, line 357 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
c is not included in the table -- these FDs seem wrong
Right, I forgot to prune those after removing the c column from the table. Done.
pkg/sql/opt/norm/testdata/rules/groupby, line 3023 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Do you have any tests where the inner and outer group by columns are different but the inner still determines the outer? How about a test with more than one group by column?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
pkg/sql/opt/operator.go, line 374 at r2 (raw file):
Previously, DrewKimball (Andrew Kimball) wrote…
CountOp and CountRowsOp always produce ints, while (I believe) SumOp produces a decimal number. So, replacing a SumOp on a CountOp would change the outputted type from decimal to int. I'll add a comment.
Doh. I left the same question yesterday... I should have suggested a comment then. This is definitely comment-worthy, so thanks for adding it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
pkg/sql/opt/norm/groupby_funcs.go, line 373 at r3 (raw file):
// every inner-outer aggregate pair forms a valid decomposition for the // inner aggregate. The column from the outer aggregate has to be used to // preserve logical equivalency.
It would also help to add a reminder that you use the inner Agg field since both inner and outer are the same in most cases, but for count/count rows you need the inner. You can also add a reference to see opt.AggregatesCanMerge
for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)
pkg/sql/opt/norm/groupby_funcs.go, line 373 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
It would also help to add a reminder that you use the inner Agg field since both inner and outer are the same in most cases, but for count/count rows you need the inner. You can also add a reference to see
opt.AggregatesCanMerge
for details.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball)
pkg/sql/opt/norm/groupby_funcs.go, line 374 at r4 (raw file):
// inner aggregate. In most cases, the inner and outer aggregates are the // same, but in the count and count-rows cases the inner aggregate must // be used (see opt.aggregatesCanMerge for details). The column from the
[nit] opt.aggregatesCanMerge -> opt.AggregatesCanMerge
Previously, the optimizer could not fold two grouping operators into a single equivalent grouping operator. This patch adds a rule that effects this transformation under the following conditions: 1. All of the outer aggregates are aggregating on the output columns of the inner aggregates. 2. Every inner-outer aggregation pair can be replaced with an equivalent single aggregate. 3. The inner grouping columns functionally determine the outer grouping columns. 4. Both grouping operators are unordered. As an example, the following query pairs are equivalent: ``` SELECT sum(t) FROM (SELECT sum(b) FROM ab GROUP BY a) AS g(t); SELECT sum(b) FROM ab; SELECT max(t) FROM (SELECT max(b) FROM ab GROUP BY a) AS g(t); SELECT max(b) FROM ab; SELECT sum_int(t) FROM (SELECT count(b) FROM ab GROUP BY a) AS g(t); SELECT count(b) FROM ab; ``` This situation is rare in direct SQL queries, but can arise when composing views and queries. Release note (sql change): The optimizer can now fold two grouping operators together when they are aggregating over functions like sum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)
pkg/sql/opt/norm/groupby_funcs.go, line 374 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] opt.aggregatesCanMerge -> opt.AggregatesCanMerge
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
bors r+ |
Build succeeded |
Previously, the optimizer could not fold two grouping operators into
a single equivalent grouping operator.
This patch adds a rule that effects this transformation under the
following conditions:
of the inner aggregates.
equivalent single aggregate.
grouping columns.
As an example, the following query pairs are equivalent:
This situation is rare in direct SQL queries, but can arise when
composing views and queries.
Release note (sql change): The optimizer can now fold two grouping
operators together when they are aggregating over functions like sum.