plan: convert in subquery to agg and inner join #7531
The first one is unfolding, the second is rewriting, and the third is doing nothing. Another test is TPC-H Query 18, though due to the join order and the inaccurate row count estimation, this rule currently cannot take effect on it.
plan/rule_aggregation_elimination.go
Outdated
	return nil
}

func (a *aggregationEliminater) convertAggToProj(agg *LogicalAggregation) *LogicalProjection {
Most methods are copied from aggregate push down. I don't think it's good to leave them as functions that anyone can call.
How about extracting a struct that can be used by both aggregate elimination and aggregate push down, and making them methods of this struct?
Hmm, can we set aggregationOptimizer as an anonymous field of aggregationEliminater? Then we only need to override the optimize method for aggregationEliminater.
aggregationOptimizer also handles the aggregate elimination, why not use it here? Because AllowAggPushDown is false by default?
Extract a struct that can be used both by aggregate elimination and aggregate push down, make them as method of this struct ?
IMHO, aggregationOptimizer should be this struct, and we should add an aggregationPushDownSolver as a subclass of it to hold the aggregate pushdown logic.
aggregationOptimizer also handles the aggregate elimination, why not use it here? Because AllowAggPushDown is false by default?
Yes, that one is mainly for pushing down aggregation. The elimination is done there for convenience and because of some limitations of the current planner.
IMHO, aggregationOptimizer should be this struct, and we should add an aggregationPushDownSolver as a subclass of it to hold the aggregate pushdown logic.
But there's no overriding in Go, so we cannot make one contain the other. They both have a recursive method optimize to implement the logicalOptRule interface.
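The point about overriding can be illustrated with a minimal sketch. The types `base` and `eliminater` and the helper `run` are hypothetical stand-ins, not the planner's actual structs; the sketch only shows that a method declared on the outer struct shadows the embedded one without being dispatched to from inside the embedded struct's own methods.

```go
package main

import "fmt"

// base is a hypothetical stand-in for a shared optimizer struct.
type base struct{}

// optimize recurses on the *base receiver, so every recursive call
// resolves to base.optimize.
func (b *base) optimize(depth int) string {
	if depth == 0 {
		return "base"
	}
	return b.optimize(depth - 1)
}

// run calls optimize through the *base receiver.
func (b *base) run() string { return b.optimize(0) }

// eliminater embeds base (an anonymous field) and declares its own optimize.
type eliminater struct{ base }

// optimize shadows base.optimize for callers that hold an *eliminater.
func (e *eliminater) optimize(depth int) string { return "eliminater" }

func main() {
	e := &eliminater{}
	fmt.Println(e.optimize(1)) // prints "eliminater": the outer method is selected
	fmt.Println(e.run())       // prints "base": run is promoted, but inside it the receiver is *base
}
```

So with embedding alone, any recursion that starts inside the embedded struct never reaches the outer struct's optimize, which is the limitation raised above.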
@eurekaka PTAL
plan/optimizer.go
Outdated
	flagDecorrelate
	flagEliminateProjection2
	flagMaxMinEliminate
Any reason why flagEliminateProjection2 is applied after flagDecorrelate?
It seems I can just place it after the EliminateAgg and before the Decorrelate. This will be done in the upcoming commit.
OK, TPC-H Q20's result will change if we change the position.
This is probably because we don't maintain the KeyInfo very carefully.
So I've put it after the decorrelation.
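Why the position of a flag matters can be pictured with a small sketch. It assumes, and this is an assumption rather than something the thread shows, that each rule is one bit declared with iota and that enabled rules run in declaration order. The flag names come from the diff above; `ruleNames` and `appliedRules` are hypothetical.

```go
package main

import "fmt"

// One bit per rule. In this sketch the declaration order is also the
// order in which rules are applied.
const (
	flagDecorrelate uint64 = 1 << iota
	flagEliminateProjection2
	flagMaxMinEliminate
)

// ruleNames is indexed by bit position and mirrors the const block above.
var ruleNames = []string{"decorrelate", "eliminateProjection2", "maxMinEliminate"}

// appliedRules returns the enabled rules in their fixed declaration order.
func appliedRules(flag uint64) []string {
	var out []string
	for i, name := range ruleNames {
		if flag&(1<<uint(i)) != 0 {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// The order in which flags are OR-ed together is irrelevant;
	// only the declared position decides when a rule runs.
	fmt.Println(appliedRules(flagEliminateProjection2 | flagDecorrelate))
	// prints [decorrelate eliminateProjection2]
}
```

Under that assumption, moving a rule before or after decorrelation means moving its line in the const block, which changes the plan every later rule sees.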
plan/expression_rewriter.go
Outdated
	return v, true
// If it's not the form of `not in (SUBQUERY)`, has no correlated column and don't need to append a scalar value. We can rewrite it to inner join.
if er.ctx.GetSessionVars().AllowInSubqueryRewriting && !v.Not && !asScalar && len(np.extractCorrelatedCols()) == 0 {
	// We need to try to eliminate the agg and the projection produced by this operation.
I am confused about the expressionRewriter::asScalar field. Could you please shed some light on the purpose of this field? Thanks.
You can take a look at buildSemiJoin. The asScalar field indicates whether this semi join needs to be treated as a scalar and output a special row to the plan above it.
@@ -5,7 +5,6 @@ create table t2 (c1 int unique, c2 int);
insert into t2 values(1, 0), (2, 1);
create table t3 (a bigint, b bigint, c bigint, d bigint);
create table t4 (a int, b int, c int, index idx(a, b), primary key(a));
set @@session.tidb_opt_insubquery_unfold = 1;
It's better to keep this test and add another test which disables tidb_opt_insubquery_unfold.
But this variable has been removed.
Yeah, I just noticed that after reviewing all the code changes.
I just changed the name. We still test with it both enabled and disabled.
sessionctx/variable/tidb_vars.go
Outdated
// tidb_opt_insubquery_unfold is used to enable/disable the optimizer rule of in subquery unfold.
TiDBOptInSubqUnFolding = "tidb_opt_insubquery_unfold"
// tidb_opt_insubquery_rewriting is used to enable/disable the optimizer rule of rewriting IN subquery.
TiDBOptInSubqRewriting = "tidb_opt_insubquery_rewriting"
how about:
s/tidb_opt_insubquery_rewriting/tidb_opt_insubquery_to_innerjoin/
s/TiDBOptInSubqRewriting/TiDBOptInSubqueryToInnerJoin/
The actual form is an inner join together with an aggregation. So ToInnerJoin is not complete, but InnerJoinAndAgg is too long.
I think "tidb_opt_insubquery_to_innerjoin" is clearer than "tidb_opt_insubquery_rewriting", because the key information about this transformation is displayed in the variable name.
Maybe "tidb_opt_insubquery_to_innerjoin" can be shortened to "tidb_insubq_to_innerjoin".
sessionctx/variable/session.go
Outdated
// AllowInSubqueryUnFolding can be set to true to fold in subquery
AllowInSubqueryUnFolding bool
// AllowInSubqueryRewriting can be set to false to forbid rewriting the semi join to inner join with agg.
AllowInSubqueryRewriting bool
how about s/AllowInSubqueryRewriting/InSubqueryToInnerJoin/
plan/rule_aggregation_elimination.go
Outdated
@@ -0,0 +1,129 @@
// Copyright 2018 PingCAP, Inc.
#7676 already did the same thing, this patch can be removed.
That should be done when merging that PR into this one. Otherwise the test results will be strange.
Ok, let's get #7676 merged as soon as possible.
So we should merge 7680 first...
/run-common-test
/run-all-tests
@winoros please merge master && resolve conflicts.
@zz-jason updated
/run-all-tests
LGTM
@eurekaka @lamxTyler PTAL
LGTM
@winoros Please resolve the conflict.
./run-all-tests
What problem does this PR solve?
Implement #7205.
What is changed and how it works?
I originally decided to unfold the subquery first and implement this rewrite in the future new planner. But in testing, unfolding the subquery performed far worse than rewriting (results listed below).
So I decided to implement it in the current planner.
This rewrite introduces a new aggregation or projection, so this PR adds an aggregate elimination rule and one more projection elimination.
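The idea behind aggregate elimination can be sketched on plain slices: when the GROUP BY column is unique, every group holds exactly one row, so an aggregate such as SUM degenerates to the column value and the aggregation can be replaced by a projection. The names `row`, `sumGroupByID`, and `project` are hypothetical, and the sketch ignores NULL handling and the other aggregate functions a real rule has to cover.

```go
package main

import "fmt"

// row is a hypothetical two-column tuple (id, v).
type row struct{ id, v int }

// sumGroupByID evaluates SELECT id, SUM(v) FROM t GROUP BY id.
func sumGroupByID(rows []row) map[int]int {
	out := make(map[int]int)
	for _, r := range rows {
		out[r.id] += r.v
	}
	return out
}

// project evaluates SELECT id, v FROM t, the projection that can stand in
// for the aggregation when id is unique.
func project(rows []row) map[int]int {
	out := make(map[int]int, len(rows))
	for _, r := range rows {
		out[r.id] = r.v
	}
	return out
}

func main() {
	rows := []row{{1, 10}, {2, 20}, {3, 30}} // id is unique here
	// Both calls yield the same id-to-value mapping.
	fmt.Println(sumGroupByID(rows))
	fmt.Println(project(rows))
}
```

If id were not unique, the two results would differ, which is why such a rule depends on accurate key information.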
Note that we could also run just one projection elimination, at the place where the new one is added. Running it only once will change the column names in some explain results, though it won't change the plan structure. I've not decided which one is better.
Check List
Tests
Related changes
Rewrite IN subquery to inner join with aggregation.