planner:dont push down right condition for anti semi join #12075
90f89be
to
9ac89fe
Compare
Codecov Report
@@ Coverage Diff @@
## master #12075 +/- ##
================================================
- Coverage 80.0859% 79.9433% -0.1426%
================================================
Files 473 473
Lines 116787 116186 -601
================================================
- Hits 93530 92883 -647
- Misses 15943 15978 +35
- Partials 7314 7325 +11 |
9ac89fe
to
6910ffc
Compare
@@ -173,6 +173,12 @@ func (p *LogicalJoin) PredicatePushDown(predicates []expression.Expression) (ret | |||
p.OtherConditions = otherCond | |||
leftCond = leftPushCond | |||
rightCond = rightPushCond | |||
if p.JoinType == AntiSemiJoin { |
I think this might be too strict. Look at the explain test (TPC-H Q21) that this PR has affected:
select
s_name,
count(*) as numwait
from
supplier,
lineitem l1,
orders,
nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'EGYPT'
group by
s_name
After this change, the subquery condition `l3.l_receiptdate > l3.l_commitdate` can no longer be pushed down. This can cause a performance regression.
Although `not exists` and `not in` both use an anti semi join, they are not equivalent. I think we'd better fix this issue for `not in` without affecting the behavior of `not exists`.
Here are some references you can take a look at:
This is how MySQL handles the `exists` strategy: https://dev.mysql.com/doc/refman/8.0/en/subquery-optimization-with-exists.html
And this is how SparkSQL uses a null-aware anti join for `not in` (I prefer this method of rewriting the `not in` condition): https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2728434780191932/1483312212640900/6987336228780374/latest.html
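To illustrate why `not in` and `not exists` are not interchangeable, here is a minimal Go sketch of their NULL behavior. The `Tri` and `Val` types and both functions are hypothetical helpers written for this example; they are not TiDB code.

```go
package main

import "fmt"

// Tri is a SQL three-valued truth value.
type Tri int

const (
	False Tri = iota
	True
	Unknown // SQL NULL
)

// Val is a nullable integer; Null == true stands for SQL NULL.
type Val struct {
	N    int
	Null bool
}

// notIn evaluates `x NOT IN (set)` with SQL NULL semantics.
func notIn(x Val, set []Val) Tri {
	if len(set) == 0 {
		return True // NOT IN over an empty set is always true
	}
	sawNull := x.Null
	for _, v := range set {
		if v.Null {
			sawNull = true
			continue
		}
		if !x.Null && v.N == x.N {
			return False // a definite match
		}
	}
	if sawNull {
		return Unknown // no definite match, but a NULL comparison occurred
	}
	return True
}

// notExists evaluates `NOT EXISTS (SELECT * FROM set WHERE set.v = x)`.
// The WHERE clause keeps only rows whose predicate is definitely true,
// so the result is never Unknown.
func notExists(x Val, set []Val) Tri {
	for _, v := range set {
		if !x.Null && !v.Null && v.N == x.N {
			return False
		}
	}
	return True
}

func main() {
	set := []Val{{N: 2}, {Null: true}}
	x := Val{N: 1}
	fmt.Println("NOT IN is NULL:", notIn(x, set) == Unknown)
	fmt.Println("NOT EXISTS is TRUE:", notExists(x, set) == True)
}
```

Because `not exists` never needs to tell NULL from false, filtering the inner side early is harmless for it, while `not in` depends on seeing the NULL rows.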
Here is how the null-aware anti join works (quoted from Apache Spark):
Expand the NOT IN expression with the NULL-aware semantic to its full form. That is from:
(a1,a2,...) = (b1,b2,...)
to
(a1=b1 OR isnull(a1=b1)) AND (a2=b2 OR isnull(a2=b2)) AND ...
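The expanded condition can be sketched in Go as follows. This is only an illustration of the quoted rewrite, assuming nil pointers stand for SQL NULL; `nullAwareEq`, `tupleMatches`, and `antiJoinKeeps` are hypothetical names, not Spark or TiDB functions.

```go
package main

import "fmt"

// nullAwareEq reports whether (a = b OR isnull(a = b)) holds.
// A nil pointer stands for SQL NULL.
func nullAwareEq(a, b *int) bool {
	if a == nil || b == nil {
		return true // a = b evaluates to NULL, so isnull(a = b) is true
	}
	return *a == *b
}

// tupleMatches applies the expanded condition column by column.
func tupleMatches(a, b []*int) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if !nullAwareEq(a[i], b[i]) {
			return false
		}
	}
	return true
}

// antiJoinKeeps reports whether the left tuple survives an anti join
// that uses the expanded condition as its join predicate.
func antiJoinKeeps(left []*int, right [][]*int) bool {
	for _, r := range right {
		if tupleMatches(left, r) {
			return false
		}
	}
	return true
}

// ptr returns a pointer to n, i.e. a non-NULL value.
func ptr(n int) *int { return &n }

func main() {
	left := []*int{ptr(1), ptr(2)}
	// (1,2) NOT IN ((1,NULL)) is NULL, so the row is filtered out.
	fmt.Println(antiJoinKeeps(left, [][]*int{{ptr(1), nil}}))
	// (1,2) NOT IN ((3,NULL)) is TRUE because 1 = 3 is definitely false.
	fmt.Println(antiJoinKeeps(left, [][]*int{{ptr(3), nil}}))
}
```

With this predicate, a plain anti join drops exactly the left rows for which `NOT IN` would be false or NULL, which is what a `WHERE ... NOT IN` filter requires.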
Thanks, maybe I need to rethink it. Please remove the review request. This commit was too hasty.
You can check the `InOperand` field of the arguments of `RightConditions` to decide whether pushdown should be disabled for a condition. `InOperand` was introduced to differentiate the `AntiSemiJoin` / `LeftAntiSemiJoin` / `AntiLeftOuterSemiJoin` generated from `NOT IN` from those generated from `NOT EXISTS`.
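A rough sketch of that suggestion, using a heavily simplified and hypothetical `Expr` type rather than TiDB's real expression types: conditions that contain an `InOperand` argument stay on the join, and the rest remain eligible for pushdown.

```go
package main

import "fmt"

// Expr is a hypothetical, simplified stand-in for a planner expression.
type Expr struct {
	Name      string
	InOperand bool    // set when the expression is an operand of IN / NOT IN
	Args      []*Expr // child expressions, empty for columns and constants
}

// containsInOperand walks the expression tree and reports whether any
// node is marked as an IN operand.
func containsInOperand(e *Expr) bool {
	if e == nil {
		return false
	}
	if e.InOperand {
		return true
	}
	for _, a := range e.Args {
		if containsInOperand(a) {
			return true
		}
	}
	return false
}

// splitRightConds separates the conditions that may be pushed to the
// inner side from those that must stay on the join.
func splitRightConds(conds []*Expr) (pushable, kept []*Expr) {
	for _, c := range conds {
		if containsInOperand(c) {
			kept = append(kept, c)
		} else {
			pushable = append(pushable, c)
		}
	}
	return pushable, kept
}

func main() {
	// eq(1, t2.c2) as produced by `1 not in (select c2 from t2)`.
	fromNotIn := &Expr{Name: "eq", Args: []*Expr{
		{Name: "1"}, {Name: "t2.c2", InOperand: true},
	}}
	// A plain filter from a NOT EXISTS subquery.
	fromNotExists := &Expr{Name: "gt", Args: []*Expr{
		{Name: "l3.l_receiptdate"}, {Name: "l3.l_commitdate"},
	}}
	pushable, kept := splitRightConds([]*Expr{fromNotIn, fromNotExists})
	fmt.Println(len(pushable), len(kept))
}
```

Under this split, the TPC-H Q21 condition on `l3` would still be pushed down, while the null-sensitive `NOT IN` comparison would not.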
Signed-off-by: jingyugao <1121087373@qq.com>
6910ffc
to
fb776a6
Compare
/rebuild |
└─Projection_22 10000.00 root cast(5_aux_0) | ||
└─HashLeftJoin_21 10000.00 root CARTESIAN left outer semi join, inner:TableReader_20 | ||
└─Projection_21 10000.00 root cast(5_aux_0) | ||
└─HashLeftJoin_20 10000.00 root CARTESIAN left outer semi join, inner:TableReader_19, other cond:eq(6, test.t2.c2) | ||
├─IndexReader_17 10000.00 root index:IndexScan_16 |
Here `eq(6, test.t2.c2)` is moved into the other conditions because it comes from `6 in (subq)`.
node [style=filled, color=lightgrey] | ||
color=black | ||
label = "cop" | ||
"Selection_13" -> "TableScan_12" |
`Selection_13` is `eq(1,t2.c2)`, which comes from `1 in (subq)`.
@@ -123,12 +123,11 @@ set @@session.tidb_opt_insubq_to_join_and_agg=0; | |||
explain select 1 in (select c2 from t2) from t1; | |||
id count task operator info | |||
Projection_6 1999.00 root 5_aux_0 | |||
└─HashLeftJoin_7 1999.00 root CARTESIAN left outer semi join, inner:TableReader_14 | |||
└─HashLeftJoin_7 1999.00 root CARTESIAN left outer semi join, inner:TableReader_13, other cond:eq(1, test.t2.c2) |
`eq(1, test.t2.c2)` comes from `1 in (subq)`.
node [style=filled, color=lightgrey] | ||
color=black | ||
label = "cop" | ||
"Selection_13" -> "TableScan_12" |
`Selection_13` comes from `1 in (subq)`.
executor/join_test.go
Outdated
tk.MustQuery("select 1 in (select b from t2) from t1").Check(testkit.Rows("<nil>")) | ||
tk.MustQuery("select 1 not in (select b from t2) from t1").Check(testkit.Rows("<nil>")) | ||
// TODO: this query will cause an index out of range panic | ||
// tk.MustQuery("select 1 not in (select null from t1) from t2").Check(testkit.Rows()) |
This is another bug that occurs when executing the `physical_plan`, so I can't add this test case.
513bfc1
to
b7f4e5c
Compare
d190c25
to
f8b7619
Compare
@@ -299,7 +303,7 @@ func (p *LogicalJoin) extractOnCondition(conditions []expression.Expression, der | |||
// false even if t.a is null or s.a is null. To make this join "empty aware", | |||
// we should differentiate `t.a = s.a` from other column equal conditions, so | |||
// we put it into OtherConditions instead of EqualConditions of join. | |||
if binop.FuncName.L == ast.EQ && !arg0.InOperand && !arg1.InOperand { | |||
if binop.FuncName.L == ast.EQ { |
Please move the comments here to the front as well.
fixed
expression/expression.go
Outdated
cols := make([]*Column, 0, 1) | ||
cols = ExtractColumnsFromExpressions(cols, sf.GetArgs(), isColumnInOperand) | ||
return len(cols) > 0 | ||
return exprsContainInOperand(sf.GetArgs()) |
This change is unnecessary now?
expression/util.go
Outdated
@@ -114,6 +114,22 @@ func ExtractColumns(expr Expression) []*Column { | |||
return extractColumns(result, expr, nil) | |||
} | |||
|
|||
func exprsContainInOperand(exprs []Expression) bool { |
ditto
Some comments that are not about the main changes.
if ok && mysql.HasNotNullFlag(lCol.GetType().Flag) && mysql.HasNotNullFlag(rCol.GetType().Flag) { | ||
// If both input columns of `!= all / = any` expression are not null, we can treat the expression | ||
// as normal column equal condition. | ||
if mysql.HasNotNullFlag(rCol.GetType().Flag) && expression.NotNull(larg) { |
The conditions under which we can get rid of `in` are limited:
mysql> explain select (t.a in (select s.a from s)) is true from t;
+-------------------------+----------+-----------+-----------------------------------------------------------------------------------------+
| id | count | task | operator info |
+-------------------------+----------+-----------+-----------------------------------------------------------------------------------------+
| Projection_6 | 10000.00 | root | istrue(Column#8) |
| └─HashLeftJoin_7 | 10000.00 | root | CARTESIAN left outer semi join, inner:TableReader_11, other cond:eq(Column#1, Column#4) |
| ├─TableReader_9 | 10000.00 | root | data:TableScan_8 |
| │ └─TableScan_8 | 10000.00 | cop[tikv] | table:t, range:[-inf,+inf], keep order:false, stats:pseudo |
| └─TableReader_11 | 3.00 | root | data:TableScan_10 |
| └─TableScan_10 | 3.00 | cop[tikv] | table:s, range:[-inf,+inf], keep order:false, stats:pseudo |
+-------------------------+----------+-----------+-----------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
An `() is true` above the subquery can also get rid of `in`, so that it can be converted to an inner join.
After taking scalar functions into account, `eq(Column#1, Column#4)` is treated as an equal condition now. But how can it be converted to an inner join?
I find that `() is true` is the only special case where the `in subquery` can be converted to an `exists subquery`. In most other cases, the `in subquery` should return null, false, or true in the projection clause.
So we may ignore the optimization for `() is true` here.
Yes, the code would be cleaner then. We can do it after this PR gets merged.
I find the converted conditions from
|
I also got the same way to store the null-aware conditions, and they cannot be pushed down. Adding a |
@fzhedu |
case *CorrelatedColumn: | ||
return mysql.HasNotNullFlag(l.GetType().Flag) | ||
case *ScalarFunction: | ||
for _, arg := range l.GetArgs() { |
The current implementation may satisfy the needs of `InSubquery`, but for the real `NotNull` property of a scalar function, we need to give at least the `IsNull` signature special consideration.
Yes, for example `arg is true` is not null even when `arg` is null.
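A possible shape for such a check, sketched with a hypothetical `Expr` type and function-name table (neither is TiDB's actual API): predicates such as `isnull` and `istrue` always yield a non-null result, so they are treated as not null regardless of their arguments.

```go
package main

import "fmt"

// Expr is a hypothetical, simplified expression node.
type Expr struct {
	Func       string  // function name; empty for a column
	NotNullCol bool    // for columns: whether the column has a NOT NULL flag
	Args       []*Expr // function arguments
}

// alwaysNotNull lists functions whose result is never NULL, even for
// NULL inputs. The names are illustrative assumptions.
var alwaysNotNull = map[string]bool{
	"isnull":  true,
	"istrue":  true,
	"isfalse": true,
}

// notNull conservatively reports whether e can never evaluate to NULL.
// For other functions it assumes, as a simplification, that non-null
// arguments give a non-null result, which does not hold for every SQL
// function.
func notNull(e *Expr) bool {
	if e.Func == "" {
		return e.NotNullCol
	}
	if alwaysNotNull[e.Func] {
		return true
	}
	for _, a := range e.Args {
		if !notNull(a) {
			return false
		}
	}
	return true
}

func main() {
	nullable := &Expr{NotNullCol: false}
	fmt.Println(notNull(&Expr{Func: "istrue", Args: []*Expr{nullable}}))
	fmt.Println(notNull(&Expr{Func: "plus", Args: []*Expr{nullable}}))
}
```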
Yes. The way in this PR and adding a |
This PR does not seem to change this behavior. The execution plan is the same on master and on this PR.
I found the PR also has errors, please see #13743 |
@jingyugao, please update your pull request. |
@jingyugao PR closed due to no update for a long time. Feel free to reopen it anytime. |
What problem does this PR solve?
TiDB pushes down the right condition for anti semi join.
But if we push down the right condition, we can't tell why there are no matched rows: because the condition evaluated to null, or because it evaluated to false.
Fix #12074.
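The problem can be reproduced in miniature with the Go sketch below. It assumes an inner table that holds a single NULL row and models `1 not in (select b from t2)`; all function names are hypothetical and only illustrate the semantics, not the planner code.

```go
package main

import "fmt"

// eq evaluates `a = b` in SQL three-valued logic and returns
// (value, isNull). A nil pointer stands for SQL NULL.
func eq(a, b *int) (bool, bool) {
	if a == nil || b == nil {
		return false, true
	}
	return *a == *b, false
}

// notIn evaluates `x NOT IN (rows)` and returns (value, isNull).
func notIn(x *int, rows []*int) (bool, bool) {
	sawNull := false
	for _, r := range rows {
		v, null := eq(x, r)
		if null {
			sawNull = true
		} else if v {
			return false, false // definite match
		}
	}
	if sawNull {
		return false, true // result is NULL
	}
	return true, false
}

// pushDown mimics pushing `x = row` to the inner side as a filter:
// only rows for which the predicate is definitely true survive.
func pushDown(x *int, rows []*int) []*int {
	var kept []*int
	for _, r := range rows {
		if v, null := eq(x, r); !null && v {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	one := 1
	inner := []*int{nil} // t2 holds a single NULL row

	_, null := notIn(&one, inner)
	fmt.Println("evaluated on the join, result is NULL:", null)

	v, null := notIn(&one, pushDown(&one, inner))
	fmt.Println("after pushdown, result:", v, "is NULL:", null)
}
```

Once the filter has discarded the NULL row, the join sees an empty inner side and reports true, although the correct answer is NULL.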
What is changed and how it works?
For anti semi join, don't push down the right condition.
Check List
Tests
Code changes
Simple change.
Side effects
After this PR, we can't push down the right condition for anti semi join.
So this might decrease performance.
Related changes
Maybe not
Release note