TiDB logical optimization: extend outer join elimination #7559
If nobody else is available to take this, I can try to implement it. |
This proposal is still being reviewed. We all need to agree on the design first and then implement it. We would be really happy if you could jump in and contribute to our project. |
@tianjiqx The first example:
TiDB(localhost:4000) > desc select t1.c1,t2.c1,t3.c1 from t1 left join t2 on t1.c1=t2.c2 left join t3 on t2.c1=t3.c2 where t3.c3< 5;
+------------------------------+----------+------+---------------------------------------------------------------------------+
| id | count | task | operator info |
+------------------------------+----------+------+---------------------------------------------------------------------------+
| Projection_8 | 5192.71 | root | test.t1.c1, test.t2.c1, test.t3.c1 |
| └─HashLeftJoin_9 | 5192.71 | root | inner join, inner:TableReader_18, equal:[eq(test.t2.c1, test.t3.c2)] |
| ├─HashLeftJoin_11 | 12500.00 | root | left outer join, inner:TableReader_15, equal:[eq(test.t1.c1, test.t2.c2)] |
| │ ├─TableReader_13 | 10000.00 | root | data:TableScan_12 |
| │ │ └─TableScan_12 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
| │ └─TableReader_15 | 10000.00 | root | data:TableScan_14 |
| │ └─TableScan_14 | 10000.00 | cop | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo |
| └─TableReader_18 | 3323.33 | root | data:Selection_17 |
| └─Selection_17 | 3323.33 | cop | lt(test.t3.c3, 5) |
| └─TableScan_16 | 10000.00 | cop | table:t3, range:[-inf,+inf], keep order:false, stats:pseudo |
+------------------------------+----------+------+---------------------------------------------------------------------------+
10 rows in set (0.00 sec)
Because the filter
The two filters are:
With the derived filter
For now, filter derivation is not applied, so the simplification of the second outer join fails. This can be improved in the current planner. Filter derivation based on the join condition has already been proposed by @bb7133 in this PR: #7276 |
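The null-rejection test that this comment relies on can be sketched in a few lines of Go. This is a minimal, conservative illustration and not TiDB's isNullRejected: the Expr type, the Cmp/IsNull/And/Or/Not constructors, and NullRejected are all hypothetical names introduced here. The idea is to ask whether a predicate could still evaluate to TRUE when every column of the given tables is NULL; if it cannot, the predicate rejects the NULL-padded rows an outer join would produce.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a toy predicate tree. All names are illustrative, not TiDB's.
type Expr struct {
	Op   string   // "cmp", "isnull", "and", "or", "not"
	Cols []string // qualified columns such as "t3.c3" (for "cmp" and "isnull")
	Args []*Expr  // operands (for "and", "or", "not")
}

func Cmp(cols ...string) *Expr { return &Expr{Op: "cmp", Cols: cols} }
func IsNull(col string) *Expr  { return &Expr{Op: "isnull", Cols: []string{col}} }
func And(a, b *Expr) *Expr     { return &Expr{Op: "and", Args: []*Expr{a, b}} }
func Or(a, b *Expr) *Expr      { return &Expr{Op: "or", Args: []*Expr{a, b}} }
func Not(a *Expr) *Expr        { return &Expr{Op: "not", Args: []*Expr{a}} }

// fromInner reports whether col belongs to a table whose columns are all NULL.
func fromInner(col string, inner map[string]bool) bool {
	return inner[strings.SplitN(col, ".", 2)[0]]
}

// canBe reports whether e might evaluate to want (TRUE or FALSE) when every
// column of an inner table is NULL. It errs on the side of answering yes.
func canBe(e *Expr, inner map[string]bool, want bool) bool {
	switch e.Op {
	case "cmp":
		// A comparison with a NULL operand is NULL: neither TRUE nor FALSE.
		for _, c := range e.Cols {
			if fromInner(c, inner) {
				return false
			}
		}
		return true
	case "isnull":
		if fromInner(e.Cols[0], inner) {
			return want // IS NULL on a NULL column is always TRUE
		}
		return true
	case "and", "or":
		l := canBe(e.Args[0], inner, want)
		r := canBe(e.Args[1], inner, want)
		// AND is TRUE only if both sides are; OR is FALSE only if both sides are.
		if (e.Op == "and") == want {
			return l && r
		}
		return l || r
	case "not":
		return canBe(e.Args[0], inner, !want)
	}
	return true // unknown operator: assume anything is possible
}

// NullRejected reports whether e can never be TRUE once all columns of the
// given tables are NULL.
func NullRejected(e *Expr, tables ...string) bool {
	inner := make(map[string]bool, len(tables))
	for _, t := range tables {
		inner[t] = true
	}
	return !canBe(e, inner, true)
}

func main() {
	where := Cmp("t3.c3")       // stands for: t3.c3 < 5
	on := Cmp("t2.c1", "t3.c2") // stands for: t2.c1 = t3.c2
	fmt.Println(NullRejected(where, "t3"))
	fmt.Println(NullRejected(on, "t2"))
	fmt.Println(NullRejected(IsNull("t3.c3"), "t3"))
}
```

In the first example's query, the WHERE filter rejects NULL-padded t3 rows, and once that join is an inner join its ON condition plays the same role for t2, which is the derivation step discussed above.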
@zhexuany This optimization can be done in the current planner; there is no need to wait for the new planner to be implemented. |
Good |
@tianjiqx It would be great if you can investigate the code logic about the predicate push down on the |
Additional:
1. A predicate in the WHERE clause that references two tables is also null-rejecting.
TiDB does not handle this at present.
It can be rewritten into the following form (the left join is reduced to an inner join, and the WHERE predicate is pushed down into the join's ON condition). =>
The EXPLAIN plan for this SQL is from PostgreSQL (a hash join represents an inner join; an outer join is shown as hash right join/hash left join).
Sometimes the predicate can be pushed down to the base table through equivalence inference.
2. Maybe another thing to note is that the use of () changes the join order.
PostgreSQL can reduce all of the left joins:
The left join reduction proceeds like this:
TiDB retains one left join:
|
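The first point can be checked on toy data. The sketch below assumes a query shaped like `select * from t1 left join t2 on t1.c1=t2.c2 where t1.c3=t2.c3` (the same shape as example 4 later in this thread) and compares it with the inner join that folds the WHERE predicate into the ON condition. The Int, Row, and Pair types and both join functions are hypothetical helpers written for this illustration, not part of TiDB or PostgreSQL.

```go
package main

import "fmt"

// Int is a nullable integer; Valid == false stands for SQL NULL.
type Int struct {
	V     int
	Valid bool
}

// N wraps a non-NULL value.
func N(v int) Int { return Int{V: v, Valid: true} }

// Null is the SQL NULL value.
var Null = Int{}

// Row holds columns c1, c2, c3 of one toy table.
type Row struct{ C1, C2, C3 Int }

// Pair is one output row of a two-table join; R is all NULL when padded.
type Pair struct{ L, R Row }

// eq answers "is a = b TRUE?". A comparison involving NULL is never TRUE.
func eq(a, b Int) bool { return a.Valid && b.Valid && a.V == b.V }

// LeftJoinThenWhere evaluates:
//   t1 LEFT JOIN t2 ON t1.c1 = t2.c2 WHERE t1.c3 = t2.c3
func LeftJoinThenWhere(t1, t2 []Row) []Pair {
	var out []Pair
	for _, l := range t1 {
		matched := false
		for _, r := range t2 {
			if eq(l.C1, r.C2) {
				matched = true
				if eq(l.C3, r.C3) {
					out = append(out, Pair{l, r})
				}
			}
		}
		if !matched {
			padded := Row{} // every column is NULL
			// The WHERE filter compares against NULL, so this never passes.
			if eq(l.C3, padded.C3) {
				out = append(out, Pair{l, padded})
			}
		}
	}
	return out
}

// InnerJoin evaluates:
//   t1 JOIN t2 ON t1.c1 = t2.c2 AND t1.c3 = t2.c3
func InnerJoin(t1, t2 []Row) []Pair {
	var out []Pair
	for _, l := range t1 {
		for _, r := range t2 {
			if eq(l.C1, r.C2) && eq(l.C3, r.C3) {
				out = append(out, Pair{l, r})
			}
		}
	}
	return out
}

// Same reports whether two results hold the same rows in the same order.
func Same(a, b []Pair) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	t1 := []Row{{N(1), Null, N(10)}, {N(2), Null, N(20)}, {Null, Null, N(30)}, {N(3), Null, Null}}
	t2 := []Row{{Null, N(1), N(10)}, {Null, N(1), N(11)}, {Null, N(2), Null}, {Null, Null, N(30)}}
	fmt.Println(Same(LeftJoinThenWhere(t1, t2), InnerJoin(t1, t2)))
}
```

The padded branch in LeftJoinThenWhere is the whole argument: a NULL-padded t2 row can never satisfy a comparison on t2.c3, so the left join contributes nothing that the inner join would not.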
Sorry, I have been a little busy recently and can only reply now. @zz-jason I reviewed the code logic for predicate push down on the join operator in PostgreSQL (including reduce_outer_joins() and distribute_qual_to_rels()). However, I don't fully understand it. It seems that it first completes the outer join elimination and then performs the predicate push down. I want to illustrate the elimination of outer joins with examples. All of the following push down cases are handled correctly in TiDB.
About predicate (filter) push down
A predicate can appear in four places
A predicate (filter) can be mounted in three places
(I) For inner join:
It means:
(II) For left join:
|
I analyzed the process of predicate push down in TiDB and then tried to improve the outer join elimination. It seems to work, and is hopefully bug free. I didn't change much code and only did two things.
func simplifyOuterJoin(p *LogicalJoin, predicates []expression.Expression) {
	if p.JoinType != LeftOuterJoin && p.JoinType != RightOuterJoin && p.JoinType != InnerJoin {
		return
	}

	innerTable := p.children[0]
	outerTable := p.children[1]
	if p.JoinType == LeftOuterJoin {
		innerTable, outerTable = outerTable, innerTable
	}

	var fullConditions []expression.Expression

	// first simplify embedded outer join.
	// When trying to simplify an embedded outer join operation in a query,
	// we must take into account the join condition for the embedding outer join together with the WHERE condition.
	if innerPlan, ok := innerTable.(*LogicalJoin); ok {
		fullConditions = concatOnAndWhereConds(p, predicates)
		simplifyOuterJoin(innerPlan, fullConditions)
	}
	// del by qx [simplify outerJoin] 20180906 :b
	/*
		if outerPlan, ok := outerTable.(*LogicalJoin); ok {
			if fullConditions != nil {
				fullConditions = concatOnAndWhereConds(p, predicates)
			}
			simplifyOuterJoin(outerPlan, fullConditions)
		}
	*/
	// del :e

	if p.JoinType == InnerJoin {
		// also can generate join column is not null condition, omitted // add by qx 20180906
		return
	}

	// then simplify embedding outer join.
	canBeSimplified := false
	for _, expr := range predicates {
		isOk := isNullRejected(p.ctx, innerTable.Schema(), expr)
		if isOk {
			canBeSimplified = true
			break
		}
	}

	// add by qx [simplify outerJoin] 20180906:b
	var nullExprs = make([]expression.Expression, 0, 0)
	// add :e
	if canBeSimplified {
		p.JoinType = InnerJoin
		// add by qx [simplify outerJoin] 20180906:b
		// generate join column is not null predicate
		for _, expr := range p.EqualConditions {
			for _, arg := range expr.GetArgs() {
				col, ok := arg.(*expression.Column)
				if ok {
					args := make([]expression.Expression, 2)
					args[0] = col
					args[1] = expression.Null
					nullExpr := expression.NewFunctionInternal(p.ctx, "ne", types.NewFieldType(mysql.TypeTiny), args...)
					nullExprs = append(nullExprs, nullExpr)
					// predicates = append(predicates, nullExpr) // can't work
				}
			}
		}
		// add :e
	}

	// add by qx [simplify outerJoin] 20180906:b
	if outerPlan, ok := outerTable.(*LogicalJoin); ok {
		if fullConditions != nil {
			fullConditions = concatOnAndWhereConds(p, predicates)
		}
		if len(nullExprs) != 0 {
			if fullConditions != nil {
				fullConditions = append(fullConditions, nullExprs...)
			} else {
				fullConditions = nullExprs
			}
		}
		simplifyOuterJoin(outerPlan, fullConditions)
	}
	// add :e
}
Results: What I have changed is not perfect.
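The ordering idea behind this change (convert the embedding join first, then hand the not-null facts implied by its equality condition to the child joins) can be shown on a toy plan tree that runs without TiDB. This is a simplified sketch under stated assumptions, not the patch itself: Plan, Scan, Join, and Simplify are hypothetical names, only left and inner joins are modeled, and null rejection is tracked per table rather than per expression.

```go
package main

import "fmt"

// Plan is a toy logical plan node. All names are illustrative, not TiDB's.
type Plan interface{ Tables() []string }

// Scan reads one base table.
type Scan struct{ Name string }

func (s *Scan) Tables() []string { return []string{s.Name} }

// Join models "Left <Kind> JOIN Right ON <equality over OnTables>".
// For Kind == "left", Right is the NULL-supplied side.
type Join struct {
	Kind        string // "left" or "inner"
	Left, Right Plan
	OnTables    []string // tables referenced by the equality join condition
}

func (j *Join) Tables() []string {
	return append(j.Left.Tables(), j.Right.Tables()...)
}

// union returns a copy of set with extra added.
func union(set map[string]bool, extra []string) map[string]bool {
	out := make(map[string]bool, len(set)+len(extra))
	for t := range set {
		out[t] = true
	}
	for _, t := range extra {
		out[t] = true
	}
	return out
}

// Simplify turns left joins into inner joins, top-down. rejected holds the
// tables whose NULL-padded rows are filtered out by some predicate above p.
func Simplify(p Plan, rejected map[string]bool) {
	j, ok := p.(*Join)
	if !ok {
		return
	}
	// 1. Simplify the embedding join first.
	if j.Kind == "left" {
		for _, t := range j.Right.Tables() {
			if rejected[t] {
				j.Kind = "inner"
				break
			}
		}
	}
	// 2. An equality condition is never TRUE on NULL, so it rejects
	//    NULL-padded rows of every table it references.
	withOn := union(rejected, j.OnTables)
	if j.Kind == "inner" {
		Simplify(j.Left, withOn) // inner join: ON filters both sides
	} else {
		Simplify(j.Left, rejected) // left join: ON never removes left rows
	}
	Simplify(j.Right, withOn)
}

func main() {
	// (t1 LEFT JOIN t2 ON t1.c1=t2.c2) LEFT JOIN t3 ON t2.c1=t3.c2 WHERE t3.c3<5
	lower := &Join{Kind: "left", Left: &Scan{"t1"}, Right: &Scan{"t2"}, OnTables: []string{"t1", "t2"}}
	top := &Join{Kind: "left", Left: lower, Right: &Scan{"t3"}, OnTables: []string{"t2", "t3"}}
	Simplify(top, map[string]bool{"t3": true})
	fmt.Println(top.Kind, lower.Kind)
}
```

Processing the embedding join before its left child is the design choice that matters here: the lower join can only be simplified after the upper one has become an inner join and its ON condition counts as a filter.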
|
@tianjiqx Could you please open a pull request for your code change? A GitHub issue is not that friendly for following code changes. Thanks. |
@eurekaka |
It seems that these examples have been resolved in the current version.
MySQL [test]> select tidb_version();
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------+
| tidb_version()
|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------+
| Release Version: v4.0.0-beta.2-135-g2a31878c5
Git Commit Hash: 2a31878c530c050775a67ca8beb1c819e80c1764
Git Branch: master
UTC Build Time: 2020-03-31 07:58:52
GoVersion: go1.14.1
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
MySQL [test]>
MySQL [test]> -- example 1
MySQL [test]> explain select t1.c1, t2.c1, t3.c1 from t1 left join t2 on t1.c1=t2.c2 left join t3 on t2.c1=t3.c2 where t3.c3<5;
+--------------------------------+---------+-----------+---------------+------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------+---------+-----------+---------------+------------------------------------------------+
| HashJoin_9 | 1.56 | root | | inner join, equal:[eq(test.t2.c1, test.t3.c2)] |
| ├─TableReader_29(Build) | 1.00 | root | | data:Selection_28 |
| │ └─Selection_28 | 1.00 | cop[tikv] | | lt(test.t3.c3, 5), not(isnull(test.t3.c2)) |
| │ └─TableFullScan_27 | 3.00 | cop[tikv] | table:t3 | keep order:false, stats:pseudo |
| └─HashJoin_20(Probe) | 5.00 | root | | inner join, equal:[eq(test.t1.c1, test.t2.c2)] |
| ├─TableReader_24(Build) | 4.00 | root | | data:Selection_23 |
| │ └─Selection_23 | 4.00 | cop[tikv] | | not(isnull(test.t2.c2)) |
| │ └─TableFullScan_22 | 4.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
| └─TableReader_26(Probe) | 5.00 | root | | data:TableFullScan_25 |
| └─TableFullScan_25 | 5.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+--------------------------------+---------+-----------+---------------+------------------------------------------------+
10 rows in set (0.00 sec)
MySQL [test]>
MySQL [test]> -- example 2
MySQL [test]> explain select t1.c1,t3.c1,t4.c1 from t1 left join t3 on t1.c1=t3.c2 left join t4 on t1.c1=t4.c2 where t4.c3<5;
+--------------------------------+---------+-----------+---------------+-----------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------+---------+-----------+---------------+-----------------------------------------------------+
| HashJoin_9 | 1.66 | root | | inner join, equal:[eq(test.t1.c1, test.t4.c2)] |
| ├─TableReader_20(Build) | 1.33 | root | | data:Selection_19 |
| │ └─Selection_19 | 1.33 | cop[tikv] | | lt(test.t4.c3, 5), not(isnull(test.t4.c2)) |
| │ └─TableFullScan_18 | 4.00 | cop[tikv] | table:t4 | keep order:false, stats:pseudo |
| └─HashJoin_11(Probe) | 5.00 | root | | left outer join, equal:[eq(test.t1.c1, test.t3.c2)] |
| ├─TableReader_17(Build) | 3.00 | root | | data:Selection_16 |
| │ └─Selection_16 | 3.00 | cop[tikv] | | not(isnull(test.t3.c2)) |
| │ └─TableFullScan_15 | 3.00 | cop[tikv] | table:t3 | keep order:false, stats:pseudo |
| └─TableReader_14(Probe) | 5.00 | root | | data:TableFullScan_13 |
| └─TableFullScan_13 | 5.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+--------------------------------+---------+-----------+---------------+-----------------------------------------------------+
10 rows in set (0.00 sec)
MySQL [test]>
MySQL [test]> -- example 3
MySQL [test]> explain select t1.c1, t3.* from t1 left join t3 on t1.c1=t3.c2 inner join t5 on t3.c1=t5.c2;
+------------------------------------+---------+-----------+---------------+------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------------+---------+-----------+---------------+------------------------------------------------+
| Projection_10 | 4.68 | root | | test.t1.c1, test.t3.c1, test.t3.c2, test.t3.c3 |
| └─HashJoin_21 | 4.68 | root | | inner join, equal:[eq(test.t3.c2, test.t1.c1)] |
| ├─HashJoin_34(Build) | 3.75 | root | | inner join, equal:[eq(test.t3.c1, test.t5.c2)] |
| │ ├─TableReader_40(Build) | 3.00 | root | | data:Selection_39 |
| │ │ └─Selection_39 | 3.00 | cop[tikv] | | not(isnull(test.t3.c2)) |
| │ │ └─TableFullScan_38 | 3.00 | cop[tikv] | table:t3 | keep order:false, stats:pseudo |
| │ └─TableReader_37(Probe) | 5.00 | root | | data:Selection_36 |
| │ └─Selection_36 | 5.00 | cop[tikv] | | not(isnull(test.t5.c2)) |
| │ └─TableFullScan_35 | 5.00 | cop[tikv] | table:t5 | keep order:false, stats:pseudo |
| └─TableReader_42(Probe) | 5.00 | root | | data:TableFullScan_41 |
| └─TableFullScan_41 | 5.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+------------------------------------+---------+-----------+---------------+------------------------------------------------+
11 rows in set (0.00 sec)
MySQL [test]>
MySQL [test]> -- example 4
MySQL [test]> explain select * from t1 left join t2 on t1.c1=t2.c2 where t1.c3=t2.c3;
+------------------------------+---------+-----------+---------------+---------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+---------+-----------+---------------+---------------------------------------------------------------------------+
| HashJoin_18 | 4.99 | root | | inner join, equal:[eq(test.t1.c1, test.t2.c2) eq(test.t1.c3, test.t2.c3)] |
| ├─TableReader_22(Build) | 3.99 | root | | data:Selection_21 |
| │ └─Selection_21 | 3.99 | cop[tikv] | | not(isnull(test.t2.c2)), not(isnull(test.t2.c3)) |
| │ └─TableFullScan_20 | 4.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
| └─TableReader_25(Probe) | 5.00 | root | | data:Selection_24 |
| └─Selection_24 | 5.00 | cop[tikv] | | not(isnull(test.t1.c3)) |
| └─TableFullScan_23 | 5.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+------------------------------+---------+-----------+---------------+---------------------------------------------------------------------------+
7 rows in set (0.00 sec)
MySQL [test]>
MySQL [test]> -- example 5
MySQL [test]> explain select * from (t1 left join t2 on t1.c1=t2.c2) left join (t3 left join t4 on t3.c2=t4.c1) on t2.c2=t3.c2 where t4.c2<10;
+----------------------------------+---------+-----------+---------------+------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------------+---------+-----------+---------------+------------------------------------------------------------------------------+
| HashJoin_11 | 2.60 | root | | inner join, equal:[eq(test.t2.c2, test.t3.c2)] |
| ├─IndexMergeJoin_37(Build) | 1.66 | root | | inner join, inner:TableReader_35, outer key:test.t3.c2, inner key:test.t4.c1 |
| │ ├─TableReader_44(Build) | 3.00 | root | | data:Selection_43 |
| │ │ └─Selection_43 | 3.00 | cop[tikv] | | not(isnull(test.t3.c2)) |
| │ │ └─TableFullScan_42 | 3.00 | cop[tikv] | table:t3 | keep order:false, stats:pseudo |
| │ └─TableReader_35(Probe) | 0.33 | root | | data:Selection_34 |
| │ └─Selection_34 | 0.33 | cop[tikv] | | lt(test.t4.c2, 10) |
| │ └─TableRangeScan_33 | 1.00 | cop[tikv] | table:t4 | range: decided by [test.t3.c2], keep order:true, stats:pseudo |
| └─HashJoin_22(Probe) | 5.00 | root | | inner join, equal:[eq(test.t1.c1, test.t2.c2)] |
| ├─TableReader_26(Build) | 4.00 | root | | data:Selection_25 |
| │ └─Selection_25 | 4.00 | cop[tikv] | | not(isnull(test.t2.c2)) |
| │ └─TableFullScan_24 | 4.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
| └─TableReader_28(Probe) | 5.00 | root | | data:TableFullScan_27 |
| └─TableFullScan_27 | 5.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+----------------------------------+---------+-----------+---------------+------------------------------------------------------------------------------+
14 rows in set (0.00 sec) |
TiDB (Release Version: v2.1.0-beta-171-g7223353) supports outer join elimination, but it can be extended further.