[fix](nereids) pull up left join right predicate with or is null #58372
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
6c2e7b9 to 3fdb1c7
|
run buildall |
TPC-H: Total hot run time: 34257 ms |
TPC-DS: Total hot run time: 181245 ms |
ClickBench: Total hot run time: 27.86 s |
FE Regression Coverage Report: Increment line coverage |
a051974 to 7658f89
|
run buildall |
// test left join right table predicate pull up
qt_leftjoin_right_pull_up_shape """
explain shape plan select * from extend_infer_t3 t1 left join extend_infer_t4 t2 on t1.a=t2.a left join extend_infer_t5 t3 on t2.a= t3.a where t1.a=1;
add more cases:
- more t1 where conditions: t1.a is not null, t1.a in (1, 2)
- more table joins: t1 left join t2 left join t3 left join t4 ..., with the condition on a middle table: where t2.a = 1
FE UT Coverage Report: Increment line coverage |
TPC-H: Total hot run time: 34239 ms |
TPC-DS: Total hot run time: 183939 ms |
ClickBench: Total hot run time: 27.24 s |
FE Regression Coverage Report: Increment line coverage |
7658f89 to 1d1c12b
|
run buildall |
TPC-H: Total hot run time: 34465 ms |
TPC-DS: Total hot run time: 183958 ms |
ClickBench: Total hot run time: 27.95 s |
|
run buildall |
TPC-H: Total hot run time: 34296 ms |
TPC-DS: Total hot run time: 182316 ms |
ClickBench: Total hot run time: 27.39 s |
FE Regression Coverage Report: Increment line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
…che#58372)

Related PR: apache#41731

Problem Summary:

The optimizer cannot derive predicates across multiple LEFT JOINs. For example, given a filter on the leftmost table in a chain of LEFT JOINs, the optimizer should be able to derive predicates on the rightmost table, but it currently fails to do so.

create table t1(a int, b int);
create table t2(a int, b int);
create table t3(a int, b int);
insert into t1 values(1,2);
insert into t2 values(1,2);
insert into t3 values(1,2);
insert into t3 values(null,2);
explain logical plan select * from t1 left join t2 on t1.a=t2.a left join t3 on t2.a=t3.a where t1.a=1;

LogicalResultSink[110] ( outputExprs=[a#0, b#1, a#2, b#3, a#4, b#5] )
+--LogicalProject[109] ( distinct=false, projects=[a#0, b#1, a#2, b#3, a#4, b#5] )
   +--LogicalJoin[108] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#2 = a#4)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |--LogicalProject[105] ( distinct=false, projects=[a#0, b#1, a#2, b#3] )
      |  +--LogicalJoin[104] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#0 = a#2)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |     |--LogicalFilter[101] ( predicates=(a#0 = 1) )
      |     |  +--LogicalOlapScan ( qualified=internal.maldb.t1, indexName=<index_not_selected>, selectedIndexId=1764043369852, preAgg=ON, operativeCol=[a#0], virtualColumns=[] )
      |     +--LogicalFilter[103] ( predicates=(a#2 = 1) )
      |        +--LogicalOlapScan ( qualified=internal.maldb.t2, indexName=<index_not_selected>, selectedIndexId=1764043369875, preAgg=ON, operativeCol=[a#2], virtualColumns=[] )
      +--LogicalOlapScan ( qualified=internal.maldb.t3, indexName=<index_not_selected>, selectedIndexId=1764043369898, preAgg=ON, operativeCol=[a#4], virtualColumns=[] )

The optimizer should derive t3.a=1 from t1.a=1 and the join conditions, but it currently doesn't.

The root cause is that the PullUpPredicates rule doesn't properly handle predicate pull-up from the right side of LEFT JOINs. This PR fixes this by generating null-tolerant predicates when pulling up from the right table of a LEFT JOIN, and strengthening them when possible based on upper-level join conditions.

After this PR:

LogicalResultSink[110] ( outputExprs=[a#0, b#1, a#2, b#3, a#4, b#5] )
+--LogicalProject[109] ( distinct=false, projects=[a#0, b#1, a#2, b#3, a#4, b#5] )
   +--LogicalJoin[108] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#2 = a#4)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |--LogicalProject[105] ( distinct=false, projects=[a#0, b#1, a#2, b#3] )
      |  +--LogicalJoin[104] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#0 = a#2)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |     |--LogicalFilter[101] ( predicates=(a#0 = 1) )
      |     |  +--LogicalOlapScan ( qualified=internal.maldb.t1, indexName=<index_not_selected>, selectedIndexId=1764043369852, preAgg=ON, operativeCol=[a#0], virtualColumns=[] )
      |     +--LogicalFilter[103] ( predicates=(a#2 = 1) )
      |        +--LogicalOlapScan ( qualified=internal.maldb.t2, indexName=<index_not_selected>, selectedIndexId=1764043369875, preAgg=ON, operativeCol=[a#2], virtualColumns=[] )
      +--LogicalFilter[107] ( predicates=(a#4 = 1) )
         +--LogicalOlapScan ( qualified=internal.maldb.t3, indexName=<index_not_selected>, selectedIndexId=1764043369898, preAgg=ON, operativeCol=[a#4], virtualColumns=[] )
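As a rough sanity check of the before and after plans in the commit message, the sketch below runs the same tables and query in SQLite (standing in for Doris, which is an assumption of this example) and compares the original query with a hand-written form that pre-filters t2 and t3 on a = 1, mirroring the extra filters in the plan. It only checks result equivalence; it does not exercise the Nereids optimizer, and the names `SETUP`, `ORIGINAL`, `PUSHED`, and `compare` are made up for illustration.

```python
import sqlite3

# Tables and rows copied from the commit message; SQLite stands in for Doris.
SETUP = """
create table t1(a int, b int);
create table t2(a int, b int);
create table t3(a int, b int);
insert into t1 values(1,2);
insert into t2 values(1,2);
insert into t3 values(1,2);
insert into t3 values(null,2);
"""

# The query from the commit message.
ORIGINAL = """
select * from t1
left join t2 on t1.a = t2.a
left join t3 on t2.a = t3.a
where t1.a = 1
"""

# Hand-written equivalent of the plan after the fix (an assumption of this
# sketch): t2 and t3 are filtered on a = 1 before being joined.
PUSHED = """
select * from t1
left join (select * from t2 where a = 1) t2 on t1.a = t2.a
left join (select * from t3 where a = 1) t3 on t2.a = t3.a
where t1.a = 1
"""


def compare(extra_setup=""):
    """Run both queries on a fresh in-memory database and return their rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP + extra_setup)
    # Sort by repr so rows containing None can be ordered deterministically.
    original = sorted(conn.execute(ORIGINAL).fetchall(), key=repr)
    pushed = sorted(conn.execute(PUSHED).fetchall(), key=repr)
    conn.close()
    return original, pushed


original, pushed = compare()
print(original == pushed, original)  # True [(1, 2, 1, 2, 1, 2)]
```

Calling `compare("delete from t2;")` covers the null-extended case: with no matching t2 row, both queries return the t1 row padded with NULLs, so filtering t3 early does not change the result.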
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #41731
Problem Summary:
Release note
The optimizer cannot derive predicates across multiple LEFT JOINs. For example, given a filter on the leftmost table in a chain of LEFT JOINs, the optimizer should be able to derive predicates on the rightmost table, but it currently fails to do so.
The optimizer should derive t3.a=1 from t1.a=1 and the join conditions, but it currently doesn't.
The root cause is that the PullUpPredicates rule doesn't properly handle predicate pull-up from the right side of LEFT JOINs. This PR fixes this by generating null-tolerant predicates (the original predicate OR-ed with IS NULL) when pulling up from the right table of a LEFT JOIN, and strengthening them when possible based on upper-level join conditions.
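To illustrate why the pulled-up predicate has to be null-tolerant, and why an upper join condition can strengthen it again, here is a small sketch using SQLite in place of Doris. The extra rows (5, 6) and (7, 8) and the helper names `LOWER` and `count` are assumptions added for this example. It models only the semantics, not the PullUpPredicates implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t1(a int, b int);
create table t2(a int, b int);
create table t3(a int, b int);
insert into t1 values (1, 2), (5, 6);
insert into t2 values (1, 2), (7, 8);
insert into t3 values (1, 2), (7, 8), (null, 2);
""")

# Lower join: t2 is filtered on a = 1, then null-extended by the LEFT JOIN.
LOWER = "t1 left join (select * from t2 where a = 1) t2 on t1.a = t2.a"


def count(sql):
    """Return the single integer produced by a count(*) query."""
    return conn.execute(sql).fetchone()[0]


total = count(f"select count(*) from {LOWER}")
# 't2.a = 1' does not hold on every output row: the unmatched t1 row has t2.a NULL.
strict = count(f"select count(*) from {LOWER} where t2.a = 1")
# The null-tolerant form 't2.a = 1 OR t2.a IS NULL' holds on every output row.
tolerant = count(f"select count(*) from {LOWER} where t2.a = 1 or t2.a is null")

# Upper join condition t2.a = t3.a never matches a NULL t2.a, so every t3 row
# that can match satisfies a = 1; that is the strengthened predicate.
matched_t3 = conn.execute(
    f"select distinct t3.a from {LOWER} join t3 on t2.a = t3.a"
).fetchall()

print(total, strict, tolerant, matched_t3)  # 2 1 2 [(1,)]
```

Because the equality in the upper join rejects NULLs, the `IS NULL` branch can be dropped when deriving a filter for t3, which is consistent with the `a#4 = 1` filter reported in the PR.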
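After this PR, the plan for the example query contains an additional LogicalFilter with predicates=(a#4 = 1) above the t3 scan.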
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)