@feiniaofeiafei
Contributor

picked from #58372

…che#58372)

Related PR: apache#41731

Problem Summary:

The optimizer cannot derive predicates across multiple LEFT JOINs. For
example, given a filter on the leftmost table in a chain of LEFT JOINs,
the optimizer should be able to derive predicates on the rightmost
table, but it currently fails to do so.

create table t1(a int, b int);
create table t2(a int, b int);
create table t3(a int, b int);

insert into t1 values(1,2);
insert into t2 values(1,2);
insert into t3 values(1,2);
insert into t3 values(null,2);

explain logical plan
select * from t1 left join t2 on t1.a=t2.a left join t3 on t2.a=t3.a where t1.a=1;

LogicalResultSink[110] ( outputExprs=[a#0, b#1, a#2, b#3, a#4, b#5] )
+--LogicalProject[109] ( distinct=false, projects=[a#0, b#1, a#2, b#3, a#4, b#5] )
   +--LogicalJoin[108] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#2 = a#4)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |--LogicalProject[105] ( distinct=false, projects=[a#0, b#1, a#2, b#3] )
      |  +--LogicalJoin[104] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#0 = a#2)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |     |--LogicalFilter[101] ( predicates=(a#0 = 1) )
      |     |  +--LogicalOlapScan ( qualified=internal.maldb.t1, indexName=<index_not_selected>, selectedIndexId=1764043369852, preAgg=ON, operativeCol=[a#0], virtualColumns=[] )
      |     +--LogicalFilter[103] ( predicates=(a#2 = 1) )
      |        +--LogicalOlapScan ( qualified=internal.maldb.t2, indexName=<index_not_selected>, selectedIndexId=1764043369875, preAgg=ON, operativeCol=[a#2], virtualColumns=[] )
      +--LogicalOlapScan ( qualified=internal.maldb.t3, indexName=<index_not_selected>, selectedIndexId=1764043369898, preAgg=ON, operativeCol=[a#4], virtualColumns=[] )

The optimizer should derive t3.a=1 from t1.a=1 and the join conditions,
but it currently doesn't.
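
As a quick sanity check that filtering t3 on a=1 does not change the result, the sketch below replays the example against an in-memory SQLite database. SQLite is only a stand-in here (an assumption for illustration, not Doris), and the subquery form of the rewritten query is a hand-written approximation of what the derived filter would look like.

```python
import sqlite3

# In-memory SQLite used purely as a stand-in engine; this is not Doris.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
create table t1(a int, b int);
create table t2(a int, b int);
create table t3(a int, b int);
insert into t1 values(1,2);
insert into t2 values(1,2);
insert into t3 values(1,2);
insert into t3 values(null,2);
""")

# Query from the PR description.
original = cur.execute(
    "select * from t1 left join t2 on t1.a=t2.a "
    "left join t3 on t2.a=t3.a where t1.a=1"
).fetchall()

# Same query with t3 pre-filtered on a=1, mimicking the derived predicate.
rewritten = cur.execute(
    "select * from t1 left join t2 on t1.a=t2.a "
    "left join (select * from t3 where a=1) t3 on t2.a=t3.a where t1.a=1"
).fetchall()

# Both forms should return the same rows.
assert set(original) == set(rewritten)
print(original)
```

The t3 row with a NULL key can never satisfy t2.a=t3.a, so dropping it before the join leaves the output unchanged.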

The root cause is that the PullUpPredicates rule doesn't properly handle
predicate pull-up from the right side of LEFT JOINs. This PR fixes this
by generating null-tolerant predicates when pulling up from a LEFT JOIN's
right table and strengthening them when possible based on upper-level
join conditions.

After this PR:

LogicalResultSink[110] ( outputExprs=[a#0, b#1, a#2, b#3, a#4, b#5] )
+--LogicalProject[109] ( distinct=false, projects=[a#0, b#1, a#2, b#3, a#4, b#5] )
   +--LogicalJoin[108] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#2 = a#4)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |--LogicalProject[105] ( distinct=false, projects=[a#0, b#1, a#2, b#3] )
      |  +--LogicalJoin[104] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#0 = a#2)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |     |--LogicalFilter[101] ( predicates=(a#0 = 1) )
      |     |  +--LogicalOlapScan ( qualified=internal.maldb.t1, indexName=<index_not_selected>, selectedIndexId=1764043369852, preAgg=ON, operativeCol=[a#0], virtualColumns=[] )
      |     +--LogicalFilter[103] ( predicates=(a#2 = 1) )
      |        +--LogicalOlapScan ( qualified=internal.maldb.t2, indexName=<index_not_selected>, selectedIndexId=1764043369875, preAgg=ON, operativeCol=[a#2], virtualColumns=[] )
      +--LogicalFilter[107] ( predicates=(a#4 = 1) )
         +--LogicalOlapScan ( qualified=internal.maldb.t3, indexName=<index_not_selected>, selectedIndexId=1764043369898, preAgg=ON, operativeCol=[a#4], virtualColumns=[] )
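
The derivation behind the new LogicalFilter[107] can be sketched as a toy model. This is not the PullUpPredicates code from Doris: the `Eq` class and both function names are hypothetical, and "null-tolerant" is assumed here to mean `slot = value OR slot IS NULL`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eq:
    """Toy predicate: slot = value, optionally widened with 'OR slot IS NULL'."""
    slot: str
    value: int
    null_tolerant: bool = False


def pull_up_from_left_join_right(preds):
    # Above a LEFT JOIN, right-side slots may be NULL-padded, so a predicate
    # from the right child only holds in its null-tolerant form.
    return [Eq(p.slot, p.value, null_tolerant=True) for p in preds]


def derive_for_right_child(pulled, hash_conjuncts):
    # An upper equi-join condition left_slot = right_slot never matches when
    # left_slot is NULL, so the IS NULL branch can be dropped (strengthened)
    # and the constant transferred to right_slot.
    derived = []
    for left_slot, right_slot in hash_conjuncts:
        for p in pulled:
            if p.slot == left_slot:
                derived.append(Eq(right_slot, p.value))
    return derived


# Slots follow the plan above: a#2 is t2.a, a#4 is t3.a.
t2_filter = [Eq("a#2", 1)]
pulled = pull_up_from_left_join_right(t2_filter)
t3_filter = derive_for_right_child(pulled, [("a#2", "a#4")])
print(t3_filter)
```

In this model the filter a#2 = 1 on t2 is pulled above LogicalJoin[104] as a null-tolerant predicate, and the join condition a#2 = a#4 of LogicalJoin[108] turns it into a strict a#4 = 1 for the t3 side.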
@Thearas
Contributor

Thearas commented Dec 3, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@feiniaofeiafei
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 64.29% (63/98) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 3, 2025
@github-actions
Contributor

github-actions bot commented Dec 3, 2025

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 6aeeee3 into apache:branch-4.0 Dec 3, 2025
25 of 27 checks passed
Labels

approved Indicates a PR has been approved by one committer. reviewed
