Conversation

@yujun777
Contributor

@yujun777 yujun777 commented Nov 20, 2025

What problem does this PR solve?

cherry-pick: #56424, #56469, #56756, #56932, #57025, #56899, #56941, #57537, #58451, #58475, #58430, #59671, #59818

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@yujun777 yujun777 requested a review from yiguolei as a code owner November 20, 2025 08:50
@Thearas
Contributor

Thearas commented Nov 20, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yujun777
Contributor Author

run buildall

@yujun777
Contributor Author

run feut

@yujun777 yujun777 marked this pull request as draft November 21, 2025 02:49
@yujun777 yujun777 changed the title branch-4.0: [feat](nereids) optimize case when expression branch-4.0: [draft](nereids) optimize case when expression Nov 21, 2025
@yujun777 yujun777 force-pushed the branch-4.0-opt-case-when branch 2 times, most recently from 97be4b8 to 73887a8 Compare December 2, 2025 03:21
@yujun777
Contributor Author

yujun777 commented Dec 2, 2025

run buildall

@yujun777
Contributor Author

yujun777 commented Dec 3, 2025

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 73.57% (1055/1434) 🎉
Increment coverage report
Complete coverage report

@yujun777 yujun777 changed the title branch-4.0: [draft](nereids) optimize case when expression branch-4.0: [draft](case when) optimize case when expression Dec 3, 2025
@yujun777 yujun777 force-pushed the branch-4.0-opt-case-when branch from 0d041bf to fc9b42e Compare December 8, 2025 03:25
@yujun777
Contributor Author

yujun777 commented Dec 8, 2025

run buildall

@yujun777 yujun777 force-pushed the branch-4.0-opt-case-when branch from fc9b42e to e744db0 Compare January 9, 2026 07:27
@yujun777
Contributor Author

yujun777 commented Jan 9, 2026

run buildall

@yujun777 yujun777 force-pushed the branch-4.0-opt-case-when branch from e744db0 to f8cbbea Compare January 9, 2026 07:37
@yujun777
Contributor Author

yujun777 commented Jan 9, 2026

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 73.84% (1067/1445) 🎉
Increment coverage report
Complete coverage report

…ache#56424)

For a case when condition, a result of NULL and a result of FALSE have
the same effect: the branch is not hit.

In most cases a NULL literal cannot be folded in a logical expression; for
example, `null and a = 1` and `null or a = 1` cannot be folded.
But FALSE can be folded: `false and a = 1` folds to
false, and `false or a = 1` folds to `a = 1`.

So if we replace NULL with FALSE in a case when condition, the
expression may fold to something simpler.

In fact, in a case/if condition, NULL can be replaced with FALSE when it is
the expression root or when all its ancestors are AND / OR / CASE / IF
conditions; this rewrite does not change whether the branch is hit.

For example:

For the SQL 'case when null and a > 1 then ...':
1. this rule rewrites it to 'case when false and a > 1 then ...',
2. then the constant folding rule rewrites it to 'case when false then ...',
3. then case when removes this branch since its condition is false.
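
The equivalence claimed above can be checked with a small three-valued-logic sketch. The helper names `and3`, `or3` and `hits` are illustrative and are not Doris code; `None` stands for SQL NULL.

```python
# Three-valued logic helpers; None models SQL NULL.
def and3(x, y):
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def or3(x, y):
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def hits(cond):
    """A CASE WHEN / IF branch is taken only when its condition is TRUE."""
    return cond is True


def branch_hit_unchanged():
    """Swap a NULL literal for FALSE under AND / OR and compare branch hits."""
    for p in (True, False, None):  # every possible value of `a > 1`
        if hits(and3(None, p)) != hits(and3(False, p)):
            return False
        if hits(or3(None, p)) != hits(or3(False, p)):
            return False
    return True
```

The value of the condition can change (NULL becomes FALSE), but whether the branch is hit does not, which is all a case when condition observes.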
…to true/false (apache#56469)

For a nested case when, replace a condition in the inner case with
TRUE/FALSE when the same condition also appears in the outer case:

1. If it appears in the outer case's current branch condition, replace it with TRUE:
case when A then (case when A and B then 1 else 2 end)
 ...
end

then condition A in the inner case is replaced with TRUE:
case when A then  (case when TRUE and B then 1 else 2 end)
...
end

2. If it appears in a previous branch condition of the outer case, replace it with FALSE:
case when A then C
     when B then (case when A and D then 1 else 2 end)
 ...
end

then condition A in the inner case is replaced with FALSE:
case when A then C
     when B then (case when FALSE and D then 1 else 2 end)
...
end
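
The two replacements above can be sketched on a toy expression tree. This is a minimal illustration under assumed encodings, not the Nereids rule: a case is the tuple `("case", [(cond, value), ...], default)`, conditions are atom strings or `("and"/"or", ...)` tuples, and all function names are hypothetical. Only atoms used directly as an outer branch condition are tracked.

```python
def rewrite_cond(cond, known):
    """Replace atoms with a known outcome by True/False, only under AND / OR."""
    if isinstance(cond, str):
        return known.get(cond, cond)
    if isinstance(cond, tuple) and cond[0] in ("and", "or"):
        return (cond[0],) + tuple(rewrite_cond(c, known) for c in cond[1:])
    return cond


def rewrite_value(value, known):
    """Only nested cases are rewritten; plain values are left untouched."""
    if isinstance(value, tuple) and value[0] == "case":
        return rewrite_case(value, known)
    return value


def rewrite_case(case, known=None):
    known = dict(known or {})
    _, branches, default = case
    new_branches = []
    for cond, value in branches:
        new_cond = rewrite_cond(cond, known)
        hit = dict(known)
        if isinstance(cond, str):
            hit[cond] = True  # inside this branch the condition was hit
        new_branches.append((new_cond, rewrite_value(value, hit)))
        if isinstance(cond, str):
            known[cond] = False  # later branches run only if it was not hit
    return ("case", new_branches, rewrite_value(default, known))


current = ("case", [("A", ("case", [(("and", "A", "B"), 1)], 2))], 0)
previous = ("case", [("A", "C"),
                     ("B", ("case", [(("and", "A", "D"), 1)], 2))], 0)
```

Replacement is restricted to condition positions on purpose: "not hit" means FALSE or NULL, so writing FALSE is only safe where NULL and FALSE behave the same.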

This PR also optimizes folding of case when and if expressions. For a case
when / if expression, if all branch values are equal, rewrite it to
that value.
For a boolean case when expression, if the results of all its when clauses
are true / false literals, rewrite the case when to an AND
/ OR expression.

for example:

case when a = 1 then true when b = 1 then false else c = 1 end
rewrite to:  (a = 1) <=> true  or (not((b = 1) <=> true) and c = 1)

if (a = 1, true,  b = 1)
rewrite to:  (a=1) <=> true or b = 1
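
Both rewrites in this example can be verified exhaustively over TRUE / FALSE / NULL. The sketch below is illustrative only: `None` stands for NULL, `x is True` models `x <=> true`, and the function names are assumptions.

```python
from itertools import product


def and3(x, y):
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def or3(x, y):
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def original_case(a1, b1, c1):
    # case when a = 1 then true when b = 1 then false else c = 1 end
    if a1 is True:
        return True
    if b1 is True:
        return False
    return c1


def rewritten_case(a1, b1, c1):
    # (a = 1) <=> true or (not((b = 1) <=> true) and c = 1)
    return or3(a1 is True, and3(b1 is not True, c1))


def original_if(a1, b1):
    # if(a = 1, true, b = 1)
    return True if a1 is True else b1


def rewritten_if(a1, b1):
    # (a = 1) <=> true or b = 1
    return or3(a1 is True, b1)


VALUES = (True, False, None)
case_ok = all(original_case(*v) is rewritten_case(*v)
              for v in product(VALUES, repeat=3))
if_ok = all(original_if(*v) is rewritten_if(*v)
            for v in product(VALUES, repeat=2))
```

The `<=> true` wrapper matters: without it, a NULL in `a = 1` would leak into the OR instead of simply not hitting the branch.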
1. Add constant folding for the nullif function;
2. Optimize nvl folding:   `nvl(a, a)`  => `a`,  `nvl(a, null)` => `a`;
3. Make AggregateFunction and TableGeneratingFunction non-foldable.
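
A minimal sketch of why the nvl folds in item 2 are safe, using the standard SQL semantics of NVL and NULLIF. The Python functions are stand-ins written for this illustration, with `None` as NULL.

```python
def nvl(a, b):
    """NVL(a, b): b when a is NULL, otherwise a."""
    return b if a is None else a


def nullif(a, b):
    """NULLIF(a, b): NULL when a = b is TRUE, otherwise a."""
    if a is not None and b is not None and a == b:
        return None
    return a


samples = (None, 0, 1, "x")
nvl_same_ok = all(nvl(a, a) == a for a in samples)     # nvl(a, a)    => a
nvl_null_ok = all(nvl(a, None) == a for a in samples)  # nvl(a, null) => a
```

With constant arguments, `nullif` can be evaluated at plan time in the same way, for example `nullif(1, 1)` to NULL and `nullif(1, 2)` to 1.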
…when (apache#57025)

In a condition expression, replace null with false and null-safe equal
(`<=>`) with equal (`=`). Conditions include filter conditions, join
conditions, if conditions, and case when conditions.

Within a condition expression, only replace sub-expressions whose
ancestors up to the condition root are all AND / OR / CASE WHEN /
IF. So, if a filter expression contains two nulls and only the second sits
under a NOT, the first null can be replaced, but the second cannot, because its
parent is NOT, which is not one of AND / OR / CASE WHEN / IF.

For an expression `x <=> y` in a condition, if one of them is not-nullable,
then it can be rewritten to `x = y`. Note that if an expression is not a
condition, rewriting `x <=> y` to `x = y` requires that both x and y
are not-nullable.
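
The conditions under which these replacements hold can be checked with a short sketch. It is an illustration with assumed helper names; `None` stands for NULL and `hits` models "the condition is satisfied".

```python
def eq3(x, y):
    """x = y under SQL semantics: NULL if either side is NULL."""
    return None if x is None or y is None else x == y


def null_safe_eq(x, y):
    """x <=> y: never NULL; two NULLs compare as TRUE."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def not3(x):
    return None if x is None else not x


def hits(cond):
    return cond is True


# In a condition, one non-nullable side is enough for `<=>` to become `=`.
one_side_not_null_ok = all(
    hits(null_safe_eq(x, y)) == hits(eq3(x, y))
    for x in (1, 2) for y in (1, 2, None)
)
# With both sides nullable, NULL <=> NULL hits while NULL = NULL does not.
both_nullable_differs = hits(null_safe_eq(None, None)) != hits(eq3(None, None))
# Outside a condition the values themselves differ (FALSE vs NULL).
value_differs = null_safe_eq(1, None) is not eq3(1, None)
# Under NOT, NULL and FALSE are no longer interchangeable.
not_parent_differs = hits(not3(None)) != hits(not3(False))
```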
…che#56899)

Simplify expressions that compare with a true / false literal.
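
The commit message does not list the exact rewrites. As an assumption for illustration, the sketch below checks two natural candidates, `x = true` => `x` and `x = false` => `not x`, under three-valued logic with `None` as NULL.

```python
def eq3(x, y):
    """x = y under SQL semantics: NULL if either side is NULL."""
    return None if x is None or y is None else x == y


def not3(x):
    return None if x is None else not x


VALUES = (True, False, None)
eq_true_ok = all(eq3(x, True) is x for x in VALUES)          # x = true  => x
eq_false_ok = all(eq3(x, False) is not3(x) for x in VALUES)  # x = false => not x
```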
…6941)

Push the outer function into the case when branches.

For an expression f with n arguments a1, a2, ..., an, if one of its
arguments is case when/if/nvl/nullif
and the others are literals, then f can be pushed into each branch of the case
when/if/nvl/nullif.
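
A minimal sketch of this push-down on a toy tree, covering only the case when form. The tuple encodings `("case", [(cond, value), ...], default)` and `("call", name, args)`, and the function names, are assumptions made for the example.

```python
def is_case(expr):
    return isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "case"


def is_literal(expr):
    # Assumption for this sketch: numbers are the only literals.
    return isinstance(expr, (int, float))


def push_into_case(func, args):
    """f(..., case, ...) => case with f applied to each branch value."""
    case_positions = [i for i, a in enumerate(args) if is_case(a)]
    others_literal = all(is_literal(a) for a in args if not is_case(a))
    if len(case_positions) != 1 or not others_literal:
        return ("call", func, list(args))  # rule does not apply
    pos = case_positions[0]
    _, branches, default = args[pos]

    def call_with(value):
        new_args = list(args)
        new_args[pos] = value
        return ("call", func, new_args)

    return ("case",
            [(cond, call_with(value)) for cond, value in branches],
            call_with(default))


pushed = push_into_case("add", [("case", [("c1", 1)], 2), 10])
```

After the push, each branch is a call on literals only, so constant folding can reduce it further.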
RangeAll(a) intersect IsNull(a) cannot be simplified to FALSE in general, because
when a is null, the result is null. It can be simplified to FALSE only
when a is guaranteed to be non-null.

Introduced by apache#57537
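
The point can be reproduced with three-valued logic. The sketch assumes RangeAll(a) corresponds to `a is not null or null`, as in the later examples in this PR, and uses `None` for NULL; the function names are illustrative.

```python
def and3(x, y):
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def or3(x, y):
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def range_all(a):
    # a is not null or null
    return or3(a is not None, None)


def is_null(a):
    return a is None


def intersect(a):
    # RangeAll(a) intersect IsNull(a)
    return and3(range_all(a), is_null(a))
```

`intersect(5)` is FALSE but `intersect(None)` is NULL, so folding the whole expression to FALSE is only valid for a non-nullable `a`.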
…on (apache#58475)

Fix removeIf on an immutable map throwing UnsupportedOperationException.

Introduced by apache#57537
…llif expression (apache#58430)

Extract OR expressions from case when expressions in join conditions.

1. Extract conditions for one side; the one-side condition can later be
pushed down:

```
    t1 join t2 on not ((case when t1.a = 1 then t2.a else t2.b end) + t2.b + t2.c > 10)
    =>
    t1 join t2 on not ((case when t1.a = 1 then t2.a else t2.b end) + t2.b + t2.c > 10)
                  AND (not (t2.a + t2.b + t2.c > 10) or not (t2.b + t2.b + t2.c > 10))
```

2. Extract conditions for both sides, which are used by the OR EXPANSION rule:

An OR EXPANSION condition is an OR expression whose disjuncts are
all hash conditions.

```
    t1 join t2 on (case when t1.a = 1 then t2.a else t2.b end) = t1.a + t1.b
    =>
    t1 join t2 on (case when t1.a = 1 then t2.a else t2.b end) = t1.a + t1.b
                AND (t2.a = t1.a + t1.b or t2.b = t1.a + t1.b)
```


Note: we don't extract more than one case-when-like expression,
because that may cause a combinatorial explosion in the generated expressions.

For example:

```
 ((case when c1 then p1 else p2 end) + (case when d1 then q1 else q2 end)) + a  > 10
 =>
 (p1 + q1 + a > 10)
     or (p1 + q2 + a > 10)
     or (p2 + q1 + a > 10)
     or (p2 + q2 + a > 10)
```
So we extract at most one case-when-like expression for each
condition.
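
The growth can be seen in a small string-based sketch. `expand` is a hypothetical helper that substitutes every combination of branch values into a template; it is not the Doris implementation.

```python
from itertools import product


def expand(template, case_values):
    """Build one disjunct per combination of branch values."""
    return [template.format(*combo) for combo in product(*case_values)]


# One case-when-like expression: one disjunct per branch.
single = expand("{} = t1.a + t1.b", [["t2.a", "t2.b"]])

# Two case-when-like expressions: the disjuncts multiply.
double = expand("({} + {} + a > 10)", [["p1", "p2"], ["q1", "q2"]])

# Three expressions with three branches each already give 27 disjuncts.
triple_count = len(expand("{} {} {}", [["x", "y", "z"]] * 3))
```

The number of disjuncts is the product of the branch counts, which is why the extraction is limited to a single case-when-like expression per condition.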
…es (apache#59671)

Simplify range throws an exception:

mysql >  explain select pk from table_0_500_undef_partitions2_keys3_properties4_distributed_by5 where  not ( date_sub(col_date_undef_signed, interval 1 day) > '2010-03-21' or null ) or lower(col_varchar_20__undef_signed) between null and 'e';
(1105, "errCode = 2, detailMessage = Cannot compare two values with different data types: 2010-03-22 (DATEV2) vs 'e' (VARCHAR(1))")

PR apache#57537 introduced merging a compound value with other value descs.

When merging value descs, values with the same reference are merged.

But for a compound value, the reference has a different meaning:
1) a < 1 or a > 10: this is a compound value whose reference is a;
2) a < 1 or b > 10: this is a compound value whose reference is 'a < 1 or
b > 10'.

Then for the expression `TA > 1 and FALSE or FALSE and SB > 'abc'`, the
OR operator has two compound values:
a. C1 = CompoundValue(reference = getCompoundExpression(TA > 1 and FALSE)
= FALSE, source values={TA > 1, FALSE})
b. C2 = CompoundValue(reference = getCompoundExpression(FALSE and SB >
'abc') = FALSE, source values={FALSE, SB > 'abc'})

Because getCompoundExpression folds constants, the references of C1
and C2 are both 'FALSE'. Since the references are equal, it
tries to merge C1 and C2, which then tries to merge ' > 1' and
' > abc' and causes the above exception.

To fix this, a compound value whose source values have different
references (like a < 1 or b > 10) is not merged with other value
descs.
@yujun777 yujun777 force-pushed the branch-4.0-opt-case-when branch from f8cbbea to 3c14b07 Compare January 12, 2026 02:20
@yujun777
Contributor Author

run buildall

…che#59818)

Add more inference rules:

1.  simplify: `IsNull AND xx`

```
TA is null and TA > 10  => TA is null and null
TA is null and TA = 10  => TA is null and null
TA is null and TA != 10 => TA is null and null 
```

Then, if this expression is in a where condition, null is replaced with
FALSE and the expression folds to FALSE.

2. simplify: `IsNotNull OR xx`

```
TA is not null or TA > 10  => TA is not null or null
TA is not null or TA = 10 => TA is not null or null
TA is not null or TA != 10 => TA is not null or null
```

Then, if this expression is in a where condition, null is replaced with
FALSE and the expression folds to TA is not null.


3. simplify: IsNotNull(TA) AND RangeAll(TA) = IsNotNull(TA)

```
TA is not null and (TA is not null or null) => TA is not null
```

4. simplify: IsNull(TA) AND RangeAll(TA) = EmptyValue(TA)

```
TA is null and (TA is not null or null) => TA is null and null
```

5. simplify: IsNotNull(TA) or EmptyValue(TA) = RangeAll(TA)

```
TA is not null or (TA is null and null) => TA is not null or null
``` 

6. Fix merge compound value bug

Previously, if `A AND B` was FALSE/EmptyValue, `A AND (B OR C)` was simplified to `A AND C`.

This is correct if `A AND B` is FALSE.

But if `A AND B` is EmptyValue and A is nullable, the equation may
be wrong. For example, `a > 100 and (a = 1 or a is not null)` cannot be
simplified to `a > 100 and a is not null`.

So we no longer simplify this expression in one step. If `A AND B` is
EmptyValue, we replace B with EmptyValue, which gives `A AND (B
OR C)` = `A AND (EmptyValue OR C)`; the next expression rewrite
iteration then tries to simplify `EmptyValue OR C`.

The same logic applies to OR: for `A OR (B AND C)`, if `A OR B` is
RangeAll, we replace B with RangeAll.
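
Rules 1 and 2 and the counterexample from item 6 can be checked with a short three-valued-logic sketch. The helper names are illustrative, `None` stands for NULL, and the sample values of TA are arbitrary.

```python
def and3(x, y):
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def or3(x, y):
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def gt(a, k):
    return None if a is None else a > k


def eq(a, k):
    return None if a is None else a == k


SAMPLES = (None, 1, 10, 50, 101)

# Rule 1: TA is null and TA > 10  =>  TA is null and null
rule1_ok = all(and3(a is None, gt(a, 10)) is and3(a is None, None)
               for a in SAMPLES)

# Rule 2: TA is not null or TA > 10  =>  TA is not null or null
rule2_ok = all(or3(a is not None, gt(a, 10)) is or3(a is not None, None)
               for a in SAMPLES)


def before(a):
    # a > 100 and (a = 1 or a is not null)
    return and3(gt(a, 100), or3(eq(a, 1), a is not None))


def after(a):
    # a > 100 and a is not null  (the incorrect one-step simplification)
    return and3(gt(a, 100), a is not None)


# Inputs on which the one-step simplification changes the result.
mismatches = [a for a in SAMPLES if before(a) is not after(a)]
```

The only mismatch is the NULL input, where the original expression is NULL and the simplified one is FALSE. That is the nullable case the fix guards against.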
3 participants