branch-4.0: [draft](case when) optimize case when expression #58186
…ache#56424)
For a case when condition, a result of null and a result of false have the same effect: the branch is not hit. In most cases a null cannot be folded in a logical expression; for example, `null and a = 1` and `null or a = 1` cannot be folded. But false can be folded: `false and a = 1` folds to false, and `false or a = 1` folds to `a = 1`. So if we replace null with false in a case when condition, the expression may fold to something simpler.
In fact, in a case/if condition, null can be replaced with FALSE when it is the expression root or when all its ancestors are AND / OR / CASE / IF conditions, and this rewrite does not change whether the branch is hit.
For example, for the SQL 'case when null and a > 1 then ...':
1. this rule rewrites it to 'case when false and a > 1 then ...';
2. the constant fold rule then rewrites it to 'case when false then ...';
3. case when can then remove this branch, since its condition is false.
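The claim that null and false are interchangeable under AND / OR in a condition can be checked by brute force. The following is a minimal sketch, not Doris code: it models SQL three-valued logic with Python `None` standing in for NULL, and the helper names (`and3`, `or3`, `not3`, `branch_hit`) are hypothetical names chosen for illustration.

```python
# SQL three-valued logic, with Python None standing in for SQL NULL.
def and3(x, y):
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

def or3(x, y):
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

def not3(x):
    return None if x is None else (not x)

def branch_hit(cond):
    # A CASE WHEN / IF branch is taken only when its condition is TRUE.
    return cond is True

VALUES = (True, False, None)

# Under AND / OR, swapping a NULL operand for FALSE never changes
# whether the branch is hit.
safe_under_and_or = all(
    branch_hit(and3(None, other)) == branch_hit(and3(False, other))
    and branch_hit(or3(None, other)) == branch_hit(or3(False, other))
    for other in VALUES
)

# Under NOT the swap is unsafe: NOT NULL is NULL (no hit),
# but NOT FALSE is TRUE (hit).
unsafe_under_not = branch_hit(not3(None)) != branch_hit(not3(False))
```

The second check is the reason the rewrite is limited to nulls whose ancestors are all AND / OR / CASE / IF conditions.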
…to true/false (apache#56469)
For a nested case when, replace a duplicated condition in the inner case with true/false when that condition also exists in an outer case condition:
1. If it exists in the outer case's current branch condition, replace it with TRUE:
   case when A then (case when A and B then 1 else 2 end) ... end
   The inner case condition A is replaced with TRUE:
   case when A then (case when TRUE and B then 1 else 2 end) ... end
2. If it exists in one of the outer case's previous branch conditions, replace it with FALSE:
   case when A then C when B then (case when A and D then 1 else 2 end) ... end
   The inner case condition A is replaced with FALSE:
   case when A then C when B then (case when FALSE and D then 1 else 2 end) ... end
This PR also optimizes folding of case when and if expressions: for a case when / if expression, if all of its branch values are equal, rewrite it to that value.
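Both replacements can be checked exhaustively over three-valued inputs. This is a hedged sketch with hypothetical function names (`original`, `rewritten`), using Python `None` for NULL and arbitrary integer results; it only models the evaluation semantics and is not the rule's implementation.

```python
from itertools import product

def and3(x, y):
    # Three-valued AND, with None standing in for SQL NULL.
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

def original(a, b, d):
    # case when A then (case when A and D then 1 else 2 end)
    #      when B then (case when A and D then 3 else 4 end)
    #      else 0 end
    if a is True:
        return 1 if and3(a, d) is True else 2
    if b is True:
        return 3 if and3(a, d) is True else 4
    return 0

def rewritten(a, b, d):
    # Inner A becomes TRUE in A's own branch and FALSE in the later branch.
    if a is True:
        return 1 if and3(True, d) is True else 2
    if b is True:
        return 3 if and3(False, d) is True else 4
    return 0

VALUES = (True, False, None)
mismatches = [
    (a, b, d)
    for a, b, d in product(VALUES, repeat=3)
    if original(a, b, d) != rewritten(a, b, d)
]
```

In the second branch A may be NULL rather than FALSE, but because A sits in a condition under AND, NULL and FALSE both leave the inner branch unhit, so the results agree.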
For a case when expression of boolean data type, if the results of all its when clauses are true / false literals, the case when can be rewritten to an AND / OR expression. For example:
   case when a = 1 then true when b = 1 then false else c = 1 end
is rewritten to:
   (a = 1) <=> true or (not((b = 1) <=> true) and c = 1)
and
   if (a = 1, true, b = 1)
is rewritten to:
   (a = 1) <=> true or b = 1
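The null-safe equal `<=>` is what makes this rewrite hold when a condition evaluates to NULL. A small sketch, assuming Python `None` models NULL and with hypothetical helper names, compares both forms over every three-valued input for the conditions (here `a`, `b`, `c` stand for the already evaluated `a = 1`, `b = 1`, `c = 1`):

```python
from itertools import product

def and3(x, y):
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

def or3(x, y):
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

def not3(x):
    return None if x is None else (not x)

def null_safe_eq(x, y):
    # SQL <=> : never returns NULL; NULL <=> NULL is TRUE.
    if x is None or y is None:
        return x is None and y is None
    return x == y

def case_form(a, b, c):
    # case when a then true when b then false else c end
    if a is True:
        return True
    if b is True:
        return False
    return c

def and_or_form(a, b, c):
    # a <=> true or (not (b <=> true) and c)
    return or3(null_safe_eq(a, True), and3(not3(null_safe_eq(b, True)), c))

def if_form(a, b):
    # if(a, true, b)
    return True if a is True else b

def if_rewritten(a, b):
    # a <=> true or b
    return or3(null_safe_eq(a, True), b)

VALUES = (True, False, None)
case_mismatches = [v for v in product(VALUES, repeat=3)
                   if case_form(*v) is not and_or_form(*v)]
if_mismatches = [v for v in product(VALUES, repeat=2)
                 if if_form(*v) is not if_rewritten(*v)]
```

Using plain `a = 1` instead of `(a = 1) <=> true` would let a NULL condition leak into the OR and change the result, which is why the rewritten form wraps each when condition.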
1. Add constant folding for the nullif function;
2. Optimize nvl folding: `nvl(a, a)` => `a`, `nvl(a, null)` => `a`;
3. Make AggregateFunction and TableGeneratingFunction non-foldable.
…when (apache#57025)
For a condition expression, replace null with false, and null-safe equal with equal. Conditions include filter conditions, join conditions, if conditions, and case when conditions. Within a condition expression, only replace those sub-expressions whose ancestors up to the condition root are all AND / OR / CASE WHEN / IF.
So, for an expression in a filter, the first null can be replaced, but the second null cannot be replaced, because its parent is NOT, which is not one of AND / OR / CASE WHEN / IF.
For a null-safe equal expression in a condition, if one of its operands is not nullable, it can be rewritten to a plain equal. Note that if the expression is not a condition, the rewrite requires that both x and y are not nullable.
…che#56899) Simplify expressions that compare for equality with a true / false literal.
…6941) Push an upper function into case when branches. For an expression f with n arguments a1, a2, ..., an, if one of its arguments is a case when / if / nvl / nullif and the others are literals, then f can be pushed into each branch of the case when / if / nvl / nullif.
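As an illustration of the shape of this rewrite, here is a minimal sketch over a toy expression tree. The classes `Lit`, `Col`, `Call`, `Case` and the function `push_into_case` are hypothetical and only cover the case when form; they are not the Doris classes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    value: object

@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Call:
    fn: str
    args: tuple

@dataclass(frozen=True)
class Case:
    whens: tuple      # ((condition, result), ...)
    default: object

def push_into_case(expr):
    """Push f into each branch when exactly one argument is a Case
    and every other argument is a literal; otherwise return expr as-is."""
    if not isinstance(expr, Call):
        return expr
    case_positions = [i for i, a in enumerate(expr.args) if isinstance(a, Case)]
    rest_are_literals = all(isinstance(a, (Lit, Case)) for a in expr.args)
    if len(case_positions) != 1 or not rest_are_literals:
        return expr
    i = case_positions[0]
    case = expr.args[i]

    def wrap(branch):
        # Rebuild f(...) with the case argument replaced by one branch value.
        return Call(expr.fn, expr.args[:i] + (branch,) + expr.args[i + 1:])

    return Case(tuple((cond, wrap(res)) for cond, res in case.whens),
                wrap(case.default))

# add(case when c then 1 else 2 end, 10)
expr = Call("add", (Case(((Col("c"), Lit(1)),), Lit(2)), Lit(10)))
pushed = push_into_case(expr)
# => case when c then add(1, 10) else add(2, 10) end

# A non-literal sibling argument blocks the rewrite.
blocked = Call("add", (Case(((Col("c"), Lit(1)),), Lit(2)), Col("x")))
```

After the push, each branch holds a call on literals only, so a later constant folding pass can reduce the branches to literals.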
For RangeAll(a) intersect IsNull(a), we cannot simplify it to FALSE, because when a is null the result is null. Only when a is not null can it be simplified to FALSE. Introduced by apache#57537.
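A quick evaluation shows why. This sketch assumes RangeAll(a) has the form `a is not null or null`, as in the inference examples of a later commit message in this PR, and uses Python `None` for NULL; the helper names are hypothetical.

```python
def and3(x, y):
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

def or3(x, y):
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

def is_null(a):
    return a is None

def range_all(a):
    # Assumed form: a is not null or null
    return or3(a is not None, None)

def range_all_and_is_null(a):
    return and3(range_all(a), is_null(a))

result_for_null = range_all_and_is_null(None)   # NULL, not FALSE
result_for_value = range_all_and_is_null(5)     # FALSE
```

So folding the intersection to FALSE is only valid for inputs that are not null.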
…on (apache#58475) Fix immutable map removeIf throwing UnsupportedOperationException. Introduced by apache#57537.
…llif expression (apache#58430)
Join: extract OR expressions from a case when expression.
1. Extract conditions for one side, so that the one-side condition can be pushed down later:
```
t1 join t2 on not ((case when t1.a = 1 then t2.a else t2.b end) + t2.b + t2.c > 10)
=>
t1 join t2 on not ((case when t1.a = 1 then t2.a else t2.b end) + t2.b + t2.c > 10)
           AND (not (t2.a + t2.b + t2.c > 10) or not (t2.b + t2.b + t2.c > 10))
```
2. Extract conditions for both sides, which are used by the OR EXPANSION rule: the OR EXPANSION condition is an OR expression whose disjuncts are all hash conditions.
```
t1 join t2 on (case when t1.a = 1 then t2.a else t2.b end) = t1.a + t1.b
=>
t1 join t2 on (case when t1.a = 1 then t2.a else t2.b end) = t1.a + t1.b
           AND (t2.a = t1.a + t1.b or t2.b = t1.a + t1.b)
```
Notice: we don't extract more than one case-when-like expression, because that may generate a combinatorial explosion of expressions. For example:
```
((case when c1 then p1 else p2 end) + (case when d1 then q1 else q2 end)) + a > 10
=>
(p1 + q1 + a > 10) or (p1 + q2 + a > 10) or (p2 + q1 + a > 10) or (p2 + q2 + a > 10)
```
So we extract at most one case-when-like expression for each condition.
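The extracted predicate is implied by the original condition but is weaker than it, which is why it is added with AND rather than substituted. A small brute-force sketch over non-null small integers (an assumption made to keep the check simple) illustrates this for the first example; the function names are hypothetical.

```python
from itertools import product

def original(t1_a, t2_a, t2_b, t2_c):
    # not ((case when t1.a = 1 then t2.a else t2.b end) + t2.b + t2.c > 10)
    picked = t2_a if t1_a == 1 else t2_b
    return not (picked + t2_b + t2_c > 10)

def extracted(t2_a, t2_b, t2_c):
    # not (t2.a + t2.b + t2.c > 10) or not (t2.b + t2.b + t2.c > 10)
    # Note: references t2 columns only, so it can be pushed to the t2 side.
    return (not (t2_a + t2_b + t2_c > 10)) or (not (t2_b + t2_b + t2_c > 10))

rows = list(product(range(8), repeat=4))

# Every row accepted by the original condition is accepted by the extracted one.
violations = [r for r in rows if original(*r) and not extracted(*r[1:])]

# The extracted predicate is strictly weaker: it accepts rows the original rejects.
is_weaker = any(extracted(*r[1:]) and not original(*r) for r in rows)
```

The extraction substitutes each branch result for the case when and joins the results with OR, so with two case whens of two branches each the disjunct count would already be four.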
…es (apache#59671)
Simplify range throws an exception:
mysql > explain select pk from table_0_500_undef_partitions2_keys3_properties4_distributed_by5 where not ( date_sub(col_date_undef_signed, interval 1 day) > '2010-03-21' or null ) or lower(col_varchar_20__undef_signed) between null and 'e';
(1105, "errCode = 2, detailMessage = Cannot compare two values with different data types: 2010-03-22 (DATEV2) vs 'e' (VARCHAR(1))")
PR apache#57537 introduced merging a compound value with other value descs. When merging value descs, it merges values that have the same reference. But for a compound value, the reference has a different meaning:
1) a < 1 or a > 10: this is a compound value, and its reference is a;
2) a < 1 or b > 10: this is a compound value, and its reference is 'a < 1 or b > 10'.
Then for the expression `TA > 1 and FALSE or FALSE and SB > 'abc'`, the operator OR has two compound values:
a. C1 = CompoundValue(reference = getCompoundExpression(TA > 1 and FALSE) = FALSE, source values = {TA > 1, FALSE})
b. C2 = CompoundValue(reference = getCompoundExpression(FALSE and SB > 'abc') = FALSE, source values = {FALSE, SB > 'abc'})
Because getCompoundExpression folds constants, the references of C1 and C2 are both 'FALSE'. Since the two references are equal, it tries to merge C1 and C2, and then tries to merge '> 1' with '> abc', which causes the exception above.
To fix this, if the source values of a compound value have different references (like a < 1 or b > 10), don't merge it with other value descs.
…che#59818)
Add more inference rules:
1. simplify: `IsNull AND xx`
```
TA is null and TA > 10 => TA is null and null
TA is null and TA = 10 => TA is null and null
TA is null and TA != 10 => TA is null and null
```
Then, if this expression is in a where condition, null is replaced with FALSE and the expression folds to FALSE.
2. simplify: `IsNotNull OR xx`
```
TA is not null or TA > 10 => TA is not null or null
TA is not null or TA = 10 => TA is not null or null
TA is not null or TA != 10 => TA is not null or null
```
Then, if this expression is in a where condition, null is replaced with FALSE and the expression folds to TA is not null.
3. simplify: IsNotNull(TA) AND RangeAll(TA) = IsNotNull(TA)
```
TA is not null and (TA is not null or null) => TA is not null
```
4. simplify: IsNull(TA) AND RangeAll(TA) = EmptyValue(TA)
```
TA is null and (TA is not null or null) = TA is null and null
```
5. simplify: IsNotNull(TA) or EmptyValue(TA) = RangeAll(TA)
```
TA is not null or (TA is null and null) = TA is not null or null
```
6. Fix a bug in merging compound values.
Previously, if `A AND B` was FALSE/EmptyValue, then `A AND (B OR C)` was simplified to `A AND C`. This has a bug: if `A AND B` is FALSE, the equation is correct, but if `A AND B` is EmptyValue and A is nullable, the equation may still be wrong. For example, `a > 100 and (a = 1 or a is not null)` cannot be simplified to `a > 100 and a is not null`.
So we give up simplifying this expression in a single step. If `A AND B` is an empty value, we replace B with EmptyValue, which gives `A AND (B OR C)` = `A AND (EmptyValue OR C)`; the next expression rewrite iteration then tries to simplify `EmptyValue OR C`.
The same logic applies to OR: for `A OR (B AND C)`, if `A OR B` is RangeAll, we replace B with RangeAll.
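The counterexample in item 6 can be evaluated directly. This is a minimal sketch with hypothetical helper names, using Python `None` for NULL and taking EmptyValue(a) to be `a is null and null` as in item 4; it compares the original expression, the incorrect one-step simplification, and the form with B replaced by EmptyValue.

```python
def and3(x, y):
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

def or3(x, y):
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

def gt(a, k):
    return None if a is None else a > k

def eq(a, k):
    return None if a is None else a == k

def empty_value(a):
    # EmptyValue(a): a is null and null
    return and3(a is None, None)

def original(a):
    # a > 100 and (a = 1 or a is not null)
    return and3(gt(a, 100), or3(eq(a, 1), a is not None))

def one_step(a):
    # Incorrect shortcut: a > 100 and a is not null
    return and3(gt(a, 100), a is not None)

def with_empty_value(a):
    # a > 100 and (EmptyValue(a) or a is not null)
    return and3(gt(a, 100), or3(empty_value(a), a is not None))

samples = (None, 1, 50, 200)
one_step_diffs = [a for a in samples if original(a) is not one_step(a)]
empty_value_diffs = [a for a in samples if original(a) is not with_empty_value(a)]
```

For a null input the original expression is NULL while the one-step form is FALSE; the two only coincide at a condition root, which is why the replacement by EmptyValue is kept as the general rewrite.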
What problem does this PR solve?
cherry-pick: #56424, #56469, #56756, #56932, #57025, #56899, #56941, #57537, #58451, #58475, #58430, #59671, #59818
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)