Commit e2faf1a
[CARMEL-4286][CARMEL-4380] Backport SPARK-33910 Simplify/Optimize conditional expressions (#291)
* [SPARK-32721][SQL] Simplify if clauses with null and boolean
### What changes were proposed in this pull request?
The following if clause:
```sql
if(p, null, false)
```
can be simplified to:
```sql
and(p, null)
```
Similarly, the clause:
```sql
if(p, null, true)
```
can be simplified to
```sql
or(not(p), null)
```
iff the predicate `p` is non-nullable, i.e., can be evaluated to either true or false, but not null.
### Why are the changes needed?
Converting if to or/and clauses can better push filters down.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit tests.
Closes #29567 from sunchao/SPARK-32721.
Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: DB Tsai <d_tsai@apple.com>
(cherry picked from commit 1453a09)
* [SPARK-32721][SQL][FOLLOWUP] Simplify if clauses with null and boolean
### What changes were proposed in this pull request?
This is a follow-up on SPARK-32721 and PR #29567. In the previous PR we missed two more cases that can be optimized:
```
if(p, false, null) ==> and(not(p), null)
if(p, true, null) ==> or(p, null)
```
### Why are the changes needed?
By transforming if to boolean conjunctions or disjunctions, we can enable more filter pushdown to datasources.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added unit tests.
Closes #29603 from sunchao/SPARK-32721-2.
Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: DB Tsai <d_tsai@apple.com>
(cherry picked from commit 94d313b)
* [SPARK-33798][SQL] Add new rule to push down the foldable expressions through CaseWhen/If
### What changes were proposed in this pull request?
This pr add a new rule(`PushFoldableIntoBranches`) to push down the foldable expressions through `CaseWhen/If`. This is a real case from production:
```sql
create table t1 using parquet as select * from range(100);
create table t2 using parquet as select * from range(200);
create temp view v1 as
select 'a' as event_type, * from t1
union all
select CASE WHEN id = 1 THEN 'b' WHEN id = 3 THEN 'c' end as event_type, * from t2
explain select * from v1 where event_type = 'a';
```
Before this PR:
```
== Physical Plan ==
Union
:- *(1) Project [a AS event_type#30533, id#30535L]
: +- *(1) ColumnarToRow
: +- FileScan parquet default.t1[id#30535L] Batched: true, DataFilters: [], Format: Parquet
+- *(2) Project [CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c END AS event_type#30534, id#30536L]
+- *(2) Filter (CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c END = a)
+- *(2) ColumnarToRow
+- FileScan parquet default.t2[id#30536L] Batched: true, DataFilters: [(CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c END = a)], Format: Parquet
```
After this PR:
```
== Physical Plan ==
*(1) Project [a AS event_type#8, id#4L]
+- *(1) ColumnarToRow
+- FileScan parquet default.t1[id#4L] Batched: true, DataFilters: [], Format: Parquet
```
### Why are the changes needed?
Improve query performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #30790 from wangyum/SPARK-33798.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 06b1bbb)
* [SPARK-33845][SQL] Remove unnecessary if when trueValue and falseValue are foldable boolean types
### What changes were proposed in this pull request?
Improve `SimplifyConditionals`.
Simplify `If(cond, TrueLiteral, FalseLiteral)` to `cond`.
Simplify `If(cond, FalseLiteral, TrueLiteral)` to `Not(cond)`.
The use case is:
```sql
create table t1 using parquet as select id from range(10);
select if (id > 2, false, true) from t1;
```
Before this pr:
```
== Physical Plan ==
*(1) Project [if ((id#1L > 2)) false else true AS (IF((id > CAST(2 AS BIGINT)), false, true))#2]
+- *(1) ColumnarToRow
+- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
```
After this pr:
```
== Physical Plan ==
*(1) Project [(id#1L <= 2) AS (IF((id > CAST(2 AS BIGINT)), false, true))#2]
+- *(1) ColumnarToRow
+- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
```
### Why are the changes needed?
Improve query performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #30849 from wangyum/SPARK-33798-2.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
* [SPARK-33848][SQL] Push the UnaryExpression into (if / case) branches
### What changes were proposed in this pull request?
This pr push the `UnaryExpression` into (if / case) branches. The use case is:
```sql
create table t1 using parquet as select id from range(10);
explain select id from t1 where (CASE WHEN id = 1 THEN '1' WHEN id = 3 THEN '2' end) > 3;
```
Before this pr:
```
== Physical Plan ==
*(1) Filter (cast(CASE WHEN (id#1L = 1) THEN 1 WHEN (id#1L = 3) THEN 2 END as int) > 3)
+- *(1) ColumnarToRow
+- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [(cast(CASE WHEN (id#1L = 1) THEN 1 WHEN (id#1L = 3) THEN 2 END as int) > 3)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
```
After this pr:
```
== Physical Plan ==
LocalTableScan <empty>, [id#1L]
```
This change can also improve this case:
https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q62.sql#L5-L22
### Why are the changes needed?
Improve query performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #30853 from wangyum/SPARK-33848.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 1c77605)
* [SPARK-33847][SQL] Simplify CaseWhen if elseValue is None
### What changes were proposed in this pull request?
1. Enhance `ReplaceNullWithFalseInPredicate` to replace None of elseValue inside `CaseWhen` with `FalseLiteral` if all branches are `FalseLiteral` . The use case is:
```sql
create table t1 using parquet as select id from range(10);
explain select id from t1 where (CASE WHEN id = 1 THEN 'a' WHEN id = 3 THEN 'b' end) = 'c';
```
Before this pr:
```
== Physical Plan ==
*(1) Filter CASE WHEN (id#1L = 1) THEN false WHEN (id#1L = 3) THEN false END
+- *(1) ColumnarToRow
+- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [CASE WHEN (id#1L = 1) THEN false WHEN (id#1L = 3) THEN false END], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
```
After this pr:
```
== Physical Plan ==
LocalTableScan <empty>, [id#1L]
```
2. Enhance `SimplifyConditionals` if elseValue is None and all outputs are null.
### Why are the changes needed?
Improve query performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #30852 from wangyum/SPARK-33847.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 7ffcfcf)
* [SPARK-33861][SQL] Simplify conditional in predicate
### What changes were proposed in this pull request?
This pr simplify conditional in predicate, after this change we can push down the filter to datasource:
Expression | After simplify
-- | --
IF(cond, trueVal, false) | AND(cond, trueVal)
IF(cond, trueVal, true) | OR(NOT(cond), trueVal)
IF(cond, false, falseVal) | AND(NOT(cond), elseVal)
IF(cond, true, falseVal) | OR(cond, elseVal)
CASE WHEN cond THEN trueVal ELSE false END | AND(cond, trueVal)
CASE WHEN cond THEN trueVal END | AND(cond, trueVal)
CASE WHEN cond THEN trueVal ELSE null END | AND(cond, trueVal)
CASE WHEN cond THEN trueVal ELSE true END | OR(NOT(cond), trueVal)
CASE WHEN cond THEN false ELSE elseVal END | AND(NOT(cond), elseVal)
CASE WHEN cond THEN false END | false
CASE WHEN cond THEN true ELSE elseVal END | OR(cond, elseVal)
CASE WHEN cond THEN true END | cond
### Why are the changes needed?
Improve query performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #30865 from wangyum/SPARK-33861.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 32d4a2b)
* Fix
* [SPARK-33845][SQL][FOLLOWUP] fix SimplifyConditionals
### What changes were proposed in this pull request?
This is a followup of #30849, to fix a correctness issue caused by null value handling.
### Why are the changes needed?
Fix a correctness issue. `If(null, true, false)` should return false, not true.
### Does this PR introduce _any_ user-facing change?
Yes, but the bug only exist in the master branch.
### How was this patch tested?
updated tests.
Closes #30953 from cloud-fan/bug.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit c2eac1d)
* Fix
* [SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true)
### What changes were proposed in this pull request?
This pr simplify `CaseWhen`clauses with (true and false) and (false and true):
Expression | cond.nullable | After simplify
-- | -- | --
case when cond then true else false end | true | cond <=> true
case when cond then true else false end | false | cond
case when cond then false else true end | true | !(cond <=> true)
case when cond then false else true end | false | !cond
### Why are the changes needed?
Improve query performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #30898 from wangyum/SPARK-33884.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit f7bdea3)
* [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches
### What changes were proposed in this pull request?
Introduce allowList push into (if / case) branches to fix potential bug.
### Why are the changes needed?
Fix potential bug.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing test.
Closes #30955 from wangyum/SPARK-33848-2.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 872107f)
* Fix
* [SPARK-33847][SQL][FOLLOWUP] Remove the CaseWhen should consider deterministic
### What changes were proposed in this pull request?
This pr fix remove the `CaseWhen` if elseValue is empty and other outputs are null because of we should consider deterministic.
### Why are the changes needed?
Fix bug.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #30960 from wangyum/SPARK-33847-2.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit c425024)
Co-authored-by: Chao Sun <sunchao@apache.org>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>1 parent 7811c34 commit e2faf1a
File tree
11 files changed
+879
-40
lines changed- sql
- catalyst/src
- main/scala/org/apache/spark/sql/catalyst
- expressions
- optimizer
- test/scala/org/apache/spark/sql/catalyst/optimizer
- core/src/test/scala/org/apache/spark/sql
11 files changed
+879
-40
lines changedLines changed: 11 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
521 | 521 | | |
522 | 522 | | |
523 | 523 | | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
524 | 530 | | |
525 | 531 | | |
526 | 532 | | |
| |||
621 | 627 | | |
622 | 628 | | |
623 | 629 | | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
| 634 | + | |
624 | 635 | | |
625 | 636 | | |
626 | 637 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
96 | 96 | | |
97 | 97 | | |
98 | 98 | | |
| 99 | + | |
99 | 100 | | |
100 | 101 | | |
101 | 102 | | |
| 103 | + | |
102 | 104 | | |
103 | 105 | | |
104 | 106 | | |
| |||
Lines changed: 9 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
20 | | - | |
21 | | - | |
22 | | - | |
| 20 | + | |
| 21 | + | |
23 | 22 | | |
24 | 23 | | |
25 | | - | |
26 | 24 | | |
27 | 25 | | |
28 | 26 | | |
| |||
55 | 53 | | |
56 | 54 | | |
57 | 55 | | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
58 | 62 | | |
59 | 63 | | |
60 | 64 | | |
| |||
92 | 96 | | |
93 | 97 | | |
94 | 98 | | |
95 | | - | |
| 99 | + | |
96 | 100 | | |
97 | 101 | | |
98 | 102 | | |
| |||
Lines changed: 80 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
Lines changed: 99 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
24 | | - | |
| 24 | + | |
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| |||
461 | 461 | | |
462 | 462 | | |
463 | 463 | | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
464 | 468 | | |
465 | 469 | | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
466 | 479 | | |
467 | 480 | | |
468 | 481 | | |
| |||
488 | 501 | | |
489 | 502 | | |
490 | 503 | | |
491 | | - | |
492 | | - | |
| 504 | + | |
| 505 | + | |
| 506 | + | |
493 | 507 | | |
494 | 508 | | |
495 | 509 | | |
| |||
510 | 524 | | |
511 | 525 | | |
512 | 526 | | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
| 536 | + | |
| 537 | + | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
| 541 | + | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
| 549 | + | |
| 550 | + | |
| 551 | + | |
| 552 | + | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
| 566 | + | |
| 567 | + | |
| 568 | + | |
| 569 | + | |
| 570 | + | |
| 571 | + | |
| 572 | + | |
| 573 | + | |
| 574 | + | |
| 575 | + | |
| 576 | + | |
| 577 | + | |
| 578 | + | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
| 594 | + | |
| 595 | + | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
513 | 609 | | |
514 | 610 | | |
515 | 611 | | |
| |||
Lines changed: 4 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
218 | 218 | | |
219 | 219 | | |
220 | 220 | | |
221 | | - | |
| 221 | + | |
222 | 222 | | |
223 | | - | |
| 223 | + | |
224 | 224 | | |
225 | 225 | | |
226 | | - | |
| 226 | + | |
227 | 227 | | |
228 | | - | |
| 228 | + | |
229 | 229 | | |
230 | 230 | | |
231 | 231 | | |
| |||
0 commit comments