Commit f5900a5
[SPARK-43393][SQL][3.4] Address sequence expression overflow bug
### What changes were proposed in this pull request?
Spark has a (long-standing) overflow bug in the `sequence` expression.
Consider the following operations:
```
spark.sql("CREATE TABLE foo (l LONG);")
spark.sql(s"INSERT INTO foo VALUES (${Long.MaxValue});")
spark.sql("SELECT sequence(0, l) FROM foo;").collect()
```
The result of these operations will be:
```
Array[org.apache.spark.sql.Row] = Array([WrappedArray()])
```
This empty array is an unintended consequence of overflow.
The sequence is applied to values `0` and `Long.MaxValue` with a step size of `1`, which uses the length computation defined [here](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3451). In this calculation, with `start = 0`, `stop = Long.MaxValue`, and `step = 1`, the calculated `len` overflows to `Long.MinValue`. In binary, the computation (subtract, divide, then add one) looks like:
```
0111111111111111111111111111111111111111111111111111111111111111
- 0000000000000000000000000000000000000000000000000000000000000000
------------------------------------------------------------------
0111111111111111111111111111111111111111111111111111111111111111
/ 0000000000000000000000000000000000000000000000000000000000000001
------------------------------------------------------------------
0111111111111111111111111111111111111111111111111111111111111111
+ 0000000000000000000000000000000000000000000000000000000000000001
------------------------------------------------------------------
1000000000000000000000000000000000000000000000000000000000000000
```
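The same arithmetic can be reproduced with plain `Long` values. This is a minimal sketch that mirrors the three steps shown in binary above; the object and value names are illustrative and are not taken from Spark's source.

```scala
object OverflowDemo {
  val start: Long = 0L
  val stop: Long = Long.MaxValue
  val step: Long = 1L

  // Step 1: stop - start stays at Long.MaxValue (no overflow yet).
  val diff: Long = stop - start
  // Step 2: dividing by a step of 1 leaves the value unchanged.
  val quotient: Long = diff / step
  // Step 3: adding 1 wraps around silently to Long.MinValue.
  val len: Long = quotient + 1L
}
```

Unchecked `Long` addition on the JVM wraps around instead of raising an error, which is why `len` ends up negative without any exception being thrown.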
The subsequent [check](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3454) passes because the negative `Long.MinValue` is still `<= MAX_ROUNDED_ARRAY_LENGTH`. The `toInt` cast that follows operates on this representation and [truncates the upper bits](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3457), resulting in a length of `0` and therefore an empty array.
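A short sketch of why both steps let the bad value through. The constant below is a stand-in for `MAX_ROUNDED_ARRAY_LENGTH`, and its value (`Int.MaxValue - 15`) is an assumption made for illustration; any positive bound would behave the same way here.

```scala
object TruncationDemo {
  // Assumption: stand-in for MAX_ROUNDED_ARRAY_LENGTH; the exact value is assumed.
  val MaxRoundedArrayLength: Long = Int.MaxValue - 15

  // The overflowed length from the computation above.
  val len: Long = Long.MinValue

  // An upper-bound-only check accepts any negative value.
  val passesCheck: Boolean = len <= MaxRoundedArrayLength

  // toInt keeps only the low 32 bits, which are all zero for Long.MinValue.
  val truncated: Int = len.toInt
}
```

Here `passesCheck` is `true` and `truncated` is `0`, matching the empty array seen in the query result.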
Other overflows are similarly problematic.
This PR addresses the issue by checking numeric operations in the length computation for overflow.
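One way such overflow checking could look is sketched below using the JDK's `Math.subtractExact` and `Math.addExact`, which throw `ArithmeticException` on overflow. This is not the code from the patch: `sequenceLength`, `MaxArrayLength`, and the error messages are hypothetical names chosen for this example, and the bound is an assumed value.

```scala
object CheckedSequenceLength {
  // Assumption: illustrative upper bound on array length, not read from Spark.
  val MaxArrayLength: Long = Int.MaxValue - 15

  /** Hypothetical helper: number of elements in start..stop by step, or an exception. */
  def sequenceLength(start: Long, stop: Long, step: Long): Int = {
    // Reject a zero step or a step pointing away from stop.
    require(
      (step > 0 && start <= stop) || (step < 0 && start >= stop),
      s"Illegal sequence boundaries: $start to $stop by $step")
    val len: Long =
      if (stop == start) 1L
      else try {
        // Both exact operations throw ArithmeticException instead of wrapping.
        Math.addExact(1L, Math.subtractExact(stop, start) / step)
      } catch {
        case _: ArithmeticException =>
          throw new IllegalArgumentException(
            s"Sequence from $start to $stop by $step is too long")
      }
    // Guard both ends so a negative length can never reach toInt.
    require(len >= 0 && len <= MaxArrayLength, s"Too long sequence: $len")
    len.toInt
  }
}
```

With this sketch, `CheckedSequenceLength.sequenceLength(0L, 10L, 1L)` returns `11`, while `sequenceLength(0L, Long.MaxValue, 1L)` throws an `IllegalArgumentException` rather than silently producing a length of `0`.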
### Why are the changes needed?
There is a correctness bug from overflow in the `sequence` expression.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tests added in `CollectionExpressionsSuite.scala`.
Closes #43819 from thepinetree/spark-sequence-overflow-3.4.
Authored-by: Deepayan Patra <deepayan.patra@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

1 parent 23f15af commit f5900a5
File tree
2 files changed: +71 -20 lines changed
- sql/catalyst/src
  - main/scala/org/apache/spark/sql/catalyst/expressions
  - test/scala/org/apache/spark/sql/catalyst/expressions

Lines changed: 32 additions & 15 deletions
Lines changed: 39 additions & 5 deletions