-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-41793][SQL] Incorrect result for window frames defined by a range clause on large decimals #40138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| | (1, cast('11342371013783243717493546650944543.47' as decimal(38,2))), | ||
| | (1, cast('999999999999999999999999999999999999.99' as decimal(38,2))) | ||
| |as data(a, b); | ||
| |""".stripMargin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems simpler to use scala API for it
Seq(
1 -> "113...",
1 -> "999 ..."
).toDF("a", "b")
.select("a", $"b".cast("decimal(38, 2)"))
.createTempView("...")
ef2692b to
fde25b1
Compare
|
@ulysses-you to help other reviewers understand it, can you add more explaination in the PR description about how |
gerashegalov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
(verified that the failing spark-rapids test succeeds now)
|
@cloud-fan addressed, hope it is helpful |
|
thanks, merging to master/3.4! |
…nge clause on large decimals ### What changes were proposed in this pull request? Use `DecimalAddNoOverflowCheck` instead of `Add` to craete bound ordering for window range frame ### Why are the changes needed? Before 3.4, the `Add` did not check overflow. Instead, we always wrapped `Add` with a `CheckOverflow`. After #36698, we make `Add` check overflow by itself. However, the bound ordering of window range frame uses `Add` to calculate the boundary that is used to determine which input row lies within the frame boundaries of an output row. Then the behavior is changed with an extra overflow check. Technically,We could allow an overflowing value if it is just an intermediate result. So this pr use `DecimalAddNoOverflowCheck` to replace the `Add` to restore the previous behavior. ### Does this PR introduce _any_ user-facing change? yes, restore the previous(before 3.4) behavior ### How was this patch tested? add test Closes #40138 from ulysses-you/SPARK-41793. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit fec4f7f) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
apache/spark#40138 fixed SPARK-41793, and Xfail is no longer necessary for 3.4+ Signed-off-by: Gera Shegalov <gera@apache.org>
…nge clause on large decimals ### What changes were proposed in this pull request? Use `DecimalAddNoOverflowCheck` instead of `Add` to craete bound ordering for window range frame ### Why are the changes needed? Before 3.4, the `Add` did not check overflow. Instead, we always wrapped `Add` with a `CheckOverflow`. After apache#36698, we make `Add` check overflow by itself. However, the bound ordering of window range frame uses `Add` to calculate the boundary that is used to determine which input row lies within the frame boundaries of an output row. Then the behavior is changed with an extra overflow check. Technically,We could allow an overflowing value if it is just an intermediate result. So this pr use `DecimalAddNoOverflowCheck` to replace the `Add` to restore the previous behavior. ### Does this PR introduce _any_ user-facing change? yes, restore the previous(before 3.4) behavior ### How was this patch tested? add test Closes apache#40138 from ulysses-you/SPARK-41793. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit fec4f7f) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Use
DecimalAddNoOverflowCheckinstead ofAddto craete bound ordering for window range frameWhy are the changes needed?
Before 3.4, the
Adddid not check overflow. Instead, we always wrappedAddwith aCheckOverflow. After #36698, we makeAddcheck overflow by itself. However, the bound ordering of window range frame usesAddto calculate the boundary that is used to determine which input row lies within the frame boundaries of an output row. Then the behavior is changed with an extra overflow check.Technically,We could allow an overflowing value if it is just an intermediate result. So this pr use
DecimalAddNoOverflowCheckto replace theAddto restore the previous behavior.Does this PR introduce any user-facing change?
yes, restore the previous(before 3.4) behavior
How was this patch tested?
add test