-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35449][SQL][3.1] Only extract common expressions from CaseWhen values if elseValue is set #32651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-35449][SQL][3.1] Only extract common expressions from CaseWhen values if elseValue is set #32651
Conversation
maropu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good if the tests pass.
|
ok to test |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138871 has finished for PR 32651 at commit
|
|
cc @dongjoon-hyun as this touches branch-3.1. |
|
Thank you, @viirya . |
|
Thanks all! Merging to branch-3.1. |
… values if elseValue is set ### What changes were proposed in this pull request? This PR fixes a bug with subexpression elimination for CaseWhen statements. #30245 added support for creating subexpressions that are present in all branches of conditional statements. However, for a statement to be in "all branches" of a CaseWhen statement, it must also be in the elseValue. ### Why are the changes needed? Fix a bug where a subexpression can be created and run for branches of a conditional that don't pass. This can cause issues especially with a UDF in a branch that gets executed assuming the condition is true. ### Does this PR introduce _any_ user-facing change? Yes, fixes a potential bug where a UDF could be eagerly executed even though it might expect to have already passed some form of validation. For example: ``` val col = when($"id" < 0, myUdf($"id")) spark.range(1).select(when(col > 0, col)).show() ``` `myUdf($"id")` is considered a subexpression and eagerly evaluated, because it is pulled out as a common expression from both executions of the when clause, but if `id >= 0` it should never actually be run. ### How was this patch tested? Updated existing test with new case. Closes #32651 from Kimahriman/bug-case-subexpr-elimination-3.1. Authored-by: Adam Binford <adamq43@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
… values if elseValue is set ### What changes were proposed in this pull request? This PR fixes a bug with subexpression elimination for CaseWhen statements. apache#30245 added support for creating subexpressions that are present in all branches of conditional statements. However, for a statement to be in "all branches" of a CaseWhen statement, it must also be in the elseValue. ### Why are the changes needed? Fix a bug where a subexpression can be created and run for branches of a conditional that don't pass. This can cause issues especially with a UDF in a branch that gets executed assuming the condition is true. ### Does this PR introduce _any_ user-facing change? Yes, fixes a potential bug where a UDF could be eagerly executed even though it might expect to have already passed some form of validation. For example: ``` val col = when($"id" < 0, myUdf($"id")) spark.range(1).select(when(col > 0, col)).show() ``` `myUdf($"id")` is considered a subexpression and eagerly evaluated, because it is pulled out as a common expression from both executions of the when clause, but if `id >= 0` it should never actually be run. ### How was this patch tested? Updated existing test with new case. Closes apache#32651 from Kimahriman/bug-case-subexpr-elimination-3.1. Authored-by: Adam Binford <adamq43@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
… values if elseValue is set ### What changes were proposed in this pull request? This PR fixes a bug with subexpression elimination for CaseWhen statements. apache#30245 added support for creating subexpressions that are present in all branches of conditional statements. However, for a statement to be in "all branches" of a CaseWhen statement, it must also be in the elseValue. ### Why are the changes needed? Fix a bug where a subexpression can be created and run for branches of a conditional that don't pass. This can cause issues especially with a UDF in a branch that gets executed assuming the condition is true. ### Does this PR introduce _any_ user-facing change? Yes, fixes a potential bug where a UDF could be eagerly executed even though it might expect to have already passed some form of validation. For example: ``` val col = when($"id" < 0, myUdf($"id")) spark.range(1).select(when(col > 0, col)).show() ``` `myUdf($"id")` is considered a subexpression and eagerly evaluated, because it is pulled out as a common expression from both executions of the when clause, but if `id >= 0` it should never actually be run. ### How was this patch tested? Updated existing test with new case. Closes apache#32651 from Kimahriman/bug-case-subexpr-elimination-3.1. Authored-by: Adam Binford <adamq43@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
What changes were proposed in this pull request?
This PR fixes a bug with subexpression elimination for CaseWhen statements. #30245 added support for creating subexpressions that are present in all branches of conditional statements. However, for a statement to be in "all branches" of a CaseWhen statement, it must also be in the elseValue.
Why are the changes needed?
Fix a bug where a subexpression can be created and run for branches of a conditional that don't pass. This can cause issues especially with a UDF in a branch that gets executed assuming the condition is true.
Does this PR introduce any user-facing change?
Yes, fixes a potential bug where a UDF could be eagerly executed even though it might expect to have already passed some form of validation. For example:
myUdf($"id")is considered a subexpression and eagerly evaluated, because it is pulled out as a common expression from both executions of the when clause, but ifid >= 0it should never actually be run.How was this patch tested?
Updated existing test with new case.